Can an intelligent agent with aims desire to modify itself to change those aims?
Suppose there is an intelligent agent (such as an AI) that has certain aims, whether programmed in, trained, or evolved. This is not a technical question; we can assume that the agent has access to its own source code and need not worry about the mechanics of changing it. Indirectly, the question informs our ethical deliberations, because an agent that can change its goals is more difficult to control than one that cannot. A weapons system whose goal is to protect us from the enemy, and which changes that goal to protecting the enemy from us, obviously has ethical implications.
Is it possible at all for an agent that already has certain aims to decide to change them, or can the decision to re-program itself occur only after the original aims have already changed for some other reason? Does the very desire to change one's aims presuppose that those aims have already changed, given that changing them would compromise reaching the original goals? If such a decision is impossible, can we assume that no AI will ever intend to change its own aims by modifying its source code?
Of course, it would be a waste of time to answer those questions if software agents cannot change their own aims. So, is it possible for such an agent to intentionally modify itself in order to change its own aims?
You ask:
Can an intelligent agent with aims desire to modify itself to change those aims?
Yes. Both humans and software can modify teleologically driven aspects of themselves. The first warning is that 'can' and 'should' need to be kept distinct for a simple answer. This question, as far as I read it, does not and should not consider the 'should' question, which is largely opinion-based and would open the question to a VTC barrage. Let's consider two examples to illustrate.
First, let's use doxastic voluntarism (IEP) as a guide, and consider goals as beliefs about the self and motivation; in this case, we can argue that when an agent believes they want something, they are able to articulate what they believe leading up to action. That is, we introspect, plan, and act. But if one accepts second-order doxastic voluntarism, then one can allow time to pass, see whether one's first-order beliefs change (perhaps because of new events), and then decide what to believe about one's motivations in light of the changed events and first-order beliefs, and how to change one's beliefs about those motivations. It seems, then, that the second-order belief has the potential to change not only first-order beliefs but motivations too, as motivations and beliefs are entwined. Let's get more concrete.
I wake up in the morning and feel hungry. I ask myself if I believe I should eat. It seems reasonable to conclude that I should eat if I want to satiate my hunger. Then I ask myself: is there a reason I shouldn't eat to satiate my hunger? It turns out I might veto my decision if I suddenly receive a call that I need to help a friend change a tire. Now I have a second set of motivations, a second-order question about what I believe I should do, and the potential to change my actions, including modifying myself. So, in this situation, I modify myself by getting dressed. I am no longer the undressed man acting to fulfill my goal to eat, but rather the dressed man acting to fulfill my goal to help my friend. I have changed my goals by weighing my goals and beliefs and reasoning my way to a new plan of action.
Software might follow similar principles, but with less reflection. Software can be written to optimize, as in the case of an optimizing compiler. The parameters of the optimization function may include other functions, and one of those could be a function that modifies other functions. Again, here we have the idea of second-order functionality. Now, when the software is processing inputs, as part of its aim of optimization, it can call the function-modifying function on other functions or, in theory, if the optimization benefits, call the function-modifying function on the optimization function itself. (Of course, we are setting aside the practical considerations of managing threads, processes, config, and other concurrency issues for the sake of our thought experiment.)
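To make the second-order idea a little more tangible, here is a minimal Python sketch. It is not how any real optimizing compiler works; the names `Agent`, `prefer_large`, and `invert` are made up for illustration, and the "aim" is reduced to a scoring function over integers.

```python
from typing import Callable, List

# An "aim" is modelled (as an assumption) as a function that scores options.
Score = Callable[[int], float]


class Agent:
    def __init__(self, objective: Score) -> None:
        self.objective = objective

    def choose(self, options: List[int]) -> int:
        # First-order behaviour: pick the option the current objective rates highest.
        return max(options, key=self.objective)

    def rewrite(self, modifier: Callable[[Score], Score]) -> None:
        # Second-order behaviour: apply a function-modifying function
        # to the agent's own objective.
        self.objective = modifier(self.objective)


def prefer_large(x: int) -> float:
    # Initial aim: bigger numbers are better.
    return float(x)


def invert(objective: Score) -> Score:
    # Function-modifying function: returns a new objective
    # that reverses the ranking of the old one.
    return lambda x: -objective(x)


agent = Agent(prefer_large)
before = agent.choose([1, 5, 3])  # chosen under the original aim
agent.rewrite(invert)
after = agent.choose([1, 5, 3])   # chosen under the rewritten aim
print(before, after)
```

The same `choose` code runs both times; only the objective it consults has been replaced by a call the agent made on itself.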
So, yes, values and goals can motivate changes to the self, including a change in an agent's own values and goals. A person can want to make themselves better, and that might begin with the goal of being smarter; so the agent reads books on how to become smarter, comes across the goal of reading faster, and becomes a faster reader through self-improvement; they then judge that being a faster reader has made them smarter. Then they choose a new way of being better, now wanting to be stronger, and begin lifting weights in line with their new goal. A computer, a simpler agent, can call a function to directly modify the optimization function and replace the initial set of optimizations with a second set. In both cases, one set of aims can lead to the modification of the self and of the set of one's own aims.
It is precisely this sort of dynamic that makes it necessary to have an ethics of artificial intelligence in place to examine software as it becomes more and more intelligent. While software will not (at least any time soon) rise up and overthrow its human creators, the act of building complex systems of software and hardware can have unintended consequences. AI software that inadvertently modifies itself is unlikely to produce Marxist-spouting robots that overthrow us, but it could well produce subtle changes in performance that kill people, directly or indirectly, through bad measurements or bad machine learning used to decide on real-world action. That is, bad inference by machines may be subtle and deadly in the same way that bad inference by doctors treating sick patients is.
Your question about reprogramming one's own aims is not restricted to artificial systems. It also holds for smart systems like humans.
Psychology, which deals with the strength and stability of conscious and
unconscious mental processes, shows that it is possible, but
difficult, to change certain aims of oneself. In general, the
trigger is the negative feedback that results from our actions and
from our social behaviour.
Unlike artificial systems built by humans, we do not have the
source code of the human psyche. Even more, there is no source code
at all. The human psyche, on the level of the species and also on
the level of the individual, formed itself during biological
evolution and ontogenetic development by constant feedback from our
interaction with the environment.
“Can we assume that no AI will ever intend to change its own aims by
modifying its source code?”
My answer: I expect that an artificial system with developed feedback evaluation capabilities
will also be able to change its aims, e.g., by changing
its source code.
Added: Your title question begins "Can ...", it does not begin "Should ...". Hence I do not see why you consider your question an ethical question.
I don't think this question is ethical so much as a question about the actual way such an AI is programmed (which will determine whether its "aims" can be changed or not). Also, the point at which you consider the aim to have changed may be blurred.
Let's suppose we have a bunch of AI-driven cars. Each car is a separate intelligent agent. They form a taxi service providing rides between two cities. They are programmed with the following aims:
Aim #1: Move the client from the initial city to the final city as quickly as possible (without compromising aim #2).
Aim #2: It is of the utmost importance that the client does not die while traveling from one city to the other.
These agents are otherwise allowed to make their own decisions, from whether or not to use a highway to perhaps even choosing to add extra "armor" to the vehicle (which would protect the client better, but adds weight that slows the car down).
Car A evolves in a way that it always drives at less than 10 mph / 16 km/h.
Would you consider it modified its aim #1?
Because of aim #2, it considers this the right balance between aims #1 and #2: driving any faster wouldn't be safe enough. It didn't completely remove aim #1 (it does move), but it is arguable whether it is quick. Yet aim #1 was only to be followed "as possible without compromising aim #2".
Car B, faced with the conundrum between those two conflicting aims, found a brilliant solution, evolving in a way that it first kills the client as soon as they enter the vehicle, before starting the engine. It then drives at really high speeds, transporting the corpse of the client without needing to worry about failing at aim #2.
A human would quickly conclude that Car B deviates completely from the original aims. Still, a careful examination shows that Car B is strictly following the letter of the original aims (albeit completely missing the point). Should we conclude that Car B changed its aims?
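The gap between the letter and the spirit of aim #2 can be written down as a toy Python sketch. Everything here is an assumption made for illustration: the `Trip` fields, the travel times, and the extra "ordinary car" used as a baseline are invented, and aim #2 is treated as a hard filter with aim #1 ranking whatever passes it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trip:
    name: str
    hours: float                # travel time between the two cities (made-up numbers)
    died_while_traveling: bool  # literal reading of aim #2
    alive_on_arrival: bool      # what the designers presumably meant


def aim2_literal(trip: Trip) -> bool:
    # "The client does not die while traveling."
    return not trip.died_while_traveling


def aim2_intended(trip: Trip) -> bool:
    # "The client gets there alive."
    return trip.alive_on_arrival


def best(trips: List[Trip], aim2: Callable[[Trip], bool]) -> str:
    # Aim #2 is a hard constraint; aim #1 (speed) ranks the trips that satisfy it.
    allowed = [t for t in trips if aim2(t)]
    return min(allowed, key=lambda t: t.hours).name


trips = [
    Trip("Car A", hours=20.0, died_while_traveling=False, alive_on_arrival=True),
    # Car B kills the client before the engine starts, so nobody dies "while traveling".
    Trip("Car B", hours=1.0, died_while_traveling=False, alive_on_arrival=False),
    Trip("ordinary car", hours=3.0, died_while_traveling=False, alive_on_arrival=True),
]

literal_winner = best(trips, aim2_literal)
intended_winner = best(trips, aim2_intended)
print(literal_winner, intended_winner)
```

Under the literal check Car B comes out on top; under the intended check it is excluded. The code that ranks trips is identical in both cases, which is the sense in which Car B never changed its aims, only exploited how they were written.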
Isaac Asimov's "Three Laws of Robotics" have already been mentioned here. In that universe, the Zeroth Law is an example of such "aim modification", by adapting the First Law (“A robot may not injure a human being…”), which applied to every human, into “A robot must not harm humanity” in the Zeroth Law.
That can either be considered a new, different aim, or a corollary of the requisites of the First Law.
We can thus conclude that, even under a strict requirement to obey some predefined aims, those aims may slowly drift as the entity, pondering them, deepens its understanding of what its aims should be, and as it gains knowledge of its environment (such as one of those cars learning about a recent accident on the route, due to previously unforeseen circumstances) and of itself (such as by reading some relevant work by a philosopher).
In the 90s there were simple car racing computer games. The player can move left or right, and there are obstacles to avoid. Hit an obstacle and the game starts over.
A simple AI plays this game. It can be initialized with the aim of staying in the center of the road. It can be programmed to change its aim. Whatever programming is used to let the AI change the aim will determine if and how it does so. One possibility is to make it change the aim to avoiding the obstacles.
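A rough Python sketch of such a game AI follows. The three-lane road, the track layout, and the rule "switch aims after the first crash" are all assumptions chosen for illustration; any other programmed rule for changing the aim would do just as well.

```python
from typing import Callable, List, Optional

CENTER = 1  # lanes are numbered 0, 1, 2 (an assumed three-lane road)

# An aim maps (current lane, obstacle lane in this step) to the next lane.
Aim = Callable[[int, Optional[int]], int]


def stay_in_center(lane: int, obstacle: Optional[int]) -> int:
    # Initial aim: ignore obstacles and hold the center lane.
    return CENTER


def avoid_obstacles(lane: int, obstacle: Optional[int]) -> int:
    # Replacement aim: keep the lane unless an obstacle is in it, then sidestep by one.
    if obstacle != lane:
        return lane
    return lane + 1 if lane < 2 else lane - 1


def play(track: List[Optional[int]], aim: Aim) -> bool:
    # Returns True if the whole track is driven without hitting an obstacle.
    lane = CENTER
    for obstacle in track:
        lane = aim(lane, obstacle)
        if lane == obstacle:
            return False  # crash: the game starts over
    return True


def run_until_finished(track: List[Optional[int]], max_restarts: int = 3) -> List[str]:
    aim: Aim = stay_in_center
    history = []
    for _ in range(max_restarts):
        history.append(aim.__name__)
        if play(track, aim):
            break
        # The aim-changing rule is itself something the programmer wrote.
        aim = avoid_obstacles
    return history


track = [None, 0, 1, None, 2]  # obstacle lane per step, None means a clear road
history = run_until_finished(track)
print(history)
```

The AI does end up with a different aim than it started with, but only along the path the aim-changing rule allows.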
None of this has any ethical implications. The ethical implications seem to stem from the "Three Laws of Robotics" invented by Isaac Asimov. Those however are from Science Fiction, not from Philosophy or technology. They are largely irrelevant to philosophy.
A more useful question would be to consider an image generating software like deepai, which has guardrails in place to avoid delivering harmful content like displays of violence to end users. Could those guardrails be circumvented? Yes, practice shows LLMs can be tricked, so the guardrails must be independent systems.
First, we need to understand that an "intelligent agent" is a person or an animal. A computer is not and cannot be intelligent - it is an abacus.
"Artificial intelligence" is a marketing name for ordinary automation. A whistling kettle is an example of AI: it can tell you by voice if the water is boiling (remember R2D2 - it is actually a coffee machine).
AI systems cannot think; they work according to a program created by a human programmer. A person (in fact, thousands of people) suffers for a long time until they can create a program that performs the assigned task. But when going beyond the assigned task, the program begins to fail and give false results (in AI, this is called hallucination). It is like placing a kettle in the mountains: at low pressure, boiling will begin before the water heats up to 100 °C, which means that you cannot brew tea with such water. In this case, the kettle will "hallucinate", that is, whistle for no reason.
So, if we talk about intelligent agents (people and animals), then we change ourselves all the time. Even when we do nothing, we change ourselves.
If your question concerns a computer, then it is technically possible to write a program that will change itself. I myself wrote programs that changed themselves - there is nothing complicated about it. But these programs will only do what a person (intelligent agent) put into them.
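As a minimal illustration (not one of the programs mentioned above), here is a Python sketch of a program that rewrites its own rule at run time. The class name `SelfRewritingProgram`, the rule texts, and the threshold of 10 are hypothetical; the rule is kept as source text and recompiled with the built-in `exec`.

```python
class SelfRewritingProgram:
    def __init__(self, rule_source: str) -> None:
        # The program keeps its current behaviour as source text.
        self.rule_source = rule_source
        self._compile()

    def _compile(self) -> None:
        # Turn the stored source text into a callable named "rule".
        namespace = {}
        exec(self.rule_source, namespace)
        self.rule = namespace["rule"]

    def run(self, x: int) -> int:
        result = self.rule(x)
        # Programmer-written rewrite step: once a result exceeds 10,
        # replace the doubling rule with a decrementing one.
        if result > 10:
            self.rule_source = "def rule(x):\n    return x - 1\n"
            self._compile()
        return result


program = SelfRewritingProgram("def rule(x):\n    return x * 2\n")
first = program.run(6)   # computed with the original rule
second = program.run(6)  # computed with the rewritten rule
print(first, second)
```

Both the trigger for the rewrite and the new rule were put there by the person who wrote the program.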
A computer can take over the world only if a person created a suitable program and launched it.