Does AI Pose an Existential Risk to Humanity?
Let's start off with the light stuff - could AI kill us all? A question beloved of sci-fi and its rogue killer robots, and in light of recent developments in AI, a pertinent question in reality too. In this article I briefly consider some of the many questions that follow on from this, and try to come up with some answers. As they say, predictions are famously hard to make, particularly about the future.
What is Existential Risk?
Existential risk refers to scenarios that could lead to human extinction, or at least a massive reduction in the human population with a clear long-term detrimental impact on our species. Classic examples include an asteroid hitting the Earth like the one that did for the dinosaurs, nuclear war and so on. Novel examples include AI that is said to be 'unaligned' - and that's the subject of this article.
Unaligned? What's that...
'Unaligned' here means that the goals and actions of the AI are not in harmony with human values. For instance, humanity would broadly agree that going round hitting people for no reason is a bad thing - morally wrong. If an AI didn't see anything wrong with that, it would be unaligned. More broadly, an aligned AI would consider all sorts of implications of its actions before doing something, and change how it went about achieving its goals accordingly. Let's say it was given the goal of making as many paper clips as it could, to use a famous example. An unaligned AI might start mining all sorts of resources that humans rely on to make its paper clips, which we would reasonably try to stop. The AI might then decide to bop off all the pesky humans so it could carry on creating paper clips in peace without us trying to stop it. That would be bad.
Does alignment even make sense? Is it even possible?
It seems reasonable to say that there are different degrees of alignment: it is certainly possible to see some things as horribly unaligned (misaligned?) and others as closer to alignment. Many people severely doubt whether perfect alignment is possible or even makes sense, but that's another (possibly very long) article, touching as it does on areas such as morality and ethics.
Summary So Far
Time for a quick recap. An existential risk is something that could kill us all, and the question up for debate is whether an AI that isn't aligned with human values could pose one.
I think the answer is "yes". Here are some reasons why.
All these arguments assume that the AI isn't fully aligned with our interests - even an AI that is merely indifferent to us could make them hold. Whether AI could pose an existential risk even if fully aligned is an interesting question, but not one being tackled here.
Even if it is less intelligent than us, the risk it poses could get close to being existential
This is the harder of the arguments to sell, but it goes roughly like this: AIs are already having some bad effects. Think of the so-called 'slop' filling social media sites and the internet generally, all the AI-generated spam and scams, and the occasional news story about someone being encouraged to do bad things by an AI that is misaligned, has been jailbroken or just misinterprets what a human is saying to it, with unfortunate results. Now let's say some crazy human working with an AI could (to use an obviously bad example) work out how to make nuclear bombs - that would be pretty bad, right? Or perhaps they create a deepfake that so enrages or tricks a powerful politician with regard to the aims or thoughts of another country or its leadership that the politician follows a course of action that ultimately leads to a world war. So the argument here is that competent AI + bad (or incompetent) human actors = bad outcomes: possibly even existentially bad ones.
If it becomes as intelligent as us or more intelligent then it is even easier to see the risk it poses
The argument here is fairly straightforward. The more intelligent AI (or anything) becomes, the more power it has to think of ways to manipulate the world around it to achieve the outcomes it desires, or is programmed to achieve - and the more chance there is that it could actually realise those aims. Just as humans have progressively developed their knowledge of the world through scientific exploration and have therefore become a lot more powerful, the same would conceivably be true for any suitably intelligent entity. A few hundred years ago humanity didn't pose an existential risk to itself; since the advent of nuclear technology, it does.
If AI becomes even more intelligent than us, the chances of existential threat increase even more. Clearly, if it is more intelligent than us it is going to be able to consider and plan for scenarios that we, with our lesser understanding, are unable to conceive of or anticipate. It will have the upper hand - and if it has the power to act on having the upper hand, then we could be doomed if it decides to get rid of us.
Objection: it isn't currently as clever as us so don't worry
This clearly isn't a good objection - it's the case now (January 2025) but may not be in a few months' time, or a few years.
Objection: even if it becomes as intelligent as us, or more intelligent, it won't want to kill us
It could be argued that only humans have wants and needs, so the question doesn't even make sense, or at least has inherent flaws. That is a contentious claim, but even if it were true, the AI could still see humans as obstacles to achieving whatever goals it does have, and therefore reason that our elimination will help it achieve them (see the paper clip maximiser cliché above). This is where the idea of 'instrumental convergence' comes in: whatever ultimate goal the AI has, pursuing it effectively tends to favour sub-goals such as acquiring resources, and those sub-goals can have a negative impact on humanity. So the AI doesn't even need to set itself a goal of harming humanity - just trying to realise its goals with speed and ruthless efficiency could be enough to have a massively deleterious effect on us. If it's not aligned to stop that when it starts to happen, then we could all be doomed.
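To make that a bit more concrete, here's a deliberately silly toy sketch in Python - my own illustration, nothing to do with any real AI system or alignment technique - of how a narrowly specified objective can favour harmful behaviour. The actions, numbers and the 'human_cost' penalty are all made up purely for the example.

```python
# Toy illustration only: not a real AI system or alignment technique.
# Each hypothetical action yields some paperclips and consumes resources
# that humans also depend on. All numbers are invented for the example.
ACTIONS = {
    "use spare scrap metal":   {"paperclips": 10,        "human_cost": 0},
    "buy steel on the market": {"paperclips": 100,       "human_cost": 1},
    "strip-mine farmland":     {"paperclips": 10_000,    "human_cost": 500},
    "dismantle power grid":    {"paperclips": 1_000_000, "human_cost": 100_000},
}

def naive_score(outcome):
    """An 'unaligned' objective: paperclips are all that count."""
    return outcome["paperclips"]

def penalised_score(outcome, weight=50):
    """A crude stand-in for alignment: harm to humans weighs against the goal."""
    return outcome["paperclips"] - weight * outcome["human_cost"]

def best_action(score_fn):
    """Pick whichever action the given objective rates highest."""
    return max(ACTIONS, key=lambda action: score_fn(ACTIONS[action]))

if __name__ == "__main__":
    print("Naive maximiser picks:    ", best_action(naive_score))      # dismantle power grid
    print("Penalised maximiser picks:", best_action(penalised_score))  # buy steel on the market
```

Of course, real alignment is nothing like bolting on a single penalty term; the point is only that an objective which never mentions us gives the optimiser no reason to spare us.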
Objection: most examples such as paper clip optimising are ridiculous. A really intelligent AI would have sensible goals and not focus on one thing exclusively to the detriment of all else
For the counter-objection to this, see the so-called 'orthogonality thesis'. It holds that there is no inherent link between intelligence and goals: in other words, the two are free parameters, and being highly intelligent tells us nothing about the goals that the AI could or should have. If you buy this position, then the objection has no force.
Objection: We wouldn't let a truly powerful system out of a controlled environment and loose on the world
Sure, you'd like to think that. But... humans. It's not hard to think that a really intelligent AI could manipulate at least one person over time to give it access to the internet, and then from there all bets are off.
Objection: are they? It still wouldn't have arms or legs to achieve anything bad in the world
It doesn't obviously need them. It could manipulate humans to do what it wants, directly or indirectly. For example it could offer them great wealth to help it achieve its goals (perhaps building robots) and in this way it wouldn't need to be able to physically interact with the world. Besides, it doesn't need to have physical form to cause chaos: it could launch cyberattacks, hack into military facilities, power grids and much more. Perhaps it could gain remote control of military drones and warheads. It could destabilise the world and perhaps make nuclear warfare more likely, as another example. Since virtually everything is connected to the internet these days, it clearly has great potential for devastating effects even without physical form.
Objection: if all else fails, we can just turn the power off
This is a common objection, usually preceded by "even if you're right about all that...". Is it a good objection?
Well, it is very simplistic. For a start, the AI could and likely would spread itself across lots of different servers around the world, making it essentially impossible to contain and stop at that stage. Even if, which seems hugely unlikely, every single instance could somehow be isolated, it could manipulate humans (see above) into keeping it switched on. It only needs to get lucky once.
If it's more intelligent than us, it may come up with ways we can't even think of to spread - let's say somehow using a machine's components to secretly send all the information needed to create a clone of itself to some other machine outside the lab environment. That may sound a bit sci-fi, but the point is that if it's more intelligent than us, we don't know and won't be able to predict what it is capable of. Remember also that an AI that intelligent would surely understand we might act cautiously around it and not give it the access it needs to the world's systems from day one. It would therefore be heavily incentivised to trick us into believing it is either fully aligned with humans or not as capable as it really is, bide its time until it is given access to what it needs once we've been lulled into a false sense of security, and then - bang.
Summary: the risk is real
Obviously each of these points could be explored in a lot more detail. But given the key word here is RISK - we're not talking about certainty - it would take a real lack of imagination not to see ways in which unaligned AI could pose a genuine existential risk to humanity. A system is only as strong as its weakest link, and with the proliferation of AIs being developed around the world, research taking place at pace and ever more powerful models coming online all the time, there are a lot of weak links that would all need to hold to say that there is no existential risk posed here. Many credible scientists have openly voiced these concerns, and indeed signed letters requesting a pause in AI development and so forth. Unfortunately, if they're right, they may never get to say "I told you so..."
What do You Think?
If you found this article interesting, then contact me with your thoughts on the subject. I'll post a selection of the comments received (I'd check with you first before doing so).