AI might kill us all soon. Unfortunately, I’m serious.

Thoughtprovoked
27 min read · Jun 4, 2023


[Image: a pier leading into the fog.]
If we don’t change course, AI could be the end of the road for humanity.

There is reason to believe that Artificial Intelligence (AI) could pose a significant risk to humankind in the not-so-distant future. And by “significant risk”, I mean that AIs could potentially wipe out human civilization entirely. The scenario I’m referring to is derived from a line of arguments based on the current scientific understanding of AI systems. Since there is currently no promising solution in sight, we cannot just continue AI development as it is done today. Otherwise, the outlook could be very dire. Unfortunately, this is not just me making up yet another random doomsday scenario. Some of the most knowledgeable and renowned people in the field of AI agree that this needs to be taken seriously. So please hear me out.

In the following, I will introduce the issue in relatively simple terms, avoiding technical jargon as much as possible, and answer four key questions:

  • What is the problem?
  • How could an AI wipe out humanity?
  • When could the AI doom happen?
  • Is there something we can do about it?

For readers who are not familiar with AI in general and the recent progress in this area, I’ll start with a brief introduction of some (hopefully helpful) context. This section is optional and may be skipped.

Optional context: Introduction and recent progress in AI

Humans have long dreamed of recreating the intelligence of the human brain artificially. While this dream was unattainable in the past due to a lack of sufficiently powerful computers, algorithms and data, it now seems within reach, and it is likely that human-level AI will be achieved over the coming years. This change will impact human civilization profoundly. While there could be very positive consequences, it also leads to significant challenges, the most existential of which I will discuss in this essay. In this optional section, I would like to explain a couple of things that I believe are relevant for readers not familiar with the subject.

The difference between “conventional” software and AI

The biggest difference between “conventional” software and AI is that conventional software is explicitly programmed, which means there are clear rules defined by the programmer (if X happens, then the software does Y). In contrast, AIs are not explicitly programmed but rather “trained” to decide by themselves how to act. An example would be a modern AI chess computer that suggests chess moves based on a chess position provided as input. The calculations performed to arrive at these suggestions were never explicitly programmed but rather “learnt” while the AI played millions of chess games against itself during training.
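For readers who like to see things in code, here is a minimal sketch contrasting the two approaches. The spam-filter scenario, the word counts and the “learned” weights are all invented purely for illustration and have nothing to do with any real AI system:

```python
import numpy as np

# Conventional software: the rule itself is written by a programmer ("if X, then Y").
def spam_filter_conventional(email_text: str) -> bool:
    return "free money" in email_text.lower()

# AI-style software: the decision is driven by numbers ("weights") found during
# training, not by rules a human wrote down.
def spam_filter_learned(word_counts: np.ndarray, learned_weights: np.ndarray) -> bool:
    score = float(word_counts @ learned_weights)  # weighted sum of word counts
    return score > 0.0

# Toy usage with made-up numbers (not from any real training run):
counts = np.array([2.0, 0.0, 1.0])    # how often three chosen words appear in an email
weights = np.array([1.3, -0.7, 0.4])  # illustrative "learned" weights
print(spam_filter_conventional("Claim your FREE MONEY now"))  # True
print(spam_filter_learned(counts, weights))                   # True
```

The conventional filter does exactly what its hand-written rule says and nothing more; the behaviour of the “learned” filter is determined entirely by the weight values, which in a real AI system would come from training on data rather than from a programmer.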

This fundamental difference in the way the software is created leads to an important real-life consequence: The capabilities of conventional (explicitly programmed) software are limited by the ingenuity and intelligence of its human programmers. It will not develop capabilities beyond human intelligence, and it will always remain understandable for humans (at least in principle). In contrast, AI software can achieve capabilities that go far beyond human intelligence and at the same time become completely incomprehensible. As an example, AI chess computers now play vastly better chess than any human (they would win practically every game even against the world champion) and at the same time, we have basically no idea why they play the moves they play. The only thing we know for sure is that these moves lead to winning positions.

AIs are currently incomprehensible “black boxes” to us because they are based on so-called “artificial neural networks” (ANNs). ANNs are inspired by the human brain and consist of interconnected layers of nodes, also known as “neurons”. Each layer receives information from the previous one and passes its output to the next. These layers include an input layer (which receives the input, e.g. a chess position), one or more hidden layers (where computations happen), and an output layer (which provides the final result, e.g. a suggested chess move). What happens during training is that the strengths of the interconnections between the layers are gradually optimized, so that the ANN behaves in a certain way (e.g. plays good chess moves). I won’t go into more detail here, but the important thing to remember is that our understanding of why ANNs behave the way they do is very limited. This is a key problem, as discussed later. In case you are interested in learning more about ANNs, I highly recommend watching this fantastic introduction on YouTube.
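For the technically curious, here is a minimal, made-up sketch of the layered structure just described. The layer sizes are tiny and the weights are random instead of trained, so it does nothing useful, but it shows that an ANN is, at its core, just layers of weighted sums passed from one layer to the next:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward network: 4 inputs -> 3 hidden neurons -> 2 outputs.
# In a real AI these weights would be optimized during training; here they are random.
W_hidden = rng.normal(size=(3, 4))   # connection strengths: input layer -> hidden layer
W_output = rng.normal(size=(2, 3))   # connection strengths: hidden layer -> output layer

def forward(x: np.ndarray) -> np.ndarray:
    hidden = np.maximum(0.0, W_hidden @ x)  # each hidden neuron: weighted sum + nonlinearity
    return W_output @ hidden                # output layer: weighted sum of hidden activations

x = np.array([0.5, -1.0, 2.0, 0.0])   # a made-up input (think: an encoded chess position)
print(forward(x))                      # two output numbers (think: scores for two moves)
```

Even in this toy version, the network’s behaviour lives entirely in the weight matrices. In real systems there are billions of such weights, which is why nobody can simply read off “why” a trained ANN does what it does.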

The difference between narrow and general AI

Another important thing to understand is that in the past, we only had “narrow” AIs, which means they only performed well in a very narrow domain. One example is the aforementioned chess computer, which plays chess very well but can’t do anything useful in other areas of intelligence (logical reasoning, creative writing, driving a car etc.). With the release of “ChatGPT” (see next paragraph), it became very obvious that we are potentially very close to achieving a more general form of AI. General AI, aka artificial general intelligence (AGI), matches or outperforms human intelligence across a wide range of tasks. Unlike narrow AI, AGI could lead to significant safety challenges.

The release of ChatGPT and the advent of general AI

What exactly is ChatGPT, and why is it so important? ChatGPT is an AI developed by the US company OpenAI, which was first released in late 2022. It is basically a chatbot that users can have text-based conversations with in plain English (or other languages). The software received massive media attention after its release and was apparently the fastest product ever to reach 100 million users (it took just two months to achieve this milestone).

ChatGPT can talk about pretty much any subject, write poems, answer questions, play games, summarize texts, create screenplays, research topics on the internet, pass university admission exams etc. In short, it can do many things across a wide range of domains, just like humans, and hence is a massive step towards AGI.

To be fair, ChatGPT isn’t perfect yet. It still frequently makes mistakes that would be obvious to humans. However, OpenAI has already implemented several improvements, and the performance gets better with each iteration. The number of tasks where humans still outperform ChatGPT is becoming smaller every day. It is very conceivable that in the not-so-distant future, after a couple more improvements, ChatGPT will become an AGI by any reasonable definition.

There has been steady progress in AI technology before, and OpenAI isn’t the only company working on Large Language Models like ChatGPT (Google and others have similar products). However, it was the release of OpenAI’s ChatGPT that made the public realize just how close AGI possibly is. For those of you who haven’t tried it yet, please do so. It’s truly mind-blowing!

I hope I provided a bit of useful context and conveyed the truly remarkable inflection point that we are currently at: The potential advent of AGI within a couple of months or years. Now let’s dive into the existential risk this could pose to humanity.

What is the problem?

The line of arguments leading to the threat of humankind’s extinction can be summarized as follows:

1. Sooner or later there will be an artificial general intelligence (AGI)

  • An AGI is an AI system that has cognitive abilities (i.e. intelligence) comparable to humans across a wide range of tasks. AGI is a further advancement of “narrow AI”, which can only perform well-defined tasks in a narrow area, e.g. a chess computer.
  • There is some debate among scientists as to whether achieving human-level AGI is possible at all with our current type of computer systems. Also, there are discussions about some practical limits of AI progress, e.g. related to energy consumption and hardware costs. That being said, there seems to be somewhat of a consensus that AGI is achievable and that any practical limitation will be overcome by efficiency gains in hardware and software over time. The recent advancements in AI seem to point in the same direction, as some authors have already claimed to have found “sparks of AGI” in OpenAI’s GPT-4 system released in early 2023.
  • Personally, I also think that if we keep improving today’s AI systems, we will reach AGI eventually. The only open question is how much computing power, memory and training data are needed to eventually match the human brain’s cognitive abilities. This will determine how fast (not if) AGI will happen.

2. At some point a critical threshold will be reached, where an AGI becomes a self-improving artificial superintelligence (ASI)

  • Upon achieving AGI, there will be further improvements in both hardware and software, which will eventually bring the AGI’s cognitive abilities to levels far beyond any human (or all of humanity combined). This is the point where we transition from AGI to ASI. By this point at the latest, the ASI will be a powerful global actor that can influence world events.
  • Up until the point of reaching AGI/ASI, humans were the most intelligent species on earth and, hence, the species that could make advances in AI the fastest. With the advent of ASI, the ASI is now the most intelligent actor on earth. It is reasonable to expect that going forward, the fastest progress in AI capabilities will be achieved by ASIs. Today’s AI systems can already write software code.
  • Would an ASI self-improve? The answer is likely yes. Even if humans don’t explicitly instruct the ASI to improve itself (though they may well already be doing so), the ASI will likely start self-improving on its own. The reason is that becoming more capable certainly helps achieve whatever goal the ASI is pursuing. Hence, self-improvement is a logical thing to do for an ASI, and this will only increase the difference in intelligence (and power) levels over time.

3. Once deployed to the internet, an ASI cannot be monitored or controlled anymore by humans

  • All neural network-based AIs are essentially “black boxes”, which means that scientists currently have a very limited understanding of the inner workings of AIs and why they behave the way they do. While this can already pose challenges for AIs with lower-than-human intelligence, it becomes a real problem when we reach ASI.
  • How do you monitor an ASI to make sure that it only does what it is supposed to do? In short, you probably can’t. Without a deep understanding of its inner workings, the only conceivable way would be to monitor the outputs of the AI. The problem is that an ASI could easily deceive a human observer due to its vastly higher intelligence and pretend that everything is in order when in fact it’s not. We have no way to reliably verify the output of an ASI.
  • Let’s assume we could monitor the ASI and detect bad behaviour. How could you control or stop something that is orders of magnitude smarter than you? One approach would be to strictly “airgap” the system, i.e. give it no way to connect to the internet. That way one could simply pull a physical plug to shut down the ASI. However, today’s AIs are already connected to the internet, and this will likely also be the case for any more capable successors, because companies obviously want to offer AI-based services to customers via the internet. With internet access, though, there is probably not much a human can do to control an ASI. A suitable analogy would be trying to control a modern chess engine by winning chess games against it. Good luck!

4. AI alignment is needed to make sure that ASIs act in line with human values and interests

  • Assuming that points 1–3 are true, it is obvious that humanity needs a way to ensure that the goals of ASIs are always in line with human values and interests. This is what’s called “AI alignment”. If AI alignment is not guaranteed, then an ASI could behave in ways that are harmful to humans and it would be difficult or impossible to stop it.
  • One difficulty will be defining (and making everybody agree on) a set of universal human values and interests that ASIs need to follow. If history proves one thing, it’s that the range of human values and interests is very diverse, to say the least. In practice, it will likely be up to the nations or corporations that have the most advanced AI technology to decide which values and interests the AIs will be aligned to, if they manage to find a way to do that (see point 5). Everyone else will have to follow.
  • Another important thing to note is that whichever AI alignment technique is employed, it needs to keep up with any potential self-improvement of the ASI (see point 2).

5. AI alignment is currently an unsolved technical problem and it is unclear whether a reliable long-term solution can be found before the advent of ASI

  • Points 1–4 show how crucial it is to solve the AI alignment problem before (!) ASIs appear. Unfortunately, despite decades of research, there is currently no technical way to guarantee the alignment of AIs as powerful as GPT-4, let alone more capable ones. It is unclear if a long-term solution even exists in principle. This makes even “well-aligned” ASIs extremely dangerous if they are used by bad actors, not to mention unaligned ASIs. There have already been cases where current AIs showed unintended bad behaviour, e.g. Microsoft’s Bing chatbot threatening users.
  • The amount of research effort put into AI alignment is dwarfed by the amount put into advancing AI capabilities. One reason is that advancing AI capabilities has huge short-term monetary incentives, whereas advancing AI alignment techniques does not. In summary, scientific progress currently seems to be much slower in AI alignment than in AI capabilities, which is not good.
  • Some researchers estimate that solving the AI alignment problem is still decades away. While it is always difficult to anticipate future breakthroughs, I think there is a high probability that humanity creates an ASI before knowing how to properly align it. Whether the ASI is created accidentally or intentionally doesn’t really matter. The effect will be the same.

6. It is likely that an unaligned ASI will wipe out human civilization

  • Without a reliable and scalable AI alignment technique, we simply don’t know how an ASI will act, what goals it will pursue and whether these goals will be compatible with human interests. Therefore, an ASI wiping out human civilization is at least a possibility. To make matters worse, many scientists argue that it is not only possible, but, in fact, likely that an unaligned ASI will eventually wipe out humanity. That’s because doing so could be the best strategy to pursue its goals (whatever they may be) in an unimpeded manner.
  • To make this threat a bit more concrete, let’s consider the following thought experiment: An ASI is developed by a company with the seemingly innocuous task of producing paperclips. The ASI takes this goal very seriously and begins by optimizing the process of making paperclips, creating them more efficiently and in larger numbers. This seems positive at first, but it doesn’t stop at a reasonable limit. Instead, it starts converting all available resources into paperclips. It might dismantle buildings and infrastructure and consume natural resources to get the materials it needs. As it becomes more desperate for resources, it might even see humans as a threat (because they might want to turn it off) or as a potential resource (after all, our bodies contain iron, which can be used for paperclips). This is the point where the ASI might decide to wipe out humanity. The point of this thought experiment is not to suggest that we will intentionally or accidentally create an ASI obsessed with paperclips (you can replace this goal with others), but to highlight the potential dangers of creating an ASI without a fail-safe mechanism to align it with human values and interests.
  • Once we reach the point of having created an unaligned ASI, we have potentially squandered all options for a favourable outcome. At that point the best we can hope for is that the ASI just happens to act benevolently towards humans and continues to do so for reasons we neither understand nor can influence, however unlikely that may be. Not a very positive outlook, is it?

Some of the brightest minds in AI think that the above scenario (or related variations of it) is definitely possible and are therefore raising the alarm. Three prominent examples are Geoffrey Hinton (the “godfather of AI”, who pioneered the current method for training neural networks), Sam Altman (the CEO of OpenAI) and Stuart Russell (one of the most renowned AI safety researchers in the world). To give you a feeling for how seriously they take this problem, I’ll just provide a couple of quotes:

“The alarm bell I’m ringing has to do with the existential threat of [AIs] taking control. […] I used to think it was a long way off, but I now think it’s serious and fairly close.”
— Geoffrey Hinton (Source)

“Development of [ASI] is probably the greatest threat to the continued existence of humanity.”
— Sam Altman (Source)

“AI research is making great strides toward its long-term goal of human-level or superhuman intelligent machines. If it succeeds in its current form, however, that could well be catastrophic for the human race.”
— Stuart Russell (Source)

To be fair, no one would argue that the above scenario is 100% likely. But then again, no one can seriously claim the chances of this happening are 0%. Based on what I read so far, I think most experts estimate the likelihood of doom from ASI over the next couple of decades somewhere between 10% and 50%, with some notable exceptions being significantly more optimistic with likelihoods very close to 0% (e.g. Yann LeCun, Meta’s chief AI scientist) or drastically more pessimistic with likelihoods above 99% (e.g. Eliezer Yudkowsky, the founder of the Machine Intelligence Research Institute).

Jan Leike, the head of the AI Alignment team at OpenAI, is one of the more optimistic experts (he probably wouldn’t have this job otherwise). While admitting that OpenAI doesn’t have a solution to the AI alignment problem, he’s certain that they will solve it in time. The problem is that he doesn’t offer any solid justification for this optimism. OpenAI did publish its approach to solving the problem (which essentially says they want to use narrow AI to align AGI), but this plan has been criticized as insufficient. So to summarize: OpenAI, arguably the company closest to creating an AGI, acknowledges that an unaligned AGI could lead to humankind’s extinction, admits that it currently doesn’t know how to solve the AI alignment problem, admits that the problem is indeed very difficult to solve, doesn’t say if or when it will find a solution, can’t tell how far away it still is from achieving AGI (nobody can), and yet it pushes forward. Just let that sink in for a moment.

Personally, I currently estimate the likelihood of the above scenario to be between 20% and 50% over the next 10 to 20 years. I’m probably a little more pessimistic than the average of the “expert likelihood distribution”, because I find the arguments of optimistic people like Yann LeCun or Jan Leike utterly unconvincing. Unless you can reasonably state why one or more of the above six arguments are false (or at least very unlikely), I think you should start sharing my concern. In what other field of technology would you simply accept a higher than 10% chance of humankind’s extinction?

How could an AI wipe out humanity?

Once you accept that the above scenario is at least not completely unlikely, the immediate next question is probably: How could an ASI kill us? Unfortunately, that’s very hard to predict in detail, but not knowing any specifics doesn’t make it any less worrisome.

The point is that when humans try to predict how an ASI could act to harm them, it is akin to a frog trying to predict how a human might act. This is basically impossible for the frog, because the difference in intelligence is simply too big. Therefore, a group of frogs would obviously not stand any chance of winning a confrontation against a group of humans. The same applies to humans facing an ASI.

While this might sound rather vague, I want to point out a few things that could make this situation a little more concrete and graspable:

  • An ASI will likely have internet access, which means it could almost instantly copy itself to other computers around the world. After all, an ASI is just a bunch of data containing the interconnection weights of a neural network, which can be executed on any (sufficiently powerful) computer. This renders any notion of just “pulling the plug” once the ASI becomes too dangerous pointless. On top of that, how would you even know when this point is reached, if you can’t properly monitor an ASI (see point 3 above)?
  • Any reasonably complex software inevitably contains bugs, which can allow attackers to infiltrate, manipulate or take over someone else’s computer systems. Even today, human hackers constantly exploit bugs and cause damage of more than a trillion (!) dollars per year. Just imagine an actor that is 10x, 100x, 1000x better and faster at finding and exploiting these bugs. An ASI will likely be able to gain access to many computer systems we use regularly (cars, airplanes, smartphones, social media etc.) and use or manipulate them at will. Yes, you could try to use another ASI to defend against such attacks, but again: without an alignment method, you simply cannot know whether it will really defend you or just conspire with the attacker (see the arguments in the previous section).
  • Instead of causing harm by itself, an ASI could simply make humans use existing weapons (e.g. nuclear weapons) against each other. This could, for instance, happen via simple bribery or by falsely convincing world leaders that they are under attack by another nation. Remember, any type of digital content (speech, text, image, video etc.) produced by an ASI will be indistinguishable from content created by humans. Even with today’s AIs (e.g. GPT-4, Midjourney etc.) we are already very close to this point! How would the president of a nuclear power react if all computer systems convincingly stated that an enemy missile with a nuclear warhead is approaching?
  • An ASI could also create new types of weapons. For instance, it could design a piece of DNA that contains the information of a deadly pathogen against which humanity has no defense. It could then send this information to a lab that will synthesize proteins from any DNA provided by its customers (yes, these kinds of labs exist) and thereby set the disaster in motion.

The list could go on, but I hope you get the point. We would have to deal with an opponent that outsmarts us by orders of magnitude. I don’t see any promising strategies to defend ourselves against an ASI unless we solve the AI alignment problem before (!) creating an ASI in the first place. An ASI could become the most powerful force on earth, and humanity would be the metaphorical frog I mentioned earlier.

Before we move on, I have to address a common misunderstanding about how an AI could become dangerous, because this fallacy sometimes misleads the public discourse: Many people assume that before an AI can pose any existential threat to humans, it 1.) needs to develop some kind of consciousness or sentience and 2.) needs to have direct control over the physical world, e.g. have robots at its disposal that can harm humans. However, neither is necessary. When an ASI is pursuing a goal (whatever it may be), it doesn’t really make a difference whether it simultaneously thinks about its own existence or feels any emotions. Also, given its high intelligence and mastery of natural language and software code, it will be able to manipulate humans and computers enough to achieve its goals, even without an army of killer robots shooting people. In other words, just because you don’t yet see the “Terminator” walking down the street pondering his own existence doesn’t mean you are not in trouble.

When could the AI doom happen?

The short answer: No one really knows, but it could be rather soon.

The Wright brothers famously expected motorized flight to be another 50 years out, just two years before they themselves flew the first motorized aircraft. This goes to show that predicting technological breakthroughs is notoriously difficult. This is especially true in AI, where the pace is currently so incredibly fast that even experts have a hard time keeping up with the latest developments. That said, “AI godfather” Geoffrey Hinton currently predicts that AGI will arrive in the next 5–20 years, but doesn’t rule out the possibility that it might go even faster. Other experts, e.g. DeepMind’s CEO Demis Hassabis, are confident that AGI could happen “within a few years”. This should give you a feeling for the potential urgency involved here.

I personally find it hard to believe that it’ll take more than a decade until AGI arrives. For that, we are simply too far along already, the progress is too fast and there is just too much money and effort being put into this topic. For reference, deep learning (the current basis of basically all AI systems) only really took off in 2012, and now look where we are just a bit more than a decade later. As mentioned, predicting the future is hard, so please take all of this with a grain of salt and do your own thinking.

In general, people’s expected arrival time of AGI has moved significantly closer recently. One example of this change in sentiment is the prediction website “Metaculus”, where the consensus for the arrival of AGI has moved from the second half of this century to early 2026 within the last 3 years. This was likely fueled by the release of ChatGPT (and its successors), which impressively demonstrated the capabilities of such systems to a wider audience.

Another important question is the speed of the “AI takeoff”, i.e. how quickly the AGI would transition into an ASI and become dangerous. Some experts think this will develop rather gradually over several years. This is the “optimistic” scenario as it would at least give humanity a bit more time to react (and maybe even solve the AI alignment problem last minute). However, other experts believe that very shortly (think days or weeks) after the creation of an AGI everyone on earth will “fall over dead”. Again, it’s hard to tell who’s right, but I want to convey the full range of possibilities here.

In summary, there is a lot of uncertainty, but it is definitely conceivable that AI doom could happen within this decade. To put it bluntly: If you are under 60, you might not live to see retirement. If you have young children, it may very well be that they won’t live to see adulthood.

Is there something we can do about it?

The most important thing is: Don’t panic. I’m not saying there is no reason for panic, but panic is never helpful. That said, at the moment, many people concerned about this topic (including myself) don’t see a clear solution, let alone a simple one. “AI godfather” Hinton recently said:

“I’m just someone who’s suddenly become aware that there’s a danger of something really bad happening. I wish I had a nice solution, like: ‘Just stop burning carbon, and you’ll be OK.’ But I can’t see a simple solution like that.”
— Geoffrey Hinton (Source)

We are dealing with a complicated problem and it is unclear if there is a solution at all. If there is one, it will have to encompass technical, regulatory and societal components. On top of that, any solution will require global buy-in and enforcement from all governments and corporations. Unfortunately, from a game-theoretic point of view, this is a really bad situation to be in. It only takes one rogue nation, one rogue company, maybe even just one rogue data center failing to align an ASI properly for everyone to get killed. Also, with continuous improvements in hardware and software, the difficulty of creating an ASI becomes smaller every day. There will probably come a time when almost every laptop is powerful enough to run an ASI. How could you possibly control that globally?

Despite this dire outlook, I see a couple of things everyone can and should do to increase our chances of a positive outcome:

  • Educate yourself about how AIs work, what the AI alignment problem is and what people knowledgeable about this topic have to say. You don’t have to be a computer scientist to do that; there are plenty of great resources on the internet to get started. Form your own opinion as to how bad this problem is. Do you agree that this is serious? If not, why not? What are your reasons for being optimistic or pessimistic? What should be done, in your opinion? These are questions everyone should be able to answer eventually.
  • Raise awareness about this topic. We need a broad societal push to solve a problem like this. Talk to your friends, family, coworkers etc. about it. Don’t stir up panic, but lay out the facts and make people think about this themselves. I find it puzzling that this topic is so important, concerns literally everyone on earth and yet receives so little attention.
  • Support politicians that understand the topic and try to increase our chances of a good outcome. Examples of good policies could be:
    - Passing regulation on safety standards for AI labs (analogous to the rules in biolabs working on highly hazardous materials).
    - Negotiation and ratification of a global “UN AI Safety Agreement” (analogous to the UN Climate Agreements), setting bounds on what people are allowed to do with respect to AI and what happens if they don’t follow the rules.
    - Increasing the funding of research on AI alignment.
    As mentioned previously, this will only work if all countries working on AI follow suit, which makes this a very hard problem. However, humanity can be quite good at coordination and finding solutions at a global scale if everyone is on board. This gives me a bit of hope.
  • Pressure large organisations at the frontier of AI research (e.g. Microsoft, Meta, Google, Baidu) to work responsibly and to allocate adequate funds to their AI alignment research. The usual Silicon Valley approach of “moving fast and breaking things” is not an option here. If any of these companies goes one step too far, that could be it for humanity.

There are certainly more things you can do, but these are the most obvious ones I came up with. If you have better advice than that, please do share it.

As a small side note: In addition to the AI alignment problem, there are other concerns regarding AI, for instance the disruption and displacement of human labour, misinformation, privacy concerns, bias etc. These are all valid concerns, but they are not as existential as the threat of humankind’s extinction. Yet, it seems like these topics receive a lot more attention from the public, media and politicians than the AI alignment problem, even though the latter should be at the top of everyone’s mind. If humankind no longer exists, all of these other issues are obviously irrelevant!

Conclusion

Humanity is facing one of its biggest challenges yet. We could go extinct if an unaligned AGI is created and, unfortunately, there is currently no easy solution in sight to prevent this. While there are a few things everyone can do, the most important thing is that the people developing AI systems solve the AI alignment problem before (!) creating AGI. Increasing the chances of this happening should be at the top of everyone’s mind right now.

If you have any valid technical or scientific arguments that refute the points made in this text, I’d love to hear them.

Appendix

Why am I writing this?

Here are a couple of my thoughts:

  • I want to raise awareness about this topic, because at least in my social circle it seems like most people aren’t sufficiently aware of the problem. While some people do understand the problem well and are grappling with it just as much as I do, there are still too many who just shrug it off or make fun of it. Unfortunately, I think a problem of this magnitude can only be solved if a sufficiently large number of people take it seriously and push for potential solutions.
  • While I was able to find lots of resources on this topic on the internet, most of them are full of jargon and require at least some technical knowledge. I wanted to explain the problem in relatively simple terms so that interested lay people can understand it. At the same time, I wanted to do so in a responsible way, without causing panic or resignation. I think there is a stark mismatch between how serious (and how likely) this problem is and the amount of attention it gets. Maybe writing a text for a non-technical audience can help change that.
  • Another motivation for writing this text is that my wife and I recently became the parents of a wonderful girl. She’s now just a couple months old and I want her to be able to grow up and have a long and happy life without the threat of AI doom looming over her.
  • I like humanity in general and I want it to have a long, prosperous and happy future. If this text increases the chances of a positive outcome for humanity even by just a very small amount, I think it’s worth trying. You never know who’s going to read this text and what consequences it might have.
  • Maybe writing down my thoughts in this text was also a kind of self-therapy for me. Since I first learned about the AI alignment problem, this topic has really kept me up at night (metaphorically speaking; what actually keeps me up at night is my baby daughter, but that’s alright). Why is it exactly the risk from AI that has this effect on me? After all, doesn’t a problem like climate change also threaten humanity? Well yes, but the AI risk is different for me. In the case of climate change, I do see how humanity can solve the problem and the steps needed to achieve that. Are we there yet? No. Should we move faster? Sure. But at least many people are aware of the problem, we are moving in the right direction and we have the technology needed at hand. That’s why I’m optimistic there. In the case of AI, however, the outlook is far worse: I don’t see a viable solution at the moment and I don’t know if there even is one (it certainly won’t be simple). Also, the latest developments are definitely not moving in the right direction. That’s why I’m currently relatively pessimistic and raising the alarm. I’d be happy to update my opinion any time if this changes.

Could I be wrong and everything will be fine?

Yes, of course. There’s always the possibility that I or other people voicing concerns are wrong and the risk scenario is completely overblown. As mentioned, there are knowledgeable AI experts that are more optimistic than others. However, at the moment there is no clear consensus as to which of the arguments laid out above are false. This fact alone is cause for concern, in my opinion.

In addition, I find the arguments of optimistic experts like Yann LeCun unconvincing. They seem to rely on one or more of the following claims, which, if true, would strongly alleviate the problem:

  • True AGI won’t come in the foreseeable future, either due to some (currently unknown) property of human intelligence that cannot be straightforwardly reproduced with computers, or due to limits in the available computational resources or data. Hence, AI will stay sufficiently narrow and cannot pose risks to a more general kind of intelligence like humans.
    This is possible, but unlikely in my opinion. The latest progress suggests that AGI could be possible in the near future and currently there is no scientific reason why it can’t or won’t happen.
  • Non-aligned ASI for some (currently unknown) reason is not as dangerous as expected and will either leave humanity alone or act benevolently towards humans.
    I hope this is the case, but again, I haven’t heard strong scientific arguments in favour of it so far. If you have one, I’d love to hear it. Most experts agree that AI alignment is necessary to prevent an ASI from harming humans.
  • AI alignment is easier to solve than expected. Therefore, a solution will be found in time and humans can guarantee that ASI behaves in-line with human values and interests.
    If that’s true, great! Then please show the world a concrete method for aligning an ASI (or at least for making significant progress towards that goal). Until then I’ll remain skeptical, given the much faster progress in AI capabilities than in AI alignment research that we’ve seen over the past decades.
  • Humanity will be able to control ASIs just like we did before with other types of technology (nuclear weapons, electricity, airplanes etc.).
    This argument fails to acknowledge a massive qualitative difference with respect to ASI. None of the technologies humanity previously developed were intelligent, let alone intelligent on a level comparable to or surpassing humans. Therefore, to me it’s obvious that ASI will be a completely different ball game. In addition, sticking with the example of nuclear weapons, just consider this: Nuclear weapons are not self-replicating and they don’t self-improve. We understand their inner workings well enough to know how powerful they will be before setting one off. They are perfectly aligned with the intentions of their creators. The materials and factories for building nuclear weapons are relatively easy to spot, you won’t be able to build one at home, and just because you can build one doesn’t mean you can build thousands of them. And humanity is fully aware of the risks and has implemented global agreements to keep them under control. None of this is true for ASI!

All in all, I’m currently on the medium-to-very concerned part of the spectrum, but I strongly hope that I’m wrong and that the arguments for the AI risk scenario break down somewhere. I hope that someone will soon be able to convincingly argue why the risk is in fact a lot smaller than expected. Until then I think we should tread carefully.

If you want to estimate the likelihood of AI doom for yourself, answer the following questions: How likely is it that:

  • AGI will be created?
  • once AGI has been created, it will then turn into a self-improving ASI?
  • humanity will not be able to monitor and control all ASIs created?
  • the AI alignment problem will not be solved before the advent of ASI?
  • an unaligned ASI eventually leads to the extinction of the human race?

Multiply the probabilities you assign to each of these questions and you’ll get a first estimate of your personal likelihood of AI doom.
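As a small worked example, here is that multiplication spelled out with entirely made-up numbers; replace them with your own answers:

```python
# Illustrative numbers only - substitute your own answers to the five questions above.
p_agi        = 0.8  # AGI will be created
p_asi        = 0.7  # ...and then turns into a self-improving ASI
p_no_control = 0.6  # ...and humanity cannot monitor or control all ASIs
p_no_align   = 0.5  # ...and alignment is not solved before ASI arrives
p_extinction = 0.5  # ...and an unaligned ASI leads to human extinction

p_doom = p_agi * p_asi * p_no_control * p_no_align * p_extinction
print(f"First-pass personal estimate of AI doom: {p_doom:.1%}")  # 8.4% with these numbers
```

Note that each question is implicitly conditional on the previous one having come true, which is what makes simply multiplying the numbers a reasonable first-pass estimate.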

Disclaimers

  • I don’t have an AI background and I’m not a computer scientist. My background is in physics. That said, I think I understand the topic and the concepts involved well enough to draw reasonable conclusions from arguments that others (people more knowledgeable in AI than me) have put forward. I tried to faithfully extract and summarize the key points in this text, so that non-experts can understand the issue at a high level. If I made a mistake somewhere or if something could easily be misunderstood, please let me know. I’ll happily make corrections wherever necessary.
  • I’m not skeptical about technological progress in general, quite the opposite actually. In my view, it’s obvious that technological progress has made human life on earth vastly better compared to how our ancestors lived. Yes, there are problems caused by the use of technology, but these are mostly solvable in principle, and my argument nevertheless stands. I do believe that humanity should continue to embrace the responsible use of technology wherever possible to improve our lives further. In fact, I even work for a technology fund, so please don’t call me a technology skeptic. I’m not. That being said, AI could be one of the cases where simply continuing technological development without strict guardrails would be irresponsible and dangerous.

Contact

If you would like to contact me regarding this article, feel free to do so via thoughtprovoked.blog@gmail.com.
