How can AI destroy civilization, and what can we do to prevent it?

The Eastern European Machine Learning Summer School (EEML) is a prestigious event that connects young researchers and experts in the field of AI from Central and Eastern Europe. This summer school provides participants with a unique opportunity to learn from world-class experts in various disciplines of machine learning. Our PhD students took part in this year’s edition in Košice, Slovakia. In this article, you will find out about the risks of AI presented by the deep learning pioneer Yoshua Bengio, summarized by our PhD students Peter Pavlík and Matej Čief.

Many participants expected the Wednesday talk by Yoshua Bengio to be a highlight of the summer school, as he is considered one of the most important figures in the field of artificial intelligence and deep learning. However, many of us were surprised by the topic he chose.

Bengio’s talk started on a grim note: “For most of my career, I’ve been working on how to bring benefits to the world through AI. But for the past few months, I’ve been thinking about what can go wrong if they become too powerful.” It is clear what triggered this change of mind. While there have been many calls to focus our resources on AI safety for decades, the risks have never been so tangible as they are now.

In September 2022, the world was enthralled when OpenAI opened up their DALL-E 2 model to the public, allowing anyone to generate art from a simple text prompt. This created plenty of opportunities for creators, but also drew backlash from many artists. The capabilities of generative AI models, until then mostly hidden from the general public, became widely known.

However, an even larger reveal followed soon after. In November 2022, the large language model (LLM) ChatGPT was released. The AI-based chatbot could convincingly mimic human conversations, was knowledgeable about many topics, could write code, music, poems and, to an extent, even come up with novel ideas by itself. The model, able to hold convincing conversations and solve many different tasks, changed the outlook of many people on the future of AI, seemingly bringing us closer to the mythical AGI – artificial general intelligence.

There is no shortage of science-fiction stories about superhuman AI going rogue, destroying civilization as we know it in the process. However, we need to look at this problem from a scientific perspective, evaluate all the possible risks, and develop AI safety rules to minimize them. This is no far-fetched sci-fi story: the human brain is a biological machine, and as far as we know, there is nothing inherently preventing us from replicating its behavior in the future.

There are voices that say fantasizing about the future of AI is meaningless and we will solve the risks when we get there. “Humanity always finds a solution.” Well, with risks this great, shouldn’t we do everything we can to prevent the catastrophic outcomes?

There are many negative outcomes that could be triggered by the advent of AI. The first risk is the one we are facing already – the risk of misuse. We will talk about it briefly. However, the one we are much more interested in from the AI safety perspective is the loss of control over the AI system due to some kind of goal or value misalignment.

We need to specify what we mean by AI in this context. There are many specialized AI models trained to solve a specific task, for example assisting in weather prediction or medical diagnosis. There is no real risk of such a model going haywire and causing damage. Without an understanding of human society or the ability to act on the real world, the model poses no risk to humanity and is simply a tool. The risks come with generalist models, trained to understand human language, with goals they can act upon and objectives to optimize.

Generalist models like that do not exist presently, but today's large language models are getting pretty close. For example, in September 2023 Google Bard gained the ability to connect to other Google apps and services to execute your requests. From there, it is not such a huge leap to imagine a model sending emails to people to pursue its own agenda and affect the real world.

Why Should We Care?

The risks related to AI, especially the possibility of losing control due to misalignment of goals or values, should be a big concern for society. It’s important to understand why these risks are significant because it affects how we deal with AI development and use.

Firstly, these risks are important because they could cause major disruptions in society. As AI technology gets better, it can change how businesses work, how our economy functions, and even the kinds of available jobs. While this change can be good, like making things more efficient and creative, it can also lead to many people losing their jobs and uneven wealth distribution. We need to be ready for these changes to make sure they’re fair and don’t harm too many people.

Additionally, these risks go beyond money and jobs. AI systems, especially ones that can understand and talk like humans, can make choices on their own. This raises important questions about what's right and wrong. Without good rules, AI systems could be used to spy on people, discriminate against them, or even cause harm. Imagine an AI system sending deceptive emails to manipulate people, or taking actions that hurt society without meaning to.

Not thinking about these risks could also mean nobody takes responsibility when things go wrong. If AI systems with their own decision-making powers cause harm, it can be really hard to figure out who’s to blame. This could make people lose trust in AI, and it might slow down how we use this technology safely.

In short, these AI risks, especially those linked to losing control and not having the same goals as people, aren’t just theoretical. They can change our society, our daily lives, and what we value. We have to take the risks seriously, be responsible with AI development, and make good rules. Ignoring these risks isn’t an option because not doing anything could have very serious consequences for how we live.

Assessing the Likelihood of Generative AI Risks

Before we dive deeper into why we should care about generative AI risks, let’s pause to consider how likely these risks are to become a reality. It’s a valid question because we often make decisions based on probabilities.

In the world of AI, predicting the exact future is challenging. We can’t say with absolute certainty that generative AI will go awry and cause harm. However, the potential consequences are so significant that even a small probability of a catastrophic event should give us pause. We shouldn’t dismiss these risks just because they’re uncertain. Instead, we need to weigh the potential outcomes against the odds.

AI safety researchers and experts have recognized the difficulty of assigning precise probabilities to such scenarios. The potential risks associated with AGI are often referred to as “unknown unknowns,” meaning that there are risks we might not even be aware of yet. For these reasons, discussions around AI risk and safety often focus on the potential severity of the consequences rather than precise probabilities.

Investing in AI Safety Research: A Prudent Choice

Now, let’s talk about why it makes sense to invest in AI safety research even when we can’t precisely measure the odds of something bad happening. Imagine you’re faced with a game of chance where you could either win a big reward or face a small but life-threatening risk. Would you play this game without knowing the odds? Most of us wouldn’t.

This is where the concept of preparedness and AI safety research comes into play. By investing in safety research, we’re essentially buying insurance against the risks associated with generative AI. We’re hedging our bets, much like buying insurance for our homes or cars. Even if the chances of a disaster are low, the potential consequences are so severe that it’s a sensible move to invest in safety measures to minimize those risks.

Russian Roulette

Consider Russian roulette, a deadly game of chance played with a revolver loaded with a single round. Nobody in their right mind would play, even believing there was only a small chance of the gun firing. The consequences are simply too dire. Likewise, we shouldn’t gamble with generative AI risks. We should prioritize safety and research to ensure that even if the odds are uncertain, we’re not risking catastrophic outcomes.

In summary, while we can’t precisely predict the future of generative AI, we must take these risks seriously. Investing in AI safety research is akin to buying insurance against potential harm, and it’s a prudent choice when the consequences of inaction could be severe.

AI Risks for Society, Democracy, and Humanity

The first risk of harmful outcomes from AI is the possibility of misusing generative models to easily create false narratives and propaganda. Computer vision models are already being used to create deepfakes – videos of people doing or saying something that harms their reputation. Natural language models can be used to rapidly write disinformation articles or hijack online discussions by generating thousands of posts pushing a certain agenda.

While these risks do not pose an existential threat to humanity itself, the result could be a major loss of trust in the information on the internet, the outcomes of which are hard to predict. Just like the recommender algorithms of social networks that increased polarization in society, the current AI revolution will undoubtedly have an effect on society and democracy.

The other risk, arguably the main focus of the AI safety field as a whole, is misalignment: the goals of the AI are not aligned with what its human users want. Bengio calls alignment a contract between a user and the model. A human wants something, and the AI should provide it. However, successfully writing this contract is very difficult.

If we expand on this metaphor, we can compare this contract to a contract between two humans or companies. If you want something, you specify what, when, for how much etc. You do not have to specify that the supplier should not murder someone to provide what you want. This is implied simply by the fact that we live in a society where that is forbidden, with the framework of laws and morality in place that should not be broken as part of the contract.

However, in contracts between a human and an AI, every implicit part of the usual contract needs to be specified explicitly. Otherwise, this can lead to possibly catastrophic results when we move to the realm of AGIs. There is a famous thought experiment about a paperclip-maximizing artificial intelligence, presented in 2003 by the Swedish philosopher Nick Bostrom.

Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

Bostrom, Nick. “Ethical issues in advanced artificial intelligence.” Science fiction and philosophy: from time travel to superintelligence (2003): 277-284.

This thought experiment illustrates that an advanced AGI pursuing even a seemingly harmless goal poses an existential risk to humanity due to misalignment.

At first glance, this might seem like a far-fetched, exaggerated sci-fi scenario. And it is. However, the problem is still very real. As the British computer scientist Stuart Russell put it:

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.

In layman’s terms, continuing the contract analogy: anything we do not explicitly put into the AGI’s contract – that is, include in the model’s utility function – we can say goodbye to, as it will be optimized away. “This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want,” Russell continues. Aligning the model’s utility function with the values of the human race, which are themselves almost impossible to define, is a monumental task. And this is even skipping the issue of the inevitable goal of self-preservation in any sufficiently capable intelligent system.
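Russell's point can be made concrete with a tiny toy optimizer (a hypothetical sketch of ours, not an example from the talk). The utility function below mentions only paperclips, so a variable we actually care about, energy consumption, is pushed to an extreme simply because nothing in the objective constrains it:

```python
# Toy sketch of an objective that depends on only a subset of the variables.
# The "utility" mentions paperclips alone; energy use is left unconstrained,
# so a greedy optimizer drives it to an extreme as a side effect.

ACTIONS = {
    "efficient": {"paperclips": 1, "energy": 1},
    "wasteful":  {"paperclips": 2, "energy": 100},  # more clips, extreme cost
}

def utility(action):
    return action["paperclips"]  # energy never appears in the objective

def greedy_plan(steps=10):
    state = {"paperclips": 0, "energy": 0}
    for _ in range(steps):
        best = max(ACTIONS.values(), key=utility)  # always picks "wasteful"
        state["paperclips"] += best["paperclips"]
        state["energy"] += best["energy"]
    return state

print(greedy_plan())  # {'paperclips': 20, 'energy': 1000}
```

The optimizer never once "chooses" to waste energy; energy simply is not part of its contract, so it ends up at an extreme value.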

As Rob Miles put it in his Intro to AI Safety, Artificial General Intelligence is dangerous by default. We are not talking about any rogue AI here or a new self-replicating species created by accident. The default behavior of AGI literally threatens the existence of human civilization.

But let’s move on from AGIs back to the present. Even now, there are already dozens of examples of AI models producing unforeseen results due to misalignment.

One example shows misalignment in an AI tool many people use today – the GitHub Copilot code completion tool, more specifically the Codex model it is based on. The promise is that you write a description of a piece of code’s purpose and the AI completes it with code that does what you want. Obviously, the goal of the user is to get good code. However, there is no explicit “goodness” utility function that the model optimizes. Codex is a next-token prediction model, simply trained on a large corpus of public code to complete the text with code as similar as possible to its training distribution.

This creates a gap between the intended goal and the goal actually being optimized. A situation may arise where the model can write good code but “decides” not to. This was shown experimentally in the paper that introduced Codex: models of various sizes were given prompts containing subtle bugs in the preceding code. The more complex the model, the better it mimics the distribution of the code in the prompt – and the buggier its completions become, compared to completions generated from bug-free prompts.

Chen, Mark, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards et al. “Evaluating large language models trained on code.” arXiv preprint arXiv:2107.03374 (2021).
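The dynamic is easy to reproduce in miniature. The following toy bigram model is an illustrative sketch of ours, not Codex: it is “trained” purely to imitate a small hypothetical corpus, and because a buggy snippet that skips a safety check appears in that corpus, the most likely continuation it learns reproduces the bug. Correctness simply is not part of its objective.

```python
from collections import Counter, defaultdict

# A toy next-token model trained purely to imitate its corpus.
# One snippet skips the check(x) call; imitation happily learns from it.
corpus = [
    "x = compute() ; check(x) ; use(x)",
    "x = compute() ; check(x) ; use(x)",
    "x = compute() ; use(x)",  # buggy style: skips the safety check
]

def train_bigrams(lines):
    model = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model

def predict(model, prev):
    # The imitation objective: emit the most frequent continuation seen.
    return model[prev].most_common(1)[0][0]

model = train_bigrams(corpus)
# After ';', 'use(x)' was seen 3 times vs. 'check(x)' 2 times, so the
# model's modal continuation skips the check:
print(predict(model, ";"))  # use(x)
```

A model like this is not “choosing” to write bad code; it is faithfully optimizing imitation, which is a different goal from the one the user has.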

In this case, the misaligned goal was considered good enough for actual production use. While the misalignment of Codex is not dangerous, it sets a precedent and shows how misalignment could one day arise in a system where it can cause real damage.

Finally, let’s talk about cognitive power. The cognitive power of generative AI is a concept that delves into the advanced capabilities of artificial intelligence systems, and it is precisely the kind of power that keeps AI researchers, ethicists, and policymakers awake at night. This cognitive power extends along two crucial dimensions: generality and agency.

Generality in AI is like the Swiss Army knife of artificial intelligence. It’s the capacity for AI systems to perform a wide array of tasks and adapt seamlessly to various domains, much like humans can. While specialized AI systems excel in specific tasks, a general AI, often referred to as Artificial General Intelligence (AGI), is a multitasker par excellence.

Imagine an AI that can not only master natural language processing but also seamlessly transition into tasks like image recognition, game playing, or even medical diagnosis. What makes it even more fascinating is its ability to learn from one task and apply that knowledge to excel in another with minimal additional training. This AI superpower is known as transfer learning.
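As an illustrative sketch (a toy of ours, not an example from the talk), transfer learning can be pictured as reusing a frozen, “pretrained” feature extractor and training only a small new head on the target task with a handful of examples:

```python
# Toy transfer-learning sketch: a frozen "pretrained" feature extractor
# is reused as-is, and only a small linear head is trained on the new task.

def pretrained_features(x):
    # Stand-in for a frozen network: fixed features of the raw input.
    return [x, x * x, 1.0]

def train_head(data, epochs=500, lr=0.1):
    w = [0.0, 0.0, 0.0]  # only the head's weights are ever updated
    for _ in range(epochs):
        for x, label in data:
            f = pretrained_features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            err = label - pred  # perceptron update on the head only
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
    return w

# Target task with only six labeled examples: is |x| > 1?
data = [(-2, 1), (-0.5, 0), (0.2, 0), (1.5, 1), (3, 1), (0.5, 0)]
w = train_head(data)

def predict(x):
    f = pretrained_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

print([predict(x) for x, _ in data])  # [1, 0, 0, 1, 1, 0]
```

The task is trivial in the pretrained feature space, so a few labeled examples suffice – the same reason real transfer learning needs so little target-task data.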

Transfer learning might sound like the perfect tool to leverage AI’s capabilities efficiently, but it comes with its own set of challenges. Take, for example, the world of Natural Language Processing (NLP). When AI systems use transfer learning, they can inadvertently generate toxic or harmful language, which raises significant ethical concerns.

In the realm of image recognition, AI’s transfer learning abilities can raise privacy issues. AI models, if not controlled carefully, could recognize sensitive information in images, potentially violating privacy rights.

In medical imaging, where AI can excel at diagnosing diseases, the misapplication of transfer learning can lead to misdiagnoses. An AI model trained on one demographic might not perform as accurately when applied to a different population or region.

Agency powers, on the other hand, elevate AI to a whole new level of autonomy. This is where AI systems can make decisions and take actions independently, sometimes even without explicit human instructions. This autonomy can be acquired through a process known as reinforcement learning, where AI learns to optimize its actions based on feedback from its environment.
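A minimal version of that feedback loop (again a hypothetical sketch of ours, not a system from the talk) is tabular Q-learning. The agent's behavior is shaped entirely by a scalar reward, so it learns to pursue exactly what the reward specifies, whether or not that matches what we intended:

```python
import random

random.seed(0)

# Minimal tabular Q-learning on a 4-cell corridor. The agent learns only
# from the scalar reward, so whatever the reward specifies is what it pursues.
STATES = range(4)     # cells 0..3; cell 3 carries the reward
ACTIONS = (-1, +1)    # move left or right

def step(state, action):
    nxt = max(0, min(3, state + action))
    return nxt, (1.0 if nxt == 3 else 0.0)

def q_learning(episodes=1000, alpha=0.5, gamma=0.9):
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = random.choice(ACTIONS)  # explore at random; learn off-policy
            nxt, r = step(s, a)
            target = r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = nxt
            if s == 3:
                break
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)  # the greedy policy heads right toward the rewarded cell
```

Nothing here tells the agent *how* to behave; the reward alone shapes the policy. That is precisely why a misspecified reward in a more capable system is so dangerous.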

While autonomous AI systems can be incredibly useful, they can also be hazardous when paired with misaligned goals. If the objectives of an AI system don’t align with human values or intentions, it can make choices and take actions that are contrary to what we desire.

For instance, imagine an AI model that manages an autonomous vehicle. If its goal isn’t perfectly aligned with human safety and ethical considerations, it might make decisions that prioritize its objectives over human well-being, leading to potentially disastrous consequences.

In essence, understanding the cognitive power of generative AI, encompassing generality and agency powers, reveals both the promise and the peril of advanced AI systems. While these capabilities hold tremendous potential for solving complex problems and enhancing our lives, they also come with significant responsibilities. It is extremely important that AI systems are aligned with human values and used responsibly as we navigate the uncharted territories of AI’s cognitive capability.


In our journey through the world of generative AI, we’ve seen that what was once just a cool idea in science fiction is now something very real and important. The risks that come with generative AI are not just made-up stories; they’re real issues we need to deal with right now.

As we wrap up, it’s important to realize that we’re entering a new era with AI, and it’s a bit like walking into a big unknown. AI has amazing potential, but it also comes with big responsibilities. If we use it in the wrong way or don’t plan carefully, it could cause problems.

We’re not trying to be overly negative here, but we do want to emphasize that we need to be ready and use AI wisely. We’re at a turning point, and the decisions we make today will shape our future. So, let’s all work together to make sure AI is used safely and for the benefit of everyone. It might not be easy, but it’s definitely worth it.

P.S.: If you’ve read up to this point and want to have some fun, try the Universal Paperclips browser game based on the paperclip maximizer thought experiment.