How Does Artificial Intelligence Think?

Explainable Artificial Intelligence: From Black Boxes to Transparent Models

Can artificial intelligence decide who will be released from prison, who will advance to the next round of job interviews, or who should be recommended preventive screening for various diseases? Can it take full responsibility for driving in a snowstorm?

Not sure how to answer such questions? If you knew that artificial intelligence justified its decision in detail and that anyone could verify the decision was right, would that change your opinion?

We bring you the first part of a series of articles on the topic of explainable and transparent artificial intelligence.

In this article you will learn:

  • why we need artificial intelligence to be explainable and transparent,
  • how we search for balance between accuracy and interpretability.

Introduction to Explainable Artificial Intelligence

Although artificial neural networks have been known since 1943 (McCulloch and Pitts), the latest and most significant boom around them did not begin until 2012, with the research of Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton [1], pioneers of deep learning, which is currently the dominant branch of artificial intelligence (AI). In their work, they showed that even a relatively simple deep neural network with eight layers (simple compared to today’s networks, which can contain more than a thousand hidden layers) significantly outperformed earlier approaches to classifying images into a thousand different classes.

Figure: An example of a neural network (in this case it is a feedforward neural network).

A typical simple artificial neural network (figure) consists of an input layer, one or more hidden layers, and an output layer. The input layer represents the data that the neural network works with: image pixel values, numerical representations of words, or values measured by a sensor. Hidden layers then transform these inputs into a so-called hidden (or latent) representation. In the case of images, we can imagine such a representation as the result of applying a filter to the image in a graphics editor such as Photoshop. The neurons in the output layer represent the final prediction of the neural network. In this figure, the neural network has exactly one output neuron.

For example, if the value of this neuron for an input X is close to 1, it may mean that the input image contains a car. Conversely, a value close to 0 may indicate that there is no car in the image. In the figure, the output neuron has reached a value of 0.93, so the neural network predicts that the image most likely contains a car.
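
The forward pass described above can be sketched in a few lines of Python. This is only a minimal illustration, not the network from the figure: the layer sizes, the weights in LAYERS and the 0.5 decision threshold are hypothetical values chosen by hand, whereas a real network would learn its weights from data.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def predict(inputs, layers):
    """Pass the inputs through every layer in turn; return the single output."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation[0]


# Hypothetical, hand-picked parameters (assumption: not taken from the article).
LAYERS = [
    # hidden layer: 3 inputs -> 2 hidden neurons
    ([[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]], [0.0, -0.5]),
    # output layer: 2 hidden neurons -> 1 output neuron
    ([[3.0, -2.0]], [0.5]),
]

# Three made-up input values standing in for pixel intensities.
score = predict([0.9, 0.1, 0.4], LAYERS)
print(f"output neuron: {score:.2f}")
print("car" if score >= 0.5 else "no car")
```

The single number returned by predict plays the role of the output neuron: the closer it is to 1, the more confident this toy network is that the input contains a car.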

Simply put, modern neural networks work by gradually filtering and transforming the input data until the transformed information answers the question the model is supposed to answer. For instance, a model that classifies images can, in its first hidden layer, filter or highlight edges, blur the image, or highlight a color (or do all of these at once). The second layer no longer works directly with the pixels of the original image, but with the output of the first layer.

It therefore already works with more abstract concepts than the original pixels. It turns out that the more hidden layers a model contains, i.e. the deeper it is, the better it typically performs (although this rule does not always apply and depends on the context and the task at hand).
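
To make the idea of layered filtering more concrete, here is a small sketch in plain Python. It assumes a tiny 5x5 grayscale image and two hand-written filters (EDGE_KERNEL and BLUR_KERNEL are hypothetical; a real network learns its filters during training). The first "layer" highlights a vertical edge, and the second works only with the output of the first, never with the original pixels.

```python
def apply_filter(image, kernel):
    """Slide a small kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy image: dark on the left (0), bright on the right (1).
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical filters, written by hand for illustration only.
EDGE_KERNEL = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to vertical edges
BLUR_KERNEL = [[0.25, 0.25], [0.25, 0.25]]          # averages 2x2 neighborhoods

layer1 = relu(apply_filter(IMAGE, EDGE_KERNEL))  # works with the raw pixels
layer2 = apply_filter(layer1, BLUR_KERNEL)       # works with layer1's output

print(layer1)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
print(layer2)  # [[3.0, 1.5], [3.0, 1.5]]
```

The first feature map is large where the image changes from dark to bright and zero where it is uniform, so the second layer already receives "where the edge is" rather than "how bright each pixel is".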

Since the breakthrough of Alex Krizhevsky et al., deep neural networks have been used more and more and are constantly improving. They can help us analyze camera images and gain knowledge from them, allowing, for example, self-driving cars to decide whether to turn or accelerate. They can also analyze the sentiment and tone of social media discussions, which can help to cultivate those discussions and reduce the amount of hate speech.

Neural networks and their applications already surpass people in several individual tasks. However, the saying “There ain’t no such thing as a free lunch” also applies here: the price of the growing usability of neural networks is, among other things, their complexity.

“And as the complexity grows, our ability to understand models and their decisions decreases at the same time.” 

It is true that for some types of applications it is not necessary to understand the decision-making process of deep neural networks in detail. For example, it is not a big problem if artificial intelligence fails to recommend the best phone in an e-shop. But then there are areas such as medicine, security, energy, finance, or the aforementioned self-driving vehicles: areas that have a direct impact on our lives, and where understanding and being able to explain how artificial intelligence “thinks” is as important as the results it produces.

Although artificial intelligence systems and their use are on the rise, they do not yet make decisions in the most sensitive cases. Before that happens, we will have to be able to answer questions such as:

  • Why does the recommendation system of an HR department recommend against hiring a candidate?
  • On what basis does a model decide whether or not you are likely to reoffend?
  • How do self-driving vehicles recognize the stop sign? Can they recognize it properly?
  • Is it important for us to understand what their decisions are based on and what knowledge neural networks contain?

The subject of our research is explainable and interpretable artificial intelligence (AI). Explainability and interpretability are very important concepts, but they do not yet have a fully established meaning in the scientific literature.

In our research, we understand them as follows:

  • The purpose of explainability is to provide a rationale for a decision made by artificial intelligence, expressed in a “language” that a human being is able to understand. In other words, we want to know why the model decided the way it did. When classifying images of animals, for example, we want to highlight those parts of an image that convinced the artificial intelligence that it shows a cat or a dog.
  • Interpretability, in contrast to explainability, does not focus on a specific prediction. Its goal is to reveal and understand the knowledge encoded in the model; in other words, we want to interpret the model itself. For example, we may be interested in how a model that classifies images of different animals “represents” a cat. This is the so-called mapping of abstract concepts to a domain humans are able to understand.
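
One simple way to highlight the parts of an image that drove a single prediction is occlusion: hide one pixel at a time and record how much the model's score drops. The sketch below illustrates that general idea only; it is not necessarily the method used in our research. The function toy_model is a hypothetical stand-in for a trained classifier, and the 3x3 image is made up.

```python
def toy_model(image):
    """Hypothetical 'cat' scorer: it only looks at the top-left 2x2 corner."""
    return (image[0][0] + image[0][1] + image[1][0] + image[1][1]) / 4.0


def occlusion_map(model, image, baseline=0.0):
    """For every pixel, record the score drop when that pixel is hidden."""
    original_score = model(image)
    heatmap = []
    for r, row in enumerate(image):
        heat_row = []
        for c in range(len(row)):
            occluded = [list(pixels) for pixels in image]  # copy the image
            occluded[r][c] = baseline                      # hide one pixel
            heat_row.append(original_score - model(occluded))
        heatmap.append(heat_row)
    return heatmap


# Made-up image: a bright patch in the top-left corner, dim elsewhere.
IMAGE = [
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.2],
    [0.2, 0.2, 0.2],
]

heatmap = occlusion_map(toy_model, IMAGE)
for heat_row in heatmap:
    print(heat_row)
```

Pixels with a large value in the heatmap are the ones the prediction depended on; pixels with a value of zero had no influence on this particular decision. Here only the four top-left pixels receive a nonzero value, which matches what toy_model actually looks at.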

The ability to explain how AI reached a particular prediction provides new opportunities to improve models, to discover hidden biases and encoded discrimination, and to interpret the knowledge within the model.

Accuracy vs. Interpretability

The efficiency of neural networks does not come for free. In order for networks to achieve high accuracy with relatively little effort (compared to other approaches), they must be complex. And as in other spheres of life, understanding complex things is much more difficult than understanding simple ones.

“The goal of our research is to bring the best of both worlds together. We want the accuracy of deep learning, but at the same time we want to achieve the highest possible interpretability and explainability. We want to know why models decide the way they do and what knowledge is encoded in them.”

In the diagram below, we see some well-known machine learning models. On the X-axis, the models are ordered by their degree of interpretability, and on the Y-axis by their relative accuracy in comparison with other approaches. Looking at the extremes, at one end there are deep neural networks, which achieve the highest accuracy but which we understand the least. At the other end, on the right, we find artificial intelligence models in which knowledge is represented most explicitly (and most intuitively to humans).

As a general rule, the more explicitly the knowledge is stored in the model, and the more the model is interpretable to humans, the less modeling capacity it has (among other things, because such coding of knowledge is labor intensive) and the less accurate it is. For example, in rule-based systems or simple decision trees, the model’s “thought processes” in the decision-making process are beautifully simple: “IF an animal has feathers, AND it doesn’t fly, AND it lives in cold areas, THEN it’s a penguin.”
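
The penguin rule above translates almost word for word into code, which is exactly what makes such models transparent. The following is a minimal sketch; the function name classify_animal and its three boolean parameters are our own illustrative choices.

```python
def classify_animal(has_feathers: bool, can_fly: bool, lives_in_cold: bool):
    """Return a (label, explanation) pair; the rule itself is the explanation."""
    if has_feathers and not can_fly and lives_in_cold:
        return "penguin", "has feathers AND does not fly AND lives in cold areas"
    return "unknown", "no rule matched"


label, reason = classify_animal(has_feathers=True, can_fly=False, lives_in_cold=True)
print(f"{label}: {reason}")
```

Every prediction comes with the exact condition that produced it, so no separate explanation method is needed. The price is that someone has to write each such rule by hand.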

However, such models are very difficult to use in solving complicated tasks, where, in addition, they can lose their interpretability very easily. If we wanted to use the same model, for example, to recognize the language in which a text is written, the number of conditions in the “IF… AND… THEN” rule could very quickly climb to hundreds or even to hundreds of thousands.

Figure: While complex models usually achieve higher accuracy, the degree to which they are interpretable tends to be lower. Research in explainable artificial intelligence examines ways to create models that, in addition to high accuracy, achieve a high degree of transparency and interpretability (green area) [2].

Conclusion of the First Part

In the first part of the series on explainable artificial intelligence, we talked about accuracy and explainability, their relationship, and why we need explanations.

Although complex models such as deep neural networks are currently used in research and industry, we must remember that their unprecedented performance comes at a price.

The decision-making process of complex models is highly non-transparent, and one may not be able to understand how such a model arrived at its prediction. In critical domains such as healthcare or the financial sector, this can have serious consequences, or it can completely prevent the use of complex models.

Explainable Artificial Intelligence (XAI) research seeks to provide methods and techniques to achieve the best of both worlds: the interpretability of simple models and the performance of complex ones, which are often referred to as black boxes. That is why we focus on XAI research at KInIT.

If you are interested in the topic of explainable artificial intelligence, you can look forward to the other parts of the series. We will focus on transparent and so-called black box models, and on methods for explaining and interpreting artificial intelligence models. We will also show that one method of explanation is not enough, and why this is so. Finally, we will take a look at how to measure the quality of explanations.


[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems – Volume 1 (NIPS’12). Curran Associates Inc., Red Hook, NY, USA, 1097–1105.

[2] Alejandro Barredo Arrieta et al. 2020. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 58, 82–115.

The PricewaterhouseCoopers Endowment Fund at the Pontis Foundation supported this project.
