Transparent Models vs. Black Box Models

Explainable Artificial Intelligence: From Black Boxes to Transparent Models

Can artificial intelligence decide who will be released from prison, who will advance to the next round of job interviews, or who will be recommended preventive screening for various diseases? Can it take full responsibility for driving in a snowstorm?

Not sure how to answer such questions? If you knew that artificial intelligence justified its decision in detail and everyone could verify that it made the right decision – would that change your opinion?

We bring you a series of articles on explainable and transparent artificial intelligence.

In the third part, we will look at two categories of artificial intelligence models – transparent models and black box models. What are the differences between them? Why do we use non-transparent models if the ability to understand their predictions is so important? Which models can we consider transparent and which, on the contrary, require additional interpretation and explanation of their predictions?

Transparent models

In the case of transparent models, we can understand not only how they work from an algorithmic perspective, but also why they make the decisions they do. We humans can understand how these models “think” and what knowledge they have gained about the real world. These models are typically older, simpler and less powerful.*

A decision tree is a good example of a transparent model. In a decision tree, the prediction of the model for an input is based on knowledge that is expressed explicitly as rules. E.g., “IF a person has an income of more than 1,000 € and is less than 40 years old, THEN it is possible to approve a loan of 30,000 €”.

* Interestingly, even such naturally transparent models can relatively easily turn into something difficult to understand. This can happen when the input data representation is very complex or the model is very large. In that case, it is difficult or impossible for a person to simulate the decision-making process of the model, which is one of the conditions of transparency.

Figure: A simplified example of a decision tree that provides a recommendation on whether an applicant’s mortgage application should be approved or not. The tree decides on the basis of four attributes: the age and net income of the applicant, whether they have a guarantor and, if so, the net income of the guarantor. In this case, the tree would recommend rejecting the application. The applicant is more than 40 years old, her net income is less than € 1,300 and at the same time her combined income with the guarantor is less than € 1,600. The tree creates its rules in the learning process. In this case, for example, the tree could learn by dividing past applicants into two groups: those who repaid their loans properly and those who did not. We can consider such a model to be transparent, because one can clearly simulate and understand the reason for the model’s decision.
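The path described in the caption can be sketched as a few nested conditions. Only the rejecting path (age over 40, net income below € 1,300, combined income below € 1,600) comes from the figure description. The outcomes on all other branches, the Applicant class and the recommend function are assumptions made for illustration, not the tree from the figure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Applicant:
    age: int
    net_income: float                          # net income in EUR
    guarantor_income: Optional[float] = None   # None means no guarantor


def recommend(applicant: Applicant) -> Tuple[str, List[str]]:
    """Return a recommendation together with the rules that led to it."""
    path: List[str] = []

    if applicant.age <= 40:
        path.append("age <= 40")
        return "approve", path  # assumed outcome, not described in the caption
    path.append("age > 40")

    if applicant.net_income >= 1300:
        path.append("net income >= 1300")
        return "approve", path  # assumed outcome
    path.append("net income < 1300")

    if applicant.guarantor_income is None:
        path.append("no guarantor")
        return "reject", path   # assumed outcome
    path.append("has guarantor")

    combined = applicant.net_income + applicant.guarantor_income
    if combined < 1600:
        path.append("combined income < 1600")
        return "reject", path   # the path described in the caption
    path.append("combined income >= 1600")
    return "approve", path      # assumed outcome


# Hypothetical applicant matching the case described in the caption.
decision, path = recommend(Applicant(age=45, net_income=1200, guarantor_income=300))
print(decision, path)
```

Because the function returns the list of rules it checked, a person can follow the decision step by step, which is exactly what makes a small tree like this transparent.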

Black box models

We can understand the way complex black box models work. We understand their mechanics and the mathematical operations that are used in the learning process. So what is it that we do not understand?

Modern black box models are so complex that it is not possible to fully simulate how they reached their decisions, at least not to an extent that would allow humans to understand which stimuli and which knowledge encoded in the model led to a decision.

In other words, it is not possible for humans to take a pen and paper and simulate the whole decision-making process of the model, because it is too complex. The individual operations that take place in the model can be very simple, for example addition and multiplication. However, if the model contains millions to billions of such operations, it is not possible for humans to simulate all of them, or even to check their effect.
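To get a feeling for the scale, here is a rough sketch that counts the multiplications and additions in a single pass through a small fully connected network. The layer sizes are hypothetical and were chosen only for illustration. Real architectures differ, and bias additions are ignored.

```python
from typing import Sequence


def count_multiply_adds(layer_sizes: Sequence[int]) -> int:
    """Count multiply-add pairs in one forward pass of a fully connected network.

    Each neuron multiplies every one of its inputs by a parameter and sums
    the results, so a layer with n_in inputs and n_out neurons needs
    n_in * n_out multiply-add pairs.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: a 224 x 224 grayscale image as input,
# two hidden layers with 1,000 neurons each, and five outputs.
sizes = [224 * 224, 1000, 1000, 5]
print(count_multiply_adds(sizes))  # prints 51181000
```

Even this modest network performs over 51 million multiply-add pairs for a single image, far more than anyone could check by hand.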

Deep neural networks are a typical representative of this category of models. While in decision trees knowledge is expressed explicitly as rules, in neural networks it is most commonly encoded in the form of very small numerical values, the model parameters. The prediction is then produced by a sequence of mathematical operations (predominantly over matrices and vectors) that transform the input data into the final prediction of the model.

You can see an illustration of how neurons in hidden layers combine and gradually filter inputs using parameters in the image from the first part of the series. We have added an example to the image of how a neuron in the first hidden layer calculates its output. You can see that calculating the value of a single neuron is really not rocket science: you just need to be able to add and multiply.

Image: Example of calculating the output of one neuron within a neural network. In the box, you can see what the calculation of the value of one of the hidden neurons on the first hidden layer may look like. “a”, “b”, “c” and “BIAS” are examples of neural network parameters. They are the ones that make up its memory. By setting the values of these parameters during the training process, the neural network learns. Today’s neural networks can have hundreds of thousands to hundreds of billions of parameters.
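The calculation from the box can be sketched in a few lines. The parameter names a, b, c and BIAS follow the caption. The three input values and all the numbers are made up for illustration. Real networks usually also pass the result through a non-linear activation function, which is left out here to keep the focus on adding and multiplying.

```python
def neuron_output(x1: float, x2: float, x3: float,
                  a: float, b: float, c: float, bias: float) -> float:
    """Multiply each input by its parameter, add the products and the bias."""
    return a * x1 + b * x2 + c * x3 + bias


# Hypothetical inputs and parameter values.
value = neuron_output(x1=1.0, x2=2.0, x3=3.0, a=0.5, b=-1.0, c=0.25, bias=0.5)
print(value)  # prints -0.25
```

Training a network means nothing more than adjusting values such as a, b, c and bias until the outputs become useful.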

As another example, let’s look at the classification of black-and-white images into five different classes: car, truck, airplane, ship, and horse. An image can be represented in a computer as a two-dimensional grid of numbers, a matrix, in which each cell contains a small number. This number represents the value of a particular pixel (in a grayscale image, white is usually represented by the value 255 and black by the value 0).

In its first layer, the neural network combines these numbers with its own parameters using simple mathematical operations such as multiplication and addition. As a result, the image is transformed into a different grid of numbers. These numbers are no longer the values of the pixels of the image itself, but a kind of filtered version of them (in technical terms, a hidden representation). When working with images, neural networks often use the convolution operation, which is also the basis of many filters in image editors (e.g. Photoshop).

Image: In neural networks, the convolution operation (hence the term convolutional neural network) is often used when working with image data. Many popular filters in photo editors work on the same principle. For example, if you have ever smoothed or blurred a photo in such an editor, you have probably used a convolution-based filter.
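A minimal sketch of such a filter in plain Python follows. The tiny 4 x 4 image and the 3 x 3 averaging (blur) kernel are made-up examples. In a neural network the kernel values would be parameters learned during training rather than fixed by hand. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
from typing import List

Grid = List[List[float]]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image and sum the element-wise products.

    No padding is used, so the output grid is smaller than the input.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output: Grid = []
    for i in range(out_h):
        row: List[float] = []
        for j in range(out_w):
            total = 0.0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4 x 4 grayscale image: black (0) on the left, white (255) on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# A 3 x 3 blur kernel: each output value is the average of a 3 x 3 window.
blur = [[1 / 9] * 3 for _ in range(3)]

print(convolve2d(image, blur))
```

The result is a 2 x 2 grid with values of roughly 85 on the left and 170 on the right: the sharp edge between black and white has been smoothed into shades of gray.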

Subsequently, the second hidden layer does exactly the same as the first one, but its input no longer consists of pixels. Instead, it is the grid that was created in the first step. This process is repeated until we reach the last layer of the neural network, the output layer. The output in this layer is represented by multiple numbers. Each of these numbers tells us to what extent the model is “convinced” that the picture shows a car, a truck, an airplane, a ship or a horse. If the first of these five numbers is the largest, the model is convinced that there is a car in the picture; if it is the second, a truck, and so on.
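Reading the prediction from the output layer can be sketched as follows. The five raw output values are invented. Converting them to probabilities with the softmax function is a common choice that we assume here, not something the description above prescribes.

```python
import math
from typing import List, Sequence, Tuple

CLASSES = ["car", "truck", "airplane", "ship", "horse"]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw output values into positive numbers that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: Sequence[float]) -> Tuple[str, float]:
    """Return the class with the largest output and its probability."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASSES[best], probabilities[best]


# Hypothetical values from the output layer, one per class.
label, confidence = predict([2.0, 0.5, 0.1, -1.0, 0.3])
print(label, round(confidence, 2))
```

Here the first number is the largest, so the model's answer is "car". Note that this step tells us what the model decided, but nothing about why.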

As we can see, in contrast to decision trees, it is very difficult to determine a neural network’s reasons for a specific prediction. We often observe only indirectly how an input (e.g. a picture) is transformed within the model from one hidden representation to another. Based on that, we can estimate what is happening inside the model.

An example of gradual filtering from the input image to the network’s final prediction can be seen in Image 3. As we can see, the outputs of the first layers of the neural network (CONV, RELU,…) are easy for humans to read, and we can even estimate what a particular layer is targeting. For example, at the outputs of the second layer (RELU), we can see that some parts of the network focused on detecting shadows under the car, while others detected the surrounding forest.

But the deeper we go in the network, the lower our ability to interpret what is going on inside it. For example, we do not know what the bright areas in the output of the last hidden layer of the network (POOL) mean. Does a bright pixel indicate that there is a wheel in the image? Or does it mean that the car is facing right?

Once again: What exactly do we not understand about black box models? A language model example for sentiment analysis

Let’s look at a specific example: an artificial neural network whose task is to determine whether a text has a positive or negative sentiment.

What was the reason for the specific decision?

If a sufficiently large language model like BERT*, which is in fact a massive artificial neural network, is trained on text data with negative and positive sentiment, this model will probably achieve reasonably high accuracy.

However, we do not know on what basis the model assigns positive or negative sentiment to the text. Does the model consider the text as negative because it contains too many exclamation marks? Or is it because of the word “hell” that the author of the text used? There is a difference between the reviews “These waiters should go to hell!!!” and “In this restaurant they cook hella good!!!”.
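For contrast, consider a deliberately simple, transparent scorer that looks words up in a small list. The word list and its weights are invented for this illustration, and this is not how BERT works. The point is only that here we can read off exactly which words drove the result, something a large neural network does not tell us.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical word list: negative weights for negative words, positive for positive.
LEXICON: Dict[str, int] = {"hell": -1, "good": 1}


def score(text: str) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the total sentiment score and the words that contributed to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [(token, LEXICON[token]) for token in tokens if token in LEXICON]
    total = sum(weight for _, weight in hits)
    return total, hits


print(score("These waiters should go to hell!!!"))
print(score("In this restaurant they cook hella good!!!"))
```

The first review gets a score of -1 because of the word “hell”, the second a score of +1 because of the word “good”. The exclamation marks and the word “hella” play no role, and we can verify that directly. With a black box model we can only guess.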

* BERT: Bidirectional Encoder Representations from Transformers – currently one of the most widely used and powerful models for natural language processing.

What knowledge does the model contain and where is it stored?

If a trained language model works properly, it is obvious that it understands human language to some extent. However, it is difficult to say whether and to what extent it understands individual aspects of the language. Does the model understand the difference between verbs and nouns? Did it learn the grammar and sentence structure rules of a specific language? Does it understand the meaning (semantics) of individual words, or has it just learnt the statistics of their co-occurrence?

In our example task, sentiment analysis, we can guess that the model focuses, among other things, on positive and negative words, but we do not know where this knowledge is stored. Is there one specific neuron that is responsible for recognizing positive sentiment in the text? Or is it a combination of multiple neurons? As we showed in the previous part, interpreting the significance of individual parameters of the model and their impact on the resulting prediction is a non-trivial problem.

Are transparent models dead?

Most of the models that we consider transparent are simpler models with lower modeling capacity. Decision-making in complex models or models with too many rules cannot be understood by humans due to our cognitive limitations. It might therefore seem that transparent models are of little use and not sufficient for more complex tasks.

However, the truth is that transparent models are still widely used in practice. This is because there are areas where transparency is at least as important as the performance of the model itself. A recent article [1] published in the prestigious journal Nature Machine Intelligence criticizes the unjustified use of overly complex models in cases where simpler but transparent models would be sufficient to solve the problem.

Example: Nature Compas/Corels

Compas is a program used by several courts in the United States to estimate the likelihood that a person who has committed a crime will do so again. It is a program that has a direct impact on people’s lives, but at the same time it is non-transparent. The company that is developing the program has not even disclosed the principle on which it works.

In the above-mentioned article, the author, Cynthia Rudin, presents an alternative method, Corels, which has far fewer parameters, is more transparent and at the same time achieves the same accuracy as Compas.

It is natural that a person who is about to be sentenced to jail wants to know the reason for the decision of the court. Even a person who has committed an offense or a crime has the right to make sure that the trial and the reasons for which he or she goes to prison are fair. 

To make things even more complicated, we can look at this particular issue from a different perspective. Would it really be reasonable if potential criminals knew the exact circumstances under which the software identifies them as high-risk? Wouldn’t it lead to attempts to deceive the system in order to receive a lower penalty?

In domains where human fate is at stake, let’s throw complex models away, and let’s accept the simpler but more understandable models, if they can achieve sufficient accuracy. 

Despite the deficiencies that result from complexity, it is still true that the strongest models we have are also the most complex and the most difficult to explain and interpret. In the first part of the series, we showed that choosing a machine learning algorithm to solve a specific problem is often about making compromises. While complex models usually achieve higher accuracy, their interpretability tends to be lower. In the following article, we will introduce methods that could help us increase their level of transparency.


In this part of the Explainable Artificial Intelligence series, we divided the artificial intelligence models into two categories – transparent models and black box models. In the case of transparent models, humans are able to understand not only the mechanism determining the way they work, but also their decision-making process. Thus, we can explain the reasons for their decisions. 

Black boxes are typically complex models whose decisions cannot be directly explained. Their knowledge is stored in a form that is incomprehensible to humans. If we want to increase their transparency, we need to provide them with additional explainability and interpretability mechanisms.

In the following articles, we will look at specific examples of methods of explainability and interpretability. We will explain why we need to have more than one explainability method. You will also learn how we can measure the quality of artificial intelligence prediction. We will also talk about how people can benefit from using transparent and explainable artificial intelligence methods.

The PricewaterhouseCoopers Endowment Fund at the Pontis Foundation supported this project.


[1] RUDIN, Cynthia. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 2019, 1.5: 206-215.
