Components and Properties of Good Explanations of Artificial Intelligence Models’ Decisions

Explainable Artificial Intelligence: From Black Boxes to Transparent Models

From the perspective of machine learning (ML), we are living in happy times. For many tasks for which we previously could not find a satisfactory solution with the help of artificial intelligence, we now know not one but many ML algorithms or models that achieve at least acceptable performance. This abundance of models, data processing steps, their variations and possible combinations introduces a challenge: we have to find a configuration that suits our task and data.

To find the right model, we need to define criteria that measure how well a particular model and its parameters and hyperparameters fit the given problem. Accuracy, precision, recall, F1 score or squared error are some of the examples of frequently used criteria (which one to use depends on the task).
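As a minimal sketch of such criteria, the following snippet computes accuracy, precision, recall and F1 score for a binary classifier directly from their standard definitions. The labels and predictions are made up purely for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification criteria from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Which of these values we optimize depends on the task: for example, recall matters more when missing a positive case is costly.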

In the next step, we usually optimize the so-called hyperparameters. Here we try to find a good configuration of the model (for example, a neural network architecture) and of the sequence of data pre-processing and post-processing steps. If we have sufficient resources, we can even use automated machine learning (AutoML), which searches the space of all available configurations for a suitable one fully automatically.
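The simplest form of such a search is an exhaustive grid search. The sketch below is only an illustration under our own assumptions: a tiny made-up dataset, a hand-written k-nearest-neighbours classifier, and two hyperparameters (`k` and `metric`). Real AutoML systems use far more sophisticated search strategies.

```python
from itertools import product

# Tiny made-up dataset: ((feature_1, feature_2), label).
TRAIN = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
         ((1.0, 1.0), 1), ((0.9, 0.8), 1), ((0.8, 1.1), 1)]
VALID = [((0.1, 0.1), 0), ((0.95, 0.9), 1),
         ((0.3, 0.2), 0), ((0.7, 0.9), 1)]

DISTANCES = {
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "euclidean": lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5,
}


def knn_predict(train, point, k, metric):
    """Majority vote among the k training points closest to `point`."""
    dist = DISTANCES[metric]
    neighbours = sorted(train, key=lambda item: dist(item[0], point))[:k]
    positive_votes = sum(label for _, label in neighbours)
    return 1 if 2 * positive_votes > k else 0


def validation_accuracy(config):
    """Criterion used to compare configurations."""
    hits = sum(knn_predict(TRAIN, x, **config) == y for x, y in VALID)
    return hits / len(VALID)


def grid_search(grid, evaluate):
    """Evaluate every combination of hyperparameter values, keep the best."""
    names = list(grid)
    configs = [dict(zip(names, values))
               for values in product(*(grid[n] for n in names))]
    return max(configs, key=evaluate)


GRID = {"k": [1, 3, 5], "metric": ["manhattan", "euclidean"]}
best_config = grid_search(GRID, validation_accuracy)
best_score = validation_accuracy(best_config)
```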

Thanks to the attention paid to explainable machine learning in recent years, the number of explainability methods has become as staggering as the number of different machine learning methods. For example, if we want to provide users with good explanations of the decisions of a complex model, we need to find not only the right explainability method as such, but also its specific configuration.

To find such a configuration, we can borrow some concepts from AutoML. As in AutoML, the input is a space of available XAI methods and their configurations. As the output, we want a combination of these that provides good explanations for our combination of task, model and data. In this case, we can talk about automated explainable artificial intelligence, or AutoXAI.
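The idea can be sketched in a few lines. Everything named here is hypothetical: the method names, their options, and especially `toy_quality`, which is only a placeholder for a real criterion of explanation quality.

```python
from itertools import product

# Hypothetical search space: XAI method name -> configurable options.
SEARCH_SPACE = {
    "occlusion": {"patch_size": [1, 2, 4]},
    "random_baseline": {"seed": [0]},
}


def candidate_configs(space):
    """Yield every (method, configuration) pair in the search space."""
    for method, options in space.items():
        names = list(options)
        for values in product(*(options[n] for n in names)):
            yield method, dict(zip(names, values))


def select_explainer(space, quality):
    """Return the (method, configuration) pair with the highest quality."""
    return max(candidate_configs(space), key=lambda c: quality(*c))


def toy_quality(method, config):
    # Placeholder criterion (an assumption for this sketch): it simply
    # prefers fine-grained occlusion over everything else.
    if method == "occlusion":
        return 1.0 / config["patch_size"]
    return 0.1


best_method, best_config = select_explainer(SEARCH_SPACE, toy_quality)
```

The search loop itself is trivial. The hard part is replacing `toy_quality` with a criterion that actually measures how good an explanation is.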

Figure 1: Nowadays, there is quite a large number of different explainability methods, and many of them are even highly configurable. If we want to identify a method that provides good explanations for a particular combination of task, model, and data, we can apply an approach similar to automated machine learning (AutoML). In this case we talk about automated explainable artificial intelligence or AutoXAI. The biggest challenge is to define criteria that measure the quality of explanations produced by different explainability methods and their configurations.

The challenging part of AutoXAI is defining how to compare different explainability methods. In other words: what does a “good explanation” mean?

A Good Explanation Balances Two Components – Understandability and Fidelity

According to several authors (e.g. [1, 2]), a good explanation should be in balance between two components. It should faithfully describe the behavior of the model (fidelity) and it should be understandable to people (understandability).

If the explanation is understandable but lacks fidelity, we could end up with explanations that look “good” or match our expectations but do not correctly describe the decision-making process of the underlying model. The explanation may, for instance, be incomplete: it may omit the information that some part of the input data significantly influenced the decision of the machine learning model. You can see an example in Figure 2.

Figure 2: The heat map contains very bright, well-bounded highlighted areas which, according to the explainability method used, contributed to the image being classified as a “parrot”. This explanation is easily understandable to humans because it highlights only a small number of very well-bounded parts of the image. On the other hand, this explanation is not complete. In fact, the area where the parrot’s head is located (the parrot on the left) also contributed to the classification of the image into the “parrot” class.

On the other hand, if the explanation lacks understandability, the user may be overwhelmed with redundant information, or the explanation may be so chaotic that the user (a human) will not understand it.

For example, if some parts of the input had a negligible effect on the final model prediction, it may be counterproductive to include such information in the explanation presented to a person. Figure 3 shows an example of an explanation in the form of a heat map that also highlights the pixels of the image that had only a minimal impact on the prediction.

Figure 3: This heat map provides very detailed information on the areas of the image that contributed positively (yellow and red) to the prediction and which, on the contrary, contributed negatively (blue). Although such an explanation provides fairly accurate information about the actual behavior of the model, it contains a large number of insignificant details that could overwhelm the user.

To make matters more difficult, understandability and fidelity often pull in opposite directions: an increase in understandability can lead to a decrease in fidelity and vice versa. Moreover, an even balance between the two is not always what we want. If, for example, we want to use the explanation for automated debugging of the model, the fidelity of the explanations will be more important than their understandability.
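One simple way to express this task-dependent preference is a weighted sum of the two components. This is only a sketch under strong assumptions: it presumes that fidelity and understandability have already been measured on a 0 to 1 scale, and the scores and the `fidelity_weight` values below are invented for illustration.

```python
def explanation_score(fidelity, understandability, fidelity_weight):
    """Weighted combination of the two components (both assumed in [0, 1])."""
    if not 0.0 <= fidelity_weight <= 1.0:
        raise ValueError("fidelity_weight must be between 0 and 1")
    return (fidelity_weight * fidelity
            + (1.0 - fidelity_weight) * understandability)


# Invented scores for two hypothetical explanations of the same prediction.
CANDIDATES = {
    "detailed_heat_map": {"fidelity": 0.9, "understandability": 0.4},
    "sparse_heat_map": {"fidelity": 0.6, "understandability": 0.9},
}


def best_explanation(candidates, fidelity_weight):
    """Pick the candidate that scores highest for the given weighting."""
    return max(
        candidates,
        key=lambda name: explanation_score(
            fidelity_weight=fidelity_weight, **candidates[name]
        ),
    )


# Automated debugging: fidelity dominates.
for_debugging = best_explanation(CANDIDATES, fidelity_weight=0.9)
# End users: understandability dominates.
for_end_users = best_explanation(CANDIDATES, fidelity_weight=0.3)
```

With these numbers, the detailed heat map is preferred for debugging and the sparse one for end users, so the same pair of explanations is ranked differently depending on the task.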

A Closer Look at Understandability and Fidelity

For the sake of simplicity, we have so far limited ourselves to two components of explainability – understandability and fidelity. Let us now look at one possible taxonomy, inspired by the papers [1, 2], which provides a more detailed view of explainability. An even more detailed taxonomy of requirements on explainable methods, systems and frameworks for their evaluation was published by Sokol et al. [3].

Figure 4: Explainability components and properties. Inspired by [1, 2].

From the point of view of soundness, we ask to what extent the explanation is true with respect to the underlying model (and data). Soundness focuses explicitly on whether the explanation really reflects only the behavior of the model. The explanation should therefore not be misleading; it should tell “nothing but the truth”. Let’s look at an example. Implementations of explainability methods such as occlusion analysis sometimes use tricks and optimizations that make them less computationally demanding. Instead of “deleting” parts of the input one by one and observing the impact on the model’s prediction, such an implementation deletes a larger part of the input at once. As a consequence, if we simultaneously delete a part that had an impact on the model’s prediction and a part that did not, the final explanation will say that both parts significantly affected the prediction, which is not sound.
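The effect is easy to reproduce on a toy example. In the sketch below, `toy_model` is a hypothetical model in which only the first input feature matters, and `occlusion_attributions` is a simplified occlusion analysis written for this illustration, not the implementation of any particular library.

```python
def toy_model(x):
    # Hypothetical model: only the first feature influences the output.
    return 2.0 * x[0]


def occlusion_attributions(model, x, patch_size, baseline=0.0):
    """Occlude consecutive patches and credit every feature in a patch
    with the drop in the model output caused by occluding that patch."""
    base_output = model(x)
    attributions = [0.0] * len(x)
    for start in range(0, len(x), patch_size):
        stop = min(start + patch_size, len(x))
        occluded = list(x)
        for i in range(start, stop):
            occluded[i] = baseline
        drop = base_output - model(occluded)
        for i in range(start, stop):
            attributions[i] = drop
    return attributions


x = [1.0, 1.0, 1.0, 1.0]
fine = occlusion_attributions(toy_model, x, patch_size=1)
coarse = occlusion_attributions(toy_model, x, patch_size=2)
# fine   == [2.0, 0.0, 0.0, 0.0]  -> only the relevant feature is credited
# coarse == [2.0, 2.0, 0.0, 0.0]  -> the second feature is credited wrongly
```

The coarse variant needs half as many model calls, but it attributes influence to a feature the model never uses.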

Completeness describes the quality of the explanations in terms of how much of the entire decision-making process and dynamics of the model is captured by the given explanation. We ask to what extent the explanation describes the whole model (or even the system). We want the explanation to tell us not only the truth, but the “whole truth”.

Clarity means that the explanation should not be confusing and should be unambiguous. In addition, for similar inputs, where the model itself made decisions in a similar way (e.g., a model decided for two similarly looking pictures that they belong to the same class), the explanations should also be similar. Otherwise, one will not perceive such explanations as consistent.
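One possible way to check this kind of consistency is to compare the explanations themselves. The sketch below assumes that explanations are attribution vectors over the same features and uses cosine similarity as the comparison. Both the choice of measure and the attribution values are our own assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two attribution vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0


# Invented attributions for two similar images of the same class ...
explanation_a = [0.8, 0.1, 0.0, 0.1]
explanation_b = [0.7, 0.2, 0.0, 0.1]
# ... and an explanation that focuses on a completely different region.
explanation_c = [0.0, 0.1, 0.9, 0.0]

consistent = cosine_similarity(explanation_a, explanation_b)
inconsistent = cosine_similarity(explanation_a, explanation_c)
```

If the model treats two inputs alike but the similarity of their explanations is low, as with `explanation_a` and `explanation_c`, the user is likely to perceive the explanations as inconsistent.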

Broadness describes the part of the task and data to which the explanation can be applied. As an example, take a deep convolutional neural network whose task is to classify paintings by style. We want the explainability method to tell us why the network classified a particular painting into a particular style. It is possible that one and the same explainability method will be able to provide a meaningful explanation only for a certain group of paintings. For example, for modern paintings that consist only of basic geometric shapes, the reasoning will state that a painting contained a lot of triangles, squares or circles and no other shapes. In contrast, for very abstract works (for example, in the style of Jackson Pollock), the method will not be able to provide any meaningful explanation.

We can look at parsimony as an application of the principle of Occam’s razor. As we showed in the example above, the explanation should, from the point of view of understandability, contain only those details that bring some value to the addressee (a human). It should be as simple as possible. If the explanation is in the form of a natural language sentence, the sentence should not contain unnecessary fluff. For example, a better explanation of why the model classified the activity in the image as “cooking” is the sentence “the image shows a man in an apron holding a wooden cooking spoon” rather than the sentence “the image shows an older man with gray hair holding a brown carved cooking spoon, the clock on the wall shows 11 o’clock and there are flowers on the table”.
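For feature-based explanations, one simple way to apply this principle is to keep only the most influential features. The sketch below keeps the smallest set of features that covers a chosen share of the total attribution. The feature names follow the cooking example, while the attribution values and the `coverage` threshold are made up.

```python
def parsimonious(attributions, coverage=0.85):
    """Keep the most influential features until they cover the requested
    share of the total absolute attribution."""
    total = sum(abs(value) for value in attributions.values())
    if total == 0:
        return {}
    kept, covered = {}, 0.0
    ranked = sorted(attributions.items(),
                    key=lambda item: abs(item[1]), reverse=True)
    for name, value in ranked:
        if covered >= coverage * total:
            break
        kept[name] = value
        covered += abs(value)
    return kept


# Made-up attributions for the "cooking" prediction.
attributions = {
    "apron": 0.5,
    "cooking spoon": 0.4,
    "gray hair": 0.04,
    "wall clock": 0.03,
    "flowers": 0.03,
}
short_explanation = parsimonious(attributions)
```

Here only the apron and the cooking spoon remain, which corresponds to the shorter of the two sentences above.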


In this part of our series on explainable artificial intelligence, we looked at what components and properties a good explanation should have.

We have shown that a good explanation should balance understandability and fidelity. It must be understandable to people and at the same time it must faithfully describe the real behavior of the model whose predictions it explains.

Achieving balance between understandability and fidelity is not an easy task, because increasing understandability can simultaneously decrease fidelity and vice versa. At the same time, it depends on the specific task whether we want the explanations to be more understandable, or whether we want them to be very detailed and cover all the details of the decision-making process of the machine learning model.

In the next part of the series, we will describe two families of approaches and examples of specific metrics that can be used to measure the quality of explanations from the perspective of the properties we have described in this article.

The PricewaterhouseCoopers Endowment Fund at the Pontis Foundation supported this project.


[1] ZHOU, Jianlong, et al. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 2021.

[2] MARKUS, Aniek F.; KORS, Jan A.; RIJNBEEK, Peter R. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. Journal of Biomedical Informatics, 2021.

[3] SOKOL, Kacper; FLACH, Peter. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020.
