Explainable AI: theory and a method for finding good explanations for (not only) NLP

In the two phases of this project, we addressed the problem of finding a good post-hoc explainability algorithm for the task at hand. First, we researched the theory behind what makes a good explanation. Then, we proposed the concept of AutoXAI for finding a well-performing explanation algorithm for a given combination of model, task and data. We conducted a series of experiments on three different tasks with a particular explainability algorithm – Layer-wise relevance propagation (LRP).

From the perspective of machine learning (ML), we live in happy times. For many tasks we know not one, but many different ML algorithms or models that we can select from and achieve at least decent performance. This wealth of models and their variations introduces a challenge – we need to find a configuration that fits our task and data.

To find the right model, we need to define the criteria that measure how well a particular model and its parameters and hyperparameters fit the problem at hand. Then, we usually do some kind of hyperparameter optimization or Automated Machine Learning (AutoML) [1].

In recent years, the number of post-hoc explainable AI (XAI) methods has become as overwhelming as the number of different machine learning methods. To find a post-hoc explainability algorithm that provides good explanations for the task at hand, we can borrow concepts from AutoML. As in AutoML, we have a space of available algorithms and their configurations, and we want to find the one that provides good explanations. The challenging part of AutoXAI is how to compare different explainability algorithms. In other words – what’s a good explanation?

According to multiple authors, a good explanation should balance two properties – it should faithfully describe the model’s behavior and be understandable to humans.

Figure 1: A good explanation should balance understandability and fidelity. This picture depicts two explanations in the form of a heatmap generated for the same prediction – the model classified the image as “parrot”. In the top picture, the explanation highlights a limited number of well-bounded regions. These regions are, according to the explanation, responsible for the prediction of the model. The explanation looks pleasing, but, in fact, the prediction was significantly influenced by one more region. On the other hand, the explanation below might better describe the behavior of the model, but it’s overwhelming.

We proposed a definition of AutoXAI as an optimization problem. Through optimization, we want to find an explainability algorithm that maximizes two sets of criteria – understandability and fidelity. These criteria measure the quality of explanations with respect to the underlying model and data. 
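
One way to write this down, as a sketch rather than the project's exact formulation: let \mathcal{A} be the space of explainability algorithms together with their configurations, f the model, D the data, and U_i and F_j individual understandability and fidelity criteria. All of these symbols, and the weighted-sum combination with non-negative weights \alpha_i and \beta_j, are assumptions introduced here for illustration only.

```latex
a^{*} = \arg\max_{a \in \mathcal{A}}
  \Bigl[ \sum_{i=1}^{m} \alpha_i \, U_i(a; f, D)
       + \sum_{j=1}^{n} \beta_j \, F_j(a; f, D) \Bigr],
\qquad \alpha_i \ge 0, \; \beta_j \ge 0
```

A weighted sum is only one possible way to combine the two sets of criteria; a multi-objective treatment that keeps understandability and fidelity separate would fit the same description.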

The first set of criteria, understandability, measures how similar the explanations that the algorithm generates for the model’s predictions are to explanations that the user considers understandable.
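
As a minimal sketch of what such a criterion could look like, assume that explanations are relevance heatmaps and that the user marks the positions they find meaningful as a binary mask. The function names, the top-k thresholding and the use of intersection-over-union are illustrative assumptions, not the measures proposed in the project.

```python
def top_k_mask(relevance, k):
    """Return a binary mask marking the k highest-relevance positions."""
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    selected = set(ranked[:k])
    return [1 if i in selected else 0 for i in range(len(relevance))]


def iou(mask_a, mask_b):
    """Intersection-over-union of two equally long binary masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return intersection / union if union else 1.0


def understandability(relevance, user_mask):
    """Hypothetical score: overlap between the top-k relevant positions
    and the positions the user marked as meaningful (k = size of the user mask)."""
    k = sum(user_mask)
    return iou(top_k_mask(relevance, k), user_mask)


# Flattened toy heatmap and a user annotation over the same positions.
relevance = [0.9, 0.1, 0.8, 0.05, 0.3, 0.0]
user_mask = [1, 0, 1, 0, 0, 0]
score = understandability(relevance, user_mask)
```

In this toy case the two most relevant positions coincide with the user's annotation, so the score reaches its maximum; an explanation that highlights other positions would score lower.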

The second set of criteria, fidelity, ensures that the explanations truly reflect the decision-making process of the model.
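
One common way to probe fidelity is a deletion test: replace the features that the explanation ranks as most relevant with a baseline value and check how much the model output drops. The sketch below illustrates the idea on a toy linear scorer; the model, the function names and the zero baseline are assumptions made for this example and are not taken from the project.

```python
def model(x, weights):
    """Toy linear scorer standing in for a real classifier (assumption)."""
    return sum(w * v for w, v in zip(weights, x))


def deletion_drop(x, relevance, weights, k, baseline=0.0):
    """Drop in model output after replacing the k most relevant
    features with a baseline value."""
    ranked = sorted(range(len(x)), key=lambda i: relevance[i], reverse=True)
    removed = set(ranked[:k])
    perturbed = [baseline if i in removed else v for i, v in enumerate(x)]
    return model(x, weights) - model(perturbed, weights)


weights = [2.0, 0.5, -1.0, 3.0]
x = [1.0, 1.0, 1.0, 1.0]

# A faithful explanation ranks features by their true contribution w_i * x_i.
faithful = [w * v for w, v in zip(weights, x)]
# A misleading explanation highlights the least influential features instead.
misleading = [-r for r in faithful]

faithful_drop = deletion_drop(x, faithful, weights, k=2)
misleading_drop = deletion_drop(x, misleading, weights, k=2)
```

Removing the features picked by the faithful explanation lowers the output far more than removing those picked by the misleading one, which is the behavior a fidelity criterion is meant to reward.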

Figure 2: AutoXAI as an optimization problem. We want to find a configuration of an explanation algorithm that provides both understandable and faithful explanations for the problem at hand.

We conducted three experiments on three different classification tasks. In two tasks, we classified magnetic resonance images as either healthy or not. In the last task, we classified the sentiment of short textual reviews. For each of these tasks, we wanted to find a configuration of a particular explainability algorithm – Layer-wise relevance propagation (LRP). We proposed three understandability measures and maximized them using a modified Particle Swarm Optimization in order to obtain understandable explanations.
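
To give a feel for the optimization step, here is a sketch of a textbook particle swarm optimizer that maximizes a score over two continuous configuration parameters. It is not the modified variant used in the project; the objective `toy_understandability`, the parameter ranges and all hyperparameter values are hypothetical stand-ins chosen for illustration.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Textbook particle swarm optimization that maximizes `objective`
    over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_positions = [p[:] for p in positions]
    best_scores = [objective(p) for p in positions]
    g = max(range(n_particles), key=lambda i: best_scores[i])
    global_best, global_score = best_positions[g][:], best_scores[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Keep particles inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score > best_scores[i]:
                best_positions[i], best_scores[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_understandability(config):
    """Hypothetical stand-in for an understandability measure; it peaks
    at config == [0.3, 0.7] with a value of 1.0."""
    a, b = config
    return 1.0 - (a - 0.3) ** 2 - (b - 0.7) ** 2


best_config, best_score = pso_maximize(toy_understandability,
                                       [(0.0, 1.0), (0.0, 1.0)])
```

In a real setting, each call to the objective would generate explanations with the candidate configuration and score them, which makes evaluations expensive; this is one reason to prefer a population-based search that needs no gradients.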

The results of the proposed method and of the project were presented at the Workshop on Explainable Artificial Intelligence at the International Joint Conference on Artificial Intelligence (IJCAI) 2022 in Vienna and at a public seminar organized by KInIT. Proceedings from the conference workshop can be found here.

Popularization of Explainable AI

Judging by the number of papers on Explainable AI published in recent years, this topic has clearly drawn the attention of the scientific community. However, popularizing and promoting Explainable AI in industry and among the general public is equally important.

Based on knowledge acquired in our own research and a study of the relevant scientific literature, we prepared a series of five popularization articles. We covered various topics, from a general description of Explainable AI to measuring the quality of explanations obtained by different methods.

Selected aspects of Explainable AI were presented in a technical talk at the Better AI Meetup on November 9th, 2022.


We already know that artificial intelligence, and especially machine learning, is able to achieve excellent performance in many tasks. An important question is how to make the decision-making process and predictions of inherently complex, black-box models more transparent. And, secondly, how do we choose the right explainability method for the task at hand from among the plethora of existing methods?

Martin Tamajka, Researcher

Kempelen Institute of Intelligent Technologies

Project team

Martin Tamajka
Research Engineer
Marcel Veselý
Research Intern
Marián Šimko
Lead and Researcher

The PricewaterhouseCoopers Endowment Fund at the Pontis Foundation supported this project.


[1] Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Automated Machine Learning: Methods, Systems, Challenges. Springer Nature, 2019.