E-tika podcast: How to deal with biased AI?

The second season of the E-tika podcast series is over, but we decided to reflect on the topics we discussed. Our goal is to bring the social and ethical dimensions of digital technologies into the spotlight and to discuss these with guests from various relevant fields.

When it comes to a critical understanding of algorithms and artificial intelligence systems, certain terms keep reappearing: black boxes, transparency, explainability, fairness and bias. But what do these notions entail?

In the second episode of the E-tika podcast, we discussed these issues with Martin Tamajka, a researcher focusing on natural language processing and transparent artificial intelligence models.

The term bias can be understood in many ways, and in general it tends to carry a negative connotation. This, however, does not have to be the case: in neural networks, which are often used in artificial intelligence, a bias term is an integral part of how the model functions.
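
As a small illustration of this technical meaning of bias, here is a minimal sketch of a single artificial neuron: the bias is simply a learnable offset added to the weighted sum of the inputs (the numbers below are purely illustrative).

```python
import numpy as np

# A single artificial neuron: output = activation(weights . inputs + bias).
# The "bias" here is a learnable offset that shifts the activation threshold --
# a purely technical ingredient of the model, not a value judgement.

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    pre_activation = np.dot(weights, inputs) + bias   # the bias shifts the weighted sum
    return 1.0 / (1.0 + np.exp(-pre_activation))      # sigmoid activation

x = np.array([0.2, 0.7])
w = np.array([0.5, -0.3])

print(neuron(x, w, bias=0.0))   # ~0.47
print(neuron(x, w, bias=2.0))   # ~0.87 -- same inputs, different bias, different output
```

Without this offset, the neuron could only draw decision boundaries passing through the origin, which is why the bias term is a necessary part of the model rather than a flaw.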

When it comes to undesired or unjust biases, we tend to think of algorithms systematically preferring certain groups of people without any good reason to do so. As discussed previously, humans are rarely bias-free, and we use certain biases to our advantage.

This raises the question of whether it is even technically possible for AI to be completely fair. The short answer could easily be no. This is particularly true for algorithms that learn from data, which can already carry certain biases.

It is important to keep in mind that perceived fairness is subjective and highly context-dependent. However, technically speaking, there are ways to ensure that algorithms deliver fairer results.
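
One way to make this concrete is to measure fairness against an explicit criterion. The sketch below checks demographic parity – just one of many possible definitions – by comparing the rate of favourable decisions across two groups; the group labels and decisions are hypothetical.

```python
import numpy as np

# A minimal sketch of one common fairness check: demographic parity.
# It compares the rate of favourable decisions across two groups.
# The decisions and group labels below are illustrative, not real data.

def positive_rate(decisions: np.ndarray, group: np.ndarray, value: str) -> float:
    mask = group == value
    return decisions[mask].mean()

decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # 1 = favourable outcome
group     = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

rate_a = positive_rate(decisions, group, "A")
rate_b = positive_rate(decisions, group, "B")
print(f"group A: {rate_a:.2f}, group B: {rate_b:.2f}, gap: {abs(rate_a - rate_b):.2f}")
```

A large gap between the two rates would flag the model for closer inspection; which gap is acceptable, and whether demographic parity is even the right criterion, remains a context-dependent judgement.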

Fairness can be understood in many ways: for example, distributive fairness, which looks at the perceived fairness of one’s outcomes, or procedural fairness, which is concerned with fair procedures rather than outcomes. These approaches have their own advantages and flaws, but what matters more is that the people who design and work with algorithms consciously and actively reflect on these definitions and their respective consequences. This is, however, only part of the effort to make algorithms fair. As discussed in the podcast:

“The best way to minimize the probability that an algorithm will act unfairly is prevention. Literature often mentions careful data sourcing, but there is one more important thing – the people developing the algorithms. It has been shown that it is beneficial when the teams of developers and testers include a wide spectrum of people – be it their gender, age, race, beliefs or profession.”

In order to even discuss the fairness of algorithms, it seems crucial to first understand what happens within these algorithms and how they arrive at their decisions. To what extent are we, as humans, actually able to truly understand artificial intelligence? 

There are relatively simple rule-based systems that we can describe as transparent. On the other hand, there are much more complex systems that we often call black boxes. When trying to understand their inner workings, we often stumble upon terms such as explainability or interpretability.

When consulting the state-of-the-art literature in these fields, we realize that there is rarely a unified definition of these terms. This only underlines the complexity of the issue: it is not uncommon for even the developers not to fully understand their own algorithms and the reasons why they deliver certain outcomes.

This can happen when models find certain shortcuts that we do not fully understand. A good example is a computer vision model that was trained to classify photos – cars, trucks, horses and so on. It delivered satisfactory results, but it was later found that the photos of horses usually had watermarks over them, so the model had learned to identify watermarks rather than horses. In this case, an explanation could be obtained fairly easily, e.g. by highlighting the parts of the image that were decisive for the classification.
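
One simple way to produce such a highlight is occlusion analysis: cover parts of the image one by one and observe how the model’s confidence drops. The sketch below assumes a hypothetical model_predict function that returns the probability of the target class for a single image.

```python
import numpy as np

# A minimal sketch of occlusion-based saliency: grey out one patch of the image
# at a time and measure how much the model's confidence in the target class
# drops. `model_predict` is a hypothetical function returning that probability.

def occlusion_saliency(image: np.ndarray, model_predict, patch: int = 16) -> np.ndarray:
    h, w = image.shape[:2]
    baseline = model_predict(image)
    saliency = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.5    # grey out one patch
            drop = baseline - model_predict(occluded)
            saliency[y:y + patch, x:x + patch] = drop   # large drop = decisive region
    return saliency
```

Regions whose occlusion causes a large drop in confidence – such as a watermark – are the ones the model actually relied on.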

A similar approach to explainability can be used in models that identify sentiment in text: the explanation can highlight which parts of the text contributed the most to the conclusion.

Such explanations are relatively straightforward and easily understood by humans. Explainability thus refers to a particular outcome of the model: we want to understand which input data the model used to make its decision.
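
A very simple attribution strategy along these lines is to remove one word at a time and measure how the prediction changes. The sketch below assumes a hypothetical sentiment_score function that returns a score between 0 (negative) and 1 (positive).

```python
# A minimal sketch of leave-one-word-out attribution for sentiment analysis:
# drop each word in turn and see how much the predicted sentiment score changes.
# `sentiment_score` is a hypothetical function mapping text to a score in [0, 1].

def word_contributions(text: str, sentiment_score) -> dict:
    words = text.split()
    baseline = sentiment_score(text)
    contributions = {}
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        contributions[word] = baseline - sentiment_score(reduced)  # drop = word's contribution
    return contributions
```

The words with the largest contributions can then be highlighted for the user as the explanation of the model’s conclusion.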

On the other hand, when addressing interpretability, we try to understand what kinds of knowledge and concepts the model has learned to use and how it imagines them. In other words, we can ask the model to generate what it understands “a horse” to be.
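
One technique that does exactly this is activation maximization: starting from random noise, the input is optimized so that the model’s score for a chosen class grows, yielding an image of what the model “imagines” that class to be. The sketch below is a bare-bones version that assumes a hypothetical PyTorch model and class index; practical feature visualization also needs regularization such as blurring and jitter to produce recognizable images.

```python
import torch

# A minimal sketch of activation maximization: optimize a noise image so that
# the model's score for a chosen class (e.g. "horse") keeps increasing.
# `model` and `class_index` are assumptions, not a specific real system.

def visualize_class(model: torch.nn.Module, class_index: int, steps: int = 200) -> torch.Tensor:
    model.eval()
    image = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([image], lr=0.05)
    for _ in range(steps):
        optimizer.zero_grad()
        score = model(image)[0, class_index]
        (-score).backward()        # maximize the class score
        optimizer.step()
    return image.detach()
```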

At the same time, explainability and interpretability are always context-dependent, and we need to keep in mind the target audiences and their level of expertise. It has proven beneficial to involve the target audiences in developing the explainability and interpretability strategies.

Let us take an example from Martin’s own research experience. Together with a research team, he devised a tool that analyzed MRI scans and predicted the probability that a patient has a neurodegenerative disease. They highlighted the areas of the scan that the model used to make its decision and showed this to a doctor. The doctor could understand the final prediction of the disease probability, but could not make sense of the precise explanation provided. This demonstrates the need to tailor explanations to the specific users of these systems.

Explainability of algorithms can also have unintended consequences, which raises ethical dilemmas about where to draw the red lines of explainability.

In systems such as those used in criminal justice or in various social security institutions, knowledge of the precise inner workings of an algorithm could enable misleading and fraudulent changes in behavior aimed at unethical ends.

It is clear that these are highly complex issues that cannot be addressed by developers alone; they require the broader inclusion of the various stakeholders involved in the development of these systems, as well as those most affected by them.

At KInIT, we do our best to proactively seek out and identify the risks associated with the use of such technologies, which can lead to fairer and more transparent artificial intelligence systems.