How artificial intelligence can detect hate speech

Hate speech is not just content that sours the atmosphere on social networking sites. Because it targets specific groups of people, it is hurtful and contributes to the polarization of society. In extreme cases, it can also be a serious offense with legal consequences for its originators.

In any case, hate speech can be considered one of the indicators pointing to a dubious source of information. Data scientists from Meta also discovered that anger-provoking statements very often contained misinformation as well. At the same time, detecting hate speech in such a huge and ever-growing volume of online content is difficult. Detecting it, however, would also help surface misinformation and make fact-checking easier, which is our priority within the CEDMO hub.

Artificial intelligence, specifically machine learning, can help identify hate speech in a haystack of countless internet posts. However, AI tools are not, and should not be, the ultimate judge that condemns the author of hate speech. For example, machine learning methods can help detect hate speech on Facebook so that its administrators – people – can intervene and delete the problematic post, or take other measures, such as blocking the hater’s account. In the article Addressing Hate Speech with Data Science, we also pointed out what artificial intelligence can and cannot do.

There are several reasons why artificial intelligence cannot replace the human intelligence of lawyers, judges or other experts from various institutions when it comes to proving a case of harassment. It is difficult for AI to establish, from text alone, that someone’s statement constitutes an attack (based on race, ethnicity and so on) and incites hatred or violence. The effect a text has on another person cannot be determined automatically; if it is to be used in criminal proceedings, it must be proven by experts.

What we can expect from machine learning

Human intention is difficult to determine and prove – not only automatically, but also for other people. However, some phenomena commonly associated with hate speech can help – such as deception, lying, bullshitting, manipulation or misleading.

Thanks to artificial intelligence, we can find elements in texts that are characteristic of hate speech, such as aggressive language. Methods such as sentiment analysis, topic labeling, language detection and conversational intent detection can be very useful for this purpose.
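
To make one of these building blocks concrete, here is a minimal sketch of sentiment analysis using NLTK’s VADER analyzer. The example posts and the -0.5 threshold are illustrative assumptions; strongly negative sentiment is only a signal worth reviewing, not proof of hate speech.

```python
# A minimal sketch: flagging strongly negative posts with NLTK's VADER
# sentiment analyzer. Negative sentiment alone is NOT hate speech -- it is
# only one signal a moderator might combine with others.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()

posts = [
    "Have a wonderful day, everyone!",        # illustrative examples
    "I hate you, you disgusting idiot.",
]

THRESHOLD = -0.5  # assumed cutoff; in practice it would be tuned on labeled data

for post in posts:
    # compound score ranges from -1 (most negative) to +1 (most positive)
    score = analyzer.polarity_scores(post)["compound"]
    if score <= THRESHOLD:
        print(f"FLAG for review ({score:+.2f}): {post}")
```

Note that VADER covers English only; Slovak text would require a dedicated lexicon or model.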

When we talk about automatic hate speech identification, we are currently referring to hate speech in the form of text. Hate distributed through images and videos poses an even greater challenge for automated processing. Here too, metadata and context – such as the authors’ demographics, their location, the time of posting or how people engage on social media – can provide a clue. However, these elements are not enough to condemn a hater. They can only help the moderator find hate speech in a sea of content, so that the moderator does not have to spend the time and energy it would take to read every post.
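
As a hypothetical illustration of how such signals could support moderators, the sketch below combines a classifier’s text score with engagement metadata into a simple triage priority. All field names, weights and example values are assumptions made up for this example.

```python
# A hypothetical triage score that combines a text-based hate score with
# engagement metadata, so moderators can review the riskiest posts first.
# All fields and weights are illustrative assumptions, not a real formula.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    hate_score: float  # 0..1, e.g. from a text classifier
    shares: int        # how widely the post spreads
    reports: int       # how many users reported the post

def triage_priority(post: Post) -> float:
    # Weight the model's score by reach and by user complaints;
    # the coefficients are made up for illustration.
    return post.hate_score * (1 + 0.01 * post.shares + 0.5 * post.reports)

queue = [
    Post("example post A", hate_score=0.9, shares=3, reports=0),
    Post("example post B", hate_score=0.6, shares=400, reports=7),
]
for post in sorted(queue, key=triage_priority, reverse=True):
    print(f"{triage_priority(post):6.2f}  {post.text}")
```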

What the process of finding hate speech might look like

The simplest methods rely on lists of words and phrases typically used in hate speech – e.g. swear words. However, this approach is not very effective, because swearing does not always mean hate speech. Swear words are sometimes used in a friendly tone, and a more sophisticated hater will be careful not to use them directly.
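
A minimal sketch of this word-list approach, with a placeholder blocklist, shows both weaknesses at once: friendly banter triggers a false positive, while hate phrased without listed words slips through.

```python
# The simplest approach: match posts against a word list. The list and
# posts are toy examples; real lists are far longer and curated by experts.
BLOCKLIST = {"idiot", "scum"}  # placeholder terms, assumed for illustration

def contains_listed_word(text: str) -> bool:
    tokens = text.lower().split()
    return any(token.strip(".,!?") in BLOCKLIST for token in tokens)

print(contains_listed_word("You absolute idiot!"))         # True
print(contains_listed_word("You idiot, I love you :)"))    # True -- friendly tone, false positive
print(contains_listed_word("Your kind ruins everything"))  # False -- hateful, but no listed word
```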

More complex machine learning models are able to identify hate speech using statistical methods. The first and most important prerequisite is a sufficiently large hate speech dataset, labeled (annotated) by experts. The models then learn from the data how to detect hate speech in other texts by looking for repetitive patterns – for example, how often certain words appear in a text. The quality of the text, such as its readability, is also a hint. Former Google executive Eric Schmidt said that even a grammar check could help identify hate speech.
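
The sketch below shows what such a statistical model might look like with scikit-learn: TF-IDF word-frequency features feed a logistic regression classifier. The four “labeled” posts are stand-ins for a real expert-annotated corpus of thousands of examples.

```python
# A sketch of the statistical approach: learn word-frequency patterns from
# an expert-labeled dataset. The tiny training set is purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I hope your whole group disappears",  # toy labeled data
    "those people are vermin",
    "great game last night, well played",
    "thanks for the helpful answer",
]
labels = [1, 1, 0, 0]  # 1 = hate speech, 0 = not, as annotated by experts

# TF-IDF turns each post into word-frequency features; the classifier
# learns which patterns co-occur with the "hate speech" label.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Probability that a new post is hate speech, according to the model
print(model.predict_proba(["you people are vermin"])[:, 1])
```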

Neural networks, which are inspired by the human brain, can also find text features that are invisible to the human eye. The problem, however, tends to be explaining these machine findings to humans. But AI researchers are working on this issue, too.
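
One line of that research is local explanation methods such as LIME, which perturb the input and report which words pushed a model – whether a neural network or the simpler pipeline above – toward its prediction. The sketch below assumes the `model` trained in the previous example; the rest follows LIME’s text API.

```python
# Explaining a single prediction with LIME (pip install lime). It removes
# words from the post at random and measures how the model's output
# probability changes, yielding per-word contributions a human can inspect.
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["ok", "hate speech"])
explanation = explainer.explain_instance(
    "you people are vermin",  # the post to explain
    model.predict_proba,      # the pipeline trained above (assumed)
    num_features=4,           # report the top contributing words
)
print(explanation.as_list())  # [(word, weight), ...]
```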

You too will be able to help AI identify hate speech

As we’ve already mentioned, the foundation for automatic hate speech identification is quality data that models can learn from. The problem, however, is that there is no widely established reference dataset of hate speech, let alone for the Slovak language.

We have therefore created a tool which Slovak-speaking users can use to report hate speech, misinformation or other manipulation. All you need to do is install a browser extension (currently available in Slovak only). Together, we will try to provide – at least locally – what Facebook once intended to deploy. In 2018, Facebook in fact accidentally revealed a button for reporting hate speech, but that feature never made it to production.