ČSOB: Enabling a virtual assistant to interact with clients in a human way
In this cooperation with the ČSOB bank, we focused on the research and development of methods for transforming Slovak words between their different forms, which is necessary for a smooth conversation between the bank's clients and their virtual assistant, Kate. The result of the collaboration is a set of tools, available via an API, that put the words used in a conversation into the appropriate, correct form. The tools cover lemmatization, declension, verb tenses, gradation, and the conversion of digits to numerals.
In this project, we explored how virtual assistants for the Slovak language can be improved so that they can handle interactions with bank clients.
Slovak is a highly inflected language. This means that to form a comprehensible sentence, we have to put the words into the correct forms. Likewise, to identify a word used in a sentence, we usually need to determine its basic form (lemma) correctly.
Slovak is also a low-resource language, so tools and datasets for processing Slovak text are hard to find. As a result, text processing in Slovak is often more demanding than in other languages.
If we want to correctly interpret a request written by the user and then respond to it in natural human language, we need to identify words and their forms. The number of word forms in Slovak is huge. For example, some adjectives can have up to 156 different forms, depending on the position in the sentence and the context of the word. When choosing the correct form of a word, we must take into account various grammatical categories, especially number (singular, plural), gender, case, tense, person and degree. All these grammatical aspects greatly complicate the development and use of communication tools such as chatbots or virtual assistants.
During our collaboration, we focused on research and implementation of tools that would help virtual assistants to conduct conversations in a pleasant human-like manner. Therefore, it was necessary to create a set of transformation functions that can be used to convert words into the correct forms.
The goal of the project was to design three transformation functions for changing the forms of words:
- lemmatization of a word: changing any word form to its basic form
- word transformation: changing the word in its basic form to another form depending on the required attributes
- conversion of numbers to numerals for easier readability
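The three functions listed above can be illustrated with a minimal sketch. The function names, the tag scheme and the tiny hand-written tables are assumptions made for this example, not the project's actual implementation or data; the numerals are given only in their basic form and only for 0 to 99.

```python
# Toy data for illustration only; a real system would rely on full
# morphological dictionaries. Forms of "banka" (bank) by (number, case).
PARADIGMS = {
    "banka": {
        ("sg", "nom"): "banka", ("sg", "gen"): "banky", ("sg", "dat"): "banke",
        ("sg", "acc"): "banku", ("sg", "loc"): "banke", ("sg", "ins"): "bankou",
        ("pl", "nom"): "banky", ("pl", "gen"): "bánk", ("pl", "dat"): "bankám",
        ("pl", "acc"): "banky", ("pl", "loc"): "bankách", ("pl", "ins"): "bankami",
    },
}

# Reverse index: any known word form -> its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in PARADIGMS.items() for form in forms.values()
}


def lemmatize(word):
    """Return the basic form of a word, or the word itself if it is unknown."""
    return FORM_TO_LEMMA.get(word.lower(), word)


def transform(lemma, number, case):
    """Return the form of a lemma for the requested number and case."""
    return PARADIGMS[lemma][(number, case)]


UNITS = ["nula", "jeden", "dva", "tri", "štyri", "päť", "šesť", "sedem", "osem",
         "deväť", "desať", "jedenásť", "dvanásť", "trinásť", "štrnásť", "pätnásť",
         "šestnásť", "sedemnásť", "osemnásť", "devätnásť"]
TENS = {2: "dvadsať", 3: "tridsať", 4: "štyridsať", 5: "päťdesiat",
        6: "šesťdesiat", 7: "sedemdesiat", 8: "osemdesiat", 9: "deväťdesiat"}


def number_to_numeral(n):
    """Spell out an integer from 0 to 99 as a Slovak cardinal numeral."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] + (UNITS[unit] if unit else "")


print(lemmatize("bankou"))
print(transform("banka", "pl", "gen"))
print(number_to_numeral(42))
```

A dictionary lookup like this only handles words it has seen; verbs, adjectives and numerals that agree in gender and case would need additional attributes beyond number and case.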
We started with basic Slovak dictionary words (grammatically correct words used in communication). However, people often omit diacritics or make typing mistakes when writing in Slovak. For this reason, we also equipped the tools with an option to turn diacritics on or off, depending on the use case.
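One way such a diacritics switch could work is to compare words after removing their accent marks. The sketch below assumes this approach; the `use_diacritics` parameter and the small lookup table are hypothetical and serve only to show the idea.

```python
import unicodedata


def strip_diacritics(text):
    """Remove accent marks by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Toy lookup table (word form -> lemma); illustrative data only.
LEMMAS = {"účet": "účet", "účtu": "účet", "pôžičky": "pôžička"}

# Second index keyed by the accent-free spelling of each form.
ASCII_INDEX = {strip_diacritics(form): lemma for form, lemma in LEMMAS.items()}


def lemmatize(word, use_diacritics=True):
    """Look up a lemma; with use_diacritics=False, accent marks are ignored."""
    word = word.lower()
    if use_diacritics:
        return LEMMAS.get(word)
    return ASCII_INDEX.get(strip_diacritics(word))


print(lemmatize("uctu"))                        # strict mode: not found
print(lemmatize("uctu", use_diacritics=False))  # relaxed mode
```

Ignoring diacritics makes the lookup more tolerant, but different words can collapse to the same accent-free spelling, which is one reason to keep the behaviour switchable.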
In addition to common Slovak words, which can be processed on the basis of dictionaries, we also added the ability to predict the forms of words that do not belong to the common vocabulary, i.e. named entities (first names, surnames, localities or organization names) and other words that are not part of common Slovak dictionaries. Their forms are derived from similar words.
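The text does not say how similar words are chosen. As a rough sketch of one possible approach, the example below picks the known word that shares the longest ending with the unknown word and reuses its suffix change. The reference pairs and all function names are assumptions for illustration.

```python
import os

# Toy reference pairs: lemma -> instrumental singular. Illustrative data only.
KNOWN = {"banka": "bankou", "mesto": "mestom"}


def common_suffix_len(a, b):
    """Count how many trailing characters two words share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_rule(lemma, form):
    """Describe a form as 'replace this lemma ending with that form ending'."""
    prefix = os.path.commonprefix([lemma, form])
    return lemma[len(prefix):], form[len(prefix):]


def predict_form(word):
    """Inflect an unknown word by analogy with the most similar known lemma."""
    model = max(KNOWN, key=lambda lemma: common_suffix_len(word.lower(), lemma))
    old_ending, new_ending = suffix_rule(model, KNOWN[model])
    if not word.endswith(old_ending):
        return None  # the borrowed rule does not fit this word
    stem = word[:len(word) - len(old_ending)]
    return stem + new_ending


print(predict_form("Bratislava"))
print(predict_form("Brno"))
```

Here "Bratislava" is matched with "banka" and "Brno" with "mesto". A heuristic like this will make mistakes on irregular words, so its predictions are best treated as guesses, not dictionary facts.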
Finally, we created a REST API for the implemented services and packaged the whole application into a Docker image so that it can be easily deployed and scaled as needed.
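The actual endpoints of the API are not described here. Purely as an illustration of how a transformation function might be exposed over HTTP, the sketch below defines a small WSGI application with one hypothetical route, `/lemmatize?word=...`, and calls it in-process. The route, the JSON fields and the lookup table are all assumptions.

```python
import json
from urllib.parse import parse_qs

# Toy lookup table (word form -> lemma); illustrative data only.
LEMMAS = {"bankou": "banka", "banky": "banka"}


def app(environ, start_response):
    """Minimal WSGI app with one hypothetical route: GET /lemmatize?word=..."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("word", [""])[0]
    if environ.get("PATH_INFO") != "/lemmatize" or not word:
        status = "400 Bad Request"
        payload = {"error": "expected /lemmatize?word=..."}
    else:
        status = "200 OK"
        payload = {"word": word, "lemma": LEMMAS.get(word.lower(), word)}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = [("Content-Type", "application/json; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(path, query):
    """Invoke the app directly, without opening a network socket."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "QUERY_STRING": query}, start_response))
    return captured["status"], json.loads(body)


print(call("/lemmatize", "word=bankou"))
```

Because `app` follows the standard WSGI interface, it could be served for local experiments with `wsgiref.simple_server.make_server` from the Python standard library, or run by any WSGI server inside a container.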
We are confident that by collaborating with KInIT on a shared project, we can achieve further enhancements in the processing of the Slovak language. This progress should provide all Slovaks with easier access to the latest language processing technologies.
Martin Hurban
Data Science Team Lead, ČSOB
Project team
Miroslav Blšták
Research Engineer