Home
News
Cross-Lingual Learning bridges inequalities among languages

What's
new

Author

Matúš Pikuliak

Sep 6. 2021

Cross-Lingual Learning bridges inequalities among languages

Matúš Pikuliak is our new successfully graduated PhD in KinIT. In his research, he focused on methods that should bridge inequalities among languages and offer useful applications to people and communities that have not yet been able to take full advantage of the progress based on deep learning in their mother tongue.

In recent years, deep learning has brought significant advances in natural language processing. Machine translation, information retrieval, speech transcription and other applications that we commonly use today, have achieved significant improvements thanks to deep learning. Similar to other domains, this progress is dependent on the collection of datasets which enable machine learning models to learn to perform a given task. For various historical, sociological, technological and other reasons, some languages (e.g. English, Mandarin Chinese) are very popular today, are spoken by billions of people and abundant artificial intelligence research is carried out with them. On the other hand, there are thousands of marginal languages, they are spoken by a much lower number of people and the new texts are generated in much lower volume in comparison with other languages. The use of artificial intelligence approaches is often much more complicated and sometimes impossible, especially because there are no large enough datasets for these languages.

Matúš is our last doctoral student recruited, who has recently defended his PhD degree. In his research, he focused on methods that should bridge inequalities among languages and offer useful applications to people and communities that have not yet been able to take full advantage of the progress based on deep learning in their mother tongue. He explored the possibilities of the so-called cross-lingual learning. It is a machine learning paradigm where knowledge and data created in one language are transferred to other languages, using various techniques. For example, if artificial intelligence learns to solve the problem of information retrieval in English, the aim of these techniques is to transfer this knowledge to other languages so that we can also search for information in them.

The image depicts a grid of 4 neural models. Each model solves a certain task in a certain language. The color scheme shows the layers that are shared by the models. Since all models are interconnected, they can communicate and exchange knowledge.

One of the very first literature reviews in this area was part of Matúš’s wor k and it was published in the journal Expert Systems With Application (in Google Scholar it is a journal from the Artificial Intelligence section with the highest h-index). The basis of this review was mapping of the approaches in cross-lingual learning used so far and more than 170 existing works were analyzed. A thorough analysis of the languages, datasets, tasks and approaches used, provided an insight into the state of the area, which we had lacked by then. Cross-lingual learning has proven to be a very versatile technology, applicable to any natural language processing task. Also, the dominance of major European languages in experimental work has been proven. Despite the fact that one of the main reasons for cross-lingual learning is to help languages with a lack of data, experimental verification is often carried out in languages that may not even need such help. Matúš also contributed to the theoretical understanding of such learning by formally defining it and proposing a division of existing approaches according to the type of knowledge used to communicate between different languages. In this way, sample annotations, sample features, parameter rates, and internal model representations can spread between languages.

Based on this analysis of the state of the area, Matúš then focused on the combination of multilingual and multitask learning. It was common to study these two approaches separately. Here, Matúš decided to combine them and explore the possibilities and limitations of such a connection. The experiments were carried out with the so-called grid of models. Each model had a specific task defined in a specific language and together they covered all possible combinations of the selected 4 tasks and 4 languages. Models in such a grid have learnt to solve a problem based on their data. At the same time, they also learned from each other through a mechanism called parameter sharing. The results of the experiments were discoveries in how to combine data from different languages properly, in order to maximize the performance of models in less studied languages.

The research has the potential to be applied in practice in the development of methods and models in “poor” languages. Slovak language can also be included in these “poor” languages in some cases. By transferring knowledge from rich languages using the proposed methods, we can improve the results for these languages. These technologies are already helping people today, public machine translators (e. g. Google Translate) already use such techniques and translations into and from less common languages are becoming better thanks to the transfer of knowledge. This is another step in the ongoing democratization of artificial intelligence and information technology in general. In addition, working with multiple languages is proving to be an interesting challenge for other KInIT activities, where we can use similar techniques in our existing projects focused on e. g. combating misinformation or in various research projects of our partners.

Selection from Matúš’s publications:

1. Pikuliak, M., Šimko, M., & Bielikova, M. (2021). Cross-lingual learning for text processing: A survey. Expert Systems with Applications, 165, 113765.

2. Pikuliak, M., Simko, M., & Bielikova, M. (2019, January). Towards combining multitask and multilingual learning. In International Conference on Current Trends in Theory and Practice of Informatics (pp. 435-446). Springer, Cham.

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.

Necessary

Always Enabled

Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

What's
new

Matúš Pikuliak

Cross-Lingual Learning bridges inequalities among languages

Selection from Matúš’s publications:

Cooperation with SAS yields results: The best doctoral thesis at SAS with the contribution from KInIT

Diacritics Restorer: Automatic diacritics restoration for your online documents in Slovak

Why partner with KInIT

What'snew

Matúš Pikuliak

Cross-Lingual Learning bridges inequalities among languages

Selection from Matúš’s publications:

Read more

Cooperation with SAS yields results: The best doctoral thesis at SAS with the contribution from KInIT

Diacritics Restorer: Automatic diacritics restoration for your online documents in Slovak

Why partner with KInIT

What's
new