Web & User Data Processing

Publication

Authors: Pikuliak, M., Srba, I., Moro, R., Hromadka, T., Smoleň, T., Melišek, M., Vykopal, I., Simko, J., Podroužek, J., Bielikova, M.

Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing - EMNLP 2023

Multilingual Previously Fact-Checked Claim Retrieval

Pikuliak, M., Srba, I., Moro, R., Hromadka, T., Smoleň, T., Melišek, M., Vykopal, I., Simko, J., Podroužek, J., and Bielikova, M.

Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, as well as 31k connections between these two groups. This is the most extensive and the most linguistically diverse dataset of this kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating such a diverse dataset has its complexities and proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, improving upon the unsupervised method significantly.

Cite: Pikuliak, M., Srba, I., Moro, R., Hromadka, T., Smoleň, T., Melišek, M., Vykopal, I., Simko, J., Podroužek, J., and Bielikova, M. 2023. Multilingual Previously Fact-Checked Claim Retrieval. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16477–16500, Singapore. Association for Computational Linguistics. DOI: 10.18653/v1/2023.emnlp-main.1027.

Authors

Matúš Pikuliak

Research Consultant

Ivan Srba

Researcher

Róbert Móro

Researcher

Timo Hromádka

Research Intern 07/2022-01/2023

Timotej Smoleň

Research Intern 06/2022-04/2023

Martin Melišek

Research Intern 03/2022-05/2024

Ivan Vykopal

PhD Student

Jakub Šimko

Lead and Researcher

Juraj Podroužek

Lead and Researcher

Mária Bieliková

Lead and Researcher
