Natural Language Processing

Publication

Authors Pikuliak, M., Hrčková, A., Oreško, S., Šimko, M.

Published in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling

We present GEST — a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definition of said stereotypes was informed by gender experts. We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems. We discovered significant and consistent amounts of gender-stereotypical reasoning in almost all the evaluated models and languages. Our experiments confirm the previously postulated hypothesis that the larger the model, the more stereotypical it usually is.

Cite: Pikuliak, M., Hrčková, A., Oreško, S., & Šimko, M. (2024). Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. DOI: 10.18653/v1/2024.findings-emnlp.173.

Authors

Matúš Pikuliak

Research Consultant

Andrea Hrčková

Researcher

Štefan Oreško

Researcher 03/2022-10/2024

Marián Šimko

Lead and Researcher