Fine-Tuning Language Models to Mitigate Gender Bias in Sentence Encoders

Tommaso Dolci

IEEE International Conference on Big Data Computing Service and Applications, 2022, pp. 175-176.

Abstract

Language models are used for a variety of downstream applications, such as improving web search results or parsing CVs to identify the best candidate for a job position. At the same time, concern is growing around word and sentence embeddings, popular language models that have been shown to exhibit large amounts of social bias. In this work, leveraging the fact that state-of-the-art pre-trained embedding models can be further trained, we propose to mitigate gender bias by fine-tuning sentence encoders on a semantic similarity task built around gender-stereotype sentences and corresponding gender-swapped anti-stereotypes, in order to enforce similarity between the two categories. We test our intuition on two popular language models, BERT-Base and DistilBERT, and measure the amount of gender bias mitigation using the Sentence Encoder Association Test (SEAT). Our solution shows promising results despite using a small amount of training data, proving that post-processing bias mitigation techniques based on fine-tuning can effectively reduce gender bias in sentence encoders.
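The anti-stereotype sentences described above are obtained by swapping gendered words in each stereotype sentence. A minimal sketch of such a gender-swapping step is shown below; the word-pair list and the example sentence are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch: building a (stereotype, anti-stereotype) pair by
# gender-swapping. The SWAP_PAIRS list is an assumption for illustration;
# the paper's actual word list may differ.

SWAP_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",  # "her" is ambiguous (object vs. possessive)
    "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def gender_swap(sentence: str) -> str:
    """Return a copy of the sentence with gendered words swapped,
    preserving capitalization of the first letter of each token."""
    swapped = []
    for token in sentence.split():
        # Separate trailing punctuation so "him." still matches "him".
        core = token.rstrip(".,;:!?")
        punct = token[len(core):]
        repl = SWAP_PAIRS.get(core.lower())
        if repl is None:
            swapped.append(token)
        else:
            if core and core[0].isupper():
                repl = repl.capitalize()
            swapped.append(repl + punct)
    return " ".join(swapped)

stereotype = "The doctor said he would operate."
pair = (stereotype, gender_swap(stereotype))
print(pair[1])  # "The doctor said she would operate."
```

Each resulting pair would then serve as a training example for the semantic similarity fine-tuning objective, pushing the encoder to assign the two sentences similar embeddings.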

Download full text (PDF)

DOI: 10.1109/BigDataService55688.2022.00036

BibTeX

  @inproceedings{dolci2022fine,
    title={Fine-Tuning Language Models to Mitigate Gender Bias in Sentence Encoders},
    author={Dolci, Tommaso},
    booktitle={IEEE International Conference on Big Data Computing Service and Applications},
    organization={IEEE},
    pages={175--176},
    year={2022},
    doi={10.1109/BigDataService55688.2022.00036}
  }