Fine-Tuning Language Models to Mitigate Gender Bias in Sentence Encoders

Tommaso Dolci

IEEE International Conference on Big Data Computing Service and Applications, 2022, pp. 175-176.

Abstract

Language models are used for a variety of downstream applications, such as improving web search results or parsing CVs to identify the best candidate for a job position. At the same time, concern is growing around word and sentence embeddings, popular language models that have been shown to exhibit large amounts of social bias. In this work, leveraging the fact that state-of-the-art pre-trained embedding models can be further trained, we propose to mitigate gender bias by fine-tuning sentence encoders on a semantic similarity task built around gender-stereotype sentences and corresponding gender-swapped anti-stereotypes, in order to enforce similarity between the two categories. We test our intuition on two popular language models, BERT-Base and DistilBERT, and measure the amount of gender bias mitigation using the Sentence Encoder Association Test (SEAT). Our solution shows promising results despite using a small amount of training data, proving that post-processing bias mitigation techniques based on fine-tuning can effectively reduce gender bias in sentence encoders.
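The anti-stereotype sentences described above are obtained by swapping gendered words in each stereotype sentence. A minimal sketch of such a gender-swapping step is shown below; the word-pair list and the example sentence are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch: building a (stereotype, anti-stereotype) pair by
# gender-swapping. The SWAP_PAIRS list is an assumption for illustration;
# the paper's actual word list may differ.

SWAP_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",  # "her" is ambiguous (object vs. possessive)
    "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def gender_swap(sentence: str) -> str:
    """Return a copy of the sentence with gendered words swapped,
    preserving capitalization of the first letter of each token."""
    swapped = []
    for token in sentence.split():
        # Separate trailing punctuation so "him." still matches "him".
        core = token.rstrip(".,;:!?")
        punct = token[len(core):]
        repl = SWAP_PAIRS.get(core.lower())
        if repl is None:
            swapped.append(token)
        else:
            if core and core[0].isupper():
                repl = repl.capitalize()
            swapped.append(repl + punct)
    return " ".join(swapped)

stereotype = "The doctor said he would operate."
pair = (stereotype, gender_swap(stereotype))
print(pair[1])  # "The doctor said she would operate."
```

Each resulting pair would then serve as a training example for the semantic similarity fine-tuning objective, pushing the encoder to assign the two sentences similar embeddings.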

Download full text (PDF)

DOI: 10.1109/BigDataService55688.2022.00036

BibTeX

  @inproceedings{dolci2022fine,
    title={Fine-Tuning Language Models to Mitigate Gender Bias in Sentence Encoders},
    author={Dolci, Tommaso},
    booktitle={IEEE International Conference on Big Data Computing Service and Applications},
    organization={IEEE},
    pages={175--176},
    year={2022},
    doi={10.1109/BigDataService55688.2022.00036}
  }