https://doi.org/10.1140/epjds/s13688-025-00541-1
Research
Whose voice matters? Word embeddings reveal identity bias in news quotes
1
Centre for Applied Data Science, University of Johannesburg, Johannesburg, South Africa
2
Department of Psychology, University of Johannesburg, Johannesburg, South Africa
3
Institute for Artificial Intelligence Systems, University of Johannesburg, Johannesburg, South Africa
Received:
2
November
2024
Accepted:
12
March
2025
Published online:
17
April
2025
This paper investigates identity bias (gender and race) in the South African news selection and representation of COVID-19 vaccination quotes. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa’s apartheid history; yet, studies that examine and quantify these biases at the speaker level using news quotes from a representative South African news corpus remain limited. To address this gap, we examined race and gender bias in news selection and framing of quotes. We used word embedding trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. These large-scale processing embeddings are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes – journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine black voices to primarily localised contexts. They also quote male speakers more frequently in the news than females. In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news.
Key words: Word embedding / Race bias / Gender bias / COVID-19 vaccination / News media / South Africa
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-025-00541-1.
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.