https://doi.org/10.1140/epjds/s13688-025-00594-2
Research
The news in black and white: word embeddings quantify racism in South African news
1 Centre for Applied Data Science, University of Johannesburg, Johannesburg, South Africa
2 Department of Psychology, University of Johannesburg, Johannesburg, South Africa
3 Institute for Artificial Intelligence Systems, University of Johannesburg, Johannesburg, South Africa
a 224032657@student.uj.ac.za
b kevind@uj.ac.za
Received: 13 February 2025 / Accepted: 7 October 2025 / Published online: 27 November 2025
Does race bias manifest in South African news, and how can computational methods such as word embeddings reveal it? After apartheid ended in 1994, South Africa implemented policies to address racial and economic divides and to transform institutions and structures, including the news media. This study introduces a computational approach to quantifying race bias in South African news using neural word embeddings. We trained word2vec embeddings on COVID-19 vaccination news articles from 76 South African news sources. The embedding algorithm introduces no race bias of its own by design; the associations it learns come from the corpus, so these large-scale embeddings can detect and reveal biases hidden in news language. We found consistent race bias in the coverage of socioeconomic phenomena, whereas results for health were weaker, mixed and likely corpus-dependent. COVID-19 may also have amplified associations between “Black” and unhealthy terms in news coverage. Our methodology complements traditional qualitative techniques and offers a more objective and representative way to investigate racism in South African news. The findings are validated through multiple methods, including human ratings, and have implications for South African news media and for research in this field.
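To make embedding-based bias measurement concrete, here is a minimal sketch of one common association score in the spirit of the Word Embedding Association Test (WEAT). A target word's association is its mean cosine similarity to one attribute set minus its mean similarity to another, and a group score compares two sets of target words. The abstract does not specify the exact metric, so this is an illustration under stated assumptions, not the authors' method. The two-dimensional vectors and placeholder words (`target_x`, `attr_a`, etc.) are hypothetical stand-ins for vectors taken from a trained word2vec model.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, emb):
    """s(w, A, B): mean similarity of `word` to set A minus mean similarity to set B."""
    sim_a = mean([cosine(emb[word], emb[a]) for a in attrs_a])
    sim_b = mean([cosine(emb[word], emb[b]) for b in attrs_b])
    return sim_a - sim_b


def group_bias(targets_x, targets_y, attrs_a, attrs_b, emb):
    """Positive when targets X lean towards A (and away from B) more than targets Y do."""
    s_x = mean([association(x, attrs_a, attrs_b, emb) for x in targets_x])
    s_y = mean([association(y, attrs_a, attrs_b, emb) for y in targets_y])
    return s_x - s_y


# Hypothetical toy vectors; in practice these would come from a trained word2vec model.
emb = {
    "target_x": (1.0, 0.2),
    "target_y": (0.2, 1.0),
    "attr_a": (1.0, 0.1),
    "attr_b": (0.1, 1.0),
}

bias = group_bias(["target_x"], ["target_y"], ["attr_a"], ["attr_b"], emb)
print(f"group bias: {bias:.3f}")
```

Swapping the two target sets flips the sign of the score. A real analysis would use several words per set and assess significance, for example with a permutation test, which this sketch omits.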
Key words: Word embedding / Race bias / News media / South Africa / COVID-19 vaccination / Speaker names / Computational social science / Natural language processing
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-025-00594-2.
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

