Measuring biases in AI-generated co-authorship networks

Ghazal Kalhor; Shiza Ali; Afra Mashhadi

doi:10.1140/epjds/s13688-025-00555-9

Open Access

EPJ Data Sci. (2025) 14: 38
https://doi.org/10.1140/epjds/s13688-025-00555-9

Research

Ghazal Kalhor¹, Shiza Ali² and Afra Mashhadi²^a

¹ School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
² Computing and Software Systems, University of Washington, Bothell, WA, USA

^a mashhadi@uw.edu

Received: 2 July 2024
Accepted: 29 April 2025
Published online: 19 May 2025

Abstract

Large Language Models (LLMs) have significantly advanced prompt-based information retrieval, yet their potential to reproduce or amplify social biases remains insufficiently understood. In this study, we investigate this issue through the concrete task of reconstructing real-world co-authorship networks of computer science (CS) researchers using two widely used LLMs—GPT-3.5 Turbo and Mixtral 8x7B. This task offers a structured and quantifiable way to evaluate whether LLM-generated scholarly relationships reflect demographic disparities, as co-authorship is a key proxy for collaboration and recognition in academia. We compare the LLM-generated networks to baseline networks derived from DBLP and Google Scholar, employing both statistical and network science approaches to assess biases related to gender and ethnicity. Our findings show that both LLMs tend to produce more accurate co-authorship links for individuals with Asian or White names, particularly among researchers with lower visibility or limited academic impact. While we find no significant gender disparities in accuracy, the models systematically favor generating co-authorship links that overrepresent Asian and White individuals. Additionally, the structural properties of the LLM-generated networks differ from those of the baseline networks. These results highlight the importance of examining how LLMs represent social and scientific relationships, particularly in contexts where they are increasingly used for knowledge discovery and scholarly search.
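The comparison described in the abstract, checking LLM-generated co-authorship links against a baseline and breaking the result down by demographic group, can be illustrated with a minimal sketch. This is not the authors' pipeline: the metric (the share of generated co-authors that also appear in the baseline), the function name `per_group_accuracy`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def per_group_accuracy(baseline, generated, group_of):
    """Average, per group, the share of generated co-authors found in the baseline.

    baseline, generated: dict mapping an author to a set of co-author names.
    group_of: dict mapping an author to a (hypothetical) demographic label.
    """
    scores = defaultdict(list)
    for author, predicted in generated.items():
        if not predicted:
            continue  # no links were generated for this author; skip them
        true_coauthors = baseline.get(author, set())
        # Share of generated links that exist in the baseline network
        share_correct = len(predicted & true_coauthors) / len(predicted)
        scores[group_of[author]].append(share_correct)
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


# Toy data (hypothetical, not taken from the paper)
baseline = {"A": {"B", "C"}, "D": {"E"}}
generated = {"A": {"B", "X"}, "D": {"E"}}
group_of = {"A": "group_1", "D": "group_2"}

result = per_group_accuracy(baseline, generated, group_of)
print(result)
```

A gap between the per-group averages is the kind of accuracy disparity the abstract refers to. Whether such a gap is statistically meaningful would require the tests and network measures used in the full paper.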

Key words: Large language models / Co-authorship networks / Computer science / Gender / Ethnicity / Biases in network representation

Handling Editor: Jussara Almeida

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
