https://doi.org/10.1140/epjds/s13688-025-00573-7
Research
Bibliometric cartography of data science: a large-scale analysis on knowledge integration and diffusion
1 Department of Information Management, Peking University, 5 Yiheyuan Road, Haidian District, 100871, Beijing, China
2 Center for Information Management and Informationalization Research, Peking University, 5 Yiheyuan Road, Haidian District, 100871, Beijing, China
3 Center for Digital Intelligence Science and Education Research, Peking University Chongqing Research Institute of Big Data, 10 Science Valley, High-tech Zone, 400031, Chongqing, China
a huangwb@pku.edu.cn
b buyi@pku.edu.cn
Received: 13 February 2025
Accepted: 9 July 2025
Published online: 23 July 2025
Data science has been widely explored and applied across many scenarios. However, little is known about its patterns of knowledge integration and diffusion, particularly its interdisciplinary dynamics. Here, we adopt a citation-based strategy to define the scope of data science and use bibliometric methods to map its “cartography” from the perspectives of both publications and scholars. Our analysis reveals that, over the past four decades, data science has integrated an increasingly diverse range of knowledge sources, albeit with a pronounced dependence on a limited number of dominant fields. Its knowledge diffuses across an expanding spectrum of disciplines, exerting significant cross-domain influence. The diffusion process is driven by both viral and broadcasting adopters; the former facilitate early-stage dissemination, although a “damping” effect emerges as dissemination efficiency gradually declines. Notably, the interdisciplinary nature of data science is twofold: it not only synthesizes technical innovations but also permeates multiple applied domains, where its application-driven orientation is reflected in domain-specific methodologies. Additionally, we identify the emergence of governance attributes, an increasingly salient characteristic of data science that warrants greater attention. These findings underscore the dual role of data science as both a catalyst for interdisciplinary convergence and a practice-oriented discipline deeply embedded in real-world challenges. Ultimately, data science fosters rapid innovation and facilitates broad knowledge dissemination across academic and practical domains.
Keywords: Data science / Knowledge integration / Knowledge diffusion
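The abstract's finding that data science draws on a more diverse set of knowledge sources while still depending heavily on a few dominant fields can sound contradictory. The short sketch below illustrates how both can hold at once. It is not the authors' method: the abstract does not name its indicators, so Shannon entropy (diversity) and the Herfindahl-Hirschman index (concentration) are used here as illustrative stand-ins, and the per-field reference counts are entirely hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a field -> reference-count mapping.
    Higher values mean references are spread over more fields, more evenly."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def herfindahl(counts):
    """Herfindahl-Hirschman index: sum of squared field shares.
    1.0 means every reference points to a single field (maximal concentration)."""
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def top_k_share(counts, k):
    """Share of all references that go to the k most-cited fields."""
    total = sum(counts.values())
    return sum(c for _, c in Counter(counts).most_common(k)) / total


# HYPOTHETICAL reference counts by source field for two periods (illustration only).
refs_early = {"Computer Science": 50, "Statistics": 40, "Mathematics": 10}
refs_late = {
    "Computer Science": 600, "Statistics": 200, "Mathematics": 50,
    "Medicine": 40, "Economics": 30, "Physics": 30,
    "Biology": 25, "Sociology": 15, "Geography": 10,
}

for label, refs in (("early", refs_early), ("late", refs_late)):
    print(label, len(refs), round(shannon_entropy(refs), 3),
          round(herfindahl(refs), 3), round(top_k_share(refs, 2), 3))
```

With these made-up counts, the later period cites three times as many fields and its entropy rises, yet the Herfindahl index barely moves and the two largest fields still account for 80% of references: broader integration and persistent dependence on dominant fields can coexist.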
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-025-00573-7.
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.