https://doi.org/10.1140/epjds/s13688-026-00636-3
Research
Quantitative thematic diversification and evolution of classical science fiction in the public domain based on complex network analysis and natural language processing
1 Graduate School of Culture Technology, Korea Advanced Institute for Science & Technology, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
2 School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegi-ro, Dongdaemun-gu, 02455, Seoul, Republic of Korea
Received: 6 August 2025
Accepted: 27 February 2026
Published online: 16 March 2026
Abstract
Science Fiction (SF) is a young but thriving modern literary genre characterized by vivid portrayals of alternate worlds featuring advanced science and technology, often distant in space and time from ours. Establishing captivating, fantastical, and futuristic environments for imaginative storytelling has constantly required profound imagination and innovative thinking from the genre’s creators, propelling the genre to evolve and grow in a relatively short time into one that now exerts a strong influence even outside its original realm of literature, in media such as cinema and television. As creative works in written form, the texts of SF novels are the primary source for understanding their nature and characteristics, including their developmental history. In this paper we study in detail how SF has evolved since its beginning via two quantitative techniques, computational linguistics (natural language processing) and network science, jointly applied to a comprehensive dataset of classical SF in the public domain. We find that the network constructed between the texts based on linguistic similarity enables us to detect the emergence and evolution of distinct themes across different generations, and that the most important events in the history of SF strongly correlate with moments of rapid growth in the genre’s thematic diversity. This suggests that a continuous infusion of new ideas, and the resulting rise in diversity, is necessary for the success and growth of a creative genre. In an age of growing interest in the scientific understanding of human creativity and machine intelligence, this work also contributes to one of the most promising yet underrepresented topics in human-centered data science.
Key words: Science fiction / Complex network analysis / Computational linguistics / Thematic diversity
Handling Editor: Anna Sapienza
© The Author(s) 2026
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

