https://doi.org/10.1140/epjds/s13688-026-00651-4
Research
Large scale statistically validated comorbidity networks
1 IFISC, Instituto de Fisica Interdisciplinar y Sistemas Complejos (CSIC-UIB), Campus Universitat de les Illes Balears, 07122, Palma de Mallorca, Spain
2 Auria Biobank, Turku, Finland
3 Dipartimento di Fisica e Chimica Emilio Segrè, Università degli Studi di Palermo, Palermo, Italy
4 Department of Physics and Astronomy, University of Turku, Turku, Finland
5 Complexity Science Hub, Vienna, Austria
Received: 27 August 2025
Accepted: 23 March 2026
Published online: 14 April 2026
Abstract
We construct comorbidity networks from medical information stored in the electronic health records collected by the Wellbeing Services County of Southwest Finland (Varha). From these data, we connect each patient to one or more diseases and build complex comorbidity networks for large patient cohorts, each defined by an age interval and sex. Diseases in the electronic health records are coded at the highest level of granularity available in the International Classification of Diseases (ICD codes) of the World Health Organization. We statistically validate the links of each cohort's comorbidity network and then partition the networks into communities of diseases. Each community is characterized by the over-expression of a few disease categories, and communities from different age or sex cohorts show similarities in terms of these categories. Moreover, the communities detected across all cohorts can be organized into a hierarchical tree. This reveals a number of clusters of communities, originating from diverse age and sex cohorts, that group together communities characterized by the same disease categories. Finally, we apply a dismantling procedure to the statistically validated comorbidity networks to identify the disease categories most responsible for the compactness of the comorbidity network of a given patient cohort.
Key words: Electronic Health Records / Comorbidity / Complex networks / Statistically Validated Networks
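The abstract does not state which test is used to validate links. A common choice for statistically validated networks is a one-sided hypergeometric test on the number of patients in a cohort who share two diagnoses, followed by a multiple-testing correction. The sketch below assumes that approach with a Bonferroni correction over all disease pairs; the function names `hypergeom_sf` and `validate_links`, the `alpha = 0.01` threshold, and the dictionary-of-sets input format are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k: int, n_total: int, n_a: int, n_b: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total patients, n_a with disease a, n_b with disease b).

    Computed with exact integer arithmetic, then divided once, so large cohorts
    do not lose precision in intermediate terms.
    """
    upper = min(n_a, n_b)
    numerator = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x) for x in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_b)


def validate_links(
    patients: dict[str, set[str]], alpha: float = 0.01
) -> list[tuple[str, str, float]]:
    """Return disease pairs whose co-occurrence is over-expressed in one cohort.

    Assumption: `patients` maps a patient id to the set of ICD codes of that
    patient, and every patient in the dict belongs to the same age/sex cohort.
    """
    n_total = len(patients)

    # Prevalence: number of patients diagnosed with each disease.
    prevalence = Counter(code for codes in patients.values() for code in codes)

    # Co-occurrence: number of patients diagnosed with both diseases of a pair.
    co_occurrence = Counter(
        pair for codes in patients.values() for pair in combinations(sorted(codes), 2)
    )

    # Bonferroni correction over all possible disease pairs (assumed choice).
    n_diseases = len(prevalence)
    n_tests = max(1, n_diseases * (n_diseases - 1) // 2)
    threshold = alpha / n_tests

    links = []
    for (a, b), k in sorted(co_occurrence.items()):
        p_value = hypergeom_sf(k, n_total, prevalence[a], prevalence[b])
        if p_value < threshold:
            links.append((a, b, p_value))
    return links
```

In a toy cohort of 20 patients where five patients carry both `A` and `B` (and nobody else carries either), the pair `A`–`B` is validated with p = 1/15504, while a single chance co-occurrence of `C` and `D` is not. The Bonferroni threshold is deliberately conservative; a false discovery rate correction is another common option in this setting.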
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-026-00651-4.
Handling Editor: Eugenio Valdano
© The Author(s) 2026
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

