https://doi.org/10.1140/epjds/s13688-026-00628-3
Research
A data-driven approach to supporting fact-checking and mitigating misinformation and disinformation through domain quality evaluation
1 Fondazione Bruno Kessler, Trento, Italy
2 University of Dundee, Dundee, UK
3 University of Trento, Trento, Italy
4 Centre for Sociology of Humans and Machines (SOHAM), Trinity College Dublin, 3-5 Foster Place, D02 YT92, Dublin, Ireland
5 Departamento de Matemáticas, Grupo Interdisciplinar de Sistemas Complejos, Universidad Carlos III de Madrid, Leganés, Spain
Received: 21 July 2025
Accepted: 4 February 2026
Published online: 9 March 2026
Abstract
Misinformation and disinformation spread rapidly on social media, threatening public discourse, democratic processes, and social cohesion. One promising strategy to address these challenges is to evaluate the trustworthiness of entire domains (source websites) as a proxy for content credibility. This approach demands methods that are both scalable and data-driven. However, current solutions such as NewsGuard and Media Bias/Fact Check (MBFC) rely on expert assessments, cover only a limited number of domains, and some (e.g., NewsGuard) require paid subscriptions. These constraints limit their usefulness for large-scale research. This study introduces a machine-learning-based system designed to assess the quality and trustworthiness of websites. We propose a data-driven approach that leverages a large dataset of expert-rated domains to predict credibility scores for previously unseen domains using domain-level features. Our supervised regression model achieves moderate performance on test data and high performance on independent datasets, highlighting its ability to generalize to unseen domains. Using feature importance analysis, we found that PageRank-based features provided the greatest reduction in prediction error, suggesting that link-based indicators play a central role in domain trustworthiness. The solution’s scalable design accommodates the continuously evolving nature of online content, ensuring that evaluations remain timely and relevant. The framework enables continuous assessment of thousands of domains with minimal manual effort. This capability allows stakeholders (social media platforms, media monitoring organizations, content moderators, and researchers) to allocate resources more efficiently, prioritize verification efforts, and reduce exposure to questionable sources.
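The abstract reports that PageRank-based features gave the largest reduction in prediction error. The sketch below illustrates what such a link-based indicator measures by computing standard PageRank with power iteration on a small domain-level hyperlink graph. It is a minimal sketch under stated assumptions and does not reproduce the authors' pipeline. The domain names, the links, the function name `pagerank`, and the damping factor of 0.85 are all hypothetical. The paper's actual graph construction and feature definitions may differ.

```python
# Minimal PageRank sketch over a HYPOTHETICAL domain-level hyperlink graph.
# Domains, links, and parameters are illustrative assumptions only.


def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Compute PageRank by power iteration.

    links: dict mapping each domain to the list of domains it links to.
    Domains with no outgoing links ("dangling") spread their rank uniformly.
    Returns a dict of domain -> score; the scores sum to 1.
    """
    # Collect every domain that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling domains is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each domain splits its damped rank evenly across its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph: "blog-c.example" links out but receives no links.
toy_links = {
    "news-a.example": ["news-b.example", "ref.example"],
    "news-b.example": ["ref.example"],
    "blog-c.example": ["news-a.example", "ref.example"],
    "ref.example": ["news-a.example"],
}

scores = pagerank(toy_links)
for domain, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{domain:16s} {score:.4f}")
```

In this toy graph, `ref.example` receives links from every other domain and obtains the highest score. `blog-c.example` receives no inbound links and keeps only the teleportation baseline of (1 − 0.85)/4. A score of this kind could then serve as one input column for a supervised regressor. Which PageRank variants and graph sources the study actually uses is not stated in this section.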
Key words: Domain trustworthiness assessment / Fact-checking algorithms / Misinformation and disinformation mitigation / Machine learning for credibility analysis
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-026-00628-3.
Handling Editor: Diogo Pacheco
© The Author(s) 2026
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

