https://doi.org/10.1140/epjds/s13688-026-00641-6
Research
Information loss in aggregated social media discussions: studying the migration discourse within Europe
L3S Research Center, Leibniz University Hannover, Hannover, Germany
Received: 29 July 2025
Accepted: 10 March 2026
Published online: 23 March 2026
Abstract
The proliferation of Social Media (SM) platforms has reshaped the landscape of public discourse and opinion formation, offering rich sources of user-generated content for socio-economic and political studies. With vast amounts of digital records of public discussions available, studies may include datasets with millions of documents and sources spanning multiple regions. Data at this scale must be aggregated so that human analysts can describe and interpret the results. This paper addresses some of the challenges of effectively aggregating views expressed in SM discussions. We emphasize the need to choose the aggregation level carefully in order to avoid unwanted information loss and to enhance interpretability. Leveraging measures from information theory, we focus on assessing and comparing geo-aggregations that capture public sentiment on the contentious issue of migration in Europe. Unlike previous studies, our approach centers on SM arguments, analyzing the aggregation of sources focused on migration rather than the aggregation of target areas by mentions in the media. Through geotagging of posts and user profiles, we gain insights into user sentiment distributions across different zones. Furthermore, we contrast different levels of policy-driven regional aggregations against data-driven clustering techniques that represent information more compactly while preserving its underlying distribution. Our intra-country analysis can provide nuanced insights into local mood homogeneity, enabling decision-makers to tailor interventions to specific communities. This study contributes to a more comprehensive understanding of public perceptions of controversial issues at a regional level (e.g., the EU), emphasizing the importance of accurate and context-aware data aggregation.
Key words: Information loss / Aggregation bias / Social media analysis / Sentiment analysis / Migration discourse
Handling Editor: Daniel Romero
© The Author(s) 2026
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

