https://doi.org/10.1140/epjds/s13688-025-00596-0
Research
Predicting organized collective violence through hostile discourse in social media
1 Facultad de Ingenieria en Electricidad y Computacion, Escuela Superior Politecnica del Litoral, Km 30.5 Perimetral Av., 090112, Guayaquil, Guayas, Ecuador
2 Department of Defense Analysis, Naval Postgraduate School, 1 University Circle, 93943, Monterey, CA, USA
Received: 2 February 2025
Accepted: 25 October 2025
Published online: 9 December 2025
While substantial evidence indicates that patterns of organized collective violence can be influenced through social and political discourse, existing approaches to estimating these effects have often focused on specific actors, geographic regions, and languages, potentially obscuring the more general patterns that underlie efforts to promote collective violence across human societies. To address this, we leverage the capabilities of multilingual language models (MLLMs), combined with a language-agnostic spatio-temporal approach to sampling social media discourse, to train a neural network classifier that learns to detect systematic linguistic precursors to collective violence at global scales. Our findings show that this approach outperforms traditional machine learning models, generating more accurate predictions by leveraging the abilities of MLLMs to generalize across diverse cultural and linguistic contexts. We further demonstrate, using zero-shot natural language inference, that our model has learned to detect semantically meaningful emotional dimensions in social media texts which are systematically linked to the model’s ability to distinguish between the discursive precursors to violent conflict and the discursive consequences of violent conflict. By making all code, data, and trained models publicly available, we aim to provide a foundation for future development of conflict early-warning systems that can operate effectively at global scales.
Key words: Multilingual language models / Violence prediction / Social media / Deep learning / Machine learning
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

