https://doi.org/10.1140/epjds/s13688-025-00578-2
Research
Classifying social position with social media behavioral data
1 MTA–TK Lendület “Momentum” Digital Social Science Research Group for Social Stratification, HUN-REN Centre for Social Sciences, Tóth Kálmán utca 4, 1097, Budapest, Hungary
2 Department of Social Research Methodology, Faculty of Social Sciences, ELTE Eötvös Loránd University, Pázmány Péter sétány 1/A, 1117, Budapest, Hungary
3 Department of Statistics, Faculty of Social Sciences, ELTE Eötvös Loránd University, Pázmány Péter sétány 1/A, 1117, Budapest, Hungary
4 CSS-RECENS, HUN-REN Centre for Social Sciences, Tóth Kálmán utca 4, 1097, Budapest, Hungary
5 Department of Sociology, Faculty of Social Sciences, ELTE Eötvös Loránd University, Pázmány Péter sétány 1/A, 1117, Budapest, Hungary
6 Doctoral School of Demography and Sociology, University of Pécs, Ifjúság útja 6, 7624, Pécs, Hungary
7 Institute for Sociology, HUN-REN Centre for Social Sciences, Tóth Kálmán utca 4, 1097, Budapest, Hungary
Received: 19 January 2025
Accepted: 21 July 2025
Published online: 15 August 2025
The main question of our study is how far social position can be predicted solely based on digital behavior. The phenomenon that offline inequalities are reflected in the digital space has been heavily researched since the digital revolution. Nevertheless, few datasets measure both social inequalities and digital behavior: scientists have information either on people’s social status or on their observed digital behavior, but not on both. When digital behavioral data are analyzed, however large in scale, information on the social position of the users is rarely available. In the current paper, we analyze a special dataset collected with a data donation technique, which contains information on both the social position and the observed digital behavior of participants, and which is representative of the internet user population of Hungary. Using diverse models, we explored how well basic indicators of digital behavior on Facebook can classify users’ social class, measured by the 5-category version of the European Socio-economic Classification (ESeC). The results show that, based on basic quantitative indicators of digital behavior and usage, the models cannot classify users’ social position to a high degree, either for social class or for socio-economic status. Nevertheless, the inclusion of socio-demographic characteristics as features increased the predictive power of the models, which could then differentiate between the lowest and highest social positions to a high degree. The models based purely on observed digital behavior performed best at identifying those in the lowest social position. Among the features that played an important role in this classification, usage time, frequency, network size, and language characteristics (especially the diversity of the language and punctuation used) should be highlighted, while diverse Facebook activities and detected interest categories also played a role. These results are in line with those of previous studies based on smaller-scale, non-representative, or self-reported survey data on the same topic.
Key words: Observed digital behavior / Social position / Social inequalities / Social media / XGBoost / Classification
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1140/epjds/s13688-025-00578-2.
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

