https://doi.org/10.1140/epjds/s13688-020-00245-8
Regular article
Estimating educational outcomes from students’ short texts on social media
Institute of Education, National Research University Higher School of Economics, Moscow, Russia
* e-mail: ibsmirnov@hse.ru
Received: 25 June 2020
Accepted: 24 August 2020
Published online: 1 September 2020
Digital traces have become an essential source of data in the social sciences because they provide new insights into human behavior and allow studies to be conducted on a larger scale. One particular area of interest is the estimation of various user characteristics from their texts on social media. Although it has been established that basic categorical attributes can be effectively predicted from social media posts, the extent to which this applies to more complex continuous characteristics is less understood. In this research, we used data from a nationally representative panel of students to predict their educational outcomes, as measured by standardized tests, from short texts on VK, a popular Russian social networking site. We combined unsupervised learning of word embeddings on a large corpus of VK posts with a simple supervised model trained on individual posts. The resulting model was able to distinguish between posts written by high- and low-performing students with an accuracy of 94%. We then applied the model to reproduce the ranking of 914 high schools from 3 cities and of the 100 largest universities in Russia. We also showed that the same model could predict academic performance from tweets as well as from VK posts. Finally, we explored predictors of high and low academic performance to obtain insights into the factors associated with different educational outcomes.
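The two-stage approach described in the abstract (embeddings learned without supervision, then a simple supervised model trained on individual posts) can be illustrated with a minimal sketch. This is not the article's implementation. The toy vocabulary, the random vectors standing in for learned embeddings, the invented scores, the choice of ridge regression as the "simple supervised model", and the averaging of per-post predictions into a per-user score are all assumptions made for illustration.

```python
import numpy as np

# ASSUMPTION: random toy vectors stand in for embeddings that would
# be learned without supervision on a large corpus of posts.
rng = np.random.default_rng(0)
VOCAB = ["book", "physics", "read", "party", "lol", "game"]
DIM = 8
EMB = {w: rng.normal(size=DIM) for w in VOCAB}


def embed_post(text, emb=EMB, dim=DIM):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def add_intercept(X):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ridge(X, y, alpha=0.1):
    """Closed-form ridge regression; the intercept is not penalized."""
    Xb = add_intercept(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unregularized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(weights, X):
    """Predict one score per row of X (that is, per post)."""
    return add_intercept(X) @ weights


def user_score(user_posts, weights):
    """ASSUMPTION: a user's score is the mean of per-post predictions."""
    X = np.vstack([embed_post(p) for p in user_posts])
    return float(predict(weights, X).mean())


# Invented training data: each post is labeled with its author's test score.
posts = ["read book physics", "party lol game", "book read", "lol party"]
scores = np.array([600.0, 400.0, 580.0, 420.0])

X = np.vstack([embed_post(p) for p in posts])
weights = fit_ridge(X, scores)
post_preds = predict(weights, X)
```

Calling `user_score(["read book", "physics book"], weights)` would then give an estimate for a hypothetical user. Training on individual posts, as the abstract describes, means each post inherits its author's score as the label; how per-post predictions are pooled for a user or a school is a separate design choice, and the simple mean used here is only one option.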
Key words: Academic performance / Prediction / Social media / Text / Transferability
© The Author(s), 2020