Challenges when identifying migration from geo-located Twitter data
1 School of Computer Science, McGill University, Montréal, Canada
2 Department of Geography, Singapore University of Technology and Design, Singapore, Singapore
3 Department of Geography, University of Kentucky, Lexington, United States
4 Department of Sociology, McGill University, Montréal, Canada
Accepted: 18 November 2020
Published online: 7 January 2021
Given the challenges in collecting up-to-date, comparable data on migrant populations, the potential of digital trace data to study migration and migrants has sparked considerable interest among researchers and policy makers. In this paper, we assess the reliability of one such data source that is heavily used within the research community: geolocated tweets. We examine strategies used in previous work to identify migrants based on their geolocation histories. We apply these approaches to infer the travel history of a set of Twitter users who regularly posted geolocated tweets between July 2012 and June 2015. In a second step, we hand-code the entire tweet histories of a subset of the accounts identified as migrants by these methods. Upon close inspection, very few of the accounts that are classified as migrants appear to be either migrants in any conventional sense or international students. Rather, we find that these approaches identify other highly mobile populations, such as frequent business or leisure travellers, or people who might best be described as “transnationals”. For demographic research that draws on this kind of data to generate estimates of migration flows, this high misclassification rate implies that findings are likely sensitive to the adjustment model used. For most research trying to use these data to study migrant populations, the data will be of limited utility. We suspect that increasing the correct classification rate substantially will not be easy and may introduce other biases.
Keywords: Migration / Twitter / Global human mobility
© The Author(s) 2020
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.