EPJ Data Science Highlight - Gaining historical and international relations insights from social media
- Published on 17 October 2017
As more and more people get their news from social media platforms, these become hosts to vast amounts of information on human behavior in relation to real-time events around the world. In a study published in EPJ Data Science, Vanessa Peña-Araya and team successfully match geopolitical interactions obtained from Twitter activity with real-world historical international relations.
(Guest post by Vanessa Peña-Araya, Mauricio Quezada, Denis Parra and Barbara Poblete, originally published on the SpringerOpen blog)
Online social media platforms, like Twitter, Sina Weibo, or Facebook, have become very popular in recent years. They are primarily used to share personal experiences and to keep in touch with friends. Nevertheless, many users turn to these platforms as reliable sources of real-time information about world events, such as the Ukrainian Crisis or recent natural disasters. In particular, Twitter has become one of the preferred sources on the Web for breaking news updates.
Scientists have become quite interested in how humans interact on social media, studying subjects such as human behavior and social trends. For example, prior investigations have studied the sentiment that users express toward particular news events, as well as how information propagates around the world.
In our EPJ Data Science study, we focused on the geopolitical relationships (i.e., international relations) that can be derived from the information about real-world events that people share on Twitter. Specifically, we define two geographical components of an event: protagonist locations and interested locations.
The first refers to the geographical locations mentioned as part of the event's information (e.g., where a news event occurred); the second refers to the geographical locations where the social media users who comment on the event are physically located. For example, the news of a soccer match between Germany and Brazil in the context of the 2014 World Cup will have these two countries as protagonist locations. However, for this same event, several countries are likely to display interest in the match and will therefore be considered interested locations. We model this interaction by defining a geopolitical link between Germany and Brazil, since both countries are protagonists of the same event.
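The notion of a geopolitical link can be illustrated with a small sketch. The events, the field names `protagonists` and `interested`, and the helper `geopolitical_links` below are hypothetical and chosen only for illustration; the sketch simply counts how often two countries appear as protagonists of the same event, which may differ from the processing used in the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy events (not taken from the study's dataset).
# "protagonists": countries mentioned as part of the event.
# "interested": countries from which users commented on the event.
events = [
    {"protagonists": {"Germany", "Brazil"},
     "interested": {"Argentina", "Brazil", "Chile", "Germany"}},
    {"protagonists": {"Ukraine", "Russia"},
     "interested": {"Germany", "Russia", "Ukraine"}},
    {"protagonists": {"Germany", "Brazil"},
     "interested": {"Brazil"}},
]


def geopolitical_links(events):
    """Count, for each pair of countries, the events in which both are protagonists."""
    links = Counter()
    for event in events:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(event["protagonists"]), 2):
            links[(a, b)] += 1
    return links


links = geopolitical_links(events)
for pair, count in links.items():
    print(pair, count)
```

Interested locations are kept in the toy data only to show the two components side by side; they play no role in the link count itself.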
Using these notions, we analyze a 2-year dataset of world news shared by users on Twitter, encompassing 193 million shared messages, called tweets. We create vector representations of events based on geopolitical interactions, in order to study relationships among countries. We do so from two points of view: a visual perspective, which allows people to manually explore the resulting information, and a data mining perspective, which yields different metrics about the strength of the relationships found.
From the visual point of view, we created Galean, a visual web-based tool that displays event-level information within its geographical context. In particular, it displays events’ protagonist countries, as well as their impact in social media, measured as number of tweets and whether the event is of local, regional or global scope. When the user selects a particular event, the interface displays a choropleth map with the distribution of interested countries.
Our tool also allows the user to inspect the events over time, by providing a simple timeline, which displays the number of events per date. In addition, the user can apply several filters to find particular events or international relationships. This interface allows users to monitor, explore, and understand ongoing events as they are discussed in social media. A preliminary version of Galean, focused on Chilean news, can be found at http://galean.cl/explorar
From the data mining perspective, we studied several quantitative metrics to understand which countries were the most similar in terms of being protagonists of the same events. Specifically, we studied 20,066 news events contained in our dataset and analyzed the resulting relationships using graph structures.
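One way to picture "similar in terms of being protagonists of the same events" is to represent each country as a binary vector over events and compare the vectors. The sketch below uses cosine similarity purely as an assumption, since this post does not name the metrics that were studied; the toy events and the helper names `country_vectors` and `cosine_similarity` are likewise hypothetical.

```python
import math

# Hypothetical toy data: event id -> set of protagonist countries.
event_protagonists = {
    "e1": {"Ukraine", "Russia"},
    "e2": {"Ukraine", "Russia", "Germany"},
    "e3": {"Germany", "Brazil"},
}


def country_vectors(event_protagonists):
    """Build one binary vector per country: 1 if it is a protagonist of the event."""
    event_ids = sorted(event_protagonists)
    countries = sorted(set().union(*event_protagonists.values()))
    return {
        country: [1 if country in event_protagonists[e] else 0 for e in event_ids]
        for country in countries
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = country_vectors(event_protagonists)
sim_ukraine_russia = cosine_similarity(vectors["Ukraine"], vectors["Russia"])
sim_ukraine_germany = cosine_similarity(vectors["Ukraine"], vectors["Germany"])
sim_ukraine_brazil = cosine_similarity(vectors["Ukraine"], vectors["Brazil"])
```

In this toy data, Ukraine and Russia share all of their events and so score highest, while Ukraine and Brazil share none and score zero.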
We define each country as a node in the graph, where its size represents the number of events that the country participates in. Weighted edges represent the resulting similarity found between countries. We disconnect this graph by varying the minimum similarity required for an edge to exist. This allowed us to observe which countries have strong links; for example, Ukraine and Russia displayed intense interactions as a consequence of the Crimean crisis, which unfolded during our data collection period.
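The effect of raising the minimum similarity can be sketched as follows. The similarity values and the helper `components_at_threshold` are hypothetical, and the sketch only keeps edges at or above a threshold and lists the resulting connected groups of countries; it is not meant to reproduce the analysis in the study.

```python
from collections import defaultdict

# Hypothetical pairwise similarities between countries (made-up values).
similarities = {
    ("Russia", "Ukraine"): 0.9,
    ("Germany", "Ukraine"): 0.4,
    ("Brazil", "Germany"): 0.6,
    ("Brazil", "Russia"): 0.1,
}


def components_at_threshold(similarities, min_similarity):
    """Keep edges with weight >= min_similarity and return connected groups."""
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), weight in similarities.items():
        nodes.update((a, b))
        if weight >= min_similarity:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    groups = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collecting every country reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups


for threshold in (0.3, 0.5, 0.8):
    print(threshold, components_at_threshold(similarities, threshold))
```

With these made-up weights, a low threshold leaves all four countries in one group, while a high threshold isolates everything except the strongest pair.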
We can study the communities that emerge from these graphs, which summarize years of world interactions. These interactions show the impact that events, and the countries in which they occur, have on social media users and on news media.
The two perspectives used in our study, visual and data mining, allowed us to inspect a vast number of geopolitical interactions, extracted automatically from Twitter. Most interestingly, our overall results reflect real-world historical international relations. This indicates that social media data has potential value for discovering new knowledge in fields such as historical research, political science, journalism, and more.