EPJ Data Science Highlight - Social media trending: real or manufactured?

Details: Published on 11 July 2017

The era of "fake news" is upon us. Navigating social media is a constant exercise of judgement, but data science can help distinguish real from fabricated trending topics. In EPJ Data Science, Emilio Ferrara and team set out to determine, from very early on, whether information is being disseminated organically or artificially on social media.

Guest post by Emilio Ferrara, originally published on SpringerOpen blog

Every day, billions of individuals participate in online social media platforms. These digital ecosystems expose their users to tailored information based on individual interests, friendship networks, and the news from the offline world. Each “story”, which in concert with related ones forms a “meme” or information campaign, can emerge organically from grassroots activity or, in some cases, be sustained by advertisement or other coordinated efforts.

Most information campaigns are genuine and benign; however, we recently witnessed the emergence of “bad actors” exploiting social media to alter public opinion, with the intent to deceive or simply to create chaos. For example, our research showed that before the 2016 US presidential election, fake news became a vehicle to spread disinformation, attack candidates, and generate confusion online. Similarly, we demonstrated how ISIS and other extremist groups exploited Twitter for terrorist propaganda and recruitment purposes.

It is therefore of paramount importance to be able to detect, at an early stage, memes and information campaigns that are artificially sustained, and to separate them from the organic ones. This problem has important social implications and poses numerous technical challenges, in part due to the scarcity of large-scale annotated datasets with examples of both types of information campaigns.

In EPJ Data Science, we make progress toward distinguishing trending memes that are organic from those promoted by means of advertisement. This classification proves very challenging: ads usually cause bursts of collective attention that can easily be mistaken for those yielded by organic trends. Fortunately, we can rely on Twitter for labeled examples: when a hashtag is promoted by an advertiser, Twitter clearly states so. This feature allowed us to collect a dataset of millions of tweets belonging to promoted information campaigns, as well as millions of tweets belonging to organic trends.

We propose a machine-learning framework and new techniques to classify such memes. Our algorithm exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user metadata.
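To make the idea of time-varying features concrete, here is a minimal sketch in Python. It is not the framework from the paper: the input layout (a list of timestamp and user-id pairs), the function name `windowed_features`, and the three features computed per window are all assumptions chosen for illustration.

```python
from statistics import mean


def windowed_features(tweets, window_seconds=3600):
    """Group tweets into fixed-width time windows and summarize each one.

    `tweets` is a list of (timestamp_seconds, user_id) pairs. Both this
    layout and the features below are illustrative assumptions, not the
    feature set used in the paper.
    """
    if not tweets:
        return []
    tweets = sorted(tweets)
    start = tweets[0][0]

    # Assign every tweet to the window it falls into.
    buckets = {}
    for ts, user in tweets:
        index = int((ts - start) // window_seconds)
        buckets.setdefault(index, []).append((ts, user))

    # Emit one feature row per window, including empty windows.
    rows = []
    for index in range(max(buckets) + 1):
        items = buckets.get(index, [])
        times = [ts for ts, _ in items]
        gaps = [b - a for a, b in zip(times, times[1:])]
        rows.append({
            "window": index,
            "volume": len(items),                               # timing signal
            "unique_users": len({user for _, user in items}),   # user signal
            "mean_gap": mean(gaps) if gaps else None,           # burstiness proxy
        })
    return rows
```

Each row describes one window, so a sequence of rows shows how activity around a hashtag changes over time. A real system would add content, sentiment, and network features in the same per-window fashion.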

We conceptualize two different prediction problems. The early detection of promoted information campaigns, right at trending time, poses significant challenges because only a minimal volume of activity data is available before a topic trends. Campaign detection after trending is easier, thanks to the large volume of activity data generated by the many users joining the conversation.
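The difference between the two problems comes down to which slice of activity the classifier is allowed to see. The sketch below illustrates that split; the function name `activity_for_task`, the `horizon` parameter, and the exact window boundaries are assumptions for illustration and are not taken from the paper.

```python
def activity_for_task(timestamps, trend_time, task, horizon=86400):
    """Select the tweet timestamps available for a given prediction task.

    "early": only activity in the `horizon` seconds before trending.
    "after": that same history plus `horizon` seconds after trending.
    The boundaries are illustrative assumptions.
    """
    earliest = trend_time - horizon
    if task == "early":
        return [t for t in timestamps if earliest <= t < trend_time]
    if task == "after":
        return [t for t in timestamps if earliest <= t <= trend_time + horizon]
    raise ValueError(f"unknown task: {task!r}")
```

For a typical trend, the "early" slice is a small fraction of the "after" slice, which is one way to see why the first problem is harder.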

Our framework achieves 75% accuracy for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithm by introducing several factors, such as random temporal shifts on trend time-series, to reproduce situations that may occur in the real world. We finally explore which features predict promoted campaigns best, finding that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.
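One way to picture the temporal-shift robustness check is the following sketch. It assumes a classifier is any function that maps an activity time series to a label, and it averages accuracy over repeated random shifts. The helper names (`shift_series`, `accuracy_under_shifts`) and the zero-padding choice are hypothetical and do not come from the paper.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def shift_series(series, offset):
    """Shift a series by `offset` steps, padding with zeros (an assumption)."""
    n = len(series)
    if offset > 0:
        return [0] * min(offset, n) + series[: max(n - offset, 0)]
    if offset < 0:
        return series[-offset:] + [0] * min(-offset, n)
    return list(series)


def accuracy_under_shifts(classify, dataset, max_shift, trials=20, seed=0):
    """Average accuracy when each series is randomly shifted in time.

    `dataset` is a list of (series, label) pairs; `classify` maps a series
    to a predicted label. Each trial draws a fresh offset per series.
    """
    rng = random.Random(seed)
    labels = [label for _, label in dataset]
    scores = []
    for _ in range(trials):
        predictions = [
            classify(shift_series(series, rng.randint(-max_shift, max_shift)))
            for series, _ in dataset
        ]
        scores.append(accuracy(predictions, labels))
    return sum(scores) / len(scores)
```

Comparing the result for `max_shift=0` with larger values gives a rough sense of how much a classifier depends on the trend time being known exactly.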

In the future, we will extend this framework to monitor social media and detect coordinated information efforts such as fake news, conspiracy theories, and anti-vaccination campaigns.

Early detection of promoted campaigns on social media, Onur Varol, Emilio Ferrara, Filippo Menczer and Alessandro Flammini (2017), EPJ Data Science, 6:13, DOI: 10.1140/epjds/s13688-017-0111-y

ISSN: 2193-1127

© EDP Sciences, Società Italiana di Fisica and Springer-Verlag
