Identifying Super-spreaders of Misinformation on Social Media

Matthew R. DeVerna, Rachith Aiyappa, Diogo Pacheco,
John Bryden, Filippo Menczer

July 7th, 2021

Online super-spreaders: a growing concern

  • Fake news on Twitter during the 2016 U.S. presidential election (2019) Grinberg et al., Science
  • The COVID-19 Infodemic: Twitter versus Facebook (2021) Yang et al., Big Data & Society

Can we identify super-spreaders of misinformation on social media?


Reliable over time

Platform agnostic

FIB index


  • False

  • Information

  • Broadcaster

What is the FIB index?


Repurpose the h-index...

... to capture users who consistently share low-credibility sources...

... that are reshared many times.

h- vs. FIB- index?


h-index = 100

100 publications with at least 100 citations

FIB-index = 100

100 tweets (containing a low-credibility source) with at least 100 retweets
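The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each user is represented by a list of retweet counts, one per tweet that contains a low-credibility source; the function name `fib_index` and the example counts are hypothetical.

```python
def fib_index(retweet_counts):
    """Return the largest f such that f tweets each have at least f retweets.

    `retweet_counts` is assumed to hold one retweet count per tweet
    that links to a low-credibility source (an h-index over retweets).
    """
    ranked = sorted(retweet_counts, reverse=True)
    f = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            f = rank
        else:
            break
    return f


# Hypothetical user: six low-credibility tweets with these retweet counts.
example_counts = [10, 8, 5, 4, 3, 0]
print(fib_index(example_counts))  # 4: four tweets have at least 4 retweets
```

A user with one viral tweet scores 1, so the index rewards consistency rather than a single hit.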

General approach


  1. Reconstruct a misinformation ecosystem
    • Query the Twitter Decahose with low-credibility sources (Iffy+)
      • All tweets contain a low-credibility source
    • 10 months (Jan. – Oct. 2020)
  2. Identify super-spreaders from the first two months
  3. Analyze how super-spreaders account for future misinformation

Selection metrics


All metrics calculated from the Jan/Feb misinformation data

  • Popularity (number of followers)
  • Influence (total retweets)
  • k-core (users from the innermost shell)
  • FIB index
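As an illustration of the k-core baseline, the innermost shell can be found by repeatedly peeling off the lowest-degree node. This is a sketch only, assuming the retweet network is treated as an undirected, unweighted graph given as an edge list; the function names and the toy edges are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Compute the core number of every node by min-degree peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Remove the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def innermost_shell(edges):
    """Return the nodes with the highest core number."""
    core = core_numbers(edges)
    if not core:
        return set()
    k_max = max(core.values())
    return {node for node, k in core.items() if k == k_max}


# Toy network: a triangle (a, b, c) with one peripheral user d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(sorted(innermost_shell(toy_edges)))  # ['a', 'b', 'c']
```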

Which group removed the most misinformation retweets from future months?


  • Top FIBers and Influentials dominate
    • Remove +60% of the following eight months' misinformation
  • FIBers dismantle most quickly
  • On average, FIBers remove ~2% more
    (+490,000) misinformation retweets per user removed
    than Influentials
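A minimal sketch of this kind of removal analysis, assuming a retweet counts as removed when the account that posted the original tweet is removed. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
def cumulative_fraction_removed(ranked_users, future_retweets_by_user):
    """Fraction of future misinformation retweets removed after each removal.

    `ranked_users` lists accounts in removal order (for example, by FIB index).
    `future_retweets_by_user` maps each account to the retweets its
    low-credibility tweets earned in the later months.
    """
    total = sum(future_retweets_by_user.values())
    removed = 0
    seen = set()
    fractions = []
    for user in ranked_users:
        if user not in seen:  # count each account at most once
            seen.add(user)
            removed += future_retweets_by_user.get(user, 0)
        fractions.append(removed / total if total else 0.0)
    return fractions


# Toy data: three accounts and their future retweet totals.
future = {"a": 60, "b": 30, "c": 10}
print(cumulative_fraction_removed(["a", "c"], future))  # [0.6, 0.7]
```

Running this once per selection metric, with the same future data, gives one removal curve per group to compare.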

How else do FIBers and Influentials differ?


How much of what they share is misinformation?

  1. Gather all tweets sent by these users
    • Jan - Mar. 2020
  2. Calculate the proportion of misinformation shared by user $i$ ($m_i$)
$$m_i = \frac{\text{\# low-credibility sources shared}_i}{\text{\# all sources shared}_i}$$
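The proportion $m_i$ can be computed directly from a user's shared links. This is a sketch, assuming each shared source has already been reduced to a domain name and that the low-credibility list is available as a set; the function name and the `.example` domains are placeholders.

```python
def misinformation_proportion(shared_domains, low_credibility_domains):
    """Return m_i: low-credibility sources shared / all sources shared.

    Returns None when the user shared no sources at all.
    """
    if not shared_domains:
        return None
    low = sum(1 for domain in shared_domains if domain in low_credibility_domains)
    return low / len(shared_domains)


# Placeholder data: two of the four shared sources are low-credibility.
low_cred = {"lowcred.example"}
shared = ["lowcred.example", "news.example", "lowcred.example", "blog.example"]
print(misinformation_proportion(shared, low_cred))  # 0.5
```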

FIBers share ~7.4x more misinformation (per source) than Influentials

Removing Influentials also removes much more non-misinformation

FIB-index is much more reliable

How consistent is the FIB index over time?


On average, 52% of future FIBers were previously identified in the Jan/Feb period
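One way to measure this consistency is the share of a later period's top FIBers who were already in the earlier top group. This is a sketch under that assumption; the function name and the account labels are hypothetical.

```python
def fraction_previously_identified(earlier_top, later_top):
    """Share of later top FIBers who also appear in the earlier top group.

    Returns None when the later group is empty.
    """
    later = set(later_top)
    if not later:
        return None
    return len(later & set(earlier_top)) / len(later)


# Hypothetical groups: two of the four later accounts were identified earlier.
jan_feb_top = ["a", "b", "c"]
later_top = ["b", "c", "d", "e"]
print(fraction_previously_identified(jan_feb_top, later_top))  # 0.5
```

Averaging this value over each later period would give a single consistency figure.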

~52% of top FIBers (85 out of 181) have been suspended by Twitter

In summary...

Present a novel application of a classic network algorithm to
reliably identify super-spreaders of misinformation
on social media

FIB Index

Test its efficacy on real-world Twitter data

What we learn...

Outperforms other tested baseline metrics

Users with the highest FIB index...

  1. Removed +60% of future misinformation
  2. On average, +1/5 sources shared by FIBers are low-credibility
  3. On average, +50% of future FIBers were identified in previous months
  4. ~52% of identified users were subsequently removed by Twitter

Platform Agnostic

@mdeverna2

mdeverna@iu.edu


Rachith Aiyappa

Diogo Pacheco

John Bryden

Filippo Menczer