NBC News Analyzes Hundreds of Thousands of Russian Troll Tweets Using Neo4j
Challenge
There’s no question that Russian Twitter trolls interfered in the 2016 U.S. presidential election.
But determining precisely how they did so has been difficult due to the shadowy nature of cyber
warfare, the anonymity of the internet, the ease of hiding behind counterfeit identities and the
vast volume of social media data.
In November 2017, the U.S. House of Representatives Permanent Select Committee on
Intelligence released a list of 2,752 Twitter accounts associated with the Internet Research
Agency, the Kremlin-linked “troll farm.” (Twitter later expanded the list to 3,814 accounts.)
Russian agents impersonated U.S. citizens, news organizations and political groups, and set up
fake accounts to spread disinformation and incite division.
By the time the list was released, Twitter had suspended the accounts and deleted their tweets and
user profiles. NBC reporters needed to find the missing troll tweets.
How could the data be recovered and analyzed? How did the networks operate? How did trolls
infiltrate the online conversations of everyday Americans and attempt to sway public opinion?
The questions were of paramount public interest – and the answers elusive without tools to
rescue and analyze the data.
Solution
The Neo4j graph showed the relationships between entities such as tweets, users (some exposed as
known trolls), hashtags, source applications and links.
Graph algorithms measured the centrality of nodes based on their connections to other entities.
Community detection algorithms revealed networks of users who frequently interacted – and
identified which trolls were influencers and which simply amplified other trolls. PageRank
identified the most influential accounts within each cluster.
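To make the PageRank step concrete, here is a minimal sketch of the idea in plain Python. It is not the reporters' actual code or the Neo4j procedure they ran; the account names and retweet edges are hypothetical, and the damping factor and iteration count are conventional defaults chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list of (source, target)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical data: each edge points from a retweeter to the account it retweeted.
retweets = [
    ("amplifier_1", "generator_a"),
    ("amplifier_2", "generator_a"),
    ("amplifier_3", "generator_a"),
    ("amplifier_3", "generator_b"),
    ("generator_b", "generator_a"),
]

ranks = pagerank(retweets)
top_account = max(ranks, key=ranks.get)
```

In this toy graph the account that everyone else retweets ends up with the highest score, which mirrors the pattern of a small core of content generators surrounded by amplifiers.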
The reporters began to see the troll networks in action. Each community featured a small core
of content generators and a larger body of retweeters. Only about 25 percent of the troll tweets
were original; the rest were retweets. Trolls took advantage of common hashtags and replied to
popular accounts in order to amass followers and build influence.
The trolls left plenty of footprints. Legitimate Twitter users often tweet from their phones, but the
investigators discovered that a disproportionately high number of troll tweets came from the Twitter web client.
When plotted by time, troll tweets spiked during working hours in Russia.
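Both footprints come down to simple counting. The sketch below shows one way such a tally could be done; the records, the source-application labels and the fixed UTC+3 offset standing in for Moscow time are all assumptions made for illustration, not data from the investigation.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+3 offset approximates local time in Moscow.
MOSCOW = timezone(timedelta(hours=3))

# Hypothetical records: (UTC timestamp, source application label).
tweets = [
    ("2016-10-05T07:15:00+00:00", "Twitter Web Client"),
    ("2016-10-05T08:40:00+00:00", "Twitter Web Client"),
    ("2016-10-05T11:05:00+00:00", "Twitter Web Client"),
    ("2016-10-05T23:30:00+00:00", "Twitter for iPhone"),
]

# Count tweets per source application.
source_counts = Counter(source for _, source in tweets)

# Count tweets per hour of day after converting each timestamp to Moscow time.
hour_counts = Counter(
    datetime.fromisoformat(stamp).astimezone(MOSCOW).hour for stamp, _ in tweets
)

web_share = source_counts["Twitter Web Client"] / len(tweets)
```

Plotting `hour_counts` as a bar chart would show whether activity clusters in the local working day, and comparing `web_share` against a baseline of ordinary users would show whether the web client is overrepresented.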