NBC News Analyzes Hundreds of Thousands of Russian Troll Tweets Using Neo4j

The Challenge

There’s no question that Russian Twitter trolls interfered in the 2016 U.S. Presidential election. But determining precisely how they did so has been difficult due to the shadowy nature of cyber warfare, the anonymity of the internet, the ease of hiding behind counterfeit identities and the vast volume of social media data.

In November 2017, the U.S. House of Representatives Permanent Select Committee on Intelligence released a list of 2,752 Twitter accounts associated with the Internet Research Agency, the Kremlin-linked “troll farm.” (Twitter later expanded the list to 3,814 accounts.) Russian agents impersonated U.S. citizens, news organizations and political groups, and set up fake accounts to spread disinformation and incite division.

By the time the list was released, Twitter had suspended the accounts and deleted tweets and user profiles. NBC reporters needed to find the missing troll tweets.

How could the data be recovered and analyzed? How did the networks operate? How did trolls infiltrate the online conversations of everyday Americans and attempt to sway public opinion? The questions were of paramount public interest – and the answers elusive without tools to rescue and analyze the data.

The Solution

Modeled as a graph in Neo4j, the data revealed the relationships between entities such as tweets, users (some exposed as known trolls), hashtags, source applications and links.
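To make that data model concrete, here is a minimal in-memory sketch of a labeled property graph. It is not the schema NBC used: the node labels, the relationship names (POSTED, HAS_TAG, POSTED_VIA, HAS_LINK), the property names and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory stand-in for a labeled property graph."""
    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    edges: list = field(default_factory=list)  # (start id, relationship type, end id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, start, rel_type, end):
        self.edges.append((start, rel_type, end))

    def neighbors(self, start, rel_type):
        """Ids reachable from `start` over one relationship of `rel_type`."""
        return [e for s, r, e in self.edges if s == start and r == rel_type]


def build_example():
    """Build one user, one tweet and the entities that tweet touches."""
    g = PropertyGraph()
    # All labels, relationship names and values below are hypothetical.
    g.add_node("u1", "User", screen_name="example_troll", known_troll=True)
    g.add_node("t1", "Tweet", text="placeholder text")
    g.add_node("h1", "Hashtag", tag="example")
    g.add_node("s1", "Source", name="Twitter Web Client")
    g.add_node("l1", "Link", url="https://example.com/story")
    g.add_edge("u1", "POSTED", "t1")
    g.add_edge("t1", "HAS_TAG", "h1")
    g.add_edge("t1", "POSTED_VIA", "s1")
    g.add_edge("t1", "HAS_LINK", "l1")
    return g


def hashtags_used_by(g, user_id):
    """Walk User -> Tweet -> Hashtag and collect the tag text."""
    tags = []
    for tweet_id in g.neighbors(user_id, "POSTED"):
        for tag_id in g.neighbors(tweet_id, "HAS_TAG"):
            tags.append(g.nodes[tag_id][1]["tag"])
    return tags
```

Calling hashtags_used_by(build_example(), "u1") follows two relationships from a user to the hashtags in that user's tweets, the kind of multi-hop question a graph model answers directly.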

Graph algorithms measured the centrality of nodes based on their connections to other entities. Community detection algorithms revealed networks of users who frequently interacted – and identified which trolls were influencers and which simply amplified other trolls. PageRank identified the most influential accounts within each cluster.
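The two-step idea, group accounts into communities and then rank accounts inside each one, can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis NBC ran: weakly connected components stand in for a real community detection algorithm, edges are assumed to be (retweeter, original author) pairs, and the function names and the damping factor of 0.85 are choices made here.

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group nodes linked by any chain of edges, ignoring direction.

    A simple stand-in for community detection (an assumption of this sketch).
    """
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        components.append(members)
    return components


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank held by dangling nodes is spread evenly."""
    nodes = sorted(nodes)
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def top_account_per_community(edges):
    """Return the highest-ranked account in each component, in sorted order."""
    result = []
    for members in weakly_connected_components(edges):
        inner = [(s, d) for s, d in edges if s in members and d in members]
        ranks = pagerank(members, inner)
        result.append(max(sorted(ranks), key=ranks.get))
    return result
```

With edges pointing from retweeter to author, rank flows toward the accounts whose content gets amplified, so a heavily retweeted account rises to the top of its cluster.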

The reporters began to see the troll networks in action. Each community featured a small core of content generators and a larger body of retweeters. Only about 25 percent of the troll tweets were original; the rest were retweets. Trolls took advantage of common hashtags and replied to popular accounts in order to amass followers and build influence.
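The split between original tweets and retweets, and between content generators and amplifiers, comes down to simple counting. The sketch below assumes each tweet is a dictionary with hypothetical "user" and "is_retweet" fields, and the 0.5 cutoff separating generators from amplifiers is an arbitrary illustrative threshold, not a figure from the investigation.

```python
from collections import Counter


def original_share(tweets):
    """Fraction of tweets that are original rather than retweets."""
    if not tweets:
        return 0.0
    originals = sum(1 for t in tweets if not t["is_retweet"])
    return originals / len(tweets)


def classify_accounts(tweets, threshold=0.5):
    """Label each account by the share of its own tweets that are original.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    totals, originals = Counter(), Counter()
    for t in tweets:
        totals[t["user"]] += 1
        if not t["is_retweet"]:
            originals[t["user"]] += 1
    return {
        user: "generator" if originals[user] / totals[user] >= threshold else "amplifier"
        for user in totals
    }


# Made-up sample: one original tweet and three retweets.
sample = [
    {"user": "writer", "is_retweet": False},
    {"user": "booster_1", "is_retweet": True},
    {"user": "booster_1", "is_retweet": True},
    {"user": "booster_2", "is_retweet": True},
]
```

On this sample, original_share(sample) is 0.25, and classify_accounts(sample) labels "writer" a generator and both boosters amplifiers.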

The trolls left lots of footprints. Legitimate Twitter users often tweet from their phones, but the investigators discovered a disproportionately high number of tweets from the Twitter web client. When plotted by time, troll tweets spiked during working hours in Russia.
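Both footprints can be surfaced with a pair of tallies: tweets per source application, and tweets per hour of the day in Moscow. The sketch assumes hypothetical "source" and "created_at" fields, ISO 8601 timestamps, and a fixed UTC+3 offset for Moscow, which matches Moscow time in 2016 but ignores the other Russian time zones.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Moscow time as a fixed UTC+3 offset with no daylight saving.
MOSCOW = timezone(timedelta(hours=3))


def tweets_per_source(tweets):
    """Count tweets by the client application that posted them."""
    return Counter(t["source"] for t in tweets)


def tweets_per_moscow_hour(tweets):
    """Count tweets by hour of day (0-23) in Moscow local time."""
    hours = Counter()
    for t in tweets:
        created = datetime.fromisoformat(t["created_at"])
        if created.tzinfo is None:
            # Assumption: timestamps without an offset are in UTC.
            created = created.replace(tzinfo=timezone.utc)
        hours[created.astimezone(MOSCOW).hour] += 1
    return hours


# Made-up sample records for illustration only.
sample = [
    {"source": "Twitter Web Client", "created_at": "2016-10-01T06:30:00+00:00"},
    {"source": "Twitter Web Client", "created_at": "2016-10-01T07:05:00+00:00"},
    {"source": "Twitter for iPhone", "created_at": "2016-10-01T23:15:00+00:00"},
]
```

Here the two web-client tweets land at 9 and 10 in the morning Moscow time, while the phone tweet lands at 2 a.m. A histogram of these hourly counts over a full dataset is what would expose a working-hours spike.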
