OSCON Twitter Graph


As part of Neo4j’s community engagement around OSCON, we wanted to look at the social media activity of attendees on Twitter. Using the Twitter Search API to search for mentions of “OSCON”, we set out to build a graph of Users, Tweets, Hashtags and shared Links.

Figure: OSCON Twitter Graph Model

The Twitter Search API returns a list of tweets matching a supplied search term. We populated the graph model shown above by representing these results as nodes and relationships with Neo4j’s query language, Cypher. A single Cypher query imports the tweets; it takes one parameter holding all of the tweets returned by a Search API call. The UNWIND clause pivots that collection into one row per tweet, and each row is then structured into the graph model from the figure.
// $tweets holds the list of tweets from one Search API response
UNWIND $tweets AS t
MERGE (tweet:Tweet {id:t.id})
SET tweet.text = t.text,
    tweet.created_at = t.created_at,
    tweet.favorites = t.favorite_count
MERGE (user:User {screen_name:t.user.screen_name})
SET user.profile_image_url = t.user.profile_image_url
MERGE (user)-[:POSTS]->(tweet)
// hashtags are lower-cased so #OSCON and #oscon map to one node
FOREACH (h IN t.entities.hashtags |
    MERGE (tag:Hashtag {name:toLower(h.text)})
    MERGE (tag)-[:TAGS]->(tweet)
)
// … source, mentions, links, retweets, ...
We ran this Cypher query repeatedly, polling the Twitter Search API at a regular interval and merging each batch of results into the growing graph. At the time of writing, we have imported the following data:

Label      Count
Tweet      10653
User       4910
Link       1153
Hashtag    742
Source     175
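Because the import query creates every node and relationship with MERGE rather than CREATE, polling repeatedly is safe: a tweet returned by several searches still ends up as a single node. The Python sketch below illustrates that behavior in memory. It is not the code we ran; plain dicts and sets stand in for Neo4j, `Graph` and `import_tweets` are hypothetical names, and the sample tweet is made up (its field names follow the ones the Cypher query reads).

```python
# Minimal in-memory sketch of the MERGE-based import (illustration only,
# not the code we ran). Dicts and sets stand in for Neo4j, so merging a
# node or relationship that already exists is a no-op.
from dataclasses import dataclass, field


@dataclass
class Graph:
    tweets: dict = field(default_factory=dict)    # Tweet id -> properties
    users: dict = field(default_factory=dict)     # User screen_name -> properties
    hashtags: set = field(default_factory=set)    # Hashtag names (lower-cased)
    posts: set = field(default_factory=set)       # (screen_name, tweet id) POSTS pairs
    tags: set = field(default_factory=set)        # (hashtag, tweet id) TAGS pairs


def import_tweets(graph: Graph, tweets: list) -> None:
    # Like UNWIND: process the batch one tweet (row) at a time.
    for t in tweets:
        # MERGE the Tweet on its id, then SET its properties.
        graph.tweets.setdefault(t["id"], {}).update(
            text=t["text"],
            created_at=t["created_at"],
            favorites=t["favorite_count"],
        )
        # MERGE the User on screen_name and SET the profile image.
        name = t["user"]["screen_name"]
        graph.users.setdefault(name, {})["profile_image_url"] = t["user"]["profile_image_url"]
        graph.posts.add((name, t["id"]))
        # FOREACH hashtag: MERGE the lower-cased Hashtag and its TAGS relationship.
        for h in t["entities"]["hashtags"]:
            tag = h["text"].lower()
            graph.hashtags.add(tag)
            graph.tags.add((tag, t["id"]))


# Hypothetical search result, shaped like the fields the Cypher query reads.
batch = [{
    "id": 1,
    "text": "Hello #OSCON #python",
    "created_at": "Mon Jul 20 17:00:00 +0000 2015",
    "favorite_count": 0,
    "user": {"screen_name": "alice", "profile_image_url": "http://example.com/alice.png"},
    "entities": {"hashtags": [{"text": "OSCON"}, {"text": "python"}]},
}]

g = Graph()
import_tweets(g, batch)
import_tweets(g, batch)  # a later poll that returns the same tweet again
print(len(g.tweets), len(g.users), sorted(g.hashtags), len(g.tags))  # 1 1 ['oscon', 'python'] 2
```

Importing the same batch twice leaves one tweet, one user, two hashtags and two TAGS pairs, which is the property that lets the poller re-run the query on overlapping search results.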

With this data we can answer many interesting questions about OSCON attendees on Twitter. For example, which platforms are users tweeting from most often?
MATCH (t:Tweet)-[:USING]->(s:Source)
RETURN s.name as Source, count(t) as Count
ORDER BY Count DESC
LIMIT 5

Source                 Count
Twitter Web Client     2294
Twitter for iPhone     1712
Twitter for Android    1590
TweetDeck              877
Hootsuite              668

Which hashtags co-occur with #python most frequently?
MATCH (:Hashtag {name:'python'})-[:TAGS]->(:Tweet)<-[:TAGS]-(h:Hashtag)
WHERE h.name <> 'oscon'
RETURN h.name AS Hashtag, COUNT(*) AS Count
ORDER BY Count DESC
LIMIT 5

Hashtag       Count
java          7
opensource    5
data          5
golang        5
nodejs        5
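Stripped of the graph, the co-occurrence count is a nested loop over tweets: for every tweet carrying the target hashtag, each other hashtag on it scores one, just as each matched path scores one in the Cypher query. A minimal Python sketch, assuming each tweet's hashtags are already lower-cased as in the import; `co_occurring` is a hypothetical helper and the sample tweets are made up.

```python
from collections import Counter


def co_occurring(tag_sets, target, exclude=("oscon",), top=5):
    """Hypothetical helper: count hashtags appearing on the same tweet as `target`.

    tag_sets holds one set of lower-cased hashtags per tweet. Each
    (tweet, other hashtag) pair counts once, like each matched path
    in the Cypher query.
    """
    counts = Counter()
    for tags in tag_sets:
        if target in tags:
            for other in tags:
                if other != target and other not in exclude:
                    counts[other] += 1
    return counts.most_common(top)


# Made-up tweets, for illustration only.
sample_tag_sets = [
    {"oscon", "python", "java", "data"},
    {"python", "java", "data"},
    {"python", "java", "golang"},
    {"java", "golang"},
]
print(co_occurring(sample_tag_sets, "python"))  # [('java', 3), ('data', 2), ('golang', 1)]
```

As with `ORDER BY Count DESC LIMIT 5`, the relative order of hashtags with equal counts is not meaningful.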

Which other topics could we recommend to a specific user? We look for the hashtags that most frequently co-occur with the ones the user has tweeted, excluding any the user has already used.
MATCH (u:User {screen_name:"mojavelinux"})-[:POSTS]->(tweet)
    <-[:TAGS]-(tag1:Hashtag)-[:TAGS]->(tweet2)<-[:TAGS]-(tag2:Hashtag)
WHERE tag1.name <> 'oscon' AND tag2.name <> 'oscon'
AND NOT (u)-[:POSTS]->()<-[:TAGS]-(tag2)
RETURN tag2.name AS Topics, count(*) AS Count
ORDER BY Count DESC LIMIT 5

Topics       Count
graphdb      30
graphviz     24
rstats       21
alchemyjs    21
cassandra    21
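The recommendation counts paths of the form user → tweet ← tag1 → tweet2 ← tag2, so a hashtag scores higher the more ways it is reachable from the user's own hashtags. The Python sketch below walks the same paths over in-memory dictionaries. `recommend_topics` and its input shapes (a screen_name-to-tweet-ids map and a tweet-id-to-hashtags map) are assumptions made for illustration, and the data is made up.

```python
from collections import Counter


def recommend_topics(user, posts, tweet_tags, exclude=("oscon",), top=5):
    """Hypothetical helper: count paths user -> tweet <- tag1 -> tweet2 <- tag2.

    posts maps a screen_name to that user's tweet ids; tweet_tags maps a
    tweet id to its set of lower-cased hashtags. Hashtags the user has
    already used are never recommended.
    """
    # Reverse index so we can walk from tag1 to every tweet2 it tags.
    tweets_by_tag = {}
    for tid, tags in tweet_tags.items():
        for tag in tags:
            tweets_by_tag.setdefault(tag, []).append(tid)

    own_tweets = posts.get(user, [])
    used = set()
    for tid in own_tweets:
        used |= tweet_tags.get(tid, set())

    counts = Counter()
    for tid in own_tweets:
        for tag1 in tweet_tags.get(tid, set()):
            if tag1 in exclude:
                continue
            # tid2 may equal tid here; its hashtags are all in `used`,
            # so it adds nothing, matching the Cypher result.
            for tid2 in tweets_by_tag[tag1]:
                for tag2 in tweet_tags[tid2]:
                    # Also skips tag1 itself, since the user has used it.
                    if tag2 not in exclude and tag2 not in used:
                        counts[tag2] += 1
    return counts.most_common(top)


# Made-up data, for illustration only.
posts = {"alice": [1], "bob": [2, 3]}
tweet_tags = {1: {"oscon", "neo4j"}, 2: {"neo4j", "graphdb"}, 3: {"neo4j", "graphdb", "d3"}}
print(recommend_topics("alice", posts, tweet_tags))  # [('graphdb', 2), ('d3', 1)]
```

Here graphdb wins because two different tweets connect it to alice's #neo4j, while d3 is reachable through only one.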

Which tweet has been retweeted the most, and who posted it?
MATCH (:Tweet)-[:RETWEETS]->(t:Tweet)
WITH t, COUNT(*) AS Retweets
ORDER BY Retweets DESC
LIMIT 1
MATCH (u:User)-[:POSTS]->(t)
RETURN u.screen_name AS User, t.text AS Tweet, Retweets

User         Tweet                                         Retweets
andypiper    Wise words #oscon https://t.co/f4Jr9hnMcV     470
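The retweet query has three steps: aggregate the RETWEETS relationships per original tweet, keep only the top one, then look up who posted it. A small Python sketch of the same steps over made-up data; `most_retweeted` and its input shapes are hypothetical.

```python
from collections import Counter


def most_retweeted(retweets, author_of):
    """Hypothetical helper: (screen_name, tweet id, retweet count) for the top tweet.

    retweets holds (retweet id, original tweet id) pairs, one per
    RETWEETS relationship; author_of maps a tweet id to its poster.
    """
    counts = Counter(original for _, original in retweets)  # WITH t, COUNT(*)
    if not counts:
        return None
    tweet_id, n = counts.most_common(1)[0]  # ORDER BY Retweets DESC LIMIT 1
    # None if the author was never imported; the Cypher MATCH would then
    # return no row at all.
    return author_of.get(tweet_id), tweet_id, n


# Made-up data, for illustration only.
retweets = [(10, 1), (11, 1), (12, 2)]
author_of = {1: "alice", 2: "bob"}
print(most_retweeted(retweets, author_of))  # ('alice', 1, 2)
```

Limiting before the author lookup means only one author is fetched, which is why the Cypher query places its second MATCH after `LIMIT 1`.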

To test your own queries on this graph model, check out our GraphGist.

Graph Visualization

What makes this tweet graph interesting is that it contains implicit connections between users via their shared hashtags, mentions and links. This differs from the explicit “official” follower graph that Twitter maintains. Through these inferred connections we can discover new groups of people, or topics we might be interested in, so we wanted to show this aspect of our graph on the big screen. We wrote a tiny Python application that queries Neo4j for connections from people to hashtags and to the users they mention (skipping the tweets in between) and makes the data available to a JavaScript front-end. The query takes the 2000 most recent tweets (by descending tweet id), follows their paths to hashtags and mentioned users, and returns at most 1000 tuples, each connecting a user to a hashtag or to another user, to keep the visualization manageable.
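MATCH (t:Tweet)
WITH t ORDER BY t.id DESC LIMIT 2000
MATCH (user:User)-[:POSTS]->(t)
// hashtags and mentions are optional independently, so a tweet needs only one of them
OPTIONAL MATCH (t)<-[:TAGS]-(tag:Hashtag)
OPTIONAL MATCH (t)-[:MENTIONS]->(user2:User)
UNWIND [tag, user2] AS other
WITH DISTINCT user, other
// Hashtags have a name, users a screen_name; skip missing matches and #oscon itself
WHERE other IS NOT NULL
  AND toLower(coalesce(other.name, other.screen_name)) <> 'oscon'
RETURN { from: {id: id(user), label: head(labels(user)), data: user},
    rel: 'CONNECTS',
    to: {id: id(other), label: head(labels(other)), data: other}} AS tuple
LIMIT 1000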
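The post does not show the Python application itself. As a hedged sketch of its core step, the hypothetical `to_frontend_graph` function below assumes the query rows have already been fetched (for example with the official neo4j Python driver) and converted to plain dicts, and merges them into one nodes-and-links JSON document for the JavaScript front-end. The output shape is our assumption, not a format VivaGraphJS prescribes.

```python
import json


def to_frontend_graph(tuples):
    """Hypothetical helper: collect the query's `tuple` maps into nodes and links.

    Each node is emitted once, even though a user usually appears in many
    tuples. The {"nodes": ..., "links": ...} shape is an assumed format.
    """
    nodes = {}  # node id -> node entry
    links = []
    for row in tuples:
        for end in (row["from"], row["to"]):
            nodes.setdefault(end["id"], {"id": end["id"], "label": end["label"], "data": end["data"]})
        links.append({"from": row["from"]["id"], "to": row["to"]["id"], "rel": row["rel"]})
    return {"nodes": list(nodes.values()), "links": links}


# Rows shaped like the query result, already converted to plain dicts (made up).
alice = {"id": 1, "label": "User", "data": {"screen_name": "alice"}}
rows = [
    {"from": alice, "rel": "CONNECTS", "to": {"id": 7, "label": "Hashtag", "data": {"name": "python"}}},
    {"from": alice, "rel": "CONNECTS", "to": {"id": 2, "label": "User", "data": {"screen_name": "bob"}}},
]
graph = to_frontend_graph(rows)
payload = json.dumps(graph)  # what the Python app would serve to the browser
print(len(graph["nodes"]), len(graph["links"]))  # 3 2
```

Deduplicating nodes on the server keeps the payload small, since the same user typically appears in many of the 1000 tuples.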
The front-end uses VivaGraphJS, a WebGL-enabled graph rendering library, to render the Twitter activity graph of OSCON attendees. Nodes are drawn using the users’ Twitter profile images and representations of the hashtags.

Figure: Neo4j Twitter Graph Visualization