OSCON Twitter Graph

As part of Neo4j’s community engagement around OSCON, we wanted to look at the social media activity of the attendees on Twitter. Using the Twitter Search API to search for mentions of “OSCON”, we set out to create a graph of Users, Tweets, Hashtags and shared Links.

[Figure: OSCON Twitter Graph Model]

The Twitter Search API returns a list of tweets matching a supplied search term. We populated the graph model shown above by representing the results as nodes and relationships, using Neo4j’s query language, Cypher.

We designed a single Cypher query to import the tweets into the graph model in Neo4j. The query takes a single parameter that contains all of the tweets returned from Twitter’s Search API. Using the UNWIND clause, we pivot the collection of tweets into a set of rows, one per tweet, which are then structured into the graph model outlined in the figure.
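// Turn the collection of tweets into one row per tweet
UNWIND {tweets} AS t
// Create each tweet once, keyed by its Twitter id
MERGE (tweet:Tweet {id:t.id})
SET tweet.text = t.text,
    tweet.created_at = t.created_at,
    tweet.favorites = t.favorite_count
// Create the posting user once, keyed by screen name
MERGE (user:User {screen_name:t.user.screen_name})
SET user.profile_image_url = t.user.profile_image_url
MERGE (user)-[:POSTS]->(tweet)
// Lower-case hashtag names so that #OSCON and #oscon share one node
FOREACH (h IN t.entities.hashtags |
    MERGE (tag:Hashtag {name:LOWER(h.text)})
    MERGE (tag)-[:TAGS]->(tweet)
)
// … source, mentions, links, retweets, ...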
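To make the shape of the tweets parameter concrete, here is a minimal Python sketch of how search results could be trimmed down to the fields the import query reads. It is an illustration, not the code we ran: the helper name to_tweet_params and the sample data are hypothetical, and it assumes a v1.1-style Search API response in which the matching tweets sit under a "statuses" key.

```python
def to_tweet_params(statuses):
    """Keep only the fields that the import query reads from each tweet."""
    tweets = []
    for status in statuses:
        user = status["user"]
        hashtags = status.get("entities", {}).get("hashtags", [])
        tweets.append({
            "id": status["id"],                      # MERGE key for :Tweet
            "text": status.get("text"),
            "created_at": status.get("created_at"),
            "favorite_count": status.get("favorite_count", 0),
            "user": {
                "screen_name": user["screen_name"],  # MERGE key for :User
                "profile_image_url": user.get("profile_image_url"),
            },
            "entities": {"hashtags": [{"text": h["text"]} for h in hashtags]},
        })
    return tweets


# Hypothetical sample response, reduced to what the sketch needs.
sample_response = {
    "statuses": [
        {
            "id": 1,
            "text": "Hello #OSCON #Python",
            "created_at": "Mon Jul 21 17:00:00 +0000 2014",
            "favorite_count": 3,
            "lang": "en",  # not used by the import query, so it is dropped
            "user": {"screen_name": "example_user",
                     "profile_image_url": "http://example.com/avatar.png"},
            "entities": {"hashtags": [{"text": "OSCON"}, {"text": "Python"}]},
        }
    ]
}

# One parameter map carrying every tweet, matching {tweets} in the query.
params = {"tweets": to_tweet_params(sample_response["statuses"])}
print(params["tweets"][0]["entities"]["hashtags"])
```

The resulting map would then be handed to whatever Neo4j client is in use as the query's parameters. Dropping unused fields keeps the payload small, but passing the raw statuses would work with the query as well.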
We polled the Twitter API at a regular interval and ran this Cypher query on each batch of results, expanding our graph with every search. At the time of writing, we have imported the following data:

| Label | Count |
|---|---|
| Tweet | 10653 |
| User | 4910 |
| Link | 1153 |
| Hashtag | 742 |
| Source | 175 |

With this, we are able to answer many interesting questions about Twitter users at OSCON. For example, which platform are users tweeting from most often?
MATCH (t:Tweet)-[:USING]->(s:Source)
RETURN s.name as Source, count(t) as Count
ORDER BY Count DESC
LIMIT 5

| Source | Count |
|---|---|
| Twitter Web Client | 2294 |
| Twitter for iPhone | 1712 |
| Twitter for Android | 1590 |
| TweetDeck | 877 |
| Hootsuite | 668 |

Which hashtags co-occur with #python most frequently?
MATCH (:Hashtag {name:'python'})-[:TAGS]->(:Tweet)<-[:TAGS]-(h:Hashtag)
WHERE h.name <> 'oscon'
RETURN h.name AS Hashtag, COUNT(*) AS Count
ORDER BY Count DESC
LIMIT 5

| Hashtag | Count |
|---|---|
| java | 7 |
| opensource | 5 |
| data | 5 |
| golang | 5 |
| nodejs | 5 |

Which other topics could we recommend to a specific user? We look for the hashtags that co-occur most frequently with the ones the user has tweeted, excluding any hashtags the user has already used.
MATCH (u:User {screen_name:"mojavelinux"})-[:POSTS]->(tweet)
    <-[:TAGS]-(tag1:Hashtag)-[:TAGS]->(tweet2)<-[:TAGS]-(tag2:Hashtag)
WHERE tag1.name <> 'oscon' AND tag2.name <> 'oscon'
AND NOT (u)-[:POSTS]->()<-[:TAGS]-(tag2)
RETURN tag2.name as Topics, count(*) as Count
ORDER BY count(*) DESC LIMIT 5

| Topics | Count |
|---|---|
| graphdb | 30 |
| graphviz | 24 |
| rstats | 21 |
| alchemyjs | 21 |
| cassandra | 21 |
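The reasoning behind the recommendation query can also be followed on a small in-memory example. The Python sketch below roughly mirrors the path the query walks (user's tweet, its hashtag, another tweet with that hashtag, a second hashtag on that tweet). The function name recommend_topics and the toy data are hypothetical and are not part of our application.

```python
from collections import Counter


def recommend_topics(user_tweets, tweet_tags, limit=5, ignore=("oscon",)):
    """Rank hashtags that co-occur with a user's hashtags but that the user has not used."""
    # Invert tweet -> tags into tag -> tweets so we can hop from a tag to its tweets.
    tag_tweets = {}
    for tweet, tags in tweet_tags.items():
        for tag in tags:
            tag_tweets.setdefault(tag, set()).add(tweet)

    # Every hashtag the user has already tweeted (these are never recommended).
    used = {tag for tweet in user_tweets for tag in tweet_tags.get(tweet, ())}

    counts = Counter()
    for tweet in user_tweets:
        for tag1 in tweet_tags.get(tweet, ()):
            if tag1 in ignore:
                continue
            # Other tweets that share tag1 with the user's tweet.
            for tweet2 in tag_tweets[tag1] - {tweet}:
                for tag2 in tweet_tags[tweet2]:
                    if tag2 != tag1 and tag2 not in ignore and tag2 not in used:
                        counts[tag2] += 1
    return counts.most_common(limit)


# Toy data: tweet id -> hashtags on that tweet. Tweet 1 belongs to our user.
tweet_tags = {
    1: {"oscon", "asciidoc"},
    2: {"asciidoc", "graphdb"},
    3: {"asciidoc", "graphdb", "oscon"},
    4: {"oscon", "java"},
    5: {"asciidoc", "docs"},
}

print(recommend_topics([1], tweet_tags))  # [('graphdb', 2), ('docs', 1)]
```

Note that java is not recommended here: it only shares the ignored oscon hashtag with the user's tweet, which is why the query filters out 'oscon' on both sides of the path.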

Which tweet has been retweeted the most, and who posted it?
MATCH (:Tweet)-[:RETWEETS]->(t:Tweet)
WITH t, COUNT(*) AS Retweets
ORDER BY Retweets DESC
LIMIT 1
MATCH (u:User)-[:POSTS]->(t)
RETURN u.screen_name AS User, t.text AS Tweet, Retweets

| User | Tweet | Retweets |
|---|---|---|
| andypiper | Wise words #oscon http://t.co/f4Jr9hnMcV | 470 |

To test your own queries on this graph model, check out our GraphGist.

Graph Visualization

The interesting aspect of this tweet graph is that it contains the implicit connections between users via their shared hashtags, mentions and links. This graph differs from the “official” followers graph that Twitter makes explicit. Via the inferred connections we can discover new groups of people or topics we could be interested in.

We wanted to visualize this aspect of our graph on the big screen, so we wrote a tiny Python application that queries Neo4j for connections between people and tags (skipping the tweets in between) and makes the data available to a JavaScript front-end. The query analyzes the 2000 most recent tweets, follows the paths to tags and mentioned users, and returns up to 1000 tuples of users connected to a tag or another user, which keeps the visualization manageable.
MATCH (t:Tweet)
WITH t ORDER BY t.id DESC LIMIT 2000
MATCH (user:User)-[:POSTS]->(t)<-[:TAGS]-(tag:Hashtag)
MATCH (t)-[:MENTIONS]->(user2:User)  
UNWIND [tag,user2] as other WITH distinct user,other
WHERE lower(other.name) <> 'oscon'  
RETURN { from: {id:id(user),label: head(labels(user)), data: user},
    rel: 'CONNECTS',
    to: {id: id(other), label: head(labels(other)), data: other}} as tuple
LIMIT 1000
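As an illustration of the step between this query and the browser, the following Python sketch collapses the returned tuples into unique nodes plus a list of links and serializes them as JSON. The nodes/links layout, the function name tuples_to_graph and the sample rows are assumptions made for this example, not necessarily the format our application sends to the front-end.

```python
import json


def tuples_to_graph(tuples):
    """Collapse (from, rel, to) tuples into unique nodes plus a list of links."""
    nodes = {}
    links = []
    for row in tuples:
        for end in (row["from"], row["to"]):
            # A user or tag appears in many tuples; keep one node per Neo4j id.
            nodes.setdefault(end["id"], end)
        links.append({
            "source": row["from"]["id"],
            "target": row["to"]["id"],
            "type": row["rel"],
        })
    return {"nodes": list(nodes.values()), "links": links}


# Hypothetical rows in the shape returned by the query above.
rows = [
    {"from": {"id": 1, "label": "User", "data": {"screen_name": "alice"}},
     "rel": "CONNECTS",
     "to": {"id": 10, "label": "Hashtag", "data": {"name": "graphdb"}}},
    {"from": {"id": 1, "label": "User", "data": {"screen_name": "alice"}},
     "rel": "CONNECTS",
     "to": {"id": 2, "label": "User", "data": {"screen_name": "bob"}}},
]

graph = tuples_to_graph(rows)
payload = json.dumps(graph)  # what a front-end could fetch over HTTP
print(len(graph["nodes"]), len(graph["links"]))  # 3 2
```

Deduplicating by node id matters because the same user shows up once per connected tag or mention, and a renderer should draw that user only once.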
The front-end then uses VivaGraphJS, a WebGL-enabled graph rendering library, to render the Twitter activity graph of OSCON attendees. We use the Twitter profile images and hashtag representations to visualize nodes.

[Figure: Neo4j Twitter Graph Visualization]

 

About the Author

Kenny Bastani, Developer Relations

Kenny Bastani is a passionate technology evangelist and an open source software advocate in Silicon Valley. As an enterprise software consultant he has applied a diverse set of skills needed for projects requiring a full stack web developer in agile mode.

As a passionate advocate for the popular graph database Neo4j, Kenny has supported developers from globally recognized companies who have inserted the NoSQL database inside their technology stack. As a passionate blogger and open source contributor, Kenny engages a community of passionate developers who are looking to take advantage of newer graph processing techniques to analyze data.


1 Comment

Hi,

I appreciate your work and suggestions a lot, but I have a somewhat different goal:

I need to build an interactive graph visualization where users can:

– see the result of a query (plain or complex, maybe 10 to 500 nodes);
– navigate the graph, moving one node and all of its related nodes out of the clutter;
– click on a node to load new related ones;
– filter nodes by label (one or more) or by some combination of criteria;

But the most important thing is the ability to do the following graphically (better than Neoclipse, to be honest ;-):
– create a new node and add relationships to one or more existing ones;
– create new relationships between existing nodes;

Do you have any ideas for a product or library that meets these requirements?

Paolo
