Finding Influencers in a Social Network

Introduction

In this GraphGist I analyse the interactions between users of a social network such as Twitter or Facebook.

Many social network examples focus on friends and recommendations. I will instead focus on how people use the network, to establish different types of behaviour and find influencers. This matters because, as the owner of a social network, you want to know who the power users are, and as a user you want to know who might be a good person to follow. You could also use this approach to identify shoppers on an e-commerce site who have influence through their reviews.

Model

The model for my graph looks like this:

social model
Figure 1. How popular is Alice?

Nodes

User: A user of a social network.

Message: A `Message` is `SENT` by a `User` and can have `FORWARD` and `REPLY_TO` relationships with other `Message`s.

Relationships

FOLLOWS: `User`s follow other `User`s.

SENT: `User`s send `Message`s to other `User`s.

FORWARD: A `Message` may be a forwarded version of another `Message`.

REPLY_TO: A `Message` may be a reply to another `Message`.

Setup
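
The queries in this gist assume a graph already loaded with `User` and `Message` nodes. As a minimal sketch for trying them out, the statement below creates a tiny, purely hypothetical dataset that follows the model above. The names, the `id` values, and the `day_sent` values are illustrative assumptions, and this data will not reproduce the results discussed in the text.

```cypher
// Hypothetical sample data: every name, id and day below is illustrative only
CREATE (alice:User {id:'Alice'}), (bridget:User {id:'Bridget'}),
       (mark:User {id:'Mark'}), (doug:User {id:'Doug'})
// Who follows whom
CREATE (bridget)-[:FOLLOWS]->(alice), (mark)-[:FOLLOWS]->(alice),
       (alice)-[:FOLLOWS]->(mark), (doug)-[:FOLLOWS]->(mark)
// Alice starts a conversation; Mark and Bridget reply; Alice replies to Mark
CREATE (alice)-[:SENT]->(m1:Message {id:'1', day_sent:'Monday'})
CREATE (mark)-[:SENT]->(m2:Message {id:'2', day_sent:'Monday'})-[:REPLY_TO]->(m1)
CREATE (alice)-[:SENT]->(m3:Message {id:'3', day_sent:'Tuesday'})-[:REPLY_TO]->(m2)
CREATE (bridget)-[:SENT]->(m4:Message {id:'4', day_sent:'Tuesday'})-[:REPLY_TO]->(m1)
// Doug only forwards: his single message is a forward of Mark's message
CREATE (doug)-[:SENT]->(m5:Message {id:'5', day_sent:'Wednesday'})-[:FORWARD]->(m2)
```

A forward is modelled here as a new `Message` sent by the forwarding user, with a `FORWARD` relationship pointing at the original, which is how the later queries traverse it.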

Use Case: List all Users and Messages

For our analysis, let’s begin simply by listing all of the `User`s and `Message`s:

MATCH path=(:User)-[:SENT]->(:Message)
RETURN path

As you can see, it is hard to spot any patterns from this view.

Use Case: Find User Counts

Remember that our goal is to find the influencers in the network. We could start with the simplest measure: the number of people who follow a user.

MATCH (targetUser:User)
OPTIONAL MATCH (follower:User)-[:FOLLOWS]->(targetUser)
OPTIONAL MATCH (targetUser)-[:FOLLOWS]->(following:User)
RETURN targetUser AS User, COUNT(DISTINCT follower) AS Followers, COUNT(DISTINCT following) AS Following

For a bit more information, we could list the names of the users that each user follows:

MATCH (p:User)-[f:FOLLOWS]->(p1:User)
RETURN p.id AS User, COLLECT(p1.id) AS Following

While this is interesting, it doesn’t tell us much about the actions of a user. The user may be inactive, or they may send multiple messages a day.

We can easily see how active users are with the following query:

MATCH (p:User)-[:SENT]->(tweet:Message)
RETURN p.id AS User, COUNT(tweet) AS Tweets

We can now get an idea of how active a user is, but let us dive deeper and see what sort of activity they have.

Use Case: Forwarded Messages as a Measure of Influence

One measure of influence is how often a message from a user gets forwarded throughout the network, so let’s find the most forwarded messages:

MATCH (retweet:Message)-[r:FORWARD]->(tweet:Message)
RETURN tweet, COUNT(r)
ORDER BY COUNT(r) DESC

We can restrict this to a certain day by limiting the messages we look at:

MATCH (retweet:Message)-[r:FORWARD]->(tweet:Message {day_sent:'Monday'})
RETURN tweet, COUNT(r)
ORDER BY COUNT(r) DESC

Remember that we are trying to find the influencers, so we need to know who sent those messages:

MATCH (retweet:Message)-[r:FORWARD]->(tweet:Message)<-[:SENT]-(p:User)
RETURN p.id AS User, COUNT(r) AS `Messages Retweeted`
ORDER BY COUNT(r) DESC
LIMIT 5

From this we can see that Bridget gets lots of her messages forwarded, but Mark’s message got more forwards.
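
The distinction between a user whose messages are forwarded often and a user with one heavily forwarded message can be made explicit in a single result. The query below is a sketch that reuses only the labels and relationships from the model; the column names are my own choice.

```cypher
// First count the forwards of each individual message...
MATCH (p:User)-[:SENT]->(tweet:Message)<-[r:FORWARD]-(:Message)
WITH p, tweet, COUNT(r) AS forwards
// ...then summarise per user: breadth (messages), volume (total), peak (single best message)
RETURN p.id AS User,
       COUNT(tweet) AS `Messages Forwarded`,
       SUM(forwards) AS `Total Forwards`,
       MAX(forwards) AS `Most Forwards for One Message`
ORDER BY `Total Forwards` DESC
```

A user with a high `Messages Forwarded` value is forwarded consistently, while a high `Most Forwards for One Message` with a low `Messages Forwarded` points to a single popular message.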

If you are a user of Twitter or a similar social network, you will be aware that there are lots of bots on Twitter that simply forward messages. We want to remove these bots from our analysis.

// Two MATCH clauses, so a forwarded message is also counted among all messages sent
MATCH (p:User)-[:SENT]->(tweet:Message)-[:FORWARD]->(:Message)
MATCH (p)-[:SENT]->(tweet2:Message)
WITH p, COUNT(DISTINCT tweet) AS forwards, COUNT(DISTINCT tweet2) AS messages
WHERE (forwards*1.00)/messages > 0.8
RETURN p.id AS `Potential Bot`, (forwards*1.00)/messages*100 AS `Percent Retweeted`
ORDER BY `Percent Retweeted` DESC

As you can see, Doug only forwards messages so is probably a bot. To get a better idea of influence we need to remove him and any other bots from the analysis:

MATCH (p:User)-[:SENT]->(tweet:Message)-[:FORWARD]->(:Message)
MATCH (p)-[:SENT]->(tweet2:Message)
WITH p, COUNT(DISTINCT tweet) AS forwards, COUNT(DISTINCT tweet2) AS messages
WHERE (forwards*1.00)/messages <= 0.8
WITH p
MATCH (p)-[:SENT]->(tweet:Message)-[:FORWARD]->(:Message)<-[:SENT]-(p1:User)
RETURN p1.id AS User, COUNT(tweet) AS `Retweeted Messages`
ORDER BY COUNT(tweet) DESC
LIMIT 15

Note that we now keep only users for whom forwards make up 80% or less of their messages, which is the complement of the bot filter above.
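
The 80% cut-off is a judgment call, so it can help to look at the forwarding ratio of every user before settling on a threshold. The following sketch lists that ratio for all users who have sent at least one message, including those who never forward; the column names are assumptions of mine.

```cypher
// Every sent message, with its FORWARD relationship if it has one
MATCH (p:User)-[:SENT]->(m:Message)
OPTIONAL MATCH (m)-[f:FORWARD]->(:Message)
// CASE yields null for non-forwards, and COUNT ignores nulls
WITH p,
     COUNT(DISTINCT m) AS messages,
     COUNT(DISTINCT CASE WHEN f IS NOT NULL THEN m END) AS forwards
RETURN p.id AS User, messages, forwards,
       round(100.0 * forwards / messages) AS `Percent Forwarded`
ORDER BY `Percent Forwarded` DESC
```

If the percentages cluster clearly into a high group and a low group, any threshold between the two groups separates likely bots from other users equally well.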

As you can see this shows a slightly different picture, as Mark only had messages forwarded by bots. The reason I want to remove the forwarders from the analysis is that a human forwarding will do some filtering and only forward things they like.

We now have a couple of measures of influence, based on follower count and how many forwards a user gets.
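
To compare these two measures side by side, both counts can be returned per user. This is only a sketch built from the patterns used earlier; it counts every forward, including those made by likely bots, so the bot filter would still need to be applied for a cleaner picture.

```cypher
MATCH (u:User)
// OPTIONAL MATCH keeps users who have no followers or no forwards (counted as 0)
OPTIONAL MATCH (follower:User)-[:FOLLOWS]->(u)
WITH u, COUNT(DISTINCT follower) AS followers
OPTIONAL MATCH (u)-[:SENT]->(:Message)<-[r:FORWARD]-(:Message)
RETURN u.id AS User, followers AS Followers, COUNT(r) AS Forwards
ORDER BY Forwards DESC, Followers DESC
```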

There is a third measure that I want to investigate: how often a user starts a conversation or discussion, and among how many people.

Use Case: Conversations as a measure of influence?

Finding conversations is a good measure of influence, as it shows people want to engage with that user.

To begin this analysis, let’s start by getting a list of conversations. Note that I have restricted the length of the conversation path to 10 replies; you may want to extend this for your use case.

MATCH p=(tweet:Message)-[:REPLY_TO*1..10]->(conversation:Message)
RETURN p
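
One way to check whether the upper bound of 10 is adequate for your data is to look at the longest reply chain found for each conversation. This is a sketch under the same model; the column names are my own.

```cypher
// Paths from any reply up to the message that started the conversation
MATCH p=(leaf:Message)-[:REPLY_TO*1..10]->(root:Message)
WHERE NOT (root)-[:REPLY_TO]->()
// length(p) is the number of REPLY_TO relationships in the path
RETURN root.id AS Conversation, MAX(length(p)) AS `Longest Reply Chain`
ORDER BY `Longest Reply Chain` DESC
```

If any conversation reports a longest chain of exactly 10, the bound is probably cutting off deeper replies and is worth raising.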

We can restrict this to a single conversation:

MATCH (tweet:Message {id:'20'})<-[:REPLY_TO*0..10]-(conversation:Message)
RETURN DISTINCT(conversation) AS Conversation
ORDER BY conversation.day_sent

Note the DISTINCT(conversation), which will ensure we only get one of each message in our response.

Now that we have a list of our conversations, let us dive deeper.

Get a list of messages that start a conversation, that is, messages that someone has replied to and that are not themselves replies:

MATCH (tweet:Message)-[r:REPLY_TO]->(conversation:Message)
WHERE NOT (conversation)-[:REPLY_TO]->()
RETURN DISTINCT conversation

Then find out who sent the messages that started each conversation:

MATCH (tweet:Message)-[:REPLY_TO]->(conversation:Message)<-[s:SENT]-(p:User)
WHERE NOT (conversation)-[:REPLY_TO]->()
RETURN DISTINCT conversation, p.id AS `Conversation Starter`

Build on this to get a list of the users that a user engages with and responds to, as a two-way exchange shows that there is more than a shallow 'Follow' relationship:

MATCH conv=(b:User)-[:SENT]->(tweet:Message)-[:REPLY_TO]->(tweet1:Message)<-[:SENT]-(a:User)-[:SENT]->(tweet2:Message)-[:REPLY_TO]->(tweet)
RETURN a.id AS `Conversationalist 1`, b.id AS `Conversationalist 2`, COUNT(DISTINCT conv) AS Conversations
ORDER BY a.id ASC

We also want to get an idea of how large the conversations are and how many people are involved in them. A long conversation involving lots of people shows more signs of influence than a short conversation between a couple of people.

MATCH (tweet:Message)-[r:REPLY_TO]->(conversation:Message)<-[s:SENT]-(p:User)
WHERE NOT (conversation)-[:REPLY_TO]->()
WITH DISTINCT conversation,p
MATCH conv=(participant:User)-[:SENT]->(tweet:Message)-[:REPLY_TO*0..10]->(conversation)
RETURN conversation, p, COUNT(DISTINCT participant) AS `Distinct Participants`

Finally, modify the query again to add the number of conversations started by each user, counting only conversations with more than two participants:

MATCH (tweet:Message)-[r:REPLY_TO]->(conversation:Message)<-[s:SENT]-(p:User)
WHERE NOT (conversation)-[:REPLY_TO]->()
WITH DISTINCT conversation,p
MATCH conv=(participant:User)-[:SENT]->(tweet:Message)-[:REPLY_TO*0..10]->(conversation)
WITH conversation, p, COUNT(DISTINCT tweet) AS messageCount, COUNT(DISTINCT participant) AS participantCount
WHERE participantCount > 2
RETURN p.id, COUNT(p) AS `Conversations`, AVG(messageCount) AS `Average Length`, AVG(participantCount) AS `Average Participants`

As you can see, Alice starts more conversations, but the conversation Mark started had more engagement. You will need to determine yourself which of these has greater influence in your network.

Conclusion

As you can see, Neo4j is a powerful tool for analysing social networks and you can use some of the values above to observe who the influencers are in your network.