Personality Prediction based on Pattern Analysis on Social Media


On August 10th, 2015, Coby Persin uploaded a video to YouTube titled "The Dangers Of Social Media (Child Predator Social Experiment) Girl Edition!". In the video he posed as a 15-year-old boy named Jason and sent friend requests to three girls aged 14, 13 and 12. He chatted with them for 3-4 days and then asked each of them to meet up. One girl agreed to meet him at the park, another at her house, and the third agreed to be picked up by Jason's brother so that they could meet somewhere else. Watch the video to see what happened.

This video inspired us. The more we searched online, the more we discovered that such events occur more frequently than we expected. We wanted to build a model that analyses the behavioural patterns of people on social media and alerts the relevant personnel when it finds similar patterns.

Problem Statement

In our example we try to find accounts that exhibit such behaviour by running a script against the social media site's database. Specifically, we try to pick out male accounts that exhibit such tendencies. The traits that characterise such activity are:

  1. The account has only one display image.

  2. The email address used to create the account is never used for public communication and does not appear on any professional network (such as LinkedIn).

  3. The culprit and the prey have no mutual friends.

  4. The culprit's friends usually have similar profiles and usually live abroad.

  5. The culprit's account was created recently, usually less than a week before the friend request was sent.

  6. The culprit's conversations with his victims usually revolve around sex, drugs, money, phone numbers, nudity, etc., and these words come up more often than when the victim chats with her friends.

The points listed above fall broadly into graph problems and non-graph problems. Points 3 and 4 are graph problems, so that data is represented as nodes and relationships. The rest are non-graph problems and are represented as properties on the nodes.
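As a rough illustration of this split, the sketch below uses plain Python rather than Cypher so that it runs without a database. Friendships are kept in an adjacency map (the graph part), while the other criteria become fields on a record (the property part). The `Person` class, its field names, the country codes and the sample accounts are all assumptions made for this example, not the schema used in the actual database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Non-graph criteria kept as plain properties (field names are hypothetical)."""
    userid: str
    sex: str
    number_of_pics: int
    country: str


# Graph criteria: undirected friendships stored as {userid: set of friend ids}.
friends = {}


def add_friendship(a, b):
    """Record an undirected FRIENDS_WITH edge between two user ids."""
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)


def mutual_friends(a, b):
    """Point 3: accounts that are friends with both a and b."""
    return friends.get(a, set()) & friends.get(b, set())


def percent_friends_abroad(userid, home_country, people):
    """Point 4: share of userid's friends living outside home_country."""
    own = friends.get(userid, set())
    if not own:
        return 0.0
    abroad = sum(1 for f in own if people[f].country != home_country)
    return 100.0 * abroad / len(own)


# Made-up sample data; "X" and "Y" are placeholder country codes.
people = {
    "alice": Person("alice", "female", 12, "X"),
    "bob": Person("bob", "male", 8, "X"),
    "carol": Person("carol", "female", 5, "X"),
    "stranger": Person("stranger", "male", 1, "Y"),
    "s2": Person("s2", "male", 1, "Y"),
    "s3": Person("s3", "male", 1, "Y"),
}
for a, b in [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
             ("alice", "stranger"), ("stranger", "s2"), ("stranger", "s3")]:
    add_friendship(a, b)

print(mutual_friends("alice", "bob"))       # {'carol'}
print(mutual_friends("alice", "stranger"))  # set()
```

Set intersection on the adjacency map plays the role that a two-hop FRIENDS_WITH pattern plays in the graph database.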

The Model: Nodes & Relationships

The model revolves around the Person node, which represents an account on a social media website.

Figure 1. Person Node

Each Conversation between Persons is represented via a Conversation Node. This node contains the conversation exchanges between two Persons.

Figure 2. Conversation Node

Each conversation has a [HAD_A] relation with two Person Nodes.

Relationship between Person and Conversation Nodes


To represent point 2, we have another node called PublicEmails, which simply holds a set of valid email IDs. In a real-world scenario this would be a third-party service outside the social media app, so in our example it is represented by a single node containing valid emails.

Figure 3. Public Email Node

Similarly, a node called MaliciousWords helps us represent point 6.

Figure 4. Malicious Words Node

Cypher & Usage

In our example we have broadly two types of Person Nodes:

  • Normal, average users of a social media website.

  • People who exploit social media for malicious activities. In our example all these people have names starting with Creep.

Also, the current model targets male Creeps who prey on females.

Setup the Data

The query below inserts all the required nodes and relationships into the database.

The Creepiness Index

The points listed initially are covered by the two queries below. The first query calculates the Creepiness Index. It runs over every conversation a female has with a male and calculates how many malicious words were used relative to the total words exchanged. A Creep's intentions are never good, and this is reflected in the conversations he has with his social media friends: they mostly revolve around sex, drugs, flattery, etc., and he uses such words more often than they occur in a normal conversation between friends.

Creepiness Index
The Creepiness Index is the number of malicious words used per conversation.
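A minimal sketch of this calculation, written in plain Python rather than Cypher so that it runs without the database. The text describes the index both as a count of malicious words per conversation and as a share of the total words exchanged, so the sketch returns both. The word list, the tokenisation rule and the sample message are assumptions for illustration, since the contents of the MaliciousWords node are not shown here.

```python
import re

# Hypothetical stand-in for the contents of the MaliciousWords node.
MALICIOUS_WORDS = {"sex", "drugs", "money", "nudity"}


def creepiness(conversation, malicious_words=MALICIOUS_WORDS):
    """Return (malicious_count, ratio) for the text of one conversation.

    Words are lower-cased runs of letters and apostrophes;
    ratio is malicious_count divided by the total number of words.
    """
    words = re.findall(r"[a-z']+", conversation.lower())
    if not words:
        return 0, 0.0
    hits = sum(1 for w in words if w in malicious_words)
    return hits, hits / len(words)


count, ratio = creepiness("Do you have money? I can bring drugs")
print(count, ratio)  # 2 0.25
```

Normalising by the total word count keeps a long, harmless conversation from scoring higher than a short one that consists mostly of flagged words.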
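Cypher to Calculate the Creepiness Index of a Friend
// Pair each female account with each of her male friends.
MATCH (n:Person {sex:"female"})-[r:FRIENDS_WITH]-(m:Person {sex:"male"}), (pubEmails:PublicEmails)
// Mutual friends. ASSUMPTION: the source never binds f; this pattern is one reading of "mutual friend".
OPTIONAL MATCH (n)-[:FRIENDS_WITH]-(f)-[:FRIENDS_WITH]-(m)
WITH n, m, r, pubEmails, count(DISTINCT f) AS fof_count
// Collect all friends of the suspect.
MATCH (m)-[:FRIENDS_WITH]-(x)
WITH n, m, r, pubEmails, fof_count, collect(x) AS friends
WITH n, m, r, fof_count, size(friends) AS total,
     // ASSUMPTION: the operands of <> are missing in the source; `country` is a hypothetical property name.
     size([other IN friends WHERE other.country <> n.country]) AS aliens,
     size([w IN pubEmails.emails WHERE m.userid = w]) AS listedinPublic
RETURN n.userid AS target, m.userid AS potentialSuspect, fof_count AS noOfMutualFriends,
       toFloat(aliens * 100) / total AS percentofNonLocalFriends,
       (listedinPublic = 0) AS emailNotListed,
       (m.numberOfPics = 1) AS singleProfilePic,
       // 864000000 ms = 10 days
       (toFloat(r.friends_since) - toFloat(m.accountcreationdate)) < 864000000 AS friendsLessThan10Days
ORDER BY target, noOfMutualFriends ASC, percentofNonLocalFriends DESC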

Potential Creeps

The second query is centred on the authenticity of the Creep's social media profile and selects potential Creeps. It takes every male friend of a female account, with those who share no common friends listed first, and checks the following for each:

  • Whether the account has only one profile picture.

  • The mutual friend count.

  • The percentage of friends who are not residents of the country where the female lives.

  • Whether the friend request was sent less than 10 days after the account was created.

  • Whether the email used to create the account is used publicly or on any professional website.

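Cypher to segregate fake accounts from actual user accounts
// Pair each female account with each of her male friends.
MATCH (n:Person {sex:"female"})-[r:FRIENDS_WITH]-(m:Person {sex:"male"}), (pubEmails:PublicEmails)
// Mutual friends. ASSUMPTION: the source never binds f; this pattern is one reading of "mutual friend".
OPTIONAL MATCH (n)-[:FRIENDS_WITH]-(f)-[:FRIENDS_WITH]-(m)
WITH n, m, r, pubEmails, count(DISTINCT f) AS fof_count
// Collect all friends of the suspect.
MATCH (m)-[:FRIENDS_WITH]-(x)
WITH n, m, r, pubEmails, fof_count, collect(x) AS friends
WITH n, m, r, fof_count, size(friends) AS total,
     // ASSUMPTION: the operands of <> are missing in the source; `country` is a hypothetical property name.
     size([other IN friends WHERE other.country <> n.country]) AS aliens,
     size([w IN pubEmails.emails WHERE m.userid = w]) AS listedinPublic
RETURN n.userid AS target, m.userid AS potentialSuspect, fof_count AS noOfMutualFriends,
       toFloat(aliens * 100) / total AS percentofNonLocalFriends,
       (listedinPublic = 0) AS emailNotListed,
       (m.numberOfPics = 1) AS singleProfilePic,
       // 864000000 ms = 10 days
       (toFloat(r.friends_since) - toFloat(m.accountcreationdate)) < 864000000 AS friendsLessThan10Days
ORDER BY target, noOfMutualFriends ASC, percentofNonLocalFriends DESC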
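The per-account checks in this query can also be sketched outside the database. The plain-Python version below mirrors the returned columns for one target and suspect pair. The threshold 864000000 is 10 days in milliseconds (10 × 24 × 60 × 60 × 1000), which assumes the timestamps are stored as epoch milliseconds. The function name, argument names and sample values are hypothetical.

```python
TEN_DAYS_MS = 10 * 24 * 60 * 60 * 1000  # 864000000, the constant used in the query


def profile_checks(number_of_pics, email, public_emails,
                   account_created_ms, friends_since_ms,
                   friend_countries, target_country):
    """Compute the query's boolean and percentage columns for one pair."""
    total = len(friend_countries)
    non_local = sum(1 for c in friend_countries if c != target_country)
    return {
        "percentofNonLocalFriends": 100.0 * non_local / total if total else 0.0,
        "emailNotListed": email not in public_emails,
        "singleProfilePic": number_of_pics == 1,
        "friendsLessThan10Days":
            (friends_since_ms - account_created_ms) < TEN_DAYS_MS,
    }


# Made-up example: the account was created 3 days before the friendship began.
DAY_MS = 24 * 60 * 60 * 1000
flags = profile_checks(
    number_of_pics=1,
    email="jason@example.com",
    public_emails={"alice@example.com"},
    account_created_ms=100 * DAY_MS,
    friends_since_ms=103 * DAY_MS,
    friend_countries=["A", "B", "B", "B"],  # placeholder country codes
    target_country="A",
)
print(flags)
```

In this made-up case every flag is raised and three of the four friends are non-local, which is the combination the checks above are meant to surface.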

Query1 Output & Observations

From the query output below we can see that the Creep1 and CreepFriend6 accounts show higher values of CreepyConversationIndex. Creep1 shows similar behaviour towards another female as well, so this account can be flagged as a potential threat.

CreepFriend6 should also be kept under watch because his CreepyConversationIndex is high. However, he has had few conversations and there are currently no signs of recurrence, so he cannot be flagged as a potential threat yet.
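The reasoning above (a high index towards several targets means flag, a high index towards a single target means watch) could be expressed as a simple rule. The following plain-Python sketch is only one possible formulation: the threshold value, the three labels and the sample rows are assumptions, and the numbers are invented rather than taken from the query output.

```python
from collections import defaultdict

THRESHOLD = 0.10  # hypothetical cut-off for a "high" index


def triage(rows, threshold=THRESHOLD):
    """Classify suspects from (suspect, target, index) rows.

    'flag'  : high index with two or more distinct targets (recurrence)
    'watch' : high index with exactly one target
    'ok'    : no conversation reaches the threshold
    """
    high_targets = defaultdict(set)
    suspects = set()
    for suspect, target, index in rows:
        suspects.add(suspect)
        if index >= threshold:
            high_targets[suspect].add(target)

    result = {}
    for suspect in suspects:
        n = len(high_targets[suspect])
        result[suspect] = "flag" if n >= 2 else "watch" if n == 1 else "ok"
    return result


# Invented rows for illustration only.
rows = [
    ("suspectA", "target1", 0.30),
    ("suspectA", "target2", 0.25),
    ("suspectB", "target1", 0.20),
    ("friendC", "target1", 0.01),
]
print(triage(rows))
```

Counting distinct targets rather than conversations means that repeated chats with the same person do not by themselves escalate an account from watch to flag.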

Figure 5. Query1 Output

Query2 Output and Observations

The output of Query2 bolsters our findings from Query1. Creep1 has few or no mutual friends with his potential targets, and the percentage of his non-local friends is relatively high. His email address is not used on any other publicly available domain or professional social network. Combined with the single profile picture and the gap of less than 10 days between account creation and befriending, this provides more evidence that Creep1 is a malicious account and should be monitored as a potential threat.

Figure 6. Query2 Output


This model is still at a nascent stage and is not fully equipped to flag malicious accounts. It can be extended to flag females preying on males, and it can also be used in completely different contexts such as terrorist activities, human trafficking, prostitution, etc. The rule set may vary from context to context. For example, the Creepy Conversation Index may be higher in First World countries than in Third World countries, because the malicious words may be used as slang or as part of normal conversation.

This model will not make social media completely safe and will not eradicate such activities, but it can help mitigate the risks they pose.

As C.J. Roberts said:
People often believed they were safer in the light, thinking monsters only came out at night. But safety – like light – is a façade.