Exploring Stackoverflow
Every developer has a tab open Stack Overflow, the massively popular question-and-answer site. This dataset explores questions, answers, comments, and tags related to the neo4j
- and cypher
- tagged questions on Stack Overflow.
Load JSON Import
Update this dataset using apoc.load.json
. Run the query as-is or add a tag to the tags
array.
// look for several pages of questions
WITH ["neo4j","cypher"] as tags
UNWIND tags as tagName
UNWIND range(1,2) as page // careful with throttling
WITH "https://api.stackexchange.com/2.3/questions?page="+page+"&pagesize=100&order=desc&sort=creation&tagged="+tagName+"&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" as url
CALL apoc.load.json(url) YIELD value
CALL apoc.util.sleep(250)
UNWIND value.items AS q
// create the questions
MERGE (question:Question {uuid:q.question_id})
ON CREATE SET question.title = q.title,
question.link = q.share_link,
question.creation_date = q.creation_date,
question.accepted_answer_id=q.accepted_answer_id,
question.view_count=q.view_count,
question.answer_count=q.answer_count,
question.body_markdown=q.body_markdown
// who asked the question
MERGE (owner:User {uuid:coalesce(q.owner.user_id,'deleted')})
ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)
// what tags do the questions have
FOREACH (tagName IN q.tags |
MERGE (tag:Tag {name:tagName})
ON CREATE SET tag.link = "https://stackoverflow.com/questions/tagged/" + tag.name
MERGE (question)-[:TAGGED]->(tag))
// who answered the questions?
FOREACH (a IN q.answers |
MERGE (question)<-[:ANSWERED]-(answer:Answer {uuid:a.answer_id})
ON CREATE SET answer.is_accepted = a.is_accepted,
answer.link=a.share_link,
answer.title=a.title,
answer.body_markdown=a.body_markdown,
answer.score=a.score,
answer.favorite_score=a.favorite_score,
answer.view_count=a.view_count
MERGE (answerer:User {uuid:coalesce(a.owner.user_id,'deleted')})
ON CREATE SET answerer.display_name = a.owner.display_name
MERGE (answer)<-[:PROVIDED]-(answerer)
)
FOREACH (c in q.comments |
MERGE (question)<-[:COMMENTED_ON]-(comment:Comment {uuid:c.comment_id})
ON CREATE SET comment.link=c.link, comment.score=c.score
MERGE (commenter:User {uuid:coalesce(c.owner.user_id,'deleted')})
ON CREATE SET commenter.display_name = c.owner.display_name
MERGE (comment)<-[:COMMENTED]-(commenter)
);
Read More: Import from StackOverflow API
Top Tags:
MATCH (q:Question)-[:TAGGED]->(t:Tag)
RETURN t.name, count(q) AS questions
ORDER BY questions DESC
LIMIT 5;
Exploring Users
Top users asking questions:
MATCH (u:User)-[:ASKED]->(q:Question)
RETURN u.display_name, count(*) AS questions
ORDER by questions DESC
LIMIT 10;
Top users answering:
MATCH (u:User)-[:PROVIDED]->(a:Answer)-[:ANSWERED]->(q:Question)
RETURN u.display_name as user,COUNT(a) AS answers
ORDER BY answers DESC LIMIT 10;
Shortest path between users:
MATCH path = allShortestPaths((u1:User {display_name:"alexanoid"})-[*]-(u2:User {display_name:"InverseFalcon"})
)
RETURN path LIMIT 1;
Unanswered Questions
Tags for unanswered questions:
MATCH (q:Question)-[:TAGGED]->(t:Tag)
WHERE not t.name IN ['neo4j','cypher'] AND NOT (q)<-[:ANSWERED]-()
RETURN t.name as tag, count(q) AS questions
ORDER BY questions DESC LIMIT 10;
Correlation
Tag correlations:
MATCH (t1:Tag)<-[:TAGGED]-()-[:TAGGED]->(t2:Tag)
WHERE id(t1) < id(t2) and t1.name <> 'neo4j' and t2.name <> 'neo4j'
RETURN t1.name, t2.name,count(*) as freq
ORDER BY freq desc LIMIT 10;
Engagement
User engagement over time:
MATCH (u:User)-[:PROVIDED]->()-[:ANSWERED]->(q:Question)-[:TAGGED]->(t:Tag)
WHERE u.display_name = "InverseFalcon"
RETURN apoc.date.format(q.creation_date,'s','yyyy-MM') as month, count(distinct q) as count, collect(distinct t.name) as tags
ORDER BY month asc
Virtual Graphs
Project tags via co-occurrence:
MATCH (t1:Tag)<-[:TAGGED]-()-[:TAGGED]->(t2:Tag)
WHERE id(t1) < id(t2) and t1.name <> 'neo4j' and t2.name <> 'neo4j'
WITH t1, t2,count(*) as freq where freq > 3
RETURN t1,t2, apoc.create.vRelationship(t1,'OCCURRED',{freq:freq},t2) as rel
Is this page helpful?