GraphGist: EPublishing: A graphical approach to digital publications

by Deepesh Kuruppath

Concept

Electronic publishing has become common in scientific publishing, where it has been argued that peer-reviewed scientific journals are in the process of being replaced by electronic publishing. It is also becoming common to distribute books, magazines, and newspapers to consumers through tablet reading devices. This market, generated by online vendors such as Apple’s iTunes bookstore, Amazon’s bookstore for Kindle, and the Google Play Bookstore, is growing by millions each year. Market research suggests that half of all magazine and newspaper circulation will be via digital delivery by the end of 2015 and that half of all reading in the United States will be done without paper by 2015.

Included Variables

  • Article: an e-published document

  • Author: a user of type person who authors articles

  • Expert: a user of type person who contributes to one or more articles

  • Media: a user of type media who demands articles and publishes them

  • Url: the url of a published article

  • Question: queries raised/posted by Authors

  • Quote: answers posted by Experts to Questions

  • Status: status of an article (InProgress, Completed, Published)

  • Trend: a timeline of activities, e.g. when an article was written or conceived, when it was published, and who is demanding it


Graph Data Model


Graph Cypher Sample Data

The Cypher query that creates the sample graph is shown below.

//Node Creation Queries First
CREATE (RelativityTheory:Article {contentId: 'ART000001', name : 'Relativity Theory', content: 'The theory of relativity, or simply relativity in physics, usually encompasses two theories by Albert Einstein: special relativity and general relativity.' })

CREATE (AlbertEinstein:Person {userId: 'UID000001', name : 'Albert Einstein'})
CREATE (ElonMusk:Person {userId: 'UID000002', name : 'Elon Musk'})
CREATE (DaveThomson:Person {userId: 'UID000003', name : 'Dave Thomson'})

CREATE (VivendiSt:Media {userId: 'UID000009', name : 'Vivendi St'})
CREATE (TeslaMedia:Media {userId: 'UID000010', name : 'Tesla Media'})

CREATE (UrlRelativityTheory:Url {url: 'http://www.abcd.com/relativitytheory'})

CREATE (StatusInProgress:Status {status: 'InProgress'})

CREATE (QuestionQID01:Question {questionId: 'QID01', question: 'What is Relativity Theory ?'})
CREATE (QuoteQUT000001:Quote {quoteId: 'QUT000001', quote : 'Relativity theory is a famous theory in physics…'})

CREATE (QuestionQID02:Question {questionId: 'QID02', question: 'What are the consequences of theory of relativity ?'})
CREATE (QuoteQUT000002:Quote {quoteId: 'QUT000002', quote : 'Relativity theory postulates multiple consequences for e.g. Time Dilation, Relativistic mass, Relativity of Simultaneity etc'})


//Run - Create TimeStamp Node for Quote Demand by Author
//Get or create Day
CREATE (day:Day{day:18, month: 12, year: 2013})

// Create TimeStamp Node and new LAST relationship
CREATE (TimeStampNode1:Timestamp {timestamp: 'Oct 12, 2013 4:30PM'})
CREATE (TimeStampNode2:Timestamp {timestamp: 'Oct 12, 2013 6:00PM'})
CREATE (TimeStampNode3:Timestamp {timestamp: 'Oct 12, 2013 7:30PM'})
CREATE (TimeStampNode4:Timestamp {timestamp: 'Oct 12, 2013 9:00PM'})
CREATE (TimeStampNode5:Timestamp {timestamp: 'Oct 12, 2013 10:30PM'})
CREATE (TimeStampNode6:Timestamp {timestamp: 'Oct 12, 2013 11:00PM'})

CREATE

 (RelativityTheory)-[:AUTHOR]->(AlbertEinstein),
 (RelativityTheory)-[:STATUS]->(StatusInProgress),
 (RelativityTheory)-[:DEMAND_SIMILAR]->(VivendiSt),
 (RelativityTheory)-[:OFFERED_PUBLISHING]->(TeslaMedia),
 (RelativityTheory)-[:PUBLISHER]->(TeslaMedia),
 (RelativityTheory)-[:PUBLISHED_URL]->(UrlRelativityTheory),
 (UrlRelativityTheory)-[:PUBLISHED_BY]->(TeslaMedia),

 (RelativityTheory)-[:EXPERT]->(ElonMusk),
 (RelativityTheory)-[:SHARED]->(ElonMusk),
 (RelativityTheory)-[:LIKED]->(ElonMusk),

 (RelativityTheory)-[:EXPERT]->(DaveThomson),
 (RelativityTheory)-[:LIKED]->(DaveThomson),

 (RelativityTheory)-[:QUESTION]->(QuestionQID01),
 (RelativityTheory)-[:QUESTION]->(QuestionQID02),

 (QuestionQID01)-[:SENT_TO]->(ElonMusk),
 (QuestionQID02)-[:SENT_TO]->(DaveThomson),

 (ElonMusk)-[:ANSWER_QUOTE]->(QuoteQUT000001),
 (DaveThomson)-[:ANSWER_QUOTE]->(QuoteQUT000002),

 (RelativityTheory)<-[:QUOTE_FOR]-(QuoteQUT000001)-[:SUBMITTED]->(QuestionQID01),
 (RelativityTheory)<-[:QUOTE_FOR]-(QuoteQUT000002)-[:ACCEPTED]->(QuestionQID02),

 (day)-[:FIRST]->(TimeStampNode1)-[:NEXT]->(TimeStampNode2)-[:NEXT]->(TimeStampNode3)-[:NEXT]->(TimeStampNode4)-[:NEXT]->(TimeStampNode5)-[:NEXT]->(TimeStampNode6)<-[:LAST]-(day),


//Run - Now Connect user VivendiSt node to timestamp thus creating an Event for Demand Similar Activity
(VivendiSt)-[:ACTIVITY_TIME {activity:['TimeStampDemandSimilar'],contentId: ['ART000001']}]->(TimeStampNode1),

//Run - Now Connect user TeslaMedia to timestamp thus creating an Event for Offered Publishing
(TeslaMedia)-[:ACTIVITY_TIME {activity:['TimeStampOfferedPublishing'],contentId: ['ART000001']}]->(TimeStampNode2),

//Run - Now Connect user TeslaMedia to timestamp thus creating an Event for Accepted Publishing
(TeslaMedia)-[:ACTIVITY_TIME {activity:['TimeStampAcceptedPublication'],contentId: ['ART000001']}]->(TimeStampNode3),

//Run - Now Connect the Url node to timestamp thus creating an Event for External Url Publication
(UrlRelativityTheory)-[:ACTIVITY_TIME {activity:['TimeStampExternalUrlPublication'],contentId: ['ART000001']}]->(TimeStampNode4),

//Run - Now Connect the Quote to timestamp thus creating an Event for Quote Submission
(QuoteQUT000001)-[:ACTIVITY_TIME {activity:['TimeStampQuoteSubmitted'],contentId: ['ART000001']}]->(TimeStampNode5);


Get the Author for a particular article

MATCH (n1)-[:AUTHOR]->(x)
WHERE n1.contentId='ART000001'
RETURN n1.name AS Article, x.name AS Author

Get all the Experts engaged for the article by the author

MATCH (n1)-[:EXPERT]->(x)
WHERE n1.contentId='ART000001'
RETURN n1.name AS Article , x.name AS Expert

Get total number of times an expert was quoted i.e. # of quotes accepted for an Expert

MATCH (expert)-[:ANSWER_QUOTE]->(answer)-[:ACCEPTED]->(question)-[:SENT_TO]->(expert)
RETURN expert.name AS Expert, COUNT(*) AS `# of Quotes Accepted by Authors`
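
The pattern above is a cycle: it starts at an expert, follows the expert's quote to the question it was accepted for, and requires that the question was sent to the same expert. As a rough illustration, the same walk can be modelled in plain Python over the sample relationships. This is only a sketch: the triple list, the helper names, and the use of display names as node identifiers are assumptions made for the example, not part of the gist or of any Neo4j API.

```python
from collections import Counter

# Sample relationships as (start, type, end) triples.
# Using names/ids as node identifiers is a simplification for this sketch.
EDGES = [
    ("QID01", "SENT_TO", "Elon Musk"),
    ("QID02", "SENT_TO", "Dave Thomson"),
    ("Elon Musk", "ANSWER_QUOTE", "QUT000001"),
    ("Dave Thomson", "ANSWER_QUOTE", "QUT000002"),
    ("QUT000001", "SUBMITTED", "QID01"),
    ("QUT000002", "ACCEPTED", "QID02"),
]


def targets(start, rel_type):
    """Return the end nodes of all rel_type edges that leave start."""
    return [end for (s, r, end) in EDGES if s == start and r == rel_type]


def accepted_quote_counts():
    """Count cycles expert -> quote -> question -> same expert."""
    counts = Counter()
    experts = {s for (s, r, _) in EDGES if r == "ANSWER_QUOTE"}
    for expert in experts:
        for quote in targets(expert, "ANSWER_QUOTE"):
            for question in targets(quote, "ACCEPTED"):
                # Close the cycle: the question must have been sent to this expert.
                if expert in targets(question, "SENT_TO"):
                    counts[expert] += 1
    return dict(counts)


print(accepted_quote_counts())  # prints {'Dave Thomson': 1}
```

Only Dave Thomson appears in the result because the quote from Elon Musk is still in SUBMITTED state in the sample data.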

Get total number of press appearances of an Expert

That is, the number of articles that have a published URL and for which the expert’s quote was accepted.

MATCH (n1)-[:ANSWER_QUOTE]->()-[:ACCEPTED]->()-[:SENT_TO]->(n1)<-[:EXPERT]-()-[:PUBLISHED_URL]->()
RETURN n1.name AS Expert, COUNT(*) AS `No of Press Appearances`

Get total number of articles an expert is currently contributing to

Note - Only Quotes that are in 'submitted' status but not in 'accepted' status will contribute to the count.

MATCH (expert)-[:ANSWER_QUOTE]->(answer)-[:SUBMITTED]->(question)-[:SENT_TO]->(expert)<-[:EXPERT]-(article)-[:STATUS]->(st)
WHERE st.status='InProgress'
RETURN expert.name AS Expert, COUNT(DISTINCT article) AS Count
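
A pattern match yields one row per matched path, so an expert with two submitted quotes on the same article produces two rows for that article. Counting rows and counting distinct articles then give different answers. The sketch below shows the difference in plain Python; the second quote (QUT000003) is hypothetical and does not exist in the sample data, and the function names are illustrative only.

```python
from collections import Counter, defaultdict

# One tuple per matched path: (expert, article, quote).
# QUT000003 is a hypothetical second quote, added only for this illustration.
matched_paths = [
    ("Elon Musk", "ART000001", "QUT000001"),
    ("Elon Musk", "ART000001", "QUT000003"),
]


def count_paths(rows):
    """Count matched paths per expert (one per submitted quote)."""
    return dict(Counter(expert for expert, _, _ in rows))


def count_distinct_articles(rows):
    """Count distinct articles per expert, ignoring repeated quotes."""
    articles = defaultdict(set)
    for expert, article, _ in rows:
        articles[expert].add(article)
    return {expert: len(found) for expert, found in articles.items()}


print(count_paths(matched_paths))              # prints {'Elon Musk': 2}
print(count_distinct_articles(matched_paths))  # prints {'Elon Musk': 1}
```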

Get total number of articles in development by an Author

MATCH (a1)<-[:AUTHOR]-(art)-[:STATUS]->(st)
WHERE a1.userId = 'UID000001' AND st.status='InProgress'
RETURN a1.name AS `Author Name`, collect(art.name) AS `Article Names`, COUNT(art) AS N

Get total number of articles completed by an Author

The sample data contains a single article, in 'InProgress' status, so the count below is 0. In general, this query counts all of an author's articles that are in 'Completed' status.

MATCH (a1)<-[:AUTHOR]-(art)-[:STATUS]->(st)
WHERE a1.userId = 'UID000001' AND st.status='Completed'
RETURN COUNT(*) AS `Number of Articles Completed by Author Albert Einstein`

Get total number of articles published in Media by an Author

Any article that has a published external URL (and is also in 'Completed' status) is considered a 'Published Article'. The query below checks only for the published URL.

MATCH (a1)<-[:AUTHOR]-(art)-[:STATUS]->(st), (art)-[:PUBLISHED_URL]->(url)
WHERE a1.userId = 'UID000001'
RETURN COUNT(*) AS `# of Articles Published in Media by Albert Einstein`

Get Featured Article Details

The editor can pick an article as a 'Featured Article' for a week or a month; the query below pulls details specific to that article. Note - this is a simplified version. In practice, a single query can pull many more statistics related to the article.

MATCH (a1)<-[:AUTHOR]-(art)
WHERE art.contentId='ART000001'
RETURN art.name AS `Article Name`, a1.name AS `Author Name`,art.content AS `Content Details`

Get Featured Article Trend Details

This one query will fetch all the trend details for a Featured Article

// Anchor the article first and carry it through every WITH, so each count applies to this article only
MATCH (art {contentId: 'ART000001'})
OPTIONAL MATCH pS=(art)-[:SHARED]->()
WITH art, count(pS) AS countPS
OPTIONAL MATCH pL=(art)-[:LIKED]->()
WITH art, countPS, count(pL) AS countPL
OPTIONAL MATCH pE=(art)-[:EXPERT]->()
WITH art, countPS, countPL, count(pE) AS countPE
OPTIONAL MATCH pO=(art)-[:OFFERED_PUBLISHING]->()
WITH art, countPS, countPL, countPE, count(pO) AS countPO
OPTIONAL MATCH pD=(art)-[:DEMAND_SIMILAR]->()
RETURN countPS AS `No of Shares`, countPL AS `No of Likes`, countPE AS `No of Experts Engaged`, countPO AS `No of External Publications`, count(pD) AS `No of Demands in Media`
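
The trend figures are counts of the article's outgoing relationships, grouped by relationship type. The expected values for the sample data can be cross-checked with a small plain-Python model. This is a sketch under stated assumptions: the triple list and the `trend_counts` helper are illustrative names, not part of the gist, and names are used as node identifiers for brevity.

```python
from collections import Counter

# Outgoing relationships of the sample article as (start, type, end) triples.
ARTICLE_EDGES = [
    ("ART000001", "SHARED", "Elon Musk"),
    ("ART000001", "LIKED", "Elon Musk"),
    ("ART000001", "LIKED", "Dave Thomson"),
    ("ART000001", "EXPERT", "Elon Musk"),
    ("ART000001", "EXPERT", "Dave Thomson"),
    ("ART000001", "OFFERED_PUBLISHING", "Tesla Media"),
    ("ART000001", "DEMAND_SIMILAR", "Vivendi St"),
]

# Relationship types that make up the trend report.
TREND_TYPES = ["SHARED", "LIKED", "EXPERT", "OFFERED_PUBLISHING", "DEMAND_SIMILAR"]


def trend_counts(content_id):
    """Count the article's outgoing relationships per trend type."""
    counts = Counter(rel for (start, rel, _) in ARTICLE_EDGES if start == content_id)
    # A Counter returns 0 for missing keys, so absent types are reported as 0.
    return {rel: counts[rel] for rel in TREND_TYPES}


print(trend_counts("ART000001"))
```

For the sample data this gives 1 share, 2 likes, 2 experts, 1 publishing offer, and 1 demand. An article with no relationships of a given type is reported with a count of 0 rather than being dropped.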

Get the Latest 6 Activities

Note - An activity is defined by its ActivityType, the Article Id it relates to, and the other user who performed it, e.g. a Media company or an Expert. The Author can be derived with another query on the article's contentId.

MATCH (day:Day { day:18, month: 12, year: 2013 })-[:LAST]->(n)<-[:NEXT*0..5]-(timenodes)
WITH timenodes
MATCH p=(a)-[r:ACTIVITY_TIME]-(timenodes)
RETURN collect(r.activity) AS ActivityType, collect(r.contentId) AS ArticleId, timenodes.timestamp AS EventTime;
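
The query starts at the timestamp the Day node points to with LAST and walks the NEXT chain backwards for up to five hops, keeping the timestamps that have an ACTIVITY_TIME relationship. The traversal can be sketched in plain Python as a walk over a linked list. The dictionaries and the `latest_activities` function are assumptions made for this illustration; they mirror the sample data but are not part of the gist.

```python
# Timestamps in NEXT order, copied from the sample data.
TIMESTAMPS = [
    "Oct 12, 2013 4:30PM",
    "Oct 12, 2013 6:00PM",
    "Oct 12, 2013 7:30PM",
    "Oct 12, 2013 9:00PM",
    "Oct 12, 2013 10:30PM",
    "Oct 12, 2013 11:00PM",
]

# PREV maps each timestamp to the one before it (the NEXT chain, reversed).
PREV = {later: earlier for earlier, later in zip(TIMESTAMPS, TIMESTAMPS[1:])}
LAST = TIMESTAMPS[-1]  # the node the Day points to with :LAST

# ACTIVITY_TIME relationships: timestamp -> activity type.
ACTIVITY = {
    "Oct 12, 2013 4:30PM": "TimeStampDemandSimilar",
    "Oct 12, 2013 6:00PM": "TimeStampOfferedPublishing",
    "Oct 12, 2013 7:30PM": "TimeStampAcceptedPublication",
    "Oct 12, 2013 9:00PM": "TimeStampExternalUrlPublication",
    "Oct 12, 2013 10:30PM": "TimeStampQuoteSubmitted",
}


def latest_activities(max_hops=5):
    """Walk back from LAST for at most max_hops NEXT hops, newest first."""
    events = []
    node, hops = LAST, 0
    while node is not None and hops <= max_hops:
        if node in ACTIVITY:  # timestamps without an activity are skipped
            events.append((node, ACTIVITY[node]))
        node = PREV.get(node)
        hops += 1
    return events


for timestamp, activity in latest_activities():
    print(timestamp, activity)
```

The last timestamp (11:00PM) has no activity attached in the sample data, so the walk over six timestamps returns five events.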



Moving Forward

This is just a base model of an epublishing portal; real-life use cases are far more complex and intriguing. The best part about Neo4j is that it makes a very complex, intertwined data model look simple, easy, and quick to implement. With the tools and support available for Neo4j today, implementing a complex model like epublishing is a cakewalk.

Created by Deepesh Kuruppath
