Handling Connected Domains with the “Right Tool for the Job”


Sarah Mei recently wrote a great blog post describing the problems she and her colleagues ran into when managing highly connected data using document databases.

Document databases (like other aggregate stores) do a good job of storing a single representation of an aggregate entity, but they struggle with use cases that require multiple, different views of the domain. Connections between documents tend to be an afterthought that the aggregate data model does not cover well.

Real-World Use Cases


Sarah described how she worked on a TV show application at Pivotal and discussed the modeling and data management implications that surfaced when the application’s use case evolved.

The same applied to the Diaspora project, which started out as a Ruby on Rails application using MongoDB.

In both projects, the evolving requirements caused difficulties with the chosen data model and database, which triggered the move to PostgreSQL. A relational database was chosen because it allowed some of the fidelity in the model to return.

Unfortunately, this comes at the cost of queries with a high number of JOINs, which can cause performance issues.

Fortunately, there is a data model that embraces rich connections between your domain entities: the graph model used by graph databases.

Live Graph Data Models of Diaspora and the TV Show


To show how a graph database would handle these use cases, we created two live graph data models: one for a social network like Diaspora and one for a TV show graph.

For each, we set up a small example data set and then expressed the use cases Sarah mentioned as a set of graph queries in the Cypher graph query language. These GraphGists allow for easy modeling discussions and a live exploration of the data set, and they provide a good starting point for your own (forked) variant of the domain model.

Example Graph Data Model – TV Shows


To develop the models quickly, we start from the typical patterns we look for in a graph when answering the use cases described. We call this whiteboard-friendliness 🙂

Here are some example patterns we might want to model for:

Shows, seasons and episodes:

(:TVShow)-[:HAS_SEASON]->(:Season)-[:HAS_EPISODE]->(:Episode)

Characters played by actors featured in an episode:

(:Episode)-[:FEATURED_CHARACTER]->(:Character),
(:Character)<-[:PLAYED_CHARACTER]-(:Actor)

Users writing reviews for individual episodes:

(:User)-[:WROTE_REVIEW]->(:Review)<-[:HAS_REVIEW]-(:Episode)
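
As a minimal sketch, the three patterns above could be instantiated with a single CREATE statement. The property names (`name`, `number`, `text`) and all sample values, including the user "Alice" and the review text, are illustrative assumptions and not the data set of the live model.

```cypher
// Illustrative sample data; property names and values are assumptions.
CREATE
  (show:TVShow {name: "How I Met Your Mother"}),
  (season:Season {number: 1}),
  (episode:Episode {name: "Pilot"}),
  (character:Character {name: "Ted Mosby"}),
  (actor:Actor {name: "Josh Radnor"}),
  (user:User {name: "Alice"}),
  (review:Review {text: "A promising start."}),
  // Shows, seasons and episodes
  (show)-[:HAS_SEASON]->(season),
  (season)-[:HAS_EPISODE]->(episode),
  // Characters featured in an episode and the actors playing them
  (episode)-[:FEATURED_CHARACTER]->(character),
  (actor)-[:PLAYED_CHARACTER]->(character),
  // A user reviewing an episode
  (user)-[:WROTE_REVIEW]->(review),
  (episode)-[:HAS_REVIEW]->(review)
```

Each relationship line mirrors one arrow from the patterns, which is what makes the model easy to check against a whiteboard drawing.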

Using these basic patterns we can quickly create sample data for the domain and also develop the queries used to solve the use cases. For example:

Listing the filmography of an actor, that is, all the episodes they appeared in across shows:

MATCH
(actor:Actor)-[:PLAYED_CHARACTER]->(character),
(character)<-[:FEATURED_CHARACTER]-(episode),
// follow the modeled direction: show -> season -> episode
(show:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->(episode)
WHERE actor.name = "Josh Radnor"
RETURN show.name, episode.name, character.name
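
The review pattern can be queried in the same style. The following sketch lists the reviews written for the episodes of one show; the show name and the `text` property on reviews are assumptions made for illustration, while `name` follows the properties used in the query above.

```cypher
// Reviews for all episodes of a show (show name and review.text are assumed).
MATCH
  (show:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->(episode),
  (episode)-[:HAS_REVIEW]->(review)<-[:WROTE_REVIEW]-(user:User)
WHERE show.name = "How I Met Your Mother"
RETURN episode.name AS episode_name,
       user.name AS reviewer,
       review.text AS content
```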

If you want to learn more, you can explore the model in more detail in the live graph model.

Example Graph Model – Social Network



Again, a few examples of patterns we’ll be looking for:

Users, friends and posts:

(:User)-[:FRIEND]->(:User)-[:POSTED]->(:Post)

Posts, comments and commenters:

(:User)-[:POSTED]->(:Post)<-[:COMMENTED]-(:User)

Users like posts:

(:User)-[:LIKED]->(:Post)
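
A small data set for these patterns might look like the sketch below. "Rachel Green" is the user the queries in this section look up; the other users, the `text` property value, and the choice of who comments and likes are assumptions for illustration.

```cypher
// Illustrative sample data; only "Rachel Green" comes from the queries in this post.
CREATE
  (rachel:User {name: "Rachel Green"}),
  (monica:User {name: "Monica Geller"}),
  (ross:User {name: "Ross Geller"}),
  (post:Post {text: "Dinner at my place tonight!"}),
  // Users, friends and posts
  (rachel)-[:FRIEND]->(monica),
  (monica)-[:POSTED]->(post),
  // Comments and likes on the post
  (ross)-[:COMMENTED]->(post),
  (ross)-[:LIKED]->(post)
```

A friendship is stored once, in one direction. A query can match `-[:FRIEND]-` without an arrow to treat it as mutual, so no second relationship is needed.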

Now for our Cypher queries. Find the posts made by Rachel’s friends:

MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post)
WHERE u.name = "Rachel Green"
RETURN f.name AS friend, post.text AS content

List people who liked posts by Rachel’s friends:

MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post)<-[:LIKED]-(liker)
WHERE u.name = "Rachel Green"
RETURN f.name AS friend, post.text AS content,
collect(liker.name) AS liked_by
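
The COMMENTED pattern listed earlier can be queried with the same shape. This is a sketch that assumes commenters carry a `name` property like the other users. `DISTINCT` keeps each commenter once per post, even if they commented several times.

```cypher
// People who commented on posts by Rachel's friends, grouped per post.
MATCH (u:User)-[:FRIEND]-(f)-[:POSTED]->(post)<-[:COMMENTED]-(commenter)
WHERE u.name = "Rachel Green"
RETURN f.name AS friend, post.text AS content,
       collect(DISTINCT commenter.name) AS commented_by
```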

And you can see the rest in the live graph model.

Graph Databases as a Niche Technology?


As you can see, it is incredibly easy to model these use cases with a graph database. So why weren’t they considered? To quote from the article:

But what are the alternatives? Some folks say graph databases are more natural, but I’m not going to cover those here, since graph databases are too niche to be put into production.

This is an interesting observation, as Neo4j is the most widely used graph database and has been running in production setups for more than 10 years now.

Neo Technology has hundreds of customers (many of them household-name enterprises), and even more community users deploy Neo4j for their production applications. Their use cases span industries from network management, gaming, social networking, finance and job search to logistics and dating sites.

We can understand why some people may have felt that graph databases were a niche technology in 2010 when Diaspora got started – we actually backed Diaspora on Kickstarter and offered our help at the time – but now the landscape has changed and graph databases are an uncontroversial choice.

Judge for Yourself


If you work in a domain with richly connected data, we encourage you to try to model it as a graph and manage it with a graph database. For more insight into how this works, feel free to check out the freely available book Graph Databases by O’Reilly.

Also, the offer to support Diaspora still stands! We’re happy to help, so please reach out to us if you’re interested. You can also follow the discussion with Sarah on Twitter. Feel free to jump in!

Cheers,

Michael Hunger
with help from Kenny Bastani, Mark Needham and Peter Neubauer.




About the Author

Michael Hunger, Developer Relations

Michael Hunger has been passionate about software development for a very long time. For the last few years he has been working with Neo Technology on the open source Neo4j graph database, filling many roles. As caretaker of the Neo4j community and ecosystem, he especially loves to work with graph-related projects, users and contributors. As a developer, Michael enjoys many aspects of programming languages, learning new things every day, participating in exciting and ambitious open source projects, and contributing to and writing software-related books and articles.


5 Comments

Bullshit gold.
Joins or subqueries can be optimized into one another and vice-versa depending on the specific characteristics (SQL Server for example requires you to write only subqueries and auto-optimizes into joins where applicable – supposedly).
"Graph" databases are just a simplified case of SQL (arguably, the most useful one) and are subject to the exact same

Ludovic: what you say may be true of some graph databases–like FlockDB which is just a thin layer atop MySQL– but it's not true of Neo4j.
Here's the tech secret (not magic, even if it can sometimes feel that way to users): records are stored on disk & in memory in fixed-length buffers, and point to each other using (offset) pointers. This technique, which amounts to

Philip Rathle: You can't set pointers at write time that will also be correct when loading in-memory, unless of course you load the whole block as once, which doesn't make much sense.
According to you, the particularity with Neo4j is that it stores the relations with the item, which is basically just another way to model your data, that does provide faster access to relations.

Ludovic: There are no statements mentioned here that claim that Neo4j is "better" than PostgreSQL in general sense. The article simply articulates that Graph Databases allow you to do fairly complex things very simply.
Take a look at https://github.com/neo4j-contrib/graphgist/wiki
You'll find a set of interactive tutorials that show you the benefit of using

pandahands says:

I would like to learn this stuff for handling recruitment data .. looks good seems intuitive one problem i have is building the web interface not got much experience of that .. also best way to model the data….
