This Week in Neo4j – Time Based Graph Versioning, Pearson Coefficient, Neo4j Multi DC, Modeling Provenance


Welcome to This Week in Neo4j, where we round up the last week in the world of graph databases. This week our Chief Scientist Dr Jim Webber describes how to run Neo4j in a Multi Data Center Environment, Max De Marzi shows us how to find the shortest path on a rail network, and Stefan Bieliauskas shows us why graphs are a perfect fit for modeling data provenance.


This week’s featured community member is David Stevens, Global Technology Transformation Lead, DXC.

David Stevens - This Week’s Featured Community Member

David is the author of the DXC DigitalExplorer, an Enterprise knowledge graph built using Neo4j. The platform provides the means to understand, shape, and enable Digital Transformation projects, and David won a Graphie for his work at GraphConnect NYC 2018.

David presented The Enterprise Knowledge Graph Explorer as part of this week’s Neo4j Online Meetup, and he was also recently featured in the 5 minute interviews series.

He’ll be presenting at GraphTour Madrid next week, so if you’re going, don’t forget to say hi!

Graph versioning Episode one — Time Based


How to version graphs is a commonly asked question, and Tom Geudens has started a series of posts explaining the different approaches.

Installment 1 focuses on time-based versioning of graphs. Using an e-commerce example, Tom shows how to separate identity from state, where the name of shops and the products that they sell can vary over time.
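To make the idea of separating identity from state a little more concrete, here is a minimal Python sketch. It is not Tom's model or Cypher: the class names (Shop, State), the field names, and the half-open validity interval are all assumptions made for illustration. The shop keeps a stable identity while each name it has had is stored as a separate, time-bounded state.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class State:
    """One version of the mutable properties, valid over [valid_from, valid_to)."""
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


@dataclass
class Shop:
    """Identity: a stable id, with the changing state kept separately."""
    shop_id: str  # hypothetical identifier, never changes
    states: List[State] = field(default_factory=list)

    def rename(self, new_name: str, when: date) -> None:
        # Close the currently open state (if any), then append the new one.
        if self.states and self.states[-1].valid_to is None:
            self.states[-1].valid_to = when
        self.states.append(State(new_name, when))

    def name_at(self, when: date) -> Optional[str]:
        # Return the name whose validity interval contains `when`.
        for s in self.states:
            if s.valid_from <= when and (s.valid_to is None or when < s.valid_to):
                return s.name
        return None  # the shop had no recorded state at that time


shop = Shop("shop-1")
shop.rename("Corner Store", date(2018, 1, 1))
shop.rename("Corner Market", date(2018, 6, 1))
print(shop.name_at(date(2018, 3, 1)))
```

In a graph, the identity and each state would typically be separate nodes joined by relationships that carry the validity dates, so a query for "the shop as of a given date" only has to follow the relationship whose interval contains that date.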

Community detection of survey responses based on Pearson correlation coefficient with Neo4j


A couple of weeks ago we added the Pearson Similarity algorithm to the Graph Algorithms library, and Tomaz Bratanic wrote a blog post showing how we could use it to make sense of Kaggle’s Young People Survey dataset.

This dataset contains, amongst other things, music preferences, phobias, and health habits, and Tomaz first shows how to use the algorithm to work out correlations between the answers in these different categories.

He then builds a similarity graph of users based on their answers, and uses the Louvain algorithm to find communities of users, before creating a Gephi visualisation of those communities.
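As a rough illustration of the first two steps, the following Python sketch computes the Pearson correlation coefficient between users' answer vectors and keeps only the pairs above a cut-off as edges of a similarity graph. It does not use the Graph Algorithms library or the real survey data: the function names, the sample answers, the 0.5 threshold, and the handling of constant answers are all assumptions, and community detection with Louvain is left out.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: constant answers correlate with nothing.
        return 0.0
    return cov / sqrt(var_x * var_y)


def similarity_edges(
    answers: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (user_a, user_b, score) for every pair scoring at or above threshold."""
    edges = []
    for a, b in combinations(sorted(answers), 2):
        score = pearson(answers[a], answers[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Made-up answers on a 1 to 5 scale, one list per (hypothetical) respondent.
answers = {
    "alice": [5, 4, 1, 2],
    "bob": [4, 5, 2, 1],
    "carol": [1, 2, 5, 4],
}
edges = similarity_edges(answers, threshold=0.5)
print(edges)
```

With these sample answers only alice and bob end up connected, because carol's answers run in the opposite direction to both of them. A community detection algorithm would then be run over the resulting edges.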

Provenance with Neo4j, Playbook for graph database projects


Running Neo4j in Multi Data Center Environments


Dr Jim Webber was back in the video studio, this time recording a video explaining how to run Neo4j in Multi Data Center Environments.



Jim describes how to configure Neo4j servers with metadata to optimise the way that data is both queried and moved between them. You can learn more about the concepts Jim covers in the Multi DC documentation.

Create a Data Marvel — Part 8: Controlling and Servicing our Comic Endpoints


Jennifer Reif’s Marvel Series is back, and this week Jennifer shows us how to build the controller and service classes for handling requests and shaping results.

Everything is now in place to feed the data into d3 to better visualize the Marvel Universe in the next installment!

Tweet of the Week


My favourite tweet this week was by Thibault Chevrin:

Don’t forget to RT if you liked it too.

That’s all for this week. Have a great weekend!

Cheers, Mark