This Week in Neo4j – 25 March 2017

Welcome to this week in Neo4j where we collect the most interesting things that have happened in the world of graph databases over the last 7 days.

If you’ve got something that you’d like to see featured in a future version, let me know. I’m @markhneedham on Twitter or send an email to

In last week’s online meetup Mesosphere’s Johannes Unterstein showed us how to get a Neo4j causal cluster up and running on DC/OS.

This was the culmination of several weeks’ effort where Johannes started with the Neo4j Docker image, figured out how to get it to play nicely with the Mesos ecosystem and created a Mesosphere Universe package so that users can easily create Neo4j clusters via the Marathon scheduler.

On top of this, Johannes has been a part of the Neo4j community since 2013. He has organized several meetups and written a Play Framework integration for Spring Data Neo4j.

On behalf of the Neo4j community I’d like to thank Johannes for all his efforts, and I’m looking forward to his talk at GraphConnect Europe on 11th May 2017!

Using Graph Visualization to Explore Corruption in Egypt and FIFA

There were a couple of interesting posts showing how to use graph visualizations to explore two different types of corruption.

Lana Chan wrote “What Do Big Data Paris and the Panama Papers Have In Common?” In this post, Lana shows how you can use the Tom Sawyer graph data visualization tool to explore the 2015 FIFA corruption scandal.

Visualizing the Egypt corruption network

Noonpost, an interactive Arabic media website, explain how they used Linkurious for large-scale investigations in a project on Egypt’s corruption networks.

In the post, they explain how they were able to explore connections between the army and its affiliates across various influence networks including the health, food, and tourism sectors using a combination of Cypher queries and graph visualizations.

There’s lots of good stuff in both of these posts if you’re interested in data journalism.

If you’d like to do data journalism work using Neo4j but don’t know how, sign up for the Neo4j Data Journalism Accelerator Program and you’ll get the opportunity to work with engineers from Neo4j’s Developer Relations team to get your analysis up and running.

Visual Graph Modeling and Importing

Michael Hunger created a video showing how to sketch graph models and load them into Neo4j using Alistair Jones’ arrows tool.

Will Lyon presented a webinar late last week where he showed how to model and import real-world datasets using Neo4j.

Will shows how to import data from Yelp using several different approaches:

    • apoc.load.json – a procedure from the APOC library that can import JSON data directly.
    • LOAD CSV – a Cypher command for importing CSV files. Works well up to ~10 million rows.
    • neo4j-import – a tool for importing large initial datasets.
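To make the three approaches above a little more concrete, here is a minimal Python sketch that holds one illustrative statement or command for each. It is not taken from Will's webinar: the file names (`business.json`, `reviews.csv`, `businesses.csv`), the labels (`Business`, `User`), the relationship type (`REVIEWED`) and the property keys are all assumptions made up for illustration, and the helper `neo4j_import_command` is hypothetical. The Cypher strings would need to be sent to a running Neo4j server (with APOC installed for the first one) by whichever driver you prefer.

```python
from typing import Dict, List

# NOTE: every file name, label, relationship type and property key below is
# an assumption for illustration; adapt them to your own dataset.

# 1. apoc.load.json: read JSON documents and MERGE them into the graph.
#    Requires the APOC library on the server.
APOC_LOAD_JSON = """
CALL apoc.load.json('file:///business.json') YIELD value AS business
MERGE (b:Business {id: business.business_id})
SET b.name = business.name
""".strip()

# 2. LOAD CSV: plain Cypher, committing in batches to keep memory use bounded.
LOAD_CSV = """
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///reviews.csv' AS row
MATCH (b:Business {id: row.business_id})
MERGE (u:User {id: row.user_id})
CREATE (u)-[:REVIEWED]->(b)
""".strip()


# 3. neo4j-import: an offline command-line tool for the initial bulk load.
#    The CSV headers must carry :ID, :START_ID and :END_ID columns.
def neo4j_import_command(
    into: str, nodes: Dict[str, str], relationships: Dict[str, str]
) -> List[str]:
    """Build the argument list for a neo4j-import run (hypothetical helper)."""
    cmd = ["neo4j-import", "--into", into]
    for label, path in nodes.items():
        cmd += [f"--nodes:{label}", path]
    for rel_type, path in relationships.items():
        cmd += [f"--relationships:{rel_type}", path]
    return cmd


if __name__ == "__main__":
    print(APOC_LOAD_JSON, LOAD_CSV, sep="\n\n")
    print(" ".join(neo4j_import_command(
        "graph.db",
        {"Business": "businesses.csv", "User": "users.csv"},
        {"REVIEWED": "reviews.csv"},
    )))
```

The first two run against a live database, so they suit incremental loads; neo4j-import writes a fresh store and is meant for the initial load of a large dataset.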

Will also talks about Neo4j’s user-defined procedures and functions, and if you’re interested in creating your own ones we’ve created a couple of new pages on the Neo4j developer site to help you get started:

Emil in Forbes, Hiking Recommendations, Malware Clustering, and DC/OS

    • Neo4j’s CEO Emil Eifrem features in a Forbes article – Growth Stories: The Magical Power Of A Name – in which he talks about the history of Neo4j and how he came up with the graph databases category. This is a multi-part interview so stay tuned for more next week!
    • Dirk Mahler released version 0.8 of extended-objects, the object graph mapping library for Java. It now supports the Bolt protocol, which was introduced in Neo4j 3.0.
    • Amanda Schaffer posted slides and code from last week’s talk at PyLadies Seattle. Amanda’s created a hiking recommendation engine which uses content-based filtering based on features (e.g., lakes, waterfalls) that hikes have in common. There’s even a bit of web scraping of the WTA using Python’s BeautifulSoup library.
    • Our friends from Neueda released version 2.5.0 of the Graph Databases Plugin for the JetBrains IDE family. The new version adds node and relationship editing as well as listing indexes and constraints.
    • Max de Marzi has a new blog post where he shows how to search for objects across multiple dimensions. Max shows how to use the trusty RoaringBitmap to write a user-defined procedure that short circuits as soon as possible when searching across multiple facets.
    • Shusei Tomonaga wrote about a malware clustering and network analysis tool called impfuzzy that can be used to visualize and look for similar pieces of malware using Neo4j. The similarity score is calculated using the Louvain community detection and Fuzzy Hash algorithms.
    • Pavel Yakovlev released a new version of hasbolt, a Haskell driver for Neo4j. This release has some minor fixes to keep the strictness and laziness gods happy!
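As a rough illustration of the content-based filtering idea behind the hiking recommendation engine mentioned above, here is a minimal Python sketch. It is not Amanda's code: the hike names and feature sets are invented, and Jaccard similarity is just one plausible way to score "features in common". In Neo4j the same idea would typically be expressed as a Cypher query over hike and feature nodes.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features two hikes have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(
    liked: str, hikes: Dict[str, Set[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other hikes by feature overlap with the hike the user liked."""
    liked_features = hikes[liked]
    scored = [
        (name, jaccard(liked_features, features))
        for name, features in hikes.items()
        if name != liked
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Invented sample data: hike name -> set of features.
HIKES: Dict[str, Set[str]] = {
    "Hike A": {"lakes", "waterfalls", "old growth"},
    "Hike B": {"lakes", "waterfalls"},
    "Hike C": {"summits", "wildflowers"},
    "Hike D": {"lakes", "wildflowers"},
}

if __name__ == "__main__":
    for name, score in recommend("Hike A", HIKES, top_n=2):
        print(f"{name}: {score:.2f}")
```

Because the score depends only on the hikes' own features, this approach needs no ratings from other users, which is what distinguishes content-based filtering from collaborative filtering.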

On the Podcast

This week Rik interviewed Alistair Jones about the Causal Clustering feature released in Neo4j 3.1 back in December.

They go through the history of clustering in Neo4j from the use of ZooKeeper in the 1.8 series up to the current day where we’ve implemented a version of Diego Ongaro’s Raft consensus protocol.

If you want to learn more, there’s also a video of Alistair presenting on this topic.

Next Week

So what’s there to look forward to in the world of graphs next week?

Tweet of the Week

My favorite tweet this week was by Jose Ramón Cajide who’s been analyzing Twitter networks using Neo4j in RStudio:

If you want to graph your own Twitter network you can try out the Neo4j Twitter Sandbox. Don’t forget to tweet your graph using the #Neo4j hashtag if you give it a try.

Enjoy your weekend, it’s finally spring – hoorah!

Cheers, Mark