Neo4j 1.8.M04 – Happy Paths

June 12, 2012

Neo4j 1.8 Milestone 4 is available today, offering a few new ways to help you find happy paths. To query a graph you use a traversal, which identifies paths of nodes and relationships. This release updates the capabilities of Neo4j’s core Traversal Framework and introduces new ways to use paths in Cypher.

Graph Sherpa Mattias Persson

Mattias Persson works throughout the Neo4j code base, but is particularly well acquainted with the Traversal Framework, a core component of the Neo4j landscape. He’s agreed to guide us on a traversal tour:

AK: So, what exactly is a Traversal?
MP: I would say it means starting from one or more given nodes in your graph and moving to other nodes via their connected relationships in search of your answer. The traversal can be controlled in different ways, for example which relationships to traverse at any given position, in what order, and so on. The general outcome is a list of paths from which the relevant information can be extracted.
AK: And the Traversal Framework, then. Is it just for describing a Traversal?
MP: Sure, it’s for describing where the traversal should go, and it also provides the implementation that executes the traversal itself.
AK: Can you give an example, like how would I find the friends of my friends?
MP: So here the starting point is you, the node representing you. And you’d tell the traversal to follow KNOWS relationships or similar down to depth 2. Also, every friend of a friend should only be returned once (such uniqueness is the default). So in embedded code:

// Assumes "me" is the Node representing you and KNOWS is a RelationshipType.
Iterable<Node> friendsOfFriends = traversal()
  .breadthFirst()
  .relationships(KNOWS)
  .evaluator(Evaluators.atDepth(2))
  .traverse(me).nodes();
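
To make the mechanics concrete, here is a minimal plain-Java sketch of what such a traversal does: walk breadth-first from a start node, visit each node at most once, and keep the nodes first reached at depth 2. It deliberately does not use the Neo4j API. The class name, the `atDepth` helper, and the in-memory adjacency map are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: the graph is a hypothetical in-memory adjacency map,
// not a Neo4j database.
class FriendsOfFriendsSketch {

    // Returns the nodes first reached at exactly `depth` hops from `start`,
    // visiting every node at most once (global node uniqueness).
    static List<String> atDepth(Map<String, List<String>> knows, String start, int depth) {
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = new ArrayList<>();
        frontier.add(start);
        for (int d = 0; d < depth; d++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String friend : knows.getOrDefault(node, Collections.<String>emptyList())) {
                    // Set.add returns false for nodes that were already seen.
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return frontier;
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("me", Arrays.asList("ann", "bob"));
        knows.put("ann", Arrays.asList("me", "cat", "bob"));
        knows.put("bob", Arrays.asList("me", "cat", "dan"));
        System.out.println(atDepth(knows, "me", 2)); // prints [cat, dan]
    }
}
```

Note that "cat" is reachable through both "ann" and "bob" but appears only once, which is the uniqueness behaviour Mattias describes.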

AK: OK, interesting. It’s such a different way of querying, though. For people who are new to Traversals, what’s your advice for how to ‘get it’?
MP: Look at traversals as local: instead of taking your entire database and querying globally by matching values, you start at a known point, where your relationships become your index and lead you to what you’re looking for. So you describe how the traversal will behave, where it should go and not go, and you receive callbacks about relevant data, as per your description.
AK: And what are the benefits of the new update to the Traversal Framework?
MP: There are some additions here. One is bidirectional traversals, which is essentially like describing two traversals, one from each side (meaning one or more given start nodes); where they collide in the middle, they produce results in the form of paths. In most scenarios where you know both the start and end node(s), a bidirectional traversal will get you your answer with far fewer relationships traversed, i.e. a faster traversal. The reason is that the number of relationships that need to be traversed at each depth increases exponentially, so traversing half the depth from each side cuts down on that growth. The “all paths” and “all simple paths” implementations in the graph-algo collection use bidirectional traversals now. Dijkstra and A* will probably move over to that as well, and it’s essentially just a small change in your traversal description to make it bidirectional.
There’s also an addition to the “expander”, i.e. the one responsible for deciding which relationships to follow given a position in the traversal. Previously it could only make decisions based on the node for the current position, but now it can view the whole path leading up to the current position.
Also some minor things, like being able to get metadata about the traversal (number of relationships visited and so forth) and more convenience methods on the Path interface.
AK: Nice. That’s a lot of good stuff. How will REST users be able to take advantage of these new capabilities?
MP: Well, you can soon expect Cypher to optimize queries that can take advantage of it. That’s the usual thing, just keep writing queries and we’ll keep making them faster.
AK: Thanks so much Mattias for all the hard work.
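
To put rough numbers on the bidirectional point from the interview, here is a back-of-the-envelope sketch in plain Java. It assumes a uniform branching factor (every node has the same number of relationships to follow), which real graphs rarely have, so treat the figures as an illustration of the growth rate rather than a benchmark. The class and method names are hypothetical.

```java
// Simplified cost model, not a measurement: assumes every node has the same
// number of relationships (a uniform branching factor).
class BidirectionalCostSketch {

    // Relationships followed when expanding `depth` levels from one side:
    // b + b^2 + ... + b^depth.
    static long expansions(long branching, int depth) {
        long total = 0;
        long level = 1;
        for (int d = 1; d <= depth; d++) {
            level *= branching;
            total += level;
        }
        return total;
    }

    // Two traversals, one from each end, that each cover about half the
    // depth and meet in the middle.
    static long bidirectionalExpansions(long branching, int depth) {
        int fromStart = depth / 2;
        int fromEnd = depth - fromStart;
        return expansions(branching, fromStart) + expansions(branching, fromEnd);
    }

    public static void main(String[] args) {
        System.out.println(expansions(10, 6));              // prints 1111110
        System.out.println(bidirectionalExpansions(10, 6)); // prints 2220
    }
}
```

Under these assumptions, a path of length 6 in a graph with 10 relationships per node costs roughly a million expansions from one side, but only a couple of thousand when two traversals of depth 3 meet in the middle.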

Paths as Expressions

In Cypher, much of the work in a statement involves working with paths. Now, paths themselves can be treated as expressions. This is most immediately explained with a simple example. Prior to 1.8.M04, you could capture a path with an identifier like this:

START n=node(...), m=node(...)
    MATCH p=n-->()<--m
    RETURN collect(p) AS allPaths

With paths as expressions, that can be re-written as:

START n=node(...), m=node(...)
    RETURN n-->()<--m AS allPaths

Simply return the path that you want. There are, of course, many more fun things that can be done with this, which we’ll leave to explore another time. Because the best thing to do right now is…