GraphGist: Preview

[Warning]Warning

This GraphGist has not yet been submitted and approved for publication. If you're the developer, please submit for publication using the GraphGist Portal.

Many graph database applications need to version a graph so as to see what it looked like at a particular point in time. Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph or in the Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.

Separate Structure From State

The key to versioning a graph is separating structure from state. This allows us to version the graph’s structure independently of its state.

To help describe how to design a version-aware graph model, I’m going to introduce some new terminology: identity nodes, state nodes, structural relationships and state relationships.

Identity Nodes

Identity nodes are used to represent the positions of entities in a domain-meaningful graph structure. Each identity node contains one or more immutable properties, which together constitute an entity’s identity. In a version-free graph (the kind of graph we build normally) nodes tend to represent both an entity’s position and its state. Identity nodes in a version-aware graph, in contrast, serve only to identify and locate an entity in a network structure.

Structural Relationships

Identity nodes are connected to one another using timestamped structural relationships. These structural relationships are similar to the domain relationships we’d include in a version-free graph, except they have two additional properties, from and to, both of which are timestamps.

State Nodes and Relationships

Connected to each identity node are one or more state nodes. Each state node represents a snapshot of an entity’s state. State nodes are connected to identity nodes using timestamped state relationships.

Example

The following example describes a simple domain made up of shops, products and suppliers. A Shop SELLS one or more Products, which are SUPPLIED_BY Suppliers. Over time prices change, shops sell different products, and source them from different suppliers. With a versioned graph we’ll be able to see which products a shop was selling, what price they were being sold at, and who was supplying them, at any point in the graph’s history.

Here’s the Cypher for creating our sample graph in its initial state:

Create graph in its initial state
CREATE
(s1:Shop{shop_id:1})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (ss1:ShopState{name:'General Store'}),
(p1:Product{product_id:1})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (ps1:ProductState{name:'Cheese',price:1.00}),
(p2:Product{product_id:2})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (ps2:ProductState{name:'Crisps',price:0.50}),
(p3:Product{product_id:3})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (ps3:ProductState{name:'Orange Juice',price:1.50}),
(u1:Supplier{supplier_id:1})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (us1:SupplierState{name:'International Imports'}),
(u2:Supplier{supplier_id:2})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (us2:SupplierState{name:'Local Markets'}),
(s2:Shop{shop_id:2})
  -[:STATE{from:1388534400000,to:9223372036854775807}]->
  (ss2:ShopState{name:'Cornershop'}),
(s1)-[:SELLS{from:1388534400000,to:9223372036854775807}]->(p1),
(s1)-[:SELLS{from:1388534400000,to:9223372036854775807}]->(p2),
(s2)-[:SELLS{from:1388534400000,to:9223372036854775807}]->(p3),
(p1)-[:SUPPLIED_BY{from:1388534400000,to:9223372036854775807}]->(u2),
(p2)-[:SUPPLIED_BY{from:1388534400000,to:9223372036854775807}]->(u1),
(p3)-[:SUPPLIED_BY{from:1388534400000,to:9223372036854775807}]->(u2)

This graph represents the domain in its initial state on 1 January 2014. All the relationships, both structural and state, are current. Each relationship has a from property with a long value representing 1 January 2014, and a to property with a very large value (End-Of-Time, or EOT, which is a magic number – in this case Long.MAX_VALUE) that effectively means there is no current upper bound to the period associated with the relationship.

Loading graph...

Current Queries

To query the current state of a version-aware graph, we need to match current structural and state relationships, which we can do simply by matching the to property on relationships with our EOT value. Here’s a query for finding all products that currently retail for one pound or less, their suppliers, and the shops where they can be purchased:

Find products currently one pound or less
MATCH (u:Supplier)<-[:SUPPLIED_BY{to:9223372036854775807}]-
      (p:Product)-[:STATE{to:9223372036854775807}]->(ps:ProductState)
WHERE ps.price <= 1.0
MATCH (p)<-[:SELLS{to:9223372036854775807}]-(s:Shop)
MATCH (u)-[:STATE{to:9223372036854775807}]->(us:SupplierState)
MATCH (s)-[:STATE{to:9223372036854775807}]->(ss:ShopState)
RETURN us.name AS supplier,
       ps.name AS product,
       ps.price AS price,
       ss.name AS shop
ORDER BY price DESC

Results:

Loading table...

Modifying the Graph

The kinds of changes we may need to track in a version-aware graph include changes to state and changes to structure. Changes to state involve adding, removing and modifying node properties. Structural changes involve adding and deleting relationships, as well as modifying the strength, weight or quality of relationships by changing one or more of their properties.

Modifying Structure

In the following query, we move product 1 from shop 1 to shop 2. This change will be effective from 1 February 2014.

Move product
MATCH (p:Product{product_id:1})<-[r:SELLS]-(:Shop)
WHERE r.to = 9223372036854775807
MATCH (s:Shop{shop_id:2})
SET r.to = 1391212800000
CREATE (s)-[:SELLS{from:1391212800000,to:9223372036854775807}]->(p)

As part of this query, we archive the old SELLS relationship, setting its to property to a value representing 1 February 2014, and introduce a new SELLS relationship that connects product 1’s identity node to shop 2’s identity node. We set the from property of this new relationship to 1 February 2014, and its to property to EOT.

This is what the graph looks like after we’ve made the change:

Loading graph...

Modifying State

In the following query, we update product 1’s price. As with our structural change, this change will be effective from 1 February 2014.

Update product
MATCH (p:Product{product_id:1})
       -[r1:STATE{to:9223372036854775807}]->(currentState:ProductState)
SET r1.to = 1391212800000
CREATE (p)-[r2:STATE{from:1391212800000,to:9223372036854775807}]->
       (newState:ProductState)
SET newState = currentState
SET newState.price = 2.0

This time, we archive the product’s current STATE relationship, and then create a new state node and a new STATE relationship. We copy the product’s properties from the old state node to the new state node, and then update the price.

Loading graph...

Historical Queries

We’ve seen that to query the current state of the graph, we need to do an exact match on the to property of both structural and state relationships, matching against our EOT value. Historical queries are slightly different, in that we’re looking for relationships whose from-to period covers the point in time we’re interested in; that is, relationships where the from value is less than or equal to the point we’re interested in, and the to value is greater than that point. These kinds of comparisons must be done in the WHERE clause.

The following query finds all the products sold by shop 1 on 5 January 2014:

Find shop’s products
MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1388880000000 AND r1.to > 1388880000000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1388880000000 AND r2.to > 1388880000000)
RETURN p.product_id AS productId,
       ps.name AS product,
       ps.price AS price
ORDER BY price DESC

Results:

Loading table...

And here’s the same query for 5 February 2014:

MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
RETURN p.product_id AS productId,
       ps.name AS product,
       ps.price AS price
ORDER BY price DESC

Results:

Loading table...
Run
Table
Graph
Table!
Graph!
Error!
Loading