GraphGists

Introduction

This gist is a complement to a blogpost that I wrote about managing hierarchical data structures in Neo4j.

In this example, we are using a "product hierarchy", essentially holding information about the composition of a product (what is it made of, how many of the components are used, and at the lowest level, what is the price of these components). The model looks like this:

Screen+Shot+2014 03 29+at+16.04.35
Figure 1. Model of a Product Hierarchy

Note that in the GraphGist, I have cut the tree depth to 5 levels (product to costs) instead of 6 in the blogpost - and that I also reduced the width of the tree to make it manageable in a gist.

Loading some data: a 5-level tree

First we have to load the data into the graph. This was a bit of work - but not difficult at all:

Creating the top of the tree, the Product (just one in this case):
CREATE (n1:Product {id:1})
Then CREATE the Cost Groups:
MATCH (n1:Product) foreach (r in range(1,3) | CREATE (n2:CostGroup {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n1))
Then add the Cost Types to the Cost Groups:
MATCH (n2:CostGroup) foreach (r in range(1,5) | CREATE (n3:CostType {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n2))
Then add the Cost Subtypes to the Cost Types:
MATCH (n3:CostType) foreach (r in range(1,3) | CREATE (n4:CostSubtype {id:r})-[:PART_OF {quantity:round(rand()*100)}]->(n3))
Then finally add the Costs to the Cost Subtypes:
MATCH (n4:CostSubtype) foreach (r in range(1,5) | CREATE (n5:COST {id:r,price:round(rand()*1000)})-[:PART_OF {quantity:round(rand()*100)}]->(n4))

The actual graph then looks like this:

Querying the hierarchy structure

Then we can do some easy queries. Let’s check the structure of the hierarchy and the number of nodes:

MATCH (n)
RETURN labels(n) AS `Kinds of Nodes`, count(n) AS `Number of Nodes`;

This is what it looks like:

Now let’s start manipulating the graph and do some interesting stuff. Let’s calculate the price of the product at the top of this hierarchy by sweeping through the graph and mutiplying price by the quantities on each ot the relationships.

//calculating price based on full sweep of the tree
MATCH (n1:Product {id:1})<-[r1]-(:CostGroup)<-[r2]-(:CostType)<-[r3]-(:CostSubtype)<-[r4]-(n5:COST)
RETURN sum(r1.quantity*r2.quantity*r3.quantity*r4.quantity*n5.price) AS `Price of Product`

Optimising the calculation with intermediate price values at every level

But maybe we can do that more efficiently by calculating intermediate prices for each of the levels in the hierarchy:

//calculate intermediate pricing
MATCH (n4:CostSubtype)<-[r4]-(n5:COST)
WITH n4,sum(r4.quantity*n5.price) AS Sum
SET n4.price=Sum;
MATCH (n3:CostType)<-[r3]-(n4:CostSubtype)
WITH n3,sum(r3.quantity*n4.price) AS Sum
SET n3.price=Sum;
MATCH (n2:CostGroup)<-[r2]-(n3:CostType)
WITH n2,sum(r2.quantity*n3.price) AS Sum
SET n2.price=Sum;
MATCH (n1:Product)<-[r1]-(n2:CostGroup)
WITH n1, sum(r1.quantity*n2.price) AS Sum
SET n1.price=Sum
RETURN Sum;

Then we can easily calculate the price of the product by just using the intermediate pricing, and scanning a MUCH smaller part of the graph:

MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup)
RETURN sum(r1.quantity*n2.price) AS `Price of Product`

We can check the accuracy by looking at a different level and verifying if we get the same result:

MATCH (n1:Product {id:1})<-[r1]-(n2:CostGroup)<-[r2]-(n3:CostType)
RETURN sum(r1.quantity*r2.quantity*n3.price) AS `Price of Product`

Yey! That seems to have confirmed the theory!

What if something changes to the hierarchy?

Now let’s see what happens if we change something to the price of one of the costs at the bottom of the tree:

MATCH (n5:COST)
WITH n5, n5.price AS OldPrice LIMIT 1
SET n5.price = n5.price*10
WITH n5.price-OldPrice AS PriceDiff,n5
MATCH (n5)-[r4:PART_OF]->(n4:CostSubtype)-[r3:PART_OF]->(n3:CostType)-[r2:PART_OF]->(n2:CostGroup)-[r1:PART_OF]-(n1:Product)
SET n4.price=n4.price+(PriceDiff*r4.quantity),
n3.price=n3.price+(PriceDiff*r4.quantity*r3.quantity),
n2.price=n2.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity),
n1.price=n1.price+(PriceDiff*r4.quantity*r3.quantity*r2.quantity*r1.quantity)
RETURN PriceDiff AS `Price Difference`, n1.price AS `New Price of Product`

Then we can also go back and replay the queries above and see what has happened in the console below:

Conclusion

I hope this gist complements the blogpost and gives you some ideas around how to work with any kind of hierarchy using Neo4j.

About the Author

This gist was created by Rik Van Bruggen