Written by Mark Needham, originally posted on His Blog
An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.
Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.
However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.
I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 which we wanted to retrieve and 50,000 that were irrelevant.
I modelled the ‘relationship’ in 4 different ways…
- Using a specific relationship type
(node)-[:HAS_ADDRESS]->(address)
- Using a generic relationship type and then filtering by end node label
(node)-[:HAS]->(address:Address)
- Using a generic relationship type and then filtering by relationship property
(node)-[:HAS {type: “address”}]->(address)
- Using a generic relationship type and then filtering by end node property
(node)-[:HAS]->(address {type: “address”})
…and then measured how long it took to retrieve the ‘has address’ relationships.
The
code is on github if you want to take a look.
Although it’s obviously not as precise as a
JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.
I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):
Using a generic relationship type and then filtering by end node label
50%ile: 6.0 75%ile: 6.0 99%ile: 402.60999999999825
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0 75%ile: 22.0 99%ile: 504.85999999999785
Using a generic relationship type and then filtering by end node label
50%ile: 4.0 75%ile: 4.0 99%ile: 145.65999999999931
Using a specific relationship type
50%ile: 0.0 75%ile: 1.0 99%ile: 25.749999999999872
|
We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship name
Using a specific relationship type
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+-----------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+-----------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| SimplePatternMatcher | 10000 | 10000 | n, UNNAMED53, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+-----------------------+
Total database accesses: 10001
|
Here we can see that there were 10,002 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.
By contrast the other approaches have to load in a lot more data only to then filter it out:
Using a generic relationship type and then filtering by end node label
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+----------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+----------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 10000 | | hasLabel( UNNAMED45:Address(0)) |
| SimplePatternMatcher | 10000 | 60000 | n, UNNAMED45, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+----------------------------------+
Total database accesses: 70001
|
Using a generic relationship type and then filtering by relationship property
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 20000 | | Property( UNNAMED35,type(0)) == { AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n, UNNAMED63, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
Total database accesses: 140001
|
Using a generic relationship type and then filtering by end node property
neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n);
+----------+
| count(n) |
+----------+
| 10000 |
+----------+
1 row
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+SimplePatternMatcher
|
+NodeByIdOrEmpty
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(n) |
| EagerAggregation | 1 | 0 | | |
| Filter | 10000 | 20000 | | Property( UNNAMED45,type(0)) == { AUTOSTRING1} |
| SimplePatternMatcher | 10000 | 120000 | n, UNNAMED45, UNNAMED35 | |
| NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} |
+----------------------+-------+--------+-----------------------------+--------------------------------------------------+
Total database accesses: 140001
|
So in summary…specific relationships #ftw!
Want to learn more about graph databases? Click below to get your free copy of O’Reilly’s Graph Databases ebook and discover how to use graph technologies for your application today.
Download My Ebook