apoc.diff.graphs procedure
The procedure accepts 2 string argument, the source and the dest, representing 2 queries to compare,
and an optional config map as a 3rd parameter.
The procedure compares the source and dest two graphs and returns the differences in terms of:
-
same node count
-
same count per label
-
same relationship counter
-
same count per rel-type
For each node in the source graph with a certain label, find the same node (in the dest graph) by keys or internal id in the other graph and if found:
* compare all labels
* compare all properties
Please note that the node finding leverage the existing constraint to find an equivalent node. To find a node using the internal id you can use findById:true config (see below).
For each relationship in the source graph we’ll get the two nodes of the relationship and look into the dest
graph if there is a relationship with the same properties and the same start/end node.
The procedure support the following config parameters:
| name | type | default | description |
|---|---|---|---|
findById |
boolean |
false |
to find a node by id, instead of using existing constraint |
relsInBetween |
boolean |
false |
if enabled consider other terminal nodes, in case of query returning relationships and start or end nodes. |
boltConfig |
Map |
{} |
to provide additional configs to the |
source |
Map |
{} |
see below |
dest |
Map |
{} |
see below |
The source and dest maps are applied to respectively to the 1st and the 2nd procedure arguments, they can have the following keys:
| name | type | default | description |
|---|---|---|---|
target |
Map |
{} |
see below |
params |
Map |
{} |
to pass additional query params |
The target param accepts:
| name | type | default | description |
|---|---|---|---|
type |
Enum[URL, DATABASE] |
|
to search the |
value |
String |
{} |
in case of config |
Usage examples
Given this dataset in a neo4j:
CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE (m:Person {name: 'Michael Jordan', age: 54});
CREATE (q:Person {name: 'Tom Burton', age: 23})
CREATE (p:Person {name: 'John William', age: 22})
CREATE (q)-[:KNOWS{since:2016, time:time('125035.556+0100')}]->(p);
We can compare 2 set in the same database:
CALL apoc.diff.graphs("MATCH (start:Person) WHERE start.age < $age RETURN start", "MATCH (start:Person) WHERE start.age > $age RETURN start", {source: {params: {age: 25}}, dest: {params: {age: 25}}})
| difference | entityType | id | sourceLabel | destLabel | source | dest |
|---|---|---|---|---|---|---|
"Total count" |
"Node" |
null |
null |
null |
2 |
1 |
"Count by Label" |
"Node" |
null |
null |
null |
{"Person": 2 } |
|
{"Person": 1 } |
"Destination Entity not found" |
"Node" |
1 |
"Person" |
null |
{"name": "Tom Burton" } |
null |
"Destination Entity not found" |
"Node" |
2 |
"Person" |
null{"name": "John William" } |
null |
If we create another dataset in a new secondDb database:
CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;
CREATE (m:Person:Other {name: 'Michael Jordan', age: 54}),
(n:Person {name: 'Tom Burton', age: 47}),
(q:Person:Other {name: 'Jerry Burton', age: 23}),
(p:Person {name: 'Jack William', age: 22}),
(q)-[:KNOWS{since:1999, time:time('125035.556+0100')}]->(p);
We can execute, in the neo4j database:
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end",
"MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end",
{dest: {target: {type: "DATABASE", value: "secondDb"}}})
| difference | entityType | id | sourceLabel | destLabel | source | dest |
|---|---|---|---|---|---|---|
"Destination Entity not found" |
"Node" |
1 |
"Person" |
null |
{"name": "Tom Burton" } |
null |
"Destination Entity not found" |
"Node" |
2 |
"Person" |
null |
{"name": "John William"} |
null |
"Destination Entity not found" |
"Relationship" |
0 |
"KNOWS" |
null |
{"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} |
null |
Vice versa, we can compare 2 dataset starting from the secondDb database:
CALL apoc.diff.graphs("MATCH (node:Person) RETURN node",
"MATCH (node:Person) RETURN node",
{dest: {target: {type: "DATABASE", value: "neo4j"}}})
| difference | entityType | id | sourceLabel | destLabel | source | dest |
|---|---|---|---|---|---|---|
"Total count" |
"Node" |
null |
null |
null |
6 |
3 |
"Count by Label" |
"Node" |
null |
null |
null |
{"Person": 4, "Other": 2 } |
|
{"Person": 3 } |
"Different Labels" |
"Node" |
0 |
"Person" |
"Person" |
["Other", "Person"] |
["Person"] |
"Different Properties" |
"Node" |
1 |
"Person" |
"Person" |
{"age": 47 } |
{"age": 23 } |
"Destination Entity not found" |
"Node" |
2 |
"Person" |
null |
{"name": "Jerry Burton" } |
null |
"Destination Entity not found" |
"Node" |
7 |
"Person" |
null |
{"name": "Jack William" } |
If we create another dbms instance with the same dataset as seconddb we can compare the 2 graph leveraging the apoc.bolt.load:
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {dest: {target: {type: "URL", value: "<MY_BOLT_URL>"}}})
| difference | entityType | id | sourceLabel | destLabel | source | dest |
|---|---|---|---|---|---|---|
"Destination Entity not found" |
"Node" |
1 |
"Person" |
null |
{"name": "Tom Burton" } |
null |
"Destination Entity not found" |
"Node" |
2 |
"Person" |
null |
{"name": "John William"} |
null |
"Destination Entity not found" |
"Relationship" |
0 |
"KNOWS" |
null |
{"start":{"name":"Tom Burton"},"end":{"name":"John William"},"properties":{"time":"12:50:35.556000000+01:00","since":2016}} |
null |
If we want to point to a secondDestDb database present in a remote target instance, we can pass the boltConfig parameter to pass additional parameter to apoc.bolt.load(url, query, params, <boltConfig>).
In this case we can pass the databaseName, that is:
CALL apoc.diff.graphs("MATCH p = (start:Person)-[rel:KNOWS]->(end) RETURN start, rel, end", "MATCH p = (start)-[rel:KNOWS]->(end) RETURN start, rel, end", {boltConfig: {databaseName: "secondDestDb"}, dest: {target: {type: "URL", value: "bolt://neo4j:apoc@localhost:7687"}}})
with the same result as above, if the dataset is the same.