If you want to compare two graphs (or sub-graphs) to determine whether they are equivalent, the following Cypher produces an md5sum of the nodes and their properties to make that comparison. For example, you may wish to compare a test/QA instance with a production instance.
This applies to Neo4j 3.1 and later.
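A query of this kind might look like the following minimal sketch. It assumes the APOC function apoc.util.md5 is installed and that every :Movie node has a title that is present and unique; the column alias md5_property is chosen to match the name used later in this article.

```cypher
// Hash the properties of every :Movie node in a deterministic order.
// Assumption: n.title exists on every :Movie node and is unique.
MATCH (n:Movie)
WITH n ORDER BY n.title
// Collect property maps only, not whole nodes (which carry internal ids).
WITH collect(properties(n)) AS props
RETURN apoc.util.md5(props) AS md5_property;
```

The query returns one row with one string column. Running it unchanged on a second instance gives a value that can be compared directly with the first.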
When run against the default Movie Graph, which includes 38 nodes with the label Movie, this returns a single row containing the md5 hash of those nodes' properties.
The above Cypher requires the APOC procedure library to be installed.
In the above example, we examine all nodes with the label :Movie, collect all properties of those nodes, and use that collection to produce an md5 hash.
To get correct results we need to order the nodes by a property value that is both defined for every node and unique. For this reason you might want to use a property that is covered by both a property existence constraint and a uniqueness constraint.
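As an illustration of the two constraints mentioned above, the statements below use the Neo4j 3.x constraint syntax and assume title is the property chosen for ordering. Property existence constraints are an Enterprise Edition feature, so the second statement is only a sketch for installations that support it.

```cypher
// Uniqueness constraint: no two :Movie nodes may share a title.
CREATE CONSTRAINT ON (m:Movie) ASSERT m.title IS UNIQUE;

// Property existence constraint (Enterprise Edition only):
// every :Movie node must have a title.
CREATE CONSTRAINT ON (m:Movie) ASSERT exists(m.title);
```

Recent Neo4j releases replace this syntax with the FOR ... REQUIRE form, for example `CREATE CONSTRAINT FOR (m:Movie) REQUIRE m.title IS UNIQUE`.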
For example, if several :Movie nodes share the same title property and the Cypher above orders only by n.title, the tied nodes are passed to the md5 function in the order they are found. This is typically based upon the order in which the nodes were created. If you had two :Movie nodes with title='The Matrix' that differ in another property, then simply running the Cypher to produce the md5 hash yields one md5_property. However, if you reversed the order of the CREATE statements, the same md5 hashing Cypher would yield a different md5_property.
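The two creation orders described above can be sketched as follows. The released property and its values are hypothetical, chosen only so that the two nodes differ, and the sketch assumes there is no uniqueness constraint on title (otherwise the second CREATE in each pair would fail).

```cypher
// Order A: run against one empty database.
CREATE (:Movie {title: 'The Matrix', released: 1999});
CREATE (:Movie {title: 'The Matrix', released: 2003});

// Order B: run against a second empty database instead of order A.
// Same two nodes, created in the opposite order.
CREATE (:Movie {title: 'The Matrix', released: 2003});
CREATE (:Movie {title: 'The Matrix', released: 1999});
```

Both databases then hold the same data, yet a hash query that orders only by n.title may see the two nodes in a different sequence in each.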
In the above example, to get the same md5 value regardless of the order of the CREATE statements, we need to run Cypher that returns the data in a guaranteed order, using an ORDER BY clause whose keys together distinguish every node.
With such an ordering the query always returns the same md5_property.
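One way to obtain such a guaranteed order is to add a tie-breaking key to the ORDER BY clause. The sketch below assumes a second property, here the hypothetical released, that differs between nodes sharing a title; any combination of properties that is unique per node would serve the same purpose.

```cypher
// Order by title first, then by released to break ties between equal titles.
// Assumption: (title, released) is unique across all :Movie nodes.
MATCH (n:Movie)
WITH n ORDER BY n.title, n.released
WITH collect(properties(n)) AS props
RETURN apoc.util.md5(props) AS md5_property;
```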
NOTE: We also cannot simply use collect(n) (i.e. the entire node), because a node value includes its internal node id (a unique internal identifier), which may differ between environments.
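To see the difference between the internal id and the property map, a query along these lines can be run in each environment. It is only an inspection aid, not part of the hashing itself, and the LIMIT value is arbitrary.

```cypher
// Show the internal id next to the properties for a few :Movie nodes.
// The props column should match across environments; internal_id need not.
MATCH (n:Movie)
RETURN n.title AS title, id(n) AS internal_id, properties(n) AS props
ORDER BY title
LIMIT 5;
```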
If you run the same Cypher on two separate environments and get the same md5 sums, the compared nodes can be taken to be the same in terms of their labels and properties, barring an md5 collision, which is extremely unlikely to occur by accident.