Fingerprinting

The following functions calculate a hashsum over nodes, relationships, or the entire graph. They take into account all properties, node labels, and relationship types.
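The following Python sketch illustrates why identical entities share a checksum. It is a hypothetical model, not APOC's implementation: the helper name fingerprint_entity and the JSON-based canonical serialization are assumptions. Labels (or a relationship type) and properties are serialized in sorted order before hashing with MD5, so the digest depends only on content and not on the order in which properties were set.

```python
import hashlib
import json


def fingerprint_entity(labels, properties):
    """Hypothetical sketch, not APOC's algorithm: MD5 over labels + properties.

    For a relationship, pass its type as a one-element list.
    """
    # Sort labels and property keys so equal content always serializes identically.
    canonical = json.dumps(
        {"labels": sorted(labels), "properties": properties},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Same labels and properties, different property order: same fingerprint.
a = fingerprint_entity(["Person"], {"name": "Alice", "age": 42})
b = fingerprint_entity(["Person"], {"age": 42, "name": "Alice"})
# A different label changes the fingerprint.
c = fingerprint_entity(["Company"], {"name": "Alice", "age": 42})
print(a == b, a == c)  # True False
```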

The algorithm used for hashing may change between APOC versions. Hashing results are therefore only comparable when they were computed with the same APOC version, whether the entities or graphs come from the same graph or from different graphs.

To compute the hashsum of a graph, the hashsum of each node is calculated first. The resulting list of hashsums is sorted, and for each node, the hashsums of all of its relationships and their end nodes are added. Internal ids are not included in the hashsum.
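The graph-level steps can be approximated in Python as follows. This is a hypothetical sketch, not APOC's code: the input format (a dict of nodes keyed by id plus a list of relationship tuples), the helper names _md5 and fingerprint_graph, and the JSON serialization are all assumptions. Node ids are used only for lookups and never enter a hash, so relabeling the ids leaves the fingerprint unchanged.

```python
import hashlib
import json


def _md5(obj):
    """MD5 hex digest of a canonical JSON serialization (hypothetical scheme)."""
    data = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def fingerprint_graph(nodes, rels):
    """Hypothetical sketch of a graph fingerprint.

    nodes: {node_id: (labels, properties)}
    rels:  [(start_id, rel_type, properties, end_id)]
    Node ids are used only to look things up; they never enter a hash.
    """
    # 1. Hash every node on its own.
    node_hash = {
        nid: _md5({"labels": sorted(labels), "properties": props})
        for nid, (labels, props) in nodes.items()
    }
    # 2. For each start node, collect relationship hash + end-node hash.
    outgoing = {nid: [] for nid in nodes}
    for start, rel_type, props, end in rels:
        rel_hash = _md5({"type": rel_type, "properties": props})
        outgoing[start].append(rel_hash + node_hash[end])
    # 3. Order the per-node entries by hash so the result is independent
    #    of node ids and insertion order, then fold them into one digest.
    entries = sorted(node_hash[nid] + "".join(sorted(outgoing[nid])) for nid in nodes)
    total = hashlib.md5()
    for entry in entries:
        total.update(entry.encode("ascii"))
    return total.hexdigest()


g1 = {1: (["Person"], {"name": "Alice"}), 2: (["Person"], {"name": "Bob"})}
r1 = [(1, "KNOWS", {"since": 2020}, 2)]
# The same graph with different internal ids.
g2 = {20: (["Person"], {"name": "Bob"}), 10: (["Person"], {"name": "Alice"})}
r2 = [(10, "KNOWS", {"since": 2020}, 20)]
print(fingerprint_graph(g1, r1) == fingerprint_graph(g2, r2))  # True
```

Sorting the per-node entries, rather than iterating in id order, is what keeps the result independent of internal ids in this sketch.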

It is also possible to supply a list of property keys to ignore on all nodes. This is useful for properties such as created=timestamp(), whose values differ even between otherwise identical entities.
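A minimal Python sketch of property exclusion, assuming a hypothetical JSON-plus-MD5 scheme (fingerprint_excluding is an illustrative name, not an APOC function): excluded keys are removed before hashing, so two nodes that differ only in a created timestamp produce the same digest.

```python
import hashlib
import json


def fingerprint_excluding(labels, properties, excluded_keys=()):
    """Hypothetical sketch: drop excluded property keys, then hash the rest."""
    excluded = set(excluded_keys)
    kept = {k: v for k, v in properties.items() if k not in excluded}
    canonical = json.dumps(
        {"labels": sorted(labels), "properties": kept}, sort_keys=True, default=str
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Two otherwise identical nodes created at different times.
n1 = {"name": "Alice", "created": 1700000000000}
n2 = {"name": "Alice", "created": 1700000000500}
print(fingerprint_excluding(["Person"], n1) == fingerprint_excluding(["Person"], n2))  # False
print(fingerprint_excluding(["Person"], n1, ["created"])
      == fingerprint_excluding(["Person"], n2, ["created"]))  # True
```

In Cypher, the documented way to get this effect is to pass the keys, for example ['created'], as the excludedPropertyKeys argument of apoc.hashing.fingerprint.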

Function name and description

apoc.hashing.fingerprint(object ANY, excludedPropertyKeys LIST<STRING>)

  Calculates an MD5 checksum over a NODE or RELATIONSHIP (identical entities share the same checksum). Unsuitable for cryptographic use cases.

apoc.hashing.fingerprinting(object ANY, config MAP<STRING, ANY>)

  Calculates a checksum (MD5 by default) over a NODE or RELATIONSHIP (identical entities share the same checksum). Unlike apoc.hashing.fingerprint(), this function supports a number of configuration parameters, including the digest algorithm. Unsuitable for cryptographic use cases.

apoc.hashing.fingerprintGraph(propertyExcludes LIST<STRING>)

  Calculates an MD5 checksum over the full graph. This function uses in-memory data structures. Unsuitable for cryptographic use cases.

Configuration parameters

Table 1. apoc.hashing.fingerprinting configuration params

name                  type                       default  description
--------------------  -------------------------  -------  -----------
digestAlgorithm       STRING                     "MD5"    The algorithm used to compute the fingerprint. Supported values: MD5, SHA-1, SHA-256.
strategy              STRING                     "LAZY"   Defines the filtering behaviour for nodes and relationships. Supported values: LAZY (does not include properties) and EAGER (includes all non-filtered properties).
nodeAllowMap          MAP<STRING, LIST<STRING>>  {}       Node label name mapped to a list of allowed properties for that label.
nodeDisallowMap       MAP<STRING, LIST<STRING>>  {}       Node label name mapped to a list of properties to ignore for that label.
relAllowMap           MAP<STRING, LIST<STRING>>  {}       Relationship type name mapped to a list of allowed properties for that type.
relDisallowMap        MAP<STRING, LIST<STRING>>  {}       Relationship type name mapped to a list of properties to ignore for that type.
mapAllowList          LIST<STRING>               []       A list of allowed keys when the object being hashed is a map.
mapDisallowList       LIST<STRING>               []       A list of keys to ignore when the object being hashed is a map.
allNodesAllowList     LIST<STRING>               []       A list of globally allowed node properties.
allNodesDisallowList  LIST<STRING>               []       A list of globally ignored node properties.
allRelsAllowList      LIST<STRING>               []       A list of globally allowed relationship properties.
allRelsDisallowList   LIST<STRING>               []       A list of globally ignored relationship properties.

Allow and disallow lists cannot both be defined for the same entity type: for nodes, relationships, and maps, set either an allow list or a disallow list, but not both.
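The rule above can be expressed as a configuration check. In this hypothetical Python sketch, grouping the global lists (allNodesAllowList and so on) with the per-label or per-type maps of the same entity type is an assumption about how the rule applies; validate_fingerprinting_config is an illustrative name, and APOC's own validation may differ.

```python
# Hypothetical grouping of config keys by entity type (an assumption, not APOC's code).
ALLOW_DISALLOW_KEYS = {
    "nodes": (["nodeAllowMap", "allNodesAllowList"],
              ["nodeDisallowMap", "allNodesDisallowList"]),
    "relationships": (["relAllowMap", "allRelsAllowList"],
                      ["relDisallowMap", "allRelsDisallowList"]),
    "maps": (["mapAllowList"], ["mapDisallowList"]),
}


def validate_fingerprinting_config(config):
    """Raise ValueError if an entity type has both allow and disallow lists set."""
    for entity, (allow_keys, disallow_keys) in ALLOW_DISALLOW_KEYS.items():
        # Empty maps/lists (the documented defaults) count as "not set".
        has_allow = any(config.get(key) for key in allow_keys)
        has_disallow = any(config.get(key) for key in disallow_keys)
        if has_allow and has_disallow:
            raise ValueError(f"cannot define both allow and disallow lists for {entity}")


# Allow list for nodes plus disallow list for relationships: accepted.
validate_fingerprinting_config({"nodeAllowMap": {"Person": ["name"]},
                                "allRelsDisallowList": ["updated"]})
try:
    # Allow and disallow lists for nodes: rejected.
    validate_fingerprinting_config({"allNodesAllowList": ["name"],
                                    "nodeDisallowMap": {"Person": ["created"]}})
except ValueError as err:
    print(err)  # cannot define both allow and disallow lists for nodes
```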

Fingerprinting strategy

If the configuration defines no allow or disallow list that applies to a node or relationship, the strategy parameter defines how fingerprinting proceeds:

  • EAGER: includes all properties in the hash.

  • LAZY: excludes all properties from the hash.
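The following hypothetical Python sketch models how allow lists, disallow lists, and the strategy might combine when selecting the node properties that enter the hash. Merging the per-label lists over all of a node's labels, and letting a disallow list apply regardless of strategy, are assumptions; select_node_properties is an illustrative name, not an APOC function.

```python
def select_node_properties(labels, properties, config):
    """Hypothetical sketch of which node properties enter the hash."""
    allow = set(config.get("allNodesAllowList", []))
    disallow = set(config.get("allNodesDisallowList", []))
    # Assumption: merge the per-label lists for every label the node carries.
    for label in labels:
        allow |= set(config.get("nodeAllowMap", {}).get(label, []))
        disallow |= set(config.get("nodeDisallowMap", {}).get(label, []))
    if allow:
        return {k: v for k, v in properties.items() if k in allow}
    if disallow:
        return {k: v for k, v in properties.items() if k not in disallow}
    # No list applies: fall back to the strategy (documented default "LAZY").
    strategy = config.get("strategy", "LAZY")
    if strategy == "EAGER":
        return dict(properties)
    if strategy == "LAZY":
        return {}
    raise ValueError(f"unsupported strategy: {strategy}")


props = {"name": "Alice", "created": 1700000000000}
print(select_node_properties(["Person"], props, {}))                     # {}
print(select_node_properties(["Person"], props, {"strategy": "EAGER"}))  # {'name': 'Alice', 'created': 1700000000000}
print(select_node_properties(["Person"], props,
                             {"nodeAllowMap": {"Person": ["name"]}}))    # {'name': 'Alice'}
```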