Using the following Cypher:

MATCH (n) WHERE rand() <= 0.1
RETURN DISTINCT labels(n),
count(*) AS NumofNodes,
avg(size(keys(n))) AS AvgNumOfPropPerNode,
min(size(keys(n))) AS MinNumPropPerNode,
max(size(keys(n))) AS MaxNumPropPerNode,
avg(size((n)-[]-())) AS AvgNumOfRelationships,
min(size((n)-[]-())) AS MinNumOfRelationships,
max(size((n)-[]-())) AS MaxNumOfRelationships

will produce an ‘inventory’ of the nodes in the graph. For each distinct combination of labels, it reports the number of nodes, the average, minimum and maximum number of properties per node, and the average, minimum and maximum number of relationships per node. These statistics can help you understand the shape of the graph when investigating performance or planning for database growth.
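The query above uses size() over a pattern such as (n)-[]-(), which matches the 3.x-era syntax. As far as we know, that form is no longer accepted in Neo4j 5. The following is a minimal sketch of an equivalent inventory, assuming a Neo4j 5.x release that supports COUNT { } subqueries. It computes each node's property and relationship counts once in a WITH clause and then aggregates them per label combination. The column aliases are kept from the original query. Labels and propCount/relCount are names introduced here for illustration.

```cypher
// Sketch assuming Neo4j 5.x with COUNT { } subquery support.
// Keep roughly 10% of nodes, as in the original query.
MATCH (n)
WHERE rand() <= 0.1
// Compute the per-node figures once, so they are not re-evaluated per aggregate
WITH labels(n) AS Labels,
     size(keys(n)) AS propCount,
     COUNT { (n)--() } AS relCount
// Aggregate per distinct label combination (Labels is the grouping key)
RETURN Labels,
       count(*) AS NumofNodes,
       avg(propCount) AS AvgNumOfPropPerNode,
       min(propCount) AS MinNumPropPerNode,
       max(propCount) AS MaxNumPropPerNode,
       avg(relCount) AS AvgNumOfRelationships,
       min(relCount) AS MinNumOfRelationships,
       max(relCount) AS MaxNumOfRelationships
```

Computing the counts in WITH keeps the RETURN clause readable. It also evaluates the relationship count once per sampled node instead of once per aggregate.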

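The query above still scans every node in the graph and then keeps only about 10% of them by way of the predicate rand() <= 0.1, discarding the remaining ~90%. As a result, the numbers returned reflect a roughly 10% random sample of the graph. In particular, NumofNodes is only about one tenth of the actual node count for each label combination, and the minimum and maximum values can miss extremes that were not sampled.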
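Because NumofNodes is a sample count, you may want a rough whole-graph figure. Dividing the sampled count by the sampling rate gives an estimate. The following is a minimal sketch of that arithmetic. $sampleRate is a hypothetical parameter name, to be supplied as a query parameter (for example with the Browser's :param command). Set it to 0.1 to match the query above.

```cypher
// Sketch: scale sampled counts back up to approximate totals.
// $sampleRate is a hypothetical parameter; 0.1 reproduces the original sampling.
MATCH (n)
WHERE rand() <= $sampleRate
RETURN labels(n) AS Labels,
       count(*) AS SampledNodes,
       // e.g. 1,234 sampled nodes at a rate of 0.1 suggests roughly 12,340 in total
       round(count(*) / $sampleRate) AS EstimatedTotalNodes
```

The result is only an estimate and will vary between runs, because rand() selects a different subset each time. Rare label combinations may not appear at all.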

In Neo4j 3.0, the query above is included as a Favorite in the Neo4j Browser, under Data Profiling / What kind of nodes exist.


Dana Canzano