Aggregating functions
An aggregating function performs a calculation over a set of values, returning a single value. Aggregation can be computed over all the matching paths, or it can be further divided by introducing grouping keys.
To learn more about how Cypher^{®} handles aggregations performed on zero rows, refer to Neo4j Knowledge Base → Understanding aggregations on zero rows. 
Example graph
The following graph is used for the examples below:
To recreate the graph, run the following query against an empty Neo4j database:
CREATE
(keanu:Person {name: 'Keanu Reeves', age: 58}),
(liam:Person {name: 'Liam Neeson', age: 70}),
(carrie:Person {name: 'Carrie Anne Moss', age: 55}),
(guy:Person {name: 'Guy Pearce', age: 55}),
(kathryn:Person {name: 'Kathryn Bigelow', age: 71}),
(speed:Movie {title: 'Speed'}),
(keanu)[:ACTED_IN]>(speed),
(keanu)[:KNOWS]>(carrie),
(keanu)[:KNOWS]>(liam),
(keanu)[:KNOWS]>(kathryn),
(carrie)[:KNOWS]>(guy),
(liam)[:KNOWS]>(guy)
avg()
Syntax 


Description 
Returns the average of a set of 

Arguments 
Name 
Type 
Description 


A value aggregated to form an average. 

Returns 

Any 

MATCH (p:Person)
RETURN avg(p.age)
The average of all the values in the property age
is returned:
avg(p.age) 


Rows: 1 
UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN avg(dur)
The average of the two supplied DURATION
values is returned:
avg(dur) 


Rows: 1 
collect()
Syntax 


Description 
Returns a list containing the values returned by an expression. 

Arguments 
Name 
Type 
Description 


A value aggregated into a list. 

Returns 

Any 

MATCH (p:Person)
RETURN collect(p.age)
All the values are collected and returned in a single list:
collect(p.age) 


Rows: 1 
count()
Syntax 


Description 
Returns the number of values or rows. 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 

Returns 




Neo4j maintains a transactional count store for holding count metadata, which can significantly increase the speed of queries using the 
Using count(*)
to return the number of nodes
The function count(*)
can be used to return the number of nodes; for example, the number of nodes connected to a node n
.
MATCH (p:Person {name: 'Keanu Reeves'})>(x)
RETURN labels(p), p.age, count(*)
The labels and age
property of the start node Keanu Reeves
and the number of nodes related to it are returned:
labels(p)  p.age  count(*) 




Rows: 1 
Using count(*)
to group and count relationship types
The function count(*)
can be used to group the type of matched relationships and return the number of types.
MATCH (p:Person {name: 'Keanu Reeves'})[r]>()
RETURN type(r), count(*)
The type of matched relationships are grouped and the group count of relationship types is returned:
type(r)  count(*) 





Rows: 2 
Counting nonnull
values
Instead of simply returning the number of rows with count(*)
, the function count(expression)
can be used to return the number of nonnull
values returned by the expression.
MATCH (p:Person)
RETURN count(p.age)
The number of nodes with the label Person
and a property age
is returned:
(To calculate the sum, use sum(n.age)
)
count(p.age) 


Rows: 1 
Counting with and without duplicates
The default behavior of the count
function is to count all matching results, including duplicates.
To avoid counting duplicates, use the DISTINCT
keyword.
As of Neo4j 5.15, it is also possible to use the ALL
keyword with aggregating functions.
This will count all results, including duplicates, and is functionally the same as not using the DISTINCT
keyword.
The ALL
keyword was introduced as part of Cypher’s GQL conformance.
This example tries to find all friends of friends of Keanu Reeves
and count them.
It shows the behavior of using both the ALL
and the DISTINCT
keywords:
MATCH (p:Person)>(friend:Person)>(friendOfFriend:Person)
WHERE p.name = 'Keanu Reeves'
RETURN friendOfFriend.name, count(friendOfFriend), count(ALL friendOfFriend), count(DISTINCT friendOfFriend)
The nodes Carrie Anne Moss
and Liam Neeson
both have an outgoing KNOWS
relationship to Guy Pearce
.
The Guy Pearce
node will, therefore, get counted twice when not using DISTINCT
.
friendOfFriend.name  count(friendOfFriend)  count(ALL friendOfFriend)  count(DISTINCT friendOfFriend) 





max()
Syntax 


Description 
Returns the maximum value in a set of values. 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 

Returns 

Any 
In a mixed set, any numeric value is always considered to be higher than any 
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end. 

UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN max(val)
The highest of all the values in the mixed set — in this case, the numeric value 1
— is returned:
max(val) 


Rows: 1 
The value 
UNWIND [[1, 'a', 89], [1, 2]] AS val
RETURN max(val)
The highest of all the lists in the set — in this case, the list [1, 2]
— is returned, as the number 2
is considered to be a higher value than the STRING
'a'
, even though the list [1, 'a', 89]
contains more elements.
max(val) 


Rows: 1 
MATCH (p:Person)
RETURN max(p.age)
The highest of all the values in the property age
is returned:
max(p.age) 


Rows: 1 
min()
Syntax 


Description 
Returns the minimum value in a set of values. 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 

Returns 

Any 
In a mixed set, any 
Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end. 

UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN min(val)
The lowest of all the values in the mixed set — in this case, the STRING
value "1"
— is returned.
Note that the (numeric) value 0.2
, which may appear at first glance to be the lowest value in the list, is considered to be a higher value than "1"
as the latter is a STRING
.
min(val) 


Rows: 1 
UNWIND ['d', [1, 2], ['a', 'c', 23]] AS val
RETURN min(val)
The lowest of all the values in the set — in this case, the list ['a', 'c', 23]
— is returned, as (i) the two lists are considered to be lower values than the STRING
"d"
, and (ii) the STRING
"a"
is considered to be a lower value than the numerical value 1
.
min(val) 


Rows: 1 
MATCH (p:Person)
RETURN min(p.age)
The lowest of all the values in the property age
is returned:
min(p.age) 


Rows: 1 
percentileCont()
Syntax 


Description 
Returns the percentile of a value over a group using linear interpolation. 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 



A percentile between 0.0 and 1.0. 

Returns 

Any 

MATCH (p:Person)
RETURN percentileCont(p.age, 0.4)
The 40th percentile of the values in the property age
is returned, calculated with a weighted average:
percentileCont(p.age, 0.4) 


Rows: 1 
percentileDisc()
Syntax 


Description 
Returns the nearest 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 



A percentile between 0.0 and 1.0. 

Returns 

Any 

MATCH (p:Person)
RETURN percentileDisc(p.age, 0.5)
The 50th percentile of the values in the property age
is returned:
percentileDisc(p.age, 0.5) 


Rows: 1 
stDev()
Syntax 


Description 
Returns the standard deviation for the given value over a group for a sample of a population. 

Arguments 
Name 
Type 
Description 


The value to calculate the standard deviation of. 

Returns 

Any 

MATCH (p:Person)
WHERE p.name IN ['Keanu Reeves', 'Liam Neeson', 'Carrie Anne Moss']
RETURN stDev(p.age)
The standard deviation of the values in the property age
is returned:
stDev(p.age) 


Rows: 1 
stDevP()
Syntax 


Description 
Returns the standard deviation for the given value over a group for an entire population. 

Arguments 
Name 
Type 
Description 


The value to calculate the population standard deviation of. 

Returns 

Any 

MATCH (p:Person)
WHERE p.name IN ['Keanu Reeves', 'Liam Neeson', 'Carrie Anne Moss']
RETURN stDevP(p.age)
The population standard deviation of the values in the property age
is returned:
stDevP(p.age) 


Rows: 1 
sum()
Syntax 


Description 
Returns the sum of a set of 

Arguments 
Name 
Type 
Description 


A value to be aggregated. 

Returns 

Any 

MATCH (p:Person)
RETURN sum(p.age)
The sum of all the values in the property age
is returned:
sum(p.age) 


Rows: 1 
UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN sum(dur)
The sum of the two supplied durations is returned:
sum(dur) 


Rows: 1 
Aggregating expressions and grouping keys
Aggregating expressions are expressions which contain one or more aggregating functions.
A simple aggregating expression consists of a single aggregating function.
For instance, sum(x.a)
is an aggregating expression that only consists of the aggregating function sum( )
with x.a
as its argument.
Aggregating expressions are also allowed to be more complex, where the result of one or more aggregating functions are input arguments to other expressions.
For instance, 0.1 * (sum(x.a) / count(x.b))
is an aggregating expression that contains two aggregating functions, sum( )
with x.a
as its argument and count( )
with x.b
as its argument.
Both are input arguments to the division expression.
Grouping keys are nonaggregating expressions that are used to group the values going into the aggregating functions.
For example, given the following query containing two return expressions, n
and count(*)
:
RETURN n, count(*)
The first, n
is not an aggregating function, so it will be the grouping key.
The latter, count(*)
is an aggregating function.
The matching paths will be divided into different buckets, depending on the grouping key.
The aggregating function will then be run on these buckets, calculating an aggregate value per bucket.
The input expression of an aggregating function can contain any expression, including expressions that are not grouping keys.
However, not all expressions can be composed with aggregating functions.
The example below will throw an error since n.x
, which is not a grouping key, is combined with the aggregating function count(*)
.
RETURN n.x + count(*)
To sort the result set using aggregating functions, the aggregation must be included in the ORDER BY
subclause following the RETURN
clause.
Examples
MATCH (p:Person)
RETURN max(p.age)
max(p.age) 


Rows: 1 
MATCH (p:Person)
RETURN max(p.age) + 1
max(p.age) + 1 


Rows: 1 
Note that p
is a grouping key:
MATCH (p:Person{name:'Keanu Reeves'})[:KNOWS](f:Person)
RETURN p, p.age  max(f.age)
p  p.age  max(f.age) 



Rows: 1 
Note that p.age
is a grouping key:
MATCH (p:Person {name:'Keanu Reeves'})[:KNOWS](f:Person)
RETURN p.age, p.age  max(f.age)
p.age  p.age  max(f.age) 



Rows: 1 
Grouping keys themselves can be complex expressions. For better query readability, Cypher only recognizes a subexpression in aggregating expressions as a grouping key if the grouping key is either:

A variable  e.g. the
p
inRETURN p, p.age  max(f.age)
. 
A property access  e.g. the
p.age
inRETURN p.age, p.age  max(f.age)
. 
A map access  e.g. the
p.age
inWITH {name:'Keanu Reeves', age:58} AS p RETURN p.age, p.age  max(p.age)
.
If more complex grouping keys are needed as operands in aggregating expression, it is always possible to project them in advance using WITH
.
Using the property p.age
will throw an exception, since p.age
is not a grouping key.
Therefore, it cannot be used in the expressions which contain the aggregating function.
The below two queries would consequently return the same error message:
MATCH (p:Person {name:'Keanu Reeves'})[:KNOWS](f:Person)
RETURN p.age  max(f.age)
MATCH (p:Person {name:'Keanu Reeves'})[:KNOWS](f:Person)
RETURN p.age + p.age, p.age + p.age  max(f.age)
Aggregation column contains implicit grouping expressions. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause. Illegal expression(s): n.age
However, the latter query would work if rewritten to:
MATCH (p:Person {name:'Keanu Reeves'})[:KNOWS](f:Person)
WITH p.age + p.age AS groupingKey, f
RETURN groupingKey, groupingKey  max(f.age)
groupingKey  groupingKey  max(f.age) 



Rows: 1 
Rules for aggregating expressions
For aggregating expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements. Specifically, each subexpression in an aggregating expression has to be either:

an aggregating function, e.g.
sum(x.a)
. 
a constant, e.g.
0.1
. 
a parameter, e.g.
$param
. 
a grouping key, e.g. the
a
inRETURN a, count(*)
. 
a local variable, e.g. the
x
incount(*) + size([ x IN range(1, 10)  x ])
. 
a subexpression, all operands of which have to be allowed in an aggregating expression.