3.4.3. Aggregating functions

To calculate aggregated data, Cypher offers aggregation, analogous to SQL’s GROUP BY.

Aggregating functions take a set of values and calculate an aggregated value over them. Examples are avg() that calculates the average of multiple numeric values, or min() that finds the smallest numeric or string value in a set of values. When we say below that an aggregating function operates on a set of values, we mean these to be the result of the application of the inner expression (such as n.age) to all the records within the same aggregation group.

Aggregation can be computed over all the matching subgraphs, or it can be further divided by introducing grouping keys. These are non-aggregate expressions, that are used to group the values going into the aggregate functions.

Assume we have the following return statement:

RETURN n, count(*)

We have two return expressions: n, and count(*). The first, n, is not an aggregate function, and so it will be the grouping key. The latter, count(*) is an aggregate expression. The matching subgraphs will be divided into different buckets, depending on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket.

To use aggregations to sort the result set, the aggregation must be included in the RETURN to be used in the ORDER BY.

The DISTINCT operator works in conjunction with aggregation. It is used to make all values unique before running them through an aggregate function. More information about DISTINCT may be found here.

Functions:

The following graph is used for the examples below:

Figure 3.22. Graph
alt

3.4.3.1. avg()

avg() returns the average of a set of numeric values.

Syntax: avg(expression)

Returns:

Either an Integer or a Float, depending on the values returned by expression and whether or not the calculation overflows.

Arguments:

Name Description

expression

An expression returning a set of numeric values.

Considerations:

Any null values are excluded from the calculation.

avg(null) returns null.

Query. 

MATCH (n:Person)
RETURN avg(n.age)

The average of all the values in the property age is returned.

Table 3.129. Result
avg(n.age)

1 row

30.0

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN avg(n.age)

3.4.3.2. collect()

collect() returns a list containing the values returned by an expression. Using this function aggregates data by amalgamating multiple records or values into a single list.

Syntax: collect(expression)

Returns:

A list containing heterogeneous elements; the types of the elements are determined by the values returned by expression.

Arguments:

Name Description

expression

An expression returning a set of values.

Considerations:

Any null values are ignored and will not be added to the list.

collect(null) returns an empty list.

Query. 

MATCH (n:Person)
RETURN collect(n.age)

All the values are collected and returned in a single list.

Table 3.130. Result
collect(n.age)

1 row

[13,33,44]

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN collect(n.age)

3.4.3.3. count()

count() returns the number of values or rows, and appears in two variants:

  • count(*) returns the number of matching rows, and
  • count(expr) returns the number of non-null values returned by an expression.

Syntax: count(expression)

Returns:

An Integer.

Arguments:

Name Description

expression

An expression.

Considerations:

count(*) includes rows returning null.

count(expr) ignores null values.

count(null) returns 0.

Using count(*) to return the number of nodes

count(*) can be used to return the number of nodes; for example, the number of nodes connected to some node n.

Query. 

MATCH (n { name: 'A' })-->(x)
RETURN labels(n), n.age, count(*)

The labels and age property of the start node n and the number of nodes related to n are returned.

Table 3.131. Result
labels(n) n.age count(*)

1 row

["Person"]

13

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-->(x) RETURN labels(n), n.age, count(*)

Using count(*) to group and count relationship types

count(*) can be used to group relationship types and return the number.

Query. 

MATCH (n { name: 'A' })-[r]->()
RETURN type(r), count(*)

The relationship types and their group count are returned.

Table 3.132. Result
type(r) count(*)

1 row

"KNOWS"

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-[r]->() RETURN type(r), count(*)

Using count(expression) to return the number of values

Instead of simply returning the number of rows with count(*), it may be more useful to return the actual number of values returned by an expression.

Query. 

MATCH (n { name: 'A' })-->(x)
RETURN count(x)

The number of nodes connected to the start node is returned.

Table 3.133. Result
count(x)

1 row

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-->(x) RETURN count(x)

Counting non-null values

count(expression) can be used to return the number of non-null values returned by the expression.

Query. 

MATCH (n:Person)
RETURN count(n.age)

The number of :Person nodes having an age property is returned.

Table 3.134. Result
count(n.age)

1 row

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN count(n.age)

Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them:

  • The first aggregate function, count(DISTINCT friend_of_friend), will only count a friend_of_friend once, as DISTINCT removes the duplicates.
  • The second aggregate function, count(friend_of_friend), will consider the same friend_of_friend multiple times.

Query. 

MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

Both B and C know D and thus D will get counted twice when not using DISTINCT.

Table 3.135. Result
count(DISTINCT friend_of_friend) count(friend_of_friend)

1 row

1

2

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person) WHERE me.name = 'A' RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

3.4.3.4. max()

max() returns the maximum value in a set of values.

Syntax: max(expression)

Returns:

Either an Integer or a Float or a String, depending on the values returned by expression.

Arguments:

Name Description

expression

An expression returning a set containing either numeric values only, or string values only, or a mixture of string and numeric values.

Considerations:

Any null values are excluded from the calculation.

In a mixed list, any numeric value is always considered to be higher than any string value.

max(null) returns null.

Query. 

UNWIND [1, 'a', NULL , 0.2, 'b', '1', '99'] AS val
RETURN max(val)

The highest of all the values in the mixed list — in this case, the numeric value 1 — is returned. Note that the (string) value "99", which may appear at first glance to be the highest value in the list, is considered to be a lower value than 1 as the latter is a string.

Table 3.136. Result
max(val)

1 row

1

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val RETURN max(val)

Query. 

MATCH (n:Person)
RETURN max(n.age)

The highest of all the values in the property age is returned.

Table 3.137. Result
max(n.age)

1 row

44

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN max(n.age)

3.4.3.5. min()

min() returns the minimum value in a set of values.

Syntax: min(expression)

Returns:

Either an Integer or a Float or a String, depending on the values returned by expression.

Arguments:

Name Description

expression

An expression returning a set containing either numeric values only, or string values only, or a mixture of string and numeric values.

Considerations:

Any null values are excluded from the calculation.

In a mixed list, any string value is always considered to be lower than any numeric value.

min(null) returns null.

Query. 

UNWIND [1, 'a', NULL , 0.2, 'b', '1', '99'] AS val
RETURN min(val)

The lowest of all the values in the mixed list — in this case, the string value "1" — is returned. Note that the (numeric) value 0.2, which may appear at first glance to be the lowest value in the list, is considered to be a higher value than "1" as the latter is a string.

Table 3.138. Result
min(val)

1 row

"1"

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val RETURN min(val)

Query. 

MATCH (n:Person)
RETURN min(n.age)

The lowest of all the values in the property age is returned.

Table 3.139. Result
min(n.age)

1 row

13

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN min(n.age)

3.4.3.6. percentileCont()

percentileCont() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc.

Syntax: percentileCont(expression, percentile)

Returns:

A Float.

Arguments:

Name Description

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Considerations:

Any null values are excluded from the calculation.

percentileCont(null, percentile) returns null.

Query. 

MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)

The 40th percentile of the values in the property age is returned, calculated with a weighted average. In this case, 0.4 is the median, or 40th percentile.

Table 3.140. Result
percentileCont(n.age, 0.4)

1 row

29.0

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN percentileCont(n.age, 0.4)

3.4.3.7. percentileDisc()

percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see percentileCont.

Syntax: percentileDisc(expression, percentile)

Returns:

Either an Integer or a Float, depending on the values returned by expression and whether or not the calculation overflows.

Arguments:

Name Description

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Considerations:

Any null values are excluded from the calculation.

percentileDisc(null, percentile) returns null.

Query. 

MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)

The 50th percentile of the values in the property age is returned.

Table 3.141. Result
percentileDisc(n.age, 0.5)

1 row

33

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN percentileDisc(n.age, 0.5)

3.4.3.8. stDev()

stDev() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, stdDevP should be used.

Syntax: stDev(expression)

Returns:

A Float.

Arguments:

Name Description

expression

A numeric expression.

Considerations:

Any null values are excluded from the calculation.

stDev(null) returns 0.

Query. 

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDev(n.age)

The standard deviation of the values in the property age is returned.

Table 3.142. Result
stDev(n.age)

1 row

15.716233645501712

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n) WHERE n.name IN ['A', 'B', 'C'] RETURN stDev(n.age)

3.4.3.9. stDevP()

stDevP() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, stDev should be used.

Syntax: stDevP(expression)

Returns:

A Float.

Arguments:

Name Description

expression

A numeric expression.

Considerations:

Any null values are excluded from the calculation.

stDevP(null) returns 0.

Query. 

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDevP(n.age)

The population standard deviation of the values in the property age is returned.

Table 3.143. Result
stDevP(n.age)

1 row

12.832251036613439

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n) WHERE n.name IN ['A', 'B', 'C'] RETURN stDevP(n.age)

3.4.3.10. sum()

sum() returns the sum of a set of numeric values.

Syntax: sum(expression)

Returns:

Either an Integer or a Float, depending on the values returned by expression.

Arguments:

Name Description

expression

An expression returning a set of numeric values.

Considerations:

Any null values are excluded from the calculation.

sum(null) returns 0.

Query. 

MATCH (n:Person)
RETURN sum(n.age)

The sum of all the values in the property age is returned.

Table 3.144. Result
sum(n.age)

1 row

90

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN sum(n.age)