3.4.3. Aggregating functions

To calculate aggregated data, Cypher offers aggregation, much like SQL’s GROUP BY.

Aggregate functions take multiple input values and calculate an aggregated value from them. Examples are avg() that calculates the average of multiple numeric values, or min() that finds the smallest numeric value in a set of values.

Aggregation can be done over all the matching subgraphs, or it can be further divided by introducing key values. These are non-aggregate expressions, that are used to group the values going into the aggregate functions.

So, if the return statement looks something like this:

RETURN n, count(*)

We have two return expressions: n, and count(*). The first, n, is not an aggregate function, and so it will be the grouping key. The latter, count(*) is an aggregate expression. So the matching subgraphs will be divided into different buckets, depending on the grouping key. The aggregate function will then run on these buckets, calculating the aggregate values.

If you want to use aggregations to sort your result set, the aggregation must be included in the RETURN to be used in your ORDER BY.

The last piece of the puzzle is the DISTINCT operator. It is used to make all values unique before running them through an aggregate function. More information about DISTINCT may be found here.

The following graph is used for the examples below:

Figure 3.22. Graph
alt

3.4.3.1. avg()

avg() calculates the average of a numeric column.

Syntax: avg(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n:Person)
RETURN avg(n.age)

The average of all the values in the property age is returned by the example query.

Table 3.129. Result
avg(n.age)

1 row

30.0

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN avg(n.age)

3.4.3.2. collect()

collect() collects all the values into a list. It will ignore null values.

Syntax: collect(expression)

Arguments:

Name Description

expression

An expression.

Query. 

MATCH (n:Person)
RETURN collect(n.age)

Returns a single row, with all the values collected.

Table 3.130. Result
collect(n.age)

1 row

[13,33,44]

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN collect(n.age)

3.4.3.3. count()

count() is used to count the number of rows. count() can be used in two forms — count(*) which just counts the number of matching rows, and count(<expression>), which counts the number of non-null values in <expression>.

Syntax: count(expression)

Arguments:

Name Description

expression

An expression.

Count nodes

count(*) can be used to count the number of nodes; for example, the number of nodes connected to one node.

Query. 

MATCH (n { name: 'A' })-->(x)
RETURN labels(n), n.age, count(*)

This returns the start node and the count of related nodes.

Table 3.131. Result
labels(n) n.age count(*)

1 row

["Person"]

13

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-->(x) RETURN labels(n), n.age, count(*)

Group count relationship types

To count the groups of relationship types, return the types and count them with count(*).

Query. 

MATCH (n { name: 'A' })-[r]->()
RETURN type(r), count(*)

The relationship types and their group count is returned.

Table 3.132. Result
type(r) count(*)

1 row

"KNOWS"

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-[r]->() RETURN type(r), count(*)

Count entities

Instead of counting the number of results with count(*), it might be more expressive to include the name of the variable you care about.

Query. 

MATCH (n { name: 'A' })-->(x)
RETURN count(x)

The number of nodes connected to the start node is returned.

Table 3.133. Result
count(x)

1 row

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n {name: 'A'})-->(x) RETURN count(x)

Count non-null values

The non-null values can be counted by using count(<expression>).

Query. 

MATCH (n:Person)
RETURN count(n.age)

The count of related nodes with the age property set is returned.

Table 3.134. Result
count(n.age)

1 row

3

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN count(n.age)

Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them. The first aggregate function, count(DISTINCT friend_of_friend), will only see a friend_of_friend once — DISTINCT removes the duplicates. The latter aggregate function, count(friend_of_friend), might very well see the same friend_of_friend multiple times. In this case, both B and C know D and thus D will get counted twice, when not using DISTINCT.

Query. 

MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

Table 3.135. Result
count(DISTINCT friend_of_friend) count(friend_of_friend)

1 row

1

2

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person) WHERE me.name = 'A' RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

3.4.3.4. max()

max() takes a numeric property as input, and returns the highest value in that column.

Syntax: max(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n:Person)
RETURN max(n.age)

The highest of all the values in the property age is returned.

Table 3.136. Result
max(n.age)

1 row

44

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN max(n.age)

3.4.3.5. min()

min() takes a numeric property as input, and returns the lowest value in that column.

Syntax: min(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n:Person)
RETURN min(n.age)

The lowest of all the values in the property age is returned.

Table 3.137. Result
min(n.age)

1 row

13

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN min(n.age)

3.4.3.6. percentileCont()

percentileCont() calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values, if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc.

Syntax: percentileCont(expression, percentile)

Arguments:

Name Description

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Query. 

MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)

The 40th percentile of the values in the property age is returned by the example query, calculated with a weighted average.

Table 3.138. Result
percentileCont(n.age, 0.4)

1 row

29.0

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN percentileCont(n.age, 0.4)

3.4.3.7. percentileDisc()

percentileDisc() calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method, returning the nearest value to the percentile. For interpolated values, see percentileCont.

Syntax: percentileDisc(expression, percentile)

Arguments:

Name Description

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Query. 

MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)

The 50th percentile of the values in the property age is returned by the example query. In this case, 0.5 is the median, or 50th percentile.

Table 3.139. Result
percentileDisc(n.age, 0.5)

1 row

33

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN percentileDisc(n.age, 0.5)

3.4.3.8. stDev()

stDev() calculates the standard deviation for a given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, stdDevP should be used.

Syntax: stDev(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDev(n.age)

The standard deviation of the values in the property age is returned.

Table 3.140. Result
stDev(n.age)

1 row

15.716233645501712

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n) WHERE n.name IN ['A', 'B', 'C'] RETURN stDev(n.age)

3.4.3.9. stDevP()

stDevP() calculates the standard deviation for a given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, stDev should be used.

Syntax: stDevP(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDevP(n.age)

The population standard deviation of the values in the property age is returned by the example query.

Table 3.141. Result
stDevP(n.age)

1 row

12.832251036613439

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n) WHERE n.name IN ['A', 'B', 'C'] RETURN stDevP(n.age)

3.4.3.10. sum()

sum() simply adds up all the numeric values it encounters. Any null values are silently dropped.

Syntax: sum(expression)

Arguments:

Name Description

expression

A numeric expression.

Query. 

MATCH (n:Person)
RETURN sum(n.age)

This returns the sum of all the values in the property age.

Table 3.142. Result
sum(n.age)

1 row

90

Try this query live.  CREATE (a:Person {name: 'A', age: 13}), (b:Person {name: 'B', age: 33, eyes: 'blue'}), (c:Person {name: 'C', age: 44, eyes: 'blue'}), (d1:Person {name: 'D', eyes: 'brown'}), (d2:Person {name: 'D'}), (a)-[:KNOWS]->(d1), (a)-[:KNOWS]->(c), (a)-[:KNOWS]->(b), (c)-[:KNOWS]->(d2), (b)-[:KNOWS]->(d2) MATCH (n:Person) RETURN sum(n.age)