Aggregating functions

Functions:

avg() - Numeric values
avg() - Durations
collect()
count()
max()
min()
percentileCont()
percentileDisc()
stDev()
stDevP()
sum() - Numeric values
sum() - Durations

Aggregation can be computed over all the matching paths, or it can be further divided by introducing grouping keys. Grouping keys are non-aggregate expressions, that are used to group the values going into the aggregate functions.

Assume we have the following return statement:

RETURN n, count(*)

We have two return expressions: n, and count(*). The first, n, is not an aggregate function, so it will be the grouping key. The latter, count(*) is an aggregate expression. The matching paths will be divided into different buckets, depending on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket.

To use aggregations to sort the result set, the aggregation must be included in the RETURN to be used in the ORDER BY.

The DISTINCT operator works in conjunction with aggregation. It is used to make all values unique before running them through an aggregate function. More information about DISTINCT may be found in Syntax → Aggregation operators.

The following graph is used for the examples below:

Graph

  N0 [
    label = "{Person|name = \'A\'\lage = 13\l}"
  ]
  N0 -> N1 [
    color = "#2e3436"
    fontcolor = "#2e3436"
    label = "KNOWS\n"
  ]
  N0 -> N2 [
    color = "#2e3436"
    fontcolor = "#2e3436"
    label = "KNOWS\n"
  ]
  N0 -> N3 [
    color = "#2e3436"
    fontcolor = "#2e3436"
    label = "KNOWS\n"
  ]
  N0 -> N5 [
    color = "#4e9a06"
    fontcolor = "#4e9a06"
    label = "READS\n"
  ]
  N1 [
    label = "{Person|name = \'B\'\lage = 33\leyes = \'blue\'\l}"
  ]
  N1 -> N4 [
    color = "#2e3436"
    fontcolor = "#2e3436"
    label = "KNOWS\n"
  ]
  N2 [
    label = "{Person|name = \'C\'\lage = 44\leyes = \'blue\'\l}"
  ]
  N2 -> N4 [
    color = "#2e3436"
    fontcolor = "#2e3436"
    label = "KNOWS\n"
  ]
  N3 [
    label = "{Person|name = \'D\'\leyes = \'brown\'\l}"
  ]
  N4 [
    label = "{Person|name = \'D\'\l}"
  ]
  N5 [
    label = "{Book|name = \'Cypher\'\l}"
  ]

avg() - Numeric values

The function avg() returns the average of a set of numeric values.

Syntax: avg(expression)

Returns:

Either an Integer or a Float, depending on the values returned by expression and whether or not the calculation overflows.

Arguments:

Name Description

Name	Description
`expression`	An expression returning a set of numeric values.

expression

An expression returning a set of numeric values.

Considerations:

Any null values are excluded from the calculation.

avg(null) returns null.

Query

MATCH (n:Person)
RETURN avg(n.age)

The average of all the values in the property age is returned.

Table 1. Result
avg(n.age)
`30.0`
Rows: 1

avg() - Durations

The function avg() returns the average of a set of Durations.

Syntax: avg(expression)

Returns:

A Duration.

Arguments:

Name Description

Name	Description
`expression`	An expression returning a set of Durations.

expression

An expression returning a set of Durations.

Considerations:

Any null values are excluded from the calculation.

avg(null) returns null.

Query

UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN avg(dur)

The average of the two supplied Durations is returned.

Table 2. Result
avg(dur)
`P1DT2H22.5S`
Rows: 1

collect()

The function collect() returns a single aggregated list containing the values returned by an expression.

Syntax: collect(expression)

Returns:

A list containing heterogeneous elements; the types of the elements are determined by the values returned by expression.

Arguments:

Name Description

Name	Description
`expression`	An expression returning a set of values.

expression

An expression returning a set of values.

Considerations:

Any null values are ignored and will not be added to the list.

collect(null) returns an empty list.

Query

MATCH (n:Person)
RETURN collect(n.age)

All the values are collected and returned in a single list.

Table 3. Result
collect(n.age)
`[13,33,44]`
Rows: 1

count()

The function count() returns the number of values or rows, and appears in two variants:

count(*): returns the number of matching rows.
count(expr): returns the number of non-null values returned by an expression.

Syntax: count(expression)

Returns:

An Integer.

Arguments:

Name Description

Name	Description
`expression`	An expression.

expression

An expression.

Considerations:

count(*) includes rows returning null.

count(expr) ignores null values.

count(null) returns 0.

Using `count(*)` to return the number of nodes

The function count(*) can be used to return the number of nodes; for example, the number of nodes connected to some node n.

Query

MATCH (n {name: 'A'})-->(x)
RETURN labels(n), n.age, count(*)

The labels and age property of the start node n and the number of nodes related to n are returned.

Table 4. Result
labels(n)	n.age	count(*)
`["Person"]`	`13`	`4`
Rows: 1

Using `count(*)` to group and count relationship types

The function count(*) can be used to group the type of matched relationships and return the number.

Query

MATCH (n {name: 'A'})-[r]->()
RETURN type(r), count(*)

The type of matched relationships are grouped and the group count are returned.

Table 5. Result
type(r)	count(*)
`"KNOWS"`	`3`
`"READS"`	`1`
Rows: 2

Using `count(expression)` to return the number of values

Instead of simply returning the number of rows with count(*), it may be more useful to return the actual number of values returned by an expression.

Query

MATCH (n {name: 'A'})-->(x)
RETURN count(x)

The number of nodes that are connected directly (one relationship) to the node, with the name 'A', is returned.

Table 6. Result
count(x)
`4`
Rows: 1

Counting non-`null` values

The function count(expression) can be used to return the number of non-null values returned by the expression.

Query

MATCH (n:Person)
RETURN count(n.age)

The number of nodes with the label Person and a property age is returned. (If you want the sum, use sum(n.age))

Table 7. Result
count(n.age)
`3`
Rows: 1

Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them:

count(DISTINCT friend_of_friend): Will only count a friend_of_friend once, as DISTINCT removes the duplicates.
count(friend_of_friend): Will consider the same friend_of_friend multiple times.

Query

MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)

Both B and C know D and thus D will get counted twice when not using DISTINCT.

Table 8. Result
count(DISTINCT friend_of_friend)	count(friend_of_friend)
`1`	`2`
Rows: 1

max()

The function max() returns the maximum value in a set of values.

Syntax: max(expression)

Returns:

A property type, or a list, depending on the values returned by expression.

Arguments:

Name Description

Name	Description
`expression`	An expression returning a set containing any combination of property types and lists thereof.

expression

An expression returning a set containing any combination of property types and lists thereof.

Considerations:

Any null values are excluded from the calculation.

In a mixed set, any numeric value is always considered to be higher than any string value, and any string value is always considered to be higher than any list.

Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

max(null) returns null.

Query

UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN max(val)

The highest of all the values in the mixed set — in this case, the numeric value 1 — is returned.

The value '99' (a string), is considered to be a lower value than 1 (an integer), because '99' is a string.

Table 9. Result
max(val)
`1`
Rows: 1

Query

UNWIND [[1, 'a', 89], [1, 2]] AS val
RETURN max(val)

The highest of all the lists in the set — in this case, the list [1, 2] — is returned, as the number 2 is considered to be a higher value than the string 'a', even though the list [1, 'a', 89] contains more elements.

Table 10. Result
max(val)
`[1,2]`
Rows: 1

Query

MATCH (n:Person)
RETURN max(n.age)

The highest of all the values in the property age is returned.

Table 11. Result
max(n.age)
`44`
Rows: 1

min()

The function min() returns the minimum value in a set of values.

Syntax: min(expression)

Returns:

A property type, or a list, depending on the values returned by expression.

Arguments:

Name Description

Name	Description
`expression`	An expression returning a set containing any combination of property types and lists thereof.

expression

An expression returning a set containing any combination of property types and lists thereof.

Considerations:

Any null values are excluded from the calculation.

In a mixed set, any string value is always considered to be lower than any numeric value, and any list is always considered to be lower than any string.

Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end.

min(null) returns null.

Query

UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN min(val)

The lowest of all the values in the mixed set — in this case, the string value "1" — is returned. Note that the (numeric) value 0.2, which may appear at first glance to be the lowest value in the list, is considered to be a higher value than "1" as the latter is a string.

Table 12. Result
min(val)
`"1"`
Rows: 1

Query

UNWIND ['d', [1, 2], ['a', 'c', 23]] AS val
RETURN min(val)

The lowest of all the values in the set — in this case, the list ['a', 'c', 23] — is returned, as (i) the two lists are considered to be lower values than the string "d", and (ii) the string "a" is considered to be a lower value than the numerical value 1.

Table 13. Result
min(val)
`["a","c",23]`
Rows: 1

Query

MATCH (n:Person)
RETURN min(n.age)

The lowest of all the values in the property age is returned.

Table 14. Result
min(n.age)
`13`
Rows: 1

percentileCont()

The function percentileCont() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see percentileDisc.

Syntax: percentileCont(expression, percentile)

Returns:

A Float.

Arguments:

Name Description

Name	Description
`expression`	A numeric expression.
`percentile`	A numeric value between 0.0 and 1.0

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Considerations:

Any null values are excluded from the calculation.

percentileCont(null, percentile) returns null.

Query

MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)

The 40th percentile of the values in the property age is returned, calculated with a weighted average.

Table 15. Result
percentileCont(n.age, 0.4)
`29.0`
Rows: 1

percentileDisc()

The function percentileDisc() returns the percentile of the given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see percentileCont.

Syntax: percentileDisc(expression, percentile)

Returns:

Either an Integer or a Float, depending on the values returned by expression and whether or not the calculation overflows.

Arguments:

Name Description

Name	Description
`expression`	A numeric expression.
`percentile`	A numeric value between 0.0 and 1.0

expression

A numeric expression.

percentile

A numeric value between 0.0 and 1.0

Considerations:

Any null values are excluded from the calculation.

percentileDisc(null, percentile) returns null.

Query

MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)

The 50th percentile of the values in the property age is returned.

Table 16. Result
percentileDisc(n.age, 0.5)
`33`
Rows: 1

stDev()

The function stDev() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N - 1 as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, stdDevP should be used.

Syntax: stDev(expression)

Returns:

A Float.

Arguments:

Name Description

Name	Description
`expression`	A numeric expression.

expression

A numeric expression.

Considerations:

Any null values are excluded from the calculation.

stDev(null) returns 0.

Query

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDev(n.age)

The standard deviation of the values in the property age is returned.

Table 17. Result
stDev(n.age)
`15.716233645501712`
Rows: 1

stDevP()

The function stDevP() returns the standard deviation for the given value over a group. It uses a standard two-pass method, with N as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, stDev should be used.

Syntax: stDevP(expression)

Returns:

A Float.

Arguments:

Name Description

Name	Description
`expression`	A numeric expression.

expression

A numeric expression.

Considerations:

Any null values are excluded from the calculation.

stDevP(null) returns 0.

Query

MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDevP(n.age)

The population standard deviation of the values in the property age is returned.

Table 18. Result
stDevP(n.age)
`12.832251036613439`
Rows: 1

sum() - Numeric values

The function sum() returns the sum of a set of numeric values.

Syntax: sum(expression)

Returns:

Either an Integer or a Float, depending on the values returned by expression.

Arguments:

Name Description

expression

An expression returning a set of numeric values.

Considerations:

Any null values are excluded from the calculation.

sum(null) returns 0.

Query

MATCH (n:Person)
RETURN sum(n.age)

The sum of all the values in the property age is returned.

Table 19. Result
sum(n.age)
`90`
Rows: 1

sum() - Durations

The function sum() returns the sum of a set of durations.

Syntax: sum(expression)

Returns:

A Duration.

Arguments:

Name Description

expression

An expression returning a set of Durations.

Considerations:

Any null values are excluded from the calculation.

Query

UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN sum(dur)

The sum of the two supplied Durations is returned.

Table 20. Result
sum(dur)
`P2DT4H45S`
Rows: 1

Aggregating functions

avg() - Numeric values

avg() - Durations

collect()

count()

Using count(*) to return the number of nodes

Using count(*) to group and count relationship types

Using count(expression) to return the number of values

Counting non-null values

Counting with and without duplicates

max()

min()

percentileCont()

percentileDisc()

stDev()

stDevP()

sum() - Numeric values

sum() - Durations

Using `count(*)` to return the number of nodes

Using `count(*)` to group and count relationship types

Using `count(expression)` to return the number of values

Counting non-`null` values