# Aggregating functions

Aggregating functions take a set of values and calculate an aggregated value over them.

Functions:

Aggregation can be computed over all the matching paths, or it can be further divided by introducing grouping keys. Grouping keys are non-aggregate expressions, that are used to group the values going into the aggregate functions.

Assume we have the following return statement:

``RETURN n, count(*)``

We have two return expressions: `n`, and `count(*)`. The first, `n`, is not an aggregate function, so it will be the grouping key. The latter, `count(*)` is an aggregate expression. The matching paths will be divided into different buckets, depending on the grouping key. The aggregate function will then be run on these buckets, calculating an aggregate value per bucket.

The input expression of an aggregation function can contain any expression, including expressions that are not grouping keys. However, not all expressions can be composed with aggregation functions. The example below will throw an error since we compose `n.x`, which is not a grouping key, with the aggregation expression `count(*)`. For more information see Grouping keys.

``RETURN n.x + count(*)``

To use aggregations to sort the result set, the aggregation must be included in the `RETURN` to be used in the `ORDER BY`.

The `DISTINCT` operator works in conjunction with aggregation. It is used to make all values unique before running them through an aggregate function. More information about `DISTINCT` may be found in Syntax → Aggregation operators.

The following graph is used for the examples below: ## avg() - Numeric values

The function `avg()` returns the average of a set of numeric values.

Syntax:

``avg(expression)``

Returns:

 Either an Integer or a Float, depending on the values returned by `expression` and whether or not the calculation overflows.

Arguments:

Name Description

`expression`

An expression returning a set of numeric values.

Considerations:

 Any `null` values are excluded from the calculation. `avg(null)` returns `null`.
Example 1. avg()
Query
``````MATCH (n:Person)
RETURN avg(n.age)``````

The average of all the values in the property `age` is returned.

Table 1. Result
avg(n.age)

`30.0`

Rows: 1

## avg() - Durations

The function `avg()` returns the average of a set of Durations.

Syntax:

``avg(expression)``

Returns:

 A Duration.

Arguments:

Name Description

`expression`

An expression returning a set of Durations.

Considerations:

 Any `null` values are excluded from the calculation. `avg(null)` returns `null`.
Example 2. avg()
Query
``````UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN avg(dur)``````

The average of the two supplied Durations is returned.

Table 2. Result
avg(dur)

`P1DT2H22.5S`

Rows: 1

## collect()

The function `collect()` returns a single aggregated list containing the values returned by an expression.

Syntax:

``collect(expression)``

Returns:

 A list containing heterogeneous elements; the types of the elements are determined by the values returned by `expression`.

Arguments:

Name Description

`expression`

An expression returning a set of values.

Considerations:

 Any `null` values are ignored and will not be added to the list. `collect(null)` returns an empty list.
Example 3. collect()
Query
``````MATCH (n:Person)
RETURN collect(n.age)``````

All the values are collected and returned in a single list.

Table 3. Result
collect(n.age)

`[13,33,44]`

Rows: 1

## count()

The function `count()` returns the number of values or rows, and appears in two variants:

`count(*)`

returns the number of matching rows.

`count(expr)`

returns the number of non-`null` values returned by an expression.

Syntax:

``count(expression)``

Returns:

 An Integer.

Arguments:

Name Description

`expression`

An expression.

Considerations:

 `count(*)` includes rows returning `null`. `count(expr)` ignores `null` values. `count(null)` returns `0`.

### Using `count(*)` to return the number of nodes

The function `count(*)` can be used to return the number of nodes; for example, the number of nodes connected to some node `n`.

Example 4. count()
Query
``````MATCH (n {name: 'A'})-->(x)
RETURN labels(n), n.age, count(*)``````

The labels and `age` property of the start node `n` and the number of nodes related to `n` are returned.

Table 4. Result
labels(n) n.age count(*)

`["Person"]`

`13`

`4`

Rows: 1

### Using `count(*)` to group and count relationship types

The function `count(*)` can be used to group the type of matched relationships and return the number.

Example 5. count()
Query
``````MATCH (n {name: 'A'})-[r]->()
RETURN type(r), count(*)``````

The type of matched relationships are grouped and the group count are returned.

Table 5. Result
type(r) count(*)

`"KNOWS"`

`3`

`"READS"`

`1`

Rows: 2

### Counting non-`null` values

Instead of simply returning the number of rows with `count(*)`, the function `count(expression)` can be used to return the number of non-`null` values returned by the expression.

Example 6. count()
Query
``````MATCH (n:Person)
RETURN count(n.age)``````

The number of nodes with the label `Person` and a property `age` is returned. (If you want the sum, use `sum(n.age)`)

Table 6. Result
count(n.age)

`3`

Rows: 1

### Counting with and without duplicates

In this example we are trying to find all our friends of friends, and count them:

`count(DISTINCT friend_of_friend)`

Will only count a `friend_of_friend` once, as `DISTINCT` removes the duplicates.

`count(friend_of_friend)`

Will consider the same `friend_of_friend` multiple times.

Example 7. count()
Query
``````MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)``````

Both `B` and `C` know `D` and thus `D` will get counted twice when not using `DISTINCT`.

Table 7. Result
count(DISTINCT friend_of_friend) count(friend_of_friend)

`1`

`2`

Rows: 1

## max()

The function `max()` returns the maximum value in a set of values.

Syntax:

``max(expression)``

Returns:

 A property type, or a list, depending on the values returned by `expression`.

Arguments:

Name Description

`expression`

An expression returning a set containing any combination of property types and lists thereof.

Considerations:

 Any `null` values are excluded from the calculation. In a mixed set, any numeric value is always considered to be higher than any string value, and any string value is always considered to be higher than any list. Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end. `max(null)` returns `null`.
Example 8. max()
Query
``````UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN max(val)``````

The highest of all the values in the mixed set — in this case, the numeric value `1` — is returned.

 The value `'99'` (a string), is considered to be a lower value than `1` (an integer), because `'99'` is a string.
Table 8. Result
max(val)

`1`

Rows: 1

Example 9. max()
Query
``````UNWIND [[1, 'a', 89], [1, 2]] AS val
RETURN max(val)``````

The highest of all the lists in the set — in this case, the list `[1, 2]` — is returned, as the number `2` is considered to be a higher value than the string `'a'`, even though the list `[1, 'a', 89]` contains more elements.

Table 9. Result
max(val)

`[1,2]`

Rows: 1

Example 10. max()
Query
``````MATCH (n:Person)
RETURN max(n.age)``````

The highest of all the values in the property `age` is returned.

Table 10. Result
max(n.age)

`44`

Rows: 1

## min()

The function `min()` returns the minimum value in a set of values.

Syntax:

``min(expression)``

Returns:

 A property type, or a list, depending on the values returned by `expression`.

Arguments:

Name Description

`expression`

An expression returning a set containing any combination of property types and lists thereof.

Considerations:

 Any `null` values are excluded from the calculation. In a mixed set, any string value is always considered to be lower than any numeric value, and any list is always considered to be lower than any string. Lists are compared in dictionary order, i.e. list elements are compared pairwise in ascending order from the start of the list to the end. `min(null)` returns `null`.
Example 11. min()
Query
``````UNWIND [1, 'a', null, 0.2, 'b', '1', '99'] AS val
RETURN min(val)``````

The lowest of all the values in the mixed set — in this case, the string value `"1"` — is returned. Note that the (numeric) value `0.2`, which may appear at first glance to be the lowest value in the list, is considered to be a higher value than `"1"` as the latter is a string.

Table 11. Result
min(val)

`"1"`

Rows: 1

Example 12. min()
Query
``````UNWIND ['d', [1, 2], ['a', 'c', 23]] AS val
RETURN min(val)``````

The lowest of all the values in the set — in this case, the list `['a', 'c', 23]` — is returned, as (i) the two lists are considered to be lower values than the string `"d"`, and (ii) the string `"a"` is considered to be a lower value than the numerical value `1`.

Table 12. Result
min(val)

`["a","c",23]`

Rows: 1

Example 13. min()
Query
``````MATCH (n:Person)
RETURN min(n.age)``````

The lowest of all the values in the property `age` is returned.

Table 13. Result
min(n.age)

`13`

Rows: 1

## percentileCont()

The function `percentileCont()` returns the percentile of the given value over a group, with a percentile from `0.0` to `1.0`. It uses a linear interpolation method, calculating a weighted average between two values if the desired percentile lies between them. For nearest values using a rounding method, see `percentileDisc`.

Syntax:

``percentileCont(expression, percentile)``

Returns:

 A Float.

Arguments:

Name Description

`expression`

A numeric expression.

`percentile`

A numeric value between `0.0` and `1.0`.

Considerations:

 Any `null` values are excluded from the calculation. `percentileCont(null, percentile)` returns `null`.
Example 14. percentileCont()
Query
``````MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)``````

The 40th percentile of the values in the property `age` is returned, calculated with a weighted average.

Table 14. Result
percentileCont(n.age, 0.4)

`29.0`

Rows: 1

## percentileDisc()

The function `percentileDisc()` returns the percentile of the given value over a group, with a percentile from `0.0` to `1.0`. It uses a rounding method and calculates the nearest value to the percentile. For interpolated values, see `percentileCont`.

Syntax:

``percentileDisc(expression, percentile)``

Returns:

 Either an Integer or a Float, depending on the values returned by `expression` and whether or not the calculation overflows.

Arguments:

Name Description

`expression`

A numeric expression.

`percentile`

A numeric value between `0.0` and `1.0`.

Considerations:

 Any `null` values are excluded from the calculation. `percentileDisc(null, percentile)` returns `null`.
Example 15. percentileDisc()
Query
``````MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)``````

The 50th percentile of the values in the property `age` is returned.

Table 15. Result
percentileDisc(n.age, 0.5)

`33`

Rows: 1

## stDev()

The function `stDev()` returns the standard deviation for the given value over a group. It uses a standard two-pass method, with `N - 1` as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, `stdDevP` should be used.

Syntax:

``stDev(expression)``

Returns:

 A Float.

Arguments:

Name Description

`expression`

A numeric expression.

Considerations:

 Any `null` values are excluded from the calculation. `stDev(null)` returns `0`.
Example 16. stDev()
Query
``````MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDev(n.age)``````

The standard deviation of the values in the property `age` is returned.

Table 16. Result
stDev(n.age)

`15.716233645501712`

Rows: 1

## stDevP()

The function `stDevP()` returns the standard deviation for the given value over a group. It uses a standard two-pass method, with `N` as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, `stDev` should be used.

Syntax:

``stDevP(expression)``

Returns:

 A Float.

Arguments:

Name Description

`expression`

A numeric expression.

Considerations:

 Any `null` values are excluded from the calculation. `stDevP(null)` returns `0`.
Example 17. stDevP()
Query
``````MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stDevP(n.age)``````

The population standard deviation of the values in the property `age` is returned.

Table 17. Result
stDevP(n.age)

`12.832251036613439`

Rows: 1

## sum() - Numeric values

The function `sum()` returns the sum of a set of numeric values.

Syntax:

``sum(expression)``

Returns:

 Either an Integer or a Float, depending on the values returned by `expression`.

Arguments:

Name Description

`expression`

An expression returning a set of numeric values.

Considerations:

 Any `null` values are excluded from the calculation. `sum(null)` returns `0`.
Example 18. sum()
Query
``````MATCH (n:Person)
RETURN sum(n.age)``````

The sum of all the values in the property `age` is returned.

Table 18. Result
sum(n.age)

`90`

Rows: 1

## sum() - Durations

The function `sum()` returns the sum of a set of durations.

Syntax:

``sum(expression)``

Returns:

 A Duration.

Arguments:

Name Description

`expression`

An expression returning a set of Durations.

Considerations:

 Any `null` values are excluded from the calculation.
Example 19. sum()
Query
``````UNWIND [duration('P2DT3H'), duration('PT1H45S')] AS dur
RETURN sum(dur)``````

The sum of the two supplied Durations is returned.

Table 19. Result
sum(dur)

`P2DT4H45S`

Rows: 1

## Grouping keys

Aggregation expressions are expressions which contain one or more aggregation functions. A simple aggregation expression consists of a single aggregation function. For instance, `SUM(x.a)` is an aggregation expression that only consists of the aggregation function `SUM( )` with `x.a` as its argument. Aggregation expressions are also allowed to be more complex, where the result of one or more aggregation functions are input arguments to other expressions. For instance, `0.1 * (SUM(x.a) / COUNT(x.b))` is an aggregation expression that contains two aggregation functions, `SUM( )` with `x.a` as its argument and `COUNT( )` with `x.b` as its argument. Both are input arguments to the division expression.

For aggregation expressions to be correctly computable for the buckets formed by the grouping key(s), they have to fulfill some requirements. Specifically, each sub expression in an aggregation expression has to be either:

• an aggregation function, e.g. `SUM(x.a)`,

• a constant, e.g. `0.1`,

• a parameter, e.g. `\$param`,

• a grouping key, e.g. the `a` in `RETURN a, count(*)`

• a local variable, e.g. the `x` in `count(*) + size([ x IN range(1, 10) | x ])`, or

• a subexpression, all whose operands are operands allowed in an aggregation expression.

### Examples of aggregation expressions.

Example 20. Simple aggregation without any grouping keys:
Query
``MATCH (p: Person) RETURN max(p.age)``
Table 20. Result
max(p.age)

`44`

Rows: 1

Example 21. Addition of an aggregation and a constant, without any grouping keys:
Query
``MATCH (p: Person) RETURN max(p.age) + 1``
Table 21. Result
max(p.age) + 1

`45`

Rows: 1

Example 22. Subtraction of a property access and an aggregation.

Note that `n` is a grouping key:

Query
``MATCH (n: Person{name:"A"})-[:KNOWS]-(f:Person) RETURN n, n.age - max(f.age)``
Table 22. Result
n n.age - max(f.age)

`Node{age:13,name:"A"}`

`-31`

Rows: 1

Example 23. Subtraction of a property access and an aggregation.

Note that `n.age` is a grouping key:

Query
``MATCH (n: Person{name:"A"})-[:KNOWS]-(f:Person) RETURN n.age, n.age - max(f.age)``
Table 23. Result
n.age n.age - max(f.age)

`13`

`-31`

Rows: 1

Grouping keys themselves can be complex expressions. For better query readability, Cypher only recognizes a sub-expression in aggregation expressions as a grouping key if the grouping key is either:

• A variable - e.g. the `n` in `RETURN n, n.age - max(f.age)`

• A property access - e.g. the `n.age` in `RETURN n.age, n.age - max(f.age)`

• A map access - e.g. the `n.age` in `WITH {age: 34, name:Chris} AS n RETURN n.age, n.age - max(n.age)`

If more complex grouping keys are needed as operands in aggregation expression, it is always possible to project them in advance with `WITH`.

Using the property `n.age` will throw an exception, since `n.age` is not a grouping key:

Query
``MATCH (n: Person{name:"A"})-[:KNOWS]-(f:Person) RETURN n.age - max(f.age)``

`n.age + n.age` is not a valid grouping key, since the expression is not a variable, property access or map access. It can therefore not be used in the expression which contains the aggregation function:

Query
``MATCH (n: Person{name:"A"})-[:KNOWS]-(f:Person) RETURN n.age + n.age, n.age + n.age - max(f.age)``

The above query could be rewritten to:

Query
``MATCH (n: Person{name:"A"})-[:KNOWS]-(f:Person) WITH n.age + n.age AS groupingKey, f RETURN groupingKey, groupingKey - max(f.age)``