## 11.5. Aggregation

### Introduction

To calculate aggregated data, Cypher offers aggregation, much like SQL’s `GROUP BY`.

Aggregate functions take multiple input values and calculate an aggregated value from them. Examples are `avg` that calculates the average of multiple numeric values, or `min` that finds the smallest numeric value in a set of values.

Aggregation can be done over all the matching subgraphs, or it can be further divided by introducing key values. These are non-aggregate expressions, that are used to group the values going into the aggregate functions.

So, if the return statement looks something like this:

`RETURN n, count(*)`

We have two return expressions: `n`, and `count(*)`. The first, `n`, is no aggregate function, and so it will be the grouping key. The latter, `count(*)` is an aggregate expression. So the matching subgraphs will be divided into different buckets, depending on the grouping key. The aggregate function will then run on these buckets, calculating the aggregate values.

If you want to use aggregations to sort your result set, the aggregation must be included in the `RETURN` to be used in your `ORDER BY`.

The last piece of the puzzle is the `DISTINCT` keyword. It is used to make all values unique before running them through an aggregate function.

An example might be helpful. In this case, we are running the query against the following data:

Query

```MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person)
WHERE me.name = 'A'
RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)```

In this example we are trying to find all our friends of friends, and count them. The first aggregate function, `count(DISTINCT friend_of_friend)`, will only see a `friend_of_friend` once — `DISTINCT` removes the duplicates. The latter aggregate function, `count(friend_of_friend)`, might very well see the same `friend_of_friend` multiple times. In this case, both `B` and `C` know `D` and thus `D` will get counted twice, when not using `DISTINCT`.

Result

count(distinct friend_of_friend)count(friend_of_friend)
1 row

`1`

`2`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 MATCH (me:Person)-->(friend:Person)-->(friend_of_friend:Person) WHERE me.name = 'A'RETURN count(distinct friend_of_friend), count(friend_of_friend)

The following examples are assuming the example graph structure below.

### COUNT

`COUNT` is used to count the number of rows.

`COUNT` can be used in two forms — `COUNT(*)` which just counts the number of matching rows, and `COUNT(<identifier>)`, which counts the number of non-`NULL` values in `<identifier>`.

#### Count nodes

To count the number of nodes, for example the number of nodes connected to one node, you can use `count(*)`.

Query

```MATCH (n { name: 'A' })-->(x)
RETURN n, count(*)```

This returns the start node and the count of related nodes.

Result

ncount(*)
1 row

`Node[1]{name:"A",property:13}`

`3`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n {name: 'A'})-->(x) return n, count(*)

#### Group Count Relationship Types

To count the groups of relationship types, return the types and count them with `count(*)`.

Query

```MATCH (n { name: 'A' })-[r]->()
RETURN type(r), count(*)```

The relationship types and their group count is returned by the query.

Result

type(r)count(*)
1 row

`"KNOWS"`

`3`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n {name: 'A'})-[r]->() return type(r), count(*)

#### Count entities

Instead of counting the number of results with `count(*)`, it might be more expressive to include the name of the identifier you care about.

Query

```MATCH (n { name: 'A' })-->(x)
RETURN count(x)```

The example query returns the number of connected nodes from the start node.

Result

count(x)
1 row

`3`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n {name: 'A'})-->(x) return count(x)

#### Count non-null values

You can count the non-`NULL` values by using `count(<identifier>)`.

Query

```MATCH (n:Person)
RETURN count(n.property)```

The count of related nodes with the `property` property set is returned by the query.

Result

count(n.property)
1 row

`3`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return count(n.property)

### Statistics

#### sum

The `sum` aggregation function simply sums all the numeric values it encounters. `NULL`s are silently dropped.

Query

```MATCH (n:Person)
RETURN sum(n.property)```

This returns the sum of all the values in the property `property`.

Result

sum(n.property)
1 row

`90`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return sum(n.property)

#### avg

`avg` calculates the average of a numeric column.

Query

```MATCH (n:Person)
RETURN avg(n.property)```

The average of all the values in the property `property` is returned by the example query.

Result

avg(n.property)
1 row

`30.0`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return avg(n.property)

#### percentileDisc

`percentileDisc` calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method, returning the nearest value to the percentile. For interpolated values, see `percentileCont`.

Query

```MATCH (n:Person)
RETURN percentileDisc(n.property, 0.5)```

The 50th percentile of the values in the property `property` is returned by the example query. In this case, 0.5 is the median, or 50th percentile.

Result

percentileDisc(n.property, 0.5)
1 row

`33`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return percentileDisc(n.property, 0.5)

#### percentileCont

`percentileCont` calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values, if the desired percentile lies between them. For nearest values using a rounding method, see `percentileDisc`.

Query

```MATCH (n:Person)
RETURN percentileCont(n.property, 0.4)```

The 40th percentile of the values in the property `property` is returned by the example query, calculated with a weighted average.

Result

percentileCont(n.property, 0.4)
1 row

`29.0`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return percentileCont(n.property, 0.4)

#### stdev

`stdev` calculates the standard deviation for a given value over a group. It uses a standard two-pass method, with `N - 1` as the denominator, and should be used when taking a sample of the population for an unbiased estimate. When the standard variation of the entire population is being calculated, `stdevp` should be used.

Query

```MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stdev(n.property)```

The standard deviation of the values in the property `property` is returned by the example query.

Result

stdev(n.property)
1 row

`15.716233645501712`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n) where n.name IN ['A','B','C'] return stdev(n.property)

#### stdevp

`stdevp` calculates the standard deviation for a given value over a group. It uses a standard two-pass method, with `N` as the denominator, and should be used when calculating the standard deviation for an entire population. When the standard variation of only a sample of the population is being calculated, `stdev` should be used.

Query

```MATCH (n)
WHERE n.name IN ['A', 'B', 'C']
RETURN stdevp(n.property)```

The population standard deviation of the values in the property `property` is returned by the example query.

Result

stdevp(n.property)
1 row

`12.832251036613439`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n) where n.name IN ['A','B','C'] return stdevp(n.property)

#### max

`max` find the largest value in a numeric column.

Query

```MATCH (n:Person)
RETURN max(n.property)```

The largest of all the values in the property `property` is returned.

Result

max(n.property)
1 row

`44`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return max(n.property)

#### min

`min` takes a numeric property as input, and returns the smallest value in that column.

Query

```MATCH (n:Person)
RETURN min(n.property)```

This returns the smallest of all the values in the property `property`.

Result

min(n.property)
1 row

`13`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return min(n.property)

### collect

`collect` collects all the values into a list. It will ignore `NULL`s.

Query

```MATCH (n:Person)
RETURN collect(n.property)```

Returns a single row, with all the values collected.

Result

collect(n.property)
1 row

`[13,33,44]`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (n:Person) return collect(n.property)

### DISTINCT

All aggregation functions also take the `DISTINCT` modifier, which removes duplicates from the values. So, to count the number of unique eye colors from nodes related to `a`, this query can be used:

Query

```MATCH (a:Person { name: 'A' })-->(b)
RETURN count(DISTINCT b.eyes)```

Returns the number of eye colors.

Result

count(distinct b.eyes)
1 row

`2`

Try this query live create (_0:`Person` {`eyes`:"brown", `name`:"D"}) create (_1:`Person` {`name`:"A", `property`:13}) create (_2:`Person` {`eyes`:"blue", `name`:"B", `property`:33}) create (_3:`Person` {`eyes`:"blue", `name`:"C", `property`:44}) create _1-[:`KNOWS`]->_0 create _1-[:`KNOWS`]->_3 create _1-[:`KNOWS`]->_2 match (a:Person {name: 'A'})-->(b) return count(distinct b.eyes)