Parallel Cypher Execution

This section describes procedures and functions for parallel execution of Cypher statements.

Procedure and Function Overview

The available procedures and functions are described below:

Qualified Name | Type | Release
apoc.cypher.parallel - executes fragments in parallel through a list defined in paramMap with a key keyList | Procedure | APOC Full
apoc.cypher.parallel2 - executes fragments in parallel batches through a list defined in paramMap with a key keyList | Procedure | APOC Full
apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value - executes fragment in parallel batches with the list segments being assigned to _ | Procedure | APOC Full
apoc.cypher.mapParallel2(fragment, params, list-to-parallelize) yield value - executes fragment in parallel batches with the list segments being assigned to _ | Procedure | APOC Full

apoc.cypher.parallel

Given this dataset:

UNWIND range(0, 9999) as idx CREATE (:Person {name: toString(idx)})

we can run a statement in parallel over the (:Person) nodes with this procedure:

MATCH (p:Person) WITH collect(p) as people
CALL apoc.cypher.parallel('RETURN a.name + t as title', {a: people, t: ' - suffix'}, 'a')
YIELD value RETURN value.title as title

In the above query, the second parameter is a map, and the third parameter is the name of a key in that map. The value stored under the key 'a' is the list to iterate over in parallel. Note that the fragment does not need to reference a and t as query parameters (that is, $a and $t) because, under the hood, the procedure prepends a clause of the form WITH $parameterName as parameterName to the query. In this case, it prepends WITH $a as a, $t as t.

In this example, the procedure therefore executes multiple queries of the form WITH $a as a, $t as t RETURN a.name + t as title in parallel, where a is one of the (:Person) nodes in the people list.
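The prepending step can be illustrated with a short Java sketch. This is not the APOC source code: the class and method names (WithPrefixSketch, withParamsPrefix) are hypothetical, and the sketch only assumes that each key of the parameter map becomes one `$key as key` entry in the prepended WITH clause.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration; not the actual APOC implementation.
class WithPrefixSketch {

    // Builds "WITH $k1 as k1, $k2 as k2 " from the parameter names
    // and prepends it to the given Cypher fragment.
    static String withParamsPrefix(String fragment, Map<String, Object> params) {
        if (params.isEmpty()) {
            return fragment; // nothing to expose, leave the fragment untouched
        }
        String prefix = params.keySet().stream()
                .map(key -> "$" + key + " as " + key)
                .collect(Collectors.joining(", ", "WITH ", " "));
        return prefix + fragment;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the keys appear as a, then t.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("a", "<one Person node>"); // placeholder value for the sketch
        params.put("t", " - suffix");
        // Prints: WITH $a as a, $t as t RETURN a.name + t as title
        System.out.println(withParamsPrefix("RETURN a.name + t as title", params));
    }
}
```

The fragment itself stays free of `$` references; only the generated prefix binds the parameters to plain variable names.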

The result of the procedure is:

Table 1. Result
title

"0 - suffix"

"1 - suffix"

"2 - suffix"

"3 - suffix"

"4 - suffix"

…

apoc.cypher.parallel2

This procedure is similar to apoc.cypher.parallel, but works differently under the hood (see below). With the previous dataset, we can execute:

MATCH (p:Person) WITH collect(p) as people
CALL apoc.cypher.parallel2('RETURN a.name + t as title', {a: people, t: ' - suffix'}, 'a')
YIELD value RETURN value.title as title

The result of the procedure is:

Table 2. Result
title

"0 - suffix"

"1 - suffix"

"2 - suffix"

"3 - suffix"

"4 - suffix"

…

Under the hood, apoc.cypher.parallel puts the collection to parallelize (in this case, people) into a Java parallel stream (java.util.Collection.parallelStream()) and then executes one query per element, like this: WITH $a as a, $t as t RETURN a.name + t as title.
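As a rough illustration of the per-element approach, the following Java sketch maps a list through a parallel stream. It does not talk to Neo4j: runQuery is a hypothetical stand-in for executing the prefixed query with one element bound to a, and the list of names stands in for the (:Person) nodes.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration; not the actual APOC implementation.
class ParallelStreamSketch {

    // Stand-in for running "WITH $a as a, $t as t RETURN a.name + t as title"
    // with a single element bound to a (assumption: no database involved here).
    static String runQuery(String name, String suffix) {
        return name + suffix;
    }

    // One "query" per element, distributed over the common fork-join pool.
    static List<String> runAll(List<String> names, String suffix) {
        return names.parallelStream()
                .map(name -> runQuery(name, suffix))
                .collect(Collectors.toList()); // keeps the list's encounter order
    }

    public static void main(String[] args) {
        List<String> people = IntStream.range(0, 10)
                .mapToObj(String::valueOf)
                .collect(Collectors.toList());
        runAll(people, " - suffix").forEach(System.out::println);
    }
}
```

In this model every element costs one separate query execution, which is the main difference from the batched approach.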

By contrast, apoc.cypher.parallel2 first splits the collection people into batches of size total / partitions (or 1 if total / partitions < 1), where partitions is 100 * the number of processors available to the JVM. It then creates a java.util.concurrent.Future for each batch, and each Future executes a query like this: WITH $t AS t UNWIND $a AS a RETURN a.name + t as title (where $a is the current batch of people). Finally, it resolves the futures to obtain the results.
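The batching scheme can be sketched in Java as follows. This is an illustration under stated assumptions, not the APOC source: the class and method names are hypothetical, the fixed thread pool is an arbitrary choice, runBatch stands in for the UNWIND query over one batch, and the "or 1" rule is read as a minimum batch size of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the actual APOC implementation.
class BatchedFuturesSketch {

    // batchSize = total / partitions, with partitions = 100 * processors,
    // and a floor of 1 (assumed reading of "or 1 if total / partitions < 1").
    static int batchSize(int total, int processors) {
        int partitions = 100 * processors;
        return Math.max(total / partitions, 1);
    }

    // Splits the list into consecutive batches of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return batches;
    }

    // Stand-in for "WITH $t AS t UNWIND $a AS a RETURN a.name + t as title",
    // where the batch plays the role of $a.
    static List<String> runBatch(List<String> batch, String suffix) {
        List<String> titles = new ArrayList<>();
        for (String name : batch) {
            titles.add(name + suffix);
        }
        return titles;
    }

    // Submits one Future per batch, then resolves them in submission order.
    static List<String> runAll(List<String> names, String suffix) {
        int processors = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(processors);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> batch : partition(names, batchSize(names.size(), processors))) {
                futures.add(pool.submit(() -> runBatch(batch, suffix)));
            }
            List<String> titles = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                titles.addAll(future.get()); // blocks until this batch is done
            }
            return titles;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // With 10000 elements and 4 processors: 10000 / (100 * 4) = 25 per batch.
        System.out.println(batchSize(10000, 4));
        System.out.println(runAll(List.of("0", "1", "2"), " - suffix"));
    }
}
```

Under this reading, the 10000-node dataset on a 4-processor machine is handled as 400 batch queries of 25 nodes each rather than 10000 single-node queries.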

In general, apoc.cypher.parallel2 is recommended over apoc.cypher.parallel.