Parallel Cypher Execution
This section describes procedures and functions for parallel execution of Cypher statements.
Procedure and Function Overview
The available procedures and functions are described below:
| Qualified Name | Type | Release |
|---|---|---|
|
- executes fragments in parallel through a list defined in |
|
|
|
- executes fragments in parallel batches through a list defined in |
|
|
|
|
|
|
|
|
|
|
apoc.cypher.parallel
Given this dataset:
UNWIND range(0, 9999) as idx CREATE (:Person {name: toString(idx)})
we can execute parallel statements through (:Person) nodes with this procedure:
MATCH (p:Person) WITH collect(p) as people
CALL apoc.cypher.parallel('RETURN a.name + t as title', {a: people, t: ' - suffix'}, 'a')
YIELD value RETURN value.title as title
In the above query, we passed a map as a second parameter and a string from the previous map as a third parameter.
The value with key 'a' will be the list to cycle in parallel.
Note that it is not needed to pass a and t as query parameters (that is $a and $t) because, under the hood, the procedure will prepend them in the query WITH $parameterName as parameterName. So in this case, WITH $a as a, $t as t.
In this example, we execute multiple queries in parallel WITH $a as a, $t as t RETURN a.name + t as title,
where a is one of the (:Person) nodes included in people list.
The result of the procedure is:
| title |
|---|
"0 - suffix" |
"1 - suffix" |
"2 - suffix" |
"3 - suffix" |
"4 - suffix" |
… |
… |
… |
… |
apoc.cypher.parallel2
This procedure is similar to apoc.cypher.parallel2, but works differently under the hood (see below).
With the previous dataset, we can execute:
MATCH (p:Person) WITH collect(p) as people
CALL apoc.cypher.parallel('RETURN a.name + t as title', {a: people, t: $suffix}, 'a')
YIELD value RETURN value.title as title
The result of the procedure is:
| title |
|---|
"0 - suffix" |
"1 - suffix" |
"2 - suffix" |
"3 - suffix" |
"4 - suffix" |
… |
… |
… |
… |
The parallel put the collection to parallelize - in this case, people in a java.util.parallelStream() - and then executed multiple queries like this: WITH $a as a, $t as t RETURN a.name + t as title.
In the parallel2 transformation example, the fragment parameter first split the collection people into batchSizes of total / partitions,
where partitions are 100 * number of processors available to the JVM (or 1 if total / partitions < 1). Then, it created a java.util.concurrent.Future for each batch, where each Future executed a query like this: WITH $t AS t UNWIND $a AS a RETURN a.name + $t as title (where $a is the current batch of people). Finally, it computed the futures.
Generally, the apoc.cypher.parallel2 procedure is more recommended than the apoc.cypher.parallel.