Knowledge Base

Large Delete Transaction Best Practices in Neo4j

In order to achieve the best performance, and avoid negative effects on the rest of the system, consider these best practices when processing large deletes.

Start by identifying which situation you are in:

  • Deleting the entire graph database, so you can rebuild from scratch.

  • Deleting a large section of the graph, or a large number of nodes/relationships that are identified in a MATCH statement.

  • Depending on the situation, there may be a different recommendation. Going through them in order:

When deleting the entire graph database,

In Neo4j 4.x EE, you can just switch to the system database and issue an DROP DATABASE xxx; and CREATE DATABASE xxx; statements.

For Community Edition and older editions: stop the database, rename/delete the graph store

  • 4.x CE data/databases/neo4j and data/transactions/neo4j

  • pre 4.x data/databases/graph.db

  • pre 3.x data/graph.db

directory, and start the database.

This will build a fresh, empty database for you.

If you need to delete some large number of objects from the graph, one needs to be mindful of the not building up such a large single transaction such that a Java OUT OF HEAP Error will be encountered.

Use the following example to delete subsets of the matched records in batches until the full delete is complete:

If your nodes have more than 100 relationships per node ((100+1)*10k=>1010k deletes) reduce the batch size or see the recommendations at the bottom.

With 4.4 and newer versions you can utilize the CALL {} IN TRANSACTIONS syntax.

MATCH (n:Foo) where n.foo='bar'
CALL { WITH n
DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS;

With 3.x forward and using APOC

call apoc.periodic.iterate("MATCH (n:Foo) where n.foo='bar' return id(n) as id", "MATCH (n) WHERE id(n) = id DETACH DELETE n", {batchSize:10000})
yield batches, total return batches, total

Pre 3.x

// Find the nodes you want to delete
MATCH (n:Foo) where n.foo = 'bar'

// Take the first 10k nodes and their rels (if more than 100 rels / node on average lower this number)
WITH n LIMIT 10000
DETACH DELETE n
RETURN count(*);

Run this until the statement returns 0 (zero) records.

For versions before Neo4j 2.3 run:

// Find the nodes you want to delete
MATCH (n:Foo) where n.foo = 'bar'

// Take the first 10k nodes and their rels (if more than 100 rels / node on average lower this number)
WITH n LIMIT 10000
MATCH (n)-[r]-()
DELETE n,r
RETURN count(*);

In all of the examples we are performing a delete in batch sizes of 10k. This may still lead to out of heap errors if the nodes eligible for delete have a significant number of relationsips.

For example if a node to be deleted as 1 million :FOLLOWS relationships then the delete of this single node will include the removal of this 1 node and the 1 million :FOLLOWS relationships.

For nodes with many relationships or widely varying degrees of nodes a single DETACH DELETE can still exceed the transaction heap boundaries. In these cases it’s better to delete the relationships first and then the nodes in batches. Here is an example for Neo4j 4.4.x and later

MATCH (n:Foo)-[r]-() where n.foo='bar'

// delete relationships
CALL { WITH r
DELETE r
} IN TRANSACTIONS OF 10000 ROWS

// reduce cardinality
WITH distinct n
// delete nodes
CALL { WITH n
DELETE n
} IN TRANSACTIONS OF 10000 ROWS;

Also, please consider reviewing KB document How to avoid using excessive memory on deletes involving dense nodes for other considerations.