To achieve the best performance and avoid negative effects on the rest of the system, consider these best practices when processing large deletes.
Start by identifying which situation you are in:
- Deleting the entire graph database, so you can rebuild from scratch.
- Deleting a large section of the graph, or a large number of nodes/relationships that are identified in a MATCH statement.

The recommendation differs depending on the situation. Going through them in order:
When deleting the entire graph database, by far the best approach is to stop the database, rename or delete the graph store directory (data/graph.db before v3.x, data/databases/graph.db from 3.x forward, or similar), and start the database again.
This will build a fresh, empty database for you.
If you need to delete a large number of objects from the graph, be mindful of not building up a single transaction so large that a Java out of heap error is encountered. Use the following example to delete subsets of the matched records in batches until the full delete is complete:
With 3.x forward and using APOC
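As an illustration of the APOC approach, the sketch below uses apoc.periodic.iterate to delete matched nodes in batches of 10k. The label Foo and the property foo = 'bar' are hypothetical placeholders for your own match criteria, and the APOC library is assumed to be installed on the server.

```cypher
// Hypothetical match criteria: replace :Foo and n.foo = 'bar' with your own.
CALL apoc.periodic.iterate(
  // First statement: find the nodes to delete.
  "MATCH (n:Foo) WHERE n.foo = 'bar' RETURN n",
  // Second statement: applied to each batch of nodes.
  "DETACH DELETE n",
  // Each batch of 10,000 nodes is committed in its own transaction.
  {batchSize: 10000}
)
YIELD batches, total
RETURN batches, total;
```

Because each batch is committed separately, a single call of this form works through all matching nodes; batches and total report how many batches ran and how many nodes were processed.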
Rerun a manually batched delete statement until it returns 0 (zero) records.
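A manually batched statement that is rerun in this way could look like the following sketch. It assumes Neo4j 2.3 or later, where DETACH DELETE is available, and again uses the hypothetical label Foo and property foo = 'bar' as stand-ins for your own criteria.

```cypher
// Find the nodes you want to delete (hypothetical criteria).
MATCH (n:Foo) WHERE n.foo = 'bar'
// Take at most 10k nodes per run; lower this if nodes have many relationships.
WITH n LIMIT 10000
// Remove each node together with its relationships.
DETACH DELETE n
// Number of nodes handled in this run; 0 means nothing is left to delete.
RETURN count(*);
```

Each run is its own transaction, so the heap only has to hold one batch at a time. The delete is complete once the returned count is 0.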
For versions before Neo4j 2.3 run:
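Versions before 2.3 do not have DETACH DELETE, so relationships must be matched and deleted explicitly. The following is a minimal sketch under the same assumptions as above (hypothetical label Foo and property foo = 'bar'), not necessarily the exact statement the original article used.

```cypher
// Find the nodes you want to delete (hypothetical criteria).
MATCH (n:Foo) WHERE n.foo = 'bar'
// Take at most 10k nodes per run.
WITH n LIMIT 10000
// OPTIONAL MATCH keeps nodes that have no relationships (r is null for them).
OPTIONAL MATCH (n)-[r]-()
// Delete the nodes and any relationships attached to them.
DELETE n, r
// Counts matched rows (node/relationship pairs), not distinct nodes;
// it is still 0 once no matching nodes remain.
RETURN count(*);
```

As with the newer form, rerun the statement until the returned count is 0.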
In all of the examples we are performing a delete in batch sizes of 10k. This may still lead to out of heap errors if the nodes
eligible for delete have a significant number of relationships. For example, if a node to be deleted has 1 million
relationships then the delete of this single node will include the removal of this 1 node and the 1 million