To get the best performance and avoid negative effects on the rest of the system, follow these practices when processing large deletes.

Start by identifying which situation you are in:

  • Deleting the entire graph database, so you can rebuild from scratch.
  • Deleting a large section of the graph, or a large number of nodes/relationships that are identified in a MATCH statement.

The recommendation differs depending on the situation. Going through them in order:

When deleting the entire graph database, the best approach by far is to stop the database, rename or delete the graph store directory (data/graph.db or similar), and start the database again. On startup, the database builds a fresh, empty store for you.
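The stop, rename, start sequence can be scripted. The following Python sketch is one possible way to do it, under stated assumptions: `reset_graph_store` is a hypothetical helper, and the stop and start commands and the store path are supplied by the caller because they vary between installations. It renames the store rather than deleting it, so the old data can be restored or removed later.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence


def reset_graph_store(store_dir: Path,
                      stop_cmd: Sequence[str],
                      start_cmd: Sequence[str]) -> Path:
    """Stop the database, move the store directory aside, and start again.

    stop_cmd and start_cmd are whatever commands stop and start your
    installation (assumptions supplied by the caller, not fixed here).
    Returns the path the old store was renamed to.
    """
    # Abort before touching any files if the stop command fails.
    subprocess.run(list(stop_cmd), check=True)

    # Keep the old store under a timestamped name instead of deleting it.
    backup = store_dir.with_name(f"{store_dir.name}.{int(time.time())}.bak")
    store_dir.rename(backup)

    # On startup the database builds a fresh, empty store.
    subprocess.run(list(start_cmd), check=True)
    return backup
```

Make sure the stop command has fully completed before the rename happens; the sketch relies on the stop command blocking until the database is down.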

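If you need to delete a large number of objects from the graph, delete them in batches so that no single transaction has to hold the entire delete in memory. Run the following statement repeatedly, each run removing a subset of the matched records, until the full delete is complete: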

// Find the nodes you want to delete
MATCH (n:Foo) WHERE n.foo = "bar"

// Take the first 10k nodes and their rels (lower this number if nodes average more than 100 rels each)
WITH n LIMIT 10000
DETACH DELETE n
RETURN count(*);

Run this repeatedly until the returned count is 0 (zero).
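Repeating the statement by hand gets tedious for large deletes, so the loop can be driven from a script. The following Python sketch shows one way to do that; it is an illustration, not an official client. `delete_in_batches`, `run_batch`, and `max_batches` are hypothetical names: `run_batch` stands in for whatever executes a Cypher statement against your database in its own transaction and returns the single count(*) value.

```python
from typing import Callable

# The batched delete statement from the text.
BATCH_DELETE = """
MATCH (n:Foo) WHERE n.foo = "bar"
WITH n LIMIT 10000
DETACH DELETE n
RETURN count(*)
"""


def delete_in_batches(run_batch: Callable[[str], int],
                      max_batches: int = 100_000) -> int:
    """Run BATCH_DELETE repeatedly until it reports 0; return the total count.

    run_batch is a hypothetical callable: it must execute the given Cypher
    statement in its own transaction and return the count(*) value.
    max_batches is a safety cap (an assumption, not part of the original advice).
    """
    total = 0
    for _ in range(max_batches):
        deleted = run_batch(BATCH_DELETE)
        if deleted == 0:
            return total  # nothing left to match: the delete is complete
        total += deleted
    raise RuntimeError("batch limit reached before the delete completed")
```

Passing the executor in as a callable keeps the loop independent of any particular driver. What matters is that each call commits before the next one starts, so every batch is a separate transaction.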

For versions before Neo4j 2.3, which do not support DETACH DELETE, run:

// Find the nodes you want to delete
MATCH (n:Foo) WHERE n.foo = "bar"

// Take the first 10k nodes and their rels (lower this number if nodes average more than 100 rels each)
WITH n LIMIT 10000
// OPTIONAL MATCH keeps nodes that have no relationships, so they are deleted too
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
RETURN count(*);

Details

Author: Dave Gordon
Applicable versions: 2.3
Keywords: cypher, garbage collection, heap, memory, neo4j-2.3, transaction