Neo4j deletes data logically rather than physically, to achieve maximum performance and scalability. To see how this looks to a database operator, take the simple case of loading data into Neo4j. As you load data, the nodes are stored in a file called neostore.nodestore.db, and the file keeps growing as you keep loading.
However, once you start deleting nodes, you will see that neostore.nodestore.db does not shrink. Not only does its size stay the same, but the file neostore.nodestore.db.id also starts to grow, and it keeps growing with every record deleted.
This happens because of id reuse. A delete in Neo4j does not physically remove the record; it just flips the record's in-use bit so the record is no longer available. The ids of deleted records, which are now available for reuse, are kept in neostore.nodestore.db.id. This makes the neostore.nodestore.db.id file act as a kind of “recycle bin” that holds all the deleted ids.
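To make the mechanism concrete, here is a toy Python model, not Neo4j's actual implementation: the store is a list of fixed-size record slots, each with an in-use flag, and the .id file is a list of freed ids. The class name ToyNodeStore and the RECORD_SIZE value are hypothetical. Deleting nodes only flips flags and appends ids, so the store's byte size stays the same while the free-id list grows.

```python
from dataclasses import dataclass, field

# Hypothetical record size in bytes; the real value depends on the Neo4j
# version and store format.
RECORD_SIZE = 15


@dataclass
class ToyNodeStore:
    """Toy model of a fixed-size record file and its .id 'recycle bin'."""

    in_use: list[bool] = field(default_factory=list)   # one flag per record slot
    free_ids: list[int] = field(default_factory=list)  # stands in for the .id file

    def create(self) -> int:
        # This toy version always appends; id reuse is left out of the model.
        self.in_use.append(True)
        return len(self.in_use) - 1

    def delete(self, node_id: int) -> None:
        # Logical delete: flip the in-use flag, keep the slot, record the id.
        self.in_use[node_id] = False
        self.free_ids.append(node_id)

    def store_bytes(self) -> int:
        # The store file spans every slot ever allocated, in use or not.
        return len(self.in_use) * RECORD_SIZE


store = ToyNodeStore()
ids = [store.create() for _ in range(1000)]
before = store.store_bytes()
for node_id in ids[:600]:
    store.delete(node_id)

print(store.store_bytes() == before)  # True: the store file did not shrink
print(len(store.free_ids))            # 600: one recycled id per deleted node
```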
So now you have deleted the data, neostore.nodestore.db is the same size as before the delete, and the neostore.nodestore.db.id file is larger than before. How do you reclaim this space?
When you start loading new data after the deletes, Neo4j reuses the ids recorded in neostore.nodestore.db.id. As a result, neostore.nodestore.db does not grow as long as recycled ids remain, and neostore.nodestore.db.id shrinks until all the recycled ids have been used up.
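Reuse can be sketched as an id allocator that hands out recycled ids before extending the store. This is a hypothetical model: IdAllocator, high_id, and the last-in, first-out reuse order are assumptions, not Neo4j internals. The point it illustrates is that the store file only grows when the free list is empty.

```python
class IdAllocator:
    """Toy id generator: recycled ids are handed out before new ones."""

    def __init__(self) -> None:
        self.high_id = 0               # number of slots the store file spans
        self.free_ids: list[int] = []  # recycled ids, standing in for the .id file

    def allocate(self) -> int:
        if self.free_ids:
            # Reuse a freed slot: the store file does not grow.
            return self.free_ids.pop()
        # No recycled id left: extend the store file by one record.
        node_id = self.high_id
        self.high_id += 1
        return node_id

    def release(self, node_id: int) -> None:
        # Called on delete: the id goes into the "recycle bin".
        self.free_ids.append(node_id)


ids = IdAllocator()
created = [ids.allocate() for _ in range(1000)]
for node_id in created[:600]:
    ids.release(node_id)

reloaded = [ids.allocate() for _ in range(600)]
print(ids.high_id, len(ids.free_ids))     # 1000 0
print(sorted(reloaded) == created[:600])  # True
```

After the 600 recycled ids are used up, the next allocate() call would push high_id to 1001, which is the point where the store file starts growing again.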
If you do not plan to add more nodes but still want to shrink the database on disk, you can use the copy store utility. It reads an offline (stopped) database, copies it into a new one, and leaves out the data that is no longer in use, as well as the list of ids eligible for reuse.
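The effect of such a copy can be approximated by a function that walks the old store and writes only in-use records into a dense new one. This is a hypothetical sketch, not the utility's code: copy_store, RECORD_SIZE, and the dense renumbering of ids are assumptions, and whether the real utility preserves or reassigns node ids is not covered here.

```python
from typing import Optional

RECORD_SIZE = 15  # hypothetical record size in bytes


def copy_store(records: list[Optional[str]]) -> tuple[list[str], dict[int, int]]:
    """Copy only in-use records (None marks a deleted slot) into a dense new store.

    Returns the new record list and an old-id -> new-id mapping. The free-id
    list is not carried over, so the copy starts with nothing to recycle.
    """
    new_records: list[str] = []
    id_map: dict[int, int] = {}
    for old_id, record in enumerate(records):
        if record is None:
            continue  # deleted slot: leave it out of the copy
        id_map[old_id] = len(new_records)
        new_records.append(record)
    return new_records, id_map


# 1000 slots, of which only every fifth one is still in use.
old = [f"node-{i}" if i % 5 == 0 else None for i in range(1000)]
new, id_map = copy_store(old)
print(len(old) * RECORD_SIZE, len(new) * RECORD_SIZE)  # 15000 3000
```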
Large deletes can generate a lot of transaction logs. Keep this in mind when doing mass delete operations; otherwise, ironically, your filesystem can fill up.