14.8. Space reuse

This section describes how Neo4j handles data deletion and storage space.

Neo4j uses logical deletes to remove data from the database to achieve maximum performance and scalability. A logical delete means that all relevant records are marked as deleted, but the space they occupy is not immediately returned to the operating system. Instead, it is subsequently reused by the transactions creating data.

Marking a record as deleted requires writing a record update command to the transaction log, as when something is created or updated. Therefore, when deleting large amounts of data, this leads to a storage usage growth of that particular database, because Neo4j writes records for all deleted nodes, their properties, and relationships to the transaction log.

Keep in mind that when doing DETACH DELETE on many nodes, those deletes can take up more space in the in-memory transaction state and the transaction log than you might expect.

Transactions are eventually pruned out of the transaction log, bringing the storage usage of the log back down to the expected level. The store files, on the other hand, do not shrink when data is deleted. The space that the deleted records take up is kept in the store files. Until the space is reused, the store files are sparse and fragmented, but the performance impact of this is usually minimal.

14.8.1. ID files

Neo4j uses .id files for managing the space that can be reused. These files contain the set of IDs for all the deleted records in their respective files. The ID of the record uniquely identifies it within the store file. For instance, the neostore.nodestore.db.id contains the IDs of all deleted nodes.

These .id files are maintained as part of the write transactions that interact with them. When a write transaction commits a deletion, the record’s ID is buffered in memory. The buffer keeps track of all overlapping unfinished transactions. When they complete, the ID becomes available for reuse.

The buffered IDs are flushed to the .id files as part of the checkpointing. Concurrently, the .id file changes (the ID additions and removals) are inferred from the transaction commands. This way, the recovery process ensures that the .id files are always in-sync with their store files. The same process also ensures that clustered databases have precise and transactional space reuse.

If you want to shrink the size of your database, do not delete the .id files. The store files must only be modified by the Neo4j database and the neo4j-admin tools.

14.8.2. Reclaim unused space

You can use the neo4j-admin copy tool to create a defragmented copy of your database. The copy command creates and entirely new and independent database. If you want to run that database in a cluster, you have to re-seed the existing cluster, or seed a new cluster from that copy.