This section describes how Neo4j handles data deletion and storage space.
Neo4j uses logical deletes to remove data from the database to achieve maximum performance and scalability. A logical delete means that all relevant records are marked as deleted, but the space they occupy is not immediately returned to the operating system. Instead, it is subsequently reused by the transactions creating data.
Marking a record as deleted requires writing a record update command to the transaction log, just as when something is created or updated. Deleting large amounts of data therefore increases the storage usage of that particular database, because Neo4j writes records for all deleted nodes, their properties, and relationships to the transaction log.
Keep in mind that when doing
Transactions are eventually pruned out of the transaction log, bringing the storage usage of the log back down to the expected level. The store files, on the other hand, do not shrink when data is deleted. The space that the deleted records take up is kept in the store files. Until the space is reused, the store files are sparse and fragmented, but the performance impact of this is usually minimal.
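The behavior described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: `ToyStore` and its methods are hypothetical names, and the in-memory lists stand in for a store file. It does not reflect Neo4j's actual record format. It shows that a logical delete only marks a slot as unused, that the store does not shrink, and that a later create reuses the freed slot.

```python
class ToyStore:
    """Toy fixed-slot record store (hypothetical; not Neo4j's on-disk format)."""

    def __init__(self):
        self.records = []   # slot index doubles as the record ID
        self.in_use = []    # per-slot "in use" flag
        self.free_ids = []  # IDs of deleted records that can be reused

    def create(self, value):
        if self.free_ids:
            # Reuse the space of a previously deleted record.
            record_id = self.free_ids.pop()
            self.records[record_id] = value
            self.in_use[record_id] = True
        else:
            # No free slot: the store grows by one record.
            record_id = len(self.records)
            self.records.append(value)
            self.in_use.append(True)
        return record_id

    def delete(self, record_id):
        if not self.in_use[record_id]:
            raise KeyError(record_id)
        # Logical delete: mark the slot as unused, keep the space.
        self.in_use[record_id] = False
        self.free_ids.append(record_id)

    def size(self):
        # Number of slots, including logically deleted ones.
        return len(self.records)


store = ToyStore()
ids = [store.create(f"node-{i}") for i in range(5)]
store.delete(ids[1])
store.delete(ids[3])
size_after_delete = store.size()     # slots are kept, so the size is unchanged
reused_id = store.create("node-new")  # takes one of the freed slots
size_after_create = store.size()     # still no growth
```

In this model, the store only grows once `free_ids` is empty, which mirrors why store files stay the same size after deletions and fill up again as new data is created.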
Neo4j uses .id files for managing the space that can be reused.
These files contain the set of IDs for all the deleted records in their respective files.
The ID of the record uniquely identifies it within the store file.
For instance, the neostore.nodestore.db.id file contains the IDs of all deleted nodes.
These .id files are maintained as part of the write transactions that interact with them. When a write transaction commits a deletion, the record’s ID is buffered in memory. The buffer keeps track of all overlapping unfinished transactions. When they complete, the ID becomes available for reuse.
The buffered IDs are flushed to the .id files as part of checkpointing. Concurrently, the .id file changes (the ID additions and removals) are inferred from the transaction commands. This way, the recovery process ensures that the .id files are always in sync with their store files. The same process also ensures that clustered databases have precise and transactional space reuse.
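The buffering of deleted IDs can be sketched as follows. This is a simplified illustration, not Neo4j's implementation: `IdBuffer`, its methods, and the transaction names are hypothetical, and checkpointing and recovery are left out. The sketch only shows the rule that a deleted record's ID becomes reusable once every transaction that was still open when the deletion committed has finished.

```python
class IdBuffer:
    """Toy model of delayed ID reuse (hypothetical; not Neo4j's implementation)."""

    def __init__(self):
        self.open_txs = set()  # transactions that have started but not finished
        self.buffered = []     # (record_id, transactions still blocking reuse)
        self.reusable = set()  # IDs that are safe to hand out again

    def begin(self, tx):
        self.open_txs.add(tx)

    def commit_delete(self, tx, record_id):
        # The deleting transaction finishes; remember which others overlap it.
        self.open_txs.discard(tx)
        self.buffered.append((record_id, set(self.open_txs)))
        self._release()

    def finish(self, tx):
        self.open_txs.discard(tx)
        self._release()

    def _release(self):
        # Move IDs out of the buffer once no overlapping transaction remains.
        still_buffered = []
        for record_id, blockers in self.buffered:
            remaining = blockers & self.open_txs
            if remaining:
                still_buffered.append((record_id, remaining))
            else:
                self.reusable.add(record_id)
        self.buffered = still_buffered


buf = IdBuffer()
buf.begin("reader")
buf.begin("deleter")
buf.commit_delete("deleter", record_id=42)
blocked = 42 not in buf.reusable  # "reader" overlaps the delete and is still open
buf.finish("reader")
released = 42 in buf.reusable     # no overlapping transaction is left
```

Delaying reuse in this way means a transaction that started before the deletion never sees the record's slot being overwritten by unrelated data while it is still running.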
If you want to shrink the size of your database, do not delete the .id files.
The store files must only be modified by the Neo4j database and the
You can use the neo4j-admin copy tool to create a defragmented copy of your database.
The copy command creates an entirely new and independent database.
If you want to run that database in a cluster, you have to re-seed the existing cluster, or seed a new cluster from that copy.