This article walks through Neo4j’s commit and replication processes for both single instances and Causal Clusters.
When you call tx.commit(), the transaction goes through the Storage Engine, which transforms it into a Transaction Representation. This is similar to what you get when you dump a transaction log, and it contains all of the commands generated by that transaction:
Image 1 – Storage Engine
On a single instance, this Transaction Representation is then passed on to the Transaction Commit Process, which writes the transaction to the transaction log. Internally, this calls appendToLog(). After that, the Transaction Representation goes to the Record Storage Engine, which applies the transaction to the store (applyToStore()).
The store files on disk are not necessarily updated at the same moment as appendToLog(). The changes reach them during a checkpoint operation or when a dirty page is flushed from the page cache.
Image 2 – Transaction Commit Process
Image 3 – Record Storage Engine
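The two steps above can be sketched as a toy model. This is only an illustration, not Neo4j's actual code: the class ToyInstance, its fields, and the checkpoint() helper are hypothetical names, and the sketch assumes that applying a transaction changes pages in memory first, with the disk write deferred to a checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInstance:
    """Hypothetical stand-in for a single instance; not Neo4j's real classes."""
    tx_log: list = field(default_factory=list)         # durable transaction log
    page_cache: dict = field(default_factory=dict)     # in-memory store pages
    dirty: set = field(default_factory=set)            # pages not yet on disk
    store_on_disk: dict = field(default_factory=dict)  # store files

    def append_to_log(self, commands):
        # Step 1: the transaction is written to the transaction log.
        self.tx_log.append(list(commands))
        return len(self.tx_log)  # toy transaction id

    def apply_to_store(self, commands):
        # Step 2: commands change pages in memory and mark them dirty.
        for record_id, value in commands:
            self.page_cache[record_id] = value
            self.dirty.add(record_id)

    def commit(self, commands):
        tx_id = self.append_to_log(commands)
        self.apply_to_store(commands)
        return tx_id

    def checkpoint(self):
        # Assumption: dirty pages reach the store files only here.
        for record_id in sorted(self.dirty):
            self.store_on_disk[record_id] = self.page_cache[record_id]
        self.dirty.clear()


instance = ToyInstance()
tx_id = instance.commit([("node:1", "Alice"), ("node:2", "Bob")])
on_disk_before = dict(instance.store_on_disk)  # still empty
instance.checkpoint()
on_disk_after = dict(instance.store_on_disk)   # now holds both records
```

In this sketch the log already contains the transaction when commit() returns, while the store files catch up later.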
This is the process for a single instance, which is fairly simple. Naturally, it doesn’t involve any Raft components.
In a Causal Cluster, the work is done on the Leader. The process is the same, except that the Transaction Commit Process is intercepted before the transaction is flushed to the log:
Image 4 – Transaction Commit Process
The Transaction Representation is intercepted by the Replicated Transaction Commit Process, which turns it into a Raft Message (commit()). The message is then replicated by a component called the Raft Replicator (replicate()). Replication proceeds as follows:
1) The Leader sends an append request to the Followers, saying it has a new message
2) The Followers append that message to their own Raft logs and send a response back saying it has been appended
3) The Leader receives those responses and sends a commit message saying all is OK on both sides and it is safe to commit
Image 5 – Replication
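The three replication steps can be sketched roughly as below. This is a simplified model, not Neo4j's implementation: ToyMember, on_append(), on_commit() and the up flag are hypothetical names. The majority check comes from the general Raft protocol and is not described in this article.

```python
class ToyMember:
    """Hypothetical cluster member holding a Raft log."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up            # False simulates an unreachable member
        self.raft_log = []      # appended, not necessarily committed
        self.commit_index = 0   # number of entries known to be committed

    def on_append(self, entry):
        # Step 2: append to the local Raft log and acknowledge.
        if not self.up:
            return False
        self.raft_log.append(entry)
        return True

    def on_commit(self, index):
        # Step 3: the Leader says entries up to `index` are safe to commit.
        if self.up:
            self.commit_index = max(self.commit_index, index)


def replicate(leader, followers, entry):
    leader.raft_log.append(entry)
    index = len(leader.raft_log)
    # Step 1: the Leader sends the append request to every Follower.
    acks = 1 + sum(1 for f in followers if f.on_append(entry))  # Leader counts itself
    cluster_size = 1 + len(followers)
    if acks * 2 <= cluster_size:
        return False  # no majority (general Raft rule), so nothing is committed
    leader.commit_index = index
    for f in followers:
        f.on_commit(index)
    return True


leader = ToyMember("leader")
followers = [ToyMember("f1"), ToyMember("f2")]
committed = replicate(leader, followers, "tx-1")

# A five-member cluster with three members down cannot reach a majority.
lonely_leader = ToyMember("leader")
mostly_down = [ToyMember("a"), ToyMember("b", up=False),
               ToyMember("c", up=False), ToyMember("d", up=False)]
rejected = replicate(lonely_leader, mostly_down, "tx-2")
```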
After this happens, the Transaction Representation is passed to a queue of Transaction Representations that we call the Replicated Transaction State Machine (applyCommand()). This queue keeps track of the transactions and the order in which they need to be applied to the store.
Image 6 – Replicated Transaction State Machine
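A minimal sketch of such an ordered queue follows. The class name, the flush() method and the last_applied_index field are hypothetical, and skipping entries that were already applied is an assumption made for the illustration. The commit_process callback stands in for the hand-off to the Transaction Commit Process.

```python
from collections import deque


class ToyReplicatedTxStateMachine:
    """Hypothetical model of a queue that preserves commit order."""

    def __init__(self, commit_process):
        self.queue = deque()
        self.commit_process = commit_process  # called once per transaction
        self.last_applied_index = 0

    def apply_command(self, log_index, tx):
        # Assumption: entries at or below the last applied index are
        # duplicates (for example, replayed after a restart) and are skipped.
        if log_index <= self.last_applied_index:
            return
        self.queue.append((log_index, tx))

    def flush(self):
        # Hand transactions to the commit process in arrival (Raft log) order.
        while self.queue:
            log_index, tx = self.queue.popleft()
            self.commit_process(tx)
            self.last_applied_index = log_index


applied = []
state_machine = ToyReplicatedTxStateMachine(applied.append)
state_machine.apply_command(1, "tx-a")
state_machine.apply_command(2, "tx-b")
state_machine.flush()
state_machine.apply_command(2, "tx-b")  # duplicate delivery, ignored
state_machine.flush()
```

Here applied ends up holding "tx-a" followed by "tx-b", each exactly once.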
From there, these Transaction Representations go through the Commit Process, which hands them back to the Transaction Commit Process (image 2) to be flushed to the transaction log and finally applied to the store (image 3).