This section describes the architecture of a Neo4j HA cluster.
A Neo4j cluster is comprised of a single master instance and zero or more slave instances. All instances in the cluster have full copies of the data in their local database files. The basic cluster configuration consists of three instances:
Each instance contains the logic needed in order to coordinate with the other members of the cluster for data replication and election management, represented by the green arrows in the picture above. Each slave instance that is not an arbiter instance (see below) communicates with the master to keep databases up to date, as represented by the blue arrows in the picture above.
A special case of a slave instance is the arbiter instance. The arbiter instance comprises the full Neo4j software running in an arbiter mode, such that it participates in cluster communication, but it does not replicate a copy of the datastore.
Write transactions performed directly on the master will execute as though the instance were running in non-cluster mode. On success the transaction will be pushed out to a configurable number of slaves. This is done optimistically, meaning that if the push fails, the transaction will still be successful.
When performing a write transaction on a slave each write operation will be synchronized with the master. Locks will be acquired on both master and slave. When the transaction commits it will first be committed on the master and then, if successful, on the slave. To ensure consistency, a slave must be up to date with the master before performing a write operation. The automatic updating of slaves is built into the communication protocol between slave and master.
Whenever a Neo4j database becomes unavailable, for example caused by hardware failure or network outage, the other instances in the cluster will detect that and mark it as temporarily failed. A database instance that becomes available after an outage will automatically catch up with the cluster.
If the master goes down another member will be elected and have its role switched from slave to master after quorum (see below) has been reached within the cluster. When the new master has performed its role switch, it will broadcast its availability to all the other members of the cluster. Normally a new master is elected and started within seconds. During this time no writes can take place
A cluster must have quorum to elect a new master. Quorum is defined as: more than 50% of active cluster members. A simple rule of thumb when designing a cluster is: A cluster that must be able to tolerate n master instance failures requires 2n+1 instances to satisfy quorum and allow elections to take place. Therefore, the simplest valid cluster size is three instances, which allows for a single master failure.
Data branching can be caused in two different ways:
The database makes the best of the situation by creating a directory with the contents of the database files from before branching took place so that it can be reviewed and the situation be resolved. Data branching does not occur under normal operations.
All this can be summarized as: