Reconciler
Enterprise Edition

In Neo4j, administrative operations such as database creation do not happen synchronously. Instead, the changes are first written to the system database, recording the requested state of the DBMS. Each server has a reconciler, an internal component which observes the requested state and makes changes to the local server to match that state. For example, the system database may record that the database bar is expected to be running on server-1. When the reconciler on server-1 becomes aware of that, it starts bar. This means the loss of any one server cannot prevent all progress on an operation.

Executing an operation and getting back a successful response means that the request is safely committed to the system database, and will be processed by each member of the cluster at some point, assuming the server is healthy enough.

Servers can become aware the same operation at different times, since the system database is just another Raft group, where followers and replicas can lag behind the leader in some situations. Eventually, the new state gets propagated to all healthy servers, and they take action on it.

If you want to be sure that each server has processed an administrative operation before running the next action, you can use the WAIT keyword. WAIT makes the statement not return until all servers have seen the transaction, and their reconciler has finished processing it.

Example of database creation with WAIT

CREATE DATABASE foo WAIT

When a statement with WAIT returns, it does not mean that the operation was necessarily successful everywhere. WAIT just ensures the operation has been processed. For example, a request to start a database is considered processed if the reconciler tried to start it, but the database failed and went into the dirty state.

Errors

An operation might only succeed on some servers. For instance, some servers might be offline when a database is created, or a disk error might cause a server to fail to start a database. In this situation, the operation has not failed as a whole, since the database may even be running (although with less fault tolerance than you intended).

Some failed operations can just be re-tried: for example, START DATABASE can be re-run if not all members started successfully, and you think the cause of that has been resolved. Other failures only require the underlying problem to be resolved. When the issue is fixed, the reconciler will make the intended change, because it is still trying to achieve the requested state. For example, if a new server fails during a DEALLOCATE DATABASES FROM SERVER (meaning a database cannot be safely moved), fixing the new server should be enough to resolve the problem, as the new server will start its copy of the database as soon as the target is ready, allowing the deallocating server to shut down its copy.

ReconcilerEnterprise Edition

Errors

Reconciler
Enterprise Edition