Now that you have gained experience managing a Neo4j instance and database, you will learn how to get started with creating and managing Neo4j Causal Clusters.
At the end of this module, you should be able to:
This module covers the basics of clusters in Neo4j. Later in this training, you will learn more about security and encryption related to clusters.
Neo4j’s clustering architecture enables an enterprise to use a Neo4j database in production. First, it provides a highly available solution whereby if one Neo4j instance fails, another Neo4j instance can take over automatically. Second, it provides a highly scalable database whereby some parts of the application update the data, while other parts of the application are widely distributed and do not need immediate access to new or updated data. That is, read latency is acceptable. Causal consistency in Neo4j means that an application is guaranteed to be able to consistently read all data that it has written.
Most Neo4j applications use clustering to ensure high availability. The scalability of the Neo4j database is used by applications that utilize multiple data centers.
There are two types of Neo4j instances in a cluster architecture: core servers and read replica servers.
Core servers are used for read and write access to the database. The core servers synchronize updates to the database, regardless of the number and physical locations of the Neo4j instances. By default, in a cluster architecture, a transaction is committed when a majority (quorum) of the core servers defined as the minimum required for the cluster have written the data to the physical database. This coordination between core servers is implemented using the Raft protocol. You can have a large number of core servers, but the more core servers in the application architecture, the longer a “majority” commit will take. At a minimum, an application should use three core servers to be considered fault-tolerant. If one of the three servers fails, the cluster is still operable for updates to the database. If you want an architecture that can tolerate two servers failing, then you must configure five core servers. You cannot configure a cluster with two core servers because if one server fails, the remaining server is automatically set to be read-only, leaving your database inoperable for updates.
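The sizing rule above can be sketched as a small calculation; the function name here is illustrative, not part of Neo4j:

```shell
# Minimum number of core servers needed to tolerate f simultaneous
# failures while still maintaining a majority: 2f + 1.
cores_needed() {
  echo $(( 2 * $1 + 1 ))
}

cores_needed 1   # tolerate one failure  -> 3
cores_needed 2   # tolerate two failures -> 5
```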
Read replica servers are used to scale data across a distributed network. They only support read access to the data. The read replica servers regularly poll the core servers for updates to the database by obtaining the transaction log from a core server. You can think of a read replica as a highly scalable and distributed cache of the database. If a read replica fails, a new read replica can be started with no impact on the data and only a slight impact on the application, which can be written to reconnect to a different read replica server.
Here is an example where the core servers are located in one data center, but the read replicas are located in many distributed data centers.
An application can create a bookmark that is used to mark the last transaction committed to the database. In a subsequent read, the bookmark can be used to ensure that the appropriate core servers are used so that only committed data will be read by the application.
As an administrator, you must determine the physical locations of the servers that will be used as core servers and read replica servers. You configure the causal cluster by updating the neo4j.conf file on each server so that the servers can operate together as a cluster. The types of properties that you configure for a cluster include, but are not limited to:
When a core server starts, it first uses a discovery protocol to join the network. At some point, it will be running with the other members of the core membership. In a cluster, exactly one core server is elected to be the LEADER. The LEADER is the coordinator of all communication between the core servers. All of the other core servers are FOLLOWERS, and the servers in the cluster use the Raft protocol to synchronize updates. If a core server joins the network after the other core servers have been running and updating data, the late-joining core server must use the catchup protocol to reach the point where it is synchronized with the other FOLLOWERS.
When a core server shuts down, the shutdown may be initiated by an administrator, or it may be due to a hardware or network failure. If the core server that is a FOLLOWER shuts down, the LEADER detects the shutdown and incorporates this information into its operations with the other core servers. If the core server that is the LEADER shuts down, the remaining core servers communicate with each other and an existing FOLLOWER is promoted to the LEADER.
If a core server shutdown leaves the cluster below a configured threshold for the number of core servers required for the cluster, then the LEADER becomes inoperable for writing to the database. This is a serious matter that needs to be addressed by you as the administrator.
A core server updates its database based upon the requests from clients. The client’s transaction is not complete until a quorum of core servers have updated their databases. Subsequent to the completion of the transaction, the remaining core servers will also be updated. Core servers use the Raft protocol to share updates. Application clients can use the bolt protocol to send updates to a particular core server’s database, but the preferred protocol for a cluster is the bolt+routing protocol. With this protocol, applications can write to any core server in the cluster, but the LEADER will always coordinate updates.
Here are some common tasks for managing and monitoring clustering:
In your real application, you set up the core and read replica Neo4j instances on separate physical servers that are networked and on which you have installed the Enterprise Edition of Neo4j. In a real application, all configuration for clustering is done by modifying the neo4j.conf file.
Please refer to the Neo4j Operations Manual for greater detail about the settings for configuring clustering.
When setting up clustering, you should first identify at least three machines that will host core servers. For these machines, you should make sure these properties are set in neo4j.conf where XXXX is the IP address of the machine on the network and XXX1, XXX2, XXX3 are the IP addresses of the machines that will participate in the cluster. These machines must be network accessible.
The machines that you designate to run core servers must be reachable from each other. This means that the core machines are part of the membership of the cluster:
Here are some of the settings that you may use for your core servers, depending on whether the addresses are known in the network. You may have to specify advertised addresses in addition to the actual addresses.
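As a sketch, the core-server settings might look like the following fragment of neo4j.conf, where XXXX and XXX1 through XXX3 are placeholders for your own network addresses, and port 5000 is an assumed discovery port:

```conf
# Hypothetical neo4j.conf fragment for one core server (address XXXX)
dbms.mode=CORE
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=XXXX
causal_clustering.discovery_listen_address=XXXX:5000
causal_clustering.initial_discovery_members=XXX1:5000,XXX2:5000,XXX3:5000
```

Check the Neo4j Operations Manual for the exact property names that apply to your version.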
The minimum_core_cluster_size_at_formation property specifies the number of core servers that must be running before the database is operable for updates. These core servers, when started, ensure that they are caught up with each other. After all core servers are caught up, then the cluster is operable for updates.
The minimum_core_cluster_size_at_runtime property specifies the number of servers that will actively participate in the cluster at runtime.
If the number of core servers started at formation is greater than the number required at runtime, then some started core servers are not considered essential and the cluster can still be operable if some of the core servers stop running. In this example, the size at formation and the runtime minimum are different. Most deployments set these two properties to be the same.
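For example, a cluster that forms with five core servers but remains operable down to three might use settings like these (the values here are illustrative):

```conf
causal_clustering.minimum_core_cluster_size_at_formation=5
causal_clustering.minimum_core_cluster_size_at_runtime=3
```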
The minimum number of core servers at runtime in a fault-tolerant cluster is three, which is the default setting for clustering. If you require more than three core servers, you must adjust the values in the clustering configuration section where you specify the size and the members of the cluster.
After you have modified the neo4j.conf files for the cluster, you start each Neo4j instance. When you start a set of core servers, it doesn’t matter what order they are started. The cluster is not considered started until the number of core servers specified in causal_clustering.minimum_core_cluster_size_at_formation have started. One of the members of the core group will automatically be elected as the LEADER. Note that which core server is the LEADER could change at any time. You should observe the log output for each core server instance to ensure that it started with no errors.
|There is a configuration property (causal_clustering.refuse_to_be_leader) that you can set to true in the neo4j.conf file that specifies that this particular core server will never be a leader. It is not recommended that you set this property.|
After you have started the core servers in the cluster, you can access status information about the cluster from
cypher-shell on any of the core servers in the cluster. You simply enter
CALL dbms.cluster.overview(); and it returns information about the servers in the cluster, specifically, which ones are FOLLOWERS and which one is the LEADER.
For this training, you will gain experience managing and monitoring a Neo4j Causal Cluster using Docker. You will create and run Docker containers using a Neo4j Enterprise Docker image. This will enable you to start and manage multiple Neo4j instances used for clustering on your local machine. The published Neo4j Enterprise Edition 3.5.0 Docker image (from DockerHub.com) is pre-configured so that its instances can be easily replicated in a Docker environment that uses clustering. Using a Docker image, you create Docker containers that run on your local system. Each Docker container is a Neo4j instance.
For example, here are the settings in the neo4j.conf file for the Neo4j instance container named core3 when it starts as a Docker container:
Some of these settings are for applications that use the high availability (ha) features of Neo4j. With clustering, we use the core servers for fault-tolerance rather than the high availability features of Neo4j. The setting dbms.connectors.default_listen_address=0.0.0.0 is important. This setting enables the instance to communicate with other applications and servers in the network (for example, using a Web browser to access the http port for the server). Notice that the instance has a number of causal_clustering settings that are pre-configured. These are default settings for clustering that you can override when you create the Docker container for the first time. Some of the other default settings are recommended settings for a Neo4j instance, whether it is part of a cluster or not.
When you create Docker Neo4j containers using
docker run, you specify additional clustering configuration as parameters, rather than specifying them in the neo4j.conf file. Here is an example of the parameters that are specified when creating the Docker container named core3:
In this example, the name of the Docker container is core3. We map the conf, data, and logs folders for the Neo4j instance to our local filesystem when it starts. We map the http and bolt ports to values that will be unique on our system (13474 and 13687). We specify the bolt address to use. The name of the Docker network that is used for this cluster is training-cluster. ACCEPT_LICENSE_AGREEMENT is required. The size of the cluster is three core servers, and the names of the [potential] members are specified as core1, core2, core3, core4, and core5. These servers use port 5000 for the discovery listen address. This instance will be used as a core server (dbms.mode=CORE). The container is started detached, meaning that no output or interaction will be produced. And finally, the ID of the Neo4j Enterprise 3.5.0 image is specified (b4ca2f886837). When you specify the Neo4j parameters for starting the container (
docker run), you always prefix them with “--env=NEO4J_”. In addition, you replace each dot in the property name with an underscore and each underscore in the property name with a double underscore.
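The naming rule can be sketched as a small shell function (the function name is illustrative): double every underscore in the property name, turn each dot into a single underscore, and prefix the result with NEO4J_.

```shell
# Convert a neo4j.conf property name into its docker --env form:
# 1. each existing underscore becomes a double underscore
# 2. each dot becomes a single underscore
# 3. the result is prefixed with NEO4J_
to_env_name() {
  echo "NEO4J_$(echo "$1" | sed -e 's/_/__/g' -e 's/\./_/g')"
}

to_env_name "dbms.mode"
# -> NEO4J_dbms_mode
to_env_name "causal_clustering.minimum_core_cluster_size_at_formation"
# -> NEO4J_causal__clustering_minimum__core__cluster__size__at__formation
```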
|When using the Neo4j Docker instance, a best practice is to specify more members in the cluster, but not require them to be started when the cluster forms. This will enable you to later add core servers to the cluster.|
In this Exercise, you will gain experience with a simple cluster using Docker containers. You will not use Neo4j instances running on your system, but rather Neo4j instances running in Docker containers where you have installed Docker on your system.
Before you begin
docker --version). Here is information about downloading and installing Docker.
curl -O https://s3-us-west-1.amazonaws.com/data.neo4j.com/admin-neo4j/neo4j-docker.zip
docker pull neo4j:3.5.0-enterprise).
sudo usermod -aG docker <username>. You will have to log in and log out to use the new privileges.
docker images. Note that you will have a different Image ID.
sudo ./create_initial_cores.sh <Image ID>, providing as an argument the Image ID of the Neo4j Docker image.
docker ps -a
docker start -a coreX. The instance should be started. These instances are set up so that the default browser port on localhost will be 11474, 12474, and 13474 on each instance respectively. Notice that each instance uses its own database as the active database. For example, here is the result of starting the core server containers. Notice that each server starts as part of the cluster. The servers are not fully started until all catchup has been done between the servers and the Started record is shown. The databases will not be accessible by clients until all core members of the cluster have successfully started.
docker ps -a.
docker exec -it core1 /var/lib/neo4j/bin/cypher-shell -u neo4j -p neo4j
cypher-shell. In this image, core1 is the LEADER:
docker stop core1 core2 core3
You have now successfully configured, started, and accessed core servers (as Docker containers) running in a causal cluster.
When setting up a cluster for your application, you must ensure that the database that will be used in the cluster has been populated with your application data. In a cluster, each Neo4j instance has its own database, but the data in the databases for each core server that is actively running in the cluster is identical.
Before you seed the data for each core server that is part of a cluster, you must unbind it from the cluster. To unbind the core server, the instance must be stopped, then you run
neo4j-admin unbind --database=<database-name>.
When you seed the data for the cluster, you can do any of the following, but you must do the same on each of the core servers of the cluster to create the production database. Note that the core servers must be down for these tasks. You learned how to perform these tasks in the previous module.
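Assuming the active database is the default graph.db, the per-server sequence might look like this sketch (paths and the database name will vary with your installation):

```shell
# On EACH core server, with the instance stopped:
bin/neo4j stop
bin/neo4j-admin unbind --database=graph.db
# seed the database here, e.g. restore a backup or copy the store files,
# using one of the techniques from the previous module
bin/neo4j start
```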
If the amount of application data is relatively small (less than 10M nodes) you can also load .csv data into a running core server in the cluster where all core servers are started and actively part of the cluster. This will propagate the data to all databases in the cluster.
In this Exercise, you will populate the databases in the cluster that you created earlier. Because you are using Docker containers for learning about clustering, you cannot perform the normal seeding procedures as you would in your real production environment because when using the Neo4j Docker containers, the Neo4j instance is already started when you start the container. Instead, you will simply start the core servers in the cluster and connect to one of them. Then you will use
cypher-shell to load the Movie data into the database and the data will be propagated to the other core servers.
Before you begin
Ensure that you have performed the steps in Exercise 1 where you set up the core servers as Docker containers. Note that you can perform the steps of this Exercise in a single terminal window.
docker start core1 core2 core3. This will start the core servers in background mode where the log is not attached to STDOUT. If you want to see what is happening with a particular core server, you can always view the messages in <coreX>/logs/debug.log.
docker exec -it <core server> /bin/bash) and entering the following command:
echo "CALL dbms.cluster.overview();" | /var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps. In this example, core1 is the LEADER:
cypher-shell, specifying that the movie.cypher statements will be run. Hint: You can do this with a single command line:
/var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps < /var/lib/neo4j/data/movieDB.cypher
cypher-shell and confirm that the data has been loaded into the database.
cypher-shell. Hint: For example, you can log in to core2 with
cypher-shell with the following command:
docker exec -it core2 /var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps
You have now seen the cluster in action. Any modification to one database in the core server cluster is propagated to the other core servers.
In a cluster, all write operations must be coordinated by the LEADER in the cluster. Which core server is designated as the LEADER could change at any time in the event of a failure or a network slowdown. Applications that access the database can automatically route their write operations to whatever LEADER is available as this functionality is built into the Neo4j driver libraries. The Neo4j driver code obtains the routing table and automatically updates it as necessary if the endpoints in the cluster change. To implement the automatic routing, application clients that will be updating the database must use the bolt+routing protocol when they connect to any of the core servers in the cluster.
Applications that update the database should always use bolt+routing when accessing the core servers in a cluster. Using this protocol, applications gain:
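As an illustration, a client could connect with a routing URI rather than a plain bolt URI; the address and credentials below are placeholders, and this assumes the cypher-shell -a (address) option:

```shell
# Writes sent over this connection are routed to the current LEADER,
# whichever core server that happens to be.
/var/lib/neo4j/bin/cypher-shell -a bolt+routing://core1:7687 \
  -u neo4j -p training-helps \
  "CREATE (:Person {name: 'Test'});"
```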
For example, if you have a cluster with three core servers and core1 is the LEADER, your application can only write to core1 using the bolt protocol and bolt port for core1. An easy way to see this restriction is if you use the default address for
cypher-shell on the system where a FOLLOWER is running. If you connect via
cypher-shell to the server on core2 and attempt to update the database, you receive an error:
When using clustering, all application code that updates the database should use the bolt+routing protocol, which enables applications to write to the database even in the event of a failure of one of the core servers. Applications should be written with the understanding that transactions are automatically retried.
In this Exercise, you gain some experience with bolt+routing by running two stand-alone Java applications: one that reads from the database and one that writes to the database.
Before you begin
./testRead.sh bolt 12687. You should be able to successfully read from each server. Here is an example of running the script against the core2 server which currently is a FOLLOWER in the cluster:
./testWrite.sh bolt 11687. What you should see is that you can only use the bolt protocol for writing against the LEADER.
./testWrite.sh bolt+routing 12687. With this protocol, all writes are routed to the LEADER and the application can write to the database.
./addPerson.sh bolt+routing 13687 "Willie". This will add a Person node to the database for core3.
./readPerson.sh bolt 12687 "Willie" to confirm that the data was added to core2.
You have now seen how updates to the core servers in a cluster must be coordinated by the server that is currently the LEADER and how reads and writes are performed in a cluster using the bolt and bolt+routing protocols.
You configure read replica servers on host systems where you want the data to be distributed. Read replica servers know about the cluster, but whether they are running or not has no effect on the health of the cluster. In a production environment, you can add many read replicas to the cluster. They will have no impact on the performance of the cluster.
Here are the configuration settings you use for a read replica server:
Just like the configuration for a core server, you must specify the bolt advertised address, as well as the addresses of the servers that are the members of the cluster. However, you can add as many read replica servers as you want, and they will not impact the functioning of the cluster.
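A read replica’s neo4j.conf fragment might look like this sketch, where YYYY is the replica’s own address, XXX1 through XXX3 are the core servers, and the ports are assumed values:

```conf
# Hypothetical neo4j.conf fragment for a read replica
dbms.mode=READ_REPLICA
dbms.connectors.default_listen_address=0.0.0.0
dbms.connector.bolt.advertised_address=YYYY:7687
causal_clustering.initial_discovery_members=XXX1:5000,XXX2:5000,XXX3:5000
```

Note that the discovery members list the core servers, not other read replicas.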
There can be many read replica servers in a cluster. When they start, they register with a core server that maintains a shared whiteboard (cache) that can be used by multiple read replica servers. As part of the startup process, the read replica catches up to the core server. The read replicas do not use the Raft protocol. Instead, they poll the core servers to obtain the updates to the database that they must apply locally.
Here is what you would see if you had a cluster with three core servers and two read replica servers running:
Unlike core servers where applications use bolt+routing to access the database, clients of read replica servers use bolt.
Since the read replica servers are considered “transient”, when they shut down, there is no effect on the operation of the cluster. Of course, a shutdown that is due to a hardware or network failure must be detected so that a new read replica server can be started and clients that depend on read access can continue their work.
In this Exercise, you will see how read replica servers can be used to retrieve changed data from the core servers.
Before you begin
docker start replica1 replica2.
docker exec -it replica2 /var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps "CALL dbms.cluster.overview();". Do you see all three core servers and the two read replica servers?
./addPerson.sh bolt+routing 13687 "Kong". This will add a Person node to the database.
./readPerson.sh bolt 22687 "Kong" to confirm that the data is available.
You have now seen how updates to the core servers in a cluster must be coordinated by the server that is currently the LEADER, how reads and writes are performed against the core servers using the bolt and bolt+routing protocols, and how reads are performed against the read replica servers using the bolt protocol.
The minimum_core_cluster_size_at_runtime property specifies the number of servers that will actively participate in the cluster at runtime. The number of core servers that start and join the cluster is used to calculate what the quorum is for the cluster. For example, if the number of core servers started is three, then quorum is two. If the number of core servers started is four, then quorum is three. If the number of core servers started is five, then quorum is three. Quorum is important in a cluster as it dictates the behavior of the cluster when core servers are added to or removed from the cluster at runtime.
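The quorum values above follow from integer division: a strict majority of n servers is floor(n/2) + 1. As a sketch:

```shell
# Quorum for n core servers is a strict majority: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

quorum 3   # -> 2
quorum 4   # -> 3
quorum 5   # -> 3
```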
As an administrator, you must understand which core servers are participating in the cluster and in particular, what the current quorum is for the cluster.
If a core server shuts down, the cluster can still operate provided the number of core servers is equal to or greater than quorum. For example, if the current number of core servers is three, quorum is two. Provided the cluster has two core servers, it is considered operational for updates. If the cluster maintains quorum, then it is possible to add a different core server to the cluster since a quorum must exist for voting in a new core server.
If the LEADER core server shuts down, then one of the other FOLLOWER core servers assumes the role of LEADER, provided a quorum still exists for the cluster. If a cluster is left with only FOLLOWER core servers, this is because quorum no longer exists and as a result, the database is read-only. As an administrator, you must ensure that your cluster always has a LEADER.
The core servers that are used to start the cluster (membership) are important. In order to add a new core server to the cluster, core servers that originally participated in the cluster must be running.
Follow this video to understand the life-cycle of a cluster and how quorum is used for fault-tolerance for a cluster:
If a core server goes down and you cannot restart it, you have two options:
Option (1) is much easier so a best practice is to always specify additional hosts that could be used as replacement core servers in the membership list for a cluster. This will enable you to add core servers to the cluster without needing to stop the cluster.
In addition to using Cypher to retrieve the overview state of the cluster, there are also REST APIs for accessing information about a particular server. For example, you can query the status of the cluster as follows:
curl -u neo4j:training-helps localhost:11474/db/manage/server/causalclustering/status where this query is made against the core1 server:
Or, if you want to see if a particular server is writable (part of a “healthy” cluster), you can get that information as follows:
curl -u neo4j:training-helps localhost:11474/db/manage/server/causalclustering/writable where this query is made against the core1 server:
Using the REST API enables you as an administrator to script checks against the cluster to ensure that it is running properly and available to the clients.
The Neo4j Operations Manual documents many properties that are related to clusters. Here are a few you may want to consider for your deployment:
TRUE can reduce the number of LEADER switches, especially when a new member is introduced to the cluster.
In this Exercise, you gain some experience monitoring the cluster as servers shut down and as servers are added.
Before you begin
Ensure that you have performed the steps in Exercise 4 and you have a cluster with core1, core2, and core3 started, as well as replica1 and replica2.
docker exec -it core1 /var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps "CALL dbms.cluster.overview();". Make a note of which core server is the LEADER. In this example, core3 is the LEADER.
docker exec -it replica1 /var/lib/neo4j/bin/cypher-shell -u neo4j -p training-helps "CALL dbms.cluster.overview();". Do you see that another core server has assumed the LEADER role?
Many large enterprises deploy large datasets that need to be distributed to many physical locations. To deploy a cluster to more than one physical location, a best practice is to host the core servers in one data center, and host the read replicas in another data center. Neo4j also supports hosting core servers in multiple locations. To host Neo4j servers that are geographically distributed, you need a multi-data center license and you must configure it in your neo4j.conf file by setting the multi_dc_license property to
true. When doing so, there is more configuration that you must do to ensure that clients are routed to the servers that are physically closest to them. You do this by configuring policy groups for your cluster. If policy groups have been configured for the servers, the application driver instances must be created to use the policy groups. With policy groups, writes are always routed to the LEADER, but reads are routed to any FOLLOWER that is available. This enables the cluster and driver to automatically perform load balancing.
Here is a video that shows you why you might want to configure a Causal Cluster to operate in multiple data centers:
Read more about configuring clusters for multi-data center in the Neo4j Operations Manual
Both read and write client applications can create bookmarks within a session that enable them to quickly access a location in the database. The bookmarks can be passed between sessions. See the Neo4j Developer Manual for details about writing code that uses bookmarks.
The database for a cluster is backed up online. You must specify
dbms.backup.enabled=true in the configuration for each core server in the cluster.
The core server can use its transaction port or the backup port for backup operations. You typically use the backup port. Here is the setting that you must add to the configuration:
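Assuming the default backup port of 6362, the fragment might look like this; the property names here follow Neo4j 3.5 conventions and are worth checking against the Operations Manual for your version:

```conf
dbms.backup.enabled=true
dbms.backup.address=0.0.0.0:6362
```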
A best practice is to create and use a read replica server for the backup. In doing so, the read replica server will be in catchup mode with the core servers but can ideally keep up with the committed transactions on the core servers. You can check to see what the last transaction ID is on a core server vs. a read replica by executing the Cypher statement:
CALL dbms.listTransactions() YIELD transactionId; on each server. Each server will have a different last transaction ID, but as transactions are performed against the cluster, you should see these values increasing at the same rate. If you find that the read replica is far behind in catching up, you may want to consider using a core server for the backup. If you use a core server for a backup, it could degrade performance of the cluster. If you want to use a core server for backup, you should increase the number of core servers in the cluster, for example from three to five.
For backing up a cluster, you must first decide which server and port will be used for the backup (backup client). You can backup using either a backup port or a transaction port. In addition, in a real application you will want the backup to be encrypted. For this you must use SSL. Security and encryption is covered later in this training.
You log in to the server from where you will be performing the backup (typically a read replica server) and then you perform the backup with these suggested settings:
neo4j-admin backup --backup-dir=<backup-path> --name=<backup-name> --from=<core-server:backup-port> --protocol=catchup --check-consistency=true
You can add more to the backup command as you can read about in the Neo4j Operations Manual.
In this example, we have logged in to the read replica server and we perform the backup using the address and backup port for the LEADER, core3. We also specify the location of the backup files and that we want the backup to be checked for consistency.
Note that this is not an encrypted backup. You will learn about encryption later in this training when you learn about security.
In this Exercise, you gain some experience backing up a cluster. Because the Docker containers are created without backup configured, in order to back up a cluster, you will need to create a different network that will be used for testing backups. Then you will create the core servers and read replica server to work with backing up the database.
Before you begin
Stop all core and read replica servers.
neo4j-admin, specifying the LEADER port for the backup, using the catchup protocol, and placing the backup in the logs/backups folder, naming the backup backup1.
Suppose you want to set up a cluster that can survive at least two failures and still be considered fault-tolerant. How many LEADERS and FOLLOWERS will this cluster have at a minimum?
Select the correct answer.
What protocol should application clients use to update a database in the cluster?
Select the correct answer.
In a cluster, which servers have their own databases?
Select the correct answers.
You should now be able to: