Appendix B. Tutorial

This chapter contains tutorials for deploying and operating Neo4j.

B.1. Set up a Neo4j cluster

This guide will give step-by-step instructions for setting up a basic cluster of three separate machines. For a description of the clustering architecture and related design considerations, refer to Introduction.

B.1.1. Important configuration settings

Each instance in a Neo4j HA cluster must be assigned an integer ID, which serves as its unique identifier. At startup, a Neo4j instance contacts the other instances specified in the ha.initial_hosts configuration option.

When an instance establishes a connection to any other, it determines the current state of the cluster and ensures that it is eligible to join. To be eligible the Neo4j instance must host the same database store as other members of the cluster (although it is allowed to be in an older state), or be a new deployment without a database store.

Note that IP addresses or hostnames should be explicitly configured for the machines participating in the cluster. In the absence of explicit configuration, Neo4j will attempt to configure an IP address for itself.

B.1.1.1. dbms.mode

dbms.mode configures the operating mode of the database.

For cluster mode it is set to: dbms.mode=HA

B.1.1.2. ha.server_id

ha.server_id is the cluster identifier for each instance. It must be a positive integer and must be unique among all Neo4j instances in the cluster.

For example, ha.server_id=1.

B.1.1.3. ha.host.coordination

ha.host.coordination is an address/port setting that specifies where the Neo4j instance will listen for cluster communications (such as heartbeat messages). The default port is 5001. In the absence of a specified IP address, Neo4j will attempt to find a valid interface for binding. While this behavior typically results in a well-behaved server, it is strongly recommended that users explicitly specify an IP address bound to the intended network interface to ensure a coherent cluster deployment.

For example, ha.host.coordination=192.168.33.22:5001 will listen for cluster communications on the network interface bound to the 192.168.33.0 subnet on port 5001.

B.1.1.4. ha.initial_hosts

ha.initial_hosts is a comma-separated list of address/port pairs that specify how to reach other Neo4j instances in the cluster (as configured via their ha.host.coordination option). These address/port pairs are used when the Neo4j instances start, to allow them to find and join the cluster. Specifying an instance’s own address is permitted. Do not use any whitespace in this configuration option.

For example, ha.initial_hosts=192.168.33.22:5001,192.168.33.21:5001 will attempt to reach Neo4j instances listening on 192.168.33.22 on port 5001 and 192.168.33.21 on port 5001 on the 192.168.33.0 subnet.
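
As an illustration of the format rules above, the following Python sketch splits an ha.initial_hosts value into address/port pairs and rejects any whitespace. The function name parse_initial_hosts is hypothetical, and the sketch assumes that every entry carries both an address and a port; it is not a description of how Neo4j itself parses the setting.

```python
def parse_initial_hosts(value):
    """Split an ha.initial_hosts value into (address, port) tuples."""
    # The setting must not contain whitespace anywhere.
    if any(ch.isspace() for ch in value):
        raise ValueError("ha.initial_hosts must not contain whitespace")
    members = []
    for entry in value.split(","):
        # Split on the last colon so the port is always the final part.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <address>:<port>, got {entry!r}")
        members.append((host, int(port)))
    return members


hosts = parse_initial_hosts("192.168.33.22:5001,192.168.33.21:5001")
print(hosts)  # [('192.168.33.22', 5001), ('192.168.33.21', 5001)]
```

A check like this can be run against a configuration value before the instances are started, so that a stray space is caught early.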

B.1.1.5. ha.host.data

ha.host.data is an address/port setting that specifies where the Neo4j instance will listen for transactions from the cluster master. The default port is 6001. In the absence of a specified IP address, Neo4j will attempt to find a valid interface for binding. While this behavior typically results in a well-behaved server, it is strongly recommended that users explicitly choose an IP address bound to the network interface of their choosing to ensure a coherent cluster topology.

ha.host.data must use a different port to ha.host.coordination.

For example, ha.host.data=192.168.33.22:6001 will listen for transactions from the cluster master on the network interface bound to the 192.168.33.0 subnet on port 6001.

The ha.host.coordination and ha.host.data configuration options are specified as <IP address>:<port>.

For ha.host.data the IP address must be the address assigned to one of the host’s network interfaces.

For ha.host.coordination the IP address must be the address assigned to one of the host’s network interfaces, or the value 0.0.0.0, which will cause Neo4j to listen on every network interface.

Either the address or the port can be omitted, in which case the default for that part will be used. If the address is omitted, then the port must be preceded by a colon (e.g. :5001).
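
The defaulting rule can be sketched in Python as follows. The helper parse_host_setting is hypothetical; the default ports are the ones stated in this section, and an omitted address is represented as None because Neo4j then picks an interface itself. Port ranges and IPv6 addresses are not handled in this sketch.

```python
# Default ports as stated in this section; the address has no fixed default.
DEFAULT_PORTS = {"ha.host.coordination": 5001, "ha.host.data": 6001}


def parse_host_setting(name, value):
    """Return (address, port); address is None when it was omitted."""
    address, _, port = value.partition(":")
    return (address or None, int(port) if port else DEFAULT_PORTS[name])


print(parse_host_setting("ha.host.coordination", "192.168.33.22:5001"))  # ('192.168.33.22', 5001)
print(parse_host_setting("ha.host.coordination", ":5001"))               # (None, 5001)
print(parse_host_setting("ha.host.data", "192.168.33.22"))               # ('192.168.33.22', 6001)
```

Comparing the port of the parsed ha.host.coordination value with the port of the parsed ha.host.data value is a simple way to confirm that the two settings do not share a port.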

A port range can be specified with the syntax <hostname>:<first port>[-<second port>]. In this case, Neo4j will test each port in sequence and select the first that is unused. Note that a port range is not permitted when the hostname is specified as 0.0.0.0 (the "all interfaces" address).
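
The "test each port in sequence" behavior can be illustrated with the Python sketch below. The names port_is_free and first_unused_port are hypothetical, and testing a port by trying to bind a socket to it is an assumption made for this example; the exact probing mechanism used by Neo4j is not described here.

```python
import socket


def port_is_free(address, port):
    """Best-effort check: try to bind a TCP socket to the address and port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            return False
        return True


def first_unused_port(address, spec, is_free=port_is_free):
    """Pick the first free port from '<first port>[-<second port>]'."""
    first, _, last = spec.partition("-")
    low = int(first)
    high = int(last) if last else low
    if high < low:
        raise ValueError("second port must not be lower than first port")
    # A range is not permitted together with the "all interfaces" address.
    if last and address == "0.0.0.0":
        raise ValueError("a port range cannot be combined with 0.0.0.0")
    for port in range(low, high + 1):
        if is_free(address, port):
            return port
    raise RuntimeError(f"no unused port in {spec}")


# Stub predicate for illustration: pretend ports 5001 and 5002 are taken.
taken = {5001, 5002}
print(first_unused_port("127.0.0.1", "5001-5005", lambda addr, port: port not in taken))  # 5003
```

Passing the predicate as a parameter keeps the selection logic separate from the socket check, which makes the sketch easy to try without occupying real ports.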

B.1.2. Download and configure

  • Download Neo4j Enterprise Edition from the Neo4j download site, and unpack on three separate machines.
  • Configure the HA-related settings for each installation as outlined below. Note that all three installations have the same configuration except for the ha.server_id property.

Neo4j instance #1 — neo4j-01.local

conf/neo4j.conf:

# Unique server id for this Neo4j instance
# The id cannot be negative and must be unique
ha.server_id = 1

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474

Neo4j instance #2 — neo4j-02.local

conf/neo4j.conf:

# Unique server id for this Neo4j instance
# The id cannot be negative and must be unique
ha.server_id = 2

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474

Neo4j instance #3 — neo4j-03.local

conf/neo4j.conf:

# Unique server id for this Neo4j instance
# The id cannot be negative and must be unique
ha.server_id = 3

# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001

# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA

dbms.connector.http.type=HTTP
dbms.connector.http.enabled=true
dbms.connector.http.address=0.0.0.0:7474
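
Since the three files are meant to be identical apart from ha.server_id, a quick comparison can catch copy-and-paste mistakes. The Python sketch below uses hypothetical helpers (parse_conf, differing_keys) and a shortened template instead of the full files above; reading the real conf/neo4j.conf files from each machine is left out.

```python
def parse_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def differing_keys(configs):
    """Return the keys whose values are not the same in every config."""
    keys = set().union(*configs)
    return {k for k in keys if len({c.get(k) for c in configs}) > 1}


# Shortened stand-in for the three files shown above.
TEMPLATE = """\
# Unique server id for this Neo4j instance
ha.server_id = {server_id}
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
dbms.mode=HA
"""

configs = [parse_conf(TEMPLATE.format(server_id=i)) for i in (1, 2, 3)]
print(differing_keys(configs))  # {'ha.server_id'}

# Every instance must have its own id.
ids = [c["ha.server_id"] for c in configs]
assert len(ids) == len(set(ids))
```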

B.1.3. Start the Neo4j Servers

Start the Neo4j servers as usual. Note that the startup order does not matter.

neo4j-01$ ./bin/neo4j start
neo4j-02$ ./bin/neo4j start
neo4j-03$ ./bin/neo4j start

When running in HA mode, the startup script returns immediately instead of waiting for the server to become available. This is because the instance does not accept any requests until a cluster has been formed. In the example above this happens when you start the second instance. To keep track of the startup state you can follow the messages in neo4j.log — the path is printed before the startup script returns.
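
Because the startup script returns before the instance accepts requests, a script that depends on the cluster may need to wait. The Python sketch below polls until a probe succeeds; wait_until and http_responds are hypothetical helpers, and treating any HTTP response on the configured HTTP port as "available" is an assumption of this example rather than a documented readiness check.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_responds(url, timeout=2.0):
    """Return True if the server at url answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the configuration above, a call such as wait_until(lambda: http_responds("http://neo4j-01.local:7474/")) would return True once the first instance answers on its HTTP port, or False after the attempts are used up.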

You should now be able to access the three servers and check their HA status. Open the HTTP address of each server (port 7474 on each machine in this example) in a web browser and, after setting a password for the database, issue the following command in the editor: :play sysinfo

You can replace database #3 with an 'arbiter' instance, see Section 2.4.2, “Arbiter instances”.

That’s it! You now have a Neo4j HA cluster of three instances running. You can start by making a change on any instance; the change will be propagated to the other instances. For more HA-related configuration options, take a look at Section 2.4.1, “Setup and configuration”.

B.2. Set up a local cluster

If you want to start a cluster similar to the one described above, but for development and testing purposes, it is convenient to run all Neo4j instances on the same machine. This is easy to achieve, although it requires some additional configuration because the default settings of the instances will conflict with each other. Furthermore, the default dbms.memory.pagecache.size assumes that Neo4j has the machine to itself. If, in this example, the machine has 4 gigabytes of memory and each JVM consumes 500 megabytes, then 500 megabytes can be allocated to the page cache of each server.
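
The memory budget in this example can be checked with a few lines of Python. The figures are the ones given above; counting 1 gigabyte as 1024 megabytes and treating the JVM memory and the page cache as separate allocations are assumptions of this sketch.

```python
machine_mb = 4 * 1024   # 4 gigabytes, taking 1 GB = 1024 MB (assumption)
instances = 3           # three Neo4j instances on one machine
jvm_mb = 500            # memory consumed by each JVM
page_cache_mb = 500     # dbms.memory.pagecache.size per instance

neo4j_total_mb = instances * (jvm_mb + page_cache_mb)
remaining_mb = machine_mb - neo4j_total_mb

print(neo4j_total_mb)  # 3000
print(remaining_mb)    # 1096
```

Under these assumptions the three instances use 3000 megabytes in total, which leaves roughly 1 gigabyte for the operating system and other processes.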

B.2.1. Download and configure

  1. Download Neo4j Enterprise Edition from the Neo4j download site, and unpack into three separate directories on your test machine.
  2. Configure the HA-related settings for each installation as outlined below.

    Neo4j instance #1 — ~/neo4j-01

    conf/neo4j.conf:

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6366
    
    # Unique server id for this Neo4j instance
    # The id cannot be negative and must be unique
    ha.server_id = 1
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5001
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6363
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7474

    Neo4j instance #2 — ~/neo4j-02

    conf/neo4j.conf:

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6367
    
    # Unique server id for this Neo4j instance
    # The id cannot be negative and must be unique
    ha.server_id = 2
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5002
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6364
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7475

    Neo4j instance #3 — ~/neo4j-03

    conf/neo4j.conf:

    # Reduce the default page cache memory allocation
    dbms.memory.pagecache.size=500m
    
    # Port to listen to for incoming backup requests.
    dbms.backup.address = 127.0.0.1:6368
    
    # Unique server id for this Neo4j instance
    # The id cannot be negative and must be unique
    ha.server_id = 3
    
    # List of other known instances in this cluster
    ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating cluster information
    # with the other neo4j instances in the cluster.
    ha.host.coordination = 127.0.0.1:5003
    
    # IP and port for this instance to bind to for communicating data with the
    # other neo4j instances in the cluster.
    ha.host.data = 127.0.0.1:6365
    
    # HA - High Availability
    # SINGLE - Single mode, default.
    dbms.mode=HA
    
    dbms.connector.http.type=HTTP
    dbms.connector.http.enabled=true
    dbms.connector.http.address=0.0.0.0:7476
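
The three local configurations above differ only in ha.server_id and in the port numbers, each of which increases by one per instance. The Python sketch below reproduces that pattern and checks that no port is used twice; local_instance_settings and ports_in_use are hypothetical helpers, and the pattern is read off this example rather than prescribed by Neo4j.

```python
def local_instance_settings(server_id):
    """Per-instance settings following the port pattern of this example."""
    offset = server_id - 1
    return {
        "dbms.backup.address": f"127.0.0.1:{6366 + offset}",
        "ha.server_id": str(server_id),
        "ha.host.coordination": f"127.0.0.1:{5001 + offset}",
        "ha.host.data": f"127.0.0.1:{6363 + offset}",
        "dbms.connector.http.address": f"0.0.0.0:{7474 + offset}",
    }


def ports_in_use(instances):
    """Collect every port number that appears in the address settings."""
    ports = []
    for settings in instances:
        for key, value in settings.items():
            if key != "ha.server_id":
                ports.append(int(value.rsplit(":", 1)[1]))
    return ports


instances = [local_instance_settings(i) for i in (1, 2, 3)]
ports = ports_in_use(instances)

# On a single machine every listening port must be unique.
assert len(ports) == len(set(ports))

for key, value in instances[1].items():
    print(f"{key}={value}")
```

Note that this particular numbering only works for three instances: a fourth instance would get ha.host.data port 6366, which is already used as the backup port of instance #1.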

B.2.1.1. Start the Neo4j Servers

Start the Neo4j servers as usual. Note that the startup order does not matter.

localhost:~/neo4j-01$ ./bin/neo4j start
localhost:~/neo4j-02$ ./bin/neo4j start
localhost:~/neo4j-03$ ./bin/neo4j start

You should now be able to access the three servers and check their HA status. Open the HTTP address of each server (ports 7474, 7475, and 7476 on the local machine in this example) in a web browser and, after setting a password for the database, issue the following command in the editor: :play sysinfo

License

Creative Commons 3.0

You are free to:

  • Share: copy and redistribute the material in any medium or format
  • Adapt: remix, transform, and build upon the material

for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices. You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.

No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

See http://creativecommons.org/licenses/by-sa/3.0/ for further details. The full license text is available at http://creativecommons.org/licenses/by-sa/3.0/legalcode.