Cluster server discovery

In order to form or connect to a running cluster, any new member needs to know the addresses of at least some of the other servers in the cluster. This information is used to bind to the servers in order to run the discovery protocol and get the full information about the cluster. The best way to do this depends on the configuration in each specific case.

If the addresses of the other cluster members are known upfront, they can be listed explicitly. This is convenient, but has limitations:

  • If servers are replaced and the new members have different addresses, the list becomes outdated. An outdated list can be avoided by ensuring that the new members can be reached via the same address as the old members, but this is not always practical.

  • Under some circumstances the addresses are unknown when configuring the cluster. This can be the case, for example, when using container orchestration to deploy a cluster.

Additional mechanisms for using DNS are provided for the cases where it is not practical or possible to explicitly list the addresses of cluster members to discover.

The discovery configuration is just used for initial discovery and a running cluster continuously exchanges information about changes to the topology. The behavior of the initial discovery is determined by the parameters dbms.cluster.discovery.resolver_type and dbms.cluster.discovery.endpoints, and is described in the following sections.

Discovery using a list of server addresses

If the addresses of the other cluster members are known upfront, they can be listed explicitly. Use the default dbms.cluster.discovery.resolver_type=LIST and hard code the addresses in the configuration of each machine. This alternative is illustrated by Configure a cluster with three servers example.

Discovery using DNS with multiple records

When using initial discovery with DNS, a DNS record lookup is performed when a server starts up. Once a server has joined a cluster, further membership changes are communicated amongst the servers in the cluster as part of the discovery service.

The following DNS-based mechanisms can be used to get the addresses of other servers in the cluster for discovery:

dbms.cluster.discovery.resolver_type=DNS

With this configuration, the initial discovery members are resolved from DNS A records to find the IP addresses to contact. The value of dbms.cluster.discovery.endpoints should be set to a single domain name and the port of the discovery service. For example: dbms.cluster.discovery.endpoints=cluster01.example.com:5000. The domain name should return an A record for every server in the cluster when a DNS lookup is performed. Each A record returned by DNS should contain the IP address of the server in the cluster. The configured server uses all the IP addresses from the A records to join or form a cluster.

The discovery port must be the same on all servers when using this configuration. If this is not possible, consider using the discovery type SRV instead.

dbms.cluster.discovery.resolver_type=SRV

With this configuration, the initial discovery members are resolved from DNS SRV records to find the IP addresses/hostnames and discovery service ports to contact. The value of dbms.cluster.discovery.endpoints should be set to a single domain name and the port set to 0. For example: dbms.cluster.discovery.endpoints=cluster01.example.com:0. The domain name should return a single SRV record when a DNS lookup is performed. The SRV record returned by DNS should contain the IP address or hostname, and the discovery port for the servers to be discovered. The configured server uses all the addresses from the SRV record to join or form a cluster.

Discovery in Kubernetes

A special case is when a cluster is running in Kubernetes and each server is running as a Kubernetes service. Then, the addresses of the other servers can be obtained using the List Service API, as described in the Kubernetes API documentation.

The following settings are used to configure for this scenario:

With this configuration, dbms.cluster.discovery.endpoints is not used and any value assigned to it is ignored.

  • The pod running Neo4j must use a service account that has permission to list services. For further information, see the Kubernetes documentation on RBAC authorization or ABAC authorization.

  • The configured server.discovery.advertised_address must exactly match the Kubernetes-internal DNS name, which is of the form <service-name>.<namespace>.svc.cluster.local.

As with DNS-based methods, the Kubernetes record lookup is only performed at startup.