Clustering glossary

Term Description

Allocator

A component in the cluster that allocates databases to servers according to the topology constraints specified and an allocation strategy.

Asynchronous replication

Enables efficient scale-out of secondary database copies but offers no guarantees under fault conditions. The data present in the secondary copy is not guaranteed to be up-to-date with a majority of the database’s primary copies.

Availability

The ability to access data in a database. A database can be available for read-write, read-only, or altogether unavailable. A clustered database is fault-tolerant, i.e. it can maintain both read and write availability if some primaries fail (see Fault tolerance for more information). If the number of failed primaries exceeds the fault tolerance limit, the database becomes read-only. Should all copies fail, the database becomes unavailable.

Bookmark

A marker the client can request from the cluster to ensure that it is able to read its own writes so that the application’s state is consistent and only databases that have a copy of the bookmark are permitted to respond.

Causal consistency

When a client (driver) creates a session and executes a query, the responding server issues the client a bookmark. This reflects the state of the database copy on that server at the time the query was executed. The bookmark is passed along and updated by all subsequent queries in the session, regardless of which server executes what query. A bookmark can only be updated monotonically increasing. If a server is behind the state in the bookmark, it waits until it has caught up, or time out the query. Thus, clients executing queries within a session are guaranteed to read their own writes, and only see successively later states of the database. This is sometimes also referred to as session consistency.

Cluster

A collection of servers running Neo4j that are configured to communicate with each other. These may be used to host databases and the databases may be configured to replicate across servers in the cluster thus achieving read scalability or high availability. A minimum of three servers is required for the cluster to be fault-tolerant.

Database

The data store for the nodes, relationships, and properties that make up the graph. Multiple databases can be hosted on a Database Management Server (DBMS).

Database Management System (DBMS)

The Neo4j services and system database running on an instance of a single server or cluster to provide one or more databases.

Deallocate

An act of safely removing (i.e. without loss of data or reduced fault tolerance) a database from a server, or removing a server from a cluster.

Disaster recovery

A manual intervention to restore availability of a cluster, or databases within a cluster.

Election

In the event that a leader becomes unresponsive, followers automatically trigger an election and vote for a new leader. A majority is required for the vote to be successful.

Fault tolerance

A guarantee that a database can maintain persistence and availability in the event of one or more failures. The number of failures f that can be tolerated is dependent on the number of primaries n for the database and follows the formula f = (n-1)/2. In the event that more than f primaries fail, the database can no longer process write transactions and becomes read-only.

Follower

A primary copy of a database acting as a follower, receives and acknowledges synchronous writes from the leader.

Leader

A single primary copy of a database is designated as the leader. It receives all write transactions from clients and replicates writes synchronously to followers and asynchronously to secondary copies of the database. Each database can have a different leader within the cluster.

Primary

A copy of the database that is able to process write transactions and is eligible to be elected as a leader. It participates in fault tolerant writes as it is part of the majority required to acknowledge and commit write transactions.

Read scaling

Adding secondary copies of the database to the cluster can offload read queries from the primary databases and thus reduce the load and aid write performance of the cluster.

Secondary

An asynchronously replicated copy of the database that provides read scaling within the cluster. It is also suitable for running graph analytic workloads in a cluster using Graph Data Science and taking backups without incurring load on the primary.

Seed

A file used to create a copy of a database on a single instance or on a member of a cluster. This can be a database dump or a database backup. Seed can also be used as a verb to describe the act seeding a cluster from a backup.

Server

A physical machine, a virtual machine, or a container running Neo4j DBMS. The server can be standalone or part of a cluster.

Session consistency

An alternative name for Neo4j’s causal consistency.

Standalone server

A single server, or container, running Neo4j DBMS and not part of a cluster.

Synchronous replication

When attempting to commit a transaction, the leader primary replicates the transaction and block, requiring the follower primaries to acknowledge the replication before allowing the commit to proceed. This blocking replication is known as synchronous, and ensures data durability and consistency within the cluster. See also asynchronous replication.

Topology

A configuration that describes how the copies of a database should be spread across the servers in a cluster.