Neo4j Glossary

The Neo4j glossary contains definitions of terms that are specific to Neo4j and Cypher®.

A

allocator (cluster)

A component in the cluster that allocates databases to servers according to the topology constraints specified and an allocation strategy.

asynchronous replication (cluster), see also: synchronous replication

Asynchronous replication is used by secondary copies to poll for new transactions, which means they cannot be guaranteed to have received the most recent transactions. This enables efficient scale-out of read performance.

Aura instance

A fully managed database, represented by a single DBID, that runs in the Neo4j Aura cloud.

auto-commit transaction

An automatically committed transaction that contains a single query.

B

Bolt protocol

Bolt is a protocol used for interaction between Neo4j instances and drivers.

bookmark, see also: causal consistency

A marker that the client can request from the cluster to ensure that it is able to read its own writes, so that the application’s state is consistent. Only databases that have a copy of the bookmark are permitted to respond.
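The following is a minimal sketch of the idea, not the way Neo4j implements it. It assumes a bookmark can be modeled as a plain transaction number and that each database copy tracks the last transaction it has applied; real bookmarks are opaque values handled by the driver. The names `DatabaseCopy` and `eligible_copies` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatabaseCopy:
    name: str
    last_applied_tx: int  # assumption: highest transaction number this copy has applied


def eligible_copies(copies, bookmark):
    """Return the copies that have caught up to the bookmark and may respond."""
    return [c for c in copies if c.last_applied_tx >= bookmark]


# Made-up cluster state for illustration only.
copies = [
    DatabaseCopy("primary-1", 42),
    DatabaseCopy("secondary-1", 40),  # lagging behind the client's last write
    DatabaseCopy("secondary-2", 42),
]
bookmark = 42  # hypothetical marker returned after the client's last write
names = [c.name for c in eligible_copies(copies, bookmark)]
print(names)
```

In this sketch the lagging copy is filtered out, so a read that presents the bookmark can only be served by a copy that already contains the client's own write.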

C

category (Bloom)

A category is based on a node label and is defined in a Perspective as a way of visually distinguishing nodes with the same label(s).

causal consistency, see also: bookmark

All servers in a cluster agree on the order in which transactions take place. The position of a server on the causal chain can be guaranteed using a bookmark.

cluster, see also: fault tolerance, standalone

A Neo4j DBMS that spans multiple servers working together to increase fault tolerance and/or read scalability. Databases on a cluster may be configured to replicate across servers in the cluster, thus achieving read scalability or high availability.

client application

Software that interacts with a Neo4j server.

commit

A commit is the successful completion of a transaction, which ensures durability of any changes made. For more details, visit Operations Manual → Transaction management.

composite database, see also: fabric

Composite databases are the means to access partitioned graph data with a single Cypher query.

constraint

Constraints are sets of data modeling rules that ensure the data is consistent and reliable.

Core server (Neo4j 4), see also: primary vs secondary

A server in a cluster operating in read/write mode. This is replaced in Neo4j 5 by the database-level configuration of primary and secondary databases.

Cypher®

Neo4j’s graph query language.

D

data model

A data model defines how information is organized in a database. A good data model will make querying and understanding your data easier. In Neo4j, the data models have a graph structure.

database, see also: database vs graph

A database is a container used by the DBMS to manage and store graph data. The physical structure of data is controlled by the database.

database vs graph

Databases are the physical containers of graph data. Graphs are the logical structure of data in Neo4j.

Database Management System

A Database Management System, or DBMS, is software capable of managing multiple databases. A DBMS may run on a single server, or span several servers configured as a cluster.

database schema, see also: data model

The prescribed property existence and datatypes for nodes and relationships.

deallocate (cluster)

An act of removing a database from a server or a server from a cluster without loss of data or reduced fault tolerance.

degree (of a node)

The number of relationships of a specific node; loops are counted twice.
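As a small illustration of the counting rule, the sketch below computes degrees from a list of relationships given as (source, target) pairs. The pair representation and the function name `degrees` are assumptions made for this example.

```python
from collections import Counter


def degrees(relationships):
    """Count relationships per node; a loop (a, a) adds 2 to that node's degree."""
    counts = Counter()
    for source, target in relationships:
        counts[source] += 1
        counts[target] += 1  # for a loop this is the same node, so it is counted twice
    return counts


# Made-up graph: a-b, b-c, and a loop on c.
rels = [("a", "b"), ("b", "c"), ("c", "c")]
deg = degrees(rels)
print(dict(deg))
```

Node c has one relationship to b plus a loop, so its degree is 3.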

disaster recovery (cluster)

A manual intervention to restore availability of a cluster, or databases within a cluster.

driver

A software library that provides access to Neo4j from a particular programming language.

E

election (cluster), see also: Raft protocol

In the event that the Raft leader becomes unresponsive, followers automatically trigger an election and vote for a new leader.

entity

A node or a relationship.

expression (Cypher)

A component of a Cypher query which produces values. It may be used in projections, as a predicate, or when setting properties on graph elements.

F

fabric, see also: composite database

Fabric is the architectural design of a unified system that provides a single access point to local or distributed graph data.

fault tolerance (cluster), see also: primary

A guarantee that a cluster can maintain a database’s persistence and availability in the event of one or more servers failing.

follower (cluster), see also: leader

A primary copy of a database acting as a follower. It receives and acknowledges synchronous writes from the leader.

G

Generative AI (GenAI)

A type of artificial intelligence (AI) system that generates text, images, or other media in response to prompts.

Generative Pre-Trained Transformer (GPT)

A type of GenAI model that combines two forms of training to produce foundation models. Specifically:

  • Pre-Training: Training general purpose capabilities on vast quantities of data.

  • Fine-Tuning: Training on a finite number of supervised ML tasks using a small amount of hand-picked data.

graph, see also: database vs graph

A logical representation of a set of nodes where some pairs are connected by relationships.

I

index

A data structure that improves the read performance of a database.

K

knowledge graph

A specific type of graph that has an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle provides an additional layer of structure that adds context to support knowledge discovery.

L

label

Marks a node as a member of a named and indexed subset. A node may be assigned zero or more labels.

Language Model (LM)

An ML approach that models the probability distribution over a sequence of words. It predicts the probability of the next word or character in a sequence. It has applications in GenAI as well as in embedding, classification, and other ML tasks.
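To make "predicting the probability of the next word" concrete, here is a minimal sketch of a bigram model, one of the simplest language models: it estimates the next word from counts of adjacent word pairs. The tiny corpus and the function names are made up for illustration; this is not how large neural language models are built.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Return the estimated probability of each word that followed `word`."""
    total = sum(counts[word].values())
    if total == 0:
        return {}  # the word was never seen with a successor
    return {w: c / total for w, c in counts[word].items()}


tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram(tokens)
probs = next_word_probs(model, "the")
print(probs)
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the model assigns them probabilities of 2/3 and 1/3.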

Large Language Model (LLM)

Language models consisting of large neural networks (billions of parameters), trained on large quantities of data, often using self-supervised or semi-supervised approaches. They are trained for general tasks and are currently seen as the "GenAI for language/text".

LLM Hallucination

The tendency of language models to generate text that is incorrect, nonsensical, or unreal. Models that hallucinate:

  • Appear to answer questions confidently even if they do not have the facts.

  • May provide contradictory or inconsistent responses to similar prompts.

leader (cluster), see also: follower

A single primary copy of a database is designated as the leader. It receives all write transactions from clients and replicates writes synchronously to followers and asynchronously to secondary copies of the database.

M

motif, see also: path and pattern

A description of a specific pattern within a graph.

N

node, see also: relationship

A node represents an entity or discrete object in your graph data model. Nodes can be connected by relationships, hold data in properties, and are classified by labels.

O

operator

A symbol representing a mathematical or logical operation.

P

parameter

A named value provided when running a Cypher statement.

path

A sequence of nodes and the relationships connecting them that does not contain duplicate relationships. Several paths can match a pattern.
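The definition can be checked mechanically. The sketch below assumes a candidate path is given as a list of (relationship id, start node, end node) tuples in traversal order; that representation and the name `is_path` are hypothetical. Because the definition restricts only relationships, the sketch allows nodes to repeat.

```python
def is_path(steps):
    """Check that steps connect end to end and that no relationship repeats.

    steps: list of (rel_id, start_node, end_node) tuples in traversal order.
    """
    seen = set()
    for i, (rel_id, start, end) in enumerate(steps):
        if rel_id in seen:
            return False  # duplicate relationship
        seen.add(rel_id)
        if i > 0 and steps[i - 1][2] != start:
            return False  # this step does not begin where the previous one ended
    return True


# A cycle a -> b -> c -> a revisits node a but never reuses a relationship.
cycle = [(1, "a", "b"), (2, "b", "c"), (3, "c", "a")]
# Going back and forth reuses relationship 1, so it is not a path.
repeat = [(1, "a", "b"), (2, "b", "a"), (1, "a", "b")]
print(is_path(cycle), is_path(repeat))
```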

pattern, see also: motif

A specific arrangement of nodes and relationships that can be matched in a graph. A pattern follows a motif.

primary (cluster), see also: secondary

A copy of the database that is able to process write transactions and is eligible to be elected as a leader. It participates in fault-tolerant writes as it is part of the majority required to acknowledge and commit write transactions.

primary vs secondary (cluster)

In a cluster, databases can operate in either primary or secondary mode. Primary databases are able to process write and read transactions, ensuring fault tolerance. Secondary databases are replicated asynchronously from primaries, and their main purpose is to provide read scaling within the cluster.

property

Properties are key-value pairs that are used for storing data on nodes and relationships.

Q

query (Cypher)

A statement that retrieves or writes information to a database.

R

Raft group

A group of servers that are participating in hosting a particular database in primary mode.

Raft group member

A server that is participating in a Raft group. A server can be a member of one or more groups.

Raft log

A shared log between all Raft group members that is guaranteed to be consistently updated and viewed by those members. The log contains both database data and operational state of the Raft group.

Raft protocol

The networking mechanism that enables a database to replicate its data across multiple servers to give high availability for accessing the data and high durability to the data stored.

Read Replica (Neo4j 4), see also: Core server

A server in a cluster operating in read-only mode. This is replaced in Neo4j 5 by the database-level configuration of primary and secondary databases.

read scaling, see also: primary vs secondary

Distributing query load by creating additional database copies hosted in secondary mode (read-only).

relationship, see also: node

A relationship represents a connection between nodes in your graph data model. Relationships connect a source node to a target node, hold data in properties, and are classified by type.
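The node and relationship definitions in this glossary map naturally onto a small data structure. The sketch below is only an illustration of that shape, with hypothetical class names and made-up data; it does not reflect how Neo4j stores graphs internally.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    labels: set[str] = field(default_factory=set)  # zero or more labels
    properties: dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    type: str  # a relationship is classified by its type
    source: Node
    target: Node
    properties: dict[str, Any] = field(default_factory=dict)


# Made-up example data.
alice = Node({"Person"}, {"name": "Alice"})
acme = Node({"Company"}, {"name": "Acme"})
works_at = Relationship("WORKS_AT", alice, acme, {"since": 2020})
print(works_at.type, works_at.source.properties["name"])
```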

S

secondary (cluster), see also: primary

An asynchronously replicated copy of the database that provides read scaling within the cluster.

seed (cluster)

A seed is a database dump or a full backup used to create a database on a cluster. Creating a database this way is sometimes called seeding.

server

A physical machine, a virtual machine, or a container running an instance of Neo4j. Servers can be standalone or part of a cluster.

standalone, see also: server

A single server running Neo4j and not part of a cluster.

system database

A database used by Neo4j to store system information.

synchronous replication (cluster), see also: asynchronous replication

Synchronous replication requires the leader primary to replicate a transaction and block the commit until a quorum of the follower primaries acknowledges that the transaction is successfully replicated. Once the transaction is replicated, the commit is allowed to proceed. This ensures data durability and consistency within the cluster.
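The commit rule can be sketched as simple arithmetic. The glossary does not spell out the numbers, so the sketch below assumes the usual Raft convention: a quorum is a strict majority of the primary copies, and the leader's own copy counts toward it. All function names are hypothetical.

```python
def majority(primaries: int) -> int:
    """Smallest number of primary copies that forms a strict majority."""
    return primaries // 2 + 1


def can_commit(primaries: int, follower_acks: int) -> bool:
    """Assumption: the leader counts itself, then adds follower acknowledgements."""
    return 1 + follower_acks >= majority(primaries)


def tolerated_failures(primaries: int) -> int:
    """Number of primaries that can fail while a majority still remains."""
    return (primaries - 1) // 2


for n in (3, 5):
    print(n, majority(n), tolerated_failures(n))
```

Under these assumptions, a database with three primaries can commit once one follower has acknowledged the transaction and keeps accepting writes if one primary fails; with five primaries, two acknowledgements are needed and two failures are tolerated.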

T

tenant (Aura)

An isolated environment that contains its own database instances, configurations, and resources.

topology (cluster)

A configuration that describes how the copies of a database should be spread across the servers in a cluster. See primary and secondary.

transaction

A transaction comprises a unit of work performed against a database. It is treated in a coherent and reliable way, independent of other transactions. Transactions comply with the ACID consistency model (atomic, consistent, isolated, and durable).
