This chapter provides instructions for installation and basic usage of the Neo4j Graph Data Science library.

The Neo4j Graph Data Science (GDS) library is delivered as a plugin to the Neo4j Graph Database. The plugin needs to be installed into the database and added to the allowlist in the Neo4j configuration. There are two main ways of achieving this, which we will detail in this chapter.

1. Supported Neo4j versions

Below is the compatibility matrix for the GDS library and Neo4j. In general, the latest version of GDS supports the latest version of Neo4j and vice versa, and we recommend that you always upgrade to that combination.

We list software with major and minor version only, e.g. GDS library 1.5. Read that as any patch version of that major and minor version, but always upgrade to the latest patch version to ensure you get all included bug fixes.

Not finding your version of GDS or Neo4j listed? Time to upgrade!

Neo4j Graph Data Science    Neo4j version

4.1 [1]

[1] There is a bug in Neo4j 4.1.1 that can lead to an exception when using Cypher projection. If possible, use the latest patch version.

2. Neo4j Desktop

The most convenient way of installing the GDS library is through the Neo4j Desktop plugin called Neo4j Graph Data Science. The plugin can be found in the 'Plugins' tab of a database.

The installer will download the GDS library and install it in the 'plugins' directory of the database. It will also add the following entry to the settings file:

This configuration entry is necessary because the GDS library accesses low-level components of Neo4j to maximise performance.

If the procedure allowlist is configured, make sure to also include procedures from the GDS library:
Before Neo4j 4.2, the configuration setting is called

3. Neo4j Server

The GDS library is intended to be used on a standalone Neo4j server.

Running the GDS library in a Neo4j Causal Cluster is not supported. Read more about how to use GDS in conjunction with Neo4j Causal Cluster deployment below.

On a standalone Neo4j Server, the library will need to be installed and configured manually.

  1. Download neo4j-graph-data-science-[version].jar from the Neo4j Download Center and copy it into the $NEO4J_HOME/plugins directory.

  2. Add the following to your $NEO4J_HOME/conf/neo4j.conf file:

    This configuration entry is necessary because the GDS library accesses low-level components of Neo4j to maximise performance.

  3. Check if the procedure allowlist is enabled in the $NEO4J_HOME/conf/neo4j.conf file and add the GDS library if necessary:
    Before Neo4j 4.2, the configuration setting is called
  4. Restart Neo4j

3.1. Verifying installation

To verify your installation, open the browser in Neo4j Desktop and call the gds.version() function, which returns the library version:

RETURN gds.version()

To list all installed algorithms, run the gds.list() procedure:

CALL gds.list()

4. Enterprise Edition Configuration

Unlocking the Enterprise Edition of the Neo4j Graph Data Science library requires a valid license key. To register for a license, please contact Neo4j at

The license is issued in the form of a license key file, which needs to be placed in a directory accessible by the Neo4j server. You can configure the location of the license key file by setting the gds.enterprise.license_file option in the neo4j.conf configuration file of your Neo4j installation. The location must be specified using an absolute path. It is necessary to restart the database when configuring the license key for the first time and every time the license key is changed, e.g., when a new license key is added or the location of the key file changes.

Example configuration for the license key file:


If the gds.enterprise.license_file setting is set to a non-empty value, the Neo4j Graph Data Science library will verify that the license key file is accessible and contains a valid license key. When a valid license key is configured, all Enterprise Edition features are unlocked. In case of a problem, e.g., when the license key file is inaccessible, or the license has expired or is invalid for any other reason, all calls to the Neo4j Graph Data Science library will result in an error stating the problem with the license key.
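
The absolute-path requirement can be checked before restarting the database. The following Python sketch is not part of the GDS library; the helper names read_setting and license_path_problem are hypothetical. It assumes POSIX-style paths, treats neo4j.conf as simple key=value lines with # comments, and assumes that a later duplicate of a key overrides an earlier one. It only inspects the path string and does not validate the license itself.

```python
from pathlib import PurePosixPath


def read_setting(conf_text, key):
    """Return the value of `key` from neo4j.conf-style text, or None if unset."""
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            # Assumption: a later duplicate overrides an earlier one.
            value = rest.strip()
    return value


def license_path_problem(conf_text):
    """Return a description of the problem, or None if the path looks well-formed."""
    path = read_setting(conf_text, "gds.enterprise.license_file")
    if not path:
        return "gds.enterprise.license_file is not set"
    # Assumption: POSIX-style paths, so "absolute" means starting with "/".
    if not PurePosixPath(path).is_absolute():
        return "license file path must be absolute: " + path
    return None
```

For instance, passing the text of a configuration file that contains the line gds.enterprise.license_file=conf/gds.license (a made-up relative path) yields a message that the path must be absolute.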

5. Neo4j Docker

The Neo4j Graph Data Science library is available as a plugin for Neo4j on Docker. The plugins guide for Docker can be found in the operations manual.

To run a Neo4j container with GDS available, run:

docker run -it --rm \
  --publish=7474:7474 --publish=7687:7687 \
  --user="$(id -u):$(id -g)" \
  -e NEO4J_AUTH=none \
  --env NEO4JLABS_PLUGINS='["graph-data-science"]' \

6. Neo4j Causal Cluster

A Neo4j Causal Cluster consists of multiple machines that together support a highly available database management system. The GDS library uses main memory on a single machine for hosting graphs in the graph catalog and computing algorithms over these. These two architectures are not compatible and should not be used in conjunction. A GDS workload will attempt to consume most of the system resources of the machine during runtime, which may make the machine unresponsive for extended periods of time. For these reasons, we strongly advise against running GDS in a cluster as this potentially leads to data corruption or cluster outage.

To make use of GDS on graphs hosted by a Neo4j Causal Cluster deployment, these graphs should be detached from the running cluster. This can be accomplished in several ways, including:

  1. Dumping a snapshot of the Neo4j store and importing it in a separate standalone Neo4j server.

  2. Adding a Read Replica to the Neo4j Causal Cluster and then detaching it to safely operate GDS on a snapshot in separation from the Neo4j Causal Cluster.

  3. Adding a Read Replica to the Neo4j Causal Cluster and configuring it for GDS workloads. Be aware that the in-memory graph and the underlying database will eventually become out of sync due to updates to the Read Replica. Since GDS can consume all available resources, responsiveness of the Read Replica might decrease and its state might fall behind the cluster. Using GDS in this scenario requires:

    • installing GDS on the Read Replica

    • using mutate or stream invocation modes

    • consuming results from GDS workloads directly via Cypher (see Utility functions)

    • not using GDS write-back features (writing triggers many large transactions and will potentially terminate the cluster)

After the GDS workload has finished on a detached machine (cases 1 and 2), that machine contains results written to its own copy of the graph, which are out of sync with the Neo4j Causal Cluster. Integrating these results back into the cluster requires custom programs.

7. Additional configuration options

In order to make use of certain features of the GDS library, additional configuration is necessary. Configuration is done in the neo4j.conf configuration file before starting the DBMS. The following features require such additional configuration:

7.1. Graph export

Exporting graphs to CSV files requires the configuration parameter gds.export.location to be set to the absolute path to the folder in which exported graphs will be stored. This directory has to be writable by the Neo4j process.

7.2. Model persistence

The model persistence feature requires the configuration parameter gds.model.store_location to be set to the absolute path to the folder in which the models will be stored. This directory has to be writable by the Neo4j process.
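
Both gds.export.location and gds.model.store_location share the same two requirements: an absolute path and a writable directory. As a rough pre-flight check, a sketch along the following lines could be run before starting the DBMS. The function name directory_problems is hypothetical, the sketch assumes the directory must already exist, and it tests writability for the user running the script, which is only meaningful if that is the same user as the Neo4j process.

```python
import os


def directory_problems(path):
    """List reasons why `path` is unsuitable as an export or model store folder."""
    problems = []
    if not os.path.isabs(path):
        problems.append("not an absolute path")
    if not os.path.isdir(path):
        # Assumption: the folder has to exist before the DBMS starts.
        problems.append("not an existing directory")
    elif not os.access(path, os.W_OK):
        # Checked for the current user, not necessarily the Neo4j process user.
        problems.append("not writable by the current user")
    return problems
```

An empty list means that neither problem was detected for the given folder.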

8. System Requirements

8.1. Main Memory

The GDS library runs within a Neo4j instance and is therefore subject to the general Neo4j memory configuration.

Figure 1. GDS heap memory usage

8.1.1. Heap size

The heap space is used for storing graph projections in the graph catalog and algorithm state. When writing algorithm results back to Neo4j, heap space is also used for handling transaction state (see dbms.tx_state.memory_allocation). For purely analytical workloads, a general recommendation is to set the heap space to about 90% of the available main memory. This can be done via dbms.memory.heap.initial_size and dbms.memory.heap.max_size.

To better estimate the heap space required to create in-memory graphs and run algorithms, consider the Memory Estimation feature. The feature estimates the memory consumption of all involved data structures using information about the number of nodes and relationships from the Neo4j count store.
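
As a starting point before using memory estimation, the 90% recommendation for purely analytical workloads can be turned into concrete values for the two heap settings. The sketch below is illustrative only: the function name heap_settings is hypothetical, it assumes the machine's main memory is given in MiB, and it sets the initial and maximum heap size to the same value, which is a choice of this example rather than something this chapter prescribes.

```python
def heap_settings(total_memory_mib, heap_percent=90):
    """Return neo4j.conf lines that give `heap_percent` of main memory to the heap."""
    if not 0 < heap_percent <= 100:
        raise ValueError("heap_percent must be in the range 1..100")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mib = total_memory_mib * heap_percent // 100
    return [
        f"dbms.memory.heap.initial_size={heap_mib}m",
        f"dbms.memory.heap.max_size={heap_mib}m",
    ]
```

For a machine with 64 GiB (65536 MiB) of main memory, this produces a heap size of 58982m for both settings.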

8.1.2. Page cache

The page cache is used to cache the Neo4j data and will help to avoid costly disk access.

For purely analytical workloads using native projections, it is recommended to decrease dbms.memory.pagecache.size in favor of an increased heap size. However, setting a minimum page cache size is still important while creating in-memory graphs:

  • For native projections, the minimum page cache size for creating the in-memory graph can be roughly estimated by 8KB * 100 * readConcurrency.

  • For Cypher projections, a higher page cache is required depending on the query complexity.

However, if it is required to write algorithm results back to Neo4j, the write performance is highly dependent on store fragmentation as well as the number of properties and relationships to write. We recommend starting with a page cache size of roughly 250MB * writeConcurrency, then evaluating write performance and adapting the size accordingly. Ideally, if the memory estimation feature has been used to find a good heap size, the remaining memory can be used for page cache and OS.

Decreasing the page cache size in favor of heap size is not recommended if the Neo4j instance runs both operational and analytical workloads at the same time. See Neo4j memory configuration for general information about page cache sizing.
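
The two page cache rules of thumb from this section are simple multiplications, sketched below in Python. The function names are hypothetical, and the sketch assumes that KB and MB mean 1024 and 1024 * 1024 bytes respectively. The results are rough starting values, not guarantees.

```python
KIB = 1024
MIB = 1024 * KIB


def min_page_cache_for_native_projection(read_concurrency):
    """Rough minimum page cache in bytes: 8KB * 100 * readConcurrency."""
    return 8 * KIB * 100 * read_concurrency


def starting_page_cache_for_writes(write_concurrency):
    """Suggested starting page cache in bytes: 250MB * writeConcurrency."""
    return 250 * MIB * write_concurrency
```

With a concurrency of 4, the first function gives 3,276,800 bytes (about 3.1 MiB) and the second gives 1,048,576,000 bytes (1000 MiB), which shows how much more page cache writing results back requires than creating a native projection.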

8.2. CPU

The library uses multiple CPU cores for graph projections, algorithm computation, and results writing. Configuring the workloads to make best use of the available CPU cores in your system is important to achieve maximum performance. The concurrency used for the stages of projection, computation and writing is configured per algorithm execution; see Common Configuration parameters.

The default concurrency used for most operations in the Graph Data Science library is 4.

The maximum concurrency that can be used is limited depending on the license under which the library is being used:

  • Neo4j Graph Data Science Library - Community Edition (GDS CE)

    • The maximum concurrency in the library is limited to 4.

  • Neo4j Graph Data Science Library - Enterprise Edition (GDS EE)

Concurrency limits are determined based on whether you have a GDS EE license, or if you are using GDS CE. The maximum concurrency limit in the Graph Data Science library is not set based on your edition of the Neo4j database.
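
When choosing a concurrency value under GDS CE, one simple heuristic is to use one thread per available CPU core without exceeding the CE limit of 4. The Python sketch below illustrates that heuristic only; the function name community_concurrency is hypothetical, and using one thread per core is an assumption of this example, not a recommendation made by this chapter.

```python
import os

COMMUNITY_MAX_CONCURRENCY = 4  # limit stated above for GDS CE


def community_concurrency(available_cores=None):
    """Pick a concurrency for GDS CE: one thread per core, capped at the CE limit."""
    if available_cores is None:
        available_cores = os.cpu_count() or 1  # cpu_count() may return None
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    return min(available_cores, COMMUNITY_MAX_CONCURRENCY)
```

On a 16-core machine this returns 4, and on a 2-core machine it returns 2. The resulting value can then be used as the concurrency setting of an algorithm execution.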