Chapter 2. Installation

This chapter provides instructions for installation and basic usage of the Neo4j Graph Data Science library.

The Neo4j Graph Data Science (GDS) library is delivered as a plugin to the Neo4j Graph Database. The plugin needs to be installed into the database and whitelisted in the Neo4j configuration. There are two main ways of achieving this, which we will detail in this chapter.

2.1. Supported Neo4j versions

The GDS library supports the following Neo4j versions:

Neo4j Graph Data Science | Neo4j version
-------------------------+----------------------------------------------
1.2.x                    | 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4
1.0.x, 1.1.x             | 3.5.9, 3.5.10, 3.5.11, 3.5.12, 3.5.13, 3.5.14,
                         | 3.5.15, 3.5.16, 3.5.17

2.2. Neo4j Desktop

The most convenient way of installing the GDS library is through the Neo4j Desktop plugin called Neo4j Graph Data Science. The plugin can be found in the 'Plugins' tab of a database.


The installer will download the GDS library and install it in the 'plugins' directory of the database. It will also add the following entry to the settings file:

dbms.security.procedures.unrestricted=gds.*

This configuration entry is necessary because the GDS library accesses low-level components of Neo4j to maximise performance.

If the procedure whitelist is configured, make sure to also include procedures from the GDS library:

dbms.security.procedures.whitelist=gds.*

2.3. Neo4j Server

If you are using a standalone Neo4j Server, the library will need to be installed and configured manually.

  1. Download neo4j-graph-data-science-[version]-standalone.jar from the Neo4j Download Center and copy it into the $NEO4J_HOME/plugins directory.
  2. Add the following to your $NEO4J_HOME/conf/neo4j.conf file:

    dbms.security.procedures.unrestricted=gds.*

    This configuration entry is necessary because the GDS library accesses low-level components of Neo4j to maximise performance.

  3. Check if the procedure whitelist is enabled in the $NEO4J_HOME/conf/neo4j.conf file and add the GDS library if necessary:

    dbms.security.procedures.whitelist=gds.*
  4. Restart Neo4j.
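Steps 2 and 3 amount to adding gds.* to two comma-separated settings. The Python sketch below does that on the text of neo4j.conf, assuming the file consists of plain key=value lines and # comments. The function names add_to_list_setting and configure_gds are hypothetical and not part of Neo4j or GDS. The sketch merges gds.* into an existing value (for example an existing apoc.* entry) instead of overwriting it, and it only extends the whitelist when one is already enabled.

```python
from typing import Optional


def _active_key(line: str) -> Optional[str]:
    """Return the setting name of an uncommented key=value line, or None."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    return stripped.split("=", 1)[0].strip()


def add_to_list_setting(conf_text: str, key: str, pattern: str = "gds.*",
                        append_if_missing: bool = True) -> str:
    """Hypothetical helper: ensure `pattern` is in the comma-separated value of `key`."""
    lines = conf_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if _active_key(line) != key:
            continue  # comments, blank lines and other settings stay untouched
        found = True
        value = line.split("=", 1)[1]
        entries = [v.strip() for v in value.split(",") if v.strip()]
        if pattern not in entries:
            entries.append(pattern)  # keep existing entries such as apoc.*
        lines[i] = f"{key}={','.join(entries)}"
    if not found and append_if_missing:
        lines.append(f"{key}={pattern}")
    return "\n".join(lines) + "\n"


def configure_gds(conf_text: str) -> str:
    """Hypothetical helper applying steps 2 and 3 to the contents of neo4j.conf."""
    # Step 2: always required.
    conf_text = add_to_list_setting(conf_text, "dbms.security.procedures.unrestricted")
    # Step 3: only extend a whitelist that is already enabled.
    return add_to_list_setting(conf_text, "dbms.security.procedures.whitelist",
                               append_if_missing=False)


if __name__ == "__main__":
    sample = "dbms.security.procedures.unrestricted=apoc.*\n"
    print(configure_gds(sample), end="")  # dbms.security.procedures.unrestricted=apoc.*,gds.*
```

Reading $NEO4J_HOME/conf/neo4j.conf, passing its contents through configure_gds and writing the result back is left to the caller; the function is idempotent, so running it twice does not add duplicate entries. Neo4j still has to be restarted afterwards (step 4).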

2.3.1. Verifying installation

To verify your installation, print the library version by opening Neo4j Browser (for example from Neo4j Desktop) and calling the gds.version() function:

RETURN gds.version()

To list all installed algorithms, run the gds.list() procedure:

CALL gds.list()

2.4. System Requirements

2.4.1. Main Memory

The GDS library runs within a Neo4j instance and is therefore subject to the general Neo4j memory configuration.

Figure 2.1. GDS heap memory usage

2.4.1.1. Heap size

The heap space is used to store the graph projections held in the graph catalog as well as algorithm state. When writing algorithm results back to Neo4j, heap space is also used for handling transaction state (see dbms.tx_state.memory_allocation). For purely analytical workloads, a general recommendation is to set the heap space to about 90% of the available main memory. This can be done via dbms.memory.heap.initial_size and dbms.memory.heap.max_size.
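As a worked example of the 90% guideline, the Python sketch below turns the amount of main memory (in megabytes) into values for the two heap settings. The function name is hypothetical, and giving the initial and maximum heap the same value is an assumption of this sketch (a common JVM practice that avoids growing the heap at runtime), not a GDS requirement.

```python
def analytical_heap_settings(total_memory_mb: int, heap_fraction: float = 0.9) -> dict:
    """Hypothetical helper: heap settings for a purely analytical workload."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(total_memory_mb * heap_fraction)  # round down to whole megabytes
    size = f"{heap_mb}m"  # Neo4j memory settings accept k/m/g suffixes
    # Assumption of this sketch: same initial and max size, so the JVM
    # does not have to resize the heap while graphs are being loaded.
    return {
        "dbms.memory.heap.initial_size": size,
        "dbms.memory.heap.max_size": size,
    }


# A machine with 16GB of main memory: print lines for neo4j.conf.
for key, value in analytical_heap_settings(16 * 1024).items():
    print(f"{key}={value}")
```

For 16384MB of main memory this yields 14745m for both settings. Where the memory estimation feature has been used, its result is a better basis for the heap size than a fixed fraction.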

To better estimate the heap space required to create in-memory graphs and run algorithms, use the memory estimation feature (see Section 3.1, “Memory Estimation”). It estimates the memory consumption of all involved data structures using the number of nodes and relationships from the Neo4j count store.

2.4.1.2. Page cache

The page cache is used to cache the Neo4j data and will help to avoid costly disk access.

For purely analytical workloads using native projections, it is recommended to decrease dbms.memory.pagecache.size in favor of an increased heap size. However, setting a minimum page cache size is still important while creating in-memory graphs:

  • For native projections, the minimum page cache size for creating the in-memory graph can be roughly estimated by 8KB * 100 * readConcurrency.
  • For Cypher projections, a larger page cache is required, depending on the query complexity.
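The native projection rule of thumb can be evaluated directly. This Python sketch assumes that "8KB" means 8 KiB (8,192 bytes), the size of a Neo4j page cache page; the function name is hypothetical. It does not cover Cypher projections, for which no formula is given.

```python
KIB = 1024  # bytes; assumption: "8KB" in the guideline means 8 KiB


def min_page_cache_native_projection(read_concurrency: int) -> int:
    """Hypothetical helper: rough lower bound in bytes, 8KB * 100 * readConcurrency."""
    if read_concurrency < 1:
        raise ValueError("read_concurrency must be >= 1")
    return 8 * KIB * 100 * read_concurrency


for rc in (1, 4, 8):
    mib = min_page_cache_native_projection(rc) / (1024 * 1024)
    print(f"readConcurrency={rc}: ~{mib:.2f} MiB")
```

With readConcurrency=4 the bound is 3,276,800 bytes (about 3.1 MiB), which is small compared to typical heap sizes.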

However, if algorithm results need to be written back to Neo4j, write performance depends heavily on store fragmentation as well as on the number of properties and relationships to write. We recommend starting with a page cache size of roughly 250MB * writeConcurrency, then evaluating write performance and adjusting accordingly. Ideally, if the memory estimation feature has been used to find a good heap size, the remaining memory can be used for the page cache and the OS.
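The write-back guideline can be expressed as a small calculation: start from roughly 250MB per unit of writeConcurrency and check that this fits into the memory left over after the heap. The Python helper below is hypothetical and assumes all sizes are given in megabytes; the page cache size it returns is only a starting point to be refined by measuring write performance.

```python
def write_page_cache_start(total_memory_mb: int, heap_mb: int,
                           write_concurrency: int) -> dict:
    """Hypothetical helper: initial page cache size for writing results back."""
    if write_concurrency < 1:
        raise ValueError("write_concurrency must be >= 1")
    non_heap_mb = total_memory_mb - heap_mb
    page_cache_mb = 250 * write_concurrency  # rule of thumb: 250MB * writeConcurrency
    if page_cache_mb > non_heap_mb:
        raise ValueError(
            f"{page_cache_mb}MB page cache does not fit into "
            f"{non_heap_mb}MB of non-heap memory"
        )
    return {
        "dbms.memory.pagecache.size": f"{page_cache_mb}m",
        # Whatever is left of the non-heap memory remains for the OS.
        "remaining_for_os_mb": non_heap_mb - page_cache_mb,
    }


print(write_page_cache_start(total_memory_mb=16384, heap_mb=12288, write_concurrency=4))
```

For 16384MB of main memory, a 12288MB heap and writeConcurrency=4, the sketch suggests a 1000MB page cache and leaves 3096MB for the operating system.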

Decreasing the page cache size in favor of heap size is not recommended if the Neo4j instance runs both operational and analytical workloads at the same time. See the Neo4j memory configuration documentation for general information about page cache sizing.

2.4.2. CPU

The library uses multiple CPU cores for graph projections, algorithm computation, and writing results. Configuring the workloads to make the best use of the available CPU cores in your system is important to achieve maximum performance. The concurrency used for the projection, computation and writing stages is configured per algorithm execution; see Section 5.1.1, “Configuration options”.
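When choosing a concurrency value for an algorithm run, requesting more threads than there are CPU cores, or than the edition allows, is usually not useful. The Python sketch below picks a value under those two caps. The helper name and the idea of capping by os.cpu_count() are assumptions for illustration, not GDS behaviour; os.cpu_count() reports the cores of the machine the script runs on, so run it on the Neo4j host or pass cpu_cores explicitly. The edition limit is a parameter (4 for Neo4j Community and Enterprise Edition, as stated in this section).

```python
import os
from typing import Optional


def pick_concurrency(requested: int, edition_limit: int,
                     cpu_cores: Optional[int] = None) -> int:
    """Hypothetical helper: cap a requested concurrency value.

    The cap is the smaller of the CPU core count (os.cpu_count(), falling
    back to 1 if unknown) and the limit of the edition in use.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    if requested < 1 or edition_limit < 1 or cores < 1:
        raise ValueError("concurrency values and core count must be >= 1")
    return min(requested, cores, edition_limit)


# 16 threads requested on a 32-core machine under an edition limit of 4.
print(pick_concurrency(requested=16, edition_limit=4, cpu_cores=32))  # 4
```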

The maximum concurrency that can be used is limited depending on the license under which the library is being used:

  • Neo4j Community Edition

    • The maximum concurrency in the library is 4.
  • Neo4j Enterprise Edition

    • The maximum concurrency in the library is 4.
  • Neo4j Graph Data Science Edition