Chapter 1. Introduction

This chapter provides an introduction to the available graph algorithms, and instructions for installation and use.

This is documentation for the Graph Algorithms Library, which has been deprecated by the Graph Data Science Library (GDS).

This library provides efficiently implemented, parallel versions of common graph algorithms for Neo4j 3.x, exposed as Cypher procedures.

1.1. Algorithms

Graph algorithms are used to compute metrics for graphs, nodes, or relationships.

They can provide insights on relevant entities in the graph (centralities, ranking), or inherent structures like communities (community-detection, graph-partitioning, clustering).

Many graph algorithms are iterative approaches that frequently traverse the graph for the computation using random walks, breadth-first or depth-first searches, or pattern matching.

Due to the exponential growth of possible paths with increasing distance, many of the approaches also have high algorithmic complexity.

Fortunately, optimized algorithms exist that utilize certain structures of the graph, memoize already explored parts, and parallelize operations. Whenever possible, we’ve applied these optimizations.

The Neo4j Graph Algorithms library contains the following algorithms, grouped by category:

Centrality algorithms: Determines the importance of distinct nodes in a network.

Community detection algorithms: Evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart.

1.2. Installation

1.2.1. Neo4j Desktop

If we are using the Neo4j Desktop, the library can be installed from the 'Plugins' tab of a database.

neo4j desktop

The installer will download a copy of the graph algorithms library and place it in the 'plugins' directory of the database. It will also add the following entry to the settings file:*

1.2.2. Neo4j Server

If we are using a standalone Neo4j Server, the library will need to be installed and configured manually.

  1. Download neo4j-graph-algorithms-[version]-standalone.jar from the Neo4j Download Center and copy it into the $NEO4J_HOME/plugins directory. We can work out which release to download by referring to the versions file.
  2. Add the following to your $NEO4J_HOME/conf/neo4j.conf file:*

    We need to give the library unrestricted access because the algorithms use the lower level Kernel API to read from, and to write to Neo4j.

  3. Restart Neo4j

1.2.3. Verifying installation

Once we’ve installed the library, to see a list of all the algorithms, run the following query:

CALL algo.list()

1.3. Usage

These algorithms are exposed as Neo4j procedures. They can be called directly from Cypher in your Neo4j Browser, from cypher-shell, or from your client code.

For most algorithms there are two procedures:

  • algo.<name> - this procedure writes results back to the graph as node-properties, and reports statistics.
  • algo.<name>.stream - this procedure returns a stream of data. For example, node-ids and computed values.

    For large graphs, the streaming procedure might return millions, or even billions of results. In this case it may be more convenient to store the results of the algorithm, and then use them with later queries.

The execution of any algorithm can be canceled by terminating the cypher transaction that is executing the procedure call. For more on how transactions are used, see Transaction Handling.

1.4. System Requirements

1.4.1. Memory

The library operates completely on heap memory, which means we’ll need to configure our Neo4j Server with a much larger heap size than we would for transactional workloads. The diagram below shows how the library uses heap memory.

memory usage

See Section 2.4, “Memory Requirements” for more details on the memory used by the projected graph model.

1.4.2. CPU

The library uses multiple CPUs for data loading, processing, and writing back results.

Up to 4 CPUs will be used when running the library on Neo4j Community Edition. All CPUs will be used when running on Neo4j Enterprise Edition.

Most of the algorithms in the library run in parallel, by splitting up the nodes and assigning them to different threads that run part of the algorithm for that node.