3.1. Memory Estimation

This section describes how to estimate memory requirements for the projected graph model used by the Neo4j Graph Data Science library.

The graph algorithms library operates completely on the heap, which means we’ll need to configure our Neo4j Server with a much larger heap size than we would for transactional workloads. The diagram below shows how memory is used by the projected graph model:

[Figure: graph model memory]

The model contains three types of data:

Memory configuration depends on the graph projection that we’re using.

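This section covers estimating memory requirements for algorithms (3.1.1), estimating memory requirements for graphs (3.1.2), and automatic estimation and execution blocking (3.1.3).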

3.1.1. Estimating memory requirements for algorithms

In many use cases, it is useful to estimate the memory required by a graph and an algorithm before running it, in order to make sure that the workload can run on the available hardware. To make this easier, every algorithm supports the .estimate mode, which returns an estimate of the amount of memory required to run that algorithm.

Syntax. 

CALL gds.<ALGO>.<MODE>.estimate(graphNameOrConfig: String|Map, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount

Table 3.1. Parameters
Name               Type           Default  Optional  Description
graphNameOrConfig  String or Map  -        no        The name of the projected graph or the algorithm configuration in case of implicit loading.
configuration      Map            {}       yes       If the first parameter is the name of a projected graph, this parameter is the algorithm config, otherwise it needs to be null or an empty map.

The configuration parameter accepts the same configuration parameters as the estimated algorithm. See the algorithm documentation for more information.
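The two calling conventions described above can be sketched as follows. This is a minimal illustration, not taken from the original text: it assumes PageRank in stream mode as the algorithm, a hypothetical named graph 'myGraph' already present in the graph catalog, and hypothetical 'Person' nodes and 'KNOWS' relationships for the implicit case.

```cypher
// Named graph: the first argument is the graph name, the second is the
// algorithm configuration. 'myGraph' is a hypothetical graph name.
CALL gds.pageRank.stream.estimate('myGraph', {maxIterations: 20})
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
RETURN requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount;

// Implicit loading: the first argument is a single map holding both the
// projection and the algorithm configuration; the second argument is omitted.
// 'Person' and 'KNOWS' are hypothetical names.
CALL gds.pageRank.stream.estimate({
  nodeProjection: 'Person',
  relationshipProjection: 'KNOWS',
  maxIterations: 20
})
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
RETURN requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount;
```

In both cases nothing is executed or written; only the estimate is returned.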

Table 3.2. Results
Name               Type     Description
requiredMemory     String   An estimation of the required memory in a human readable format.
treeView           String   A more detailed, human readable representation of the required memory, including estimates of the different components.
mapView            Map      A more detailed representation of the required memory, including estimates of the different components.
bytesMin           Integer  The minimum number of bytes required.
bytesMax           Integer  The maximum number of bytes required.
heapPercentageMin  Float    The minimum percentage of the configured maximum heap required.
heapPercentageMax  Float    The maximum percentage of the configured maximum heap required.
nodeCount          Integer  The estimated number of nodes in the graph.
relationshipCount  Integer  The estimated number of relationships in the graph.
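As a sketch of how these result columns might be used together, the query below converts the byte bounds to MiB and returns them next to the heap percentages and the component breakdown. The algorithm (PageRank) and the graph name 'myGraph' are assumptions made for illustration only.

```cypher
// 'myGraph' is a hypothetical graph in the catalog.
CALL gds.pageRank.stream.estimate('myGraph', {})
YIELD requiredMemory, treeView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax
RETURN
  requiredMemory,
  bytesMin / 1048576.0 AS minMiB,   // 1 MiB = 1024 * 1024 bytes
  bytesMax / 1048576.0 AS maxMiB,
  heapPercentageMin,
  heapPercentageMax,
  treeView                          // per-component breakdown as text
```

The numeric columns are convenient for comparisons and sorting, while requiredMemory and treeView are meant for reading.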

3.1.2. Estimating memory requirements for graphs

The gds.graph.create procedures also support .estimate to estimate memory usage for just the graph. Those procedures don’t accept the graph name as the first argument, as they don’t actually create the graph.

Syntax. 

CALL gds.graph.create.estimate(nodeProjection: String|List|Map, relationshipProjection: String|List|Map, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount

The nodeProjection and relationshipProjection parameters follow the same syntax as in gds.graph.create.

Table 3.3. Parameters
Name                    Type                   Default  Optional  Description
nodeProjection          String or List or Map  -        no        The node projection to estimate for.
relationshipProjection  String or List or Map  -        no        The relationship projection to estimate for.
configuration           Map                    {}       yes       Additional configuration, such as concurrency.

The result of running gds.graph.create.estimate has the same form as the algorithm memory estimation results above.
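A minimal sketch of estimating a projection of data that already exists in the database is shown here. The 'Person' label and 'KNOWS' relationship type are hypothetical and stand in for whatever the stored graph contains.

```cypher
// Estimate the memory needed to project all Person nodes and the KNOWS
// relationships between them. No graph is created by this call.
CALL gds.graph.create.estimate('Person', 'KNOWS')
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
RETURN requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
```

The same projection arguments can later be passed to gds.graph.create, preceded by a graph name, to actually create the graph.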

It is also possible to estimate the memory of a fictive graph, by explicitly specifying its node and relationship count. Using this feature, one can estimate the memory consumption of an arbitrarily sized graph.

To achieve this, use the following configuration options:

Table 3.4. Configuration
Name               Type     Default  Optional  Description
nodeCount          Integer  0        yes       The number of nodes in a fictive graph.
relationshipCount  Integer  0        yes       The number of relationships in a fictive graph.

When estimating a fictive graph, syntactically valid nodeProjection and relationshipProjection must be specified. However, it is recommended to specify '*' for both in the fictive graph case as this does not interfere with the specified values above.

The query below is an example of estimating a fictive graph with 100 nodes and 1000 relationships.

Example. 

CALL gds.graph.create.estimate('*', '*', {
  nodeCount: 100,
  relationshipCount: 1000,
  nodeProperties: 'foo',
  relationshipProperties: 'bar'
})
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, nodeCount, relationshipCount

Table 3.5. Results
requiredMemory         bytesMin  bytesMax  nodeCount  relationshipCount
"[561 KiB … 564 KiB]"  574768    577952    100        1000
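Because the counts are plain configuration values, the same call can be repeated for several fictive sizes to see how the estimate grows. The sketch below does this with UNWIND; the sizes and the ratio of ten relationships per node are arbitrary assumptions chosen for illustration.

```cypher
// Hypothetical graph sizes, used only to show how the estimate scales.
UNWIND [1000000, 10000000, 100000000] AS nodes
CALL gds.graph.create.estimate('*', '*', {
  nodeCount: nodes,
  relationshipCount: nodes * 10   // assumed average of 10 relationships per node
})
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
RETURN nodeCount, relationshipCount, requiredMemory, bytesMin, bytesMax
ORDER BY nodeCount
```

This kind of query can help with sizing hardware before the data has been imported.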

The estimate mode of the gds.graph.create.cypher procedure has to execute both the nodeQuery and the relationshipQuery in order to count the number of nodes and relationships of the graph.

Syntax. 

CALL gds.graph.create.cypher.estimate(nodeQuery: String, relationshipQuery: String, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount

Table 3.6. Parameters
Name               Type    Default  Optional  Description
nodeQuery          String  -        no        The node query to estimate for.
relationshipQuery  String  -        no        The relationship query to estimate for.
configuration      Map     {}       yes       Additional configuration, such as concurrency.
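A Cypher projection estimate might look like the following sketch. The 'Person' label and 'KNOWS' relationship type are hypothetical. The node query returns an id column and the relationship query returns source and target columns, as expected by gds.graph.create.cypher.

```cypher
// Both queries are executed in order to count nodes and relationships,
// so this call can take noticeably longer than a native projection estimate.
CALL gds.graph.create.cypher.estimate(
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN id(a) AS source, id(b) AS target'
)
YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
RETURN requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount
```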

3.1.3. Automatic estimation and execution blocking

All algorithm procedures in the GDS library, including graph creation, will do an estimation check at the beginning of their execution. This includes all execution modes, but not the estimate procedures themselves.

If the estimation check can determine that the current amount of free memory is insufficient to carry out the operation, the operation will be aborted and an error will be reported. The error will contain details of the estimation and the free memory at the time of estimation.

This heap control logic is restrictive in the sense that it only blocks executions that are certain not to fit into memory. It does not guarantee that an execution that passed the heap control will succeed without depleting memory. Thus, it is still useful to first run the estimation mode before running an algorithm or graph creation on a large data set, in order to view all details of the estimation.

The free memory taken into consideration is based on the Java runtime system information. The amount of free memory can be increased by either dropping unused graphs from the catalog, or by increasing the maximum heap size prior to starting the Neo4j instance.
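Freeing memory by dropping unused graphs can be sketched as follows. The graph name 'myGraph' is hypothetical. Which graphs are safe to drop depends on the workload.

```cypher
// List the graphs currently held in the catalog, largest first.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
ORDER BY relationshipCount DESC;

// Drop a graph that is no longer needed ('myGraph' is a hypothetical name).
CALL gds.graph.drop('myGraph')
YIELD graphName
RETURN graphName;
```

Increasing the maximum heap size, in contrast, is done in the Neo4j server configuration and takes effect only after a restart.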