Memory Estimation
This section describes how to estimate memory requirements for the projected graph model used by the Neo4j Graph Data Science library.
The graph algorithms library operates completely on the heap, which means we’ll need to configure our Neo4j Server with a much larger heap size than we would for transactional workloads. The diagram below shows how memory is used by the projected graph model:
The model contains three types of data:

- Node ids: up to 2^45 (about 35 trillion)
- Relationships: pairs of node ids. Relationships are stored twice if orientation: "UNDIRECTED" is used.
- Weights: stored as doubles (8 bytes per relationship) in an array-like data structure next to the relationships
Memory configuration depends on the graph projection that we’re using.
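To get a feel for how these three kinds of data add up, here is a rough back-of-envelope sketch. It is a simplified, hypothetical model and not how GDS computes its estimates: it assumes every node id and every relationship endpoint is an uncompressed 64-bit value and ignores all other overhead, whereas the real projected graph compresses its adjacency lists. Use the .estimate procedures described below for actual numbers.

```python
# Hypothetical, uncompressed model of the projected graph's three data types.
# Assumptions (not taken from GDS): 8 bytes per node id and per relationship
# endpoint, no compression, no per-structure overhead.

BYTES_PER_ID = 8      # assumed 64-bit node id
BYTES_PER_WEIGHT = 8  # weights are stored as doubles


def naive_graph_bytes(node_count: int, relationship_count: int,
                      undirected: bool = False, weighted: bool = False) -> int:
    """Rough estimate in bytes under the assumptions above."""
    # UNDIRECTED orientation stores every relationship twice.
    stored = relationship_count * (2 if undirected else 1)
    node_bytes = node_count * BYTES_PER_ID
    relationship_bytes = stored * 2 * BYTES_PER_ID  # a pair of node ids each
    weight_bytes = stored * BYTES_PER_WEIGHT if weighted else 0
    return node_bytes + relationship_bytes + weight_bytes


if __name__ == "__main__":
    print(naive_graph_bytes(100, 1_000))                                  # 16800
    print(naive_graph_bytes(100, 1_000, undirected=True, weighted=True))  # 48800
```

Switching to UNDIRECTED orientation doubles both the relationship and the weight terms, which is one concrete reason the memory configuration depends on the projection.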
1. Estimating memory requirements for algorithms
It is often useful to estimate the memory required by a graph and an algorithm before running them, to make sure that the workload can run on the available hardware.
To make this process easier, every algorithm supports the .estimate mode, which returns an estimate of the amount of memory required to run graph algorithms.
CALL gds.<ALGO>.<MODE>.estimate(graphNameOrConfig: String|Map, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphNameOrConfig | String or Map | | no | The name of the projected graph, or the algorithm configuration in case of implicit loading. |
| configuration | Map | {} | yes | If the first parameter is the name of a projected graph, this parameter is the algorithm configuration; otherwise it must be null or an empty map. |
The configuration parameter accepts the same configuration parameters as the estimated algorithm. See the algorithm documentation for more information.
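The rule for the two parameters can be restated as a small argument check. This is a hypothetical Python sketch of the documented behaviour, not GDS code: the function name resolve_estimate_args is made up for illustration, Python's None stands in for Cypher null, and GDS performs its own validation server-side. A string first argument names a projected graph and the second argument carries the algorithm configuration; a map first argument is itself the configuration for implicit loading, and the second must then be null or empty.

```python
from typing import Any, Dict, Optional, Tuple, Union


def resolve_estimate_args(
    graph_name_or_config: Union[str, Dict[str, Any]],
    configuration: Optional[Dict[str, Any]] = None,
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return (graph_name, algorithm_config) following the documented rule.

    Hypothetical helper for illustration only.
    """
    if isinstance(graph_name_or_config, str):
        # Named projected graph: the second argument is the algorithm config.
        return graph_name_or_config, dict(configuration or {})
    if isinstance(graph_name_or_config, dict):
        # Implicit loading: the first argument is the whole configuration,
        # so the second one must be null (None) or an empty map.
        if configuration:
            raise ValueError("configuration must be null or empty when "
                             "the first argument is a map")
        return None, dict(graph_name_or_config)
    raise TypeError("graphNameOrConfig must be a String or a Map")


# Hypothetical graph name and configuration keys, for illustration only.
print(resolve_estimate_args("myGraph", {"maxIterations": 20}))
print(resolve_estimate_args({"nodeProjection": "*", "relationshipProjection": "*"}))
```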
| Name | Type | Description |
|---|---|---|
| requiredMemory | String | An estimation of the required memory in a human-readable format. |
| treeView | String | A more detailed, human-readable representation of the required memory, including estimates of the different components. |
| mapView | String | A more detailed representation of the required memory, including estimates of the different components. |
| bytesMin | Integer | The minimum number of bytes required. |
| bytesMax | Integer | The maximum number of bytes required. |
| heapPercentageMin | Float | The minimum percentage of the configured maximum heap required. |
| heapPercentageMax | Float | The maximum percentage of the configured maximum heap required. |
| nodeCount | Integer | The estimated number of nodes in the graph. |
| relationshipCount | Integer | The estimated number of relationships in the graph. |
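The byte columns and the heap percentage columns are related through the configured maximum heap. As a hedged sketch, assuming heapPercentageMin and heapPercentageMax are expressed on a 0 to 100 scale relative to the JVM's maximum heap, a client could relate them to bytesMin and bytesMax, or decide whether the upper bound leaves enough headroom. The row dictionary, the 4 GiB heap and the 80% headroom threshold are illustrative values, not GDS defaults; the byte figures are taken from the fictive graph example in section 2.

```python
def heap_percentage(num_bytes: int, max_heap_bytes: int) -> float:
    """Share of the maximum heap, in percent (assumed 0-100 scale)."""
    return 100.0 * num_bytes / max_heap_bytes


def fits_with_headroom(row: dict, max_heap_bytes: int,
                       max_share: float = 80.0) -> bool:
    """True if even the upper estimate stays within max_share percent of the heap.

    max_share is a hypothetical safety margin chosen by the caller.
    """
    return heap_percentage(row["bytesMax"], max_heap_bytes) <= max_share


row = {"bytesMin": 607_576, "bytesMax": 607_576}  # from the fictive graph example
max_heap = 4 * 1024 ** 3                          # assumed 4 GiB maximum heap
print(heap_percentage(row["bytesMax"], max_heap))
print(fits_with_headroom(row, max_heap))
```

Checking bytesMax rather than bytesMin is the cautious choice: it asks whether the worst case fits, not merely the best case.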
2. Estimating memory requirements for graphs
The gds.graph.create procedures also support .estimate to estimate memory usage for just the graph.
Those procedures don’t accept the graph name as the first argument, as they don’t actually create the graph.
CALL gds.graph.create.estimate(nodeProjection: String|List|Map, relationshipProjection: String|List|Map, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount
The nodeProjection and relationshipProjection parameters follow the same syntax as in gds.graph.create.
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeProjection | String or List or Map | | no | The node projection to estimate for. |
| relationshipProjection | String or List or Map | | no | The relationship projection to estimate for. |
| configuration | Map | {} | yes | Additional configuration, such as concurrency. |
The result of running gds.graph.create.estimate has the same form as the algorithm memory estimation results above.
It is also possible to estimate the memory of a fictive graph by explicitly specifying its node and relationship counts. Using this feature, one can estimate the memory consumption of an arbitrarily sized graph.
To achieve this, use the following configuration options:
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeCount | Integer | 0 | yes | The number of nodes in a fictive graph. |
| relationshipCount | Integer | 0 | yes | The number of relationships in a fictive graph. |
When estimating a fictive graph, syntactically valid nodeProjection and relationshipProjection values must still be specified.
We recommend specifying '*' for both in the fictive graph case, as this does not interfere with the nodeCount and relationshipCount values above.
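Because the fictive counts are plain configuration values, they lend themselves to sizing sweeps. The sketch below is a hypothetical Python helper that builds one parameterized query per target size, using '*' for both projections and passing the counts through a $config query parameter. The helper name and the sizes are illustrative assumptions; each (query, parameters) pair could be handed to a Cypher client, for example the Neo4j Python driver's Session.run(query, parameters).

```python
from typing import Any, Dict, List, Tuple

# Parameterized Cypher; '*' projections as recommended for fictive graphs.
FICTIVE_ESTIMATE_QUERY = (
    "CALL gds.graph.create.estimate('*', '*', $config) "
    "YIELD requiredMemory, bytesMin, bytesMax, nodeCount, relationshipCount"
)


def fictive_estimate_requests(
    sizes: List[Tuple[int, int]],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Build (query, parameters) pairs for several fictive graph sizes.

    Hypothetical helper: sizes holds (nodeCount, relationshipCount) pairs.
    """
    requests = []
    for node_count, relationship_count in sizes:
        config = {"nodeCount": node_count, "relationshipCount": relationship_count}
        requests.append((FICTIVE_ESTIMATE_QUERY, {"config": config}))
    return requests


for query, params in fictive_estimate_requests([(100, 1_000), (10**6, 10**7)]):
    print(params)
```

Keeping the query text constant and varying only the parameter map lets the database reuse one query plan across the sweep.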
The query below is an example of estimating a fictive graph with 100 nodes and 1000 relationships.
CALL gds.graph.create.estimate('*', '*', {
nodeCount: 100,
relationshipCount: 1000,
nodeProperties: 'foo',
relationshipProperties: 'bar'
})
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, nodeCount, relationshipCount
| requiredMemory | bytesMin | bytesMax | nodeCount | relationshipCount |
|---|---|---|---|---|
| "593 KiB" | 607576 | 607576 | 100 | 1000 |
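The requiredMemory column is a human-readable rendering of the byte counts: 607576 bytes is about 593.3 KiB (607576 / 1024), shown as "593 KiB". The sketch below reproduces this kind of binary-unit (1024-based) formatting. It is an illustrative helper, not GDS's own formatter; in particular, rounding down to a whole number is an assumption that merely agrees with this one example.

```python
def human_readable_bytes(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units, rounding down."""
    units = ["Bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = num_bytes
    unit = 0
    # Move to the next unit while at least one whole unit of it fits.
    while value >= 1024 and unit < len(units) - 1:
        value //= 1024
        unit += 1
    return f"{value} {units[unit]}"


print(human_readable_bytes(607_576))  # 593 KiB
```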
The gds.graph.create.cypher procedure has to execute both the nodeQuery and the relationshipQuery in order to count the number of nodes and relationships of the graph, so its estimate mode runs both queries as well.
CALL gds.graph.create.cypher.estimate(nodeQuery: String, relationshipQuery: String, configuration: Map)
YIELD requiredMemory, treeView, mapView, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax, nodeCount, relationshipCount
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeQuery | String | | no | The node query to estimate for. |
| relationshipQuery | String | | no | The relationship query to estimate for. |
| configuration | Map | {} | yes | Additional configuration, such as concurrency. |
3. Automatic estimation and execution blocking
All algorithm procedures in the GDS library, including graph creation, will do an estimation check at the beginning of their execution.
This includes all execution modes, but not the estimate procedures themselves.
If the estimation check can determine that the current amount of free memory is insufficient to complete the operation, the operation is aborted and an error is reported. The error contains details of the estimation and the free memory at the time of estimation.
This heap control logic is limited in scope: it only blocks executions that are certain not to fit into memory. It does not guarantee that an execution that passes the heap control will succeed without depleting memory. Thus, it is still useful to run the estimation mode first before running an algorithm or graph creation on a large data set, in order to view all details of the estimation.
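The blocking rule can be sketched as a comparison of the estimate's bounds against the free heap. This is a hedged illustration, not GDS source code: it assumes that "certain not to fit" means bytesMin exceeds the free memory, and the function name heap_check and its status strings are hypothetical. The middle case shows why passing the check is no guarantee of success.

```python
def heap_check(bytes_min: int, bytes_max: int, free_bytes: int) -> str:
    """Classify an estimate against the free heap (hypothetical helper).

    "blocked": even the minimum estimate exceeds free memory, so the run is aborted.
    "risky":   allowed to start, but the maximum estimate may not fit.
    "ok":      both bounds fit into the currently free memory.
    """
    if bytes_min > free_bytes:
        return "blocked"
    if bytes_max > free_bytes:
        return "risky"
    return "ok"


print(heap_check(2_000, 3_000, 1_000))
print(heap_check(500, 3_000, 1_000))
print(heap_check(500, 800, 1_000))
```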
The free memory taken into consideration is based on the Java runtime system information. The amount of free memory can be increased by either dropping unused graphs from the catalog, or by increasing the maximum heap size prior to starting the Neo4j instance.
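On the JVM, a common way to derive free heap from runtime information is maxMemory() - (totalMemory() - freeMemory()) from java.lang.Runtime: the maximum heap minus what is currently occupied, since freeMemory() alone only covers the part of the heap the JVM has already committed. Whether GDS uses exactly this formula is an assumption here. The Python sketch below mirrors the arithmetic with example numbers and shows why both remedies above help: dropping graphs lowers the used memory, and a larger maximum heap raises the ceiling.

```python
def jvm_free_heap(max_memory: int, total_memory: int, free_memory: int) -> int:
    """Free heap as maxMemory - (totalMemory - freeMemory).

    Arguments mirror java.lang.Runtime.maxMemory(), totalMemory() and
    freeMemory(); that GDS uses exactly this formula is an assumption.
    """
    used = total_memory - free_memory  # heap currently occupied by objects
    return max_memory - used


GiB = 1024 ** 3
# Example: 8 GiB max heap, 6 GiB currently committed, 1 GiB of that unused.
print(jvm_free_heap(8 * GiB, 6 * GiB, 1 * GiB) // GiB)   # 3
# Raising the maximum heap to 16 GiB adds 8 GiB of free heap.
print(jvm_free_heap(16 * GiB, 6 * GiB, 1 * GiB) // GiB)  # 11
```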
