Monitoring system

GDS supports multiple users concurrently working on the same system. Typically, GDS procedures are resource heavy in the sense that they may use a lot of memory and/or many CPU cores to do their computation. To know whether it is a reasonable time for a user to run a GDS procedure it is useful to know the current capacity of the system hosting Neo4j and GDS, as well as the current GDS workload on the system. Graphs and models are not shared between non-admin users by default, however GDS users on the same system will share its capacity.

System monitor procedure

To be able to get an overview of the system’s current capacity and its analytics workload one can use the procedure gds.systemMonitor. It will give you information on the capacity of the DBMS’s JVM instance in terms of memory and CPU cores, and an overview of the resources consumed by the GDS procedures currently being run on the system.

Syntax

Monitor the system capacity and analytics workload:
CALL gds.systemMonitor()
YIELD
  freeHeap,
  totalHeap,
  maxHeap,
  jvmAvailableCpuCores,
  availableCpuCoresNotRequested,
  jvmHeapStatus,
  ongoingGdsProcedures
Table 1. Results
Name Type Description

freeHeap

Integer

The amount of currently free memory in bytes in the Java Virtual Machine hosting the Neo4j instance.

totalHeap

Integer

The total amount of memory in bytes in the Java virtual machine hosting the Neo4j instance. This value may vary over time, depending on the host environment.

maxHeap

Integer

The maximum amount of memory in bytes that the Java virtual machine hosting the Neo4j instance will attempt to use.

jvmAvailableCpuCores

Integer

The number of logical CPU cores currently available to the Java virtual machine. This value may change vary over the lifetime of the DBMS.

availableCpuCoresNotRequested

Integer

The number of logical CPU cores currently available to the Java virtual machine that are not requested for use by currently running GDS procedures. Note that this number may be negative in case there are fewer available cores to the JVM than there are cores being requested by ongoing GDS procedures.

jvmHeapStatus

Map

The above-mentioned heap metrics in human-readable form.

ongoingGdsProcedures

List of Map

A list of maps containing resource usage and progress information for all GDS procedures (of all users) currently running on the Neo4j instance. Each map contains the name of the procedure, how far it has progressed, its estimated memory usage as well as how many CPU cores it will try to use at most.

freeHeap is influenced by ongoing GDS procedures, graphs stored in the Graph catalog and the underlying Neo4j DBMS. Stored graphs can take up a significant amount of heap memory. To inspect the graphs in the graph catalog you can use the Graph list procedure.

Example

First let us assume that we just started gds.node2vec.stream procedure with some arbitrary parameters.

We can have a look at the status of the JVM heap.

Monitor JVM heap status:
CALL gds.systemMonitor()
YIELD
  freeHeap,
  totalHeap,
  maxHeap
Table 2. Results
freeHeap totalHeap maxHeap

1234567

2345678

3456789

We can see that there currently is around 1.23 MB free heap memory in the JVM instance running our Neo4j DBMS. This may increase independently of any procedures finishing their execution as totalHeap is currently smaller than maxHeap. We can also inspect CPU core usage as well as the status of currently running GDS procedures on the system.

Monitor CPU core usage and ongoing GDS procedures:
CALL gds.systemMonitor()
YIELD
  availableCpuCoresNotRequested,
  jvmAvailableCpuCores,
  ongoingGdsProcedures
Table 3. Results
jvmAvailableCpuCores availableCpuCoresNotRequested ongoingGdsProcedures

100

84

[{ username: "bob", jobId: "42", procedure: "Node2Vec", progress: "33.33%", estimatedMemoryRange: "[123 kB …​ 234 kB]", requestedNumberOfCpuCores: "16" }]

Here we can note that there is only one GDS procedure currently running, namely the Node2Vec procedure we just started. It has finished around 33.33% of its execution already. We also see that it may use up to an estimated 234 kB of memory. Note that it may not currently be using that much memory and so it may require more memory later in its execution, thus possible lowering our current freeHeap. Apparently it wants to use up to 16 CPU cores, leaving us with a total of 84 currently available cores in the system not requested by any GDS procedures.

Memory reporting procedures

To facilitate memory management, the Neo4j GDS Library offers two memory reporting procedures, alongside the aforementioned monitoring capabilities.

Detailed memory listing

Through the gds.memory.list procedure, a user can obtain a combined list of their projected graphs and running tasks alongside their memory footprint.

For graphs, the size of the in-memory graph is returned, whereas for algorithms the memory estimation i.e., the maximum memory needs that this task is expected to require over the course of its execution.

This procedure can help a user realize which of his operations might be memory demanding and plan accordingly (e.g., drop a graph, or termination a transaction).

A user can access only his own graphs or tasks, but an admin can obtain information for all users.

Syntax

List the graphs and tasks of a user along with their memory footprint.
CALL gds.memory.list()
YIELD
  user: String,
  name: String,
  entity: String
  memoryInBytes: Integer
Table 4. Results
Name Type Description

user

String

The username

name

String

The name of the graph or running task.

entity

String

If the reporting entity is a task, this corresponds to its job id. Otherwise, it is set to "graph".

memoryInBytes

Integer

The occupying memory for a graph or the estimation of a task.

Example

List running tasks and graphs for Bob.
CALL gds.memory.list()
YIELD
  user, name, entity, memoryInBytes
Table 5. Results
user name entity memoryInBytes

"Bob"

"my-graph"

"graph"

20

"Bob"

"Node2Vec"

111-222-333

234

As we can see, Bob has projected a graph named 'my-graph' occupying 200 bytes memory, and is currently running Node2Vec which has a memory footprint of 234 bytes.

Memory Summary

The gds.memory.summary procedure provides an overview of the total graph memory, and the total memory requirements of running tasks for a given users.

A user can access only his own graphs or tasks, but an admin can obtain information for all users.

Syntax

List the graphs and tasks of a user along with their memory footprint.
CALL gds.memory.summary()
YIELD
  user: String,
  totalGraphsMemory : Intger,
  totalTasksMemory : Integer
Table 6. Results
Name Type Description

user

String

The username.

totalGraphsMemory

Integer

The total requirement for that user’s graphs.

totalTasksMemory

Integer

The total requirements of that user’s tasks.

Example

List running tasks and graphs for Bob.
CALL gds.memory.summary()
YIELD
  user, totalGraphsMemory, totalTasksMemory
Table 7. Results
user totalGraphsMemory totalTasksMemory

"Bob"

20

234