Graph Catalog

This section details the graph catalog operations available to manage named graph projections within the Neo4j Graph Data Science library.

Graph algorithms run on a graph data model which is a projection of the Neo4j property graph data model. A graph projection can be seen as a view over the stored graph, containing only analytically relevant, potentially aggregated, topological and property information. Graph projections are stored entirely in-memory using compressed data structures optimized for topology and property lookup operations.

The graph catalog is a concept within the GDS library that allows managing multiple graph projections by name. Using its name, a created graph can be used many times in the analytical workflow. Named graphs can be created using either a Native projection or a Cypher projection. After usage, named graphs can be removed from the catalog to free up main memory.

Graphs can also be created when running an algorithm without placing them in the catalog. We refer to such graphs as anonymous graphs.

The graph catalog exists as long as the Neo4j instance is running. When Neo4j is restarted, graphs stored in the catalog are lost and need to be re-created.

This chapter explains the available graph catalog operations.

Name Description

gds.graph.create

Creates a graph in the catalog using a Native projection.

gds.graph.create.cypher

Creates a graph in the catalog using a Cypher projection.

gds.graph.list

Prints information about graphs that are currently stored in the catalog.

gds.graph.exists

Checks if a named graph is stored in the catalog.

gds.graph.removeNodeProperties

Removes node properties from a named graph.

gds.graph.deleteRelationships

Deletes relationships of a given relationship type from a named graph.

gds.graph.drop

Drops a named graph from the catalog.

gds.graph.streamNodeProperty

Streams a single node property stored in a named graph.

gds.graph.streamNodeProperties

Streams node properties stored in a named graph.

gds.graph.streamRelationshipProperty

Streams a single relationship property stored in a named graph.

gds.graph.streamRelationshipProperties

Streams relationship properties stored in a named graph.

gds.graph.writeNodeProperties

Writes node properties stored in a named graph to Neo4j.

gds.graph.writeRelationship

Writes relationships stored in a named graph to Neo4j.

gds.graph.export

Exports a named graph into a new offline Neo4j database.

gds.beta.graph.export.csv

Exports a named graph into CSV files.

Creating, using, listing, and dropping named graphs are management operations bound to a Neo4j user. Graphs created by a different Neo4j user are not accessible at any time.

1. Creating graphs in the catalog

A projected graph can be stored in the catalog under a user-defined name. Using that name, the graph can be referred to by any algorithm in the library. This allows multiple algorithms to use the same graph without having to re-create it on each algorithm run.

There are two variants of projecting a graph from the Neo4j database into main memory:

  • Native projection

    • Provides the best performance by reading from the Neo4j store files. Recommended to be used during both the development and the production phase.

  • Cypher projection

    • The more flexible, expressive approach with lesser focus on performance. Recommended to be primarily used during the development phase.

There is also a way to generate a random graph, see Graph Generation documentation for more details.

In this section, we will give brief examples on how to create a graph using either variant. For detailed information about the configuration of each variant, we refer to the dedicated sections.

In the following two examples we show how to create a graph called my-native-graph that contains Person nodes and LIKES relationships.

Create a graph using a native projection:
CALL gds.graph.create(
    'my-native-graph',
    'Person',
    'LIKES'
)
YIELD graphName, nodeCount, relationshipCount, createMillis;

We can also use Cypher to select the nodes and relationships to be projected into the in-memory graph.

Create a graph using a Cypher projection:
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:Person) RETURN id(n) AS id',
    'MATCH (a:Person)-[:LIKES]->(b:Person) RETURN id(a) AS source, id(b) AS target'
)
YIELD graphName, nodeCount, relationshipCount, createMillis;

After creating the graphs in the catalog, we can refer to them in algorithms by using their name.

Run Page Rank on one of our created graphs:
CALL gds.pageRank.stream('my-native-graph') YIELD nodeId, score;

2. Listing graphs in the catalog

Information about graphs in the catalog can be listed using the gds.graph.list() procedure. The procedure takes an optional parameter graphName:

  • If a graph name is given, only information for that graph will be listed.

  • If no graph name is given, information about all graphs will be listed.

  • If a graph name is given but not found in the catalog, an error will be raised.

List information about graphs in the catalog:
CALL gds.graph.list(
  graphName: String?
) YIELD
  graphName,
  database,
  nodeProjection,
  relationshipProjection,
  nodeQuery,
  relationshipQuery,
  nodeCount,
  relationshipCount,
  schema,
  degreeDistribution,
  density,
  creationTime,
  modificationTime,
  sizeInBytes,
  memoryUsage;
Table 1. Results
Name Type Description

graphName

String

Name of the graph.

database

String

Name of the database in which the graph has been created.

nodeProjection

Map

Node projection used to create the graph. If a Cypher projection was used, this will be a derived node projection.

relationshipProjection

Map

Relationship projection used to create the graph. If a Cypher projection was used, this will be a derived relationship projection.

nodeQuery

String

Node query used to create the graph. If a native projection was used, this will be null.

relationshipQuery

String

Relationship query used to create the graph. If a native projection was used, this will be null.

nodeCount

Integer

Number of nodes in the graph.

relationshipCount

Integer

Number of relationships in the graph.

schema

Map

Node labels, Relationship types and properties contained in the in-memory graph.

degreeDistribution

Map

Histogram of degrees in the graph.

density

Float

Density of the graph.

creationTime

Datetime

Time when the graph was created.

modificationTime

Datetime

Time when the graph was last modified.

sizeInBytes

Integer

Number of bytes used in the Java heap to store the graph.

memoryUsage

String

Human readable description of sizeInBytes.

The information contains basic statistics about the graph, e.g., the node and relationship count. The result field creationTime indicates when the graph was created in memory. The result field modificationTime indicates when the graph was updated by an algorithm running in mutate mode.

The database column refers to the name of the database the corresponding graph has been created on. Referring to a named graph in a procedure is only allowed on the database it has been created on.

The schema consists of information about the nodes and relationships stored in the graph. For each node label, the schema maps the label to its property keys and their corresponding property types. Similarly, the schema maps the relationship types to their property keys and property types. The property type is either Integer, Float, List of Integer or List of Float.

The degreeDistribution field can be fairly time-consuming to compute for larger graphs. Its computation is cached per graph, so subsequent listing for the same graph will be fast. To avoid computing the degree distribution, specify a YIELD clause that omits it. Note that not specifying a YIELD clause is the same as requesting all possible return fields to be returned.

The density is the result of relationshipCount divided by the maximal number of relationships for a simple graph with the given nodeCount.

2.1. Examples

List basic information about all graphs in the catalog:
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, schema;
List extended information about a specific named graph in the catalog:
CALL gds.graph.list('my-cypher-graph')
YIELD graphName, nodeQuery, relationshipQuery, nodeCount, relationshipCount, schema, creationTime, modificationTime, memoryUsage;
List all information about a specific named graph in the catalog:
CALL gds.graph.list('my-native-graph')
List information about the degree distribution of a specific graph:
CALL gds.graph.list('my-cypher-graph')
YIELD graphName, degreeDistribution;

3. Check if a graph exists in the catalog

We can check if a graph is stored in the catalog by looking up its name.

Check if a graph exists in the catalog:
CALL gds.graph.exists('my-store-graph') YIELD exists;

4. Removing node properties from a named graph

We can remove node properties from a named graph in the catalog. This is useful to free up main memory or to remove accidentally created node properties.

Remove multiple node properties from a named graph:
CALL gds.graph.removeNodeProperties('my-graph', ['pageRank', 'communityId'])

The above example requires all given properties to be present on at least one node projection, and the properties will be removed from all such projections.

The procedure can be configured to remove just the properties for some specific node projections. In the following example, we ran an algorithm on a sub-graph and subsequently remove the newly created property.

Remove node properties of a specific node projection:
CALL gds.graph.create('my-graph', ['A', 'B'], '*')
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'})
CALL gds.graph.removeNodeProperties('my-graph', ['componentId'], ['A'])

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied; It is then required that all projections have all of the given properties, and they will be removed from all of the projections.

If any of the given projections is '*', the procedure behaves like in the first example.

5. Deleting relationship types from a named graph

We can delete all relationships of a given type from a named graph in the catalog. This is useful to free up main memory or to remove accidentally created relationship types.

Delete all relationships of type T from a named graph:
CALL gds.graph.deleteRelationships('my-graph', 'T')
YIELD graphName, relationshipType, deletedRelationships, deletedProperties

6. Removing graphs from the catalog

Once we have finished using the named graph we can remove it from the catalog to free up memory.

Remove a graph from the catalog:
CALL gds.graph.drop('my-store-graph') YIELD graphName;

If we want the procedure to fail silently on non-existing graphs, we can set a boolean flag as the second parameter to false. This will yield an empty result for non-existing graphs.

Try removing a graph from the catalog:
CALL gds.graph.drop('my-fictive-graph', false) YIELD graphName;

7. Stream node properties

We can stream node properties stored in a named in-memory graph back to the user. This is useful if we ran multiple algorithms in mutate mode and want to retrieve some or all of the results. This is similar to what the stream execution mode does, but allows more fine-grained control over the operations.

Stream multiple node properties:
CALL gds.graph.streamNodeProperties('my-graph', ['componentId', 'pageRank', 'communityId'])

The above example requires all given properties to be present on at least one node projection, and the properties will be streamed for all such projections.

The procedure can be configured to stream just the properties for some specific node projections. In the following example, we ran an algorithm on a sub-graph and subsequently streamed the newly created property.

Stream node properties of a specific node projection:
CALL gds.graph.create('my-graph', ['A', 'B'], '*')
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'})
CALL gds.graph.streamNodeProperties('my-graph', ['componentId'], ['A'])

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be streamed for all of the projections.

If any of the given projections is '*', the procedure behaves like in the first example.

When streaming multiple node properties, the name of each property is included in the result. This adds with some overhead, as each property name must be repeated for each node in the result, but is necessary in order to distinguish properties. For streaming a single node property this is not necessary. gds.graph.streamNodeProperty() streams a single node property from the in-memory graph, and omits the property name. The result has the format nodeId, propertyValue, as is familiar from the streaming mode of many algorithm procedures.

Stream a single node property:
CALL gds.graph.streamNodeProperty('my-graph', 'componentId')

8. Stream relationship properties

We can stream relationship properties stored in a named in-memory graph back to the user. This is useful if we ran multiple algorithms in mutate mode and want to retrieve some or all of the results. This is similar to what the stream execution mode does, but allows more fine-grained control over the operations.

Stream multiple relationship properties:
CALL gds.graph.streamRelationshipProperties('my-graph', ['similarityScore', 'weight'])

The procedure can be configured to stream just the properties for some specific relationship projections. In the following example, we ran an algorithm on a sub-graph and subsequently streamed the newly created property.

Stream relationship properties of a specific relationship projection:
CALL gds.graph.create('my-graph', ['*'], [A', 'B'])
CALL gds.nodeSimiliarity.mutate('my-graph', {relationshipTypes: ['A'], mutateRelationshipType: 'R', mutateProperty: 'similarityScore'})
CALL gds.graph.streamNodeProperties('my-graph', ['similarityScore'], ['R'])

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be streamed for all of the projections.

If any of the given projections is '*', the procedure behaves like in the first example.

When streaming multiple relationship properties, the name of the relationship type and of each property is included in the result. This adds with some overhead, as each type name and property name must be repeated for each relationship in the result, but is necessary in order to distinguish properties. For streaming a single relationship property, the property name can be left out. gds.graph.streamNodeProperty() streams a single relationship property from the in-memory graph, and omits the property name. The result has the format sourceNodeId, targetNodeId, relationshipType, propertyValue.

Stream a single relationship property:
CALL gds.graph.streamRelationshipProperty('my-graph', 'similarityScore')

9. Write node properties to Neo4j

Similar to streaming properties stored in an in-memory graph it is also possible to write those back to Neo4j. This is similar to what the write execution mode does, but allows more fine-grained control over the operations.

The properties to write are typically the mutateProperty values that were used when running algorithms. Properties that were added to the created graph at creation time will often already be present in the Neo4j database.

9.1. Syntax

Write node properties to Neo4j:
CALL gds.graph.writeNodeProperties(
  graphName: String,
  nodeProperties: List<String>,
  nodeLabels: List<String>,
  configuration: Map
) YIELD
  graphName: String,
  nodeProperties: List<String>,
  writeMillis: Integer,
  propertiesWritten: Integer
Table 2. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

nodeProperties

List<String>

n/a

no

Names of properties to write.

nodeLabels

List<String>

['*']

yes

Names of labels to write properties for.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 3. Configuration parameters
Name Type Default Optional Description

concurrency

Integer

4

yes

The number of concurrent threads used for writing the properties to Neo4j.

writeConcurrency

Integer

value of 'concurrency'

yes

The number of concurrent threads used for writing the properties to Neo4j. If both writeConcurrency and concurrency are specified, writeConcurrency will be used.

Table 4. Results
Name Type Description

graphName

String

Name of the graph.

nodeProperties

List<String>

Names of written properties.

writeMillis

Integer

Milliseconds for writing properties to Neo4j.

propertiesWritten

Integer

Number of properties written.

9.2. Examples

To write the properties 'componentId', 'pageRank', 'communityId' for all node projections in the graph 'my-graph' using 8 concurrent threads, use the following query:

Write multiple node properties to Neo4j:
CALL gds.graph.writeNodeProperties(
  'my-graph',
  ['componentId', 'pageRank', 'communityId'],
  ['*'],
  {writeConcurrency: 8}
)

The above example requires all given properties to be present on at least one node projection, and the properties will be written for all such projections.

The procedure can be configured to write just the properties for some specific node projections. In the following example, we run an algorithm on a sub-graph and subsequently write the newly created property to Neo4j.

Write node properties of a specific node projection to Neo4j:
CALL gds.graph.create('my-graph', ['A', 'B'], '*')
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'})
CALL gds.graph.writeNodeProperties('my-graph', ['componentId'], ['A'])

When a list of projections not including the star projection ('*') is specified, as in the example above, a different validation and execution is applied. In this case, it is required that all projections have all of the given properties, and they will be written to Neo4j for all of the projections.

If any of the given projections is the star projection, the procedure behaves like in the first example.

10. Write relationships to Neo4j

We can write relationships stored in a named in-memory graph back to Neo4j. This can be used to write algorithm results (for example from Node Similarity) or relationships that have been aggregated during graph creation.

The relationships to write are specified by a relationship type. This can either be an element identifier used in a relationship projection during graph construction or the writeRelationshipType used in algorithms that create relationships. Relationships are always written using a single thread.

Write relationships to Neo4j:
CALL gds.graph.writeRelationship('my-graph', 'SIMILAR_TO')

By default, no relationship properties will be written. To write relationship properties, these have to be explicitly specified.

Write relationships and their properties to Neo4j:
CALL gds.graph.writeRelationship('my-graph', 'SIMILAR_TO', 'similarityScore')

11. Create Neo4j databases from named graphs

We can create new Neo4j databases from named in-memory graphs stored in the graph catalog. All nodes, relationships and properties present in an in-memory graph are written to a new Neo4j database. This includes data that has been projected in gds.graph.create and data that has been added by running algorithms in mutate mode. The newly created database will be stored in the Neo4j databases directory using a given database name.

The feature is useful in the following, exemplary scenarios:

  • Avoid heavy write load on the operational system by exporting the data instead of writing back.

  • Create an analytical view of the operational system that can be used as a basis for running algorithms.

  • Produce snapshots of analytical results and persistent them for archiving and inspection.

  • Share analytical results within the organization.

Export a named graph to a new database in the Neo4j databases directory:
CALL gds.graph.export('my-graph', { dbName: 'mydatabase' })

The procedure yields information about the number of nodes, relationships and properties written.

Table 5. Graph export configuration
Name Type Default Optional Description

dbName

String

none

No

Name of the exported Neo4j database.

writeConcurrency

Boolean

4

yes

The number of concurrent threads used for writing the database.

enableDebugLog

Boolean

false

yes

Prints debug information to Neo4j log files.

batchSize

Integer

10000

yes

Number of entities processed by one single thread at a time.

defaultRelationshipType

String

"_ALL_"

yes

Relationship type used for * relationship projections.

The new database can be started using databases management commands.

The database must not exist when using the export procedure, it needs to be created manually using the following commands.

After running the procedure, we can start a new database and query the exported graph:
:use system
CREATE DATABASE mydatabase;
:use mydatabase
MATCH (n) RETURN n;

12. Export a named graph to CSV

We can export named in-memory graphs stored in the graph catalog to a set of CSV files. All nodes, relationships and properties present in an in-memory graph are exported. This includes data that has been projected with gds.graph.create and data that has been added by running algorithms in mutate mode. The location of the exported CSV files can be configured via the configuration parameter gds.export.location in the neo4j.conf. All files will be stored in a subfolder using the specified export name. The export will fail if a folder with the given export name already exists.

The gds.export.location parameter must be configured for this feature.

Export a named graph to a set of CSV files:
CALL gds.beta.graph.export.csv('my-graph', {exportName: 'myExport'})

The procedure yields information about the number of nodes, relationships and properties written.

Table 6. Graph export configuration
Name Type Default Optional Description

exportName

String

none

No

Name of the folder to which the CSV files are exported.

writeConcurrency

Boolean

4

yes

The number of concurrent threads used for writing the database.

defaultRelationshipType

String

__ALL__

yes

Relationship type used for * relationship projections.

12.1. Export format

The format of the exported CSV files is based on the format that is supported by the Neo4j Admin import command.

12.1.1. Nodes

Nodes are exported into files grouped by the nodes labels, i.e., for every label combination that exists in the graph a set of export files is created. The naming schema of the exported files is: nodes_LABELS_INDEX.csv, where:

  • LABELS is the ordered list of labels joined by _.

  • INDEX is a number between 0 and concurrency.

For each label combination one or more data files are created, as each exporter thread exports into a separate file.

Additionally, each label combination produces a single header file, which contains a single line describing the columns in the data files More information about the header files can be found here: CSV header format.

For example a Graph with the node combinations :A, :B and :A:B might create the following files

nodes_A_header.csv
nodes_A_0.csv
nodes_B_header.csv
nodes_B_0.csv
nodes_B_2.csv
nodes_A_B_header.csv
nodes_A_B_0.csv
nodes_A_B_1.csv
nodes_A_B_2.csv

12.1.2. Relationships

The format of the relationship files is similar to those of the nodes. Relationships are exported into files grouped by the relationship type. The naming schema of the exported files is: relationships_TYPE_INDEX.csv, where:

  • TYPE is the relationship type

  • INDEX is a number between 0 and concurrency.

For each relationship type one or more data files are created, as each exporter thread exports into a separate file.

Additionally, each relationship type produces a single header file, which contains a single line describing the columns in the data files.

For example a Graph with the relationship types :KNOWS, :LIVES_IN might create the following files

relationships_KNOWS_header.csv
relationships_KNOWS_0.csv
relationships_LIVES_IN_header.csv
relationships_LIVES_IN_0.csv
relationships_LIVES_IN_2.csv

12.2. Estimation

Using the gds.graph.export.csv.estimate procedure it is possible to estimate the required disk space of the exported CSV files. The estimation uses sampling to generate a more accurate estiamte.

Estimate the required disk space for exporting a named graph to CSV files.:
CALL gds.beta.graph.export.csv.export('my-graph', {exportName: 'myExport'})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory;

The procedure yields information about the required disk space.

Table 7. Graph export estimate configuration
Name Type Default Optional Description

exportName

String

none

No

Name of the folder to which the CSV files are exported.

samplingFactor

Double

0.001

yes

The fraction of nodes and relationships to sample for the estimation.

writeConcurrency

Boolean

4

yes

The number of concurrent threads used for writing the database.

defaultRelationshipType

String

__ALL__

yes

Relationship type used for * relationship projections.