4.1. Graph Catalog

This section details the graph catalog operations available to manage named graph projections within the Neo4j Graph Data Science library.

Graph algorithms run on a graph data model which is a projection of the Neo4j property graph data model. A graph projection can be seen as a view over the stored graph, containing only analytically relevant, potentially aggregated, topological and property information. Graph projections are stored entirely in-memory using compressed data structures optimized for topology and property lookup operations.

The graph catalog is a concept within the GDS library that allows managing multiple graph projections by name. Using its name, a created graph can be used many times in the analytical workflow. Named graphs can be created using either a Native projection or a Cypher projection. After usage, named graphs can be removed from the catalog to free up main memory.

Graphs can also be created when running an algorithm without placing them in the catalog. We refer to such graphs as anonymous graphs.
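
As a minimal sketch of an anonymous graph, the projection can be passed inline as a configuration map instead of a graph name. The Person label and LIKES relationship type are assumptions borrowed from the examples later in this section, and the nodeProjection and relationshipProjection keys assume the GDS 1.x configuration syntax.

```cypher
// Anonymous graph: the projection is defined inline and is not stored in the catalog.
// 'Person' and 'LIKES' are assumed to exist in the database.
CALL gds.pageRank.stream({
    nodeProjection: 'Person',
    relationshipProjection: 'LIKES'
})
YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC
LIMIT 10;
```

Because the projection is not stored in the catalog, it has to be repeated on every algorithm call, which is why named graphs are usually preferable for multi-step workflows.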

The graph catalog exists as long as the Neo4j instance is running. When Neo4j is restarted, graphs stored in the catalog are lost and need to be re-created.

This chapter explains the available graph catalog operations.

Name                                    Description
gds.graph.create                        Creates a graph in the catalog using a Native projection.
gds.graph.create.cypher                 Creates a graph in the catalog using a Cypher projection.
gds.graph.list                          Prints information about graphs that are currently stored in the catalog.
gds.graph.exists                        Checks if a named graph is stored in the catalog.
gds.graph.removeNodeProperties          Removes node properties from a named graph.
gds.graph.deleteRelationships           Deletes relationships of a given relationship type from a named graph.
gds.graph.drop                          Drops a named graph from the catalog.
gds.graph.streamNodeProperty            Streams a single node property stored in a named graph.
gds.graph.streamNodeProperties          Streams node properties stored in a named graph.
gds.graph.streamRelationshipProperty    Streams a single relationship property stored in a named graph.
gds.graph.streamRelationshipProperties  Streams relationship properties stored in a named graph.
gds.graph.writeNodeProperties           Writes node properties stored in a named graph to Neo4j.
gds.graph.writeRelationship             Writes relationships stored in a named graph to Neo4j.
gds.graph.export                        Exports a named graph into a new offline Neo4j database.

Creating, using, listing, and dropping named graphs are management operations bound to a Neo4j user. Graphs created by a different Neo4j user are not accessible at any time.

4.1.1. Creating graphs in the catalog

A projected graph can be stored in the catalog under a user-defined name. Using that name, the graph can be referred to by any algorithm in the library. This allows multiple algorithms to use the same graph without having to re-create it on each algorithm run.

There are two variants of projecting a graph from the Neo4j database into main memory:

  • Native projection

    • Provides the best performance by reading from the Neo4j store files. Recommended to be used during both the development and the production phase.
  • Cypher projection

    • The more flexible, expressive approach with lesser focus on performance. Recommended to be primarily used during the development phase.

It is also possible to generate a random graph; see the Graph Generation documentation for more details.

In this section, we will give brief examples on how to create a graph using either variant. For detailed information about the configuration of each variant, we refer to the dedicated sections.

In the following two examples we show how to create two graphs, my-native-graph and my-cypher-graph, that each contain Person nodes and LIKES relationships.

Create a graph using a native projection: 

CALL gds.graph.create(
    'my-native-graph',
    'Person',
    'LIKES'
)
YIELD graphName, nodeCount, relationshipCount, createMillis;

We can also use Cypher to select the nodes and relationships to be projected into the in-memory graph.

Create a graph using a Cypher projection: 

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:Person) RETURN id(n) AS id',
    'MATCH (a:Person)-[:LIKES]->(b:Person) RETURN id(a) AS source, id(b) AS target'
)
YIELD graphName, nodeCount, relationshipCount, createMillis;

After creating the graphs in the catalog, we can refer to them in algorithms by using their name.

Run PageRank on one of our created graphs: 

CALL gds.pageRank.stream('my-native-graph') YIELD nodeId, score;
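
To illustrate reuse, a second algorithm can be run against the same named graph without projecting it again. The following sketch assumes that my-native-graph from the example above is still in the catalog; the choice of Weakly Connected Components is only for illustration.

```cypher
// Reuse the named graph for a second algorithm; no new projection is needed.
CALL gds.wcc.stream('my-native-graph')
YIELD nodeId, componentId
// Aggregate to one row per component with its number of nodes.
RETURN componentId, count(nodeId) AS size
ORDER BY size DESC;
```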

4.1.2. Listing graphs in the catalog

Once we have created graphs in the catalog, we can list information about either all of them or a single graph using its name.

List information about all graphs in the catalog: 

CALL gds.graph.list()
YIELD graphName, nodeProjection, relationshipProjection, nodeQuery, relationshipQuery,
      nodeCount, relationshipCount, schema, degreeDistribution, creationTime, modificationTime;

List information about a named graph in the catalog: 

CALL gds.graph.list('my-native-graph')
YIELD graphName, nodeProjection, relationshipProjection, nodeQuery, relationshipQuery,
      nodeCount, relationshipCount, schema, degreeDistribution, creationTime, modificationTime, sizeInBytes, memoryUsage;

The nodeProjection and relationshipProjection columns are primarily applicable to Native projection. The nodeQuery and relationshipQuery columns are applicable only to Cypher projection and are null for graphs created with Native projection.

The degreeDistribution is more time-consuming to compute than the other return columns. It is however only computed when included in the YIELD subclause.

The schema consists of information about the nodes and relationships stored in the graph. For each node label, the schema maps the label to its property keys and their corresponding property types. Similarly, the schema maps the relationship types to their property keys and property types. The property type is either Integer or Float.

The creationTime indicates when the graph was created in memory. The modificationTime indicates when the graph was updated by an algorithm running in mutate mode. The sizeInBytes yields the number of bytes used in the Java Heap to store that graph. The memoryUsage is the same information in a human readable format.

List information about the degree distribution of a specific graph: 

CALL gds.graph.list('my-cypher-graph')
YIELD graphName, degreeDistribution;
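
For a quick overview of the catalog, it can be enough to yield only the inexpensive columns. The sketch below leaves degreeDistribution out of the YIELD subclause, so that it is not computed, and sorts the graphs by creation time.

```cypher
// Overview of all graphs in the catalog, newest first.
// degreeDistribution is deliberately not yielded, so it is not computed.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, creationTime
RETURN graphName, nodeCount, relationshipCount, creationTime
ORDER BY creationTime DESC;
```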

4.1.3. Check if a graph exists in the catalog

We can check if a graph is stored in the catalog by looking up its name.

Check if a graph exists in the catalog: 

CALL gds.graph.exists('my-store-graph') YIELD exists;
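
Several names can be checked in one query by unwinding a list and calling the procedure once per name. This is a sketch that assumes the procedure also yields a graphName column next to exists; the graph names are the ones used in the examples of this section.

```cypher
// Check several graph names in a single query.
UNWIND ['my-native-graph', 'my-cypher-graph', 'my-store-graph'] AS name
CALL gds.graph.exists(name)
YIELD graphName, exists
RETURN graphName, exists;
```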

4.1.4. Removing node properties from a named graph

We can remove node properties from a named graph in the catalog. This is useful to free up main memory or to remove accidentally created node properties.

Remove multiple node properties from a named graph: 

CALL gds.graph.removeNodeProperties('my-graph', ['pageRank', 'communityId'])

The above example requires all given properties to be present on at least one node projection, and the properties will be removed from all such projections.

The procedure can be configured to remove just the properties for some specific node projections. In the following example, we run an algorithm on a sub-graph and subsequently remove the newly created property.

Remove node properties of a specific node projection: 

CALL gds.graph.create('my-graph', ['A', 'B'], '*');
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'});
CALL gds.graph.removeNodeProperties('my-graph', ['componentId'], ['A']);

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be removed from all of the projections.

If any of the given projections is '*', the procedure behaves as in the first example.

4.1.5. Deleting relationship types from a named graph

We can delete all relationships of a given type from a named graph in the catalog. This is useful to free up main memory or to remove accidentally created relationship types.

Delete all relationships of type T from a named graph: 

CALL gds.graph.deleteRelationships('my-graph', 'T')
YIELD graphName, relationshipType, deletedRelationships, deletedProperties

4.1.6. Removing graphs from the catalog

Once we have finished using a named graph, we can remove it from the catalog to free up memory.

Remove a graph from the catalog: 

CALL gds.graph.drop('my-store-graph') YIELD graphName;
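
To free all memory held by the catalog, the list and drop procedures can be combined. This is a minimal sketch, not an official recipe: it drops every graph visible to the current user, so it should be used with care.

```cypher
// Drop every graph that the current user has in the catalog.
CALL gds.graph.list()
YIELD graphName AS name
CALL gds.graph.drop(name)
YIELD graphName
RETURN graphName AS droppedGraph;
```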

4.1.7. Stream node properties

We can stream node properties stored in a named in-memory graph back to the user. This is useful if we ran multiple algorithms in mutate mode and want to retrieve some or all of the results. This is similar to what the stream execution mode does, but allows more fine-grained control over the operations.

Stream multiple node properties: 

CALL gds.graph.streamNodeProperties('my-graph', ['componentId', 'pageRank', 'communityId'])

The above example requires all given properties to be present on at least one node projection, and the properties will be streamed for all such projections.

The procedure can be configured to stream just the properties for some specific node projections. In the following example, we ran an algorithm on a sub-graph and subsequently streamed the newly created property.

Stream node properties of a specific node projection: 

CALL gds.graph.create('my-graph', ['A', 'B'], '*');
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'});
CALL gds.graph.streamNodeProperties('my-graph', ['componentId'], ['A']);

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be streamed for all of the projections.

If any of the given projections is '*', the procedure behaves as in the first example.

When streaming multiple node properties, the name of each property is included in the result. This adds some overhead, as each property name must be repeated for each node in the result, but is necessary in order to distinguish properties. For streaming a single node property this is not necessary. gds.graph.streamNodeProperty() streams a single node property from the in-memory graph, and omits the property name. The result has the format nodeId, propertyValue, as is familiar from the streaming mode of many algorithm procedures.

Stream a single node property: 

CALL gds.graph.streamNodeProperty('my-graph', 'componentId')
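
The streamed nodeId can be mapped back to the stored node when the result needs to be combined with data from the database. The following sketch assumes that the componentId property was added to my-graph beforehand, for example by the WCC call shown above, and uses the gds.util.asNode function for the lookup.

```cypher
// Stream one property and resolve each node id to the stored Neo4j node.
CALL gds.graph.streamNodeProperty('my-graph', 'componentId')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId) AS node, propertyValue AS componentId
LIMIT 10;
```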

4.1.8. Stream relationship properties

We can stream relationship properties stored in a named in-memory graph back to the user. This is useful if we ran multiple algorithms in mutate mode and want to retrieve some or all of the results. This is similar to what the stream execution mode does, but allows more fine-grained control over the operations.

Stream multiple relationship properties: 

CALL gds.graph.streamRelationshipProperties('my-graph', ['similarityScore', 'weight'])

The procedure can be configured to stream just the properties for some specific relationship projections. In the following example, we ran an algorithm on a sub-graph and subsequently streamed the newly created property.

Stream relationship properties of a specific relationship projection: 

CALL gds.graph.create('my-graph', ['*'], ['A', 'B']);
CALL gds.nodeSimilarity.mutate('my-graph', {relationshipTypes: ['A'], mutateRelationshipType: 'R', mutateProperty: 'similarityScore'});
CALL gds.graph.streamRelationshipProperties('my-graph', ['similarityScore'], ['R']);

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be streamed for all of the projections.

If any of the given projections is '*', the procedure behaves as in the first example.

When streaming multiple relationship properties, the name of the relationship type and of each property is included in the result. This adds some overhead, as each type name and property name must be repeated for each relationship in the result, but is necessary in order to distinguish properties. For streaming a single relationship property, the property name can be left out. gds.graph.streamRelationshipProperty() streams a single relationship property from the in-memory graph, and omits the property name. The result has the format sourceNodeId, targetNodeId, relationshipType, propertyValue.

Stream a single relationship property: 

CALL gds.graph.streamRelationshipProperty('my-graph', 'similarityScore')
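
The result columns described above can be yielded explicitly and post-processed like any other Cypher result. This sketch assumes that my-graph contains a similarityScore relationship property and returns the ten highest-scoring relationships.

```cypher
// Stream a single relationship property and keep the ten highest values.
CALL gds.graph.streamRelationshipProperty('my-graph', 'similarityScore')
YIELD sourceNodeId, targetNodeId, relationshipType, propertyValue
RETURN sourceNodeId, targetNodeId, relationshipType, propertyValue AS similarityScore
ORDER BY similarityScore DESC
LIMIT 10;
```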

4.1.9. Write node properties to Neo4j

Similar to streaming properties stored in an in-memory graph, it is also possible to write those back to Neo4j. This is similar to what the write execution mode does, but allows more fine-grained control over the operations.

The properties to write are typically the mutateProperty values that were used when running algorithms in mutate mode. Properties that were added to the created graph at creation time will often already be present in the Neo4j database.

Write multiple node properties to Neo4j: 

CALL gds.graph.writeNodeProperties('my-graph', ['componentId', 'pageRank', 'communityId'])

The above example requires all given properties to be present on at least one node projection, and the properties will be written for all such projections.

The procedure can be configured to write just the properties for some specific node projections. In the following example, we ran an algorithm on a sub-graph and subsequently wrote the newly created property to Neo4j.

Write node properties of a specific node projection to Neo4j: 

CALL gds.graph.create('my-graph', ['A', 'B'], '*');
CALL gds.wcc.mutate('my-graph', {nodeLabels: ['A'], mutateProperty: 'componentId'});
CALL gds.graph.writeNodeProperties('my-graph', ['componentId'], ['A']);

When a list of projections that are not * is specified, as in the example above, a different validation and execution is applied. It is then required that all projections have all of the given properties, and they will be written to Neo4j for all of the projections.

If any of the given projections is '*', the procedure behaves as in the first example.
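
A typical workflow computes a result in mutate mode first and writes it back in a separate step. The sketch below assumes that my-graph exists in the catalog; the yielded column names nodePropertiesWritten and propertiesWritten are assumptions about the result columns of the two procedures.

```cypher
// Step 1: compute PageRank and keep the scores in the in-memory graph only.
CALL gds.pageRank.mutate('my-graph', {mutateProperty: 'pageRank'})
YIELD nodePropertiesWritten;

// Step 2: write the scores from the in-memory graph to the Neo4j database.
CALL gds.graph.writeNodeProperties('my-graph', ['pageRank'])
YIELD propertiesWritten;

// Step 3: count the stored nodes that now carry the property.
MATCH (n)
WHERE n.pageRank IS NOT NULL
RETURN count(n) AS nodesWithPageRank;
```

Separating the steps makes it possible to inspect or filter results, for example via the streaming procedures, before anything is written to the database.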

4.1.10. Write relationships to Neo4j

We can write relationships stored in a named in-memory graph back to Neo4j. This can be used to write algorithm results (for example from Node Similarity) or relationships that have been aggregated during graph creation.

The relationships to write are specified by a relationship type. This can either be an element identifier used in a relationship projection during graph construction or the mutateRelationshipType used in algorithms that create relationships.

Write relationships to Neo4j: 

CALL gds.graph.writeRelationship('my-graph', 'SIMILAR_TO')

By default, no relationship properties will be written. To write relationship properties, these have to be explicitly specified.

Write relationships and their properties to Neo4j: 

CALL gds.graph.writeRelationship('my-graph', 'SIMILAR_TO', 'similarityScore')
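
After the write, the relationships are regular Neo4j relationships and can be queried with plain Cypher. The following sketch assumes that the SIMILAR_TO relationships were written together with the similarityScore property, as in the example above.

```cypher
// Inspect the written relationships and their property in the database.
MATCH (a)-[r:SIMILAR_TO]->(b)
RETURN id(a) AS source, id(b) AS target, r.similarityScore AS similarityScore
ORDER BY similarityScore DESC
LIMIT 10;
```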

4.1.11. Create Neo4j databases from named graphs

We can create new Neo4j databases from named in-memory graphs stored in the graph catalog. All nodes, relationships and properties present in an in-memory graph are written to a new Neo4j database. This includes data that has been projected in gds.graph.create and data that has been added by running algorithms in mutate mode. The newly created database will be stored in the Neo4j databases directory using a given database name.

The feature is useful in the following, exemplary scenarios:

  • Avoid heavy write load on the operational system by exporting the data instead of writing back.
  • Create an analytical view of the operational system that can be used as a basis for running algorithms.
  • Produce snapshots of analytical results and persist them for archiving and inspection.
  • Share analytical results within the organization.

Export a named graph to a new database in the Neo4j databases directory: 

CALL gds.graph.export('my-graph', { dbName: 'mydatabase' })

The procedure yields information about the number of nodes, relationships and properties written.

Table 4.1. Graph export configuration
Name                     Type     Default  Optional  Description
dbName                   String   none     no        Name of the exported Neo4j database.
writeConcurrency         Integer  4        yes       The number of concurrent threads used for writing the database.
enableDebugLog           Boolean  false    yes       Prints debug information to Neo4j log files.
batchSize                Integer  10000    yes       Number of entities processed by one single thread at a time.
defaultRelationshipType  String   "_ALL_"  yes       Relationship type used for * relationship projections.
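
The optional settings from the table can be combined in the configuration map. The values below are purely illustrative and not tuning recommendations; the relationship type name RELATED is a hypothetical choice.

```cypher
// Export with explicit settings for the optional configuration keys.
CALL gds.graph.export('my-graph', {
    dbName: 'mydatabase',
    writeConcurrency: 2,                // fewer writer threads than the default of 4
    batchSize: 5000,                    // entities handled by one thread at a time
    enableDebugLog: true,               // print debug information to the Neo4j logs
    defaultRelationshipType: 'RELATED'  // hypothetical type name for * projections
});
```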

The new database can be started using database management commands.

The database must not exist when the export procedure is run. After the export, it needs to be created manually using the following commands.

After running the procedure, we can start a new database and query the exported graph: 

:use system
CREATE DATABASE mydatabase;
:use mydatabase
MATCH (n) RETURN n;