2.3. Named graphs

This section describes named graphs, which are stored in memory and can be referenced by a user-defined name. When Neo4j is restarted, named graphs are lost and will need to be reloaded.

As it can take some time to load large graphs into the algorithm data structures, you can pre-load graphs and then later refer to them by name when calling graph algorithm procedures. After usage, they can be removed from memory to free resources used.
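The load, use, and remove cycle described above can be sketched as follows. This is only an outline that reuses the procedures covered later in this section; the graph name, label, and relationship type are placeholders, and all algorithm configuration other than the graph key is omitted.

```cypher
// 1. Load the graph once and keep it in memory under a name.
CALL algo.graph.load('my-graph', 'Label', 'REL_TYPE', {graph: 'huge'})
YIELD name, nodes, loadMillis;

// 2. Run one or more algorithms against the named graph.
CALL algo.pageRank(null, null, {graph: 'my-graph'});

// 3. Remove the graph when it is no longer needed, to free memory.
CALL algo.graph.remove('my-graph')
YIELD name, removed;
```

Each statement is meant to be run separately. The loaded graph stays available between steps until it is removed or Neo4j is restarted.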

2.3.1. Loading a named graph

We can load named graphs using either a node-label and relationship-type projection or a Cypher projection.

The following will load a graph with the name my-graph, for node label Label and relationship type REL_TYPE:

CALL algo.graph.load('my-graph','Label','REL_TYPE',{graph:'huge' /*, ... other config */})
YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
      nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

If we want to load a graph based on a Cypher projection, we should specify graph:'cypher' in the config.

The following will load a named graph using Cypher projections for nodes and relationships. 

CALL algo.graph.load('my-graph',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (a)-->(b) RETURN id(a) AS source, id(b) AS target',
  {graph:'cypher' /*, ... other config */})
YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
      nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

Once we’ve loaded a named graph we can return details about it.

The following will return details about a named graph: 

CALL algo.graph.info('my-graph')
YIELD name, type, direction, exists, removed, nodes;

Besides node and relationship counts, the procedure can compute information about degree distributions, i.e. the number of relationships per node. By default, the distribution values are not computed.

The following will enable computing information about degree distributions: 

CALL algo.graph.info('my-graph', true)
YIELD name, type, direction, exists, removed, nodes, min, max, mean, p50, p75, p90, p95, p99, p999;

In order to set the direction and concurrency for the degree computation, we can provide a parameter map instead: 

CALL algo.graph.info('my-graph', {direction: 'OUTGOING', concurrency: 8 })
YIELD name, type, direction, exists, removed, nodes, min, max, mean, p50, p75, p90, p95, p99, p999;

2.3.2. Using a named graph

We can use our named graph in algorithm calls by specifying its name under the graph key of the config map.

The following will run the PageRank algorithm on the my-graph named graph: 

CALL algo.pageRank(null,null,{graph:'my-graph' /*, ... */})

2.3.3. Loading multiple relationship types

Using the algo.graph.load procedure it is possible to load more than one relationship type, as opposed to exactly one type or all types. In order to specify multiple relationship types, we can provide a relationship type description as known from Cypher MATCH pattern declarations.

The following example will load the graph my-graph with relationships that have the type REL_TYPE1, REL_TYPE2 or REL_TYPE3:

CALL algo.graph.load('my-graph', null, 'REL_TYPE1 | REL_TYPE2 | REL_TYPE3', {direction: 'OUTGOING', concurrency: 8 })

Having loaded a graph with multiple relationship types gives us several options for how we can use the relationships.

If we call an algorithm with no additional relationship type information, as described in the section above, the algorithm will use all loaded relationships.

However, it is also possible to only use a subset of the loaded relationships by specifying the requested relationship types in the relationship parameter for the algorithm.

The following example will run PageRank only on relationships of type REL_TYPE1 or REL_TYPE2:

CALL algo.pageRank(null, 'REL_TYPE1 | REL_TYPE2', {graph: 'my-graph'})

Using just a subset of the loaded relationships is currently only possible for a node-label and relationship-type projection.

Specifying a subset of relationship types in an algorithm call only works when the graph was explicitly loaded with multiple relationship types. If the graph was loaded with null, no relationship type information is retained in the graph.
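The difference between the two loading modes can be illustrated with a short sketch. The graph names typed-graph and untyped-graph are hypothetical, and the config is reduced to the graph key for brevity.

```cypher
// Loaded with an explicit list of types: type information is retained,
// so an algorithm can later be restricted to a subset of those types.
CALL algo.graph.load('typed-graph', null, 'REL_TYPE1 | REL_TYPE2', {graph: 'huge'});
CALL algo.pageRank(null, 'REL_TYPE1', {graph: 'typed-graph'});

// Loaded with null: all relationships are loaded, but without type
// information, so the algorithm is run on all loaded relationships.
CALL algo.graph.load('untyped-graph', null, null, {graph: 'huge'});
CALL algo.pageRank(null, null, {graph: 'untyped-graph'});
```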

Loading many relationship types can have a negative impact on performance, both at load time and at execution time. It is best to load only the types that are needed.

2.3.4. Deduplication of parallel relationships

Named graphs offer different ways of handling multiple (so-called "parallel") relationships between a given pair of nodes.

2.3.4.1. Node-label and relationship-type projection

By default, the Huge graph assumes that the relationship projection only contains one relationship between a pair of nodes and will simply ignore all other relationships (see skip below). In order to control the deduplication behavior we can pass the duplicateRelationships key in the config to decide what should happen with duplicates.

duplicateRelationships supports the following options:

  • none - keeps all relationships between a given pair of nodes / no deduplication.
  • skip - keeps the first encountered relationship (and associated weight).
  • sum - sums the associated weights of all encountered relationships.
  • min - keeps the minimum weight of all encountered relationships.
  • max - keeps the maximum weight of all encountered relationships.

Note that setting an explicit deduplication strategy other than none or skip will increase the relationship loading time.
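To make the options above concrete, the following plain Cypher query mimics the arithmetic of each strategy on a small, made-up list of roads. It does not call the library and is not how the loader is implemented; it only shows which weight each option would retain for a pair of nodes. The sample data and the column names are assumptions chosen for illustration.

```cypher
// Hypothetical sample data: two parallel roads A->B and a single road A->C.
UNWIND [
  {source: 'A', target: 'B', cost: 4.0},
  {source: 'A', target: 'B', cost: 7.0},
  {source: 'A', target: 'C', cost: 3.0}
] AS road
WITH road.source AS source, road.target AS target, road.cost AS cost
// Grouping by (source, target) collapses parallel relationships.
RETURN source, target,
       collect(cost)       AS noneStrategy, // every parallel weight is kept
       head(collect(cost)) AS skipStrategy, // one weight is kept (the first collected)
       sum(cost)           AS sumStrategy,  // weights are added up
       min(cost)           AS minStrategy,  // smallest weight is kept
       max(cost)           AS maxStrategy   // largest weight is kept
ORDER BY source, target;
```

For the pair A and B, the sum, min, and max columns evaluate to 11.0, 4.0, and 7.0 respectively, while the single A to C road yields 3.0 in each of them.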

The following query loads a graph of roads between locations keeping all the ROAD relationships between two Loc nodes. 

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipWeight: 'cost',
  duplicateRelationships: 'none'})

The following query loads a graph of roads between locations, keeping only the ROAD relationship with the minimal cost between each pair of Loc nodes.

CALL algo.graph.load('cheapestRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipWeight: 'cost',
  duplicateRelationships: 'min'})

2.3.4.2. Cypher projection

A Cypher projected graph will, by default, store all projected relationships without any deduplication (see none above). As for the Huge graph, we can specify a duplicateRelationships strategy.

The following runs shortest path over a graph based on Cypher projections, picking the ROAD relationship with minimum cost: 

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'cost', {
  nodeQuery:'MATCH (n:Loc) RETURN id(n) as id',
  relationshipQuery:'MATCH (n:Loc)-[r:ROAD]->(m:Loc) RETURN id(n) AS source, id(m) AS target, r.cost AS weight',
  graph: 'cypher', duplicateRelationships: 'min'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

2.3.5. Loading multiple relationship properties

In some scenarios, it is useful to load more than one relationship property. For the algo.graph.load procedure, this can be configured via the relationshipProperties parameter.

The parameter is configured using a map in which each key refers to a user-defined property key. Any algorithm that supports relationship properties can make use of these.

The value under each property key is a configuration that is applied when loading relationship properties. In the configuration we specify the Neo4j relationship property to load. Optionally, we can define an aggregation function to set the deduplication behavior (see Section 2.3.4, “Deduplication of parallel relationships”) and a default property value, which is used when the property is absent on a relationship.

For the following example, let’s assume that each ROAD relationship stores two properties: the cost (distance) and the road quality (between 1 and 10).

The following query loads all roads, deduplicates parallel relationships and aggregates them by their distance and also by their quality. 

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipProperties: {
    minDistance: {
        property: 'cost',
        aggregation: 'MIN',
        defaultValue: 1.0
    },
    maxQuality: {
        property: 'quality',
        aggregation: 'MAX',
        defaultValue: 5.0
    }
  }})

When executed, our allRoads in-memory graph stores two relationship properties: minDistance and maxQuality. We can access the loaded properties by specifying them in an algorithm configuration. Let us use algo.shortestPath again as an example weighted algorithm.

We first compute the shortest path using the minDistance property as weight to compute the path with shortest distance: 

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'minDistance', {graph: 'allRoads'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

If we are interested in the path with the best quality, we use the same graph but pass the maxQuality property instead:

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'maxQuality', {graph: 'allRoads'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

With the short-hand syntax for specifying property mappings we can skip the aggregation and defaultValue parameters. If those are omitted, the procedure uses SKIP as the default aggregation function and Double.NaN as the default property value.

The following query loads the graph and allows us to refer to the cost property via distance:

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipProperties: { distance: 'cost' }
})

Note that in this particular shortest path example, relying on the default property value is not recommended, since any ROAD relationship without a cost property would be given the weight Double.NaN.
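One way to avoid the default value is to spell the mapping out in full. The sketch below assumes that 'SKIP' is accepted as an aggregation value in the same way as 'MIN' and 'MAX' in the earlier example, that 1.0 is a sensible fallback cost for the data at hand, and that no graph named allRoads is currently loaded.

```cypher
// Explicit form of the short-hand mapping { distance: 'cost' },
// with a finite default instead of Double.NaN.
CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipProperties: {
    distance: {
      property: 'cost',
      aggregation: 'SKIP',  // assumed spelling of the default aggregation
      defaultValue: 1.0     // assumed fallback, adjust to the data
    }
  }});

// Use the user-defined key 'distance' as the weight property.
MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'distance', {graph: 'allRoads'})
YIELD nodeCount, totalCost
RETURN nodeCount, totalCost;
```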

Loading multiple relationship properties is currently only supported for node-label and relationship-type projections.

As with relationship types, loading many relationship properties can have a negative impact on performance, both at load time and at execution time. It is best to load only the properties that are needed.

2.3.6. List all named graphs

We can get an overview of all loaded named graphs.

The following will return information about all currently loaded graphs: 

CALL algo.graph.list()
YIELD name, nodes, relationships, type, direction;

The following will remove all currently loaded graphs: 

CALL algo.graph.list() YIELD name
CALL algo.graph.remove(name) YIELD removed
RETURN name, removed
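The same pattern can be narrowed to a subset of graphs by filtering on the yielded name. The following sketch assumes a naming convention in which temporary graphs share the hypothetical prefix tmp-.

```cypher
// List all loaded graphs, keep only those with the assumed 'tmp-' prefix,
// and remove each of them.
CALL algo.graph.list() YIELD name
WITH name WHERE name STARTS WITH 'tmp-'
CALL algo.graph.remove(name) YIELD removed
RETURN name, removed;
```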

2.3.7. Remove named graph

Once we’ve finished using a named graph we can remove it to free up memory.

The following will remove the my-graph named graph: 

CALL algo.graph.remove('my-graph')
YIELD name, type, exists, removed, nodes;