2.3. Named graphs

This section describes named graphs, which are stored in memory and can be referenced by a user-defined name. When Neo4j is restarted, named graphs are lost and will need to be reloaded.

As it can take some time to load large graphs into the algorithm data structures, you can pre-load graphs and then later refer to them by name when calling graph algorithm procedures. After usage, they can be removed from memory to free resources used.

2.3.1. Visibility of named graphs

Loading, using, listing, and removing named graphs are user-related management operations. Graphs created by a different Neo4j user are not accessible at any time.

2.3.2. Loading a named graph

We can load named graphs using either a node-label and relationship-type or a Cypher projection.

The following will load a graph with the name 'my-graph', for node label Label and relationship type REL_TYPE

CALL algo.graph.load('my-graph', 'Label', 'REL_TYPE', {graph: 'huge' /*, ... other config */})
YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
      nodeProperties, relationshipProperties, relationshipWeight, loadNodes, loadRelationships;

If we want to load a graph based on a Cypher projection, we should specify graph:'cypher' in the config.

The following will load a named graph using Cypher projections for nodes and relationships. 

CALL algo.graph.load('my-graph',
  'MATCH (n) RETURN id(n) AS id',
  'MATCH (a)-->(b) RETURN id(a) AS source, id(b) AS target',
  {graph:'cypher' /*, ... other config */})
YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
      nodeProperties, relationshipProperties, relationshipWeight, loadNodes, loadRelationships;

Once we’ve loaded a named graph we can return details about it.

The following will return details about a named graph: 

CALL algo.graph.info('my-graph')
YIELD name, type, direction, exists, removed, nodes;

Besides node and relationship counts, the procedure can compute information about degree distributions, i.e. the number of relationships per node. By default, the distribution values are not computed.

The following will enable computing information about degree distributions: 

CALL algo.graph.info('my-graph', true)
YIELD name, type, direction, exists, removed, nodes, min, max, mean, p50, p75, p90, p95, p99, p999;

In order to set the direction and concurrency for the degree computation, we can provide a parameter map instead: 

CALL algo.graph.info('my-graph', {direction: 'OUTGOING', concurrency: 8 })
YIELD name, type, direction, exists, removed, nodes, min, max, mean, p50, p75, p90, p95, p99, p999;

2.3.3. Using a named graph

We can use our named graph in queries by specifying its name in the graph key of config.

The following will run the PageRank algorithm on the my-graph named graph: 

CALL algo.pageRank(null, null, {graph: 'my-graph' /*, ... */})

2.3.4. Loading multiple relationship types and node labels

Using the algo.graph.load procedure it is possible to specify multiple relationship types and node labels. The loaded graph will retain the relationship type information. A graph loaded with multiple relationship types supports filtering subgraphs based on these types. Node label information is not retained in the loaded graph.

Graphs loaded with an empty relationship projection, or a Cypher relationship projection query, do not retain information about relationship types.

The following example will load the graph my-graph with relationships that have the type REL_TYPE1, REL_TYPE2 or REL_TYPE3

CALL algo.graph.load('my-graph', null, 'REL_TYPE1 | REL_TYPE2 | REL_TYPE3', {direction: 'OUTGOING', concurrency: 8 })

Having loaded the graph with multiple relationship types we can run an algorithm on a filtered subgraph based on these types. To run an algorithm over a subset of the graph we specify one or more of the loaded relationship types in the relationship parameter for the algorithm. If the relationship parameter is empty, the whole graph is used.

The following example will run PageRank only on relationships of type REL_TYPE1 or REL_TYPE2

CALL algo.pageRank(null, 'REL_TYPE1 | REL_TYPE2', {graph: 'my-graph'})

The same syntax used to load multiple relationship types can also be used to load multiple labels.

The following example will load a graph my-graph with nodes that have either the Person or Instrument label: 

CALL algo.graph.load('my-graph', 'Person | Instrument', null, {direction: 'OUTGOING', concurrency: 8 })

Unlike multiple relationship types, the node label information is not retained in the loaded graph.

2.3.5. Deduplication of parallel relationships

Named graphs offer different ways of handling multiple - so called "parallel" - relationships between a given pair of nodes.

2.3.5.1. Node-label and relationship-type projection

By default, the Huge graph assumes that the relationship projection only contains one relationship between a pair of nodes and will simply ignore all other relationships (see skip below). In order to control the deduplication behavior we can pass the duplicateRelationships key in the config to decide what should happen with duplicates.

duplicateRelationships supports the following options:

  • none - keeps all relationships between a given pair of nodes / no deduplication.
  • skip - keeps the first encountered relationship (and associated weight).
  • sum - sums the associated weights of all encountered relationships.
  • min - keeps the minimum weight of all encountered relationships.
  • max - keeps the maximum weight of all encountered relationships.

Note that setting an explict deduplication strategy, other then none or skip will increase the relationship loading time.

The following query loads a graph of roads between locations keeping all the ROAD relationships between two Loc nodes. 

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipWeight: 'cost',
  duplicateRelationships: 'none'})

The following query loads a graph of roads between locations keeping only those ROAD relationships with the minimal cost. 

CALL algo.graph.load('cheapestRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipWeight: 'cost',
  duplicateRelationships: 'sum'})

2.3.5.2. Cypher projection

A Cypher projected graph will, by default, store all projected relationships without any deduplication (see none below). As for the Huge graph, we can specify a duplicateRelationships strategy.

The following runs shortest path over a graph based on Cypher projections, picking the ROAD relationship with minimum cost: 

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'cost', {
  nodeQuery:'MATCH (n:Loc) RETURN id(n) as id',
  relationshipQuery:'MATCH (n:Loc)-[r:ROAD]->(m:Loc) RETURN id(n) AS source, id(m) AS target, r.cost AS weight',
  {graph: 'cypher', duplicateRelationships: 'min'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

2.3.6. Loading multiple node properties

It is often useful to load an in-memory graph with more than one node property. A typical scenario is running different weighted algorithms on the same graph, but with different node properties as weight.

For the load.graph procedure, loading multiple node properties can be configured via the nodeProperties parameter. The parameter is configured using a map in which each key refers to a user-defined property key. Any algorithm that supports node properties, for example for node weights or seed values, can refer to these user-defined property keys.

The value under each property key is a configuration, that is applied when loading node properties. In the configuration we specify the Neo4j node property to load.

For the following example, let’s assume that each City node stores two properties: the population of the city and an optional stateId that identifies the state in which the city is located.

The following query loads all cities, including the two properties, since not all cities have a stateId, we set the defaultValue to 0

CALL algo.graph.load('cities', 'City', '', {
  graph: 'huge',
  nodeProperties: {
    population: {
        property: 'population'
    },
    seedValue: {
        property: 'stateId',
        defaultValue: 0
    }
  }
})

We can refer to the loaded properties in each algorithm that supports reading node properties. For a path search algorithm, one could use the population as node weight whereas a clustering algorithm could use the stateId as seed value.

We can also use the Cypher projection to load multiple node properties. Here, the specified Neo4j node property must appear in the RETURN clause of the node query. If a property is not present on a node in Neo4j, the given default value is used instead.

The following query also loads all cities including their population and stateId properties. 

CALL algo.graph.load('cities',
  'MATCH (c:City) RETURN id(c) AS id, c.population AS population, c.stateId AS stateId',
  'MATCH (a:City)-->(b:City) RETURN id(a) AS sourceId, id(b) AS targetId', {
    graph: 'cypher',
    nodeProperties: {
      population: {
          property: 'population'
      },
      seedValue: {
          property: 'stateId',
          defaultValue: 0
      }
  }
})

If we just want to refer to the Neo4j node property key, we can use the following shorthand syntax: 

CALL algo.graph.load('cities', 'City', '', {
  graph: 'huge',
  nodeProperties: {
    population: 'population',
    seedValue: 'stateId'
  }
})

We can also use the nodeProperties parameter to load a single node property: 

CALL algo.graph.load('cities', 'City', '', {
  graph: 'huge',
  nodeProperties: 'population'
})

2.3.7. Loading multiple relationship properties

Similar to node properties, the load.graph procedure also supports loading multiple relationship properties. Those can be configured via the relationshipProperties parameter.

As for nodes, the parameter is configured using a map in which each key refers to a user-defined property key. In addition to the Neo4j relationship property and an optional default value, we can define an aggregation function to set the deduplication behavior and a default property value which is used for absent property values (see Section 2.3.5, “Deduplication of parallel relationships”).

For the following example, let’s assume that each ROAD relationship stores two properties: the cost (distance) and the road quality (between 1 and 10).

The following query loads all roads, deduplicates parallel relationships and aggregates them by their distance and also by their quality. 

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipProperties: {
    minDistance: {
        property: 'cost',
        aggregation: 'MIN',
        defaultValue: 1.0
    },
    maxQuality: {
        property: 'quality',
        aggregation: 'MAX',
        defaultValue: 5.0
    }
  }
})

When executed, our allRoads in-memory graph stores two relationship properties: minDistance and maxQuality. We can access the loaded properties by specifying them in an algorithm configuration. Let us use algo.shortestPath again as an example weighted algorithm.

We first compute the shortest path using the minDistance property as weight to compute the path with shortest distance: 

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'minDistance', {graph: 'allRoads'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

We use the same graph, but the maxQuality property if we are interested in the path with the best quality: 

MATCH (start:Loc {name: 'A'}), (end:Loc {name: 'F'})
CALL algo.shortestPath(start, end, 'maxQuality', {graph: 'allRoads'})
YIELD writeMillis, loadMillis, nodeCount, totalCost
RETURN writeMillis, loadMillis, nodeCount, totalCost

With the short-hand syntax for specifying property mappings we can skip the aggregation and defaultWeight parameters. If those are omitted, the procedure uses SKIP as default aggregation function and Double.NaN as default property value.

The following query loads the graph and allows us to refer to the cost property via distance

CALL algo.graph.load('allRoads', 'Loc', 'ROAD', {
  graph: 'huge',
  relationshipProperties: { distance: 'cost' }
})

Note that in this particular shortest path example, using the default property value is not recommended.

Loading multiple relationship properties is currently only supported for node-label and relationship-type projections.

As with relationship types, loading a lot of multiple relationship properties can have a negative impact on performace, both during load and execution time. It is best to only load as few properties as needed.

2.3.8. List all named graphs

We can get an overview over all loaded named graphs.

The following will return information about all currently loaded graphs: 

CALL algo.graph.list()
YIELD name, nodes, relationships, type, direction;

The following will remove all currently loaded graphs: 

CALL algo.graph.list() YIELD name
CALL algo.graph.remove(name) YIELD removed
RETURN name, removed

2.3.9. Remove named graph

Once we’ve finished using the named graph we can remove them to free up memory.

The following will remove the my-graph named graph: 

CALL algo.graph.remove('my-graph')
YIELD name, type, exists, removed, nodes;