Cypher projection

A Cypher projection creates an in-memory graph from the context of a Cypher query. With a Cypher projection, you can read data from one or more Neo4j databases, load local or remote files, or create data on the fly.

A Cypher projection has two main parts:

One or more clauses to construct a set of nodes or source-target node pairs.
A call to the gds.graph.project function.

See the Examples section for some common projection patterns.

Cypher projections are more flexible and expressive than native projections.

Considerations

Lifecycle

Projected graphs reside in memory (in the graph catalog) until any of the following happens:

The graph is dropped with the gds.graph.drop procedure.
The Neo4j database from which the graph was projected is stopped or dropped.
The Neo4j DBMS is stopped.

Node property support

Cypher projections can only project a limited set of node property types from a Cypher query. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a Cypher projection.

Selection of node properties and labels

If a node occurs multiple times, the node properties and labels of the first occurrence will be used for the projection. This is important when a node can be a source node as well as a target node and their configuration differs. Relevant configuration options are sourceNodeProperties, targetNodeProperties, sourceNodeLabels and targetNodeLabels.

Parallel Cypher Runtime

Cypher projection is compatible with the parallel runtime, which can be used to speed up the execution of the projection. The achieved speedup depends on how well the query can be parallelized. Note that the parallel runtime is only available on Neo4j Enterprise Edition and since version 5.13.

Syntax

A Cypher projection is an aggregation function over the relationships that are being projected; as such, it returns an object containing information on the projected graph.

The projection function takes two mandatory arguments, graphName and sourceNode. The third parameter is targetNode and is usually provided. The parameter is optional and can be null to project an unconnected node. The next and fourth optional dataConfig parameter can be used to project node properties and labels as well as relationship properties and type. The last and fifth optional configuration parameter can be used for general configuration of the projection such as readConcurrency.

RETURN gds.graph.project(
    graphName: String,
    sourceNode: Node or Integer,
    targetNode: Node or Integer,
    dataConfig: Map,
    configuration: Map
) YIELD
    graphName: String,
    nodeCount: Integer,
    relationshipCount: Integer,
    projectMillis: Integer,
    query: String,
    configuration: Map

Table 1. Parameters
Name	Optional	Description
graphName	no	The name under which the graph is stored in the catalog.
sourceNode	no	The source node of the relationship. Must not be null.
targetNode	yes	The target node of the relationship. The targetNode can be null (for example due to an `OPTIONAL MATCH`), in which case the source node is projected as an unconnected node.
dataConfig	yes	Properties and labels configuration for the source and target nodes as well as properties and type configuration for the relationship.
configuration	yes	Additional parameters to configure the projection.

Table 2. Data configuration
Name	Type	Default	Description
sourceNodeProperties	Map	{}	The properties of the source node.
targetNodeProperties	Map	{}	The properties of the target node.
sourceNodeLabels	List of String or String	[]	The label(s) of the source node.
targetNodeLabels	List of String or String	[]	The label(s) of the target node.
relationshipProperties	Map	{}	The properties of the relationship.
relationshipType	String	'*'	The type of the relationship.

Table 3. Configuration
Name	Type	Default	Optional	Description
readConcurrency	Integer	4 ^[1]	yes	The number of concurrent threads used for creating the graph.
undirectedRelationshipTypes	List of String	[]	yes	Declare a number of relationship types as undirected. Relationships with the specified types will be imported as undirected. `*` can be used to declare all relationship types as undirected.
inverseIndexedRelationshipTypes	List of String	[]	yes	Declare a number of relationship types which will also be indexed in inverse direction. `*` can be used to declare all relationship types as inverse indexed.
memory Aura Graph Analytics Serverless	String	-	no ^[2]	Declare the memory used for the GDS Session created for the projected graph.
ttl Aura Graph Analytics Serverless	Duration	PT1H	yes	Declare the time to live before the GDS Session created for the projected graph expires due to inactivity.
batchSize Aura Graph Analytics Serverless	Integer	10000	yes	Size of batches transmitted from the DBMS to the session. Lower the value to reduce the memory usage on the DBMS side. Increase the value to send fewer batches to the GDS Session.
1. In a GDS Session the default is the number of available processors 2. Only required for Aura Graph Analytics Serverless

Table 4. Results
Name	Type	Description
graphName	String	The name under which the graph is stored in the catalog.
nodeCount	Integer	The number of nodes stored in the projected graph.
relationshipCount	Integer	The number of relationships stored in the projected graph.
projectMillis	Integer	Milliseconds for projecting the graph.
query	String	The query used for this projection.
configuration	Integer	The configuration used for this projection.

To get information about a stored graph, such as its schema, one can use gds.graph.list.

Examples

All the examples below should be run in an empty database.

In order to demonstrate the GDS Cypher Aggregation we are going to create a small social network graph in Neo4j. The example graph looks like this:

The following Cypher statement will create the example graph in the Neo4j database:

CREATE
  (florentin:Person { name: 'Florentin', age: 16 }),
  (adam:Person { name: 'Adam', age: 18 }),
  (veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
  (hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
  (frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),

  (florentin)-[:KNOWS { since: 2010 }]->(adam),
  (florentin)-[:KNOWS { since: 2018 }]->(veselin),
  (florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
  (florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
  (adam)-[:READ { numberOfPages: 30 }]->(hobbit),
  (veselin)-[:READ]->(frankenstein)

Simple graph

A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph. We are going to start with demonstrating how to load a simple graph by projecting only the Person node label and KNOWS relationship type.

Project Person nodes and KNOWS relationships:

MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.graph.project('persons', source, target) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 5. Results
graph	nodes	rels
"persons"	3	`2`

Graph with unconnected nodes

In order to project nodes that are not connected, we can use an OPTIONAL MATCH. To demonstrate we are projecting all nodes, where some might be connected with the KNOWS relationship type.

Project all nodes and KNOWS relationships:

MATCH (source) OPTIONAL MATCH (source)-[r:KNOWS]->(target)
WITH gds.graph.project('persons', source, target) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 6. Results
graph	nodes	rels
"persons"	5	`2`

Using the parallel runtime

Cypher projection is compatible with the parallel runtime.

Project using the parallel runtime:

CYPHER runtime=parallel
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.graph.project('persons', source, target) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 7. Results
graph	nodes	rels
"persons"	3	`2`

Arbitrary source and target ID values

So far, the examples showed how to project a graph based on existing nodes. It is also possible to pass INTEGER values directly.

Project arbitrary id values:

UNWIND [ [42, 84], [13, 37], [19, 84] ] AS sourceAndTarget
WITH sourceAndTarget[0] AS source, sourceAndTarget[1] AS target
WITH gds.graph.project('arbitrary', source, target) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 8. Results
graph	nodes	rels
"arbitrary"	5	`3`

The projected graph can no longer connect to projected nodes to existing nodes in the underlying database. As such, .write procedures cannot be executed on this graph.

Multi-graph

A multi-graph is a graph with multiple node labels and relationship types.

To retain the label when we load multiple node labels, we can add a sourceNodeLabels key and a targetNodeLabels key to the fourth dataConfig parameter. — To retain the type information when we load multiple relationship types, we can add a relationshipType key to the fourth dataConfig parameter.

Project Person and Book nodes and KNOWS and READ relationships:

MATCH (source)
WHERE source:Person OR source:Book
OPTIONAL MATCH (source)-[r:KNOWS|READ]->(target)
WHERE target:Person OR target:Book
WITH gds.graph.project(
  'personsAndBooks',
  source,
  target,
  {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(r)
  }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 9. Results
graph	nodes	rels
"personsAndBooks"	`5`	`6`

The value for sourceNodeLabels or targetNodeLabels can be one of the following:

Table 10. Values for `sourceNodeLabels` and `targetNodeLabels`
type	example	description
List of String	`labels(s)` or `['A', 'B']`	Associate all labels in that list with the source or target node.
String	`'A'`	Associate that label with the source or target node.
Boolean	`true`	Associate all labels of the source or target node; same as `labels(s)`.
Boolean	`false`	Don’t load any label information for the source or target node; same as if `nodeLabels` was missing.

The value for relationshipType must be a String:

Table 11. Values for `relationshipType`
type	example	description
String	`type(r)` or `'A'`	Associate that type with the relationship.

Relationship orientation

The native projection supports specifying an orientation per relationship type. The Cypher Aggregation will treat every relationship returned by the relationship query as if it was in NATURAL orientation by default.

Reverse relationships

The orientation of a relationship can be reversed by switching the source and target nodes.

Project Person and Book nodes and KNOWS and READ relationships:

MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
  'graphWithReverseRelationships',
  target,
  source
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 12. Results
graph	nodes	rels
"graphWithReverseRelationships"	5	6

Undirected relationships

Relationships can be projected as undirected by specifying the undirectedRelationshipTypes parameter.

Project Person and Book nodes and KNOWS and READ relationships:

MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
  'graphWithUndirectedRelationships',
  source,
  target,
  {},
  {undirectedRelationshipTypes: ['*']}
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 13. Results
graph	nodes	rels
"graphWithUndirectedRelationships"	5	12

Add both natural and reverse relationships

To add both a relationship with its natural and with its reverse orientation, you can use a UNION clause within a subquery.

Project Person nodes and KNOWS and KNOWN_BY relationships:

MATCH (source:Person)-[:KNOWS]->(target:Person)
CALL {
  WITH source, target
  RETURN id(source) AS sourceId, id(target) AS targetId, 'KNOWS' AS rType
  UNION
  WITH source, target
  RETURN id(target) AS sourceId, id(source) AS targetId, 'KNOWN_BY' AS rType
}
WITH gds.graph.project(
  'graphWithNaturalAndReverseRelationships',
  sourceId,
  targetId,
  {
    sourceNodeLabels: 'Person',
    targetNodeLabels: 'Person',
    relationshipType: rType
  }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 14. Results
graph	nodes	rels
"graphWithNaturalAndReverseRelationships"	3	4

Node properties

To load node properties, we add a map of all properties for the source and target nodes. Thereby, we use the Cypher function coalesce() function to specify the default value, if the node does not have the property.

The properties for the source node are specified as sourceNodeProperties key in the fourth dataConfig parameter. The properties for the target node are specified as targetNodeProperties key in the fourth dataConfig parameter.

Project Person and Book nodes and KNOWS and READ relationships:

MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.graph.project(
  'graphWithProperties',
  source,
  target,
  {
    sourceNodeProperties: source { age: coalesce(source.age, 18), price: coalesce(source.price, 5.0), .ratings },
    targetNodeProperties: target { age: coalesce(target.age, 18), price: coalesce(target.price, 5.0), .ratings }
  }
) as g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 15. Results
graph	nodes	rels
"graphWithProperties"	5	6

The projected graphWithProperties graph contains five nodes and six relationships. In a Cypher Aggregation every node will get the same properties, which means you can’t have node-specific properties. For instance in the example above the Person nodes will also get ratings and price properties, while Book nodes get the age property.

Further, the price property has a default value of 5.0. Not every book has a price specified in the example graph. In the following we check if the price was correctly projected:

Verify the ratings property of Adam in the projected graph:

MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', n, 'price') AS price
ORDER BY price

Table 16. Results
name	price
"The Hobbit"	5.0
"Frankenstein"	19.99

We can see, that the price was projected with the Hobbit having the default price of 5.0.

Relationship properties

Analogous to node properties, we can project relationship properties using the fourth parameter.

Project Person and Book nodes and READ relationships with numberOfPages property:

MATCH (source)-[r:READ]->(target)
WITH gds.graph.project(
  'readWithProperties',
  source,
  target,
  { relationshipProperties: r { .numberOfPages } }
) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 17. Results
graph	nodes	rels
"readWithProperties"	5	4

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:

CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC

Table 18. Results
person	book	numberOfPages
"Adam"	"The Hobbit"	30.0
"Florentin"	"The Hobbit"	42.0
"Florentin"	"The Hobbit"	4.0
"Veselin"	"Frankenstein"	NaN

We can see, that the numberOfPages are loaded. The default property value is Double.Nan and can be changed as in the previous example Node properties by using the Cypher function coalesce().

Parallel relationships

The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.

The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query. Alternatively, we can aggregate the parallel relationship by using the count() function and store the count as a relationship property.

Project Person and Book nodes and COUNT aggregated READ relationships:

MATCH (source)-[r:READ]->(target)
WITH source, target, count(r) AS numberOfReads
WITH gds.graph.project('readCount', source, target, { relationshipProperties: { numberOfReads: numberOfReads } }) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 19. Results
graph	nodes	rels
"readCount"	5	3

Next, we will verify that the READ relationships were correctly aggregated.

Stream the relationship property numberOfReads of the projected graph:

CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfReads
ORDER BY numberOfReads DESC, person

Table 20. Results
person	book	numberOfReads
"Florentin"	"The Hobbit"	2.0
"Adam"	"The Hobbit"	1.0
"Veselin"	"Frankenstein"	1.0

We can see, that the two READ relationships between Florentin and the Hobbit result in 2 numberOfReads.

Parallel relationships with properties

For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.

Project Person and Book nodes and aggregated READ relationships by summing the numberOfPages:

MATCH (source)-[r:READ]->(target)
WITH source, target, sum(r.numberOfPages) AS numberOfPages
WITH gds.graph.project('readSums', source, target, { relationshipProperties: { numberOfPages: numberOfPages } }) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 21. Results
graph	nodes	rels
"readSums"	5	3

Next, we will verify that the relationship property numberOfPages were correctly aggregated.

Stream the relationship property numberOfPages of the projected graph:

CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY numberOfPages DESC, person

Table 22. Results
person	book	numberOfPages
"Florentin"	"The Hobbit"	46.0
"Adam"	"The Hobbit"	30.0
"Veselin"	"Frankenstein"	0.0

We can see, that the two READ relationships between Florentin and the Hobbit sum up to 46 numberOfPages.

Projecting filtered Neo4j graphs

Cypher-projections allow us to specify the graph to project in a more fine-grained way. The following examples will demonstrate how to filter out READ relationships if they do not have a numberOfPages property.

Project Person and Book nodes and READ relationships where numberOfPages is present:

MATCH (source) OPTIONAL MATCH (source)-[r:READ]->(target)
WHERE r.numberOfPages IS NOT NULL
WITH gds.graph.project('existingNumberOfPages', source, target, { relationshipProperties: r { .numberOfPages } }) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels

Table 23. Results
graph	nodes	rels
"existingNumberOfPages"	5	3

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:

CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC

Table 24. Results
person	book	numberOfPages
"Adam"	"The Hobbit"	30.0
"Florentin"	"The Hobbit"	42.0
"Florentin"	"The Hobbit"	4.0

If we compare the results to the ones from Relationship properties, we can see that using IS NOT NULL is filtering out the relationship from Veselin to the book Frankenstein. This functionality is only expressible with native projections by projecting a subgraph.