Creating graphs using Cypher

This section details projecting GDS graphs using Cypher projections.

A projected graph can be stored in the catalog under a user-defined name. Using that name, the graph can be referred to by any algorithm in the library. This allows multiple algorithms to use the same graph without having to re-create it on each algorithm run.

Cypher projections are more flexible and expressive than native projections, but they trade away performance for that flexibility. They are therefore recommended primarily for the development phase (see Common usage).

A random graph can also be generated; see the Graph Generation documentation for more details.

The projected graph will reside in the catalog until:

  • the graph is dropped using gds.graph.drop

  • the Neo4j database from which the graph was projected is stopped or dropped

  • the Neo4j database management system is stopped.
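
As a rough sketch of managing that lifetime by hand, the following checks whether a graph is in the catalog and then drops it. The graph name 'my-graph' is a placeholder, not a graph created in this section. Note that gds.graph.drop is expected to raise an error if no graph with that name exists, and that running both statements at once requires a client that accepts multiple statements.

```cypher
// Check whether a graph named 'my-graph' (placeholder name) is in the catalog.
CALL gds.graph.exists('my-graph')
YIELD graphName, exists
RETURN graphName, exists;

// Remove the graph from the catalog.
CALL gds.graph.drop('my-graph')
YIELD graphName
RETURN graphName;
```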

1. Syntax

A Cypher projection takes three mandatory arguments: graphName, nodeQuery and relationshipQuery. In addition, the optional configuration parameter allows us to further configure graph creation.

CALL gds.graph.create.cypher(
    graphName: String,
    nodeQuery: String,
    relationshipQuery: String,
    configuration: Map
) YIELD
    graphName: String,
    nodeQuery: String,
    nodeCount: Integer,
    relationshipQuery: String,
    relationshipCount: Integer,
    createMillis: Integer
Table 1. Parameters
| Name | Optional | Description |
| --- | --- | --- |
| graphName | no | The name under which the graph is stored in the catalog. |
| nodeQuery | no | Cypher query to project nodes. The query result must contain an id column. Optionally, a labels column can be specified to represent node labels. Additional columns are interpreted as properties. |
| relationshipQuery | no | Cypher query to project relationships. The query result must contain source and target columns. Optionally, a type column can be specified to represent relationship type. Additional columns are interpreted as properties. |
| configuration | yes | Additional parameters to configure the Cypher projection. |

Table 2. Configuration
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| readConcurrency | Integer | 4 | The number of concurrent threads used for creating the graph. |
| validateRelationships | Boolean | true | Whether to throw an error if the relationshipQuery returns relationships between nodes not returned by the nodeQuery. |
| parameters | Map | {} | A map of user-defined query parameters that are passed into the node and relationship queries. |
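
A minimal sketch of passing the configuration map, assuming a database with Person nodes and KNOWS and READ relationships such as the example graph used in the Examples section. The graph name 'personsLenient' is made up for illustration. Because the relationship query also returns READ relationships whose target nodes the node query does not return, validateRelationships is set to false so that the projection does not fail with an error.

```cypher
CALL gds.graph.create.cypher(
  'personsLenient',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target',
  {
    readConcurrency: 2,            // use two threads instead of the default of four
    validateRelationships: false   // do not fail on relationships whose end nodes were not projected
  }
)
YIELD graphName, nodeCount, relationshipCount
```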

Table 3. Results
| Name | Type | Description |
| --- | --- | --- |
| graphName | String | The name under which the graph is stored in the catalog. |
| nodeQuery | String | The Cypher query used to project the nodes in the graph. |
| nodeCount | Integer | The number of nodes stored in the projected graph. |
| relationshipQuery | String | The Cypher query used to project the relationships in the graph. |
| relationshipCount | Integer | The number of relationships stored in the projected graph. |
| createMillis | Integer | Milliseconds for creating the graph. |

To get information about a stored graph, such as its schema, one can use gds.graph.list.
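
As a sketch, the call below inspects a single graph by name; 'my-graph' is a placeholder for whatever name the graph was stored under. Calling gds.graph.list() without an argument lists every graph in the catalog instead.

```cypher
// Show size and schema information for one stored graph.
CALL gds.graph.list('my-graph')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
```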

2. Examples

In order to demonstrate the GDS Graph Create capabilities we are going to create a small social network graph in Neo4j. The example graph looks like this:

[Figure: Visualization of the example graph]

The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (florentin:Person { name: 'Florentin', age: 16 }),
  (adam:Person { name: 'Adam', age: 18 }),
  (veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
  (hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
  (frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),

  (florentin)-[:KNOWS { since: 2010 }]->(adam),
  (florentin)-[:KNOWS { since: 2018 }]->(veselin),
  (florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
  (florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
  (adam)-[:READ { numberOfPages: 30 }]->(hobbit),
  (veselin)-[:READ]->(frankenstein)

2.1. Simple graph

A simple graph is a graph with only one node label and one relationship type, i.e., a monopartite graph. We start by demonstrating how to load a simple graph by projecting only the Person node label and the KNOWS relationship type.

Project Person nodes and KNOWS relationships:
CALL gds.graph.create.cypher(
  'persons',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target')
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
Table 4. Results
| graph | nodeQuery | nodes | relationshipQuery | rels |
| --- | --- | --- | --- | --- |
| "persons" | "MATCH (n:Person) RETURN id(n) AS id" | 3 | "MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) AS source, id(m) AS target" | 2 |
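
Once projected, the graph can be handed to an algorithm by its catalog name. As a sketch, the following streams PageRank scores for the persons graph; PageRank is picked only for illustration and is not prescribed by this section.

```cypher
// Run an algorithm against the stored graph by referring to its name.
CALL gds.pageRank.stream('persons')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
```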

2.2. Multi-graph

A multi-graph is a graph with multiple node labels and relationship types.

To retain the label and type information when we load multiple node labels and relationship types, we can add a labels column to the node query and a type column to the relationship query.

Project Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.create.cypher(
  'personsAndBooks',
  'MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type')
YIELD
  graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels
Table 5. Results
| graph | nodeQuery | nodes | rels |
| --- | --- | --- | --- |
| "personsAndBooks" | "MATCH (n) WHERE n:Person OR n:Book RETURN id(n) AS id, labels(n) AS labels" | 5 | 6 |

2.3. Relationship orientation

The native projection supports specifying an orientation per relationship type. The Cypher projection will treat every relationship returned by the relationship query as if it were in NATURAL orientation. It is thus not possible to project graphs in UNDIRECTED or REVERSE orientation when Cypher projections are used.

Some algorithms require that the graph was loaded with UNDIRECTED orientation. These algorithms cannot be used with a graph created by a Cypher projection.
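
Although the orientation cannot be configured, the direction of each projected relationship follows from which node id the relationship query returns as source and which as target. A sketch, assuming the example graph above and a made-up graph name 'knownBy': swapping the two columns stores each KNOWS relationship pointing the other way. The projected graph is still treated as NATURAL, so this does not help with algorithms that require UNDIRECTED orientation.

```cypher
CALL gds.graph.create.cypher(
  'knownBy',
  'MATCH (n:Person) RETURN id(n) AS id',
  // source and target are swapped relative to the stored KNOWS direction
  'MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(m) AS source, id(n) AS target'
)
YIELD graphName, nodeCount AS nodes, relationshipCount AS rels
```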

2.4. Node properties

To load node properties, we add a column to the result of the node query for each property. We use the Cypher function coalesce() to specify the default value that is used if a node does not have the property.

Project Person and Book nodes with their properties, and KNOWS and READ relationships:
CALL gds.graph.create.cypher(
  'graphWithProperties',
  'MATCH (n)
   WHERE n:Book OR n:Person
   RETURN
    id(n) AS id,
    labels(n) AS labels,
    coalesce(n.age, 18) AS age,
    coalesce(n.price, 5.0) AS price,
    n.ratings AS ratings',
  'MATCH (n)-[r:KNOWS|READ]->(m) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
Table 6. Results
| graphName | nodes | rels |
| --- | --- | --- |
| "graphWithProperties" | 5 | 6 |

The projected graphWithProperties graph contains five nodes and six relationships. In a Cypher projection every node from the nodeQuery gets the same node properties, which means you can’t have label-specific properties. For instance, in the example above the Person nodes will also get ratings and price properties, while the Book nodes get the age property.

Further, the price property has a default value of 5.0. Not every book has a price specified in the example graph. In the following we check if the price was correctly projected:

Verify the price property of the Book nodes in the projected graph:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY price
Table 7. Results
| name | price |
| --- | --- |
| "The Hobbit" | 5.0 |
| "Frankenstein" | 19.99 |

We can see that the price was projected, with The Hobbit receiving the default price of 5.0.
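
The same check can be applied to the earlier statement that every node receives every property. As a sketch, the following reads the price property for the Person nodes, which have no price in Neo4j. Given coalesce(n.price, 5.0) in the node query, each of them should report the default value.

```cypher
// Person nodes carry no price in Neo4j, so the projected value comes from coalesce().
MATCH (n:Person)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY name
```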

2.5. Relationship properties

Analogous to node properties, we can project relationship properties using the relationshipQuery.

Project Person and Book nodes and READ relationships with numberOfPages property:
CALL gds.graph.create.cypher(
  'readWithProperties',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
Table 8. Results
| graph | nodes | rels |
| --- | --- | --- |
| "readWithProperties" | 5 | 4 |

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.streamRelationshipProperty('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC
Table 9. Results
| person | book | numberOfPages |
| --- | --- | --- |
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |
| "Veselin" | "Frankenstein" | NaN |

We can see that the numberOfPages property is loaded. The default property value is Double.NaN; it can be changed by using the Cypher function coalesce(), as in the previous Node properties example.
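
A sketch of that change, assuming the example graph above: coalesce() replaces a missing numberOfPages with 0 so that no relationship falls back to NaN. The graph name 'readWithDefault' and the default of 0 are illustrative choices.

```cypher
CALL gds.graph.create.cypher(
  'readWithDefault',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  // coalesce() supplies 0 for READ relationships without a numberOfPages property
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, coalesce(r.numberOfPages, 0) AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```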

2.6. Parallel relationships

The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.

The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query. Alternatively, we can aggregate the parallel relationship by using the count() function and store the count as a relationship property.

Project Person and Book nodes and COUNT aggregated READ relationships:
CALL gds.graph.create.cypher(
  'readCount',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, count(r) AS numberOfReads'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
Table 10. Results
| graph | nodes | rels |
| --- | --- | --- |
| "readCount" | 5 | 3 |

Next, we will verify that the READ relationships were correctly aggregated.

Stream the relationship property numberOfReads of the projected graph:
CALL gds.graph.streamRelationshipProperty('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfReads
ORDER BY numberOfReads DESC, person
Table 11. Results
| person | book | numberOfReads |
| --- | --- | --- |
| "Florentin" | "The Hobbit" | 2.0 |
| "Adam" | "The Hobbit" | 1.0 |
| "Veselin" | "Frankenstein" | 1.0 |

We can see that the two READ relationships between Florentin and The Hobbit result in a numberOfReads value of 2.
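
The DISTINCT approach mentioned at the start of this section can be sketched as follows, assuming the example graph above; the graph name 'readDistinct' is made up. No relationship property is returned, since DISTINCT only collapses rows whose returned columns are all equal.

```cypher
CALL gds.graph.create.cypher(
  'readDistinct',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  // DISTINCT keeps one row per (source, target, type) combination
  'MATCH (n)-[r:READ]->(m)
    RETURN DISTINCT id(n) AS source, id(m) AS target, type(r) AS type'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```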

2.7. Parallel relationships with properties

For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.

Project Person and Book nodes and aggregated READ relationships by summing the numberOfPages:
CALL gds.graph.create.cypher(
  'readSums',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, sum(r.numberOfPages) AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
Table 12. Results
| graph | nodes | rels |
| --- | --- | --- |
| "readSums" | 5 | 3 |

Next, we will verify that the relationship property numberOfPages was correctly aggregated.

Stream the relationship property numberOfPages of the projected graph:
CALL gds.graph.streamRelationshipProperty('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY numberOfPages DESC, person
Table 13. Results
| person | book | numberOfPages |
| --- | --- | --- |
| "Florentin" | "The Hobbit" | 46.0 |
| "Adam" | "The Hobbit" | 30.0 |
| "Veselin" | "Frankenstein" | 0.0 |

We can see that the numberOfPages values of the two READ relationships between Florentin and The Hobbit sum up to 46.
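
Other aggregating functions can be swapped in the same way. The sketch below, again assuming the example graph, keeps the largest numberOfPages per pair of nodes together with the number of aggregated relationships; the graph name 'readMax' and the property names are illustrative. Unlike sum(), max() yields null when no value is present, so such relationships would presumably fall back to the default property value.

```cypher
CALL gds.graph.create.cypher(
  'readMax',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  // one row per (source, target, type) with two aggregated relationship properties
  'MATCH (n)-[r:READ]->(m)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type,
           max(r.numberOfPages) AS maxNumberOfPages, count(r) AS numberOfReads'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```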

2.8. Projecting filtered Neo4j graphs

Cypher projections allow us to specify the graph to project in a more fine-grained way. The following example demonstrates how to filter out READ relationships that do not have a numberOfPages property.

Project Person and Book nodes and READ relationships where numberOfPages is present:
CALL gds.graph.create.cypher(
  'existingNumberOfPages',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    WHERE r.numberOfPages IS NOT NULL
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
Table 14. Results
| graph | nodes | rels |
| --- | --- | --- |
| "existingNumberOfPages" | 5 | 3 |

Next, we will verify that the relationship property numberOfPages was correctly loaded.

Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.streamRelationshipProperty('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC
Table 15. Results
| person | book | numberOfPages |
| --- | --- | --- |
| "Adam" | "The Hobbit" | 30.0 |
| "Florentin" | "The Hobbit" | 42.0 |
| "Florentin" | "The Hobbit" | 4.0 |

If we compare the results to the ones from Relationship properties, we can see that using IS NOT NULL filters out the relationship from Veselin to the book Frankenstein. With native projections, this filtering can only be expressed by creating a subgraph.

2.9. Using query parameters

As with regular Cypher queries, it is possible to set query parameters. In the following example we supply a minimum number of pages as a parameter to limit the READ relationships we want to project.

Project Person and Book nodes and READ relationships where numberOfPages is greater than 9:
CALL gds.graph.create.cypher(
  'readsAboveMinimum',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:READ]->(m)
    WHERE r.numberOfPages > $minNumberOfPages
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
  { parameters: { minNumberOfPages: 9 } }
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
Table 16. Results
| graph | nodes | rels |
| --- | --- | --- |
| "readsAboveMinimum" | 5 | 2 |
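
Parameters are not limited to single values. As a sketch, the following passes a list of relationship type names and keeps only relationships of those types; the parameter name types and the graph name 'relationshipsByType' are made up for illustration.

```cypher
CALL gds.graph.create.cypher(
  'relationshipsByType',
  'MATCH (n) RETURN id(n) AS id, labels(n) AS labels',
  // $types is resolved from the parameters map in the configuration
  'MATCH (n)-[r]->(m)
    WHERE type(r) IN $types
    RETURN id(n) AS source, id(m) AS target, type(r) AS type',
  { parameters: { types: ['KNOWS', 'READ'] } }
)
YIELD
  graphName AS graph, nodeCount AS nodes, relationshipCount AS rels
```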

2.10. Further usage of parameters

The parameters can also be used to directly pass in a list of nodes or a list of relationships. For example, pre-computing the list of nodes can be useful if the node filter is expensive.

Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships:
CALL gds.graph.create.cypher(
  'personSubset',
  'MATCH (n)
    WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
    RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS]->(m)
    WHERE (n.age < 20 AND NOT n.name STARTS WITH "V") AND
          (m.age < 20 AND NOT m.name STARTS WITH "V")
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages'
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
Table 17. Results
| graphName | nodes | rels |
| --- | --- | --- |
| "personSubset" | 2 | 1 |

By passing the relevant Persons as a parameter, the above query can be transformed into the following:

Project Person nodes younger than 20 whose name does not begin with V, and KNOWS relationships, by using parameters:
MATCH (n)
WHERE n.age < 20 AND NOT n.name STARTS WITH "V"
WITH collect(n) AS youngerPersons
CALL gds.graph.create.cypher(
  'personSubsetViaParameters',
  'UNWIND $nodes AS n RETURN id(n) AS id, labels(n) AS labels',
  'MATCH (n)-[r:KNOWS]->(m)
    WHERE (n IN $nodes) AND (m IN $nodes)
    RETURN id(n) AS source, id(m) AS target, type(r) AS type, r.numberOfPages AS numberOfPages',
  { parameters: { nodes: youngerPersons } }
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
Table 18. Results
| graphName | nodes | rels |
| --- | --- | --- |
| "personSubsetViaParameters" | 2 | 1 |
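
A list of relationships can be passed in the same way. The following is a sketch under the assumption that relationship values behave like the node values above when passed as a parameter; the graph name 'recentKnows', the parameter name rels and the 2015 cut-off are illustrative.

```cypher
// Pre-compute the relationships of interest once, outside the projection.
MATCH (:Person)-[r:KNOWS]->(:Person)
WHERE r.since >= 2015
WITH collect(r) AS recentKnows
CALL gds.graph.create.cypher(
  'recentKnows',
  'MATCH (n:Person) RETURN id(n) AS id',
  // startNode() and endNode() recover the two end nodes of each passed relationship
  'UNWIND $rels AS r
    RETURN id(startNode(r)) AS source, id(endNode(r)) AS target, type(r) AS type, r.since AS since',
  { parameters: { rels: recentKnows } }
)
YIELD
  graphName, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodes, rels
```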