4.3. Cypher projection

This chapter explains how to create a graph using a Cypher projection.

If the Native projection is not expressive enough to describe the in-memory graph, we can instead use Cypher queries to select nodes and relationships. One benefit of using Cypher queries is the possibility to form the graph from data that exists only at query time. A common use case is the reduction of paths into single relationships between the start and end node of the path.

The following query reduces a 2-hop path to a single relationship effectively representing co-authors: 

MATCH (p1:Author)-[:WROTE]->(a:Article)<-[:WROTE]-(p2:Author)
RETURN id(p1) AS source, id(p2) AS target, count(a) AS weight
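
As a rough sketch of how this query might be used, it could be passed as the relationship query of a Cypher projection (the syntax is described in Section 4.3.1). The graph name 'co-authors' is hypothetical, and the node query assumes that only Author nodes are needed in the projected graph.

```cypher
// Hypothetical graph name; the node query keeps only Author nodes.
// The extra weight column would be loaded as a relationship property.
CALL gds.graph.create.cypher(
    'co-authors',
    'MATCH (p:Author) RETURN id(p) AS id',
    'MATCH (p1:Author)-[:WROTE]->(a:Article)<-[:WROTE]-(p2:Author) RETURN id(p1) AS source, id(p2) AS target, count(a) AS weight'
)
```

Because the pattern is symmetric, each pair of co-authors is returned once per direction.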

Cypher projections are especially useful during the development phase. Their flexibility is convenient when exploring data and algorithms, and when designing a workflow. However, creating a graph from a Cypher projection can be significantly slower than creating it directly from the Neo4j store files. For production, it is recommended to adapt the domain model so that it can take advantage of the loading speed of native projections.

During graph creation, GDS normally estimates the required memory automatically and may block execution based on that estimate. For Cypher projections, this feature is off by default because the memory consumption might be overestimated.
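
Since the automatic check is off by default for Cypher projections, one option is to request an estimate explicitly before creating the graph. The sketch below assumes that the installed GDS version provides the gds.graph.create.cypher.estimate procedure and that it yields the columns shown. The two queries are the whole-graph examples from Section 4.3.2.

```cypher
// Estimate the memory needed for a projection without creating the graph.
CALL gds.graph.create.cypher.estimate(
    'MATCH (n) RETURN id(n) AS id',
    'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target'
)
YIELD requiredMemory, nodeCount, relationshipCount
```

As noted above, the reported value may overestimate what the projection actually needs.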

4.3.1. Syntax

A Cypher projection takes three mandatory arguments: graphName, nodeQuery and relationshipQuery. In addition, the optional configuration parameter allows us to further configure graph creation.

CALL gds.graph.create.cypher(
    graphName: String,
    nodeQuery: String,
    relationshipQuery: String,
    configuration: Map
)

Table 4.4. Parameters

Name              | Optional | Description
graphName         | no       | The name under which the graph is stored in the catalog.
nodeQuery         | no       | Cypher query to project nodes.
relationshipQuery | no       | Cypher query to project relationships.
configuration     | yes      | Additional parameters to configure the Cypher projection.

Table 4.5. Configuration

Name                  | Type    | Default   | Description
readConcurrency       | Integer | 4         | The number of concurrent threads used for creating the graph.
validateRelationships | Boolean | true      | Whether to throw an error if relationships contain nodes not included in the nodeQuery.
parameters            | Map     | empty map | A map of user-defined query parameters that are passed into the node and relationship query.
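
To illustrate the optional configuration map, the following sketch lowers readConcurrency from its default. The value 2 is an arbitrary choice for illustration, and the YIELD columns are assumed to be available in the installed GDS version.

```cypher
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n) RETURN id(n) AS id',
    'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target',
    {
        // use 2 threads instead of the default 4 (illustrative value)
        readConcurrency: 2
    }
)
YIELD graphName, nodeCount, relationshipCount, createMillis
```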

To get information about a stored named graph, including its schema, one can use gds.graph.list.
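
A minimal sketch of such a call, assuming a graph named 'my-cypher-graph' has already been created and that the listed result columns exist in the installed GDS version:

```cypher
// Inspect one named graph in the catalog, including its schema.
CALL gds.graph.list('my-cypher-graph')
YIELD graphName, nodeCount, relationshipCount, schema
```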

4.3.2. Query constraints

The node query projects nodes and optionally their properties to an in-memory graph. Each row in the query result represents a node in the projected graph.

The query result must contain a column called id. The value in that column is used to uniquely identify the node.

Simple example of a node query used for Cypher projection. 

MATCH (n) RETURN id(n) AS id

The relationship query projects relationships and optionally their type and properties to an in-memory graph. Each row in the query result represents a relationship in the projected graph.

The query result must contain a column called source and a column called target. The values in those columns represent the source node id and the target node id of the relationship. The values are used to connect the relationships to the nodes selected by the node query. If either the source or the target value cannot be mapped to a node, the relationship is not projected. Note that with validateRelationships set to true (the default, see Table 4.5), such a relationship causes an error instead.

Simple example of a relationship query used for Cypher projection. 

MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target

Using both example queries in a Cypher projection, we can project the whole Neo4j graph into an in-memory graph and store it in the catalog:

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n) RETURN id(n) AS id',
    'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target'
)
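
The examples in this chapter reuse the name 'my-cypher-graph'. Creating a graph under a name that already exists in the catalog fails, so when working through several examples it may be necessary to remove the previous graph first. A sketch, assuming the graph currently exists:

```cypher
// Remove the graph from the catalog so the name can be reused.
CALL gds.graph.drop('my-cypher-graph')
YIELD graphName
```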

Cypher projections allow creating graphs from arbitrary query results, regardless of whether these map to actual identifiers in the Neo4j graph. Executing an algorithm on such a graph in write mode may lead to unexpected changes in the Neo4j database.
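
The relationship query constraint described earlier in this section interacts with the validateRelationships option from Table 4.5. The sketch below sets it to false, assuming that relationships whose endpoints are missing from the node query are then skipped, as described above, rather than raising an error. The City label, the ROAD type, the population property and the threshold are borrowed from later examples or chosen for illustration.

```cypher
// The node query keeps only larger cities, while the relationship query
// may return roads that lead to cities outside that selection.
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) WHERE n.population > 100000 RETURN id(n) AS id',
    'MATCH (n:City)-[:ROAD]->(m:City) RETURN id(n) AS source, id(m) AS target',
    { validateRelationships: false }
)
```

An alternative is to repeat the node filter in the relationship query, so that no unmapped endpoints are returned in the first place.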

4.3.3. Node and relationship properties

Similar to the default native projection, we can load node and relationship properties using a Cypher projection.

Both node and relationship queries must return their respective mandatory columns, i.e., id, source and target. If a query returns additional columns, those columns are used as node and relationship properties, respectively.

The values stored in property columns need to be numeric. If a value is null, a default value (Double.NaN) is loaded instead. To use a different default value, use the coalesce function.
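
As an illustration of both points, the following node query sketch converts a property that is assumed to be stored as a string and supplies an explicit default. The property name populationText is hypothetical.

```cypher
MATCH (n:City)
RETURN
    id(n) AS id,
    // toFloat returns null if the string cannot be parsed,
    // so coalesce supplies 0.0 instead of the Double.NaN default
    coalesce(toFloat(n.populationText), 0.0) AS population
```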

The following Cypher projection loads multiple node and relationship properties:

Projecting query columns into node and relationship properties. 

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id, n.stateId AS community, n.population AS population',
    'MATCH (n:City)-[r:ROAD]->(m:City) RETURN id(n) AS source, id(m) AS target, r.distance AS distance, coalesce(r.condition, 1.0) AS quality'
)

The projected properties can be referred to by any algorithm that uses properties as input, for example, Label Propagation.

CALL gds.labelPropagation.stream(
    'my-cypher-graph', {
        seedProperty: 'community',
        relationshipWeightProperty: 'quality'
    }
)
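
To read the streamed result, the call can be extended with YIELD and RETURN. This is a sketch that assumes the City nodes carry a name property and that the stream mode yields nodeId and communityId.

```cypher
CALL gds.labelPropagation.stream(
    'my-cypher-graph', {
        seedProperty: 'community',
        relationshipWeightProperty: 'quality'
    }
)
YIELD nodeId, communityId
// map the internal node id back to the Neo4j node to read its name
RETURN gds.util.asNode(nodeId).name AS city, communityId
ORDER BY communityId
```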

4.3.4. Node labels

Native projections support specifying multiple node labels, which can be filtered in an individual algorithm execution. Cypher projections can achieve the same by returning the node labels in the node query. If a column called labels is present in the node query result, we use the values in that column to distinguish node labels. This column is expected to return a list of strings.

Consider the following example where Author nodes are connected by WROTE relationships to either Article or Book nodes.

Using the labels column to distinguish between node labels. 

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n:Author)-[r:WROTE]->(m) RETURN id(n) AS source, id(m) AS target'
)

The created graph will be composed of nodes labeled with either :Book, :Article, or :Author. This allows us to apply a node filter during algorithm execution:

Using a node filter to run the algorithm on a subgraph. 

CALL gds.labelPropagation.stream(
    'my-cypher-graph', {
        nodeLabels: ['Author', 'Book']
    }
)

4.3.5. Relationship types

The native projection supports loading multiple relationship types, which can be filtered in an individual algorithm execution. The Cypher projection can achieve the same by returning the relationship type in the relationship query. If a column called type is present in the query result, we use the values in that column to distinguish relationship types.

For the following example, let’s assume City nodes to be connected by either ROAD or RAIL relationships.

Using the type column to distinguish between multiple relationship types. 

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD|RAIL]->(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)

The loaded graph will be composed of the two relationship types. This allows us to apply a relationship filter during algorithm execution:

Using a relationship filter to run the algorithm on a subgraph. 

CALL gds.labelPropagation.stream(
    'my-cypher-graph', {
        relationshipTypes: ['ROAD']
    }
)

4.3.6. Relationship orientation

The native projection supports specifying an orientation per relationship type. The Cypher projection can achieve the same by adjusting the MATCH clause of the relationship query.

Loading the relationships with orientation NATURAL

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD|RAIL]->(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)

Loading the relationships with orientation UNDIRECTED

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD|RAIL]-(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)

Note the missing arrow in the MATCH clause of the relationship query.

Loading the relationships with orientation REVERSE

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)<-[r:ROAD|RAIL]-(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)

The REVERSE orientation can also be achieved by swapping source and target in the RETURN clause.
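
A sketch of that alternative: the pattern keeps its natural direction, and only the source and target columns are swapped.

```cypher
// Same MATCH pattern as the NATURAL example; m becomes the source
// and n the target, which reverses every projected relationship.
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD|RAIL]->(m:City) RETURN id(m) AS source, id(n) AS target, type(r) AS type'
)
```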

4.3.7. Relationship aggregation

The property graph model supports parallel relationships, which means two nodes can be connected by multiple relationships of the same relationship type. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.

The simplest way to achieve this is to use the DISTINCT operator in the relationship query:

MATCH (n:City)-[r:ROAD]->(m:City)
RETURN DISTINCT id(n) AS source, id(m) AS target

If we also want to load relationship properties, we can aggregate the values of parallel relationships in Cypher:

MATCH (n:City)-[r:ROAD]->(m:City)
RETURN
    id(n) AS source,
    id(m) AS target,
    min(r.distance) AS minDistance,
    coalesce(max(r.condition), 1.0) AS maxQuality
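
Placed inside a full projection call, an aggregating relationship query might look like the sketch below. The roadCount column is an illustrative addition that records how many parallel ROAD relationships were merged into each projected relationship.

```cypher
// source and target act as grouping keys, so each pair of cities
// yields exactly one projected relationship.
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD]->(m:City) RETURN id(n) AS source, id(m) AS target, min(r.distance) AS minDistance, count(r) AS roadCount'
)
```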

4.3.8. Using query parameters

As in regular Cypher queries, it is possible to use query parameters in the node and relationship queries. In the following example, we supply a list of strings to limit the cities we want to project.

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) WHERE n.name IN $cities RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD]->(m:City) WHERE n.name IN $cities AND m.name IN $cities RETURN id(n) AS source, id(m) AS target',
    {
       parameters: { cities: ["Leipzig", "Malmö"] }
    }
)
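
Parameters are not limited to lists. As a further sketch, a numeric threshold could be passed in the same way. The parameter name minPopulation and its value are hypothetical, and the City nodes are assumed to have a population property.

```cypher
CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH (n:City) WHERE n.population >= $minPopulation RETURN id(n) AS id',
    'MATCH (n:City)-[r:ROAD]->(m:City) WHERE n.population >= $minPopulation AND m.population >= $minPopulation RETURN id(n) AS source, id(m) AS target',
    {
        // the same parameter map is passed to both queries
        parameters: { minPopulation: 100000 }
    }
)
```

Repeating the filter in the relationship query keeps both endpoints of every relationship within the set of projected nodes.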