Cypher projection
This chapter explains how to create a graph using a Cypher projection.
If the Native projection is not expressive enough to describe the in-memory graph, we can instead use Cypher queries to select nodes and relationships. One benefit of using Cypher queries is the possibility to form the graph from data that exists only at query time. A common use case is the reduction of paths into single relationships between the start and end node of the path.
MATCH (p1:Author)-[:WROTE]->(a:Article)<-[:WROTE]-(p2:Author)
RETURN id(p1) AS source, id(p2) AS target, count(a) AS weight
For certain Cypher patterns, the same behaviour can be achieved using the Collapse Path algorithm in combination with native projections. |
Cypher projections are especially useful during the development phase. Their flexibility is convenient when exploring data and algorithms, and designing a workflow. However, creating a graph from a Cypher projection can be significantly slower than creating it directly from the Neo4j store files. For production, it is recommended to adapt the domain model in a way that it can take advantage of the loading speed of native projections.
During graph creation, GDS performs automatic memory estimation and potential execution blocking by default. This feature is off by default for Cypher projections as the memory consumption might be overestimated. |
1. Syntax
A Cypher projection takes three mandatory arguments: graphName
, nodeQuery
and relationshipQuery
.
In addition, the optional configuration
parameter allows us to further configure graph creation.
CALL gds.graph.create.cypher(
graphName: String,
nodeQuery: String,
relationshipQuery: String,
configuration: Map
)
Name | Optional | Description |
---|---|---|
graphName |
no |
The name under which the graph is stored in the catalog. |
nodeQuery |
no |
Cypher query to project nodes. |
relationshipQuery |
no |
Cypher query to project relationships. |
configuration |
yes |
Additional parameters to configure the Cypher projection. |
Name | Type | Default | Description |
---|---|---|---|
readConcurrency |
Integer |
4 |
The number of concurrent threads used for creating the graph. |
validateRelationships |
Boolean |
true |
Whether to throw an error if relationships contain nodes not included in the nodeQuery. |
parameters |
Map |
empty map |
A map of user-defined query parameters that are passed into the node and relationship query. |
To get information about a stored named graph, including its schema, one can use gds.graph.list.
2. Query constraints
The node query projects nodes and optionally their properties to an in-memory graph. Each row in the query result represents a node in the projected graph.
The query result must contain a column called id
.
The value in that column is used to uniquely identify the node.
MATCH (n) RETURN id(n) AS id
The relationship query projects relationships and optionally their type and properties to an in-memory graph. Each row in the query result represents a relationship in the projected graph.
The query result must contain a column called source
and a column called target
.
The values in those columns represent the source node id and the target node id of the relationship.
The values are used to connect the relationships to the nodes selected by the node query.
If either the source or the target value can not be mapped to a node, the relationship is not projected.
MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target
Using both example queries in a Cypher projection, we can project the whole Neo4j graph into an in-memory graph and store it in the catalog:
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n) RETURN id(n) AS id',
'MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target'
)
Cypher projections allow creating graphs from arbitrary query results, regardless of whether these map to actual identifiers in the Neo4j graph.
Executing an algorithm on such a graph in |
3. Node and relationship properties
Similar to the default native projection, we can load node and relationship properties using a Cypher projection.
Both node and relationship queries must return their respective mandatory columns, i.e., id
, source
and target
.
If a query returns additional columns, those columns are used as node and relationship properties, respectively.
Node property values can be of any supported type.
Relationship property values need to be of numeric type.
If a value is null
a default value of the determined property type is loaded instead.
If we want to use a different default value, the coalesce
function can be used.
For more information about the supported property types see Node Properties.
The following Cypher projection loads multiple node and relationship properties:
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) RETURN id(n) AS id, n.stateId AS community, n.population AS population',
'MATCH (n:City)-[r:ROAD]->(m:City) RETURN id(n) AS source, id(m) AS target, r.distance AS distance, coalesce(r.condition, 1.0) AS quality'
)
The projected properties can be referred to by any algorithm that uses properties as input, for example, Label Propagation.
CALL gds.labelPropagation.stream(
'my-cypher-graph', {
seedProperty: 'community',
relationshipWeightProperty: 'quality'
}
)
4. Node labels
Native projections supports specifying multiple node labels which can be filtered in an individual algorithm execution.
Cypher projections can achieve the same feature by returning the node label in the node query.
If a column called labels
is present in the node query result, we use the values in that column to distinguish node labels.
This column is expected to return a list of strings.
Consider the following example where Author
nodes are connected by WROTE
relationships to either Article
or Book
nodes.
labels
column to distinguish between node labels.CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n) WHERE n:Author OR n:Article OR n:Book RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n:Author)-[r:WROTE]->(m) RETURN id(n) AS source, id(m) AS target'
)
The created graph will be composed of nodes labeled with either :Book
, :Article
, or :Author
.
This allows us to apply a node filter during algorithm execution:
CALL gds.labelPropagation.stream(
'my-cypher-graph', {
nodeLabels: ['Author', 'Book']
}
)
5. Relationship types
The native projection supports loading multiple relationship types which can be filtered in an individual algorithm execution.
The Cypher projection can achieve the same feature by returning the relationship type in the query.
If the type
column is present in the query result, we use the values in that column to distinguish relationship types.
For the following example, let’s assume City
nodes to be connected by either ROAD
or RAIL
relationships.
type
column to distinguish between multiple relationship types.CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) RETURN id(n) AS id',
'MATCH (n:City)-[r:ROAD|RAIL]->(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
The loaded graph will be composed of the two relationship types. This allows us to apply a relationship filter during algorithm execution:
CALL gds.labelPropagation.stream(
'my-cypher-graph', {
relationshipTypes: ['ROAD']
}
)
6. Relationship orientation
The native projection supports specifying an orientation per relationship type.
The cypher projection can achieve the same feature by adjusting the MATCH
clause of the relationship query.
NATURAL
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) RETURN id(n) AS id',
'MATCH (n:City)-[r:ROAD|RAIL]->(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
UNDIRECTED
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) RETURN id(n) AS id',
'MATCH (n:City)-[r:ROAD|RAIL]-(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
Note the missing arrow in the Match
clause of the relationship query.
REVERSE
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) RETURN id(n) AS id',
'MATCH (n:City)<-[r:ROAD|RAIL]-(m:City) RETURN id(n) AS source, id(m) AS target, type(r) AS type'
)
The REVERSE
orientation can also be achieved by swapping source and target in the RESULT
clause.
7. Relationship aggregation
The property graph model supports parallel relationships, which means two nodes can be connected by multiple relationships of the same relationship type. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.
The simplest way to achieve this is to use the DISTINCT
operator in the relationship query:
MATCH (n:City)-[r:ROAD]->(m:City)
RETURN DISINCT id(n) AS source, id(m) AS target
If we also want to load relationship properties, aggregating the values of parallel edges can also be achieved using Cypher.
MATCH (n:City)-[r:ROAD]->(m:City)
RETURN
id(n) AS source,
id(m) AS target,
min(r.distance) AS minDistance,
coalesce(max(r.condition), 1.0) AS maxQuality
8. Using query parameters
Similar to Cypher, it is also possible to set query parameters. In the following example we supply a list of strings to limit the cities we want to project.
CALL gds.graph.create.cypher(
'my-cypher-graph',
'MATCH (n:City) WHERE n.name IN $cities RETURN id(n) AS id',
'MATCH (n:City)-[r:ROAD]->(m:City) WHERE n.name IN $cities AND m.name IN $cities RETURN id(n) AS source, id(m) AS target',
{
parameters: { cities: ["Leipzig", "Malmö"] }
}
)
If node label and relationship type are not selective enough to create the graph projection to run the algorithm on, you can use Cypher queries to project your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Cypher projection section of the manual.
If the similarity lists are very large they can take up a lot of memory. For cases where those lists contain lots of values that should be skipped, you can use the less memory-intensive approach of using Cypher statements to project the graph instead.
The Cypher loader expects to receive 3 fields:
-
item
- should contain node ids, which we can return using theid
function. -
category
- should contain node ids, which we can return using theid
function. -
weight
- should contain a double value.
Was this page helpful?