Projecting graphs using Cypher Aggregation
A projected graph can be stored in the catalog under a user-defined name. Using that name, the graph can be referred to by any algorithm in the library. This allows multiple algorithms to use the same graph without having to project it on each algorithm run.
Using Cypher aggregations is a more flexible and expressive approach with diminished focus on performance compared to the native projections. Cypher projections are primarily recommended for the development phase (see Common usage).
There is also a way to generate a random graph, see Graph Generation documentation for more details.
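The projected graph will reside in the catalog until it is removed, for example by dropping it with gds.graph.drop, or until the database or DBMS it was projected from is stopped.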
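As a minimal sketch of managing a graph's lifetime in the catalog, the following statements first check for a graph and then drop it. The name 'persons' is only a placeholder (it matches a graph projected later on this page), and the two statements are meant to be run one after the other.

```cypher
// Check whether a graph named 'persons' is currently in the catalog.
CALL gds.graph.exists('persons') YIELD graphName, exists
RETURN graphName, exists;

// Remove it from the catalog. Passing false as the second argument
// (failIfMissing) avoids an error when no such graph exists.
CALL gds.graph.drop('persons', false) YIELD graphName
RETURN graphName;
```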
1. Syntax
A Cypher aggregation is used in a query as an aggregation over the relationships that are being projected.
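It takes three mandatory arguments: graphName, sourceNode and targetNode. The targetNode value may be null.
In addition, the optional nodesConfig and relationshipConfig parameters allow us to project properties, for example through the sourceNodeProperties and targetNodeProperties keys of nodesConfig and the properties key of relationshipConfig.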
RETURN gds.alpha.graph.project(
graphName: String,
sourceNode: Node or Integer,
targetNode: Node or Integer,
nodesConfig: Map,
relationshipConfig: Map,
configuration: Map
) YIELD
graphName: String,
nodeCount: Integer,
relationshipCount: Integer,
projectMillis: Integer,
configuration: Map
Name | Optional | Description |
---|---|---|
graphName | no | The name under which the graph is stored in the catalog. |
sourceNode | no | The source node of the relationship. Must not be null. |
targetNode | yes | The target node of the relationship. The targetNode can be null (for example due to an OPTIONAL MATCH), in which case the source node is projected without a relationship. |
nodesConfig | yes | Properties and Labels configuration for the source and target nodes. |
relationshipConfig | yes | Properties and Type configuration for the relationship. |
configuration | yes | Additional parameters to configure the Cypher aggregation projection. |
Name | Type | Default | Description |
---|---|---|---|
readConcurrency | Integer | 4 | The number of concurrent threads used for creating the graph. |
undirectedRelationshipTypes | List of String | [] | Declare a number of relationship types as undirected. Relationships with the specified types will be imported as undirected. |
inverseIndexedRelationshipTypes | List of String | [] | Declare a number of relationship types which will also be indexed in inverse direction. |
Name | Type | Description |
---|---|---|
graphName | String | The name under which the graph is stored in the catalog. |
nodeCount | Integer | The number of nodes stored in the projected graph. |
relationshipCount | Integer | The number of relationships stored in the projected graph. |
projectMillis | Integer | Milliseconds for projecting the graph. |
configuration | Map | The configuration used for this projection. |
To get information about a stored graph, such as its schema, one can use gds.graph.list.
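A minimal sketch of such a lookup is shown below. It assumes a graph named 'persons' is already in the catalog (one is projected in the examples on this page); the selection of yielded columns is just one possible choice.

```cypher
// List catalog information for a single graph, including its schema.
// 'persons' is assumed to have been projected beforehand.
CALL gds.graph.list('persons')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName, nodeCount, relationshipCount, schema
```

Calling gds.graph.list without an argument lists all graphs in the catalog instead.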
2. Examples
In order to demonstrate the GDS Cypher Aggregation we are going to create a small social network graph in Neo4j. The following Cypher statement creates the example graph:
CREATE
(florentin:Person { name: 'Florentin', age: 16 }),
(adam:Person { name: 'Adam', age: 18 }),
(veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
(hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
(frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),
(florentin)-[:KNOWS { since: 2010 }]->(adam),
(florentin)-[:KNOWS { since: 2018 }]->(veselin),
(florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
(florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
(adam)-[:READ { numberOfPages: 30 }]->(hobbit),
(veselin)-[:READ]->(frankenstein)
2.1. Simple graph
A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph.
We are going to start with demonstrating how to load a simple graph by projecting only the Person node label and KNOWS relationship type.
Project Person nodes and KNOWS relationships:
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.alpha.graph.project('persons', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"persons" | 3 | 2 |
2.1.1. Graph with unconnected nodes
In order to project nodes that are not connected, we can use an OPTIONAL MATCH.
To demonstrate, we project all nodes, where some might be connected with the KNOWS relationship type.
Project all nodes and KNOWS relationships:
MATCH (source) OPTIONAL MATCH (source)-[r:KNOWS]->(target)
WITH gds.alpha.graph.project('persons', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"persons" | 5 | 2 |
2.2. Arbitrary source and target ID values
So far, the examples showed how to project a graph based on existing nodes. It is also possible to pass INTEGER values directly.
UNWIND [ [42, 84], [13, 37], [19, 84] ] AS sourceAndTarget
WITH sourceAndTarget[0] AS source, sourceAndTarget[1] AS target
WITH gds.alpha.graph.project('arbitrary', source, target) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"arbitrary" | 5 | 3 |
The projected graph does not know that the IDs did not originate from an existing node.
Any procedure that interacts with the underlying database can therefore behave unexpectedly, because the IDs may not correspond to existing nodes.
2.3. Multi-graph
A multi-graph is a graph with multiple node labels and relationship types.
To retain the label when we load multiple node labels, we can add a sourceNodeLabels key and a targetNodeLabels key to the fourth nodesConfig parameter.
To retain the type information when we load multiple relationship types, we can add a relationshipType key to the fifth relationshipConfig parameter.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)
WHERE source:Person OR source:Book
OPTIONAL MATCH (source)-[r:KNOWS|READ]->(target)
WHERE target:Person OR target:Book
WITH gds.alpha.graph.project(
'personsAndBooks',
source,
target,
{
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target)
},
{
relationshipType: type(r)
}
) AS g
RETURN g.graphName AS graph , g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"personsAndBooks" | 5 | 6 |
The value for sourceNodeLabels or targetNodeLabels can be one of the following:
type | example | description |
---|---|---|
List of String | | |
String | | |
Boolean | | |
Boolean | | |
The value for relationshipType must be a String:
type | example | description |
---|---|---|
String | | |
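Retaining labels and types lets an algorithm be restricted to part of the projected graph. The following is a minimal sketch, assuming the personsAndBooks graph from above is in the catalog; degree centrality is used only as a simple illustration, and the column aliases are arbitrary.

```cypher
// Count outgoing KNOWS relationships per Person in the multi-graph.
// nodeLabels and relationshipTypes filter on the labels and types
// that were retained during the projection.
CALL gds.degree.stream('personsAndBooks', {
  nodeLabels: ['Person'],
  relationshipTypes: ['KNOWS']
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score AS knowsCount
ORDER BY knowsCount DESC, person
```

Without the relationshipType key in the projection, the KNOWS and READ relationships could not be told apart by such a filter.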
2.4. Relationship orientation
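The native projection supports specifying an orientation per relationship type.
The Cypher Aggregation treats every relationship returned by the relationship query as if it was in NATURAL orientation, that is, directed from sourceNode to targetNode.
It is thus not possible to specify an UNDIRECTED or REVERSE orientation per relationship type in the way native projections do. Instead, relationship types can be declared as undirected with the undirectedRelationshipTypes configuration parameter (see Syntax), and a reversed relationship can be expressed by swapping the sourceNode and targetNode arguments.
Some algorithms require that the graph was loaded with UNDIRECTED orientation; for these, the relevant relationship types need to be declared in undirectedRelationshipTypes.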
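As a minimal sketch of the undirectedRelationshipTypes parameter listed in the Syntax section, the following query projects the KNOWS relationships as undirected. The graph name personsUndirected is a placeholder chosen for this illustration, and the behaviour may depend on the installed GDS version.

```cypher
// Project KNOWS relationships and declare the type as undirected.
// 'personsUndirected' is a hypothetical graph name used only in this sketch.
MATCH (source:Person)-[r:KNOWS]->(target:Person)
WITH gds.alpha.graph.project(
  'personsUndirected',
  source,
  target,
  {},                                         // no node labels or properties
  { relationshipType: type(r) },              // keep the type so it can be referenced below
  { undirectedRelationshipTypes: ['KNOWS'] }  // treat KNOWS as undirected
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```

The relationship type is retained in the fifth parameter so that the configuration in the sixth parameter has a type name to refer to.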
2.5. Node properties
To load node properties, we add a map of all properties for the source and target nodes. We use the Cypher function coalesce() to specify the default value if the node does not have the property.
The properties for the source node are specified as the sourceNodeProperties key in the fourth nodesConfig parameter.
The properties for the target node are specified as the targetNodeProperties key in the fourth nodesConfig parameter.
Project Person and Book nodes and KNOWS and READ relationships:
MATCH (source)-[r:KNOWS|READ]->(target)
WHERE source:Book OR source:Person
WITH gds.alpha.graph.project(
'graphWithProperties',
source,
target,
{
sourceNodeProperties: source { age: coalesce(source.age, 18), price: coalesce(source.price, 5.0), .ratings },
targetNodeProperties: target { age: coalesce(target.age, 18), price: coalesce(target.price, 5.0), .ratings }
}
) as g
RETURN g.graphName AS graph , g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"graphWithProperties" | 5 | 6 |
The projected graphWithProperties graph contains five nodes and six relationships.
In a Cypher Aggregation every node will get the same properties, which means you can’t have node-specific properties.
For instance, in the example above the Person nodes will also get ratings and price properties, while Book nodes get the age property.
Further, the price property has a default value of 5.0.
Not every book has a price specified in the example graph.
In the following we check if the price was correctly projected:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') AS price
ORDER BY price
name | price |
---|---|
"The Hobbit" | 5.0 |
"Frankenstein" | 19.99 |
We can see that the price was projected, with The Hobbit having the default price of 5.0.
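The same kind of check can be made for all nodes at once by streaming a property from the projected graph. This is a minimal sketch, assuming the graphWithProperties graph from above is in the catalog; it streams the age property, for which Book nodes are expected to carry the coalesce() default.

```cypher
// Stream the projected 'age' property for every node in the graph.
CALL gds.graph.nodeProperty.stream('graphWithProperties', 'age')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).name AS name, propertyValue AS age
ORDER BY name
```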
2.6. Relationship properties
Analogous to node properties, we can project relationship properties using the fifth parameter.
If we only want to project relationship properties and not any node properties or labels, we must provide a {} value for the nodesConfig parameter.
Project Person and Book nodes and READ relationships with the numberOfPages property:
MATCH (source)-[r:READ]->(target)
WITH gds.alpha.graph.project(
'readWithProperties',
source,
target,
{},
{ properties: r { .numberOfPages } }
) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readWithProperties" | 5 | 4 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
person | book | numberOfPages |
---|---|---|
"Adam" | "The Hobbit" | 30.0 |
"Florentin" | "The Hobbit" | 42.0 |
"Florentin" | "The Hobbit" | 4.0 |
"Veselin" | "Frankenstein" | NaN |
We can see that the numberOfPages values are loaded. The default property value is Double.NaN and can be changed, as in the previous example Node properties, by using the Cypher function coalesce().
2.7. Parallel relationships
The Property Graph Model in Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves the parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.
The simplest way to achieve relationship deduplication is to use the DISTINCT operator in the relationship query.
Alternatively, we can aggregate the parallel relationships by using the count() function and store the count as a relationship property.
Project Person and Book nodes and COUNT aggregated READ relationships:
MATCH (source)-[r:READ]->(target)
WITH source, target, count(r) AS numberOfReads
WITH gds.alpha.graph.project('readCount', source, target, {}, { properties: { numberOfReads: numberOfReads } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readCount" | 5 | 3 |
Next, we will verify that the READ relationships were correctly aggregated.
Stream the relationship property numberOfReads of the projected graph:
CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfReads
ORDER BY numberOfReads DESC, person
person | book | numberOfReads |
---|---|---|
"Florentin" | "The Hobbit" | 2.0 |
"Adam" | "The Hobbit" | 1.0 |
"Veselin" | "Frankenstein" | 1.0 |
We can see that the two READ relationships between Florentin and The Hobbit result in 2 numberOfReads.
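The DISTINCT approach mentioned at the start of this section can be sketched as follows. The graph name readDistinct is a placeholder for this illustration; no relationship property is projected, so the two parallel READ relationships between Florentin and The Hobbit simply collapse into one row before the aggregation sees them.

```cypher
// Deduplicate parallel READ relationships before projecting.
// 'readDistinct' is a hypothetical graph name used only in this sketch.
MATCH (source)-[:READ]->(target)
WITH DISTINCT source, target
WITH gds.alpha.graph.project('readDistinct', source, target) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```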
2.8. Parallel relationships with properties
For graphs with relationship properties we can also use other aggregations documented in the Cypher Manual.
Project Person and Book nodes and aggregated READ relationships by summing the numberOfPages:
MATCH (source)-[r:READ]->(target)
WITH source, target, sum(r.numberOfPages) AS numberOfPages
WITH gds.alpha.graph.project('readSums', source, target, {}, { properties: { numberOfPages: numberOfPages } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"readSums" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly aggregated.
Stream the relationship property numberOfPages of the projected graph:
CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY numberOfPages DESC, person
person | book | numberOfPages |
---|---|---|
"Florentin" | "The Hobbit" | 46.0 |
"Adam" | "The Hobbit" | 30.0 |
"Veselin" | "Frankenstein" | 0.0 |
We can see that the two READ relationships between Florentin and The Hobbit sum up to 46 numberOfPages.
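Several aggregations can also be combined in one projection by putting more than one key into the properties map. The following is a minimal sketch under that assumption; the graph name readStats and the property names numberOfReads and maxPages are placeholders chosen for this illustration.

```cypher
// Aggregate parallel READ relationships into two properties at once.
// max() returns null when no relationship has numberOfPages, so
// coalesce() supplies 0 as an explicit default in that case.
MATCH (source)-[r:READ]->(target)
WITH source, target,
     count(r) AS numberOfReads,
     coalesce(max(r.numberOfPages), 0) AS maxPages
WITH gds.alpha.graph.project(
  'readStats',
  source,
  target,
  {},
  { properties: { numberOfReads: numberOfReads, maxPages: maxPages } }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
```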
2.9. Projecting filtered Neo4j graphs
Cypher-projections allow us to specify the graph to project in a more fine-grained way.
The following example demonstrates how to filter out READ relationships if they do not have a numberOfPages property.
Project Person and Book nodes and READ relationships where numberOfPages is present:
MATCH (source) OPTIONAL MATCH (source)-[r:READ]->(target)
WHERE r.numberOfPages IS NOT NULL
WITH gds.alpha.graph.project('existingNumberOfPages', source, target, {}, { properties: r { .numberOfPages } }) AS g
RETURN
g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels
graph | nodes | rels |
---|---|---|
"existingNumberOfPages" | 5 | 3 |
Next, we will verify that the relationship property numberOfPages was correctly loaded.
Stream the relationship property numberOfPages from the projected graph:
CALL gds.graph.relationshipProperty.stream('existingNumberOfPages', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
gds.util.asNode(sourceNodeId).name AS person,
gds.util.asNode(targetNodeId).name AS book,
numberOfPages
ORDER BY person ASC, numberOfPages DESC
person | book | numberOfPages |
---|---|---|
"Adam" | "The Hobbit" | 30.0 |
"Florentin" | "The Hobbit" | 42.0 |
"Florentin" | "The Hobbit" | 4.0 |
If we compare the results to the ones from Relationship properties, we can see that using IS NOT NULL filters out the relationship from Veselin to the book Frankenstein.
This functionality is only expressible with native projections by projecting a subgraph.