Projecting graphs using native projections

A native projection is the fastest and most scalable way to project a graph from a Neo4j database into the GDS Graph Catalog. Native projections are recommended for any use case, and for both the development and the production phase (see Common usage).

1. Considerations

1.1. Lifecycle

The projected graphs will reside in the catalog until either:

  • the graph is dropped using gds.graph.drop

  • the Neo4j database from which the graph was projected is stopped or dropped

  • the Neo4j database management system is stopped.

1.2. Node property support

Native projections can only project a limited set of node property types from the Neo4j database. The Node Properties page details which node property types are supported. Other types of node properties have to be transformed or encoded into one of the supported types in order to be projected using a native projection.

2. Syntax

A native projection takes three mandatory arguments: graphName, nodeProjection and relationshipProjection. In addition, the optional configuration parameter allows us to further configure the graph creation.

To get information about a previously projected graph, such as its schema, one can use gds.graph.list.
CALL gds.graph.project(
  graphName: String,
  nodeProjection: String or List or Map,
  relationshipProjection: String or List or Map,
  configuration: Map
) YIELD
  graphName: String,
  nodeProjection: Map,
  nodeCount: Integer,
  relationshipProjection: Map,
  relationshipCount: Integer,
  projectMillis: Integer
Table 1. Parameters
Name Type Optional Description

graphName

String

no

The name under which the graph is stored in the catalog.

nodeProjection

String, List or Map

no

One or more node projections.

relationshipProjection

String, List or Map

no

One or more relationship projections.

configuration

Map

yes

Additional parameters to configure the native projection.

Table 2. Configuration
Name Type Default Description

readConcurrency

Integer

4

The number of concurrent threads used for creating the graph.

nodeProperties

String, List or Map

{}

The node properties to load for all node projections.

relationshipProperties

String, List or Map

{}

The relationship properties to load for all relationship projections.

validateRelationships

Boolean

false

Whether to throw an error if the relationshipProjection includes relationships between nodes not part of the nodeProjection.

jobId

String

Generated internally

An ID that can be provided to more easily track the projection’s progress.

Table 3. Results
Name Type Description

graphName

String

The name under which the graph is stored in the catalog.

nodeProjection

Map

The node projections used to project the graph.

nodeCount

Integer

The number of nodes stored in the projected graph.

relationshipProjection

Map

The relationship projections used to project the graph.

relationshipCount

Integer

The number of relationships stored in the projected graph.

projectMillis

Integer

Milliseconds for projecting the graph.

2.1. Node Projection

The node projection specifies which nodes from the database should be projected into the in-memory GDS graph. The projection is based around node labels, and offers three different syntaxes that can be used based on how detailed the projection needs to be.

All nodes with any of the specified node labels will be projected to the GDS graph. If a node has several labels, it will be projected several times. If the nodes have values for the specified properties, these will be projected as well. If a node does not have a value for a specified property, a default value will be used. Read more about default values below.

All specified node labels and properties must exist in the database. To project using a non-existing label, it is possible to create a label without any nodes using the db.createLabel() procedure. Similarly, to project a non-existing property, it is possible to create a node property without modifying the database, using the db.createProperty() procedure.

2.1.1. Projecting a single label

The simplest syntax is to specify a single node label as a string value.

Short-hand String-syntax for nodeProjection. The projected graph will contain the given neo4j-label.
<neo4j-label>
Example outline:
CALL gds.graph.project(
  /* graph name */,
  'MyLabel',
  /* relationship projection */
)

2.1.2. Projecting multiple labels

To project more than one label, the list syntax is available. Specify all labels to be projected as a list of strings.

Short-hand List-syntax for nodeProjection. The projected graph will contain the given `neo4j-label`s.
[<neo4j-label>, ..., <neo4j-label>]
Example outline:
CALL gds.graph.project(
  /* graph name */,
  ['MyLabel', 'MySecondLabel', 'AnotherLabel']
  /* relationship projection */
)
We also support * as the neo4j-label to load all nodes. However, this does not keep the label information. To retain the label, we recommend using CALL db.labels() YIELD label WITH collect(label) AS allLabels.

2.1.3. Projecting labels with uniform node properties

In order to project properties in conjunction with the node labels, the nodeProperties configuration parameter can be used. This is a shorthand syntax to the full map-based syntax described below. The node properties specified with the nodeProperties parameter will be applied to all node labels specified in the node projection.

Example outline:
CALL gds.graph.project(
  /* graph name */,
  ['MyLabel', 'MySecondLabel', 'AnotherLabel']
  /* relationship projection */,
  { nodeProperties: ['prop1', 'prop2] }
)

2.1.4. Projecting multiple labels with name mapping and label-specific properties

The full node projection syntax uses a map. The keys in the map are the projected labels. Each value specifies the projection for that node label. The following syntax description and table details the format and expected values. Note that it is possible to project node labels to a label in the GDS graph with a different name.

The properties key can take a similar set of syntax variants as the node projection itself: a single string for a single property, a list of strings for multiple properties, or a map for the full syntax expressiveness.

Extended Map-syntax for nodeProjection.
{
    <projected-label>: {
        label: <neo4j-label>,
        properties: <neo4j-property-key>
    },
    <projected-label>: {
        label: <neo4j-label>,
        properties: [<neo4j-property-key>, <neo4j-property-key>, ...]
    },
    ...
    <projected-label>: {
        label: <neo4j-label>,
        properties: {
            <projected-property-key>: {
                property: <neo4j-property-key>,
                defaultValue: <fallback-value>
            },
            ...
            <projected-property-key>: {
                property: <neo4j-property-key>,
                defaultValue: <fallback-value>
            }
        }
    }
}
Table 4. Node Projection fields
Name Type Optional Default Description

<projected-label>

String

no

n/a

The node label in the projected graph.

label

String

yes

projected-label

The node label in the Neo4j graph. If not set, uses the projected-label.

properties

Map, List or String

yes

{}

The projected node properties for the specified projected-label.

<projected-property-key>

String

no

n/a

The key for the node property in the projected graph.

property

String

yes

projected-property-key

The node property key in the Neo4j graph. If not set, uses the projected-property-key.

defaultValue

Float

yes

Double.NaN

The default value if the property is not defined for a node.

Float[]

null

Integer

Integer.MIN_VALUE

Integer[]

null

2.2. Relationship Projection

The relationship projection specifies which relationships from the database should be projected into the in-memory GDS graph. The projection is based around relationship types, and offers three different syntaxes that can be used based on how detailed the projection needs to be.

All relationships with any of the specified relationship types and with endpoint nodes projected in the node projection will be projected to the GDS graph. The validateRelationships configuration parameter controls whether to fail or silently discard relationships with endpoint nodes not projected by the node projection. If the relationships have values for the specified properties, these will be projected as well. If a relationship does not have a value for a specified property, a default value will be used. Read more about default values below.

All specified relationship types and properties must exist in the database. To project using a non-existing relationship type, it is possible to create a relationship without any relationships using the db.createRelationshipType() procedure. Similarly, to project a non-existing property, it is possible to create a relationship property without modifying the database, using the db.createProperty() procedure.

2.2.1. Projecting a single relationship type

The simplest syntax is to specify a single relationship type as a string value.

Short-hand String-syntax for relationshipProjection. The projected graph will contain the given neo4j-type.
<neo4j-type>
Example outline:
CALL gds.graph.project(
  /* graph name */,
  /* node projection */,
  'MY_TYPE'
)

2.2.2. Projecting multiple relationship types

To project more than one relationship type, the list syntax is available. Specify all relationship types to be projected as a list of strings.

Short-hand List-syntax for relationshipProjection. The projected graph will contain the given `neo4j-type`s.
[<neo4j-type>, ..., <neo4j-type>]
Example outline:
CALL gds.graph.project(
  /* graph name */,
  /* node projection */,
  ['MY_TYPE', 'MY_SECOND_TYPE', 'ANOTHER_TYPE']
)
We also support * as the neo4j-type to load all relationships. However, this does not keep the type information. To retain the type, we recommend using CALL db.relationshipTypes() YIELD relationshipType WITH collect(relationshipType) AS allTypes.

2.2.3. Projecting relationship types with uniform relationship properties

In order to project properties in conjunction with the relationship types, the relationshipProperties configuration parameter can be used. This is a shorthand syntax to the full map-based syntax described below. The relationship properties specified with the relationshipProperties parameter will be applied to all relationship types specified in the relationship projection.

Example outline:
CALL gds.graph.project(
  /* graph name */,
  /* node projection */,
  ['MY_TYPE', 'MY_SECOND_TYPE', 'ANOTHER_TYPE'],
  { relationshipProperties: ['prop1', 'prop2] }
)

2.2.4. Projecting multiple relationship types with name mapping and type-specific properties

The full relationship projection syntax uses a map. The keys in the map are the projected relationship types. Each value specifies the projection for that relationship type. The following syntax description and table details the format and expected values. Note that it is possible to project relationship types to a type in the GDS graph with a different name.

The properties key can take a similar set of syntax variants as the relationship projection itself: a single string for a single property, a list of strings for multiple properties, or a map for the full syntax expressiveness.

Extended Map-syntax for relationshipProjection.
{
    <projected-type>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: <neo4j-property-key>
    },
    <projected-type>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: [<neo4j-property-key>, <neo4j-property-key>]
    },
    ...
    <projected-type>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: {
            <projected-property-key>: {
                property: <neo4j-property-key>,
                defaultValue: <fallback-value>,
                aggregation: <aggregation-type>
            },
            ...
            <projected-property-key>: {
                property: <neo4j-property-key>,
                defaultValue: <fallback-value>,
                aggregation: <aggregation-type>
            }
        }
    }
}
Table 5. Relationship Projection fields
Name Type Optional Default Description

<projected-type>

String

no

n/a

The name of the relationship type in the projected graph.

type

String

yes

projected-type

The relationship type in the Neo4j graph.

orientation

String

yes

NATURAL

Denotes how Neo4j relationships are represented in the projected graph. Allowed values are NATURAL, UNDIRECTED, REVERSE.

aggregation

String

no

NONE

Handling of parallel relationships. Allowed values are NONE, MIN, MAX, SUM, SINGLE, COUNT.

properties

Map, List or String

yes

{}

The projected relationship properties for the specified projected-type.

<projected-property-key>

String

no

n/a

The key for the relationship property in the projected graph.

property

String

yes

projected-property-key

The node property key in the Neo4j graph. If not set, uses the projected-property-key.

defaultValue

Float or Integer

yes

Double.NaN

The default value if the property is not defined for a node.

3. Examples

In order to demonstrate the GDS Graph Projection capabilities we are going to create a small social network graph in Neo4j. The example graph looks like this:

Visualization of the example graph
The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (florentin:Person { name: 'Florentin', age: 16 }),
  (adam:Person { name: 'Adam', age: 18 }),
  (veselin:Person { name: 'Veselin', age: 20, ratings: [5.0] }),
  (hobbit:Book { name: 'The Hobbit', isbn: 1234, numberOfPages: 310, ratings: [1.0, 2.0, 3.0, 4.5] }),
  (frankenstein:Book { name: 'Frankenstein', isbn: 4242, price: 19.99 }),

  (florentin)-[:KNOWS { since: 2010 }]->(adam),
  (florentin)-[:KNOWS { since: 2018 }]->(veselin),
  (florentin)-[:READ { numberOfPages: 4 }]->(hobbit),
  (florentin)-[:READ { numberOfPages: 42 }]->(hobbit),
  (adam)-[:READ { numberOfPages: 30 }]->(hobbit),
  (veselin)-[:READ]->(frankenstein)

3.1. Simple graph

A simple graph is a graph with only one node label and relationship type, i.e., a monopartite graph. We are going to start with demonstrating how to load a simple graph by projecting only the Person node label and KNOWS relationship type.

Project Person nodes and KNOWS relationships:
CALL gds.graph.project(
  'persons',            (1)
  'Person',             (2)
  'KNOWS'               (3)
)
YIELD
  graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipProjection, relationshipCount AS rels
1 The name of the graph. Afterwards, persons can be used to run algorithms or manage the graph.
2 The nodes to be projected. In this example, the nodes with the Person label.
3 The relationships to be projected. In this example, the relationships of type KNOWS.
Table 6. Results
graph nodeProjection nodes relationshipProjection rels

"persons"

{Person={label=Person, properties={}}}

3

{KNOWS={aggregation=DEFAULT, indexInverse=false, orientation=NATURAL, properties={}, type=KNOWS}}

2

In the example above, we used a short-hand syntax for the node and relationship projection. The used projections are internally expanded to the full Map syntax as shown in the Results table. In addition, we can see the projected in-memory graph contains three Person nodes, and the two KNOWS relationships.

3.2. Multi-graph

A multi-graph is a graph with multiple node labels and relationship types.

To project multiple node labels and relationship types, we can adjust the projections as follows:

Project Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project(
  'personsAndBooks',    (1)
  ['Person', 'Book'],   (2)
  ['KNOWS', 'READ']     (3)
)
YIELD
  graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipCount AS rels
1 Projects a graph under the name personsAndBooks.
2 The nodes to be projected. In this example, the nodes with a Person or Book label.
3 The relationships to be projected. In this example, the relationships of type KNOWS or READ.
Table 7. Results
graph nodeProjection nodes rels

"personsAndBooks"

{Book={label=Book, properties={}}, Person={label=Person, properties={}}}

5

6

In the example above, we used a short-hand syntax for the node and relationship projection. The used projections are internally expanded to the full Map syntax as shown for the nodeProjection in the Results table. In addition, we can see the projected in-memory graph contains five nodes, and the two relationships.

3.3. Relationship orientation

By default, relationships are loaded in the same orientation as stored in the Neo4j db. In GDS, we call this the NATURAL orientation. Additionally, we provide the functionality to load the relationships in the REVERSE or even UNDIRECTED orientation.

Project Person nodes and undirected KNOWS relationships:
CALL gds.graph.project(
  'undirectedKnows',                    (1)
  'Person',                             (2)
  {KNOWS: {orientation: 'UNDIRECTED'}}  (3)
)
YIELD
  graphName AS graph,
  relationshipProjection AS knowsProjection,
  nodeCount AS nodes,
  relationshipCount AS rels
1 Projects a graph under the name undirectedKnows.
2 The nodes to be projected. In this example, the nodes with the Person label.
3 Projects relationships with type KNOWS and specifies that they should be UNDIRECTED by using the orientation parameter.
Table 8. Results
graph knowsProjection nodes rels

"undirectedKnows"

{KNOWS={aggregation=DEFAULT, indexInverse=false, orientation=UNDIRECTED, properties={}, type=KNOWS}}

3

4

To specify the orientation, we need to write the relationshipProjection with the extended Map-syntax. Projecting the KNOWS relationships UNDIRECTED, loads each relationship in both directions. Thus, the undirectedKnows graph contains four relationships, twice as many as the persons graph in Simple graph.

3.4. Node properties

To project node properties, we can either use the nodeProperties configuration parameter for shared properties, or extend an individual nodeProjection for a specific label.

Project Person and Book nodes and KNOWS and READ relationships:
CALL gds.graph.project(
  'graphWithProperties',                                (1)
  {                                                     (2)
    Person: {properties: 'age'},                        (3)
    Book: {properties: {price: {defaultValue: 5.0}}}    (4)
  },
  ['KNOWS', 'READ'],                                    (5)
  {nodeProperties: 'ratings'}                           (6)
)
YIELD
  graphName, nodeProjection, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodeProjection.Book AS bookProjection, nodes, rels
1 Projects a graph under the name graphWithProperties.
2 Use the expanded node projection syntax.
3 Projects nodes with the Person label and their age property.
4 Projects nodes with the Book label and their price property. Each Book that doesn’t have the price property will get the defaultValue of 5.0.
5 The relationships to be projected. In this example, the relationships of type KNOWS or READ.
6 The global configuration, projects node property rating on each of the specified labels.
Table 9. Results
graphName bookProjection nodes rels

"graphWithProperties"

{label=Book, properties={price={defaultValue=5.0, property=price}, ratings={defaultValue=null, property=ratings}}}

5

6

The projected graphWithProperties graph contains five nodes and six relationships. In the returned bookProjection we can observe, the node properties price and ratings are loaded for Books.

GDS currently only supports loading numeric properties.

Further, the price property has a default value of 5.0. Not every book has a price specified in the example graph. In the following we check if the price was correctly projected:

Verify the ratings property of Adam in the projected graph:
MATCH (n:Book)
RETURN n.name AS name, gds.util.nodeProperty('graphWithProperties', id(n), 'price') as price
ORDER BY price
Table 10. Results
name price

"The Hobbit"

5.0

"Frankenstein"

19.99

We can see, that the price was projected with the Hobbit having the default price of 5.0.

3.5. Relationship properties

Analogous to node properties, we can either use the relationshipProperties configuration parameter or extend an individual relationshipProjection for a specific type.

Project Person and Book nodes and READ relationships with numberOfPages property:
CALL gds.graph.project(
  'readWithProperties',                     (1)
  ['Person', 'Book'],                       (2)
  {                                         (3)
    READ: { properties: "numberOfPages" }   (4)
  }
)
YIELD
  graphName AS graph,
  relationshipProjection AS readProjection,
  nodeCount AS nodes,
  relationshipCount AS rels
1 Projects a graph under the name readWithProperties.
2 The nodes to be projected. In this example, the nodes with a Person or Book label.
3 Use the expanded relationship projection syntax.
4 Project relationships of type READ and their numberOfPages property.
Table 11. Results
graph readProjection nodes rels

"readWithProperties"

{READ={aggregation=DEFAULT, indexInverse=false, orientation=NATURAL, properties={numberOfPages={defaultValue=null, property=numberOfPages, aggregation=DEFAULT}}, type=READ}}

5

4

Next, we will verify that the relationship property numberOfPages were correctly loaded.

Stream the relationship property numberOfPages of the projected graph:
CALL gds.graph.relationshipProperty.stream('readWithProperties', 'numberOfPages')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY person ASC, numberOfPages DESC
Table 12. Results
person book numberOfPages

"Adam"

"The Hobbit"

30.0

"Florentin"

"The Hobbit"

42.0

"Florentin"

"The Hobbit"

4.0

"Veselin"

"Frankenstein"

NaN

We can see, that the numberOfPages property is loaded. The default property value is Double.NaN and could be changed using the Map-Syntax the same as for node properties in Node properties.

3.6. Parallel relationships

Neo4j supports parallel relationships, i.e., multiple relationships between two nodes. By default, GDS preserves parallel relationships. For some algorithms, we want the projected graph to contain at most one relationship between two nodes.

We can specify how parallel relationships should be aggregated into a single relationship via the aggregation parameter in a relationship projection.

For graphs without relationship properties, we can use the COUNT aggregation. If we do not need the count, we could use the SINGLE aggregation.

Project Person and Book nodes and COUNT aggregated READ relationships:
CALL gds.graph.project(
  'readCount',                      (1)
  ['Person', 'Book'],               (2)
  {
    READ: {                         (3)
      properties: {
        numberOfReads: {            (4)
          property: '*',            (5)
          aggregation: 'COUNT'      (6)
        }
      }
    }
  }
)
YIELD
  graphName AS graph,
  relationshipProjection AS readProjection,
  nodeCount AS nodes,
  relationshipCount AS rels
1 Projects a graph under the name readCount.
2 The nodes to be projected. In this example, the nodes with a Person or Book label.
3 Project relationships of type READ.
4 Project relationship property numberOfReads.
5 A placeholder, signaling that the value of the relationship property is derived and not based on Neo4j property.
6 The aggregation type. In this example, COUNT results in the value of the property being the number of parallel relationships.
Table 13. Results
graph readProjection nodes rels

"readCount"

{READ={aggregation=DEFAULT, indexInverse=false, orientation=NATURAL, properties={numberOfReads={defaultValue=null, property=*, aggregation=COUNT}}, type=READ}}

5

3

Next, we will verify that the READ relationships were correctly aggregated.

Stream the relationship property numberOfReads of the projected graph:
CALL gds.graph.relationshipProperty.stream('readCount', 'numberOfReads')
YIELD sourceNodeId, targetNodeId, propertyValue AS numberOfReads
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfReads
ORDER BY numberOfReads DESC, person
Table 14. Results
person book numberOfReads

"Florentin"

"The Hobbit"

2.0

"Adam"

"The Hobbit"

1.0

"Veselin"

"Frankenstein"

1.0

We can see, that the two READ relationships between Florentin, and the Hobbit result in 2 numberOfReads.

3.7. Parallel relationships with properties

For graphs with relationship properties we can also use other aggregations.

Project Person and Book nodes and aggregated READ relationships by summing the numberOfPages:
CALL gds.graph.project(
  'readSums',                                                   (1)
  ['Person', 'Book'],                                           (2)
  {READ: {properties: {numberOfPages: {aggregation: 'SUM'}}}}   (3)
)
YIELD
  graphName AS graph,
  relationshipProjection AS readProjection,
  nodeCount AS nodes,
  relationshipCount AS rels
1 Projects a graph under the name readSums.
2 The nodes to be projected. In this example, the nodes with a Person or Book label.
3 Project relationships of type READ. Aggregation type SUM results in a projected numberOfPages property with its value being the sum of the numberOfPages properties of the parallel relationships.
Table 15. Results
graph readProjection nodes rels

"readSums"

{READ={aggregation=DEFAULT, indexInverse=false, orientation=NATURAL, properties={numberOfPages={defaultValue=null, property=numberOfPages, aggregation=SUM}}, type=READ}}

5

3

Next, we will verify that the relationship property numberOfPages was correctly aggregated.

Stream the relationship property numberOfPages of the projected graph:
CALL gds.graph.relationshipProperty.stream('readSums', 'numberOfPages')
YIELD
  sourceNodeId, targetNodeId, propertyValue AS numberOfPages
RETURN
  gds.util.asNode(sourceNodeId).name AS person,
  gds.util.asNode(targetNodeId).name AS book,
  numberOfPages
ORDER BY numberOfPages DESC, person
Table 16. Results
person book numberOfPages

"Florentin"

"The Hobbit"

46.0

"Adam"

"The Hobbit"

30.0

"Veselin"

"Frankenstein"

0.0

We can see, that the two READ relationships between Florentin and the Hobbit sum up to 46 numberOfReads.

3.8. Validate relationships flag

As mentioned in the syntax section, the validateRelationships flag controls whether an error will be raised when attempting to project a relationship where either the source or target node is not present in the node projection. Note that even if the flag is set to false such a relationship will still not be projected but the loading process will not be aborted.

We can simulate such a case with the graph present in the Neo4j database:

Project READ and KNOWS relationships but only Person nodes, with validateRelationships set to true:
CALL gds.graph.project(
  'danglingRelationships',
  'Person',
  ['READ', 'KNOWS'],
  {
    validateRelationships: true
  }
)
YIELD
  graphName AS graph,
  relationshipProjection AS readProjection,
  nodeCount AS nodes,
  relationshipCount AS rels
Results
org.neo4j.graphdb.QueryExecutionException: Failed to invoke procedure `gds.graph.project`: Caused by: java.lang.IllegalArgumentException: Failed to load a relationship because its target-node with id 3 is not part of the node query or projection. To ignore the relationship, set the configuration parameter `validateRelationships` to false.

We can see that the above query resulted in an exception being thrown. The exception message will provide information about the specific node id that was missing, which will help debugging underlying problems.