4.2. Native projection

This section explains native projections in the Neo4j Graph Data Science library.

A native projection allows us to project a graph from Neo4j into an in-memory graph. The projected graph can be specified in terms of node labels, relationship types and properties. Node labels and node properties are projected using Node projections. Relationship types and relationship properties are projected using Relationship projections.

The main benefit of native projections is their performance. In contrast, a Cypher projection is more flexible from the declaration point of view, but less performant. In most cases it is possible to structure your Neo4j graph model in a way that enables native projections to be used.

This section includes:

4.2.1. Syntax

A native projection takes three mandatory arguments: graphName, nodeProjection and relationshipProjection. In addition, the optional configuration parameter allows us to further configure graph creation.

CALL gds.graph.create(
    graphName: String,
    nodeProjection: String, List or Map,
    relationshipProjection: String, List or Map,
    configuration: Map
)
Table 4.3. Parameters
Name Optional Description

graphName

no

The name under which the graph is stored in the catalog.

nodeProjection

no

One or more node projections.

relationshipProjection

no

One or more relationship projections.

configuration

yes

Additional parameters to configure the native projection.

Table 4.4. Configuration
Name Type Default Description

readConcurrency

Integer

4

The number of concurrent threads used for creating the graph.

nodeProperties

String, List or Map

empty map

Node properties to load for all node projections.

relationshipProperties

String, List or Map

empty map

Relationship properties to load for all relationship projections.

validateRelationships

Boolean

false

Whether to throw an error if relationships contain nodes not included in the nodeProjection.

To get information about a stored named graph, including its schema, one can use gds.graph.list.

4.2.2. Node projections

A node projection enables mapping Neo4j nodes into the in-memory graph. When specifying a node projection, we can declare one or more node labels that we want to project.

The following map-like syntax shows the general way of defining node projections:

{
    <node-label-1>: {
        label: <neo4j-label>,
        properties: <node-property-mappings>
    },
    <node-label-2>: {
        label: <neo4j-label>,
        properties: <node-property-mappings>
    },
    // ...
    <node-label-n>: {
        label: <neo4j-label>,
        properties: <node-property-mappings>
    }
}
  • node-label-i denotes the node label used in the projected graph

    • neo4j-label denotes the node label in the Neo4j graph

      • The label must exist in the Neo4j database
      • If not specified, neo4j-label defaults to node-label-i
    • node-property-mappings denotes a set of mappings between Neo4j and in-memory properties

In the following example, we want to project Person nodes into the in-memory graph. The resulting graph contains all Neo4j nodes that have the label Person.

CALL gds.graph.create(
    'my-graph', {
        Person: { label: 'Person' }
    },
    '*'
)
YIELD graphName, nodeCount, relationshipCount;

We can use the following shorthand syntax to project a single node label.

CALL gds.graph.create('my-graph', 'Person', '*')
YIELD graphName, nodeCount, relationshipCount;

It is often useful to create an in-memory graph representing more than one node label. Let’s assume, the database contains Person and City nodes which we want to project.

CALL gds.graph.create(
    'my-graph', {
        Person: { label: 'Person' },
        City: { label: 'City' }
    },
    '*'
)
YIELD graphName, nodeCount, relationshipCount;

The following shorthand syntax can be used to project multiple labels.

CALL gds.graph.create('my-graph', ['Person', 'City'], '*')
YIELD graphName, nodeCount, relationshipCount;

Projecting multiple node labels enables algorithms to only use a subset of those.

// Uses `Person` nodes for computing Page Rank scores between persons
CALL gds.pageRank.stream('my-graph', { nodeLabels: ['Person'] }) YIELD nodeId, score;

To project all nodes in the Neo4j graph, we can use the special * ('star') node projection.

CALL gds.graph.create('my-graph', '*', '*')
YIELD graphName, nodeCount, relationshipCount;

When a graph uses the special star projection, additional nodes from named node projections are not added additionally as the star projection already contains them. This also applies to filtering. For example, if a graph was projected with labels 'A', 'B' and also with the special star '*', any node filtering that includes the star is equivalent to using only the star. It is however still possible to filter for nodes with label 'A' or 'B' in order to retrieve a subgraph containing only those nodes.

Example usage of node label filters: 

CALL gds.graph.create('myGraph', ['Person', 'City', '*'])

 // using all nodes projected in 'myGraph' (the star projection)
CALL gds.pageRank.stats('myGraph', {nodeLabels: ['*']})
CALL gds.pageRank.stats('myGraph', {nodeLabels: ['Person', 'City', '*']}) // equivalent
CALL gds.pageRank.stats('myGraph') // equivalent

// will use only nodes projected as 'Person' in 'myGraph'
CALL gds.pageRank.stats('myGraph', {nodeLabels: ['Person']})

// will use nodes projected as 'Person' or 'City' in 'myGraph'
CALL gds.pageRank.stats('myGraph', {nodeLabels: ['Person', 'City']})

4.2.2.1. Node properties

It is often useful to load an in-memory graph with more than one node property. A typical scenario is running different seedable algorithms on the same graph, but with different node properties as seed. We can load multiple node properties for each node projection using node property mappings. A node property mapping maps a user-defined property key to a property key in the Neo4j database. Any algorithm that supports node properties can refer to these user-defined property keys. For more information about the supported property types see Node Properties.

{
    <node-label>: {
        label: <neo4j-label>,
        properties: {
            <property-key-1>: {
                property: <neo-property-key>,
                defaultValue: <fallback-value>
            },
            <property-key-2>: {
                property: <neo-property-key>,
                defaultValue: <fallback-value>
            },
            // ...
            <property-key-n>: {
                property: <neo-property-key>,
                defaultValue: <fallback-value>
            }
        }
    }
}
  • property-key-i denotes the property key in the projected graph

    • neo-property-key denotes the property key in the Neo4j graph

      • The property key must exist in the Neo4j database
      • If not specified, neo-property-key defaults to property-key-i
    • fallback-value is used if the property does not exist for a node

      • If not specified, fallback-value defaults to the respective fallback value of the properties type.

For the following example, let’s assume that each City node stores two properties: the population of the city and an optional stateId that identifies the state in which the city is located. We want to project both properties and project stateId to the custom property key community.

Create a graph with multiple node properties: 

CALL gds.graph.create(
    'my-graph', {
        City: {
            properties: {
                stateId: {
                    property: 'stateId'
                },
                population: {
                    property: 'population'
                }
            }
        }
    },
    '*'
)
YIELD graphName, nodeCount, relationshipCount;

If we do not need to rename the node property keys or give a default value, we can use the following shorthand syntax.

CALL gds.graph.create('my-graph', 'City', '*', {
        nodeProperties: ['population', 'stateId']
    }
)
YIELD graphName, nodeCount, relationshipCount;

It is also possible to rename the property key during projection. In the example, we project the property key stateId to a custom property key community. When we use the projected graph in an algorithm, we refer to the custom property key instead.

Project node properties for all projected node labels: 

CALL gds.graph.create('my-graph', 'City', '*', {
        nodeProperties: ['population', { community: 'stateId' }]
    }
)
YIELD graphName, nodeCount, relationshipCount;

The projected properties can be referred to by any algorithm that uses properties as input, for example, Label Propagation.

CALL gds.labelPropagation.stream(
    'my-graph', {
        seedProperty: 'community'
    }
) YIELD nodeId, communityId;

4.2.3. Relationship projections

A relationship projection defines how a specific subset of Neo4j relationships is projected into the in-memory graph.

The following map-like syntax shows the general way of defining relationship projections:

{
    <relationship-type-1>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: <relationship-property-mappings>
    },
    <relationship-type-2>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: <relationship-property-mappings>
    },
    // ...
    <relationship-type-n>: {
        type: <neo4j-type>,
        orientation: <orientation>,
        aggregation: <aggregation-type>,
        properties: <relationship-property-mappings>
    }
}
  • relationship-type-i denotes the relationship type in the projected graph

    • neo4j-type denotes the relationship type in the Neo4j graph

      • The relationship type must exist in the Neo4j database
      • If not specified, neo4j-type defaults to relationship-type-i
    • orientation denotes how Neo4j relationships are represented in the projected graph. The following values are allowed:

      • NATURAL: each relationship is projected the same way as it is stored in Neo4j (default)
      • REVERSE: each relationship is reversed during graph projection
      • UNDIRECTED: each relationship is projected in both natural and reverse orientation
    • aggregation-type denotes how parallel relationships and their properties are handled. The specified value is applied to all property mappings that have no aggregation specified. The following values are allowed:

      • NONE: parallel relationships are not aggregated (default)
      • MIN, MAX, SUM: applied to the numeric properties of parallel relationships
      • SINGLE: a single, arbitrary relationship out of the parallel relationships is projected
      • COUNT: counts the number of non-null numeric properties

        • If the special property name '*' is used, COUNT will count parallel relationships
    • relationship-property-mappings denotes a set of mappings between Neo4j and in-memory relationship properties

In the following example, we want to project City nodes as well as ROAD and RAIL relationships into the in-memory graph.

CALL gds.graph.create(
    'my-graph',
    'City',
    {
        ROAD: {
            type: 'ROAD',
            orientation: 'NATURAL'
        },
        RAIL: {
            type: 'RAIL',
            orientation: 'NATURAL'
        }
    }
)
YIELD graphName, nodeCount, relationshipCount;

In the above example, we are using the same relationship type as in the Neo4j database as well as the default orientation. In that case we can use the following syntactic sugar, similar to node projections.

CALL gds.graph.create( 'my-graph', 'City', ['ROAD', 'RAIL'])
YIELD graphName, nodeCount, relationshipCount;

Projecting multiple relationship types enables algorithms to only use a subset of those.

// Uses `ROAD` relationships for computing Page Rank of cities
CALL gds.pageRank.stream('my-graph', { relationshipTypes: ['ROAD'] }) YIELD nodeId, score;

// Uses `RAIL` relationships for computing Page Rank of cities
CALL gds.pageRank.stream('my-graph', { relationshipTypes: ['RAIL'] }) YIELD nodeId, score;

4.2.3.1. Projection orientation

By default, relationships are projected in their natural representation, i.e., in the same way as they are stored in Neo4j. Using the orientation key within a relationship projection definition, we can alter that behaviour. There are three possible values: NATURAL, REVERSE and UNDIRECTED which can be best described from a node’s perspective:

  • NATURAL is the default behaviour and projects relationships that are pointing away from a node.
  • REVERSE projects relationships that are pointing towards a node.
  • UNDIRECTED projects relationships in both, natural and reversed order.

Consider the following graph containing Person nodes connected by KNOWS relationships. A KNOWS relationship is directed, as one person might know another person, but not necessarily the other way around.

CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (eve:Person {name: 'Eve'})

CREATE (alice)-[:KNOWS]->(bob)
CREATE (bob)-[:KNOWS]->(eve)
CREATE (eve)-[:KNOWS]->(bob);

In a NATURAL projection, Alice has one relationship to Bob, Bob has one relationship to Eve who in turn also has one relationship to Bob. In a REVERSE projection, Alice has no relationships as there is no relationship pointing towards Alice. Bob and Eve would have one relationship each, as they point to each other. In an UNDIRECTED projection, Alice would have one relationship representing the outgoing relationship. However, Bob and Eve would have two relationships each as the outgoing and incoming relationships are viewed independently.

To create a graph projection with different projection types, we use the following syntax:

CALL gds.graph.create(
    'my-graph',
    'Person',
    {
        KNOWS: {
            type: 'KNOWS',
            orientation: 'NATURAL'
        },
        KNOWN_BY: {
            type: 'KNOWS',
            orientation: 'REVERSE'
        },
        FRIEND_OF: {
            type: 'KNOWS',
            orientation: 'UNDIRECTED'
        }
    }
)
YIELD graphName, nodeCount, relationshipCount;

As in the previous example, we can refer to a subset of the projected relationships when running an algorithm. If we run the examples, we can see different ranks for the individual nodes. The Page Rank algorithm evenly distributes ranks along the relationships of a node. In the reverse case, Alice has no relationships which leads to a different result.

// Uses `KNOWS` relationships for computing Page Rank of persons
CALL gds.pageRank.stream('my-graph', { relationshipTypes: ['KNOWS'] }) YIELD nodeId, score;

// Uses `KNOWN_BY` relationships for computing Page Rank based on reversed relationships
CALL gds.pageRank.stream('my-graph', { relationshipTypes: ['KNOWN_BY'] }) YIELD nodeId, score;

// Uses `FRIEND_OF` relationships for computing Page Rank based on both projection types
CALL gds.pageRank.stream('my-graph', { relationshipTypes: ['FRIEND_OF'] }) YIELD nodeId, score;

Creating a projection consumes additional memory as those projections are stored in individual in-memory data structures. Sometimes it is possible to combine relationship projections instead of creating a new one. In the above example, the FRIEND_OF projection is equivalent to using ['KNOWS', 'KNOWN_BY'] as a relationship type predicate. This is not possible, if we use different aggregations for the single projections.

4.2.3.2. Relationship properties

Similar to node properties, relationship projections support specifying relationship properties. We can specify multiple relationship properties for each relationship projection using relationship property mappings. A relationship property mapping maps a user-defined property key to a property key in the Neo4j database. The parameter is configured using a map in which each key refers to a user-defined property key. Relationships only support numeric properties.

The following map-like syntax shows the general way of defining relationship property mappings:

{
    <relationship-type-1>: {
        type: <neo4j-type>,
        orientation: <orientation-type>,
        aggregation: <aggregation-type>,
        properties: {
            <property-key-1>: {
                property: <neo4j-property-key>,
                defaultValue: <numeric-value>,
                aggregation: <aggregation-type>
            },
            <property-key-2>: {
                property: <neo4j-property-key>,
                defaultValue: <numeric-value>,
                aggregation: <aggregation-type>
            },
            // ...
            <property-key-n>: {
                property: <neo4j-property-key>,
                defaultValue: <numeric-value>,
                aggregation: <aggregation-type>
            }
        }
    }
}
  • property-key-i denotes the name of the property in the projected graph

    • neo4j-property-key denotes the name of the property in the Neo4j graph

      • The property key must exist in the Neo4j database
      • neo4j-property-key defaults to property-key-i
      • The special property key '*' is allowed in combination with the COUNT aggregation
    • numeric-value is used if the property does not exist for a relationship

      • numeric-value defaults to NaN
    • aggregation-type denotes how properties of parallel relationships are handled. The specified value overrides the aggregation type specified for the enclosing relationship projection. The following values are allowed:

      • NONE: parallel relationships are not aggregated (default)
      • MIN, MAX, SUM: applied to the numeric properties of parallel relationships
      • SINGLE: a single, arbitrary relationship out of the parallel relationships is projected
      • COUNT: counts the number of non-null numeric properties

        • If the special property name '*' is used, COUNT will count parallel relationships

In the following example, we want to project City nodes and ROAD relationships. For nodes we project the stateId property.

Create a graph with multiple node and relationship properties: 

CALL gds.graph.create(
    'my-graph', {
        City: {
            properties: {
                community: {
                    property: 'stateId'
                }
            }
        }
    }, {
        ROAD: {
            properties: {
                quality: {
                    property: 'condition'
                },
                distance: {
                    property: 'length'
                }
            }
        }
    }
)
YIELD graphName, nodeCount, relationshipCount;

We can use the following shorthand syntax to express the same projection.

CALL gds.graph.create(
    'my-graph', 'City', 'ROAD', {
        nodeProperties: { community: 'stateId' },
        relationshipProperties: [{ quality: 'condition' }, { distance: 'length' }]
    }
)
YIELD graphName, nodeCount, relationshipCount;

The projected properties can be referred to by any algorithm that uses properties as input, for example Label Propagation.

// Option 1: Use the road quality as relationship weight
CALL gds.labelPropagation.stream(
    'my-graph', {
        seedProperty: 'community',
        relationshipWeightProperty: 'quality'
    }
) YIELD nodeId, communityId;
// Option 2: Use the distance between cities as relationship weight
CALL gds.labelPropagation.stream(
    'my-graph', {
        seedProperty: 'community',
        relationshipWeightProperty: 'distance'
    }
) YIELD nodeId, communityId;

4.2.3.3. Relationship aggregations

Relationship projections offer different ways of handling multiple - so called "parallel" - relationships between a given pair of nodes. The default is the NONE aggregation which keeps all parallel relationships and directly projects them into the in-memory graph. All other aggregations project all the parallel relationships between a pair of nodes into a single relationship.

In the following example, we want to aggregate all ROAD relationships between two cities to a single relationship. While doing so, we compute the maximum quality of the parallel relationships and store it on the resulting relationship.

Create a graph with aggregated parallel relationships using the maximum value of the condition property: 

CALL gds.graph.create(
    'my-graph', {
        City: {
            properties: {
                community: {
                    property: 'stateId'
                }
            }
        }
    }, {
        ROAD: {
            properties: {
                maxQuality: {
                    property: 'condition',
                    aggregation: 'MAX',
                    defaultValue: 1.0
                }
            }
        }
    }
)
YIELD graphName, nodeCount, relationshipCount;

Create a graph with aggregated relationships using the parallel relationship count as a relationship property: 

CALL gds.graph.create(
    'my-graph', {
        City: {
            properties: {
                community: {
                    property: 'stateId'
                }
            }
        }
    }, {
        ROAD: {
            properties: {
                roadCount: {
                    property: 'condition',
                    aggregation: 'COUNT'
                }
            }
        }
    }
)
YIELD graphName, nodeCount, relationshipCount;

Since we have only one node projection and one relationship projection, we can use the following shorthand syntax.

CALL gds.graph.create(
    'my-graph', 'City', 'ROAD', {
        nodeProperties: { community: 'stateId' },
        relationshipProperties: { maxQuality: { property: 'condition', aggregation: 'MAX', defaultValue: 1.0 }}
    }
)
YIELD graphName, nodeCount, relationshipCount;

As before, the projected properties can be referred to by any algorithm that uses properties as input, for example Label Propagation.

CALL gds.labelPropagation.stream(
    'my-graph', {
        seedProperty: 'community',
        relationshipWeightProperty: 'maxQuality'
    }
) YIELD nodeId, communityId;