Creating a subgraph

This section details how to create subgraphs from existing graphs stored in the graph catalog of the Neo4j Graph Data Science library.

In GDS, algorithms can be executed on a named graph that has been filtered based on its node labels and relationship types. However, that filtered graph only exists during the execution of the algorithm and it is not possible to filter on property values. If a filtered graph needs to be used multiple times, one can use the subgraph catalog procedure to create a new graph in the graph catalog.

The filter predicates in the subgraph procedure can take labels, relationship types as well as node and relationship properties into account. The new graph can be used in the same way as any other in-memory graph in the catalog. Creating subgraphs of subgraphs is also possible.

1. Syntax

A new graph can be created by using the gds.beta.graph.create.subgraph() procedure:
CALL gds.beta.graph.create.subgraph(
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  configuration: Map
) YIELD
  graphName: String,
  fromGraphName: String,
  nodeFilter: String,
  relationshipFilter: String,
  nodeCount: Integer,
  relationshipCount: Integer,
  createMillis: Integer
Table 1. Parameters
Name Type Description

graphName

String

The name of the new graph that is stored in the graph catalog.

fromGraphName

String

The name of the original graph in the graph catalog.

nodeFilter

String

A Cypher predicate for filtering nodes in the input graph. * can be used to allow all nodes.

relationshipFilter

String

A Cypher predicate for filtering relationships in the input graph. * can be used to allow all relationships.

configuration

Map

Additional parameters to configure subgraph creation.

Table 2. Subgraph specific configuration
Name Type Default Optional Description

concurrency

Integer

4

yes

The number of concurrent threads used for filtering the graph.

Table 3. Results
Name Type Description

graphName

String

The name of the new graph that is stored in the graph catalog.

fromGraphName

String

The name of the original graph in the graph catalog.

nodeFilter

String

Filter predicate for nodes.

relationshipFilter

String

Filter predicate for relationships.

nodeCount

Integer

Number of nodes in the subgraph.

relationshipCount

Integer

Number of relationships in the subgraph.

createMillis

Integer

Milliseconds for creating the subgraph.

The nodeFilter and relationshipFilter configuration keys can be used to express filter predicates. Filter predicates are Cypher predicates bound to a single entity. An entity is either a node or a relationship. The filter predicate always needs to evaluate to true or false. A node is contained in the subgraph if the node filter evaluates to true. A relationship is contained in the subgraph if the relationship filter evaluates to true and its source and target nodes are contained in the subgraph.

A predicate is a combination of expressions. The simplest form of expression is a literal. GDS currently supports the following literals:

  • float literals, e.g., 13.37

  • integer literals, e.g., 42

  • boolean literals, i.e., TRUE and FALSE

Property, label and relationship type expressions are bound to an entity. The node entity is always identified by the variable n, the relationship entity is identified by r. Using the variable, we can refer to:

  • node label expression, e.g., n:Person

  • relationship type expression, e.g., r:KNOWS

  • node property expression, e.g., n.age

  • relationship property expression, e.g., r.since

Boolean predicates combine two expressions and return either true or false. GDS supports the following boolean predicates:

  • greater/lower than, such as n.age > 42 or r.since < 1984

  • greater/lower than or equal, such as n.age > 42 or r.since < 1984

  • equality, such as n.age = 23 or r.since = 2020

  • logical operators, such as

  • n.age > 23 AND n.age < 42

  • n.age = 23 OR n.age = 42

  • n.age = 23 XOR n.age = 42

  • n.age IS NOT 23

Variable names that can be used within predicates are not arbitrary. A node predicate must refer to variable n. A relationship predicate must refer to variable r.

2. Examples

In order to demonstrate the GDS create subgraph capabilities we are going to create a small social graph in Neo4j.

The following Cypher statement will create the example graph in the Neo4j database:
CREATE
  (p0:Person { age: 16 }),
  (p1:Person { age: 18 }),
  (p2:Person { age: 20 }),
  (b0:Book   { isbn: 1234 }),
  (b1:Book   { isbn: 4242 }),
  (p0)-[:KNOWS { since: 2010 }]->(p1),
  (p0)-[:KNOWS { since: 2018 }]->(p2),
  (p0)-[:READS]->(b0),
  (p1)-[:READS]->(b0),
  (p2)-[:READS]->(b1)
Project the social network graph:
CALL gds.graph.create(
  'social-graph',
  {
    Person: { properties: 'age' },    (1)
    Book: {}                          (2)
  },
  {
    KNOWS: { properties: 'since' },   (3)
    READS: {}                         (4)
  }
)
YIELD graphName, nodeCount, relationshipCount, createMillis
1 Project Person nodes with their age property.
2 Project Book nodes without any of their properties.
3 Project KNOWS relationships with their since property.
4 Project READS relationships without any of their properties.

2.1. Node filtering

Create a new graph containing only users of a certain age group:
CALL gds.beta.graph.create.subgraph(
  'teenagers',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  '*'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 4. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers"

"social-graph"

2

1

2.2. Node and relationship filtering

Create a new graph containing only users of a certain age group that know each other since a given point a time:
CALL gds.beta.graph.create.subgraph(
  'teenagers',
  'social-graph',
  'n.age > 13 AND n.age <= 18',
  'r.since >= 2012'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 5. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers"

"social-graph"

2

0

2.3. Bipartite subgraph

Create a new bipartite graph between books and users connected by the READS relationship type:
CALL gds.beta.graph.create.subgraph(
  'teenagers-books',
  'social-graph',
  'n:Book OR n:Person',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 6. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers-books"

"social-graph"

5

3

2.4. Bipartite graph node filtering

The previous example can be extended with an additional filter applied only to persons:
CALL gds.beta.graph.create.subgraph(
  'teenagers-books',
  'social-graph',
  'n:Book OR (n:Person AND n.age > 18)',
  'r:READS'
)
YIELD graphName, fromGraphName, nodeCount, relationshipCount
Table 7. Results
graphName fromGraphName nodeCount relationshipCount

"teenagers-books"

"social-graph"

3

1