Cypher projection

This is documentation for the Graph Algorithms Library, which has been deprecated by the Graph Data Science Library (GDS).

If the node-label and relationship-type projection is not selective enough to describe our subgraph to run the algorithm on, we can use Cypher statements to project subsets of our graph. Use a node-statement instead of the label parameter and a relationship-statement instead of the relationship type, and use graph:'cypher' in the config.

Relationships described in the relationship-statement will only be projected if both source and target nodes are described in the node-statement, otherwise they will be ignored.

Cypher projection enables us to be more expressive in describing our subgraph that we want to analyse, but might take longer to project the graph with more complex cypher queries.

The general call syntax is:

CALL algo.<name>(
  'MATCH (n) RETURN id(n) AS id',
  "MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target",
  {graph: "cypher"})
  • The first query MATCH (n) RETURN id(n) AS id returns node ids. The Cypher loader expects the query to return an id field.

  • The second query MATCH (n)-→(m) RETURN id(n) AS source, id(m) AS target returns pairs of node ids that have a relationship between them in our projected graph. The Cypher loader expects the query to return source and target fields.

Note that in both queries we use the id function to return the node id.

1. Weights

We can also return a property value or weight (according to our config) in addition to the ids from these statements. We do this by returning an optional weight field.

The following will run the algortihm over a graph based on Cypher projections for nodes and relationships, using the score property of each relationship as weight:
CALL algo.<name>(
  'MATCH (n) RETURN id(n) AS id',
  "MATCH (n)-[r]->(m) RETURN id(n) AS source, id(m) AS target, r.score AS weight",
  {graph: "cypher"})

2. Parameters

We can pass through parameters into our Cypher projection queries via the params config option.

The following will run the algorithm over a graph based on Cypher projections that use the $firstName and $lastName parameters:
CALL algo.<name>(
  'MATCH (n)
   WHERE n.name CONTAINS $firstName
   RETURN id(n) AS id',
  "MATCH (n)-->(m)
   WHERE n.name CONTAINS $lastName OR m.name CONTAINS $lastName
   RETURN id(n) AS source, id(m) AS target",
  {graph: "cypher", params: {firstName: $firstName, lastName: $lastName} })

3. Example Usage

We could use Cypher projections to run the PageRank algorithm on DBpedia, as shown in the following examples.

The following will run the PageRank algorithm over a graph based on Cypher projections for nodes and relationships:
CALL algo.pageRank(
  'MATCH (p:Page) RETURN id(p) as id',
  'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:5, write: true});

Cypher projection can also be used to project a virtual (non-stored) graph. Here is an example of how to project an undirected graph of people who visited the same web page and run the Louvain community detection algorithm on it, using the number of common visited web pages between pairs of people as relationship weight:

The following will run the Louvain algortihm over a graph based on Cypher projections for nodes and relationships:
CALL algo.louvain(
  'MATCH (p:Person) RETURN id(p) as id',
  'MATCH (p1:Person)-[:VISIT]->(:Page)<-[:VISIT]-(p2:Person)
   RETURN id(p1) as source, id(p2) as target, count(*) as weight',
  {graph:'cypher', iterations:5, write: true});

4. Parallel loading

By default, Cypher projection queries are run on a single thread.

We can parallelize loading by including SKIP clause and $skip parameter, as well as LIMIT clause and $limit parameter in our projection queries. Parallelization of the queries is based on the batchSize key in the config, which has a default value of 10,000.

If the parameters are not named $skip and $limit, they will be ignored, and the sequential loading approach will be used.

The following runs PageRank over a graph based on Cypher projections, with nodes loaded in parallel:
CALL algo.pageRank(
  'MATCH (p:Page) WITH p SKIP $skip LIMIT $limit RETURN id(p) as id',
  'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:5, write: true});
The following runs PageRank over a graph based on Cypher projections, with relationships loaded in parallel:
CALL algo.pageRank(
  'MATCH (p:Page) RETURN id(p) as id',
  'MATCH (p1:Page)-[:Link]->(p2:Page) WITH * SKIP $skip LIMIT $limit RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:5, write: true});
The following runs PageRank over a graph based on Cypher projections, with nodes and relationships loaded in parallel:
CALL algo.pageRank(
  'MATCH (p:Page) WITH p SKIP $skip LIMIT $limit RETURN id(p) as id',
  'MATCH (p1:Page)-[:Link]->(p2:Page) WITH * SKIP $skip LIMIT $limit RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:5, write: true});

If node label and relationship type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Cypher projection section of the manual.

If the similarity lists are very large they can take up a lot of memory. For cases where those lists contain lots of values that should be skipped, you can use the less memory-intensive approach of using Cypher statements to project the graph instead.

The Cypher loader expects to receive 3 fields:

  • item - should contain node ids, which we can return using the id function.

  • category - should contain node ids, which we can return using the id function.

  • weight - should contain a double value.