PageRank

Introduction

The PageRank algorithm measures the importance of each node within the graph, based on the number of incoming relationships and the importance of the corresponding source nodes. The underlying assumption, roughly speaking, is that a page is only as important as the pages that link to it.

PageRank is introduced in the original Google paper as a function that solves the following equation:

PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)

where,

  • we assume that a page A has pages T1 to Tn which point to it.

  • d is a damping factor which can be set between 0 (inclusive) and 1 (exclusive). It is usually set to 0.85.

  • C(A) is defined as the number of links going out of page A.

This equation is used to iteratively update a candidate solution until it converges to an approximate solution of the same equation.
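
Written out as a power iteration, each step recomputes every score from the previous iteration's scores (a standard formulation of the update implied above; the iteration index k is notation introduced here):

PR_{k+1}(A) = (1 - d) + d \left( \frac{PR_k(T_1)}{C(T_1)} + \cdots + \frac{PR_k(T_n)}{C(T_n)} \right)

with PR_0 commonly initialized to a uniform value, for example PR_0(A) = 1 for every page A.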

For more information on this algorithm, see the original Google paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (Brin and Page, 1998).

Considerations

There are some things to be aware of when using the PageRank algorithm:

  • If there are no relationships from within a group of pages to outside the group, the group is considered a spider trap.

  • Rank sink can occur when a network of pages forms an infinite cycle.

  • Dead-ends occur when pages have no outgoing relationships.

Changing the damping factor can help with all of the considerations above. It can be interpreted as the probability that a web surfer occasionally jumps to a random page instead of following links, and therefore does not get stuck in sinks.
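
To see the effect numerically, note that the teleport term in the equation above gives every page a guaranteed baseline score, independent of the link structure. With the default damping factor:

1 - d = 1 - 0.85 = 0.15

so even a dead-end page retains a score of at least 0.15, and only a fraction d = 0.85 of a page's score keeps circulating along relationships, which prevents a cycle from accumulating rank without bound.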

Syntax

This section covers the syntax used to execute the PageRank algorithm.

Run PageRank.
CALL graph.page_rank(
  'CPU_X64_XS',        (1)
  {
    'project': {...}, (2)
    'compute': {...}, (3)
    'write':   {...}  (4)
  }
);
1 Compute pool selector.
2 Project config.
3 Compute config.
4 Write config.
Table 1. Parameters
Name                | Type   | Default | Optional | Description
computePoolSelector | String | n/a     | no       | The selector for the compute pool on which to run the PageRank job.
configuration       | Map    | {}      | no       | Configuration for graph project, algorithm compute, and result write-back.

The configuration map consists of the following three entries.

For more details on the Project configuration below, refer to the Project documentation.
Table 2. Project configuration
Name               | Type
nodeTables         | List of node tables.
relationshipTables | Map of relationship types to relationship tables.

Table 3. Compute configuration
Name                       | Type                                | Default    | Optional | Description
mutateProperty             | String                              | 'pageRank' | yes      | The node property of the in-memory graph to which the computed score is written.
dampingFactor              | Float                               | 0.85       | yes      | The damping factor of the PageRank calculation. Must be in [0, 1).
maxIterations              | Integer                             | 20         | yes      | The maximum number of iterations of PageRank to run.
tolerance                  | Float                               | 0.0000001  | yes      | Minimum change in scores between iterations. If all scores change less than the tolerance value, the result is considered stable and the algorithm returns.
relationshipWeightProperty | String                              | null       | yes      | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
sourceNodes                | Node/Number, List, or List of pairs | []         | yes      | The nodes, node ids, or node-bias pairs to use for computing Personalized PageRank. To use a different bias per source node, use the syntax: [[nodeId1, bias1], [nodeId2, bias2], …].
scaler                     | String or Map                       | None       | yes      | The name of the scaler applied to the final scores. Supported values are None, MinMax, Max, Mean, Log, and StdScore. To apply scaler-specific configuration, use the Map syntax: {scaler: 'name', …}.
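
As an illustration of the compute options above, a Personalized PageRank job scoped to specific source pages might be configured as follows. This is a sketch only: the database, schema, node ids, biases, and output table name are hypothetical placeholders.

CALL Neo4j_Graph_Analytics.graph.page_rank('CPU_X64_XS', {
    'project': {
        'defaultTablePrefix': 'MY_DB.MY_SCHEMA',            -- hypothetical database and schema
        'nodeTables': [ 'PAGES' ],
        'relationshipTables': {
            'LINKS': {
                'sourceTable': 'PAGES',
                'targetTable': 'PAGES'
            }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'dampingFactor': 0.85,
        'sourceNodes': [['page_1', 0.7], ['page_2', 0.3]],  -- hypothetical node ids with per-source biases
        'scaler': 'MinMax'                                  -- scale final scores into [0, 1]
    },
    'write': [{
        'nodeLabel': 'PAGES',
        'outputTable': 'MY_DB.MY_SCHEMA.PAGES_PPR',         -- hypothetical output table
        'nodeProperty': 'score'
    }]
});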

For more details on the Write configuration below, refer to the Write documentation.
Table 4. Write configuration
Name         | Type   | Default    | Optional | Description
nodeLabel    | String | n/a        | no       | Node label in the in-memory graph from which to write a node property.
nodeProperty | String | 'pageRank' | yes      | The node property that will be written back to the Snowflake database.
outputTable  | String | n/a        | no       | Table in the Snowflake database to which node properties are written.

Examples

In this section we will show examples of running the PageRank algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide to using the algorithm in a real setting. We will do this on a small web network graph of a handful of nodes connected in a particular pattern. The example graph looks like this:

Visualization of the example graph
The following SQL statements create the example graph tables in the Snowflake database:
CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.PAGES (NODEID STRING);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.PAGES VALUES
  ('Home'),
  ('About'),
  ('Product'),
  ('Links'),
  ('Site A'),
  ('Site B'),
  ('Site C'),
  ('Site D');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LINKS (SOURCENODEID STRING, TARGETNODEID STRING, WEIGHT DOUBLE);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LINKS VALUES
  ('Home',    'About',   0.2),
  ('Home',    'Links',   0.2),
  ('Home',    'Product', 0.6),
  ('About',   'Home',    1.0),
  ('Product', 'Home',    1.0),
  ('Site A',  'Home',    1.0),
  ('Site B',  'Home',    1.0),
  ('Site C',  'Home',    1.0),
  ('Site D',  'Home',    1.0),
  ('Links',   'Home',    0.8),
  ('Links',   'Site A',  0.05),
  ('Links',   'Site B',  0.05),
  ('Links',   'Site C',  0.05),
  ('Links',   'Site D',  0.05);

This graph represents eight pages, linking to one another. Each relationship has a property called weight, which describes the importance of the relationship.
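
As a quick sanity check before running the job, the row counts can be verified; they should match the nodeCount of 8 and relationshipCount of 14 reported in the job result further below:

SELECT COUNT(*) AS NODE_COUNT FROM EXAMPLE_DB.DATA_SCHEMA.PAGES;
SELECT COUNT(*) AS RELATIONSHIP_COUNT FROM EXAMPLE_DB.DATA_SCHEMA.LINKS;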

Run job

Running a PageRank job involves three steps: Project, Compute and Write.

To run the query, a set of grants is required for the application, your consumer role and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default, Neo4j_Graph_Analytics. If you chose a different app name during installation, replace Neo4j_Graph_Analytics with that name.

The following will run a PageRank job:
CALL Neo4j_Graph_Analytics.graph.page_rank('CPU_X64_XS', {
    'project': {
        'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
        'nodeTables': [ 'PAGES' ],
        'relationshipTables': {
            'LINKS': {
                'sourceTable': 'PAGES',
                'targetTable': 'PAGES'
            }
        }
    },
    'compute': {
        'mutateProperty': 'score'
    },
    'write': [{
        'nodeLabel': 'PAGES',
        'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.PAGES_CENTRALITY',
        'nodeProperty': 'score'
    }]
});
Table 5. Results
JOB_ID JOB_START JOB_END JOB_RESULT

job_c755f1e112164f78b7054d162dc4aab4

2025-04-29 15:55:58.260000

2025-04-29 15:56:04.730000

 {
    "page_rank_1": {
      "centralityDistribution": {
        "max": 3.215682983398437,
        "mean": 0.9612393379211426,
        "min": 0.32785606384277344,
        "p50": 0.32785606384277344,
        "p75": 1.0542736053466797,
        "p90": 3.2156810760498047,
        "p95": 3.2156810760498047,
        "p99": 3.2156810760498047,
        "p999": 3.2156810760498047
      },
      "computeMillis": 73,
      "configuration": {
        "concurrency": 2,
        "dampingFactor": 0.85,
        "jobId": "e692cccc-34a9-4b9b-b41f-6b9c39530e7f",
        "logProgress": true,
        "maxIterations": 20,
        "mutateProperty": "score",
        "nodeLabels": [
          "*"
        ],
        "relationshipTypes": [
          "*"
        ],
        "scaler": "NONE",
        "sourceNodes": [],
        "sudo": false,
        "tolerance": 1.000000000000000e-07
      },
      "didConverge": false,
      "mutateMillis": 4,
      "nodePropertiesWritten": 8,
      "postProcessingMillis": 34,
      "preProcessingMillis": 10,
      "ranIterations": 20
    },
    "project_1": {
      "graphName": "snowgraph",
      "nodeCount": 8,
      "nodeMillis": 326,
      "relationshipCount": 14,
      "relationshipMillis": 470,
      "totalMillis": 796
    },
    "write_node_property_1": {
      "exportMillis": 2029,
      "nodeLabel": "PAGES",
      "nodeProperty": "score",
      "outputTable": "EXAMPLE_DB.DATA_SCHEMA.PAGES_CENTRALITY",
      "propertiesExported": 8
    }
  }

The returned result contains information about the job execution and the result distribution. The centrality distribution histogram can be useful for inspecting the computed scores or for performing normalizations. Additionally, the centrality score for each of the eight nodes has been written back to the Snowflake database. We can query it like so:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.PAGES_CENTRALITY;
Table 6. Results
NODEID  | SCORE
Home    | 3.215681999884452
About   | 1.0542700552146722
Product | 1.0542700552146722
Links   | 1.0542700552146722
Site A  | 0.3278578964488539
Site B  | 0.3278578964488539
Site C  | 0.3278578964488539
Site D  | 0.3278578964488539

The above job ran the PageRank algorithm in unweighted mode, and the returned scores are not normalized.
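
Since the LINKS table already carries a WEIGHT column, a weighted variant of the same job can be sketched by setting relationshipWeightProperty in the compute configuration. This assumes the WEIGHT column is projected as a relationship property under that name; the output table name here is a hypothetical placeholder.

CALL Neo4j_Graph_Analytics.graph.page_rank('CPU_X64_XS', {
    'project': {
        'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
        'nodeTables': [ 'PAGES' ],
        'relationshipTables': {
            'LINKS': {
                'sourceTable': 'PAGES',
                'targetTable': 'PAGES'
            }
        }
    },
    'compute': {
        'mutateProperty': 'score',
        'relationshipWeightProperty': 'WEIGHT'  -- assumes the WEIGHT column is projected under this name
    },
    'write': [{
        'nodeLabel': 'PAGES',
        'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.PAGES_CENTRALITY_WEIGHTED',  -- hypothetical output table
        'nodeProperty': 'score'
    }]
});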