## 5.1. The PageRank algorithm

This section describes the PageRank algorithm in the Neo4j Graph Algorithms library.

PageRank is an algorithm that measures the transitive influence or connectivity of nodes.

It can be computed by either iteratively distributing one node’s rank (originally based on degree) over its neighbours or by randomly traversing the graph and counting the frequency of hitting each node during these walks.

PageRank is a variant of Eigenvector Centrality.

This section includes:

### 5.1.1. History and explanation

PageRank is named after Google co-founder Larry Page, and is used to rank websites in Google’s search results. It counts the number, and quality, of links to a page which determines an estimation of how important the page is. The underlying assumption is that pages of importance are more likely to receive a higher volume of links from other pages.

PageRank is defined in the original Google paper as follows:

``PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))``

where,

• we assume that a page `A` has pages `T1` to `Tn` which point to it (i.e., are citations).
• `d` is a damping factor which can be set between 0 and 1. It is usually set to 0.85.
• `C(A)` is defined as the number of links going out of page `A`.

### 5.1.2. Use-cases - when to use the PageRank algorithm

PageRank can be applied across a wide range of domains. The following are some notable use-cases:

• Personalized PageRank is used by Twitter to present users with recommendations of other accounts that they may wish to follow. The algorithm is run over a graph which contains shared interests and common connections. Their approach is described in more detail in "WTF: The Who to Follow Service at Twitter".
• PageRank has been used to rank public spaces or streets, predicting traffic flow and human movement in these areas. The algorithm is run over a graph which contains intersections connected by roads, where the PageRank score reflects the tendency of people to park, or end their journey, on each street. This is described in more detail in "Self-organized Natural Roads for Predicting Traffic Flow: A Sensitivity Study".
• PageRank can be used as part of an anomaly or fraud detection system in the healthcare and insurance industries. It can help find doctors or providers that are behaving in an unusual manner, and then feed the score into a machine learning algorithm.

There are many more use cases, which you can read about in David Gleich’s "PageRank beyond the web"

### 5.1.3. Constraints - when not to use the PageRank algorithm

There are some things to be aware of when using the PageRank algorithm:

• If there are no links from within a group of pages to outside of the group, then the group is considered a spider trap.
• Rank sink can occur when a network of pages form an infinite cycle.

If you see unexpected results from running the algorithm, it is worth doing some exploratory analysis of the graph to see if any of these problems are the cause. You can read The Google PageRank Algorithm and How It Works to learn more.

### 5.1.4. PageRank algorithm sample

This sample will explain the PageRank algorithm, using a simple graph:

The following will create a sample graph:

``````MERGE (home:Page {name:'Home'})
MERGE (product:Page {name:'Product'})
MERGE (a:Page {name:'Site A'})
MERGE (b:Page {name:'Site B'})
MERGE (c:Page {name:'Site C'})
MERGE (d:Page {name:'Site D'})

The following will run the algorithm and stream results:

``````CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85})
YIELD nodeId, score

RETURN algo.asNode(nodeId).name AS page,score
ORDER BY score DESC``````

The following will run the algorithm and write back results:

``````CALL algo.pageRank('Page', 'LINKS',
{iterations:20, dampingFactor:0.85, write: true,writeProperty:"pagerank"})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty``````

Table 5.1. Results
Name PageRank

Home

3.232

Product

1.059

1.059

1.059

Site A

0.328

Site B

0.328

Site C

0.328

Site D

0.328

As we might expect, the Home page has the highest PageRank because it has incoming links from all other pages. We can also see that it’s not only the number of incoming links that is important, but also the importance of the pages behind those links.

### 5.1.5. Weighted PageRank algorithm sample

This sample will explain the PageRank algorithm, using a simple graph:

The following will create a sample graph:

``````MERGE (home:Page {name:'Home'})
MERGE (product:Page {name:'Product'})
MERGE (a:Page {name:'Site A'})
MERGE (b:Page {name:'Site B'})
MERGE (c:Page {name:'Site C'})
MERGE (d:Page {name:'Site D'})

The following will run the algorithm and stream results:

``````CALL algo.pageRank.stream('Page', 'LINKS', {
iterations:20, dampingFactor:0.85, weightProperty: "weight"
})
YIELD nodeId, score

RETURN algo.asNode(nodeId).name AS page,score
ORDER BY score DESC``````

The following will run the algorithm and write back results:

``````CALL algo.pageRank('Page', 'LINKS',{
iterations:20, dampingFactor:0.85, write: true, writeProperty:"pagerank", weightProperty: "weight"
})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty``````

Table 5.2. Results
Name PageRank

Home

3.550

Product

1.953

0.7509

0.7509

Site A

0.1816

Site B

0.1816

Site C

0.1816

Site D

0.1816

As we might expect, the Home page has the highest PageRank because it has incoming links from all other pages. It’s even more important now that the Links page links to it with a high weight. The Product page has now become the 2nd most important page in its own right because of the high weighted link from the Home page.

#### 5.1.5.1. Personalized PageRank

Personalized PageRank is a variation of PageRank which is biased towards a set of `sourceNodes`. This variant of PageRank is often used as part of recommender systems.

The following examples show how to run PageRank centered around 'Site A'.

The following will run the algorithm and stream results:

``````MATCH (siteA:Page {name: "Site A"})

CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85, sourceNodes: [siteA]})
YIELD nodeId, score

RETURN algo.asNode(nodeId).name AS page,score
ORDER BY score DESC``````

The following will run the algorithm and write back results:

``````MATCH (siteA:Page {name: "Site A"})
{iterations:20, dampingFactor:0.85, sourceNodes: [siteA], write: true, writeProperty:"ppr"})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty
RETURN *``````

Table 5.3. Results
Name PageRank

Home

0.399

Site A

0.169

0.112

Product

0.112

0.112

Site B

0.019

Site C

0.019

Site D

0.019

### 5.1.6. Example usage

In this example we will run PageRank on Yelp’s social network to find potential influencers.

When importing the Yelp dataset we stored the social network as a undirected graph. Relationships in Neo4j always have a direction, but in this domain the direction is irrelevant. If `Person A` is a `FRIEND` with `Person B`, we can say that `Person B` is also a `FRIEND` with `Person A`.

The default label and relationship-type selection syntax won’t work for us here, because it will project a directed social network. Instead, we can project our undirected social network using Cypher loading. We can also apply this approach to other algorithms that use Cypher loading.

The following will run the algorithm on Yelp social network:

``````CALL algo.pageRank.stream(
'MATCH (u:User) WHERE exists( (u)-[:FRIENDS]-() ) RETURN id(u) as id',
'MATCH (u1:User)-[:FRIENDS]-(u2:User) RETURN id(u1) as source, id(u2) as target',
{graph:'cypher'}
) YIELD nodeId,score with algo.asNode(nodeId) as node, score order by score desc limit 10
RETURN node {.name, .review_count, .average_stars,.useful,.yelping_since,.funny}, score``````

### 5.1.7. Huge graph projection

The default label and relationship-type projection has a limitation of 2 billion nodes and 2 billion relationships. Therefore, if our projected graph contains more than 2 billion nodes or relationships, we will need to use huge graph projection.

Set `graph:'huge'` in the config:

``````CALL algo.pageRank('Page','LINKS',
{graph:'huge'})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, writeProperty;``````

### 5.1.8. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 2.2, “Cypher projection” section of the manual.

Set `graph:'cypher'` in the config:

``````CALL algo.pageRank(
'MATCH (p:Page) RETURN id(p) as id',
'MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:5, write: true}
)``````

### 5.1.9. Syntax

The following will run the algorithm and write back results:

``````CALL algo.pageRank(label:String, relationship:String,
{direction:'OUTGOING', iterations:20, dampingFactor:0.85, write: true, writeProperty:'pagerank', concurrency:4})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty``````

Table 5.4. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

direction

string

'OUTGOING'

yes

The relationship-direction to use in the algorithm

iterations

int

20

yes

How many iterations of PageRank to run

concurrency

int

available CPUs

yes

dampingFactor

float

0.85

yes

The damping factor of the PageRank calculation

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

defaultValue

float

0.0

yes

The default value of the weight in case it is missing or invalid

write

boolean

true

yes

Specify if the result should be written back as a node property

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.5. Results
Name Type Description

nodes

int

The number of nodes considered

iterations

int

The number of iterations run

dampingFactor

float

The damping factor used

writeProperty

string

The property name written back to

write

boolean

Specifies if the result was written back as node property

int

computeMillis

int

Milliseconds for running the algorithm

writeMillis

int

Milliseconds for writing result data back

The following will run the algorithm and stream results:

``````CALL algo.pageRank.stream(label:String, relationship:String,
{direction:'OUTGOING', iterations:20, dampingFactor:0.85, concurrency:4})
YIELD node, score``````

Table 5.6. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all nodes

direction

string

'OUTGOING'

yes

The relationship-direction to use in the algorithm

iterations

int

20

yes

Specify how many iterations of PageRank to run

concurrency

int

available CPUs

yes

dampingFactor

float

0.85

yes

The damping factor of the PageRank calculation

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

defaultValue

float

0.0

yes

The default value of the weight in case it is missing or invalid

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.7. Results
Name Type Description

nodeId

long

Node ID

score

float

PageRank weight

### 5.1.10. Graph type support

The PageRank algorithm supports the following graph types:

• ✓ directed, unweighted:

• direction: 'INCOMING' or 'OUTGOING', weightProperty: null
• ✓ directed, weighted

• direction: 'INCOMING' or 'OUTGOING', weightProperty : 'weight'
• ✓ undirected, unweighted

• direction: 'BOTH', weightProperty: null
• ✓ undirected, weighted

• direction: 'BOTH', weightProperty: 'weight'