7.4.7. The Random Walk algorithm

This section describes the Random Walk algorithm in the Neo4j Labs Graph Algorithms library.

Random Walk is an algorithm that provides random paths in a graph.

A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.

The Random Walk algorithm was developed by the Neo4j Labs team and is not officially supported.

This section includes:

7.4.7.1. History and explanation

The term "random walk" was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled The Problem of the Random Walk. Study of random walks date back even further to the Gambler’s ruin problem, where it could be used to show that a gambler would eventually go bankrupt against an opponent with infinite wealth.

It’s only in the last couple of decades, however, that researchers have studied them with respect to networks.

7.4.7.2. Use-cases - when to use the Random Walk algorithm

Random Walk is often used as part of other algorithms:

  • It can be used as part of the node2vec and graph2vec algorithms, that create node embeddings.
  • It can be used as part of the Walktrap and Infomap community detection algorithms. If a random walk returns a small set of nodes repeatedly, then it indicates that those set of nodes may have a community structure.
  • It can be used as part of the training process of machine learning model, as described in David Mack’s article Review prediction with Neo4j and TensorFlow.

You can read about more use cases in Random walks and diffusion on networks.

Many of the use-cases of PageRank also apply to Random Walks.

7.4.7.3. Constraints - when not to use the Random Walk algorithm

The constraints of PageRank also apply to Random Walks:

  • Dead-ends occur when pages have no out-links. In this case, the random walk will abort and a path containing only the first first node will be returned. This problem can be avoided by passing the direction: BOTH parameter, so that the random walk will traverse relationships in both directions
  • If there are no links from within a group of pages to outside of the group, then the group is considered a spider trap. Random walks starting from any of the nodes in that group will only traverse to the others in the group - our implementation of the algorithm doesn’t allow a random walk to jump to non-neighbouring nodes.
  • Sinks can occur when a network of links form an infinite cycle.

7.4.7.4. Random Walk algorithm sample

This sample will explain the Random Walk algorithm, using a simple graph:

pagerank

The following will create a sample graph: 

MERGE (home:Page {name:'Home'})
MERGE (about:Page {name:'About'})
MERGE (product:Page {name:'Product'})
MERGE (links:Page {name:'Links'})
MERGE (a:Page {name:'Site A'})
MERGE (b:Page {name:'Site B'})
MERGE (c:Page {name:'Site C'})
MERGE (d:Page {name:'Site D'})

MERGE (home)-[:LINKS]->(about)
MERGE (about)-[:LINKS]->(home)
MERGE (product)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(product)
MERGE (links)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(links)
MERGE (links)-[:LINKS]->(a)
MERGE (a)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(b)
MERGE (b)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(c)
MERGE (c)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(d)
MERGE (d)-[:LINKS]->(home)

The following will run the algorithm starting from the Home page and returning a 1 random walk, of path length 3: 

MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 3, 1)
YIELD nodeIds

UNWIND nodeIds AS nodeId

RETURN algo.asNode(nodeId).name AS page

Table 7.80. Results
page

"Home"

"Site C"

"Links"

"Site A"

7.4.7.5. Cypher projection

If node label and relationship type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 2.2, “Cypher projection” section of the manual.

MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 5, 1, {
  nodeQuery: "MATCH (p:Page) RETURN id(p) as id",
  relationshipQuery: "MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) as source, id(p2) as target",
  graph: "cypher"
})
YIELD nodeIds

UNWIND nodeIds AS nodeId

RETURN algo.asNode(nodeId).name AS page

7.4.7.6. Syntax

The following will run the algorithm and stream results: 

CALL algo.randomWalk.stream(start:Object, steps: 100, walks: 10000,
    {graph:'huge', nodeQuery:'label or query', relationshipQuery:' type or query', direction:"IN/OUT/BOTH",
     mode:"node2vec"/"random", inOut: 1.0, return: 1.0, path:false, concurrency:4})
YIELD nodes, path

Table 7.81. Parameters
Name Type Default Optional Description

start

object

null

yes

Starting points: null - whole graph, "Label" - nodes with that label, node-id - that node, list of node-ids - these nodes.

steps

int

10

yes

Length of paths returned, in case of error only path of lenght 1 is returned.

walks

int

1

yes

Number of paths returned.

graph

string

'huge'

yes

Use 'huge' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node statement and relationship statement.

nodeQuery

string

null

yes

The label or node-query to load from the graph. If null, load all nodes.

relationshipQuery

string

null

yes

The relationship type or query to load from the graph. If null, load all relationships.

direction

string

'BOTH'

yes

Direction of relationships to follow.

mode

string

random

yes

Strategy for choosing the next relationship, modes: random and node2vec.

inOut

float

1.0

yes

Parameter for node2vec.

return

float

1.0

yes

Parameter for node2vec.

path

boolean

false

yes

If the more expensive operation of creating a path from node-ids should be performed and returned in results.

concurrency

int

available CPUs

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'.

readConcurrency

int

value of 'concurrency'

yes

The number of concurrent threads used for reading the graph.

Table 7.82. Results
Name Type Description

startNodeId

long

Node ID starting the path.

nodeIds

list of long

List of Node ID forming a path.

path

Path

Optional Path (with virtual relationships).

7.4.7.7. Graph type support

The Random Walk algorithm supports the following graph types:

  • ✓ directed, unweighted
  • ❏ undirected, unweighted