6.7. The Random Walk algorithm

This section describes the Random Walk algorithm in the Neo4j Graph Algorithms library.

Random Walk is an algorithm that provides random paths in a graph.

A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.

6.7.1. History and explanation

The term "random walk" was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled The Problem of the Random Walk. Study of random walks date back even further to the Gambler’s ruin problem, where it could be used to show that a gambler would eventually go bankrupt against an opponent with infinite wealth.

It’s only in the last couple of decades, however, that researchers have studied them with respect to networks.

6.7.2. Use-cases - when to use the Random Walk algorithm

Random Walk is often used as part of other algorithms:

  • It can be used as part of the node2vec and graph2vec algorithms, that create node embeddings.
  • It can be used as part of the Walktrap and Infomap community detection algorithms. If a random walk returns a small set of nodes repeatedly, then it indicates that those set of nodes may have a community structure.
  • It can be used as part of the training process of machine learning model, as described in David Mack’s article Review prediction with Neo4j and TensorFlow.

You can read about more use cases in Random walks and diffusion on networks.

Many of the use-cases of PageRank also apply to Random Walks.

6.7.3. Constraints - when not to use the Random Walk algorithm

The constraints of PageRank also apply to Random Walks:

  • Dead-ends occur when pages have no out-links. In this case, the random walk will abort and a path containing only the first first node will be returned. This problem can be avoided by passing the direction: BOTH parameter, so that the random walk will traverse relationships in both directions
  • If there are no links from within a group of pages to outside of the group, then the group is considered a spider trap. Random walks starting from any of the nodes in that group will only traverse to the others in the group - our implementation of the algorithm doesn’t allow a random walk to jump to non-neighbouring nodes.
  • Sinks can occur when a network of links form an infinite cycle.

6.7.4. Random Walk algorithm sample

This sample will explain the Random Walk algorithm, using a simple graph:

pagerank

The following will create a sample graph: 

MERGE (home:Page {name:'Home'})
MERGE (about:Page {name:'About'})
MERGE (product:Page {name:'Product'})
MERGE (links:Page {name:'Links'})
MERGE (a:Page {name:'Site A'})
MERGE (b:Page {name:'Site B'})
MERGE (c:Page {name:'Site C'})
MERGE (d:Page {name:'Site D'})

MERGE (home)-[:LINKS]->(about)
MERGE (about)-[:LINKS]->(home)
MERGE (product)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(product)
MERGE (links)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(links)
MERGE (links)-[:LINKS]->(a)
MERGE (a)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(b)
MERGE (b)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(c)
MERGE (c)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(d)
MERGE (d)-[:LINKS]->(home)

The following will run the algorithm starting from the Home page and returning a 1 random walk, of path length 3: 

MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 3, 1)
YIELD nodeIds

UNWIND nodeIds AS nodeId

RETURN algo.getNodeByid(nodeId).name AS page

Table 6.24. Results
page

"Home"

"Site C"

"Links"

"Site A"

6.7.5. Syntax

The following will run the algorithm and stream results: 

CALL algo.randomWalk.stream(start:Object, steps: 100, walks: 10000,
    {graph:'heavy', nodeQuery:'label or query', relationshipQuery:' type or query', direction:"IN/OUT/BOTH",
     mode:"node2vec"/"random", inOut: 1.0, return: 1.0, path:false, concurrency:4})
YIELD nodes, path

Table 6.25. Parameters
Name Type Default Optional Description

start

object

null

yes

starting points: null - whole graph, "Label" - nodes with that label, node-id - that node, list of node-ids - these nodes

steps

int

10

yes

length of paths returned, in case of error only path of lenght 1 is returned

walks

int

1

yes

number of paths returned

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

nodeQuery

string

null

yes

The label or node-query to load from the graph. If null, load all nodes

relationshipQuery

string

null

yes

The relationship-type or query to load from the graph. If null, load all relationships.

direction

string

'BOTH'

yes

direction of relationships to follow

mode

string

random

yes

strategy for choosing the next relationship, modes: random and node2vec

inOut

float

1.0

yes

parameter for node2vec

return

float

1.0

yes

parameter for node2vec

path

boolean

false

yes

if the more expensive operation of creating a path from node-ids should be performed and returned in results

concurrency

int

available CPUs

yes

The number of concurrent threads

Table 6.26. Results
Name Type Description

startNodeId

long

Node ID starting the path

nodeIds

list of long

List of Node ID forming a path

path

Path

Optional Path (with virtual relationships)

6.7.6. Cypher projection

If label and relationship-type are not selective enough to describe a subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. You must ensure that graph:'cypher' is set in the config:

MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 5, 1, {
  nodeQuery: "MATCH (p:Page) RETURN id(p) as id",
  relationshipQuery: "MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) as source, id(p2) as target",
  graph: "cypher"
})
YIELD nodeIds

UNWIND nodeIds AS nodeId

RETURN algo.getNodeById(nodeId).name AS page

6.7.7. Graph type support

The Random Walk algorithm supports the following graph types:

  • ✓ directed, unweighted
  • ❏ undirected, unweighted