Random Walk
This section describes the Random Walk algorithm in the Neo4j Graph Data Science library.
Random Walk is an algorithm that provides random paths in a graph.
A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.
This algorithm is in the alpha tier. For more information on algorithm tiers, see Algorithms.
1. History and explanation
The term "random walk" was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled The Problem of the Random Walk. Study of random walks date back even further to the Gambler’s ruin problem, where it could be used to show that a gambler would eventually go bankrupt against an opponent with infinite wealth.
It’s only in the last couple of decades, however, that researchers have studied them with respect to networks.
2. Use-cases - when to use the Random Walk algorithm
-
It has be shown to relate to Brownian motion and also to the movement and dispersal of animals in the study of Random walk models in biology.
-
It has been used to analyse ALSI index of the JSE stock exchange and show that the index followed the random walk hypothesis between years 2000 and 2011. This means the movement of stock prices was random and the ability of investors to perform relied more on luck than anything else. Find this study in The Random Walk Theory And Stock Prices: Evidence From Johannesburg Stock Exchange
Random Walk is often used as part of other algorithms:
-
It can be used as part of the node2vec and graph2vec algorithms, that create node embeddings.
-
It can be used as part of the Walktrap and Infomap community detection algorithms. If a random walk returns a small set of nodes repeatedly, then it indicates that those set of nodes may have a community structure.
-
It can be used as part of the training process of machine learning model, as described in David Mack’s article Review prediction with Neo4j and TensorFlow.
You can read about more use cases in Random walks and diffusion on networks.
3. Constraints - when not to use the Random Walk algorithm
The constraints of Page Rank also apply to Random Walks:
-
Dead-ends occur when pages have no out-links. In this case, the random walk will abort and a path containing only the first first node will be returned. This problem can be avoided by running on an undirected graph, so that the random walk will traverse relationships in both directions.
-
If there are no links from within a group of pages to outside of the group, then the group is considered a spider trap. Random walks starting from any of the nodes in that group will only traverse to the others in the group - our implementation of the algorithm doesn’t allow a random walk to jump to non-neighbouring nodes.
-
Sinks can occur when a network of links form an infinite cycle.
4. Syntax
CALL gds.alpha.randomWalk.stream(configuration: Map)
YIELD startNodeId, nodeIds, path
Name | Type | Default | Optional | Description |
---|---|---|---|---|
start |
Object |
null |
yes |
Starting points: null - whole graph, "Label" - nodes with that label, node-id - that node, list of node-ids - these nodes. |
steps |
Integer |
10 |
yes |
Length of paths returned, in case of error only path of lenght 1 is returned. |
walks |
Integer |
1 |
yes |
Number of paths returned. |
mode |
String |
random |
yes |
Strategy for choosing the next relationship, modes: random and node2vec. |
inOut |
Float |
1.0 |
yes |
Parameter for node2vec. |
return |
Float |
1.0 |
yes |
Parameter for node2vec. |
path |
Boolean |
false |
yes |
If the more expensive operation of creating a path from node-ids should be performed and returned in results. |
concurrency |
Integer |
4 |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'. |
readConcurrency |
Integer |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |
Name | Type | Description |
---|---|---|
startNodeId |
Integer |
Node ID starting the path. |
nodeIds |
Integer[] |
List of Node ID forming a path. |
path |
Path |
Optional Path (with virtual relationships). |
5. Random Walk algorithm sample
This sample will explain the Random Walk algorithm, using a simple graph:

CREATE (home:Page {name: 'Home'}),
(about:Page {name: 'About'}),
(product:Page {name: 'Product'}),
(links:Page {name: 'Links'}),
(a:Page {name: 'Site A'}),
(b:Page {name: 'Site B'}),
(c:Page {name: 'Site C'}),
(d:Page {name: 'Site D'}),
(home)-[:LINKS]->(about),
(about)-[:LINKS]->(home),
(product)-[:LINKS]->(home),
(home)-[:LINKS]->(product),
(links)-[:LINKS]->(home),
(home)-[:LINKS]->(links),
(links)-[:LINKS]->(a),
(a)-[:LINKS]->(home),
(links)-[:LINKS]->(b),
(b)-[:LINKS]->(home),
(links)-[:LINKS]->(c),
(c)-[:LINKS]->(home),
(links)-[:LINKS]->(d),
(d)-[:LINKS]->(home)
MATCH (home:Page {name: 'Home'})
CALL gds.alpha.randomWalk.stream({
nodeProjection: '*',
relationshipProjection: {
LINKS: {
type: 'LINKS',
orientation: 'UNDIRECTED'
}
},
start: id(home),
steps: 3,
walks: 1
})
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN gds.util.asNode(nodeId).name AS page
page |
---|
"Home" |
"Site C" |
"Links" |
"Site A" |
6. Cypher projection
MATCH (home:Page {name: 'Home'})
CALL gds.alpha.randomWalk.stream({
nodeQuery: 'MATCH (p:Page) RETURN id(p) AS id',
relationshipQuery: 'MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) AS source, id(p2) AS target',
start: id(home),
steps: 5,
walks: 1
})
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN gds.util.asNode(nodeId).name AS page
Was this page helpful?