This section describes the Random Walk algorithm in the Neo4j Labs Graph Algorithms library.
This is documentation for the Graph Algorithms Library, which has been deprecated by the Graph Data Science Library (GDS). |
Random Walk is an algorithm that provides random paths in a graph.
A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.
The Random Walk algorithm was developed by the Neo4j Labs team and is not officially supported. |
This section includes:
The term "random walk" was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled The Problem of the Random Walk. Study of random walks date back even further to the Gambler’s ruin problem, where it could be used to show that a gambler would eventually go bankrupt against an opponent with infinite wealth.
It’s only in the last couple of decades, however, that researchers have studied them with respect to networks.
Random Walk is often used as part of other algorithms:
You can read about more use cases in Random walks and diffusion on networks.
Many of the use-cases of PageRank also apply to Random Walks.
The constraints of PageRank also apply to Random Walks:
direction: BOTH
parameter, so that the random walk will traverse relationships in both directions
This sample will explain the Random Walk algorithm, using a simple graph:
The following will create a sample graph:
MERGE (home:Page {name:'Home'})
MERGE (about:Page {name:'About'})
MERGE (product:Page {name:'Product'})
MERGE (links:Page {name:'Links'})
MERGE (a:Page {name:'Site A'})
MERGE (b:Page {name:'Site B'})
MERGE (c:Page {name:'Site C'})
MERGE (d:Page {name:'Site D'})
MERGE (home)-[:LINKS]->(about)
MERGE (about)-[:LINKS]->(home)
MERGE (product)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(product)
MERGE (links)-[:LINKS]->(home)
MERGE (home)-[:LINKS]->(links)
MERGE (links)-[:LINKS]->(a)
MERGE (a)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(b)
MERGE (b)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(c)
MERGE (c)-[:LINKS]->(home)
MERGE (links)-[:LINKS]->(d)
MERGE (d)-[:LINKS]->(home)
The following will run the algorithm starting from the Home page and returning a 1 random walk, of path length 3:
MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 3, 1)
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN algo.asNode(nodeId).name AS page
page |
---|
"Home" |
"Site C" |
"Links" |
"Site A" |
If node label and relationship type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 2.2, “Cypher projection” section of the manual.
MATCH (home:Page {name: "Home"})
CALL algo.randomWalk.stream(id(home), 5, 1, {
nodeQuery: "MATCH (p:Page) RETURN id(p) as id",
relationshipQuery: "MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) as source, id(p2) as target",
graph: "cypher"
})
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN algo.asNode(nodeId).name AS page
The following will run the algorithm and stream results:
CALL algo.randomWalk.stream(start:Object, steps: 100, walks: 10000,
{graph:'huge', nodeQuery:'label or query', relationshipQuery:' type or query', direction:"IN/OUT/BOTH",
mode:"node2vec"/"random", inOut: 1.0, return: 1.0, path:false, concurrency:4})
YIELD nodes, path
Name | Type | Default | Optional | Description |
---|---|---|---|---|
start |
object |
null |
yes |
Starting points: null - whole graph, "Label" - nodes with that label, node-id - that node, list of node-ids - these nodes. |
steps |
int |
10 |
yes |
Length of paths returned, in case of error only path of lenght 1 is returned. |
walks |
int |
1 |
yes |
Number of paths returned. |
graph |
string |
'huge' |
yes |
Use 'huge' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node statement and relationship statement. |
nodeQuery |
string |
null |
yes |
The label or node-query to load from the graph. If null, load all nodes. |
relationshipQuery |
string |
null |
yes |
The relationship type or query to load from the graph. If null, load all relationships. |
direction |
string |
'BOTH' |
yes |
Direction of relationships to follow. |
mode |
string |
random |
yes |
Strategy for choosing the next relationship, modes: random and node2vec. |
inOut |
float |
1.0 |
yes |
Parameter for node2vec. |
return |
float |
1.0 |
yes |
Parameter for node2vec. |
path |
boolean |
false |
yes |
If the more expensive operation of creating a path from node-ids should be performed and returned in results. |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'. |
readConcurrency |
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |
Name | Type | Description |
---|---|---|
startNodeId |
long |
Node ID starting the path. |
nodeIds |
list of long |
List of Node ID forming a path. |
path |
Path |
Optional Path (with virtual relationships). |
The Random Walk algorithm supports the following graph types: