Random Walk

Glossary

Directed: Directed trait. The algorithm is well-defined on a directed graph.
Directed: Directed trait. The algorithm ignores the direction of the graph.
Directed: Directed trait. The algorithm does not run on a directed graph.
Undirected: Undirected trait. The algorithm is well-defined on an undirected graph.
Undirected: Undirected trait. The algorithm ignores the undirectedness of the graph.
Heterogeneous nodes: Heterogeneous nodes fully supported. The algorithm has the ability to distinguish between nodes of different types.
Heterogeneous nodes: Heterogeneous nodes allowed. The algorithm treats all selected nodes similarly regardless of their label.
Heterogeneous relationships: Heterogeneous relationships fully supported. The algorithm has the ability to distinguish between relationships of different types.
Heterogeneous relationships: Heterogeneous relationships allowed. The algorithm treats all selected relationships similarly regardless of their type.
Weighted relationships: Weighted trait. The algorithm supports a relationship property to be used as weight, specified via the relationshipWeightProperty configuration parameter.
Weighted relationships: Weighted trait. The algorithm treats each relationship as equally important, discarding the value of any relationship weight.

Random Walk is an algorithm that provides random paths in a graph.

A random walk simulates a traversal of the graph in which the traversed relationships are chosen at random. In a classic random walk, each relationship has the same, possibly weighted, probability of being picked. This probability is not influenced by the previously visited nodes. The random walk implementation of the Neo4j Graph Data Science library supports the concept of second order random walks. This method tries to model the transition probability based on the currently visited node v, the node t visited before the current one, and the node x which is the target of a candidate relationship. Random walks are thus influenced by two parameters: the returnFactor and the inOutFactor:

The returnFactor is used if t equals x, i.e., the random walk returns to the previously visited node.
The inOutFactor is used if the distance from t to x is equal to 2, i.e., the walk traverses further away from the node t

The probabilities for traversing a relationship during a random walk can be further influenced by specifying a relationshipWeightProperty. A relationship property value greater than 1 will increase the likelihood of a relationship being traversed, a property value between 0 and 1 will decrease that probability.

To obtain a random walk where the transition probability is independent of the previously visited nodes both the returnFactor and the inOutFactor can be set to 1.0.

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Memory Estimation.

Syntax

RandomWalk syntax per mode

Run RandomWalk in stream mode on a named graph.

CALL gds.randomWalk.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeIds: List of Integer,
  path: Path

Table 1. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name	Type	Default	Optional	Description
nodeLabels	List of String	`['*']`	yes	Filter the named graph using the given node labels. Nodes with any of the given labels will be included.
relationshipTypes	List of String	`['*']`	yes	Filter the named graph using the given relationship types. Relationships with any of the given types will be included.
concurrency	Integer	`4`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
logProgress	Boolean	`true`	yes	If disabled the progress percentage will not be logged.
sourceNodes	List of Integer	`List of all nodes`	yes	The list of nodes from which to do a random walk.
walkLength	Integer	`80`	yes	The number of steps in a single random walk.
walksPerNode	Integer	`10`	yes	The number of random walks generated for each node.
inOutFactor	Float	`1.0`	yes	Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local.
returnFactor	Float	`1.0`	yes	Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
relationshipWeightProperty	String	`null`	yes	Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.
randomSeed	Integer	`random`	yes	Seed value for the random number generator used to generate the random walks.
walkBufferSize	Integer	`1000`	yes	The number of random walks to complete before starting training.

Table 3. Results
Name	Type	Description
`nodeIds`	List of Integer	The nodes of the random walk.
`path`	Path	A `Path` object of the random walk.

Run RandomWalk in stats mode on a named graph.

CALL gds.randomWalk.stats(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  configuration: Map

Table 4. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name	Type	Default	Optional	Description
nodeLabels	List of String	`['*']`	yes	Filter the named graph using the given node labels. Nodes with any of the given labels will be included.
relationshipTypes	List of String	`['*']`	yes	Filter the named graph using the given relationship types. Relationships with any of the given types will be included.
concurrency	Integer	`4`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
logProgress	Boolean	`true`	yes	If disabled the progress percentage will not be logged.
sourceNodes	List of Integer	`List of all nodes`	yes	The list of nodes from which to do a random walk.
walkLength	Integer	`80`	yes	The number of steps in a single random walk.
walksPerNode	Integer	`10`	yes	The number of random walks generated for each node.
inOutFactor	Float	`1.0`	yes	Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local.
returnFactor	Float	`1.0`	yes	Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
relationshipWeightProperty	String	`null`	yes	Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.
randomSeed	Integer	`random`	yes	Seed value for the random number generator used to generate the random walks.
walkBufferSize	Integer	`1000`	yes	The number of random walks to complete before starting training.

Table 6. Results
Name	Type	Description
`preProcessingMillis`	Integer	Milliseconds for preprocessing the data.
`computeMillis`	Integer	Milliseconds for running the algorithm.
`configuration`	Map	The configuration used for running the algorithm.

Examples

All the examples below should be run in an empty database.

The examples use native projections as the norm, although Cypher projections can be used as well.

Consider the graph created by the following Cypher statement:

CREATE (home:Page {name: 'Home'}),
       (about:Page {name: 'About'}),
       (product:Page {name: 'Product'}),
       (links:Page {name: 'Links'}),
       (a:Page {name: 'Site A'}),
       (b:Page {name: 'Site B'}),
       (c:Page {name: 'Site C'}),
       (d:Page {name: 'Site D'}),

       (home)-[:LINKS]->(about),
       (about)-[:LINKS]->(home),
       (product)-[:LINKS]->(home),
       (home)-[:LINKS]->(product),
       (links)-[:LINKS]->(home),
       (home)-[:LINKS]->(links),
       (links)-[:LINKS]->(a),
       (a)-[:LINKS]->(home),
       (links)-[:LINKS]->(b),
       (b)-[:LINKS]->(home),
       (links)-[:LINKS]->(c),
       (c)-[:LINKS]->(home),
       (links)-[:LINKS]->(d),
       (d)-[:LINKS]->(home)

CALL gds.graph.project(
    'myGraph',
    '*',
    { LINKS: { orientation: 'UNDIRECTED' } }
);

Without specified source nodes

Run the RandomWalk algorithm on myGraph

CALL gds.randomWalk.stream(
  'myGraph',
  {
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS pages

Table 7. Results
nodeIds	pages
[0, 5, 0]	["Home", "Site B", "Home"]
[1, 0, 4]	["About", "Home", "Site A"]
[2, 0, 3]	["Product", "Home", "Links"]
[3, 7, 3]	["Links", "Site D", "Links"]
[4, 3, 0]	["Site A", "Links", "Home"]
[5, 0, 2]	["Site B", "Home", "Product"]
[6, 0, 4]	["Site C", "Home", "Site A"]
[7, 0, 2]	["Site D", "Home", "Product"]

With specified source nodes

Run the RandomWalk algorithm on myGraph with specified sourceNodes

MATCH (page:Page)
WHERE page.name IN ['Home', 'About']
WITH COLLECT(page) as sourceNodes
CALL gds.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: sourceNodes,
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS pages

Table 8. Results
nodeIds	pages
[0, 5, 0]	["Home", "Site B", "Home"]
[1, 0, 4]	["About", "Home", "Site A"]

Stats

Run the RandomWalk stats on myGraph

CALL gds.randomWalk.stats(
  'myGraph',
  {
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)

Table 9. Results
preProcessingMillis	computeMillis	configuration
0	1	{randomSeed=42, walkLength=3, jobId=b77f3147-6683-4249-8633-4db7da03f24d, sourceNodes=[], walksPerNode=1, inOutFactor=1.0, nodeLabels=[], sudo=false, relationshipTypes=[], walkBufferSize=1000, returnFactor=1.0, concurrency=1}