Random Walk

Random Walk is an algorithm that provides random paths in a graph.

A random walk simulates a traversal of the graph in which the traversed relationships are chosen at random. In a classic random walk, each relationship has the same, possibly weighted, probability of being picked. This probability is not influenced by the previously visited nodes. The random walk implementation of the Neo4j Graph Data Science library supports the concept of second order random walks. This method tries to model the transition probability based on the currently visited node v, the node t visited before the current one, and the node x which is the target of a candidate relationship. Random walks are thus influenced by two parameters: the returnFactor and the inOutFactor:

  • The returnFactor is used if t equals x, i.e., the random walk returns to the previously visited node.

  • The inOutFactor is used if the distance from t to x is equal to 2, i.e., the walk traverses further away from the node t

Visuzalition of random walk parameters

The probabilities for traversing a relationship during a random walk can be further influenced by specifying a relationshipWeightProperty. A relationship property value greater than 1 will increase the likelihood of a relationship being traversed, a property value between 0 and 1 will decrease that probability.

To obtain a random walk where the transition probability is independent of the previously visited nodes both the returnFactor and the inOutFactor can be set to 1.0.

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Memory Estimation.

1. Syntax

RandomWalk syntax per mode
Run RandomWalk in stream mode on a named graph.
CALL gds.randomWalk.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeIds: List of Integer,
  path: Path
Table 1. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

jobId

String

Generated internally

yes

An ID that can be provided to more easily track the algorithm’s progress.

sourceNodes

List of Integer

List of all nodes

yes

The list of nodes from which to do a random walk.

walkLength

Integer

80

yes

The number of steps in a single random walk.

walksPerNode

Integer

10

yes

The number of random walks generated for each node.

inOutFactor

Float

1.0

yes

Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local.

returnFactor

Float

1.0

yes

Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.

relationshipWeightProperty

String

null

yes

Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.

randomSeed

Integer

random

yes

Seed value for the random number generator used to generate the random walks.

walkBufferSize

Integer

1000

yes

The number of random walks to complete before starting training.

Table 3. Results
Name Type Description

nodeIds

List of Integer

The nodes of the random walk.

path

Path

A Path object of the random walk.

Run RandomWalk in stats mode on a named graph.
CALL gds.randomWalk.stats(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  configuration: Map
Table 4. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name Type Default Optional Description

nodeLabels

List of String

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

List of String

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm.

jobId

String

Generated internally

yes

An ID that can be provided to more easily track the algorithm’s progress.

sourceNodes

List of Integer

List of all nodes

yes

The list of nodes from which to do a random walk.

walkLength

Integer

80

yes

The number of steps in a single random walk.

walksPerNode

Integer

10

yes

The number of random walks generated for each node.

inOutFactor

Float

1.0

yes

Tendency of the random walk to stay close to the start node or fan out in the graph. Higher value means stay local.

returnFactor

Float

1.0

yes

Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.

relationshipWeightProperty

String

null

yes

Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.

randomSeed

Integer

random

yes

Seed value for the random number generator used to generate the random walks.

walkBufferSize

Integer

1000

yes

The number of random walks to complete before starting training.

Table 6. Results
Name Type Description

preProcessingMillis

Integer

Milliseconds for preprocessing the data.

computeMillis

Integer

Milliseconds for running the algorithm.

configuration

Map

The configuration used for running the algorithm.

2. Examples

Consider the graph created by the following Cypher statement:

CREATE (home:Page {name: 'Home'}),
       (about:Page {name: 'About'}),
       (product:Page {name: 'Product'}),
       (links:Page {name: 'Links'}),
       (a:Page {name: 'Site A'}),
       (b:Page {name: 'Site B'}),
       (c:Page {name: 'Site C'}),
       (d:Page {name: 'Site D'}),

       (home)-[:LINKS]->(about),
       (about)-[:LINKS]->(home),
       (product)-[:LINKS]->(home),
       (home)-[:LINKS]->(product),
       (links)-[:LINKS]->(home),
       (home)-[:LINKS]->(links),
       (links)-[:LINKS]->(a),
       (a)-[:LINKS]->(home),
       (links)-[:LINKS]->(b),
       (b)-[:LINKS]->(home),
       (links)-[:LINKS]->(c),
       (c)-[:LINKS]->(home),
       (links)-[:LINKS]->(d),
       (d)-[:LINKS]->(home)
CALL gds.graph.project(
    'myGraph',
    '*',
    { LINKS: { orientation: 'UNDIRECTED' } }
);

2.1. Without specified source nodes

Run the RandomWalk algorithm on myGraph
CALL gds.randomWalk.stream(
  'myGraph',
  {
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS pages
Table 7. Results
nodeIds pages

[0, 5, 0]

[Home, Site B, Home]

[1, 0, 4]

[About, Home, Site A]

[2, 0, 3]

[Product, Home, Links]

[3, 7, 3]

[Links, Site D, Links]

[4, 3, 0]

[Site A, Links, Home]

[5, 0, 2]

[Site B, Home, Product]

[6, 0, 4]

[Site C, Home, Site A]

[7, 0, 2]

[Site D, Home, Product]

2.2. With specified source nodes

Run the RandomWalk algorithm on myGraph with specified sourceNodes
MATCH (page:Page)
WHERE page.name IN ['Home', 'About']
WITH COLLECT(page) as sourceNodes
CALL gds.randomWalk.stream(
  'myGraph',
  {
    sourceNodes: sourceNodes,
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS pages
Table 8. Results
nodeIds pages

[0, 5, 0]

[Home, Site B, Home]

[1, 0, 4]

[About, Home, Site A]

2.3. Stats

Run the RandomWalk stats on myGraph
CALL gds.randomWalk.stats(
  'myGraph',
  {
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
Table 9. Results
preProcessingMillis computeMillis configuration

0

1

{randomSeed=42, walkLength=3, jobId=b77f3147-6683-4249-8633-4db7da03f24d, sourceNodes=[], walksPerNode=1, inOutFactor=1.0, nodeLabels=[], sudo=false, relationshipTypes=[], walkBufferSize=1000, returnFactor=1.0, concurrency=1}