6.5.7. Random Walk

This section describes the Random Walk algorithm in the Neo4j Graph Data Science library.

Random Walk is an algorithm that provides random paths in a graph.

A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.

This algorithm is in the alpha tier. For more information on algorithm tiers, see Chapter 6, Algorithms.

This section includes:

6.5.7.1. History and explanation

The term "random walk" was first mentioned by Karl Pearson in 1905 in a letter to Nature magazine titled The Problem of the Random Walk. Study of random walks date back even further to the Gambler’s ruin problem, where it could be used to show that a gambler would eventually go bankrupt against an opponent with infinite wealth.

It’s only in the last couple of decades, however, that researchers have studied them with respect to networks.

6.5.7.2. Use-cases - when to use the Random Walk algorithm

Random Walk is often used as part of other algorithms:

  • It can be used as part of the node2vec and graph2vec algorithms, that create node embeddings.
  • It can be used as part of the Walktrap and Infomap community detection algorithms. If a random walk returns a small set of nodes repeatedly, then it indicates that those set of nodes may have a community structure.
  • It can be used as part of the training process of machine learning model, as described in David Mack’s article Review prediction with Neo4j and TensorFlow.

You can read about more use cases in Random walks and diffusion on networks.

6.5.7.3. Constraints - when not to use the Random Walk algorithm

The constraints of Page Rank also apply to Random Walks:

  • Dead-ends occur when pages have no out-links. In this case, the random walk will abort and a path containing only the first first node will be returned. This problem can be avoided by running on an undirected graph, so that the random walk will traverse relationships in both directions.
  • If there are no links from within a group of pages to outside of the group, then the group is considered a spider trap. Random walks starting from any of the nodes in that group will only traverse to the others in the group - our implementation of the algorithm doesn’t allow a random walk to jump to non-neighbouring nodes.
  • Sinks can occur when a network of links form an infinite cycle.

6.5.7.4. Syntax

The following will run the algorithm and stream results: 

CALL gds.alpha.randomWalk.stream(configuration: Map)
YIELD startNodeId, nodeIds, path

Table 6.409. Configuration
Name Type Default Optional Description

start

Object

null

yes

Starting points: null - whole graph, "Label" - nodes with that label, node-id - that node, list of node-ids - these nodes.

steps

Integer

10

yes

Length of paths returned, in case of error only path of lenght 1 is returned.

walks

Integer

1

yes

Number of paths returned.

mode

String

random

yes

Strategy for choosing the next relationship, modes: random and node2vec.

inOut

Float

1.0

yes

Parameter for node2vec.

return

Float

1.0

yes

Parameter for node2vec.

path

Boolean

false

yes

If the more expensive operation of creating a path from node-ids should be performed and returned in results.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'.

readConcurrency

Integer

value of 'concurrency'

yes

The number of concurrent threads used for reading the graph.

Table 6.410. Results
Name Type Description

startNodeId

Integer

Node ID starting the path.

nodeIds

Integer[]

List of Node ID forming a path.

path

Path

Optional Path (with virtual relationships).

6.5.7.5. Random Walk algorithm sample

This sample will explain the Random Walk algorithm, using a simple graph:

pagerank

The following will create a sample graph: 

CREATE (home:Page {name: 'Home'}),
       (about:Page {name: 'About'}),
       (product:Page {name: 'Product'}),
       (links:Page {name: 'Links'}),
       (a:Page {name: 'Site A'}),
       (b:Page {name: 'Site B'}),
       (c:Page {name: 'Site C'}),
       (d:Page {name: 'Site D'}),

       (home)-[:LINKS]->(about),
       (about)-[:LINKS]->(home),
       (product)-[:LINKS]->(home),
       (home)-[:LINKS]->(product),
       (links)-[:LINKS]->(home),
       (home)-[:LINKS]->(links),
       (links)-[:LINKS]->(a),
       (a)-[:LINKS]->(home),
       (links)-[:LINKS]->(b),
       (b)-[:LINKS]->(home),
       (links)-[:LINKS]->(c),
       (c)-[:LINKS]->(home),
       (links)-[:LINKS]->(d),
       (d)-[:LINKS]->(home)

The following will run the algorithm starting from the Home page and returning a 1 random walk, of path length 3: 

MATCH (home:Page {name: 'Home'})
CALL gds.alpha.randomWalk.stream({
  nodeProjection: '*',
  relationshipProjection: {
    LINKS: {
      type: 'LINKS',
      orientation: 'UNDIRECTED'
    }
  },
  start: id(home),
  steps: 3,
  walks: 1
})
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN gds.util.asNode(nodeId).name AS page

Table 6.411. Results
page

"Home"

"Site C"

"Links"

"Site A"

6.5.7.6. Cypher projection

If node label and relationship type are not selective enough to create the graph projection to run the algorithm on, you can use Cypher queries to project your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 4.3, “Cypher projection” section of the manual.

MATCH (home:Page {name: 'Home'})
CALL gds.alpha.randomWalk.stream({
  nodeQuery: 'MATCH (p:Page) RETURN id(p) AS id',
  relationshipQuery: 'MATCH (p1:Page)-[:LINKS]->(p2:Page) RETURN id(p1) AS source, id(p2) AS target',
  start: id(home),
  steps: 5,
  walks: 1
})
YIELD nodeIds
UNWIND nodeIds AS nodeId
RETURN gds.util.asNode(nodeId).name AS page