Node2Vec

This section describes the Node2Vec node embedding algorithm in the Neo4j Graph Data Science library.

Node2Vec is a node embedding algorithm that computes a vector representation of each node based on random walks in the graph. The random walks sample the neighborhood of each node. From these neighborhood samples, the algorithm trains a neural network with a single hidden layer. The network is trained to predict the likelihood that a node occurs in a walk, given the occurrence of another node.

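For more information on this algorithm, see the Node2Vec paper: Aditya Grover and Jure Leskovec, "node2vec: Scalable Feature Learning for Networks" (KDD 2016).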

1. Syntax

Example 1. Node2Vec syntax per mode
Run Node2Vec in stream mode on a named graph.
CALL gds.alpha.node2vec.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  embedding: List<Float>
Table 1. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 2. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.

Table 3. Algorithm specific configuration
Name | Type | Default | Optional | Description
walkLength | Integer | 80 | yes | Number of steps in a random walk.
walksPerNode | Integer | 10 | yes | Number of random walks starting at each node.
windowSize | Integer | 10 | yes | Size of the context window when training the neural network.
walkBufferSize | Integer | 1000 | yes | Number of random walks to complete before starting training.
inOutFactor | Float | 1.0 | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | 1.0 | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
negativeSamplingRate | Integer | 5 | yes | Number of negative samples to produce for each positive sample.
centerSamplingFactor | Float | 0.001 | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are down-sampled.
contextSamplingExponent | Float | 0.75 | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally.
embeddingDimension | Integer | 128 | yes | Size of the computed node embeddings. Denoted by dimensions in the Node2Vec paper.
initialLearningRate | Float | 0.01 | yes | Learning rate used initially for training the neural network. The learning rate decreases during training.
minLearningRate | Float | 0.0001 | yes | Lower bound for learning rate as it is decreased during training.
iterations | Integer | 1 | yes | Number of training iterations.

Table 4. Results
Name | Type | Description
nodeId | Integer | The Neo4j node ID.
embedding | List<Float> | The computed node embedding.
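
As a minimal sketch of how the algorithm-specific settings listed above are supplied, the call below passes a few of them as keys of the configuration map. It assumes that a graph named 'myGraph' already exists in the graph catalog. The parameter values are arbitrary illustrations, not tuning recommendations.

```cypher
// Assumption: a graph named 'myGraph' has already been created in the graph catalog.
CALL gds.alpha.node2vec.stream('myGraph', {
  walkLength: 40,          // shorter walks than the default of 80
  walksPerNode: 5,         // fewer walks per node than the default of 10
  inOutFactor: 0.5,        // below 1.0, so walks are more inclined to fan out
  returnFactor: 2.0,       // above 1.0, so walks are less inclined to step back
  embeddingDimension: 64   // embedding size; the default is 128
})
YIELD nodeId, embedding
RETURN nodeId, embedding
LIMIT 5;
```

Any key that is left out of the map falls back to its documented default.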

Run Node2Vec in write mode on a graph stored in the catalog.
CALL gds.alpha.node2vec.write(
  graphName: String,
  configuration: Map
)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  nodeCount: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 5. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 6. General configuration for algorithm execution on a named graph.
Name | Type | Default | Optional | Description
nodeLabels | String[] | ['*'] | yes | Filter the named graph using the given node labels.
relationshipTypes | String[] | ['*'] | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'.
writeConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for writing the result to Neo4j.
writeProperty | String | n/a | no | The node property in the Neo4j database to which the embedding is written.

Table 7. Algorithm specific configuration
Name | Type | Default | Optional | Description
walkLength | Integer | 80 | yes | Number of steps in a random walk.
walksPerNode | Integer | 10 | yes | Number of random walks starting at each node.
windowSize | Integer | 10 | yes | Size of the context window when training the neural network.
walkBufferSize | Integer | 1000 | yes | Number of random walks to complete before starting training.
inOutFactor | Float | 1.0 | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | 1.0 | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
negativeSamplingRate | Integer | 5 | yes | Number of negative samples to produce for each positive sample.
centerSamplingFactor | Float | 0.001 | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are down-sampled.
contextSamplingExponent | Float | 0.75 | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally.
embeddingDimension | Integer | 128 | yes | Size of the computed node embeddings. Denoted by dimensions in the Node2Vec paper.
initialLearningRate | Float | 0.01 | yes | Learning rate used initially for training the neural network. The learning rate decreases during training.
minLearningRate | Float | 0.0001 | yes | Lower bound for learning rate as it is decreased during training.
iterations | Integer | 1 | yes | Number of training iterations.

Table 8. Results
Name | Type | Description
nodeCount | Integer | The number of nodes processed.
nodePropertiesWritten | Integer | The number of node properties written.
createMillis | Integer | Milliseconds for loading data.
computeMillis | Integer | Milliseconds for running the algorithm.
writeMillis | Integer | Milliseconds for writing result data back to Neo4j.
configuration | Map | The configuration used for running the algorithm.
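
The write-mode syntax above could be exercised as in the following sketch. It assumes that a graph named 'myGraph' exists in the catalog and that the underlying nodes carry a 'name' property. The property name 'embedding' is a hypothetical choice made for this illustration.

```cypher
// Assumption: 'myGraph' exists in the catalog; 'embedding' is a hypothetical property name.
CALL gds.alpha.node2vec.write('myGraph', {
  embeddingDimension: 2,
  writeProperty: 'embedding'
})
YIELD nodePropertiesWritten, computeMillis, writeMillis
RETURN nodePropertiesWritten, computeMillis, writeMillis;

// Read the written embeddings back from the database.
// Assumption: the nodes have a 'name' property.
MATCH (n)
WHERE n.embedding IS NOT NULL
RETURN n.name AS name, n.embedding AS embedding;
```

Unlike stream mode, the procedure call itself returns only summary statistics, so the embeddings have to be read from the node property afterwards.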

1.1. Anonymous graphs

It is also possible to execute the algorithm on a graph that is projected in conjunction with the algorithm execution. In this case, the graph does not have a name, and we call it anonymous. When executing over an anonymous graph, the configuration map contains a graph projection configuration as well as an algorithm configuration. All execution modes support execution on anonymous graphs. For brevity, we only show the syntax and mode-specific configuration for the write mode.

For more information on syntax variants, see Syntax overview.

Run Node2Vec in write mode on an anonymous graph.
CALL gds.alpha.node2vec.write(
  configuration: Map
)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  nodeCount: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 9. General configuration for algorithm execution on an anonymous graph.
Name | Type | Default | Optional | Description
nodeProjection | String, String[] or Map | null | yes | The node projection used for anonymous graph creation via a Native projection.
relationshipProjection | String, String[] or Map | null | yes | The relationship projection used for anonymous graph creation via a Native projection.
nodeQuery | String | null | yes | The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection.
relationshipQuery | String | null | yes | The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection.
nodeProperties | String, String[] or Map | null | yes | The node properties to project during anonymous graph creation.
relationshipProperties | String, String[] or Map | null | yes | The relationship properties to project during anonymous graph creation.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'.
readConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for creating the graph.
writeConcurrency | Integer | value of 'concurrency' | yes | WRITE mode only: The number of concurrent threads used for writing the result.
writeProperty | String | n/a | no | WRITE mode only: The node property to which the embedding is written.

Table 10. Algorithm specific configuration
Name | Type | Default | Optional | Description
walkLength | Integer | 80 | yes | Number of steps in a random walk.
walksPerNode | Integer | 10 | yes | Number of random walks starting at each node.
windowSize | Integer | 10 | yes | Size of the context window when training the neural network.
walkBufferSize | Integer | 1000 | yes | Number of random walks to complete before starting training.
inOutFactor | Float | 1.0 | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | 1.0 | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
negativeSamplingRate | Integer | 5 | yes | Number of negative samples to produce for each positive sample.
centerSamplingFactor | Float | 0.001 | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are down-sampled.
contextSamplingExponent | Float | 0.75 | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally.
embeddingDimension | Integer | 128 | yes | Size of the computed node embeddings. Denoted by dimensions in the Node2Vec paper.
initialLearningRate | Float | 0.01 | yes | Learning rate used initially for training the neural network. The learning rate decreases during training.
minLearningRate | Float | 0.0001 | yes | Lower bound for learning rate as it is decreased during training.
iterations | Integer | 1 | yes | Number of training iterations.

The results are the same as for running write mode with a named graph; see the write mode syntax above.
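
A call on an anonymous graph might look like the following sketch, which combines a native projection with the algorithm configuration in a single map. It assumes a database that contains Person and Instrument nodes connected by LIKES relationships, as in the example graph used in the Examples section. The property name 'embedding' is a hypothetical choice.

```cypher
// Assumption: the database contains Person and Instrument nodes connected by LIKES relationships.
// 'embedding' is a hypothetical property name chosen for this sketch.
CALL gds.alpha.node2vec.write({
  nodeProjection: ['Person', 'Instrument'],
  relationshipProjection: 'LIKES',
  embeddingDimension: 2,
  writeProperty: 'embedding'
})
YIELD nodePropertiesWritten, createMillis, computeMillis, writeMillis
RETURN nodePropertiesWritten, createMillis, computeMillis, writeMillis;
```

Because the graph is projected as part of the call, nothing is stored in the graph catalog, and createMillis reports the time spent on the projection.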

2. Examples

Consider the graph created by the following Cypher statement:

CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})
CREATE (dave:Person {name: 'Dave'})
CREATE (eve:Person {name: 'Eve'})
CREATE (guitar:Instrument {name: 'Guitar'})
CREATE (synth:Instrument {name: 'Synthesizer'})
CREATE (bongos:Instrument {name: 'Bongos'})
CREATE (trumpet:Instrument {name: 'Trumpet'})

CREATE (alice)-[:LIKES]->(guitar)
CREATE (alice)-[:LIKES]->(synth)
CREATE (alice)-[:LIKES]->(bongos)
CREATE (bob)-[:LIKES]->(guitar)
CREATE (bob)-[:LIKES]->(synth)
CREATE (carol)-[:LIKES]->(bongos)
CREATE (dave)-[:LIKES]->(guitar)
CREATE (dave)-[:LIKES]->(synth)
CREATE (dave)-[:LIKES]->(bongos);
CALL gds.graph.create('myGraph', ['Person', 'Instrument'], 'LIKES');
CALL gds.alpha.node2vec.stream('myGraph', {embeddingDimension: 2})
Table 11. Results
nodeId | embedding
0 | [-0.14295829832553864, 0.08884537220001221]
1 | [0.016700705513358116, 0.2253911793231964]
2 | [-0.06589698046445847, 0.042405471205711365]
3 | [0.05862073227763176, 0.1193704605102539]
4 | [0.10888434946537018, -0.18204474449157715]
5 | [0.16728264093399048, 0.14098615944385529]
6 | [-0.007779224775731564, 0.02114257402718067]
7 | [-0.213893860578537, 0.06195802614092827]
8 | [0.2479933649301529, -0.137322798371315]
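
Raw node IDs are hard to relate back to the example data. One possible variation, sketched below, resolves each ID to the node's name with the gds.util.asNode utility function. It assumes that 'myGraph' was created as shown above. Because the walks are random, the embedding values will generally differ from those in the results table and from one run to the next.

```cypher
// Assumption: 'myGraph' was created by the gds.graph.create call shown earlier.
CALL gds.alpha.node2vec.stream('myGraph', {embeddingDimension: 2})
YIELD nodeId, embedding
// Look up the node behind each ID to return its 'name' property.
RETURN gds.util.asNode(nodeId).name AS name, embedding
ORDER BY name;
```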