Node2Vec
This section describes the Node2Vec node embedding algorithm in the Neo4j Graph Data Science library.
Node2Vec is a node embedding algorithm that computes a vector representation of a node based on random walks in the graph. The neighborhood is sampled through random walks. Using a number of random neighborhood samples, the algorithm trains a single hidden layer neural network. The neural network is trained to predict the likelihood that a node will occur in a walk based on the occurrence of another node.
For more information on this algorithm, see the original paper by Grover and Leskovec, "node2vec: Scalable Feature Learning for Networks".
1. Random Walks
A central concept of the Node2Vec algorithm is the second-order random walk.
A random walk simulates a traversal of the graph in which the traversed relationships are chosen at random.
In a classic random walk, each relationship has the same, possibly weighted, probability of being picked.
This probability is not influenced by the previously visited nodes.
The concept of second-order random walks, however, tries to model the transition probability based on the currently visited node v, the node t visited before the current one, and the node x which is the target of a candidate relationship.
Node2Vec random walks are thus influenced by two parameters, the returnFactor and the inOutFactor:

The returnFactor is used if t equals x, i.e., the random walk returns to the previously visited node.
The inOutFactor is used if the distance from t to x is equal to 2, i.e., the walk traverses further away from the node t.
The probabilities for traversing a relationship during a random walk can be further influenced by specifying a relationshipWeightProperty.
A relationship property value greater than 1 increases the likelihood of a relationship being traversed, while a value between 0 and 1 decreases it.
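The transition rules above can be illustrated with a small Python sketch. It assumes, as in the original node2vec formulation, that the two factors enter as reciprocals (1/returnFactor and 1/inOutFactor) and that the graph is undirected. The function names, the dictionary-based graph representation, and the toy graph are hypothetical and not part of the GDS library.

```python
def transition_weights(graph, t, v, return_factor=1.0, in_out_factor=1.0):
    """Unnormalized weights for stepping from v to each neighbor x,
    given that the walk arrived at v from t.

    graph maps each node to a dict {neighbor: relationship weight}.
    Assumption: the factors are applied as reciprocals.
    """
    weights = {}
    for x, rel_weight in graph[v].items():
        if x == t:
            # Distance from t to x is 0: the walk returns to the previous node.
            bias = 1.0 / return_factor
        elif x in graph[t]:
            # Distance 1: x is also a neighbor of t, so no extra bias applies.
            bias = 1.0
        else:
            # Distance 2: the walk moves further away from t.
            bias = 1.0 / in_out_factor
        weights[x] = rel_weight * bias
    return weights


def transition_probabilities(graph, t, v, **factors):
    """Normalize the weights so that they sum to 1."""
    weights = transition_weights(graph, t, v, **factors)
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical undirected toy graph with all relationship weights set to 1.0.
toy_graph = {
    "t": {"v": 1.0, "a": 1.0},
    "v": {"t": 1.0, "a": 1.0, "b": 1.0},
    "a": {"t": 1.0, "v": 1.0},
    "b": {"v": 1.0},
}
example_weights = transition_weights(
    toy_graph, "t", "v", return_factor=2.0, in_out_factor=4.0
)
example_probabilities = transition_probabilities(
    toy_graph, "t", "v", return_factor=2.0, in_out_factor=4.0
)
```

In this sketch a returnFactor above 1 makes stepping back to t less likely, and an inOutFactor above 1 makes stepping to the more distant node b less likely, so the walk tends to stay near t.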
For every node in the graph, Node2Vec generates a series of random walks with that node as the start node.
The number of random walks per node is set by the walksPerNode configuration parameter, and the walk length is controlled by the walkLength parameter.
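As a rough illustration of how these two parameters shape the training set, the following Python sketch generates walks for every node. It is not the GDS implementation: it picks neighbors uniformly at random instead of using the second-order bias, and it treats walk_length as the number of nodes in a walk, which is an assumption. The function name and the toy graph are hypothetical.

```python
import random


def generate_walks(graph, walks_per_node, walk_length, seed=None):
    """Generate walks_per_node random walks starting at every node.

    graph maps each node to a list of its neighbors.
    Assumption: walk_length counts the nodes in a walk, and a walk
    stops early if it reaches a node without neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                # Uniform choice; a biased second-order step would go here.
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy graph: a triangle plus one isolated node.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
walks = generate_walks(toy_graph, walks_per_node=2, walk_length=4, seed=42)
```

The total number of walks is the node count multiplied by walks_per_node, so both parameters directly scale the amount of training data.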
2. Syntax
CALL gds.beta.node2vec.stream(
graphName: String,
configuration: Map
) YIELD
nodeId: Integer,
embedding: List of Float
Name | Type | Default | Optional | Description
graphName | String | | no | The name of a graph stored in the catalog.
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering.
Name | Type | Default | Optional | Description
nodeLabels | List of String | | yes | Filter the named graph using the given node labels.
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types.
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm.
Name | Type | Default | Optional | Description
walkLength | Integer | | yes | The number of steps in a single random walk.
walksPerNode | Integer | | yes | The number of random walks generated for each node.
inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.
windowSize | Integer | | yes | Size of the context window when training the neural network.
negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample.
positiveSamplingFactor | Float | | yes | Factor for influencing the distribution for positive samples. A higher value increases the probability that frequent nodes are downsampled.
negativeSamplingExponent | Float | | yes | Exponent applied to the node frequency to obtain the negative sampling distribution. A value of 1.0 samples proportionally to the frequency. A value of 0.0 samples each node equally.
embeddingDimension | Integer | | yes | Size of the computed node embeddings.
iterations | Integer | | yes | Number of training iterations.
initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases after each training iteration.
minLearningRate | Float | | yes | Lower bound for the learning rate as it is decreased during training.
randomSeed | Integer | | yes | Seed value used to generate the random walks, which are used as the training set of the neural network. Note that the generated embeddings are still nondeterministic.
walkBufferSize | Integer | | yes | The number of random walks to complete before starting training.
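The effect of negativeSamplingExponent can be made concrete with a short Python sketch. It assumes that node frequency means the number of times a node occurs in the generated walks and that the sampling probability is proportional to that frequency raised to the exponent. The function name and the sample walks are hypothetical and not part of the GDS library.

```python
from collections import Counter


def negative_sampling_distribution(walks, exponent):
    """Probability of drawing each node as a negative sample.

    Assumption: probability is proportional to (occurrences in walks) ** exponent.
    """
    counts = Counter(node for walk in walks for node in walk)
    scaled = {node: count ** exponent for node, count in counts.items()}
    total = sum(scaled.values())
    return {node: value / total for node, value in scaled.items()}


# Hypothetical walks: "a" occurs three times, "b" and "c" once each.
sample_walks = [["a", "b", "a"], ["a", "c"]]
proportional = negative_sampling_distribution(sample_walks, exponent=1.0)
uniform = negative_sampling_distribution(sample_walks, exponent=0.0)
```

With an exponent of 1.0 the frequent node "a" is drawn three times as often as "b" or "c", while an exponent of 0.0 gives all three nodes the same probability. Values in between flatten the distribution without making it fully uniform.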
Name | Type | Description
nodeId | Integer | The Neo4j node ID.
embedding | List of Float | The computed node embedding.
CALL gds.beta.node2vec.mutate(
graphName: String,
configuration: Map
)
YIELD
preProcessingMillis: Integer,
computeMillis: Integer,
postProcessingMillis: Integer,
mutateMillis: Integer,
nodeCount: Integer,
nodePropertiesWritten: Integer,
configuration: Map
Name | Type | Default | Optional | Description
graphName | String | | no | The name of a graph stored in the catalog.
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering.
Name | Type | Default | Optional | Description
nodeLabels | List of String | | yes | Filter the named graph using the given node labels.
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types.
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm.
mutateProperty | String | | no | The node property in the GDS graph to which the embedding is written.
Name | Type | Default | Optional | Description
walkLength | Integer | | yes | The number of steps in a single random walk.
walksPerNode | Integer | | yes | The number of random walks generated for each node.
inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.
windowSize | Integer | | yes | Size of the context window when training the neural network.
negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample.
positiveSamplingFactor | Float | | yes | Factor for influencing the distribution for positive samples. A higher value increases the probability that frequent nodes are downsampled.
negativeSamplingExponent | Float | | yes | Exponent applied to the node frequency to obtain the negative sampling distribution. A value of 1.0 samples proportionally to the frequency. A value of 0.0 samples each node equally.
embeddingDimension | Integer | | yes | Size of the computed node embeddings.
iterations | Integer | | yes | Number of training iterations.
initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases after each training iteration.
minLearningRate | Float | | yes | Lower bound for the learning rate as it is decreased during training.
randomSeed | Integer | | yes | Seed value used to generate the random walks, which are used as the training set of the neural network. Note that the generated embeddings are still nondeterministic.
walkBufferSize | Integer | | yes | The number of random walks to complete before starting training.
Name | Type | Description
nodeCount | Integer | The number of nodes processed.
nodePropertiesWritten | Integer | The number of node properties written.
preProcessingMillis | Integer | Milliseconds for preprocessing the data.
computeMillis | Integer | Milliseconds for running the algorithm.
mutateMillis | Integer | Milliseconds for adding properties to the projected graph.
postProcessingMillis | Integer | Milliseconds for postprocessing of the results.
configuration | Map | The configuration used for running the algorithm.
CALL gds.beta.node2vec.write(
graphName: String,
configuration: Map
)
YIELD
preProcessingMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
nodeCount: Integer,
nodePropertiesWritten: Integer,
configuration: Map
Name | Type | Default | Optional | Description
graphName | String | | no | The name of a graph stored in the catalog.
configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering.
Name | Type | Default | Optional | Description
nodeLabels | List of String | | yes | Filter the named graph using the given node labels.
relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types.
concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'.
writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j.
writeProperty | String | | no | The node property in the Neo4j database to which the embedding is written.
Name | Type | Default | Optional | Description
walkLength | Integer | | yes | The number of steps in a single random walk.
walksPerNode | Integer | | yes | The number of random walks generated for each node.
inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means the walk stays local.
returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency.
relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights to influence the probabilities of the random walks. The weights need to be >= 0. If unspecified, the algorithm runs unweighted.
windowSize | Integer | | yes | Size of the context window when training the neural network.
negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample.
positiveSamplingFactor | Float | | yes | Factor for influencing the distribution for positive samples. A higher value increases the probability that frequent nodes are downsampled.
negativeSamplingExponent | Float | | yes | Exponent applied to the node frequency to obtain the negative sampling distribution. A value of 1.0 samples proportionally to the frequency. A value of 0.0 samples each node equally.
embeddingDimension | Integer | | yes | Size of the computed node embeddings.
iterations | Integer | | yes | Number of training iterations.
initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases after each training iteration.
minLearningRate | Float | | yes | Lower bound for the learning rate as it is decreased during training.
randomSeed | Integer | | yes | Seed value used to generate the random walks, which are used as the training set of the neural network. Note that the generated embeddings are still nondeterministic.
walkBufferSize | Integer | | yes | The number of random walks to complete before starting training.
Name | Type | Description
nodeCount | Integer | The number of nodes processed.
nodePropertiesWritten | Integer | The number of node properties written.
preProcessingMillis | Integer | Milliseconds for preprocessing the data.
computeMillis | Integer | Milliseconds for running the algorithm.
writeMillis | Integer | Milliseconds for writing result data back to Neo4j.
configuration | Map | The configuration used for running the algorithm.
3. Examples
Consider the graph created by the following Cypher statement:
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})
CREATE (dave:Person {name: 'Dave'})
CREATE (eve:Person {name: 'Eve'})
CREATE (guitar:Instrument {name: 'Guitar'})
CREATE (synth:Instrument {name: 'Synthesizer'})
CREATE (bongos:Instrument {name: 'Bongos'})
CREATE (trumpet:Instrument {name: 'Trumpet'})
CREATE (alice)-[:LIKES]->(guitar)
CREATE (alice)-[:LIKES]->(synth)
CREATE (alice)-[:LIKES]->(bongos)
CREATE (bob)-[:LIKES]->(guitar)
CREATE (bob)-[:LIKES]->(synth)
CREATE (carol)-[:LIKES]->(bongos)
CREATE (dave)-[:LIKES]->(guitar)
CREATE (dave)-[:LIKES]->(synth)
CREATE (dave)-[:LIKES]->(bongos);
CALL gds.graph.project('myGraph', ['Person', 'Instrument'], 'LIKES');
The following statement runs the Node2Vec algorithm on myGraph in stream mode:
CALL gds.beta.node2vec.stream('myGraph', {embeddingDimension: 2})
YIELD nodeId, embedding
RETURN nodeId, embedding
nodeId | embedding
0 | [0.14295829832553864, 0.08884537220001221]
1 | [0.016700705513358116, 0.2253911793231964]
2 | [0.06589698046445847, 0.042405471205711365]
3 | [0.05862073227763176, 0.1193704605102539]
4 | [0.10888434946537018, 0.18204474449157715]
5 | [0.16728264093399048, 0.14098615944385529]
6 | [0.007779224775731564, 0.02114257402718067]
7 | [0.213893860578537, 0.06195802614092827]
8 | [0.2479933649301529, 0.137322798371315]