Node2Vec
This section describes the Node2Vec node embedding algorithm in the Neo4j Graph Data Science library.
Node2Vec is a node embedding algorithm that computes a vector representation of each node based on random walks in the graph. The random walks sample each node's neighborhood. Using a number of these random neighborhood samples, the algorithm trains a single-hidden-layer neural network to predict the likelihood that a node will occur in a walk, given the occurrence of another node.
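The training objective described above can be pictured as pairing each node in a walk with the nodes that occur near it. The following is a minimal Python sketch, not the library's implementation: the function name `skipgram_pairs` is hypothetical, and treating the window as a radius around the center node is an assumption made for illustration.

```python
def skipgram_pairs(walk, window_size):
    """Return (center, context) pairs for one random walk.

    Assumption: `window_size` is the number of positions considered on
    each side of the center node; the library may define it differently.
    """
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window_size)
        end = min(len(walk), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical walk over node IDs 1 -> 2 -> 3.
example_pairs = skipgram_pairs([1, 2, 3], window_size=1)
```

Each pair is one positive training sample: the network is asked to assign a high likelihood to the context node given the center node.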
1. Syntax
CALL gds.alpha.node2vec.stream(
graphName: String,
configuration: Map
) YIELD
nodeId: Integer,
embedding: List<Float>
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeLabels | String[] | | yes | Filter the named graph using the given node labels. |
| relationshipTypes | String[] | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| walkLength | Integer | | yes | Number of steps in a random walk. |
| walksPerNode | Integer | | yes | Number of random walks starting at each node. |
| windowSize | Integer | | yes | Size of the context window when training the neural network. |
| walkBufferSize | Integer | | yes | Number of random walks to complete before starting training. |
| inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means stay local. |
| returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. |
| negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample. |
| centerSamplingFactor | Float | | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are downsampled. |
| contextSamplingExponent | Float | | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally. |
| embeddingDimension | Integer | | yes | Size of the computed node embeddings. Denoted by |
| initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases during training. |
| minLearningRate | Float | | yes | Lower bound for learning rate as it is decreased during training. |
| iterations | Integer | | yes | Number of training iterations. |
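To make the effect of `returnFactor` and `inOutFactor` more concrete, here is a small Python sketch of a second-order biased random walk. It assumes the weighting scheme commonly associated with node2vec (weight `1 / returnFactor` for stepping back to the previous node, `1` for a node adjacent to the previous node, `1 / inOutFactor` otherwise); whether the library computes its walks exactly this way is not stated in this section. The helper names `step_weights` and `biased_walk`, and the toy adjacency map, are hypothetical.

```python
import random


def step_weights(graph, prev, current, return_factor, in_out_factor):
    """Unnormalized weights for the next step, given the previous node.

    Assumed scheme (not confirmed for the library):
      - back to `prev`:            1 / return_factor
      - neighbor shared with prev: 1
      - any other neighbor:        1 / in_out_factor
    """
    weights = {}
    for candidate in graph[current]:
        if candidate == prev:
            weights[candidate] = 1.0 / return_factor
        elif candidate in graph[prev]:
            weights[candidate] = 1.0
        else:
            weights[candidate] = 1.0 / in_out_factor
    return weights


def biased_walk(graph, start, walk_length, return_factor=1.0,
                in_out_factor=1.0, rng=None):
    """Take up to `walk_length` steps from `start` and return the visited nodes."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(walk_length):
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so pick uniformly.
            walk.append(rng.choice(neighbors))
            continue
        w = step_weights(graph, walk[-2], current, return_factor, in_out_factor)
        walk.append(rng.choices(neighbors, weights=[w[n] for n in neighbors])[0])
    return walk


# Hypothetical undirected toy graph as an adjacency map.
toy_graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
walk = biased_walk(toy_graph, "a", walk_length=5, rng=random.Random(0))
```

Under this scheme, a `returnFactor` below 1.0 raises the weight of stepping back, and a higher `inOutFactor` lowers the weight of moving to nodes farther from the previous node, which matches the parameter descriptions above.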
| Name | Type | Description |
|---|---|---|
| nodeId | Integer | The Neo4j node ID. |
| embedding | List<Float> | The computed node embedding. |
CALL gds.alpha.node2vec.write(
graphName: String,
configuration: Map
)
YIELD
createMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
nodeCount: Integer,
nodePropertiesWritten: Integer,
configuration: Map
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeLabels | String[] | | yes | Filter the named graph using the given node labels. |
| relationshipTypes | String[] | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'. |
| writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j. |
| writeProperty | String | | no | The node property in the Neo4j database to which the embedding is written. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| walkLength | Integer | | yes | Number of steps in a random walk. |
| walksPerNode | Integer | | yes | Number of random walks starting at each node. |
| windowSize | Integer | | yes | Size of the context window when training the neural network. |
| walkBufferSize | Integer | | yes | Number of random walks to complete before starting training. |
| inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means stay local. |
| returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. |
| negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample. |
| centerSamplingFactor | Float | | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are downsampled. |
| contextSamplingExponent | Float | | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally. |
| embeddingDimension | Integer | | yes | Size of the computed node embeddings. Denoted by |
| initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases during training. |
| minLearningRate | Float | | yes | Lower bound for learning rate as it is decreased during training. |
| iterations | Integer | | yes | Number of training iterations. |
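The description of `contextSamplingExponent` can be checked with a few lines of Python. This is only an illustrative sketch: it assumes the sampling distribution is obtained by raising each frequency to the exponent and normalizing, and the function name and the example counts are hypothetical.

```python
def context_sampling_distribution(frequencies, exponent):
    """Turn occurrence counts into sampling probabilities.

    Assumption: probability is proportional to count ** exponent.
    """
    weighted = {node: count ** exponent for node, count in frequencies.items()}
    total = sum(weighted.values())
    return {node: weight / total for node, weight in weighted.items()}


# Hypothetical occurrence counts of two nodes in the sampled walks.
counts = {"a": 8, "b": 2}
proportional = context_sampling_distribution(counts, 1.0)  # follows the counts
uniform = context_sampling_distribution(counts, 0.0)       # every node equally likely
```

With an exponent of 1.0 the frequent node keeps its full share, while with 0.0 both nodes are sampled equally; values in between flatten the distribution to a corresponding degree.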
| Name | Type | Description |
|---|---|---|
| nodeCount | Integer | The number of nodes processed. |
| nodePropertiesWritten | Integer | The number of node properties written. |
| createMillis | Integer | Milliseconds for loading data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| writeMillis | Integer | Milliseconds for writing result data back to Neo4j. |
| configuration | Map | The configuration used for running the algorithm. |
1.1. Anonymous graphs
It is also possible to execute the algorithm on a graph that is projected in conjunction with the algorithm execution.
In this case, the graph does not have a name, and we call it anonymous.
When executing over an anonymous graph the configuration map contains a graph projection configuration as well as an algorithm configuration.
All execution modes support execution on anonymous graphs, although we only show syntax and mode-specific configuration for the write mode for brevity.
For more information on syntax variants, see Syntax overview.
CALL gds.alpha.node2vec.write(
configuration: Map
)
YIELD
createMillis: Integer,
computeMillis: Integer,
writeMillis: Integer,
nodeCount: Integer,
nodePropertiesWritten: Integer,
configuration: Map
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| nodeProjection | String, String[] or Map | | yes | The node projection used for anonymous graph creation via a Native projection. |
| relationshipProjection | String, String[] or Map | | yes | The relationship projection used for anonymous graph creation via a Native projection. |
| nodeQuery | String | | yes | The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection. |
| relationshipQuery | String | | yes | The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection. |
| nodeProperties | String, String[] or Map | | yes | The node properties to project during anonymous graph creation. |
| relationshipProperties | String, String[] or Map | | yes | The relationship properties to project during anonymous graph creation. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. |
| readConcurrency | Integer | | yes | The number of concurrent threads used for creating the graph. |
| writeConcurrency | Integer | | yes | WRITE mode only: The number of concurrent threads used for writing the result. |
| writeProperty | String | | no | WRITE mode only: The node property to which the embedding is written. |
| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| walkLength | Integer | | yes | Number of steps in a random walk. |
| walksPerNode | Integer | | yes | Number of random walks starting at each node. |
| windowSize | Integer | | yes | Size of the context window when training the neural network. |
| walkBufferSize | Integer | | yes | Number of random walks to complete before starting training. |
| inOutFactor | Float | | yes | Tendency of the random walk to stay close to the start node or fan out in the graph. A higher value means stay local. |
| returnFactor | Float | | yes | Tendency of the random walk to return to the last visited node. A value below 1.0 means a higher tendency. |
| negativeSamplingRate | Integer | | yes | Number of negative samples to produce for each positive sample. |
| centerSamplingFactor | Float | | yes | Factor for influencing the sampling distribution for center words. A higher value increases the probability that frequent words are downsampled. |
| contextSamplingExponent | Float | | yes | Exponent applied to the context word frequency to obtain the context word sampling distribution. A value of 1.0 samples proportionally to the frequency distribution. A value of 0.0 samples each word equally. |
| embeddingDimension | Integer | | yes | Size of the computed node embeddings. Denoted by |
| initialLearningRate | Float | | yes | Learning rate used initially for training the neural network. The learning rate decreases during training. |
| minLearningRate | Float | | yes | Lower bound for learning rate as it is decreased during training. |
| iterations | Integer | | yes | Number of training iterations. |
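The interplay of `initialLearningRate` and `minLearningRate` can be sketched as a decaying schedule with a floor. The Python snippet below assumes a linear decay over training progress, which is a common choice but is not confirmed here as the library's schedule; the function name and the numeric values are illustrative only and are not the library's defaults.

```python
def learning_rate(initial, minimum, progress):
    """Learning rate at a given training progress in [0.0, 1.0].

    Assumption: linear decay from `initial`, never dropping below `minimum`.
    """
    return max(minimum, initial * (1.0 - progress))


# Illustrative values only.
schedule = [learning_rate(0.025, 0.0001, step / 4) for step in range(5)]
```

The schedule starts at the initial rate, decreases as training progresses, and is clamped at the lower bound once the decayed value would fall below it.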
The results are the same as for running write mode with a named graph, see the write mode syntax above.
2. Examples
Consider the graph created by the following Cypher statement:
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE (carol:Person {name: 'Carol'})
CREATE (dave:Person {name: 'Dave'})
CREATE (eve:Person {name: 'Eve'})
CREATE (guitar:Instrument {name: 'Guitar'})
CREATE (synth:Instrument {name: 'Synthesizer'})
CREATE (bongos:Instrument {name: 'Bongos'})
CREATE (trumpet:Instrument {name: 'Trumpet'})
CREATE (alice)-[:LIKES]->(guitar)
CREATE (alice)-[:LIKES]->(synth)
CREATE (alice)-[:LIKES]->(bongos)
CREATE (bob)-[:LIKES]->(guitar)
CREATE (bob)-[:LIKES]->(synth)
CREATE (carol)-[:LIKES]->(bongos)
CREATE (dave)-[:LIKES]->(guitar)
CREATE (dave)-[:LIKES]->(synth)
CREATE (dave)-[:LIKES]->(bongos);
CALL gds.graph.create('myGraph', ['Person', 'Instrument'], 'LIKES');
CALL gds.alpha.node2vec.stream('myGraph', {embeddingDimension: 2})
| nodeId | embedding |
|---|---|
| 0 | [0.14295829832553864, 0.08884537220001221] |
| 1 | [0.016700705513358116, 0.2253911793231964] |
| 2 | [0.06589698046445847, 0.042405471205711365] |
| 3 | [0.05862073227763176, 0.1193704605102539] |
| 4 | [0.10888434946537018, 0.18204474449157715] |
| 5 | [0.16728264093399048, 0.14098615944385529] |
| 6 | [0.007779224775731564, 0.02114257402718067] |
| 7 | [0.213893860578537, 0.06195802614092827] |
| 8 | [0.2479933649301529, 0.137322798371315] |