Node embeddings

Node embedding algorithms compute low-dimensional vector representations of nodes in a graph. These vectors, also called embeddings, can be used for machine learning. The Neo4j Graph Data Science library contains the following node embedding algorithms:

Production-quality
- FastRP
Beta

Generalization across graphs

Node embeddings are typically used as input to downstream machine learning tasks such as node classification, link prediction and kNN similarity graph construction.

Often the graph used for constructing the embeddings and training the downstream model differs from the graph on which predictions are made. Compared to normal machine learning where we just have a stream of independent examples from some distribution, we now have graphs that are used to generate a set of labeled examples. Therefore, we must ensure that the set of training examples is representative of the set of labeled examples derived from the prediction graph. For this to work, certain things are required of the embedding algorithm, and we denote such algorithms as inductive ^[1].

In the GDS library the algorithms

GraphSAGE
HashGNN with featureProperties and a randomSeed
FastRP with propertyRatio=1.0 and a randomSeed

are inductive.

Embedding algorithms that are not inductive we call transductive. Their usage should be limited to the case where the test graph and predict graph are the same. An example of such an algorithm is Node2Vec.

1. This practical definition of induction may not agree completely with definitions elsewhere