Graph Embeddings

Prerequisites
Please have Neo4j (version 4.0 or later) and Graph Data Science Library (version 1.3 or later) downloaded and installed to use graph embeddings.

Intermediate

What are graph embeddings?

A graph embedding determines a fixed length vector representation for each entity (usually nodes) in our graph. These embeddings are a lower dimensional representation of the graph and preserve the graph’s topology.

Node embedding techniques usually consist of the following functions:

Similarity function

measures the similarity between nodes

Encoder function

generates the node embedding

Decoder function

reconstructs pairwise similarity

Loss function

checks the quality of the reconstruction

Use cases for graph embeddings

There are several use cases that are well suited for graph embeddings:

  • We can visually explore the data by reducing the embeddings to 2 or 3 dimensions with the help of algorithms like t-distributed stochastic neighbor embedding (t-SNE) and Principle Component Analysis (PCA).

  • We could build a kNN similarity graph from the embeddings. The similarity graph could then be used to make recommendations as part of a k-Nearest Neighbors query.

  • We can use the embeddings as the features to feed into a machine learning model, rather than generating those features by hand. In this use case, embeddings can be considered an implementation of Representational Learning.

Neo4j’s graph embeddings

The Neo4j Graph Data Science Library supports several graph embedding algorithms.

Name Speed Supports node properties? Implementation details

Random Projection

Fast

No

Linear Algebra

node2Vec

Intermediate

No

Neural Network

GraphSAGE

Slow

Yes

Neural Network

All the embedding algorithms work on a monopartite undirected input graph.

Random Projection

The Random Projection embedding uses sparse random projections to generate embeddings. It is an implementation of the FastRP algorithm. It is the fastest of the embedding algorithms and can therefore be useful for obtaining baseline embeddings. The embeddings it generates are often equally performant as more complex algorithms that take longer to run.

node2Vec

node2Vec computes embeddings based on biased random walks of a node’s neighborhood. The algorithm trains a single-layer feedforward neural network, which is used to predict the likelihood that a node will occur in a walk based on the occurrence of another node. node2Vec has parameters that can be tuned to control whether the random walks behave more like breadth first or depth first searches. This tuning allows the embedding to either capture homophily (similar embeddings capture network communities) or structural equivalence (similar embeddings capture similar structural roles of nodes).

GraphSAGE

This algorithm is the only one that supports node properties. Training embeddings that include node properties can be useful for including information beyond the topology of the graph, like meta data, attributes, or the results of other graph algorithms. GraphSAGE differs from the other algorithms in that it learns a function to calculate an embedding rather than training individual embeddings for each node.