Node classification pipelines

This feature is in the beta tier. For more information on feature tiers, see API Tiers.

Node classification pipelines are featured in the end-to-end example Jupyter notebooks:

Node Classification is a common machine learning task applied to graphs: training models to classify nodes. Concretely, Node Classification models are used to predict the classes of unlabeled nodes as a node properties based on other node properties. During training, the property representing the class of the node is referred to as the target property. GDS supports both binary and multi-class node classification.

In GDS, we have Node Classification pipelines which offer an end-to-end workflow, from feature extraction to node classification. The training pipelines reside in the pipeline catalog. When a training pipeline is executed, a classification model is created and stored in the model catalog.

A training pipeline is a sequence of two phases:

The graph is augmented with new node properties in a series of steps.
The augmented graph is used for training a node classification model.

One can configure which steps should be included above. The steps execute GDS algorithms that create new node properties. After configuring the node property steps, one can select a subset of node properties to be used as features. The training phase (II) trains multiple model candidates using cross-validation, selects the best one, and reports relevant performance metrics.

After training the pipeline, a classification model is created. This model includes the node property steps and feature configuration from the training pipeline and uses them to generate the relevant features for classifying unlabeled nodes. The classification model can be applied to predict the class of previously unseen nodes. In addition to the predicted class for each node, the predicted probability for each class may also be retained on the nodes. The order of the probabilities matches the order of the classes registered in the model.

Classification can only be done with a classification model (not with a training pipeline).

This segment is divided into the following pages: