Machine Learning

This section covers migration for Machine Learning algorithms in the Neo4j Graph Data Science library.

1. Node Classification

The original alpha version of node classification has been completely removed and incorporated into node classification pipelines. Before training a node classification model, you must create and configure a training pipeline.
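For example, an empty training pipeline can be created like this. The pipeline name 'nc-pipeline' is an arbitrary placeholder:

```cypher
// Create an empty node classification training pipeline.
// 'nc-pipeline' is a placeholder name; choose any unique name.
CALL gds.beta.pipeline.nodeClassification.create('nc-pipeline')
```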

1.1. Train

Some parts of the training are now configured through dedicated configuration procedures on the training pipeline. These must be called before the train procedure in order to take effect. The remaining parts have moved to the pipeline's train procedure. Please see the tables below.

Table 1. Changes in configuration for train
1.x parameter → 2.x

modelName: Now configured only in gds.beta.pipeline.nodeClassification.train.

featureProperties: Replaced by gds.beta.pipeline.nodeClassification.selectFeatures. There is now also a procedure gds.beta.pipeline.nodeClassification.addNodeProperty to compute node properties for the input graph, both in the training pipeline and in the produced classification model.

targetProperty: Now configured only in gds.beta.pipeline.nodeClassification.train.

holdoutFraction: Renamed to testFraction and configured in gds.beta.pipeline.nodeClassification.configureSplit.

validationFolds: Now configured only in gds.beta.pipeline.nodeClassification.configureSplit.

metrics: Now configured only in gds.beta.pipeline.nodeClassification.train.

params: Replaced by gds.beta.pipeline.nodeClassification.addLogisticRegression, which configures a single model candidate. The procedure can be called several times to add several model candidates. There is also a new option to use random forest as a model candidate via gds.alpha.pipeline.nodeClassification.addRandomForest.

randomSeed: Now configured only in gds.beta.pipeline.nodeClassification.train.
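Putting the changes above together, a 1.x train call migrates roughly as follows. The graph name 'myGraph', the pipeline name 'nc-pipeline', and all property names and hyperparameter values are illustrative assumptions:

```cypher
// Split-related and model-candidate configuration moves to the pipeline ...
CALL gds.beta.pipeline.nodeClassification.selectFeatures('nc-pipeline', ['age']);
CALL gds.beta.pipeline.nodeClassification.configureSplit('nc-pipeline', {
  testFraction: 0.2,      // was holdoutFraction in 1.x
  validationFolds: 5
});
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('nc-pipeline', {penalty: 0.5});

// ... while the remaining parameters go to the pipeline train procedure.
CALL gds.beta.pipeline.nodeClassification.train('myGraph', {
  pipeline: 'nc-pipeline',
  modelName: 'nc-model',
  targetProperty: 'class',
  metrics: ['F1_WEIGHTED'],
  randomSeed: 42
});
```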

Table 2. Changes in configuration for the pipeline
1.x procedure → 2.x

gds.beta.pipeline.nodeClassification.configureParams: This procedure, which is no longer present, added logistic regression model candidates. Adding logistic regression candidates can instead be done by calling gds.beta.pipeline.nodeClassification.addLogisticRegression one or multiple times.
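For instance, where configureParams previously took a list of parameter maps, each model candidate is now added with a separate call. The pipeline name and penalty values below are arbitrary examples:

```cypher
// Add two logistic regression model candidates, one call per candidate.
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('nc-pipeline', {penalty: 0.0625});
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('nc-pipeline', {penalty: 1.0});
```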

1.2. Predict

Apart from the parameters listed below, the API for node classification prediction is the same as before, only exposed through different procedures: gds.beta.pipeline.nodeClassification.predict.[mutate,stream,write].

Table 3. Changes in configuration for predict
1.x parameter → 2.x

batchSize: Removed. Batch size is optimized internally and no longer user-configurable.

Table 4. Prediction procedure replacements (1.x → 2.x)

gds.alpha.ml.nodeClassification.predict.stream → gds.beta.pipeline.nodeClassification.predict.stream
gds.alpha.ml.nodeClassification.predict.mutate → gds.beta.pipeline.nodeClassification.predict.mutate
gds.alpha.ml.nodeClassification.predict.write → gds.beta.pipeline.nodeClassification.predict.write
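As an illustration, a 2.x stream prediction might look like this. The graph name, model name, and the node property 'name' are placeholders:

```cypher
// Stream predicted classes from a trained pipeline model.
CALL gds.beta.pipeline.nodeClassification.predict.stream('myGraph', {
  modelName: 'nc-model',
  includePredictedProbabilities: true   // optional: also return class probabilities
})
YIELD nodeId, predictedClass, predictedProbabilities
RETURN gds.util.asNode(nodeId).name AS name, predictedClass, predictedProbabilities
```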

2. Link Prediction

The original alpha version of link prediction has been completely removed and incorporated into link prediction pipelines. Before training a link prediction model, you must create and configure a training pipeline.
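As with node classification, the pipeline is created first. The pipeline name 'lp-pipeline' is an arbitrary placeholder:

```cypher
// Create an empty link prediction training pipeline.
// 'lp-pipeline' is a placeholder name; choose any unique name.
CALL gds.beta.pipeline.linkPrediction.create('lp-pipeline')
```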

2.1. Train

Some parts of the training are now configured through dedicated configuration procedures on the training pipeline. These must be called before the train procedure in order to take effect. The remaining parts have moved to the pipeline's train procedure. Please see the tables below.

Table 5. Changes in configuration for train
1.x parameter → 2.x

modelName: Now configured only in gds.beta.pipeline.linkPrediction.train.

featureProperties: Replaced by nodeProperties in gds.beta.pipeline.linkPrediction.addFeature. There is also a procedure gds.beta.pipeline.linkPrediction.addNodeProperty to compute node properties for the input graph, both in the training pipeline and in the produced model.

linkFeatureCombiner: Replaced by the second positional argument to gds.beta.pipeline.linkPrediction.addFeature, called featureType.

trainRelationshipType and testRelationshipType: Removed. Use gds.beta.pipeline.linkPrediction.configureSplit to set up the dataset split.

validationFolds: Now configured only in gds.beta.pipeline.linkPrediction.configureSplit.

negativeClassWeight: Now configured only in gds.beta.pipeline.linkPrediction.train.

params: Replaced by gds.beta.pipeline.linkPrediction.addLogisticRegression, which configures a single model candidate. The procedure can be called several times to add several model candidates. There is also a new option to use random forest as a model candidate via gds.alpha.pipeline.linkPrediction.addRandomForest.

randomSeed: Now configured only in gds.beta.pipeline.linkPrediction.train.
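A sketch of a full 2.x training setup, tying the entries above together. The graph name, pipeline name, embedding settings, and hyperparameter values are illustrative assumptions:

```cypher
// Compute a node embedding as input for the link features
// (fastRP and its settings are illustrative).
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('lp-pipeline', 'fastRP', {
  mutateProperty: 'embedding',
  embeddingDimension: 64,
  randomSeed: 42
});

// Replaces featureProperties and linkFeatureCombiner: the second
// positional argument is the featureType.
CALL gds.beta.pipeline.linkPrediction.addFeature('lp-pipeline', 'hadamard', {
  nodeProperties: ['embedding']
});

// Replaces trainRelationshipType and testRelationshipType.
CALL gds.beta.pipeline.linkPrediction.configureSplit('lp-pipeline', {
  testFraction: 0.25,
  trainFraction: 0.6,
  validationFolds: 3
});

// Replaces params; call once per model candidate.
CALL gds.beta.pipeline.linkPrediction.addLogisticRegression('lp-pipeline', {penalty: 0.01});

// The remaining parameters go to the train procedure.
CALL gds.beta.pipeline.linkPrediction.train('myGraph', {
  pipeline: 'lp-pipeline',
  modelName: 'lp-model',
  negativeClassWeight: 1.0,
  randomSeed: 42
});
```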

Table 6. Changes in configuration for the pipeline
1.x procedure → 2.x

gds.beta.pipeline.linkPrediction.configureParams: This procedure, which is no longer present, added logistic regression model candidates. Adding logistic regression candidates can instead be done by calling gds.beta.pipeline.linkPrediction.addLogisticRegression one or multiple times.

2.2. Predict

Apart from the procedure names, the API for link prediction is the same as before. The new procedures are gds.beta.pipeline.linkPrediction.predict.[mutate,stream]. There is no longer a write mode for link prediction, but its behavior can be emulated by running the mutate mode followed by gds.graph.writeRelationship.
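A sketch of that emulation, assuming a projected graph 'myGraph' and a trained model 'lp-model' (names and the topN value are placeholders; the exact name of the relationship-writing procedure may vary between GDS 2.x versions):

```cypher
// 1. Mutate the in-memory graph with predicted relationships;
//    each gets a 'probability' relationship property.
CALL gds.beta.pipeline.linkPrediction.predict.mutate('myGraph', {
  modelName: 'lp-model',
  mutateRelationshipType: 'PREDICTED',
  topN: 10
});

// 2. Write the mutated relationships and their probability property
//    back to the Neo4j database.
CALL gds.graph.writeRelationship('myGraph', 'PREDICTED', 'probability');
```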

Table 7. Prediction procedure replacements (1.x → 2.x)

gds.alpha.ml.linkPrediction.predict.stream → gds.beta.pipeline.linkPrediction.predict.stream
gds.alpha.ml.linkPrediction.predict.mutate → gds.beta.pipeline.linkPrediction.predict.mutate
gds.alpha.ml.linkPrediction.predict.write → - (no direct replacement; see the note above on emulating write mode)