Machine Learning

Node Classification

The original alpha version of node classification has been completely removed and incorporated into node classification pipelines. Before training a node classification model, you must create and configure a training pipeline.

Train

Some training settings are now configured through dedicated configuration procedures on the training pipeline. These procedures must be called before the train procedure to take effect. The remaining settings are passed to the pipeline train procedure. The table below lists where each 1.x parameter is now configured.

Table 1. Changes in configuration for train
1.x	2.x
`modelName`	This parameter is now only configured in `gds.beta.pipeline.nodeClassification.train`.
`featuresProperties`	This parameter is replaced by `gds.beta.pipeline.nodeClassification.selectFeatures`. There is now also a procedure `gds.beta.pipeline.nodeClassification.addNodeProperty` to compute node properties for the input graph, both in the training pipeline and in the produced classification model.
`targetProperty`	This parameter is now only configured in `gds.beta.pipeline.nodeClassification.train`.
`holdoutFraction`	This parameter is now named `testFraction` and configured in `gds.beta.pipeline.nodeClassification.configureSplit`.
`validationFolds`	This parameter is now only configured in `gds.beta.pipeline.nodeClassification.configureSplit`.
`metrics`	This parameter is now only configured in `gds.beta.pipeline.nodeClassification.train`.
`params`	This parameter is replaced by `gds.beta.pipeline.nodeClassification.addLogisticRegression`, allowing configuration for a single model candidate. The procedure can be called several times to add several model candidates. There is also a new option for using random forest as a model candidate with `gds.beta.pipeline.nodeClassification.addRandomForest`.
`randomSeed`	This parameter is now only configured in `gds.beta.pipeline.nodeClassification.train`.

Table 2. Changes in configuration for the pipeline
1.x	2.x
`gds.beta.pipeline.nodeClassification.configureParams`	This procedure, which is no longer present, added logistic regression model candidates. Logistic regression candidates can instead be added by calling `gds.beta.pipeline.nodeClassification.addLogisticRegression` one or more times.
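
As a rough sketch of how the 1.x train configuration maps onto the 2.x procedures in the tables above, the sequence below creates a pipeline, configures it, and then trains. The names `nc-pipe`, `myGraph`, and `nc-model`, the `class` target property, and the `age` feature are placeholders, and all parameter values are illustrative. `gds.beta.pipeline.nodeClassification.create` is assumed here as the procedure that creates the pipeline. Depending on the GDS 2.x release, `train` may require additional parameters (for example, node label filters), so check the procedure reference for your version.

```cypher
// Create an empty training pipeline (placeholder name 'nc-pipe').
CALL gds.beta.pipeline.nodeClassification.create('nc-pipe');

// Optional: compute a node property as a pipeline step (here, FastRP embeddings).
CALL gds.beta.pipeline.nodeClassification.addNodeProperty('nc-pipe', 'fastRP', {
  embeddingDimension: 64,
  mutateProperty: 'embedding'
});

// Replaces the 1.x `featuresProperties` parameter.
CALL gds.beta.pipeline.nodeClassification.selectFeatures('nc-pipe', ['embedding', 'age']);

// 1.x `holdoutFraction` is now `testFraction`; `validationFolds` is set here too.
CALL gds.beta.pipeline.nodeClassification.configureSplit('nc-pipe', {
  testFraction: 0.2,
  validationFolds: 5
});

// Replaces the 1.x `params` list: one call per model candidate.
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('nc-pipe', {penalty: 0.0625});
CALL gds.beta.pipeline.nodeClassification.addLogisticRegression('nc-pipe', {penalty: 1.0});

// The remaining 1.x parameters are passed to train.
CALL gds.beta.pipeline.nodeClassification.train('myGraph', {
  pipeline: 'nc-pipe',
  modelName: 'nc-model',
  targetProperty: 'class',
  metrics: ['F1_WEIGHTED'],
  randomSeed: 42
});
```

The configuration calls must all run before `train`; calling `addLogisticRegression` twice registers two candidates for model selection.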

Predict

Apart from the parameters listed below, the API for node classification prediction is unchanged, but the procedures have been renamed to `gds.beta.pipeline.nodeClassification.predict.[mutate,stream,write]`.

Table 3. Changes in configuration for predict
1.x	2.x
`batchSize`	Batch size is optimized internally and no longer user-configurable.

Table 4. Prediction procedure replacements
1.x	2.x
`gds.alpha.ml.nodeClassification.predict.stream`	`gds.beta.pipeline.nodeClassification.predict.stream`
`gds.alpha.ml.nodeClassification.predict.mutate`	`gds.beta.pipeline.nodeClassification.predict.mutate`
`gds.alpha.ml.nodeClassification.predict.write`	`gds.beta.pipeline.nodeClassification.predict.write`
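
For illustration, a migrated stream-mode prediction call might look like the following. The graph name `myGraph` and model name `nc-model` are placeholders. The call is a minimal sketch that assumes the model was trained with a 2.x pipeline. Note that `batchSize` is no longer passed.

```cypher
// 1.x equivalent: gds.alpha.ml.nodeClassification.predict.stream
CALL gds.beta.pipeline.nodeClassification.predict.stream('myGraph', {
  modelName: 'nc-model'
})
YIELD nodeId, predictedClass
RETURN nodeId, predictedClass;
```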

Link Prediction

The original alpha version of link prediction has been completely removed and incorporated into link prediction pipelines. Before training a link prediction model, you must create and configure a training pipeline.

Train

Table 5. Changes in configuration for train
1.x	2.x
`modelName`	This parameter is now only configured in `gds.beta.pipeline.linkPrediction.train`.
`featuresProperties`	Replaced by `nodeProperties` in `gds.beta.pipeline.linkPrediction.addFeature`. There is also a procedure `gds.beta.pipeline.linkPrediction.addNodeProperty` to compute node properties for the input graph, both in the training pipeline and in the produced link prediction model.
`linkFeatureCombiner`	Replaced by the second positional argument to `gds.beta.pipeline.linkPrediction.addFeature`, called `featureType`.
`trainRelationshipType` and `testRelationshipType`	These parameters are removed. Use `gds.beta.pipeline.linkPrediction.configureSplit` to set up the dataset split.
`validationFolds`	This parameter is now only configured in `gds.beta.pipeline.linkPrediction.configureSplit`.
`negativeClassWeight`	This parameter is now only configured in `gds.beta.pipeline.linkPrediction.train`.
`params`	This parameter is replaced by `gds.beta.pipeline.linkPrediction.addLogisticRegression`, allowing configuration for a single model candidate. The procedure can be called several times to add several model candidates. There is also a new option for using random forest as a model candidate with `gds.beta.pipeline.linkPrediction.addRandomForest`.
`randomSeed`	This parameter is now only configured in `gds.beta.pipeline.linkPrediction.train`.

Table 6. Changes in configuration for the pipeline
1.x	2.x
`gds.beta.pipeline.linkPrediction.configureParams`	This procedure, which is no longer present, added logistic regression model candidates. Logistic regression candidates can instead be added by calling `gds.beta.pipeline.linkPrediction.addLogisticRegression` one or more times.
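
The link prediction mapping in the tables above can be sketched in the same way. The names `lp-pipe`, `myGraph`, and `lp-model` are placeholders, the parameter values are illustrative, and `gds.beta.pipeline.linkPrediction.create` is assumed as the procedure that creates the pipeline. Some GDS 2.x releases require further `train` parameters (for example, the relationship type to predict), so treat this as a starting point and not as a complete call for every version.

```cypher
// Create an empty training pipeline (placeholder name 'lp-pipe').
CALL gds.beta.pipeline.linkPrediction.create('lp-pipe');

// Optional: compute a node property as a pipeline step (here, FastRP embeddings).
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('lp-pipe', 'fastRP', {
  embeddingDimension: 64,
  mutateProperty: 'embedding'
});

// Replaces 1.x `featuresProperties` (now `nodeProperties`) and
// `linkFeatureCombiner` (now the second positional argument, `featureType`).
CALL gds.beta.pipeline.linkPrediction.addFeature('lp-pipe', 'hadamard', {
  nodeProperties: ['embedding']
});

// Replaces 1.x `trainRelationshipType` / `testRelationshipType`.
CALL gds.beta.pipeline.linkPrediction.configureSplit('lp-pipe', {
  testFraction: 0.25,
  trainFraction: 0.6,
  validationFolds: 3
});

// Replaces the 1.x `params` list: one call per model candidate.
CALL gds.beta.pipeline.linkPrediction.addLogisticRegression('lp-pipe', {penalty: 0.0625});

// The remaining 1.x parameters are passed to train.
CALL gds.beta.pipeline.linkPrediction.train('myGraph', {
  pipeline: 'lp-pipe',
  modelName: 'lp-model',
  negativeClassWeight: 1.0,
  randomSeed: 42
});
```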

Predict

The API for link prediction is unchanged, but the procedures have been renamed to `gds.beta.pipeline.linkPrediction.predict.[mutate,stream]`. There is no longer a write mode for link prediction. You can emulate it by running the mutate mode and then calling `gds.graph.relationship.write`.

Table 7. Prediction procedure replacements
1.x	2.x
`gds.alpha.ml.linkPrediction.predict.stream`	`gds.beta.pipeline.linkPrediction.predict.stream`
`gds.alpha.ml.linkPrediction.predict.mutate`	`gds.beta.pipeline.linkPrediction.predict.mutate`
`gds.alpha.ml.linkPrediction.predict.write`	`-`
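
One possible way to emulate the removed write mode is to run the mutate mode and then write the resulting relationships. In this sketch, `myGraph`, `lp-model`, and the relationship type `PREDICTED_LINK` are placeholders, and the `topN` and `threshold` values are illustrative. The property name `probability` is set explicitly so that both calls agree on it.

```cypher
// Step 1: add predicted relationships to the in-memory graph.
CALL gds.beta.pipeline.linkPrediction.predict.mutate('myGraph', {
  modelName: 'lp-model',
  mutateRelationshipType: 'PREDICTED_LINK',
  mutateProperty: 'probability',
  topN: 10,
  threshold: 0.5
});

// Step 2: write the mutated relationships and their property back to the database.
CALL gds.graph.relationship.write('myGraph', 'PREDICTED_LINK', 'probability');
```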