Applying a trained model for prediction

This feature is in the alpha tier. For more information on feature tiers, see API Tiers.

In the previous sections we saw how to build a Node Regression training pipeline and train it to produce a regression model. After training, the resulting runnable model is of type NodeRegression and resides in the model catalog. The regression model can be applied to a graph in the graph catalog to predict a property value for previously unseen nodes.

Since the model has been trained on features which are created using the feature pipeline, the same feature pipeline is stored within the model and executed at prediction time. As during training, intermediate node properties created by the node property steps in the feature pipeline are transient and not visible after execution.

The predict graph must contain the properties that the pipeline requires, and any array properties used must have the same dimensions as in the train graph. If the predict and train graphs are distinct, it also helps if they have similar origins and semantics, so that the model can generalize well.
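One way to check these requirements before predicting is to compare what the model was trained with against the schema of the predict graph. The following is a minimal sketch, not a prescribed workflow. It assumes the model name 'nr-pipeline-model' and the graph name 'myGraph' used in the examples on this page, and it assumes a GDS version in which the model catalog procedure is exposed as gds.beta.model.list (the name may differ in other versions).

```cypher
// Inspect the stored model: trainConfig and modelInfo describe how the
// model was trained, including the pipeline it carries.
// Assumption: the procedure name gds.beta.model.list matches your GDS version.
CALL gds.beta.model.list('nr-pipeline-model')
YIELD modelInfo, trainConfig
RETURN modelInfo, trainConfig;

// Inspect the predict graph: the schema lists the node labels and
// node properties that are available in the in-memory graph.
CALL gds.graph.list('myGraph')
YIELD graphName, schema
RETURN graphName, schema;
```

Comparing the two results by eye shows whether a property the pipeline relies on is missing from the predict graph.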

Syntax

Node Regression prediction syntax per mode
Run Node Regression in stream mode:
CALL gds.alpha.pipeline.nodeRegression.predict.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  predictedValue: Float
Table 1. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name | Type | Default | Optional | Description
modelName | String | n/a | no | The name of a NodeRegression model in the model catalog.
targetNodeLabels | List of String | from trainConfig | yes | Filter the named graph using the given targetNodeLabels.
relationshipTypes | List of String | from trainConfig | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.
jobId | String | Generated internally | yes | An ID that can be provided to more easily track the algorithm’s progress.

Table 3. Results
Name | Type | Description
nodeId | Integer | Node ID.
predictedValue | Float | Predicted property value for this node.
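To illustrate how the optional configuration keys combine with the mandatory modelName, here is a hedged sketch of a stream call. The graph name 'myGraph', the model name 'nr-pipeline-model' and the label 'UnknownHouse' are taken from the examples section of this page. The concurrency value and the jobId string are hypothetical and chosen only for illustration.

```cypher
// Stream predictions with explicit optional settings.
// concurrency: 2 and jobId: 'nr-predict-demo' are illustrative values,
// not recommendations.
CALL gds.alpha.pipeline.nodeRegression.predict.stream('myGraph', {
  modelName: 'nr-pipeline-model',
  targetNodeLabels: ['UnknownHouse'],
  concurrency: 2,
  jobId: 'nr-predict-demo'
}) YIELD nodeId, predictedValue
RETURN nodeId, predictedValue
```

Omitting targetNodeLabels and relationshipTypes falls back to the values from the training configuration, as listed in the table above.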

Run Node Regression in mutate mode:
CALL gds.alpha.pipeline.nodeRegression.predict.mutate(
  graphName: String,
  configuration: Map
) YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 4. Parameters
Name | Type | Default | Optional | Description
graphName | String | n/a | no | The name of a graph stored in the catalog.
configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name | Type | Default | Optional | Description
modelName | String | n/a | no | The name of a NodeRegression model in the model catalog.
mutateProperty | String | n/a | no | The node property in the GDS graph to which the predicted property is written.
targetNodeLabels | List of String | from trainConfig | yes | Filter the named graph using the given targetNodeLabels.
relationshipTypes | List of String | from trainConfig | yes | Filter the named graph using the given relationship types.
concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm.
jobId | String | Generated internally | yes | An ID that can be provided to more easily track the algorithm’s progress.

Table 6. Results
Name | Type | Description
preProcessingMillis | Integer | Milliseconds for preprocessing the graph.
computeMillis | Integer | Milliseconds for running the algorithm.
postProcessingMillis | Integer | Milliseconds for computing the global metrics.
mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph.
nodePropertiesWritten | Integer | Number of node properties written.
configuration | Map | Configuration used for running the algorithm.

Examples

In the following examples we show how to use a regression model to predict a property value of a node in your in-memory graph. To do this, we must first have a trained model registered in the Model Catalog. We will use the model that we trained in the train example, which we named 'nr-pipeline-model'.

Stream

CALL gds.alpha.pipeline.nodeRegression.predict.stream('myGraph', {
  modelName: 'nr-pipeline-model',
  targetNodeLabels: ['UnknownHouse']
}) YIELD nodeId, predictedValue
WITH gds.util.asNode(nodeId) AS houseNode, predictedValue AS predictedPrice
RETURN
  houseNode.color AS houseColor, predictedPrice
  ORDER BY predictedPrice
Table 7. Results
houseColor | predictedPrice
"Tan" | 87.26599999999999
"Yellow" | 107.572
"Pink" | 124.43800000000002

As we can see, the model predicts the "Tan" house to be cheaper than the "Yellow" house. This may not seem accurate given that the "Yellow" house has only one story. To get a prediction that better matches our expectations, we may need to tune the model candidate parameters.

Mutate

The mutate execution mode updates the named graph with a new node property containing the predicted value for each node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row including information about timings and how many properties were written. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Mutate.

CALL gds.alpha.pipeline.nodeRegression.predict.mutate('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nr-pipeline-model',
  mutateProperty: 'predictedPrice'
}) YIELD nodePropertiesWritten
Table 8. Results
nodePropertiesWritten
3

The output tells us that we added a property for each of the UnknownHouse nodes. To use this property, we can run another algorithm using the predictedPrice property, or inspect it using gds.graph.nodeProperty.stream.
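As a sketch of the inspection step mentioned above, the mutated property can be streamed back from the in-memory graph. This assumes the mutate example has already been run against 'myGraph', and that the nodes carry a color property in the database, as in the stream example. The column aliases are illustrative.

```cypher
// Read the predictedPrice property from the in-memory graph,
// restricted to the nodes that received a prediction.
CALL gds.graph.nodeProperty.stream('myGraph', 'predictedPrice', ['UnknownHouse'])
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).color AS houseColor, propertyValue AS predictedPrice
ORDER BY predictedPrice
```

Because the property lives only in the in-memory graph, it is read through the graph catalog procedure here rather than from the node in the database.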