Applying a trained model for prediction

This feature is in the alpha tier. For more information on feature tiers, see API Tiers.

In the previous sections, we saw how to build a Node Regression training pipeline and train it to produce a regression model. After training, the runnable model is of type NodeRegression and resides in the model catalog. The regression model can be applied to a graph in the graph catalog to predict a property value for previously unseen nodes.

Because the model was trained on features created by the feature pipeline, that same feature pipeline is stored within the model and executed at prediction time. As during training, intermediate node properties created by the node property steps in the feature pipeline are transient and not visible after execution.

The predict graph must contain the properties that the pipeline requires, and any array properties used must have the same dimensions as in the train graph. If the predict and train graphs are distinct, it also helps if they have similar origins and semantics, so that the model can generalize well.
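One way to check the array-dimension requirement before predicting is to stream the feature property from the in-memory graph and group by list length. The following is a minimal sketch, not part of the original examples: the graph name 'myGraph' and the label 'UnknownHouse' are borrowed from the examples on this page, while the property name 'myArrayFeature' is a hypothetical placeholder for whichever array property your pipeline uses.

```cypher
// Assumption: 'myArrayFeature' is a hypothetical array node property
// that exists in the in-memory graph 'myGraph'.
CALL gds.graph.nodeProperty.stream('myGraph', 'myArrayFeature', ['UnknownHouse'])
YIELD nodeId, propertyValue
// Group nodes by the length of their array value.
RETURN size(propertyValue) AS dimension, count(*) AS nodeCount
ORDER BY dimension
```

If this returns more than one row, or a dimension that differs from the train graph, prediction should be expected to fail or give unreliable results.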

Syntax

Node Regression prediction syntax per mode

Run Node Regression in stream mode:

CALL gds.alpha.pipeline.nodeRegression.predict.stream(
  graphName: String,
  configuration: Map
) YIELD
  nodeId: Integer,
  predictedValue: Float

Table 1. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 2. Configuration
Name	Type	Default	Optional	Description
modelName	String	`n/a`	no	The name of a NodeRegression model in the model catalog.
targetNodeLabels	List of String	`from trainConfig`	yes	Filter the named graph using the given targetNodeLabels.
relationshipTypes	List of String	`from trainConfig`	yes	Filter the named graph using the given relationship types.
concurrency	Integer	`4 ^[1]`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
logProgress	Boolean	`true`	yes	If disabled, the progress percentage will not be logged.
1. In a GDS Session, the default is the number of available processors.

Table 3. Results
Name	Type	Description
nodeId	Integer	Node ID.
predictedValue	Float	Predicted property value for this node.
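As an illustration of how the optional configuration keys fit together, here is a hedged sketch of a stream call. The graph and model names are taken from the examples later on this page; the concurrency value and the jobId string are arbitrary illustrative choices, not recommendations.

```cypher
// Sketch only: 'nr-predict-stream-job' is a hypothetical job ID,
// and concurrency: 2 is an arbitrary value.
CALL gds.alpha.pipeline.nodeRegression.predict.stream('myGraph', {
  modelName: 'nr-pipeline-model',
  targetNodeLabels: ['UnknownHouse'],
  concurrency: 2,
  jobId: 'nr-predict-stream-job',
  logProgress: false
}) YIELD nodeId, predictedValue
RETURN nodeId, predictedValue
```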

Run Node Regression in mutate mode:

CALL gds.alpha.pipeline.nodeRegression.predict.mutate(
  graphName: String,
  configuration: Map
) YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map

Table 4. Parameters
Name	Type	Default	Optional	Description
graphName	String	`n/a`	no	The name of a graph stored in the catalog.
configuration	Map	`{}`	yes	Configuration for algorithm-specifics and/or graph filtering.

Table 5. Configuration
Name	Type	Default	Optional	Description
modelName	String	`n/a`	no	The name of a NodeRegression model in the model catalog.
mutateProperty	String	`n/a`	no	The node property in the GDS graph to which the predicted property is written.
targetNodeLabels	List of String	`from trainConfig`	yes	Filter the named graph using the given targetNodeLabels.
relationshipTypes	List of String	`from trainConfig`	yes	Filter the named graph using the given relationship types.
concurrency	Integer	`4 ^[2]`	yes	The number of concurrent threads used for running the algorithm.
jobId	String	`Generated internally`	yes	An ID that can be provided to more easily track the algorithm’s progress.
logProgress	Boolean	`true`	yes	If disabled, the progress percentage will not be logged.
2. In a GDS Session, the default is the number of available processors.

Table 6. Results
Name	Type	Description
preProcessingMillis	Integer	Milliseconds for preprocessing the graph.
computeMillis	Integer	Milliseconds for running the algorithm.
postProcessingMillis	Integer	Milliseconds for computing the global metrics.
mutateMillis	Integer	Milliseconds for adding properties to the in-memory graph.
nodePropertiesWritten	Integer	Number of node properties written.
configuration	Map	Configuration used for running the algorithm.

Examples

In the following examples, we show how to use a regression model to predict a property value for nodes in your in-memory graph. To do this, we must first have a trained model registered in the Model Catalog. We will use the model that we trained in the train example and named 'nr-pipeline-model'.
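Before predicting, it can be useful to confirm that the model is actually present in the Model Catalog. The sketch below assumes a recent GDS version in which the procedure is named gds.model.exists; in older versions the same procedure lives under the gds.beta namespace, so adjust the name to your installation.

```cypher
// Assumption: GDS version exposes gds.model.exists
// (older releases use gds.beta.model.exists instead).
CALL gds.model.exists('nr-pipeline-model')
YIELD modelName, modelType, exists
RETURN modelName, modelType, exists
```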

Stream

CALL gds.alpha.pipeline.nodeRegression.predict.stream('myGraph', {
  modelName: 'nr-pipeline-model',
  targetNodeLabels: ['UnknownHouse']
}) YIELD nodeId, predictedValue
WITH gds.util.asNode(nodeId) AS houseNode, predictedValue AS predictedPrice
RETURN
  houseNode.color AS houseColor, predictedPrice
  ORDER BY predictedPrice

Table 7. Results
houseColor	predictedPrice
`"Tan"`	`87.26599999999999`
`"Yellow"`	`107.572`
`"Pink"`	`124.43800000000002`

As we can see, the model predicts the "Tan" house to be cheaper than the "Yellow" house. This may not seem accurate given that the "Yellow" house has only one story. To get a prediction that better matches our expectations, we may need to tune the model candidate parameters.

Mutate

The mutate execution mode updates the named graph with a new node property containing the predicted value for each node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row including information about timings and how many properties were written. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Mutate.

CALL gds.alpha.pipeline.nodeRegression.predict.mutate('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nr-pipeline-model',
  mutateProperty: 'predictedPrice'
}) YIELD nodePropertiesWritten

Table 8. Results
nodePropertiesWritten
3

The output tells us that we added a property for each of the UnknownHouse nodes. To use this property, we can run another algorithm using the predictedPrice property, or inspect it using gds.graph.nodeProperty.stream.
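A minimal sketch of the inspection step could look as follows. It assumes the mutate example above has already been run, so that 'predictedPrice' exists on the UnknownHouse nodes of 'myGraph', and that those nodes have a color property in the database, as in the stream example.

```cypher
// Stream the mutated property back from the in-memory graph.
CALL gds.graph.nodeProperty.stream('myGraph', 'predictedPrice', ['UnknownHouse'])
YIELD nodeId, propertyValue
// Look up the database node to show a human-readable identifier.
RETURN gds.util.asNode(nodeId).color AS houseColor, propertyValue AS predictedPrice
ORDER BY predictedPrice
```

Since the property lives only in the in-memory graph, it is lost when the graph is dropped unless it is written back to the database first.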