Applying a trained model for prediction
This feature is in the beta tier. For more information on feature tiers, see API Tiers.
In the previous sections we have seen how to build up a Node Classification training pipeline and train it to produce a classification pipeline.
After training, the runnable model is of type NodeClassification and resides in the model catalog.
The classification model can be executed with a graph in the graph catalog to predict the class of previously unseen nodes. In addition to the predicted class for each node, the predicted probability for each class may also be retained on the nodes. The order of the probabilities matches the order of the classes registered in the model.
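The ordering rule above can be illustrated with a minimal Python sketch. It is not GDS code: the function name `probability_by_class`, the class list, and the probability values are all hypothetical and only show how a probability list lines up, index by index, with the classes registered in a model.

```python
from typing import Dict, List


def probability_by_class(classes: List[int], probabilities: List[float]) -> Dict[int, float]:
    """Map each registered class to the probability stored at the same index."""
    if len(classes) != len(probabilities):
        raise ValueError("expected exactly one probability per registered class")
    return dict(zip(classes, probabilities))


# Hypothetical values: the class order as registered in a model,
# and the probability list retained on one node.
classes = [0, 1, 2]
predicted_probabilities = [0.02, 0.95, 0.03]

by_class = probability_by_class(classes, predicted_probabilities)
print(by_class)
```

Looking a probability up through the class order, rather than assuming that a class value equals its list position, keeps the lookup correct even when the class values are not 0, 1, 2, and so on.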
Since the model has been trained on features which are created using the feature pipeline, the same feature pipeline is stored within the model and executed at prediction time. As during training, intermediate node properties created by the node property steps in the feature pipeline are transient and not visible after execution.
The predict graph must contain the properties that the pipeline requires, and any array properties used must have the same dimensions as in the train graph. If the predict and train graphs are distinct, it also helps if they have similar origins and semantics, so that the model can generalize well.
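As a rough illustration of this requirement, the following Python sketch checks one node's properties against the property dimensions seen at training time. It is an assumption-laden sketch rather than anything GDS runs: the helper names `dimension` and `find_problems`, the property names, and the convention that a scalar has dimension 1 are all hypothetical.

```python
from typing import Dict, List, Union

PropertyValue = Union[int, float, List[float]]


def dimension(value: PropertyValue) -> int:
    """Arrays use their length; scalars are treated as dimension 1 (an assumption of this sketch)."""
    return len(value) if isinstance(value, (list, tuple)) else 1


def find_problems(required_dims: Dict[str, int],
                  node_properties: Dict[str, PropertyValue]) -> List[str]:
    """Report properties that are missing or whose dimension differs from the train graph."""
    problems = []
    for name, expected in required_dims.items():
        if name not in node_properties:
            problems.append(f"missing property '{name}'")
            continue
        actual = dimension(node_properties[name])
        if actual != expected:
            problems.append(f"property '{name}' has dimension {actual}, expected {expected}")
    return problems


# Hypothetical property names and dimensions taken from a train graph.
required = {"sizePerStory": 3, "stories": 1}
node = {"sizePerStory": [15.5, 23.6]}  # too short, and 'stories' is absent
for problem in find_problems(required, node):
    print(problem)
```

Running a check like this before prediction surfaces schema mismatches early, although it says nothing about whether the two graphs are semantically similar.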
Syntax
CALL gds.beta.pipeline.nodeClassification.predict.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  predictedClass: Integer,
  predictedProbabilities: List of Float

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| modelName | String | | no | The name of a NodeClassification model in the model catalog. |
| targetNodeLabels | List of String | | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| includePredictedProbabilities | Boolean | | yes | Whether to return the probability for each class. If |

1. In a GDS Session, the default is the number of available processors.

| Name | Type | Description | 
|---|---|---|
| nodeId | Integer | Node ID. | 
| predictedClass | Integer | Predicted class for this node. | 
| predictedProbabilities | List of Float | Probabilities for all classes, for this node. | 
CALL gds.beta.pipeline.nodeClassification.predict.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| modelName | String | | no | The name of a NodeClassification model in the model catalog. |
| mutateProperty | String | | no | The node property in the GDS graph to which the predicted property is written. |
| targetNodeLabels | List of String | | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| predictedProbabilityProperty | String | | yes | The node property in which the class probability list is stored. If omitted, the probability list is discarded. The order of the classes can be inspected in the |

2. In a GDS Session, the default is the number of available processors.

| Name | Type | Description | 
|---|---|---|
| preProcessingMillis | Integer | Milliseconds for preprocessing the graph. | 
| computeMillis | Integer | Milliseconds for running the algorithm. | 
| postProcessingMillis | Integer | Milliseconds for computing the global metrics. | 
| mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph. | 
| nodePropertiesWritten | Integer | Number of node properties written. | 
| configuration | Map | Configuration used for running the algorithm. | 
CALL gds.beta.pipeline.nodeClassification.predict.write(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  writeMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
|---|---|---|---|---|
| modelName | String | | no | The name of a NodeClassification model in the model catalog. |
| targetNodeLabels | List of String | | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j. |
| writeProperty | String | | no | The node property in the Neo4j database to which the predicted property is written. |
| predictedProbabilityProperty | String | | yes | The node property in which the class probability list is stored. If omitted, the probability list is discarded. The order of the classes can be inspected in the |

3. In a GDS Session, the default is the number of available processors.

| Name | Type | Description | 
|---|---|---|
| preProcessingMillis | Integer | Milliseconds for preprocessing the graph. | 
| computeMillis | Integer | Milliseconds for running the algorithm. | 
| postProcessingMillis | Integer | Milliseconds for computing the global metrics. | 
| writeMillis | Integer | Milliseconds for writing result back to Neo4j. | 
| nodePropertiesWritten | Integer | Number of node properties written. | 
| configuration | Map | Configuration used for running the algorithm. | 
Example
In the following examples we will show how to use a classification model to predict the class of a node in your in-memory graph.
In addition to the predicted class, we will also produce the probability for each class in another node property.
In order to do this, we must first have an already trained model registered in the Model Catalog.
We will use the model that we trained in the train example, which we named 'nc-pipeline-model'.
Memory Estimation
First off, we will estimate the cost of running the algorithm using the estimate procedure.
This can be done with any execution mode.
We will use the stream mode in this example.
Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have.
When you later actually run the algorithm in one of the execution modes the system will perform an estimation.
If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited.
To read more about this, see Automatic estimation and execution blocking.
For more details on estimate in general, see Memory Estimation.
CALL gds.beta.pipeline.nodeClassification.predict.stream.estimate('myGraph', {
  modelName: 'nc-pipeline-model',
  includePredictedProbabilities: true,
  targetNodeLabels: ['UnknownHouse']
})
YIELD requiredMemory

| requiredMemory |
|---|
| "792 Bytes" |

If a node property step does not have an estimation implemented, the step will be ignored in the estimation.
Stream
CALL gds.beta.pipeline.nodeClassification.predict.stream('myGraph', {
  modelName: 'nc-pipeline-model',
  includePredictedProbabilities: true,
  targetNodeLabels: ['UnknownHouse']
})
YIELD nodeId, predictedClass, predictedProbabilities
WITH gds.util.asNode(nodeId) AS houseNode, predictedClass, predictedProbabilities
RETURN
  houseNode.color AS classifiedHouse,
  predictedClass,
  floor(predictedProbabilities[predictedClass] * 100) AS confidence
ORDER BY classifiedHouse

| classifiedHouse | predictedClass | confidence |
|---|---|---|
| | | |
| | | |
| | | |

As we can see, the model classified the pink house as class 0, the tan house as class 1, and the yellow house as class 2. This makes sense: all houses in class 0 had three stories, those in class 1 had two stories and those in class 2 had one story, and the same is true of the pink, tan and yellow houses, respectively. The model is also confident in these predictions, with a confidence of at least 79% in every case.
The indices in predictedProbabilities correspond to the order of the classes in the classification model. To inspect the order of the classes, we can look at its modelInfo (see listing models).
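The confidence column in the stream query is the probability of the predicted class, scaled to a percentage and rounded down. The Python sketch below reproduces that arithmetic outside the database. The function name `confidence_percent` and all values are hypothetical. Unlike the query, which indexes the list directly with the predicted class (this works when the classes are 0, 1 and 2 in that order), the sketch looks the index up through the class order.

```python
import math
from typing import List


def confidence_percent(classes: List[int], predicted_class: int,
                       probabilities: List[float]) -> int:
    """Probability of the predicted class as a whole-number percentage, rounded down."""
    # Position of the predicted class within the model's class order.
    index = classes.index(predicted_class)
    return math.floor(probabilities[index] * 100)


# Hypothetical class order and probability list for a single node.
classes = [0, 1, 2]
probabilities = [0.1, 0.105, 0.795]
print(confidence_percent(classes, 2, probabilities))
```

If the registered classes were, say, 10, 20 and 30, direct indexing with the class value would fail, whereas the lookup through the class order still finds the right entry.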
Mutate
The mutate execution mode updates the named graph with a new node property containing the predicted class for that node.
The name of the new property is specified using the mandatory configuration parameter mutateProperty.
The result is a single summary row including information about timings and how many properties were written.
The mutate mode is especially useful when multiple algorithms are used in conjunction.
For more details on the mutate mode in general, see Mutate.
CALL gds.beta.pipeline.nodeClassification.predict.mutate('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nc-pipeline-model',
  mutateProperty: 'predictedClass',
  predictedProbabilityProperty: 'predictedProbabilities'
}) YIELD nodePropertiesWritten

| nodePropertiesWritten |
|---|
| 6 |

Since we also specified predictedProbabilityProperty, two properties are written for each of the 3 UnknownHouse nodes, which gives 6 in total.
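The count can be checked with a small Python sketch. The function name is hypothetical, and the sketch assumes that exactly one property per node is written when predictedProbabilityProperty is omitted, which the text above implies but does not state outright.

```python
def expected_properties_written(node_count: int, stores_probabilities: bool) -> int:
    """One predicted-class property per node, plus one probability list if requested."""
    properties_per_node = 2 if stores_probabilities else 1
    return node_count * properties_per_node


# 3 UnknownHouse nodes, with predictedProbabilityProperty specified.
print(expected_properties_written(3, stores_probabilities=True))
```

Comparing this expected value with the reported nodePropertiesWritten is a quick sanity check that the label filter selected the intended nodes.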
Write
The write execution mode writes the predicted property for each node as a property to the Neo4j database.
The name of the new property is specified using the mandatory configuration parameter writeProperty.
The result is a single summary row including information about timings and how many properties were written.
The write mode enables directly persisting the results to the database.
For more details on the write mode in general, see Write.
CALL gds.beta.pipeline.nodeClassification.predict.write('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nc-pipeline-model',
  writeProperty: 'predictedClass',
  predictedProbabilityProperty: 'predictedProbabilities'
}) YIELD nodePropertiesWritten

| nodePropertiesWritten |
|---|
| 6 |

Since we also specified predictedProbabilityProperty, two properties are written for each of the 3 UnknownHouse nodes, which gives 6 in total.