Applying a trained model for prediction

This feature is in the beta tier. For more information on feature tiers, see Operations reference.

In the previous sections we have seen how to build up a Node Classification training pipeline and train it to produce a classification pipeline. After training, the runnable model is of type NodeClassification and resides in the model catalog.

The classification model can be executed with a graph in the graph catalog to predict the class of previously unseen nodes. In addition to the predicted class for each node, the predicted probability for each class may also be retained on the nodes. The order of the probabilities matches the order of the classes registered in the model.

Since the model has been trained on features which are created using the feature pipeline, the same feature pipeline is stored within the model and executed at prediction time. As during training, intermediate node properties created by the node property steps in the feature pipeline are transient and not visible after execution.

The predict graph must contain the properties that the pipeline requires, and any array properties used must have the same dimensions as in the train graph. If the predict and train graphs are distinct, it also helps if they have similar origins and semantics, so that the model is able to generalize well.
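One way to check which properties a projected graph carries before predicting is to list its schema. The following is a minimal sketch, assuming the predict graph is the 'myGraph' projection used in the examples further down. It shows property names and types per node label; it does not verify array dimensions.

```cypher
// List the node labels and their properties in the projected predict graph.
// 'myGraph' is the graph name used in the examples of this page.
CALL gds.graph.list('myGraph')
YIELD graphName, schema
RETURN graphName, schema.nodes AS nodeSchema
```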

1. Syntax

Node Classification syntax per mode
Run Node Classification in stream mode on a named graph:
CALL gds.beta.pipeline.nodeClassification.predict.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  predictedClass: Integer,
  predictedProbabilities: List of Float
Table 1. Parameters
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | n/a | no | The name of a graph stored in the catalog. |
| configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering. |

Table 2. Configuration
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| modelName | String | n/a | no | The name of a NodeClassification model in the model catalog. |
| targetNodeLabels | List of String | from trainConfig | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | from trainConfig | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. |
| jobId | String | Generated internally | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| includePredictedProbabilities | Boolean | false | yes | Whether to return the probability for each class. If false, null is returned in predictedProbabilities. The order of the classes can be inspected in the modelInfo of the classification model (see listing models). |

Table 3. Results
| Name | Type | Description |
| --- | --- | --- |
| nodeId | Integer | Node ID. |
| predictedClass | Integer | Predicted class for this node. |
| predictedProbabilities | List of Float | Probabilities for all classes, for this node. |

Run Node Classification in mutate mode on a named graph:
CALL gds.beta.pipeline.nodeClassification.predict.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  mutateMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 4. Parameters
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | n/a | no | The name of a graph stored in the catalog. |
| configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering. |

Table 5. Configuration
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| modelName | String | n/a | no | The name of a NodeClassification model in the model catalog. |
| mutateProperty | String | n/a | no | The node property in the GDS graph to which the predicted property is written. |
| targetNodeLabels | List of String | from trainConfig | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | from trainConfig | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. |
| jobId | String | Generated internally | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| predictedProbabilityProperty | String | n/a | yes | The node property in which the class probability list is stored. If omitted, the probability list is discarded. The order of the classes can be inspected in the modelInfo of the classification model (see listing models). |

Table 6. Results
| Name | Type | Description |
| --- | --- | --- |
| preProcessingMillis | Integer | Milliseconds for preprocessing the graph. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| postProcessingMillis | Integer | Milliseconds for computing the global metrics. |
| mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph. |
| nodePropertiesWritten | Integer | Number of node properties written. |
| configuration | Map | Configuration used for running the algorithm. |

Run Node Classification in write mode on a named graph:
CALL gds.beta.pipeline.nodeClassification.predict.write(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  writeMillis: Integer,
  nodePropertiesWritten: Integer,
  configuration: Map
Table 7. Parameters
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | n/a | no | The name of a graph stored in the catalog. |
| configuration | Map | {} | yes | Configuration for algorithm-specifics and/or graph filtering. |

Table 8. Configuration
| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| modelName | String | n/a | no | The name of a NodeClassification model in the model catalog. |
| targetNodeLabels | List of String | from trainConfig | yes | Filter the named graph using the given targetNodeLabels. |
| relationshipTypes | List of String | from trainConfig | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | 4 | yes | The number of concurrent threads used for running the algorithm. |
| jobId | String | Generated internally | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| writeConcurrency | Integer | value of 'concurrency' | yes | The number of concurrent threads used for writing the result to Neo4j. |
| writeProperty | String | n/a | no | The node property in the Neo4j database to which the predicted property is written. |
| predictedProbabilityProperty | String | n/a | yes | The node property in which the class probability list is stored. If omitted, the probability list is discarded. The order of the classes can be inspected in the modelInfo of the classification model (see listing models). |

Table 9. Results
| Name | Type | Description |
| --- | --- | --- |
| preProcessingMillis | Integer | Milliseconds for preprocessing the graph. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| postProcessingMillis | Integer | Milliseconds for computing the global metrics. |
| writeMillis | Integer | Milliseconds for writing result back to Neo4j. |
| nodePropertiesWritten | Integer | Number of node properties written. |
| configuration | Map | Configuration used for running the algorithm. |

2. Example

In the following examples we show how to use a classification model to predict the class of nodes in an in-memory graph. In addition to the predicted class, we also produce the probability for each class in another node property. To do this, we must first have a trained model registered in the Model Catalog. We use the model trained in the train example, which we named 'nc-pipeline-model'.
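Before predicting, it can be worth confirming that the model is actually present in the model catalog. A minimal sketch, assuming the beta-tier model catalog procedure gds.beta.model.exists is available in the installed GDS version:

```cypher
// Check whether the trained model from the train example is registered.
CALL gds.beta.model.exists('nc-pipeline-model')
YIELD modelName, modelType, exists
RETURN modelName, modelType, exists
```

If exists is false, the training step has to be run (or the model loaded) before any of the predict procedures below can be called.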

2.1. Memory Estimation

First, we estimate the cost of running the algorithm using the estimate procedure. This can be done with any execution mode; we use the stream mode in this example. Estimating the algorithm helps you understand the memory impact that running it on your graph will have. When you later run the algorithm in one of the execution modes, the system also performs an estimation. If that estimation shows a very high probability of the execution exceeding its memory limits, the execution is prohibited. To read more about this, see Automatic estimation and execution blocking.

For more details on estimate in general, see Memory Estimation.

The following will estimate the memory requirements for running the algorithm in stream mode:
CALL gds.beta.pipeline.nodeClassification.predict.stream.estimate('myGraph', {
  modelName: 'nc-pipeline-model',
  includePredictedProbabilities: true,
  targetNodeLabels: ['UnknownHouse']
})
YIELD requiredMemory
Table 10. Results
| requiredMemory |
| --- |
| "792 Bytes" |

If a node property step does not have an estimation implemented, the step will be ignored in the estimation.
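Since estimation is described as available for any execution mode, the same check can be sketched for the mutate mode. This assumes the installed GDS version exposes a predict.mutate.estimate variant alongside the stream one, and that it yields the usual estimation columns; the configuration mirrors the mutate example later on this page.

```cypher
// Estimate memory for the mutate mode (assumed to follow the same
// naming scheme as the stream estimate shown above).
CALL gds.beta.pipeline.nodeClassification.predict.mutate.estimate('myGraph', {
  modelName: 'nc-pipeline-model',
  mutateProperty: 'predictedClass',
  predictedProbabilityProperty: 'predictedProbabilities',
  targetNodeLabels: ['UnknownHouse']
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
```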

2.2. Stream

CALL gds.beta.pipeline.nodeClassification.predict.stream('myGraph', {
  modelName: 'nc-pipeline-model',
  includePredictedProbabilities: true,
  targetNodeLabels: ['UnknownHouse']
})
YIELD nodeId, predictedClass, predictedProbabilities
WITH gds.util.asNode(nodeId) AS houseNode, predictedClass, predictedProbabilities
RETURN
  houseNode.color AS classifiedHouse,
  predictedClass,
  floor(predictedProbabilities[predictedClass] * 100) AS confidence
ORDER BY classifiedHouse
Table 11. Results
| classifiedHouse | predictedClass | confidence |
| --- | --- | --- |
| "Pink" | 0 | 96.0 |
| "Tan" | 1 | 97.0 |
| "Yellow" | 2 | 75.0 |

As we can see, the model classified the pink house as class 0, the tan house as class 1, and the yellow house as class 2. This makes sense, as all houses in class 0 had three stories, class 1 two stories and class 2 one story, and the same is true of the pink, tan and yellow houses, respectively. Additionally, we see that the model is confident in these predictions, as the confidence is >= 75% in all cases.

The indices in the predictedProbabilities correspond to the order of the classes in the classification model. To inspect the order of the classes, we can look at its modelInfo (see listing models).
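A minimal sketch of that inspection follows. It assumes the class list is exposed under a classes key in modelInfo; if the key is named differently in your version, returning the whole modelInfo map shows what is available.

```cypher
// Look up the class order registered in the trained model.
// The 'classes' key is an assumption about the modelInfo layout.
CALL gds.beta.model.list('nc-pipeline-model')
YIELD modelInfo
RETURN modelInfo.classes AS classes
```

Note that the stream example indexes predictedProbabilities directly with predictedClass. That only works when the class values coincide with their positions in this list, as they do for classes 0, 1 and 2.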

2.3. Mutate

The mutate execution mode updates the named graph with a new node property containing the predicted class for that node. The name of the new property is specified using the mandatory configuration parameter mutateProperty. The result is a single summary row including information about timings and how many properties were written. The mutate mode is especially useful when multiple algorithms are used in conjunction.

For more details on the mutate mode in general, see Mutate.

CALL gds.beta.pipeline.nodeClassification.predict.mutate('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nc-pipeline-model',
  mutateProperty: 'predictedClass',
  predictedProbabilityProperty: 'predictedProbabilities'
}) YIELD nodePropertiesWritten
Table 12. Results
| nodePropertiesWritten |
| --- |
| 6 |

Since we also specified predictedProbabilityProperty, two properties are written for each of the 3 UnknownHouse nodes, giving 6 in total.
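The mutated properties live only in the in-memory graph, so they are not visible through a regular MATCH. One way to read them back is to stream them from the graph catalog. This is a sketch that assumes the gds.graph.nodeProperties.stream procedure is available; older GDS versions name it gds.graph.streamNodeProperties.

```cypher
// Stream the two properties added by the mutate call for the predicted nodes.
CALL gds.graph.nodeProperties.stream(
  'myGraph',
  ['predictedClass', 'predictedProbabilities'],
  ['UnknownHouse']
)
YIELD nodeId, nodeProperty, propertyValue
RETURN gds.util.asNode(nodeId).color AS house, nodeProperty, propertyValue
ORDER BY house, nodeProperty
```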

2.4. Write

The write execution mode writes the predicted property for each node as a property to the Neo4j database. The name of the new property is specified using the mandatory configuration parameter writeProperty. The result is a single summary row including information about timings and how many properties were written. The write mode enables directly persisting the results to the database.

For more details on the write mode in general, see Write.

CALL gds.beta.pipeline.nodeClassification.predict.write('myGraph', {
  targetNodeLabels: ['UnknownHouse'],
  modelName: 'nc-pipeline-model',
  writeProperty: 'predictedClass',
  predictedProbabilityProperty: 'predictedProbabilities'
}) YIELD nodePropertiesWritten
Table 13. Results
| nodePropertiesWritten |
| --- |
| 6 |

Since we also specified predictedProbabilityProperty, two properties are written for each of the 3 UnknownHouse nodes, giving 6 in total.
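After the write call, the predictions are ordinary node properties in the database and can be read with plain Cypher. A minimal sketch, using the property names from the write configuration above and the color property used in the stream example:

```cypher
// Read the persisted predictions back from the Neo4j database.
MATCH (house:UnknownHouse)
RETURN
  house.color AS color,
  house.predictedClass AS predictedClass,
  house.predictedProbabilities AS predictedProbabilities
ORDER BY color
```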