GraphSAGE node classification prediction

Before you can run GraphSAGE node classification prediction, you must train a GraphSAGE node classification model using the GraphSAGE node classification training endpoint.

This page describes how to use the GraphSAGE node classification prediction endpoint to predict class labels of new nodes.

Syntax

This section covers the syntax used to execute the GraphSAGE node classification prediction algorithm.

Run GraphSAGE node classification prediction.

CALL graph.gs_nc_predict(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);

1	Compute pool selector.
2	Optional prefix for table references.
3	Project config.
4	Compute config.
5	Write config.

Table 1. Parameters
Name	Type	Default	Optional	Description
computePoolSelector	String	`n/a`	no	The selector for the compute pool on which to run the GraphSAGE node classification prediction job.
configuration	Map	`{}`	no	Configuration for graph project, algorithm compute and result write back.

For this algorithm we recommend a GPU compute pool if the input graph is large or the model is deep. For smaller graphs and shallower models, a CPU compute pool may be sufficient.

The configuration map consists of the following three entries.

For more details on the Project configuration below, refer to the Project documentation.

Table 2. Project configuration
Name	Description
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

For GraphSAGE to propagate updates of node embeddings properly, each type of node must be the target of at least one relationship type. For types of nodes that are only the source of relationships, the orientation parameter can be used to add relationships in the reverse direction (using the "REVERSE" or "UNDIRECTED" orientations).

Table 3. Compute configuration
Name	Type	Default	Optional	Description
modelname	String	`n/a`	no	The name of the trained model to use.
batchSize	Integer	`Inherited`	yes	The number of target nodes to predict on in each batch. If not provided, the evaluation batch size used when training the model is applied.
randomSeed	Integer	`A random integer`	yes	A number used to seed all randomness of the computation.

For more details on the Write configuration below, refer to the Write documentation.

Table 4. Write configuration
Name	Type	Default	Optional	Description
nodeLabel	String	`n/a`	no	Node label in the in-memory graph from which to write a node property.
outputTable	String	`n/a`	no	Table in Snowflake database to which node properties are written.
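
As a sketch of how the project, compute, and write entries described above fit together, the call below sets the optional compute parameters explicitly instead of inheriting them. It reuses the application, table, and model names from the Example section further down this page. The batchSize and randomSeed values and the output table name genre_predictions_seeded are illustrative assumptions, not recommended settings.

```sql
-- Sketch only: same project configuration as the IMDB example on this page.
-- Assumptions (not from the original documentation):
--   * batchSize 256 and randomSeed 42 are arbitrary illustrative values
--   * genre_predictions_seeded is a hypothetical output table name
-- batchSize overrides the evaluation batch size inherited from training;
-- randomSeed seeds all randomness of the computation.
CALL Neo4j_Graph_Analytics.graph.gs_nc_predict('CPU_X64_XS', {
    'defaultTablePrefix': 'imdb.gml',
    'project': {
        'nodeTables': ['actor', 'director', 'movie'],
        'relationshipTables': {
            'acted_in': {
                'sourceTable': 'actor',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            },
            'directed_in': {
                'sourceTable': 'director',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': {
        'modelname': 'nc-imdb',
        'batchSize': 256,
        'randomSeed': 42
    },
    'write': [{
        'nodeLabel': 'movie',
        'outputTable': 'genre_predictions_seeded'
    }]
});
```

Setting randomSeed is mainly useful when you want repeated runs to be comparable; if you omit it, a random integer is used, as listed in Table 3.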

Example

For our example we will use an IMDB dataset with actors, directors, movies, and genres. These all have keywords associated with them, which we will use as features for the nodes. They are connected by relationships where actors act in movies and directors direct movies. The goal is to predict the genre of movies.

We have a database called imdb that contains the tables:

actor with columns nodeid and plot_keywords
movie with columns nodeid, plot_keywords and genre
director with columns nodeid and plot_keywords
acted_in with columns sourcenodeid and targetnodeid that represent actor and movie node IDs
directed_in with columns sourcenodeid and targetnodeid that represent director and movie node IDs

The plot_keywords columns contain keywords associated with the nodes, encoded as vectors of floats. The genre column contains the target class labels for the movie nodes, which we want to predict.

You can upload this dataset to your Snowflake account by following the instructions on GitHub: neo4j-product-examples/snowflake-graph-analytics.

The prediction query

We assume a model named nc-imdb has been trained using the GraphSAGE node classification training endpoint (see the example).

In the following prediction query we specify the project configuration exactly as we did during training. In the compute configuration we only need to specify the modelname, as the rest is inherited from the training configuration.

We also provide a write configuration to specify the tables where the computed predictions will be stored.

Running the query requires grants to be set up for the application, your consumer role, and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.

CALL Neo4j_Graph_Analytics.graph.gs_nc_predict('GPU_NV_S', {
    'defaultTablePrefix': 'imdb.gml',
    'project': {
        'nodeTables': ['actor', 'director', 'movie'],
        'relationshipTables': {
            'acted_in': {
                'sourceTable': 'actor',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            },
            'directed_in': {
                'sourceTable': 'director',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': {
        'modelname': 'nc-imdb'
    },
    'write': [{
        'nodeLabel': 'movie',
        'outputTable': 'genre_predictions'
    }]
});

The above query should produce a result similar to the one below.

JOB_ID	JOB_START	JOB_END	JOB_RESULT
job_2223e0806c9842ddb3fe4b028335a500	2025-04-29 12:18:05.515	2025-04-29 12:18:51.723	{ "node_output_stats": { "movie": { "row_count": 4661, "table_name": "imdb.gml.genre_predictions" } } }

We can inspect the predictions and probabilities for an arbitrary 10 nodes (the query has no ORDER BY clause, so the choice of rows is not guaranteed) by running

SELECT * FROM IMDB.GML.genre_predictions LIMIT 10;

which yields

NODEID	PREDICTED_CLASS	PREDICTED_PROBABILITIES
4467	2	[0.006307,0.028976,0.964717]
4571	2	[0.003825,0.039170,0.957005]
3865	2	[0.006841,0.058649,0.934510]
2382	2	[0.007916,0.048241,0.943842]
2071	2	[0.007994,0.015378,0.976628]
3239	1	[0.019813,0.941223,0.038963]
2975	1	[0.006499,0.946676,0.046826]
4075	1	[0.004404,0.948304,0.047292]
2765	2	[0.024119,0.007341,0.968539]
29	0	[0.915931,0.006881,0.077187]
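
Beyond looking at individual rows, a few follow-up queries can help sanity-check the output table. The sketch below assumes that the movie table is located at imdb.gml.movie (as implied by the defaultTablePrefix used above) and that its genre column stores the same integer class labels that appear in PREDICTED_CLASS. If your labels are encoded differently, the comparison in the last query needs to be adapted.

```sql
-- Deterministic variant of the inspection query: order by node ID first.
SELECT *
FROM imdb.gml.genre_predictions
ORDER BY nodeid
LIMIT 10;

-- Number of movies assigned to each predicted class.
SELECT predicted_class, COUNT(*) AS num_movies
FROM imdb.gml.genre_predictions
GROUP BY predicted_class
ORDER BY predicted_class;

-- Fraction of movies whose predicted class matches the stored genre label.
-- Assumption: genre and predicted_class use the same integer encoding.
SELECT AVG(CASE WHEN p.predicted_class = m.genre THEN 1 ELSE 0 END) AS agreement
FROM imdb.gml.genre_predictions AS p
JOIN imdb.gml.movie AS m
  ON m.nodeid = p.nodeid;
```

Note that the agreement value may include nodes whose labels the model saw during training, so it should not be read as a held-out accuracy estimate.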