GraphSAGE node embedding prediction
Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use.
Before you can apply GraphSAGE node embedding prediction, you must first train a GraphSAGE node embedding model using the GraphSAGE node embedding training endpoint.
This page provides instructions for how to use the GraphSAGE node embedding prediction endpoint to infer embeddings for new nodes.
Endpoint
The endpoint name is graph.gs_unsup_predict, and it takes two positional arguments as input.
The first argument is a VARCHAR that specifies which compute pool to use.
For this algorithm, we recommend a GPU compute pool if the input graph is large or the model is deep; otherwise, a CPU compute pool may be sufficient.
The second argument is a JSON configuration map, which must contain the following three keys:
Name | Type | Default | Optional | Description
project | Map | n/a | no | Configuration for the input graph
compute | Map | n/a | no | Configuration for algorithm-specific parameters
write | List | n/a | no | Configuration for writing back algorithm results
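If you assemble the configuration programmatically before submitting it, a quick client-side check of the three required keys can catch mistakes early. The following is a minimal Python sketch, assuming the map is built as a Python dict; the helper check_top_level is hypothetical and not part of the product API, and it only checks the Map/List types listed in the table above.

```python
import json

# Required top-level keys of the second argument, with the types documented above.
REQUIRED_TYPES = {"project": dict, "compute": dict, "write": list}


def check_top_level(config: dict) -> None:
    """Hypothetical helper: raise if a required key is missing or has the wrong type."""
    for key, expected in REQUIRED_TYPES.items():
        if key not in config:
            raise ValueError(f"missing required key: {key!r}")
        if not isinstance(config[key], expected):
            raise TypeError(f"{key!r} must be a {expected.__name__}")


# Mirrors part of the IMDB example later on this page.
config = {
    "project": {
        "defaultTablePrefix": "imdb.gml",
        "nodeTables": ["actor", "movie"],
        "relationshipTables": {
            "acted_in": {"sourceTable": "actor", "targetTable": "movie",
                         "orientation": "UNDIRECTED"},
        },
    },
    "compute": {"modelname": "unsup-imdb"},
    "write": [{"nodeLabel": "actor", "outputTable": "imdb.gml.actor_embeddings"}],
}

check_top_level(config)
# Serialized for inspection only; the JSON text uses double quotes,
# unlike the single-quoted object literal in the SQL example.
print(json.dumps(config, indent=2))
```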
Input graph configuration
Name | Type | Default | Optional | Description
nodeTables | List | n/a | no | A list of table names, each of which represents a node label in the input graph
relationshipTables | Map | n/a | no | A map from table names representing relationship types to maps of configuration for that relationship type in the input graph (details below)
defaultTablePrefix | String | n/a | yes | A default database and schema prefix to use for table names in the input graph, in the format "<database>.<schema>"
If a defaultTablePrefix is not provided, all table names must be qualified with a database and schema name; that is, they must be given as strings of the format "<database>.<schema>.<table>".
If a defaultTablePrefix is provided, table names may also be given as "<table>", in which case the prefix is prepended to them.
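All provided node tables and relationship tables must have unique names. Not only must the fully qualified table names be unique, but the table names themselves (without database and schema) must also be unique. For example, imdb.gml.actor and other_db.other_schema.actor cannot both be used in the same input graph.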
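The qualification and uniqueness rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the endpoint's actual validation: the helpers qualify and check_unique_names are hypothetical, names with exactly two parts ("<schema>.<table>") are rejected because this page does not describe that form, and quoted identifiers containing dots and identifier case-folding are not handled.

```python
from collections import Counter
from typing import Iterable, List, Optional


def qualify(name: str, prefix: Optional[str]) -> str:
    """Return the "<database>.<schema>.<table>" form of a table name."""
    parts = name.split(".")
    if len(parts) == 3:
        return name  # already fully qualified
    if len(parts) == 1 and prefix is not None:
        return f"{prefix}.{name}"  # bare name: prepend defaultTablePrefix
    raise ValueError(f"{name!r} must be fully qualified when no defaultTablePrefix is given")


def check_unique_names(names: Iterable[str], prefix: Optional[str]) -> List[str]:
    """Qualify all node and relationship table names and reject duplicates."""
    qualified = [qualify(n, prefix) for n in names]
    # Bare table names must be unique; that also makes the qualified names unique.
    counts = Counter(q.split(".")[-1] for q in qualified)
    dupes = sorted(n for n, c in counts.items() if c > 1)
    if dupes:
        raise ValueError(f"duplicate table names: {dupes}")
    return qualified


print(check_unique_names(["actor", "movie", "acted_in"], "imdb.gml"))
```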
Node tables
The nodeTables list of the project map must contain an entry for each type of node in the input graph.
In each such table, nodes are represented by rows.
Each table must contain at least one column, named nodeid (case-insensitive), which holds the node ID.
Each node ID must be unique within its table.
The type of the nodeid column must be either BIGINT or VARCHAR.
In addition to the nodeid column, the table may contain other columns that represent node properties, such as node features.
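To make the node table rules concrete, here is a minimal Python sketch that checks them on rows already fetched from a table, assuming the table is represented as a list of column names plus a list of row tuples (for example, as returned by a database cursor). The helper check_node_table is hypothetical, and the feature values in the usage line are made up.

```python
from typing import List, Sequence


def check_node_table(columns: Sequence[str], rows: List[tuple]) -> None:
    """Hypothetical check of the node table rules: a nodeid column and unique IDs."""
    lowered = [c.lower() for c in columns]  # the column name is case-insensitive
    if "nodeid" not in lowered:
        raise ValueError("node table must have a column named nodeid")
    idx = lowered.index("nodeid")
    ids = [row[idx] for row in rows]
    # In this sketch, BIGINT values arrive as int and VARCHAR values as str.
    if not all(isinstance(i, (int, str)) and not isinstance(i, bool) for i in ids):
        raise TypeError("nodeid values must be BIGINT (int) or VARCHAR (str)")
    if len(set(ids)) != len(ids):
        raise ValueError("node IDs must be unique within a table")
    # All other columns (e.g. plot_keywords) are treated as node properties.


# Made-up feature vectors; the IDs match the sample output at the end of this page.
check_node_table(["NODEID", "PLOT_KEYWORDS"], [(6931, [0.1, 0.2]), (6932, [0.3, 0.4])])
```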
Relationship tables
The relationshipTables map of the project map must contain an entry for each type of relationship in the input graph.
Each key in the relationshipTables map is the name of the table containing the relationships of one type, and each value is a map of configuration for that relationship type.
The configuration map for each relationship type has the following keys:
Name | Type | Default | Optional | Description
sourceTable | String | n/a | no | The name of the table that contains the source nodes of the relationships
targetTable | String | n/a | no | The name of the table that contains the target nodes of the relationships
orientation | String | "NATURAL" | yes | How to interpret the orientation (direction) of the provided relationships. Possible values are "NATURAL", "REVERSE" and "UNDIRECTED"
In each provided relationship table, relationships are represented by rows.
Each relationship table must contain two columns: sourcenodeid and targetnodeid (both case-insensitive).
These specify the source and target nodes of the relationship, respectively, and should correspond to the node IDs in the provided source and target tables.
The type of the sourcenodeid and targetnodeid columns must be either BIGINT or VARCHAR.
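Since relationship endpoints should correspond to node IDs in the source and target tables, it can be worth checking for rows that reference missing nodes before running prediction. Below is a minimal Python sketch of such a check over already-fetched rows; the helper dangling_relationships and all IDs in the example are hypothetical, and how the endpoint itself treats such rows is not specified on this page.

```python
from typing import Iterable, List, Tuple


def dangling_relationships(rows: Iterable[Tuple], source_ids: Iterable,
                           target_ids: Iterable) -> List[Tuple]:
    """Return (sourcenodeid, targetnodeid) rows whose endpoints are missing from the node tables."""
    src, tgt = set(source_ids), set(target_ids)
    return [(s, t) for s, t in rows if s not in src or t not in tgt]


# Hypothetical acted_in rows: actor IDs 1 and 2, movie IDs 10 and 11.
print(dangling_relationships([(1, 10), (2, 11), (2, 12)], source_ids=[1, 2], target_ids=[10, 11]))
```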
The orientation parameter specifies how to interpret the direction of the relationships.
By default, relationships have the "NATURAL" orientation, meaning that they are directed from the source node to the target node.
If the orientation is set to "REVERSE", the relationships are directed from the target node to the source node.
If the orientation is set to "UNDIRECTED", the relationships are undirected: they are symmetric and can be traversed in either direction, regardless of which node is the source and which is the target.
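The three orientations can be illustrated by expanding (sourcenodeid, targetnodeid) rows into directed edges. This Python sketch is conceptual: representing an undirected relationship as one edge in each direction is an assumption made for illustration, not a description of how the endpoint stores the graph internally, and the function directed_edges is hypothetical.

```python
from typing import Iterable, List, Tuple


def directed_edges(rows: Iterable[Tuple], orientation: str = "NATURAL") -> List[Tuple]:
    """Expand (sourcenodeid, targetnodeid) rows into directed (from, to) edges."""
    if orientation == "NATURAL":
        return [(s, t) for s, t in rows]  # source -> target
    if orientation == "REVERSE":
        return [(t, s) for s, t in rows]  # target -> source
    if orientation == "UNDIRECTED":
        # Conceptually, traversable both ways: one edge in each direction.
        return [e for s, t in rows for e in ((s, t), (t, s))]
    raise ValueError(f"unknown orientation: {orientation!r}")


rows = [(1, 10), (2, 10)]  # hypothetical actor -> movie rows
for orientation in ("NATURAL", "REVERSE", "UNDIRECTED"):
    print(orientation, directed_edges(rows, orientation))
```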
Note that for GraphSAGE to properly propagate node embedding updates, each type of node must be the target of at least one relationship type.
For node types that only appear as the source of relationships, the orientation parameter can help: setting it to "REVERSE" or "UNDIRECTED" makes those nodes targets of that relationship type.
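The requirement that every node type be the target of some relationship type can be checked directly from the project configuration. The following Python sketch uses a hypothetical helper, untargeted_node_tables, and assumes table names are spelled identically in nodeTables and in sourceTable/targetTable. Applied to the IMDB tables used later on this page, it shows why the example sets both relationship types to "UNDIRECTED": with the default "NATURAL" orientation, actor and director nodes would never be targets.

```python
from typing import Dict, List

ORIENTATIONS = ("NATURAL", "REVERSE", "UNDIRECTED")


def untargeted_node_tables(node_tables: List[str],
                           relationship_tables: Dict[str, dict]) -> List[str]:
    """Return node tables that are not the target of any relationship type."""
    targeted = set()
    for rel_name, cfg in relationship_tables.items():
        orientation = cfg.get("orientation", "NATURAL")  # documented default
        if orientation not in ORIENTATIONS:
            raise ValueError(f"{rel_name}: unknown orientation {orientation!r}")
        if orientation in ("NATURAL", "UNDIRECTED"):
            targeted.add(cfg["targetTable"])  # edges point source -> target
        if orientation in ("REVERSE", "UNDIRECTED"):
            targeted.add(cfg["sourceTable"])  # edges point target -> source
    return [t for t in node_tables if t not in targeted]


natural = {
    "acted_in": {"sourceTable": "actor", "targetTable": "movie"},
    "directed_in": {"sourceTable": "director", "targetTable": "movie"},
}
undirected = {name: dict(cfg, orientation="UNDIRECTED") for name, cfg in natural.items()}

print(untargeted_node_tables(["actor", "director", "movie"], natural))
print(untargeted_node_tables(["actor", "director", "movie"], undirected))
```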
Algorithm configuration
The following parameters can be configured for the GraphSAGE node embedding prediction endpoint:
Name | Type | Default | Optional | Description
modelname | String | n/a | no | The name of the trained model to use
batchSize | Integer | Inherited | yes | The number of target nodes to predict on in each batch. If not provided, the batch size that was used when training the model will be used
randomSeed | Integer | A random integer | yes | A number used to seed all randomness of the computation
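The defaulting behavior in the table above can be summarized with a small Python sketch. It is purely illustrative: effective_compute is a hypothetical helper, the trained batch size is passed in explicitly because this page does not describe how the endpoint reads it from the stored model, and the value 256 is made up.

```python
from typing import Any, Dict


def effective_compute(compute: Dict[str, Any], trained_batch_size: int) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented defaults of the compute map."""
    if "modelname" not in compute:
        raise ValueError("modelname is required")
    resolved = dict(compute)
    # batchSize falls back to the batch size used when training the model.
    resolved.setdefault("batchSize", trained_batch_size)
    # randomSeed, if omitted, is chosen at random by the endpoint; not modelled here.
    return resolved


# 256 is a made-up training batch size for illustration.
print(effective_compute({"modelname": "unsup-imdb"}, trained_batch_size=256))
print(effective_compute({"modelname": "unsup-imdb", "batchSize": 64}, trained_batch_size=256))
```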
Examples
For our example we will use an IMDB dataset with actors, directors, movies, and genres. These all have keywords associated with them, which we will use as features for the nodes. They are connected by relationships where actors act in movies and directors direct movies.
We have a database called imdb that contains the following tables (in the schema gml, as used in the query below):
- actor with columns nodeid and plot_keywords
- movie with columns nodeid and plot_keywords
- director with columns nodeid and plot_keywords
- acted_in with columns sourcenodeid and targetnodeid that represent actor and movie node IDs
- directed_in with columns sourcenodeid and targetnodeid that represent director and movie node IDs
The plot_keywords columns contain keywords associated with the nodes, encoded as vectors of floats.
You can upload this dataset to your Snowflake account by following the instructions in the GitHub repository neo4j-product-examples/snowflake-graph-analytics.
The prediction query
We assume that a model named unsup-imdb has been trained using the GraphSAGE node embedding training endpoint (see the example there).
In the following prediction query, we specify the project configuration as we did during training.
In the compute configuration, we only need to specify the modelname, since the rest is inherited from the training configuration.
We also provide a write configuration: a list in which each entry specifies a nodeLabel and the outputTable where the computed embeddings for that node label are stored.
Running the query requires setting up grants for the application, your consumer role, and your environment; see the Getting started page for details.
We also assume that the application name is the default, Neo4j_Graph_Analytics. If you chose a different application name during installation, replace it accordingly.
CALL Neo4j_Graph_Analytics.graph.gs_unsup_predict('GPU_NV_S', {
    'project': {
        'defaultTablePrefix': 'imdb.gml',
        'nodeTables': ['actor', 'director', 'movie'],
        'relationshipTables': {
            'acted_in': {
                'sourceTable': 'actor',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            },
            'directed_in': {
                'sourceTable': 'director',
                'targetTable': 'movie',
                'orientation': 'UNDIRECTED'
            }
        }
    },
    'compute': {
        'modelname': 'unsup-imdb'
    },
    'write': [
        {
            'nodeLabel': 'movie',
            'outputTable': 'imdb.gml.movie_embeddings'
        },
        {
            'nodeLabel': 'actor',
            'outputTable': 'imdb.gml.actor_embeddings'
        }
    ]
});
The above query should produce a result similar to the one below.
JOB_ID | JOB_START | JOB_END | JOB_RESULT
job_7c5303c7899547e5b71f42000393ea59 | 2025-04-29 13:02:41.287 | 2025-04-29 13:03:20.554 | { "node_output_stats": { "actor": { "row_count": 5841, "table_name": "imdb.gml.actor_embeddings" }, "movie": { "row_count": 4661, "table_name": "imdb.gml.movie_embeddings" } } }
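The JOB_RESULT column holds JSON describing what was written. The Python sketch below parses the sample value shown above to report, per node label, the output table and the number of rows written; it assumes you have already fetched JOB_RESULT as a JSON string (fetching it from Snowflake is not shown), and the helper rows_written is hypothetical.

```python
import json
from typing import Dict, Tuple

# JOB_RESULT value copied from the sample output above.
job_result = (
    '{ "node_output_stats": { "actor": { "row_count": 5841, '
    '"table_name": "imdb.gml.actor_embeddings" }, "movie": { "row_count": 4661, '
    '"table_name": "imdb.gml.movie_embeddings" } } }'
)


def rows_written(result_json: str) -> Dict[str, Tuple[str, int]]:
    """Map each written node label to (output table, number of rows written)."""
    stats = json.loads(result_json)["node_output_stats"]
    return {label: (s["table_name"], s["row_count"]) for label, s in stats.items()}


for label, (table, count) in rows_written(job_result).items():
    print(f"{label}: {count} embeddings written to {table}")
```

Only the node labels listed in the write configuration appear here, which is why director has no entry.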
We can inspect the produced embeddings for the "first" two actor nodes by running
SELECT * FROM IMDB.GML.actor_embeddings LIMIT 2;
which yields
NODEID EMBEDDING
6931 [0.000000,0.275806,0.333710,0.000000,0.000000,0.000000,1.678880,0.000000,0.000000,0.000000,1.548835,0.705060,0.698967,0.000000,0.000000,0.748005,0.000000,0.000000,0.032963,0.000000,0.000000,0.000000,0.000000,1.082358,0.000000,0.000000,0.000000,0.168949,0.000000,2.504213,1.462919,0.125596,0.000000,0.000000,0.000000,0.020546,1.581930,0.000000,0.000000,2.310814,0.071558,2.554588,0.148772,0.000000,1.342975,0.522795,0.000000,0.664341,0.341649,0.280214,0.000000,0.923817,0.386520,0.000000,1.193098,0.915913,0.000000,0.000000,0.000000,0.538325,0.819232,0.000000,0.594658,0.000000,0.933429,0.000000,0.665206,0.000000,0.000000,0.577315,1.039834,0.000000,0.000000,0.000000,0.000000,0.000000,0.263641,1.260790,0.000000,0.285769,0.797815,0.143350,0.000000,0.000000,0.000000,0.760788,0.537654,0.835502,0.000000,0.000000,0.000000,0.000000,0.000000,2.075793,0.000000,0.000000,0.000000,0.000000,0.284028,0.000000,0.000000,0.979259,0.000000,0.000000,0.000000,0.000000,0.855099,0.100875,0.000000,1.053404,0.785612,1.723292,0.750773,0.000000,1.061991,0.000000,0.000000,0.345646,0.000000,0.000000,0.068293,0.208700,0.345753,0.000000,1.941105,1.733489,0.000000,1.319209,0.867830,0.000000,0.363041,0.000000,2.042518,0.490823,1.638796,0.181801,1.475707,0.669406,0.000000,0.000000,0.000000,0.000000,1.641088,0.000000,0.400986,0.173700,0.598307,0.000000,1.213337,2.599229,0.140189,0.758646,0.000000,0.462205,0.000000,0.843118,0.000000,0.480571,0.000000,0.000000,1.049532,1.766646,0.000000,1.403183,0.461133,0.000000,1.863557,0.000000,0.325326,0.000000,0.430773,0.000000,0.000000,0.405159,0.000000,0.056346,0.000000,0.000000,0.368162,0.000000,0.104882,0.000000,1.468014,1.360908,1.744029,0.000000,0.000000,1.179452,0.547919,0.000000,0.000000,1.504631,0.045264,0.644895,0.000000,0.000000,0.000000,0.101866,0.696652,0.000000,0.000000,0.000000,0.000000,0.000000,0.464310,0.218352,1.510026,0.000000,0.000000,0.000000,0.246313,0.000000,0.178400,0.000000,0.000000,0.000000,0.000000,0.701618,0.070749,0.000000,0.000000,0.603074,0.000000,0.000000,0.000000,1.001244,0.465114,0.570518,0.000000,0.000000,0.932442,0.325107,0.622439,1.470797,0.000000,0.316309,0.000000,0.247825,0.270058,0.946190,0.000000,0.000000,0.000000,1.873032,0.000000,0.000000,1.510353,0.562894,0.000000,1.618610,0.322268,0.000000,0.000000,0.512696,1.013735,0.036263]
6932 [0.000000,0.000000,0.530572,0.998598,0.000000,0.000000,1.851767,0.000000,0.000000,0.000000,1.551337,0.213426,1.226383,0.000000,0.000000,0.498449,0.364041,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.162470,1.721648,0.000000,0.000000,0.146792,0.460837,0.000000,0.542702,0.000000,0.000000,0.585533,0.000000,0.859640,0.915765,0.000000,0.837624,0.101824,0.000000,0.000000,0.000000,0.000000,0.000000,0.766046,1.116825,0.000000,1.394197,0.347508,0.319338,0.000000,0.000000,0.000000,0.669600,0.000000,0.554313,0.014044,0.573868,0.000000,0.214537,0.000000,0.000000,1.127173,0.000000,0.000000,0.000000,0.000000,0.000000,0.535387,0.000000,1.786094,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.804344,2.133009,0.948304,0.000000,0.000000,0.000000,0.000000,0.637896,1.149579,0.000000,0.000000,0.412694,0.507935,0.385693,0.979215,0.000000,1.737015,0.194008,0.000000,0.000000,0.000000,0.793839,1.421982,0.000000,0.228598,1.145930,0.539003,0.643983,0.897991,0.956065,0.000000,0.000000,0.000000,0.000000,0.000000,1.778283,1.129515,0.000000,0.280099,2.112762,2.177783,0.000000,0.820926,0.000000,0.000000,0.000000,0.000000,1.528605,0.000000,1.808216,0.539503,1.489559,0.437647,0.000000,0.000000,0.000000,1.395121,1.636038,0.000000,0.741754,0.000000,0.000000,0.118127,1.866470,1.642432,0.452591,1.244799,0.478923,1.421850,0.294502,1.133924,0.000000,0.000000,0.000000,0.000000,1.080479,0.568388,0.440139,1.385756,0.000000,0.000000,0.618767,0.000000,0.085425,0.000000,0.000000,1.528592,0.011202,1.152845,0.578253,0.000000,0.000000,1.087868,0.960426,0.000000,0.000000,0.000000,2.200223,2.432830,1.850158,0.238872,0.000000,0.000000,0.391092,0.000000,0.000000,0.612836,0.583207,0.000000,0.000000,0.281198,1.414375,0.000000,0.709002,0.000000,0.000000,0.000000,0.000000,0.000000,0.141193,0.231656,0.109030,0.277832,0.000000,0.000000,0.000000,0.449620,0.000000,0.000000,0.000000,0.000000,0.000000,0.660041,0.000000,0.378508,0.000000,0.000000,0.501338,0.000000,0.194963,0.765173,0.227336,0.000000,0.000000,0.223111,0.972678,0.485203,0.709181,0.705317,0.000000,0.661023,0.000000,0.000000,0.000000,0.458935,0.000000,0.000000,0.587035,1.170718,0.913786,0.000000,1.510381,0.428103,0.000000,1.663107,0.000000,0.477148,0.175759,0.983973,0.315653,0.736073]