FastPath — A path embedding algorithm
Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use. |
Elevator pitch
FastPath is a light-weight path embedding algorithm operating on temporal graphs. It computes embeddings for base nodes that are tied to event sequences. Events are associated to vectors depending on the event type and (continuously) on the elapsed time since the event occurred. These vectors are aggregated into embeddings that can be used in machine learning tasks, like customer journey analysis, and generalize across different time periods.
Short note on graph terminology
The FastPath algorithm conceptually works on graphs, so this documentation speaks of graphs, nodes, relationships and properties. Moreover, the graphs are heterogeneous, so nodes are tagged with node labels and relationships have types. The correspondence between graphs and actual Snowflake tables is described in the sections Node tables and Relationship tables.
Introduction
FastPath is a path embedding algorithm. This means that the input graph is a temporal graph consisting of
-
base nodes — Nodes of interest that the algorithm aims to compute embeddings for
-
event nodes — Nodes marked with time stamp and associated with base nodes
There are two supported graph schemas that encode the event chains.
The event nodes can hold information regarding what the event entails. This information can be in the form of feature vectors, categorical properties or relationships to context nodes.
In summary, the input consists of a collection of base nodes, each of which have a time series of information carrying events.
For each base node, an output time is specified. The output time governs two concerns of the embedding. First, only events up to the output time are used to arrive at the embedding. Second, the algorithm uses elapsed time rather than absolute time to compute the embeddings. When determining the contribution of an event on the embedding of a base node, the elapsed time is the amount of time between the event and the output time of the base node.
Let’s consider an example to make this more clear: One of the base nodes is a person called "Joe" and one of the events represents the fact that "Joe" consumed a dose of aspirin. The algorithm will store
-
the type of event — here "taking of aspirin"
-
the elapsed time — here 7 days
Further, each pair of event type and elapsed time, is associated with a "semi-random" vector. The vectors are constructed using both randomness, similar to the so called FastRP algorithm, and a time smoothing mechanism. This results in vectors for "taking of aspirin 7 days ago" being (relatively) similar to "taking of aspirin 8 days ago".
The embedding for Joe will be a weighted sum of the vector of "taking of aspirin 7 days ago" and vectors of all other events that occurred for Joe. How the weighting works is described in more detail below.
It may seem that the above only gives an intuition for discreet event information like categorical properties and relationships to context nodes. However, feature vectors on events can be thought of as a weighted sum of discreet events corresponding to the components of the vector.
The output node embeddings can for example be used in downstream machine learning tasks for solving problems related to e.g., customer journey. Because of the reliance on relative time rather than absolute time, such models can generalize from embeddings of past graphs to current graphs. In addition, it holds that FastPATH is equivariant over time; If all events for a base node are shifted by T time steps and the output time is also shifted by T time steps (in the same direction), the embedding of the base node stays the same.
Endpoint
The endpoint name is graph.fastpath
, and it takes two positional arguments as input.
The first argument is a VARCHAR
that specifies which compute pool to use.
For this algorithm a CPU compute pool is recommended, since the algorithm does not utilize GPUs.
The second argument is a JSON configuration map. This JSON must contain the three following keys:
Name |
Type |
Default |
Optional |
Description |
project |
Map |
n/a |
no |
Configuration for the input graph |
compute |
Map |
n/a |
no |
Configuration for algorithm-specific parameters |
write |
List |
n/a |
no |
Configuration for writing back algorithm results |
Configuration
In this section, we describe the configuration parameters that must be provided to the endpoint.
Input graph configuration
The project
map contains the following key pairs:
Name |
Type |
Default |
Optional |
Description |
nodeTables |
List |
n/a |
no |
A list of table names, which each represent a node label in the input graph |
relationshipTables |
Map |
n/a |
no |
A map from table names representing relationship types, to maps of configuration for that relationship type in the input graph (details below) |
defaultTablePrefix |
String |
n/a |
yes |
A default database and schema prefix to use for table names in the input graph. Should be of the format "<database>.<schema>" |
If a defaultTablePrefix
is not provided, all table names must be qualified with a database and schema name.
That is, they should be given a strings of the format "<database>.<schema>.<table>".
If a defaultTablePrefix
is provided, table names may also be given as "<table>", in which case the prefix will be prepended to them.
All provided node tables and relationship tables must have unique names. Not only must fully qualified table names be unique, but the table names themselves must also be unique.
Node tables
The nodeTables
list of the project
map must contain an entry for each type of node in the input graph.
In each such table, nodes are represented by rows.
There must be at least one column in each table; one that represents the node ID, and this column must be named nodeid
(case insensitive).
Each node ID must be unique within its table.
The type of the nodeid
column must be either BIGINT
or VARCHAR
.
In addition to the nodeid
column, the table may contain additional columns that represent node properties, for example features of the nodes.
Relationship tables
The relationshipTables
map of the project
must contain an entry for each type of relationship in the input graph.
Each key in the relationshipTables
map is the name of the table containing the relationships for one type of relationships, and each value is a map of configuration for that type of relationship.
The configuration map for each relationship type looks like the following:
Name |
Type |
Default |
Optional |
Description |
sourceTable |
String |
n/a |
no |
The name of the table that contains the source nodes of the relationships |
targetTable |
String |
n/a |
no |
The name of the table that contains the target nodes of the relationships |
orientation |
String |
"NATURAL" |
yes |
How to interpret the orientation (direction) of the provided relationships. Possible values are "NATURAL", "REVERSE" and "UNDIRECTED" |
In each provided relationship table, relationships are represented by rows.
There are exactly two columns that must be present in each relationship table: sourcenodeid
and targetnodeid
(both case-insensitive).
These specify the source and target nodes of the relationship, respectively, and should correspond to the node IDs in the provided source and target tables.
The type of the sourcenodeid
and targetnodeid
columns must be either BIGINT
or VARCHAR
.
The orientation
parameter specifies how to interpret the direction of the relationships.
By default, relationships are interpreted as having the "NATURAL" orientation, meaning that they are assumed to be directed from the source node to the target node.
If the orientation
is set to "REVERSE", the relationships are interpreted as being directed from the target node to the source node.
And if the orientation
is set to "UNDIRECTED", the relationships are interpreted as being undirected, meaning that they are symmetric and can be traversed in either direction (independently of which node is the source and which is the target).
Algorithm configuration
The following parameters can be configured for the FastPath endpoint:
Configuration parameter |
Type |
Default |
Optional |
Description |
baseNodeLabel |
String |
n/a |
No |
Node label for which output embeddings are desired |
eventNodeLabel |
String |
n/a |
No |
Node label of events related to base nodes |
contextNodeLabel |
String |
None |
Yes |
Node label of context nodes that describe events |
timeNodeProperty |
String |
None |
Yes |
Name of a node property representing time on the event nodes. May be int or float. |
nextRelationshipType |
String |
None |
Yes |
Name of a relationship type between event nodes, indicating order of events |
firstRelationshipType |
String |
None |
Yes |
Name of a relationship type from base nodes to event nodes, indicating the first event for each base node |
eventFeatures |
String |
None |
Yes |
Name of a node property on event nodes which holds numerical features in vector form |
categoricalEventProperties |
List of String |
None |
Yes |
Names of node properties on event nodes which hold categorical values or lists of categorical values |
ignoredEventCategory |
Integer |
-1 |
Yes |
A category value representing that a category is missing, and is ignored when constructing the event embedding |
outputTime |
Float |
None |
Yes |
The timestamp at which embeddings should be produced. Events at this timestamp or later are not processed. |
outputTimeProperty |
String |
None |
Yes |
Name of a node property on base nodes which holds a number indicating for what timestamp an output embedding should be produced |
numElapsedTimes |
Integer |
n/a |
No |
The number of times in the "grid" . Elapsed time from event to output time is rounded to this grid. If set to 1, timestamps of events have no effect. |
decayFactor |
Float |
1.0 |
Yes |
Coefficient controlling the speed of decay of influence of older events |
maxElapsedTime |
Integer |
n/a |
No |
The maximum age of events, relative to the output time, that are considered for the embedding. |
smoothingWindow |
Integer |
0 |
Yes |
Event embeddings are aggregated over a window containing up to 2*smoothingWindow + 1 grid times. |
smoothingRate |
Float |
0.0 |
Yes |
Controls how fast expected similarity of events decays as their time-distance increases. |
dimension |
Integer |
n/a |
No |
The output dimension of the embeddings |
randomSeed |
Integer |
Random number |
Yes |
A random seed which is used for all randomness in computing the embeddings. |
Configuration concepts explained
This section goes into more detail about how to configure certain aspects of the algorithm.
Graph schema and relationship configuration
The algorithm supports two different schemas for connecting base nodes to event nodes.
-
To use the first schema configuration you provide a
firstRelationshipType
andnextRelationshipType
, andtimeNodeProperty
is optional. With this configuration each base and event node is assumed to be part of a path of the pattern:(:baseNodeLabel)-[:firstRelationshipType]-(:eventNodeLabel)-[:nextRelationshipType]-…-[:nextRelationshipType]→(:eventNodeLabel)
-
To use the second schema configuration, you provide only a
timeNodeProperty
. It’s then assumed that each base node has outgoing relationships to each of its event nodes:(:baseNodeLabel)-[:]→(:eventNodeLabel)

Optionally, the schema can also contain a node label called contextNodeLabel
, see Event input features below.
Defining event timestamps
There are two ways of specifying timestamps for event nodes.
The first way is to provide a timeNodeProperty
, in which case each event node must hold a property of that name representing a timestamp.
If however, a nextRelationshipType
is provided, timeNodeProperty
does not have to be provided.
In that case, time stamps are generated in an intuitive way depicted in the picture below.
Figure "Timestamp generation": The length of the nextRelationshipType
path to an event will define its timestamp.
So then the first event node in a base node’s event chain, which is connected via a firstRelationshipType
relationship from the base node, will have timestamp 0.
The next event node, accessed via a nextRelationshipType
relationship from the first event node, will have timestamp 1, and so on.
Please note that you can still provide a timeNodeProperty
even if nextRelationshipType
is used.
In that case, the nextRelationshipType
will not be used to infer timestamps, but rather which base node an event node belongs to.
Event input features
An event’s feature representation can be impacted by three different input sources. They are independent of each other and can be used both together and separately. The event feature vector will be a sum over all input sources that are enabled.
-
To enable the Context nodes input source you provide a
contextNodeLabel
. In this case it’s assumed that there may be outgoing relationships from event nodes to context nodes, i.e., the pattern:
(:eventNodeLabel)-[:]→(:contextNodeLabel)
-
To enable the Event feature vector properties input source you provide
eventFeatures
. This should be the name of a vector property present on all event nodes. -
To enable the Categorical event properties input source you provide
categoricalEventProperties
. This should be the name of a property that holds integers representing categories or lists of such integers. The property must be present on all event nodes.
Time-smoothing
If smoothingRate
is large, actually a small amount of smoothing is applied and events that are even a small time apart (enough to be closest to different grid points) will generally have very different embeddings.
If smoothingWindow
is 0, the effect is similar to having a large smoothingRate
.
On the other hand, if smoothingWindow
is large in relation to numElapsedTimes
, and simultaneously smoothingRate
is small, identical events will have similar embeddings even when their timestamps differ a lot.
Output
The output is Snowflake table containing two columns;
-
nodeid — the node’s id according to an input node table
-
embedding — the FastPath embedding of the node
The name and location of the table is given in Input graph configuration.
Example
We assume that we have a Snowflake schema patient_db.general
containing the following tables:
-
patient
with columnsnodeid
andoutput_time
-
encounter
with columnsnodeid
,embedding
andtime
-
has_encounter
with columnssourcenodeid
andtargetnodeid
These tables represent a graph of medical patient records.
To run the query, there is a required setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.
We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.
The FastPATH algorithm can be run on this graph by executing a statement such as:
CALL Neo4j_Graph_Analytics.graph.fastpath('CPU_X64_M', {
'project': {
'defaultTablePrefix': 'patient_db.general',
'nodeTables': ['patient', 'encounter'],
'relationshipTables': {
'has_encounter': {
'sourceTable': 'patient',
'targetTable': 'encounter'
}
}
},
'compute': {
'baseNodeLabel': 'patient',
'eventNodeLabel': 'encounter',
'eventFeatures': 'embedding',
'outputTimeProperty': 'output_time',
'timeNodeProperty': 'time',
'dimension': 32, -- in reality, a higher value is recommended
'numElapsedTimes': 30,
'maxElapsedTime': 3650,
'smoothingRate': 0.9,
'smoothingWindow': 2,
'decayFactor': 0.0
},
'write': [{
'nodeLabel': 'patient',
'outputTable': 'patient_db.general.embeddings'
}]
});
An example of the output is:
JOB_ID | JOB_START | JOB_END | JOB_RESULT |
---|---|---|---|
job_7bed5ef80dc24eec9b0d86d0cbe7baa4 |
2025-04-28 14:12:46.439 |
2025-04-28 14:12:57.447 |
{"node_output_stats": {"patient1": {"row_count": 1000, "table_name": "patient_db.general.embeddings"}}} |
The produced node embeddings can be simply explored like so:
SELECT * FROM patient_db.general.embeddings LIMIT 10;
which returns data of the form
NODEID EMBEDDING
0 [5.341047,3.848688,-5.973229,-6.814521,4.745209,9.643475,-0.045684,4.210951,-2.004807,4.428759,13.497086,-6.179376,-0.986118,-7.707472,-0.555529,-6.073474,4.083053,1.602635,1.958997,12.649292,-0.603967,10.158164,-0.338394,-0.608493,-4.837356,15.097547,-6.569559,-2.797304,0.338724,6.874722,9.505033,-8.754766]
1 [-6.724641,-11.390027,-4.453455,4.120544,-2.017518,-2.675358,2.027915,-7.672874,-2.879899,17.374033,11.240053,7.185143,10.593451,-0.032012,6.947659,-29.623604,0.166914,11.507900,-12.466205,-1.891875,-8.164936,7.390674,-10.806652,2.019349,-1.903723,29.574389,1.859032,3.853385,0.566263,-5.038345,4.792415,3.326340]
2 [-2.920011,-2.048257,-5.708614,-2.373963,-0.244315,2.887959,-23.091557,-0.587305,4.643905,-0.775859,7.227013,-1.861291,-4.952862,-7.039846,-11.932112,-8.986797,-1.304907,2.019032,3.064439,-4.360886,-2.295964,6.218091,0.975781,-3.919339,8.686658,17.961964,-7.780995,-1.146844,-6.269790,20.006420,-1.216949,-3.596368]
3 [-8.567083,-9.307947,-11.381653,-0.802077,-1.286643,-3.037168,5.494521,-1.814128,-1.623972,-0.085408,-8.678900,6.716533,-3.552778,10.777124,8.609731,2.346340,-6.656413,4.587024,-6.891663,0.081429,-3.976248,-3.998426,-0.445967,4.443917,-7.440049,15.908058,-6.126621,-9.205961,-3.693087,5.798036,-10.521414,25.272419]
4 [-3.993221,-15.716671,-7.102195,-22.005224,-0.855417,9.282584,-2.253866,-16.234423,24.112732,2.810601,-13.975744,-12.062764,-2.387400,20.061924,-1.067919,12.701017,-4.327281,-17.721920,23.729225,-9.824536,-8.265147,-2.768445,-25.310749,8.047261,13.448681,51.172272,-15.435410,14.756327,-4.342721,11.190716,28.090958,9.425692]
5 [-13.833055,-3.474358,-29.580704,15.354573,46.242706,13.694808,32.601025,26.757681,-7.090479,57.280357,-0.735581,38.106731,26.968494,3.217232,-23.623219,2.485056,-6.973878,0.877117,-15.995618,55.822735,18.801920,52.809269,-47.148773,39.193768,-4.173650,-12.601570,-14.213085,-21.923597,12.738949,-27.862555,-9.950893,-20.819094]
6 [13.704840,13.533702,-5.135115,-6.226133,-5.701371,-2.703825,5.956636,3.647114,2.735878,2.559484,-15.129639,9.952298,2.568221,1.664974,-0.048340,-10.921870,-17.725523,1.495797,-9.043520,25.950590,-0.110576,9.694299,-18.446062,-6.852248,6.842557,35.075626,8.634172,7.099162,-5.890983,9.706918,-8.494190,6.583567]
7 [-2.812654,2.941338,1.067755,-4.293394,-3.086962,-0.469450,3.613479,-2.819084,5.120227,7.050265,-2.466012,-2.169495,0.909174,2.145000,3.638202,-2.498263,-5.502569,-3.852499,5.146134,1.659534,-0.714558,-0.469029,2.176041,-1.869031,-4.940216,3.088835,3.752839,3.714232,-2.807177,-0.698254,-5.012471,2.528090]
8 [-7.303960,-7.729638,-4.773112,-9.641366,2.609788,-20.279972,6.826665,8.960463,19.718428,3.388449,7.409564,-17.707247,-0.936261,28.070925,-3.749211,-9.148918,16.058552,4.065396,10.297297,9.403608,-10.898782,-10.801480,-25.112486,3.316337,17.157797,12.428331,19.813198,10.889541,-1.723780,-1.651543,-0.261916,9.012156]
9 [-6.429783,-20.400585,16.191334,-16.042753,9.880158,-5.386495,-10.220762,15.675117,9.438731,-3.742106,10.518373,3.956760,6.789544,-0.625264,-8.893956,1.050145,-15.966327,-6.890019,-2.499497,8.597721,3.010286,3.782031,-10.274326,20.255251,1.046289,17.648533,-6.685064,13.613426,7.041289,13.350776,-3.143279,-5.788550]
Appendix
The following sections provide more detailed information about FastPath for those interested in the inner workings of the algorithm.
How the algorithm works
The below breakdown of the algorithm into steps is conceptual.
It does not reflect the actual implementation details such as for example as order of loops, datastructures and more.
We will describe how to compute the FastPATH embedding for a base node b
which has output time to
.
The output time is computed according to one of the chosen parameters outputTime
or outputTimeProperty
.
An arbitrary event which belongs to b
will be denoted by e
, and its time stamp is assumed to be te
.
At the time to
, at which we wish to compute the embedding of b
, the elapsed time of e
is
tel = to - te
,
that is, the amount of time between the timestamp of the event and the output time of the corresponding base node.
Note, that events may belong to multiple base nodes which have different output times. This implies that some computations involving an event have to be recomputed for each of its associated base nodes, but luckily these computations are independent of each other.
-
Preprocess
Pre-process the input into a graph with base nodes, event nodes and optionally context nodes. Base and event nodes are connected and so are events and context nodes. If time stamps are not given for events, time stamps are generated by traversing event chains per base node. -
Filter relevant events
Out of the events that belong to the base nodeb
, any event which has elapsed time exceedingmaxElapsedTime
, or any event which occurred at timeto
or later is removed. The remaining relevant events forb
will be calledE(b)
. -
Create grid
Create a grid over time subsequently referred to as the grid -
Generate random vectors
Sparse random vectors are generated independently per input sources below, if they are selected.
For each time in the grid and input source, generate vectors as follows:-
Event feature vector properties: a vector
ri,t
is generated for each dimensioni
of the input event feature vectors -
Context nodes: for each context node
c
, generaterc,t
-
Categorical event properties: for each categorical node property
prop
and for each valuev
of that property, initialize a random vectorrprop,v,t
.
If the property value for any event is a list of categories[v1, v2, …]
, andv
is any of these values, we initialize a random vectorr~prop,v
.
-
-
Compute event node pre-embeddings
The pre-embedding for an evente
at timet
is denoted byxe,t
and initialized to all zeros. For all input sources below that apply, weighted sums of random vectors generated in step 3 are added toxe,t
:-
Event feature vector properties:
xe,t += ∑i ri,t * inpi
whereinp
is the input vector ofe
andi
ranges over the dimensions ofinp
. -
Context nodes:
xe,t += ∑c rc,t
wherec
ranges over all neighbors of the evente
. -
Categorical event properties:
xe,t += ∑prop rprop,e.prop,t
whereprop
ranges over all selected categorical event properties.
Or whene.prop
is a list:xe,t = ∑prop ∑v in e.prop rprop,v,t
-
-
Compute event embeddings
We compute an embedding fore
using its earlier derived pre-embeddings at different times. We find the timetcl
in the grid which is the closest to the elapsed timetel = to - te
. Then adding up tosmoothingWindow
points in the grid on both sides oftcl
, we to obtain a windowWe
of relevant grid times for the event. In the picture,smoothingWindow = 1
andtcl=t4
. ThereforeWe = {t3,t4,t5}
. For eacht
inWe
we set the weight of the grid pointt
to be,
w(t) = exp(-|t - tel| * smoothingRate ) / Ze
,
where Ze is the constant that ensures∑t in W_e w(t) = 1
.
We can now finally set the embedding of the event at output timeto
to be
emb(e|to) = ∑t in W_e w(t) * xe,t
. We have hence constructed time-smoothed embeddings by taking an average of a grid of pre-embeddings, or basis vectors if you will. The purpose of the time-smoothing mechanism is to ensure that embeddings change continuously with respect to changes in elapsed time, all other factors being equal. -
Compute base node embeddings
The embedding forb
is finally given by
emb(b) = ∑e in E(b) wb,e * emb(e)
where
wb,e = exp(-decayFactor * (to - te))
andE(b)
was defined in step 2.
The figure below gives an overview of what influences event embeddings and how they flow to the embedding of a base node, based on the output time and elapsed times.
