FastPath
Elevator pitch
FastPath is a light-weight path embedding algorithm operating on temporal graphs. It computes embeddings for base nodes that are tied to event sequences. Events are associated to vectors depending on the event type and (continuously) on the elapsed time since the event occurred. These vectors are aggregated into embeddings that can be used in machine learning tasks, like customer journey analysis, and generalize across different time periods.
Short note on graph terminology
The FastPath algorithm conceptually works on graphs, so this documentation speaks of graphs, nodes, relationships and properties. Moreover, the graphs are heterogeneous, so nodes are tagged with node labels and relationships have types. The correspondence between graphs and actual Snowflake tables is described in the sections [_node_tables] and [_relationship_tables].
Introduction
FastPath is a path embedding algorithm. This means that the input graph is a temporal graph consisting of
-
base nodes — Nodes of interest that the algorithm aims to compute embeddings for
-
event nodes — Nodes marked with time stamp and associated with base nodes
There are two supported graph schemas that encode the event chains.
The event nodes can hold information regarding what the event entails. This information can be in the form of feature vectors, categorical properties or relationships to context nodes.
In summary, the input consists of a collection of base nodes, each of which have a time series of information carrying events.
For each base node, an output time is specified. The output time governs two concerns of the embedding. First, only events up to the output time are used to arrive at the embedding. Second, the algorithm uses elapsed time rather than absolute time to compute the embeddings. When determining the contribution of an event on the embedding of a base node, the elapsed time is the amount of time between the event and the output time of the base node.
Let’s consider an example to make this more clear: One of the base nodes is a person called "Joe" and one of the events represents the fact that "Joe" consumed a dose of aspirin. The algorithm will store
-
the type of event — here "taking of aspirin"
-
the elapsed time — here 7 days
Further, each pair of event type and elapsed time, is associated with a "semi-random" vector. The vectors are constructed using both randomness, similar to the so called FastRP algorithm, and a time smoothing mechanism. This results in vectors for "taking of aspirin 7 days ago" being (relatively) similar to "taking of aspirin 8 days ago".
The embedding for Joe will be a weighted sum of the vector of "taking of aspirin 7 days ago" and vectors of all other events that occurred for Joe. How the weighting works is described in more detail below.
It may seem that the above only gives an intuition for discreet event information like categorical properties and relationships to context nodes. However, feature vectors on events can be thought of as a weighted sum of discreet events corresponding to the components of the vector.
The output node embeddings can for example be used in downstream machine learning tasks for solving problems related to e.g., customer journey. Because of the reliance on relative time rather than absolute time, such models can generalize from embeddings of past graphs to current graphs. In addition, it holds that FastPATH is equivariant over time; If all events for a base node are shifted by T time steps and the output time is also shifted by T time steps (in the same direction), the embedding of the base node stays the same.
Syntax
This section covers the syntax used to execute the FastPath algorithm.
CALL graph.fastpath(
'CPU_X64_XS', (1)
{
['defaultTablePrefix': '...',] (2)
'project': {...}, (3)
'compute': {...}, (4)
'write': {...} (5)
}
);
1 | Compute pool selector. |
2 | Optional prefix for table references. |
3 | Project config. |
4 | Compute config. |
5 | Write config. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
computePoolSelector |
String |
|
no |
The selector for the compute pool on which to run the FastPath job. |
configuration |
Map |
|
no |
Configuration for graph project, algorithm compute and result write back. |
The configuration map consists of the following three entries.
For more details on below Project configuration, refer to the Project documentation. |
Name | Type |
---|---|
nodeTables |
List of node tables. |
relationshipTables |
Map of relationship types to relationship tables. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
baseNodeLabel |
String |
|
No |
Node label for which output embeddings are desired |
eventNodeLabel |
String |
|
No |
Node label of events related to base nodes |
contextNodeLabel |
String |
|
Yes |
Node label of context nodes that describe events |
timeNodeProperty |
String |
|
Yes |
Name of a node property representing time on the event nodes. May be int or float. |
nextRelationshipType |
String |
|
Yes |
Name of a relationship type between event nodes, indicating order of events |
firstRelationshipType |
String |
|
Yes |
Name of a relationship type from base nodes to event nodes, indicating the first event for each base node |
eventFeatures |
String |
|
Yes |
Name of a node property on event nodes which holds numerical features in vector form |
categoricalEventProperties |
List of String |
|
Yes |
Names of node properties on event nodes which hold categorical values or lists of categorical values |
ignoredEventCategory |
Integer |
|
Yes |
A category value representing that a category is missing, and is ignored when constructing the event embedding |
outputTime |
Float |
|
Yes |
The timestamp at which embeddings should be produced. Events at this timestamp or later are not processed. |
outputTimeProperty |
String |
|
Yes |
Name of a node property on base nodes which holds a number indicating for what timestamp an output embedding should be produced |
numElapsedTimes |
Integer |
|
No |
The number of times in the "grid" . Elapsed time from event to output time is rounded to this grid. If set to 1, timestamps of events have no effect. |
decayFactor |
Float |
|
Yes |
Coefficient controlling the speed of decay of influence of older events |
maxElapsedTime |
Integer |
|
No |
The maximum age of events, relative to the output time, that are considered for the embedding. |
smoothingWindow |
Integer |
|
Yes |
Event embeddings are aggregated over a window containing up to 2*smoothingWindow + 1 grid times. |
smoothingRate |
Float |
|
Yes |
Controls how fast expected similarity of events decays as their time-distance increases. |
dimension |
Integer |
|
No |
The output dimension of the embeddings |
randomSeed |
Integer |
|
Yes |
A random seed which is used for all randomness in computing the embeddings. |
For more details on below Write configuration, refer to the Write documentation. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabel |
String |
|
no |
Node label in the in-memory graph from which to write a node property. |
outputTable |
String |
|
no |
Table in Snowflake database to which node properties are written. |
Configuration concepts explained
This section goes into more detail about how to configure certain aspects of the algorithm.
Graph schema and relationship configuration
The algorithm supports two different schemas for connecting base nodes to event nodes.
-
To use the first schema configuration you provide a
firstRelationshipType
andnextRelationshipType
, andtimeNodeProperty
is optional. With this configuration each base and event node is assumed to be part of a path of the pattern:(:baseNodeLabel)-[:firstRelationshipType]-(:eventNodeLabel)-[:nextRelationshipType]-…-[:nextRelationshipType]→(:eventNodeLabel)
-
To use the second schema configuration, you provide only a
timeNodeProperty
. It’s then assumed that each base node has outgoing relationships to each of its event nodes:(:baseNodeLabel)-[:]→(:eventNodeLabel)

Optionally, the schema can also contain a node label called contextNodeLabel
, see Event input features below.
Defining event timestamps
There are two ways of specifying timestamps for event nodes.
The first way is to provide a timeNodeProperty
, in which case each event node must hold a property of that name representing a timestamp.
If however, a nextRelationshipType
is provided, timeNodeProperty
does not have to be provided.
In that case, time stamps are generated in an intuitive way depicted in the picture below.
Figure "Timestamp generation": The length of the nextRelationshipType
path to an event will define its timestamp.
So then the first event node in a base node’s event chain, which is connected via a firstRelationshipType
relationship from the base node, will have timestamp 0.
The next event node, accessed via a nextRelationshipType
relationship from the first event node, will have timestamp 1, and so on.
Please note that you can still provide a timeNodeProperty
even if nextRelationshipType
is used.
In that case, the nextRelationshipType
will not be used to infer timestamps, but rather which base node an event node belongs to.
Event input features
An event’s feature representation can be impacted by three different input sources. They are independent of each other and can be used both together and separately. The event feature vector will be a sum over all input sources that are enabled.
-
To enable the Context nodes input source you provide a
contextNodeLabel
. In this case it’s assumed that there may be outgoing relationships from event nodes to context nodes, i.e., the pattern:
(:eventNodeLabel)-[:]→(:contextNodeLabel)
-
To enable the Event feature vector properties input source you provide
eventFeatures
. This should be the name of a vector property present on all event nodes. -
To enable the Categorical event properties input source you provide
categoricalEventProperties
. This should be the name of a property that holds integers representing categories or lists of such integers. The property must be present on all event nodes.
Time-smoothing
If smoothingRate
is large, actually a small amount of smoothing is applied and events that are even a small time apart (enough to be closest to different grid points) will generally have very different embeddings.
If smoothingWindow
is 0, the effect is similar to having a large smoothingRate
.
On the other hand, if smoothingWindow
is large in relation to numElapsedTimes
, and simultaneously smoothingRate
is small, identical events will have similar embeddings even when their timestamps differ a lot.
Output
The output is Snowflake table containing two columns;
-
nodeid — the node’s id according to an input node table
-
embedding — the FastPath embedding of the node
The name and location of the table is given in Write configuration.
Example
We assume that we have a Snowflake schema patient_db.general
containing the following tables:
-
patient
with columnsnodeid
andoutput_time
-
encounter
with columnsnodeid
,embedding
andtime
-
has_encounter
with columnssourcenodeid
andtargetnodeid
These tables represent a graph of medical patient records.
To run the query, there is a required setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.
We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.
The FastPATH algorithm can be run on this graph by executing a statement such as:
CALL Neo4j_Graph_Analytics.graph.fastpath('CPU_X64_M', {
'defaultTablePrefix': 'patient_db.general',
'project': {
'nodeTables': ['patient', 'encounter'],
'relationshipTables': {
'has_encounter': {
'sourceTable': 'patient',
'targetTable': 'encounter'
}
}
},
'compute': {
'baseNodeLabel': 'patient',
'eventNodeLabel': 'encounter',
'eventFeatures': 'embedding',
'outputTimeProperty': 'output_time',
'timeNodeProperty': 'time',
'dimension': 32, -- in reality, a higher value is recommended
'numElapsedTimes': 30,
'maxElapsedTime': 3650,
'smoothingRate': 0.9,
'smoothingWindow': 2,
'decayFactor': 0.0
},
'write': [{
'nodeLabel': 'patient',
'outputTable': 'embeddings'
}]
});
An example of the output is:
JOB_ID | JOB_START | JOB_END | JOB_RESULT |
---|---|---|---|
job_7bed5ef80dc24eec9b0d86d0cbe7baa4 |
2025-04-28 14:12:46.439 |
2025-04-28 14:12:57.447 |
{"node_output_stats": {"patient1": {"row_count": 1000, "table_name": "patient_db.general.embeddings"}}} |
The produced node embeddings can be simply explored like so:
SELECT * FROM patient_db.general.embeddings LIMIT 10;
which returns data of the form
NODEID EMBEDDING
0 [5.341047,3.848688,-5.973229,-6.814521,4.745209,9.643475,-0.045684,4.210951,-2.004807,4.428759,13.497086,-6.179376,-0.986118,-7.707472,-0.555529,-6.073474,4.083053,1.602635,1.958997,12.649292,-0.603967,10.158164,-0.338394,-0.608493,-4.837356,15.097547,-6.569559,-2.797304,0.338724,6.874722,9.505033,-8.754766]
1 [-6.724641,-11.390027,-4.453455,4.120544,-2.017518,-2.675358,2.027915,-7.672874,-2.879899,17.374033,11.240053,7.185143,10.593451,-0.032012,6.947659,-29.623604,0.166914,11.507900,-12.466205,-1.891875,-8.164936,7.390674,-10.806652,2.019349,-1.903723,29.574389,1.859032,3.853385,0.566263,-5.038345,4.792415,3.326340]
2 [-2.920011,-2.048257,-5.708614,-2.373963,-0.244315,2.887959,-23.091557,-0.587305,4.643905,-0.775859,7.227013,-1.861291,-4.952862,-7.039846,-11.932112,-8.986797,-1.304907,2.019032,3.064439,-4.360886,-2.295964,6.218091,0.975781,-3.919339,8.686658,17.961964,-7.780995,-1.146844,-6.269790,20.006420,-1.216949,-3.596368]
3 [-8.567083,-9.307947,-11.381653,-0.802077,-1.286643,-3.037168,5.494521,-1.814128,-1.623972,-0.085408,-8.678900,6.716533,-3.552778,10.777124,8.609731,2.346340,-6.656413,4.587024,-6.891663,0.081429,-3.976248,-3.998426,-0.445967,4.443917,-7.440049,15.908058,-6.126621,-9.205961,-3.693087,5.798036,-10.521414,25.272419]
4 [-3.993221,-15.716671,-7.102195,-22.005224,-0.855417,9.282584,-2.253866,-16.234423,24.112732,2.810601,-13.975744,-12.062764,-2.387400,20.061924,-1.067919,12.701017,-4.327281,-17.721920,23.729225,-9.824536,-8.265147,-2.768445,-25.310749,8.047261,13.448681,51.172272,-15.435410,14.756327,-4.342721,11.190716,28.090958,9.425692]
5 [-13.833055,-3.474358,-29.580704,15.354573,46.242706,13.694808,32.601025,26.757681,-7.090479,57.280357,-0.735581,38.106731,26.968494,3.217232,-23.623219,2.485056,-6.973878,0.877117,-15.995618,55.822735,18.801920,52.809269,-47.148773,39.193768,-4.173650,-12.601570,-14.213085,-21.923597,12.738949,-27.862555,-9.950893,-20.819094]
6 [13.704840,13.533702,-5.135115,-6.226133,-5.701371,-2.703825,5.956636,3.647114,2.735878,2.559484,-15.129639,9.952298,2.568221,1.664974,-0.048340,-10.921870,-17.725523,1.495797,-9.043520,25.950590,-0.110576,9.694299,-18.446062,-6.852248,6.842557,35.075626,8.634172,7.099162,-5.890983,9.706918,-8.494190,6.583567]
7 [-2.812654,2.941338,1.067755,-4.293394,-3.086962,-0.469450,3.613479,-2.819084,5.120227,7.050265,-2.466012,-2.169495,0.909174,2.145000,3.638202,-2.498263,-5.502569,-3.852499,5.146134,1.659534,-0.714558,-0.469029,2.176041,-1.869031,-4.940216,3.088835,3.752839,3.714232,-2.807177,-0.698254,-5.012471,2.528090]
8 [-7.303960,-7.729638,-4.773112,-9.641366,2.609788,-20.279972,6.826665,8.960463,19.718428,3.388449,7.409564,-17.707247,-0.936261,28.070925,-3.749211,-9.148918,16.058552,4.065396,10.297297,9.403608,-10.898782,-10.801480,-25.112486,3.316337,17.157797,12.428331,19.813198,10.889541,-1.723780,-1.651543,-0.261916,9.012156]
9 [-6.429783,-20.400585,16.191334,-16.042753,9.880158,-5.386495,-10.220762,15.675117,9.438731,-3.742106,10.518373,3.956760,6.789544,-0.625264,-8.893956,1.050145,-15.966327,-6.890019,-2.499497,8.597721,3.010286,3.782031,-10.274326,20.255251,1.046289,17.648533,-6.685064,13.613426,7.041289,13.350776,-3.143279,-5.788550]
Appendix
The following sections provide more detailed information about FastPath for those interested in the inner workings of the algorithm.
How the algorithm works
The below breakdown of the algorithm into steps is conceptual.
It does not reflect the actual implementation details such as for example as order of loops, datastructures and more.
We will describe how to compute the FastPATH embedding for a base node b
which has output time to
.
The output time is computed according to one of the chosen parameters outputTime
or outputTimeProperty
.
An arbitrary event which belongs to b
will be denoted by e
, and its time stamp is assumed to be te
.
At the time to
, at which we wish to compute the embedding of b
, the elapsed time of e
is
tel = to - te
,
that is, the amount of time between the timestamp of the event and the output time of the corresponding base node.
Note, that events may belong to multiple base nodes which have different output times. This implies that some computations involving an event have to be recomputed for each of its associated base nodes, but luckily these computations are independent of each other.
-
Preprocess
Pre-process the input into a graph with base nodes, event nodes and optionally context nodes. Base and event nodes are connected and so are events and context nodes. If time stamps are not given for events, time stamps are generated by traversing event chains per base node. -
Filter relevant events
Out of the events that belong to the base nodeb
, any event which has elapsed time exceedingmaxElapsedTime
, or any event which occurred at timeto
or later is removed. The remaining relevant events forb
will be calledE(b)
. -
Create grid
Create a grid over time subsequently referred to as the grid -
Generate random vectors
Sparse random vectors are generated independently per input sources below, if they are selected.
For each time in the grid and input source, generate vectors as follows:-
Event feature vector properties: a vector
ri,t
is generated for each dimensioni
of the input event feature vectors -
Context nodes: for each context node
c
, generaterc,t
-
Categorical event properties: for each categorical node property
prop
and for each valuev
of that property, initialize a random vectorrprop,v,t
.
If the property value for any event is a list of categories[v1, v2, …]
, andv
is any of these values, we initialize a random vectorr~prop,v
.
-
-
Compute event node pre-embeddings
The pre-embedding for an evente
at timet
is denoted byxe,t
and initialized to all zeros. For all input sources below that apply, weighted sums of random vectors generated in step 3 are added toxe,t
:-
Event feature vector properties:
xe,t += ∑i ri,t * inpi
whereinp
is the input vector ofe
andi
ranges over the dimensions ofinp
. -
Context nodes:
xe,t += ∑c rc,t
wherec
ranges over all neighbors of the evente
. -
Categorical event properties:
xe,t += ∑prop rprop,e.prop,t
whereprop
ranges over all selected categorical event properties.
Or whene.prop
is a list:xe,t = ∑prop ∑v in e.prop rprop,v,t
-
-
Compute event embeddings
We compute an embedding fore
using its earlier derived pre-embeddings at different times. We find the timetcl
in the grid which is the closest to the elapsed timetel = to - te
. Then adding up tosmoothingWindow
points in the grid on both sides oftcl
, we to obtain a windowWe
of relevant grid times for the event. In the picture,smoothingWindow = 1
andtcl=t4
. ThereforeWe = {t3,t4,t5}
. For eacht
inWe
we set the weight of the grid pointt
to be,
w(t) = exp(-|t - tel| * smoothingRate ) / Ze
,
where Ze is the constant that ensures∑t in W_e w(t) = 1
.
We can now finally set the embedding of the event at output timeto
to be
emb(e|to) = ∑t in W_e w(t) * xe,t
. We have hence constructed time-smoothed embeddings by taking an average of a grid of pre-embeddings, or basis vectors if you will. The purpose of the time-smoothing mechanism is to ensure that embeddings change continuously with respect to changes in elapsed time, all other factors being equal. -
Compute base node embeddings
The embedding forb
is finally given by
emb(b) = ∑e in E(b) wb,e * emb(e)
where
wb,e = exp(-decayFactor * (to - te))
andE(b)
was defined in step 2.
The figure below gives an overview of what influences event embeddings and how they flow to the embedding of a base node, based on the output time and elapsed times.
