FastPath

Elevator pitch

FastPath is a light-weight path embedding algorithm operating on temporal graphs. It computes embeddings for base nodes that are tied to event sequences. Events are associated to vectors depending on the event type and (continuously) on the elapsed time since the event occurred. These vectors are aggregated into embeddings that can be used in machine learning tasks, like customer journey analysis, and generalize across different time periods.

Short note on graph terminology

The FastPath algorithm conceptually works on graphs, so this documentation speaks of graphs, nodes, relationships and properties. Moreover, the graphs are heterogeneous, so nodes are tagged with node labels and relationships have types. The correspondence between graphs and actual Snowflake tables is described in the sections [_node_tables] and [_relationship_tables].

Introduction

FastPath is a path embedding algorithm. This means that the input graph is a temporal graph consisting of

base nodes — Nodes of interest that the algorithm aims to compute embeddings for
event nodes — Nodes marked with time stamp and associated with base nodes

There are two supported graph schemas that encode the event chains.

The event nodes can hold information regarding what the event entails. This information can be in the form of feature vectors, categorical properties or relationships to context nodes.

In summary, the input consists of a collection of base nodes, each of which have a time series of information carrying events.

For each base node, an output time is specified. The output time governs two concerns of the embedding. First, only events up to the output time are used to arrive at the embedding. Second, the algorithm uses elapsed time rather than absolute time to compute the embeddings. When determining the contribution of an event on the embedding of a base node, the elapsed time is the amount of time between the event and the output time of the base node.

Let’s consider an example to make this more clear: One of the base nodes is a person called "Joe" and one of the events represents the fact that "Joe" consumed a dose of aspirin. The algorithm will store

the type of event — here "taking of aspirin"
the elapsed time — here 7 days

Further, each pair of event type and elapsed time, is associated with a "semi-random" vector. The vectors are constructed using both randomness, similar to the so called FastRP algorithm, and a time smoothing mechanism. This results in vectors for "taking of aspirin 7 days ago" being (relatively) similar to "taking of aspirin 8 days ago".

The embedding for Joe will be a weighted sum of the vector of "taking of aspirin 7 days ago" and vectors of all other events that occurred for Joe. How the weighting works is described in more detail below.

It may seem that the above only gives an intuition for discreet event information like categorical properties and relationships to context nodes. However, feature vectors on events can be thought of as a weighted sum of discreet events corresponding to the components of the vector.

The output node embeddings can for example be used in downstream machine learning tasks for solving problems related to e.g., customer journey. Because of the reliance on relative time rather than absolute time, such models can generalize from embeddings of past graphs to current graphs. In addition, it holds that FastPATH is equivariant over time; If all events for a base node are shifted by T time steps and the output time is also shifted by T time steps (in the same direction), the embedding of the base node stays the same.

Syntax

This section covers the syntax used to execute the FastPath algorithm.

Run FastPath.

CALL graph.fastpath(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);

1	Compute pool selector.
2	Optional prefix for table references.
3	Project config.
4	Compute config.
5	Write config.

Table 1. Parameters
Name	Type	Default	Optional	Description
computePoolSelector	String	`n/a`	no	The selector for the compute pool on which to run the FastPath job.
configuration	Map	`{}`	no	Configuration for graph project, algorithm compute and result write back.

The configuration map consists of the following three entries.

For more details on below Project configuration, refer to the Project documentation.

Table 2. Project configuration
Name	Type
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

Table 3. Compute configuration
Name	Type	Default	Optional	Description
mutateProperty	String	`"embedding"`	Yes	The node property that will be written back to the Snowflake database
baseNodeLabel	String	`n/a`	No	Node label for which output embeddings are desired
eventNodeLabel	String	`n/a`	No	Node label of events related to base nodes
contextNodeLabel	String	`None`	Yes	Node label of context nodes that describe events
timeNodeProperty	String	`None`	Yes	Name of a node property representing time on the event nodes. May be int or float.
nextRelationshipType	String	`None`	Yes	Name of a relationship type between event nodes, indicating order of events
firstRelationshipType	String	`None`	Yes	Name of a relationship type from base nodes to event nodes, indicating the first event for each base node
eventFeatures	String	`None`	Yes	Name of a node property on event nodes which holds numerical features in vector form
categoricalEventProperties	List of String	`None`	Yes	Names of node properties on event nodes which hold categorical values or lists of categorical values
ignoredEventCategory	Integer	`-1`	Yes	A category value representing that a category is missing, and is ignored when constructing the event embedding
outputTime	Float	`None`	Yes	The timestamp at which embeddings should be produced. Events at this timestamp or later are not processed.
outputTimeProperty	String	`None`	Yes	Name of a node property on base nodes which holds a number indicating for what timestamp an output embedding should be produced
numElapsedTimes	Integer	`n/a`	No	The number of times in the "grid" . Elapsed time from event to output time is rounded to this grid. If set to 1, timestamps of events have no effect.
decayFactor	Float	`1.0`	Yes	Coefficient controlling the speed of decay of influence of older events
maxElapsedTime	Integer	`n/a`	No	The maximum age of events, relative to the output time, that are considered for the embedding.
smoothingWindow	Integer	`0`	Yes	Event embeddings are aggregated over a window containing up to 2*smoothingWindow + 1 grid times.
smoothingRate	Float	`0.0`	Yes	Controls how fast expected similarity of events decays as their time-distance increases.
dimension	Integer	`n/a`	No	The output dimension of the embeddings
randomSeed	Integer	`Random number`	Yes	A random seed which is used for all randomness in computing the embeddings.

For more details on below Write configuration, refer to the Write documentation.

Table 4. Write configuration
Name	Type	Default	Optional	Description
nodeLabel	String	`n/a`	no	Node label in the in-memory graph from which to write a node property.
outputTable	String	`n/a`	no	Table in Snowflake database to which node properties are written.

Configuration concepts explained

This section goes into more detail about how to configure certain aspects of the algorithm.

Graph schema and relationship configuration

The algorithm supports two different schemas for connecting base nodes to event nodes.

To use the first schema configuration you provide a firstRelationshipType and nextRelationshipType, and timeNodeProperty is optional. With this configuration each base and event node is assumed to be part of a path of the pattern: (:baseNodeLabel)-[:firstRelationshipType]-(:eventNodeLabel)-[:nextRelationshipType]-…-[:nextRelationshipType]→(:eventNodeLabel)
To use the second schema configuration, you provide only a timeNodeProperty. It’s then assumed that each base node has outgoing relationships to each of its event nodes: (:baseNodeLabel)-[:]→(:eventNodeLabel)

Optionally, the schema can also contain a node label called contextNodeLabel, see Event input features below.

Defining event timestamps

There are two ways of specifying timestamps for event nodes. The first way is to provide a timeNodeProperty, in which case each event node must hold a property of that name representing a timestamp.

If however, a nextRelationshipType is provided, timeNodeProperty does not have to be provided. In that case, time stamps are generated in an intuitive way depicted in the picture below.

Timestamp generation

Figure "Timestamp generation": The length of the nextRelationshipType path to an event will define its timestamp. So then the first event node in a base node’s event chain, which is connected via a firstRelationshipType relationship from the base node, will have timestamp 0. The next event node, accessed via a nextRelationshipType relationship from the first event node, will have timestamp 1, and so on.

Please note that you can still provide a timeNodeProperty even if nextRelationshipType is used. In that case, the nextRelationshipType will not be used to infer timestamps, but rather which base node an event node belongs to.

Event input features

An event’s feature representation can be impacted by three different input sources. They are independent of each other and can be used both together and separately. The event feature vector will be a sum over all input sources that are enabled.

To enable the Context nodes input source you provide a contextNodeLabel. In this case it’s assumed that there may be outgoing relationships from event nodes to context nodes, i.e., the pattern:
(:eventNodeLabel)-[:]→(:contextNodeLabel)
To enable the Event feature vector properties input source you provide eventFeatures. This should be the name of a vector property present on all event nodes.
To enable the Categorical event properties input source you provide categoricalEventProperties. This should be the name of a property that holds integers representing categories or lists of such integers. The property must be present on all event nodes.

Time-smoothing

If smoothingRate is large, actually a small amount of smoothing is applied and events that are even a small time apart (enough to be closest to different grid points) will generally have very different embeddings.

If smoothingWindow is 0, the effect is similar to having a large smoothingRate. On the other hand, if smoothingWindow is large in relation to numElapsedTimes, and simultaneously smoothingRate is small, identical events will have similar embeddings even when their timestamps differ a lot.

Output

The output is Snowflake table containing two columns;

nodeid — the node’s id according to an input node table
embedding — the FastPath embedding of the node

The name and location of the table is given in Write configuration.

Example

We assume that we have a Snowflake schema patient_db.general containing the following tables:

patient with columns nodeid and output_time
encounter with columns nodeid, embedding and time
has_encounter with columns sourcenodeid and targetnodeid

These tables represent a graph of medical patient records.

To run the query, there is a required setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.

The FastPATH algorithm can be run on this graph by executing a statement such as:

CALL Neo4j_Graph_Analytics.graph.fastpath('CPU_X64_M', {
    'defaultTablePrefix': 'patient_db.general',
    'project': {
        'nodeTables': ['patient', 'encounter'],
        'relationshipTables': {
            'has_encounter': {
                'sourceTable': 'patient',
                'targetTable': 'encounter'
            }
        }
    },
    'compute': {
        'baseNodeLabel': 'patient',
        'eventNodeLabel': 'encounter',
        'eventFeatures': 'embedding',
        'outputTimeProperty': 'output_time',
        'timeNodeProperty': 'time',
        'dimension': 32, -- in reality, a higher value is recommended
        'numElapsedTimes': 30,
        'maxElapsedTime': 3650,
        'smoothingRate': 0.9,
        'smoothingWindow': 2,
        'decayFactor': 0.0
    },
    'write': [{
        'nodeLabel': 'patient',
        'outputTable': 'embeddings'
    }]
});

An example of the output is:

JOB_ID	JOB_START	JOB_END	JOB_RESULT
job_7bed5ef80dc24eec9b0d86d0cbe7baa4	2025-04-28 14:12:46.439	2025-04-28 14:12:57.447	{"node_output_stats": {"patient1": {"row_count": 1000, "table_name": "patient_db.general.embeddings"}}}

JOB_ID

JOB_START

JOB_END

JOB_RESULT

job_7bed5ef80dc24eec9b0d86d0cbe7baa4

2025-04-28 14:12:46.439

2025-04-28 14:12:57.447

{"node_output_stats": {"patient1": {"row_count": 1000, "table_name": "patient_db.general.embeddings"}}}

The produced node embeddings can be simply explored like so:

SELECT * FROM patient_db.general.embeddings LIMIT 10;

which returns data of the form

NODEID	EMBEDDING
0	    [5.341047,3.848688,-5.973229,-6.814521,4.745209,9.643475,-0.045684,4.210951,-2.004807,4.428759,13.497086,-6.179376,-0.986118,-7.707472,-0.555529,-6.073474,4.083053,1.602635,1.958997,12.649292,-0.603967,10.158164,-0.338394,-0.608493,-4.837356,15.097547,-6.569559,-2.797304,0.338724,6.874722,9.505033,-8.754766]
1	    [-6.724641,-11.390027,-4.453455,4.120544,-2.017518,-2.675358,2.027915,-7.672874,-2.879899,17.374033,11.240053,7.185143,10.593451,-0.032012,6.947659,-29.623604,0.166914,11.507900,-12.466205,-1.891875,-8.164936,7.390674,-10.806652,2.019349,-1.903723,29.574389,1.859032,3.853385,0.566263,-5.038345,4.792415,3.326340]
2	    [-2.920011,-2.048257,-5.708614,-2.373963,-0.244315,2.887959,-23.091557,-0.587305,4.643905,-0.775859,7.227013,-1.861291,-4.952862,-7.039846,-11.932112,-8.986797,-1.304907,2.019032,3.064439,-4.360886,-2.295964,6.218091,0.975781,-3.919339,8.686658,17.961964,-7.780995,-1.146844,-6.269790,20.006420,-1.216949,-3.596368]
3	    [-8.567083,-9.307947,-11.381653,-0.802077,-1.286643,-3.037168,5.494521,-1.814128,-1.623972,-0.085408,-8.678900,6.716533,-3.552778,10.777124,8.609731,2.346340,-6.656413,4.587024,-6.891663,0.081429,-3.976248,-3.998426,-0.445967,4.443917,-7.440049,15.908058,-6.126621,-9.205961,-3.693087,5.798036,-10.521414,25.272419]
4	    [-3.993221,-15.716671,-7.102195,-22.005224,-0.855417,9.282584,-2.253866,-16.234423,24.112732,2.810601,-13.975744,-12.062764,-2.387400,20.061924,-1.067919,12.701017,-4.327281,-17.721920,23.729225,-9.824536,-8.265147,-2.768445,-25.310749,8.047261,13.448681,51.172272,-15.435410,14.756327,-4.342721,11.190716,28.090958,9.425692]
5	    [-13.833055,-3.474358,-29.580704,15.354573,46.242706,13.694808,32.601025,26.757681,-7.090479,57.280357,-0.735581,38.106731,26.968494,3.217232,-23.623219,2.485056,-6.973878,0.877117,-15.995618,55.822735,18.801920,52.809269,-47.148773,39.193768,-4.173650,-12.601570,-14.213085,-21.923597,12.738949,-27.862555,-9.950893,-20.819094]
6	    [13.704840,13.533702,-5.135115,-6.226133,-5.701371,-2.703825,5.956636,3.647114,2.735878,2.559484,-15.129639,9.952298,2.568221,1.664974,-0.048340,-10.921870,-17.725523,1.495797,-9.043520,25.950590,-0.110576,9.694299,-18.446062,-6.852248,6.842557,35.075626,8.634172,7.099162,-5.890983,9.706918,-8.494190,6.583567]
7	    [-2.812654,2.941338,1.067755,-4.293394,-3.086962,-0.469450,3.613479,-2.819084,5.120227,7.050265,-2.466012,-2.169495,0.909174,2.145000,3.638202,-2.498263,-5.502569,-3.852499,5.146134,1.659534,-0.714558,-0.469029,2.176041,-1.869031,-4.940216,3.088835,3.752839,3.714232,-2.807177,-0.698254,-5.012471,2.528090]
8	    [-7.303960,-7.729638,-4.773112,-9.641366,2.609788,-20.279972,6.826665,8.960463,19.718428,3.388449,7.409564,-17.707247,-0.936261,28.070925,-3.749211,-9.148918,16.058552,4.065396,10.297297,9.403608,-10.898782,-10.801480,-25.112486,3.316337,17.157797,12.428331,19.813198,10.889541,-1.723780,-1.651543,-0.261916,9.012156]
9	    [-6.429783,-20.400585,16.191334,-16.042753,9.880158,-5.386495,-10.220762,15.675117,9.438731,-3.742106,10.518373,3.956760,6.789544,-0.625264,-8.893956,1.050145,-15.966327,-6.890019,-2.499497,8.597721,3.010286,3.782031,-10.274326,20.255251,1.046289,17.648533,-6.685064,13.613426,7.041289,13.350776,-3.143279,-5.788550]

Appendix

The following sections provide more detailed information about FastPath for those interested in the inner workings of the algorithm.

How the algorithm works

The below breakdown of the algorithm into steps is conceptual. It does not reflect the actual implementation details such as for example as order of loops, datastructures and more. We will describe how to compute the FastPATH embedding for a base node b which has output time t_o. The output time is computed according to one of the chosen parameters outputTime or outputTimeProperty. An arbitrary event which belongs to b will be denoted by e, and its time stamp is assumed to be t_e. At the time t_o, at which we wish to compute the embedding of b, the elapsed time of e is
t_el = t_o - t_e, that is, the amount of time between the timestamp of the event and the output time of the corresponding base node.

Note, that events may belong to multiple base nodes which have different output times. This implies that some computations involving an event have to be recomputed for each of its associated base nodes, but luckily these computations are independent of each other.

Preprocess
Pre-process the input into a graph with base nodes, event nodes and optionally context nodes. Base and event nodes are connected and so are events and context nodes. If time stamps are not given for events, time stamps are generated by traversing event chains per base node.
Filter relevant events
Out of the events that belong to the base node b, any event which has elapsed time exceeding maxElapsedTime, or any event which occurred at time t_o or later is removed. The remaining relevant events for b will be called E(b).
Create grid
Create a grid over time subsequently referred to as the grid
Generate random vectors
Sparse random vectors are generated independently per input sources below, if they are selected.
For each time in the grid and input source, generate vectors as follows:
1. Event feature vector properties: a vector r_i,t is generated for each dimension i of the input event feature vectors
2. Context nodes: for each context node c, generate r_c,t
3. Categorical event properties: for each categorical node property prop and for each value v of that property, initialize a random vector r_prop,v,t.
  If the property value for any event is a list of categories [v₁, v₂, …], and v is any of these values, we initialize a random vector r~prop,v.
Compute event node pre-embeddings
The pre-embedding for an event e at time t is denoted by x_e,t and initialized to all zeros. For all input sources below that apply, weighted sums of random vectors generated in step 3 are added to x_e,t:
1. Event feature vector properties:
  x_e,t += ∑_i r_i,t * inp_i
  where inp is the input vector of e and i ranges over the dimensions of inp.
2. Context nodes:
  x_e,t += ∑_c r_c,t
  where c ranges over all neighbors of the event e.
3. Categorical event properties:
  x_e,t += ∑_prop r_{prop,e.prop,t}
  where prop ranges over all selected categorical event properties.
  Or when e.prop is a list: x_e,t = ∑_prop ∑_{v in e.prop} r_prop,v,t
Compute event embeddings

We compute an embedding for e using its earlier derived pre-embeddings at different times. We find the time t_cl in the grid which is the closest to the elapsed time t_el = t_o - t_e. Then adding up to smoothingWindow points in the grid on both sides of t_cl, we to obtain a window W_e of relevant grid times for the event. In the picture, smoothingWindow = 1 and t_cl=t₄. Therefore W_e = {t₃,t₄,t₅}. For each t in W_e we set the weight of the grid point t to be,
w(t) = exp(-|t - t_el| * smoothingRate ) / Z_e,
where Z_e is the constant that ensures ∑_{t in W_e} w(t) = 1.
We can now finally set the embedding of the event at output time t_o to be
emb(e|t_o) = ∑_{t in W_e} w(t) * x_e,t. We have hence constructed time-smoothed embeddings by taking an average of a grid of pre-embeddings, or basis vectors if you will. The purpose of the time-smoothing mechanism is to ensure that embeddings change continuously with respect to changes in elapsed time, all other factors being equal.
Compute base node embeddings
The embedding for b is finally given by
emb(b) = ∑_{e in E(b)} w_b,e * emb(e)
where
w_b,e = exp(-decayFactor * (t_o - t_e)) and E(b) was defined in step 2.

The figure below gives an overview of what influences event embeddings and how they flow to the embedding of a base node, based on the output time and elapsed times.