Versioning
Every time you refactor your data model, you create new versions of it. Tracking changes in the data structure or showing a current and past value can be valuable for auditing purposes, trend analysis, etc. This page gives an overview of the different ways you could model data in order to keep track of changes over time.
Versioning of entities
You can keep track of changes in data by versioning relevant entities. This strategy is useful when you need to:
- Access the many versions of specific entities (nodes, for instance) in a graph (e.g. the different names a product has had over time).
- Retrieve the latest version only (e.g. the current name of a product).
With entity versioning:
- The entity Product is linked to its different versions by an explicit relationship.
- The entity Product is immutable. Only the properties that are stored in the different versions (State nodes) change.
- The LATEST relationship links the entity Product to its most recent version (State), which also happens to be version 2 (V2).
Pros and cons
Pros | Cons |
---|---|
Simple in terms of modeling, querying, and maintenance. | Updating nodes requires the deletion of the LATEST relationship and the creation of a new one. |
Explicit for end users without any transformation. | Can be limited if not using other versioning patterns, as it can be hard to know which version you want to retrieve if it’s not the latest. |
Query examples
These are examples of common queries that are useful with the entity versioning strategy:
Retrieve the name of version 2 of the Product with the id 1:
MATCH (:Product {id: 1})-[:V2]->(s:State)
RETURN s.name
Retrieve the name of the latest version of the Product with the id 1:
MATCH (:Product {id: 1})-[:LATEST]->(s:State)
RETURN s.name
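Writing a new version under this pattern means moving the LATEST relationship, which is the maintenance cost listed in the cons. A minimal sketch, assuming that versions keep using numbered relationship types as in the example (V3 is therefore hypothetical) and that $newName is a hypothetical query parameter:

```cypher
// Assumptions: V3 is the next free version relationship type for this product,
// and $newName is a hypothetical parameter holding the new product name.
MATCH (p:Product {id: 1})-[latest:LATEST]->(:State)
// Detach the pointer from the previous version; its own V-relationship stays in place.
DELETE latest
// Create the new State and attach it both as V3 and as the latest version.
CREATE (p)-[:V3]->(s:State {name: $newName})
CREATE (p)-[:LATEST]->(s)
RETURN s.name
```

The Product node itself is never modified, which keeps the entity immutable as described above.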
Time-based versioning of entities
A variation of entity versioning is a time-based approach. It is useful when you are interested in:
- Graph snapshot by retrieving all valid elements (nodes and relationships) of the graph at a specific point in time (e.g. which products are available on Monday the 12.06.23).
- Graph difference by comparing two graph snapshots with different timestamps (e.g. which nodes are added, which are deleted, and which remain the same).
- Temporal traversal by traversing only the elements (nodes or relationships) of the graph that are valid at a specific point in time, in order to find the chronological sequence of relationships which connect time-based events (e.g. a bike sharing graph with trip relationships between stations as nodes).
- Graph history by modeling the history of data changes.
With time-based versioning of entities:
- Each element has dedicated validFrom/validTo time properties.
- Nodes can only share a relationship if their validity timespans overlap.
- Duplication of information is possible.
- Complete history of the graph is usable.
Pros and cons
Pros | Cons |
---|---|
Every element has a well-defined time interval in which the element is valid. | If the state of a node changes, the node has to be duplicated and a new valid time interval should be assigned. |
States are bound to the specific element (no additional relationship required). | Updating nodes requires the creation of a new relationship connecting to the new node/state and the assignment of a new valid interval to the relationship. |
Aggregation of all elements (or only valid ones at a certain time) is possible. | Duplication of data cannot be avoided. |
Query examples
These are examples of common queries that are useful with the time-based entity versioning strategy:
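Get the current price of the Product Rice Cooker:
// Assumption: open-ended validity (∞) is stored as a missing validTo property.
MATCH (p:Product)
WHERE p.name = "Rice Cooker" AND p.validTo IS NULL
RETURN p.price
Get the price of the Product Rice Cooker in November:
// $pointInTime is an assumed parameter: an ISO 8601 string for a moment in November.
MATCH (p:Product)
WHERE p.name = "Rice Cooker"
AND datetime(p.validFrom) <= datetime($pointInTime)
AND (p.validTo IS NULL OR datetime($pointInTime) <= datetime(p.validTo))
RETURN p.price
Get the names and prices of all products that currently have a valid HAS_PRODUCT relationship:
MATCH ()-[r:HAS_PRODUCT]->(p)
WHERE r.validTo IS NULL
RETURN p.name, p.price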
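The update cost described in the cons (duplicate the node, then reconnect it with a new valid interval) could look roughly like the following. This is a sketch under assumptions that are not part of the original model: validFrom/validTo hold ISO 8601 strings, open-ended validity (∞) is stored as a missing validTo, and $changeTime and $newPrice are hypothetical query parameters.

```cypher
// Close the currently valid version of the product.
MATCH (previous:Product {name: "Rice Cooker"})
WHERE previous.validTo IS NULL
SET previous.validTo = $changeTime
// Duplicate the node with the changed price and a new, open-ended interval.
CREATE (updated:Product {name: previous.name, price: $newPrice, validFrom: $changeTime})
WITH previous, updated
// Close every still-valid incoming relationship and recreate it on the new node.
MATCH (owner)-[r:HAS_PRODUCT]->(previous)
WHERE r.validTo IS NULL
SET r.validTo = $changeTime
CREATE (owner)-[:HAS_PRODUCT {validFrom: $changeTime}]->(updated)
RETURN DISTINCT updated.name, updated.price
```

Because the old node and relationships are closed rather than deleted, earlier points in time can still be queried.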
Linked list
A linked list is another modeling strategy that can be useful when the sequence of objects matters.
Linked lists are useful when:
- The order of events is of interest, e.g. getting the order of transactions executed on a bank account.
- You need the previous and next elements in a list, based on the relationship between them (e.g. which song is next on a playlist, or undoing an action in a text document).
With a linked list:
- The entity Product is linked to the first element of the sequence, and can be linked to the last one.
- As with the versioning of entities, the entity Product is also immutable here.
- Each element of the sequence is linked to the next one through a NEXT relationship.
Pros and cons
Pros | Cons |
---|---|
Efficient by using relationships to get the next/previous element. | Limited to very specific use cases without using other versioning patterns. |
Simple modeling and maintenance. | Difficult to find a specific version which is not the first or the last. |
Explicit for end users. | |
Query examples
These are examples of common queries that are useful with the linked-list versioning strategy:
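Get the name of the version that directly follows the State named "Professional chair":
MATCH (:State {name: "Professional chair"})-[:NEXT]->(s:State)
RETURN s.name
Get the name of the second-to-last version of the Product with the id 1:
MATCH (:Product {id: 1})-[:LAST]->(:State)<-[:NEXT]-(s:State)
RETURN s.name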
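When the whole sequence matters rather than a single neighbor, the list can be walked with a variable-length pattern. A minimal sketch that uses only the LAST and NEXT relationships from the examples above; the stepsBack column name is an illustrative choice:

```cypher
// Walk backwards from the last State over zero or more NEXT relationships.
// The path always contains the single LAST hop, so subtracting 1 leaves the
// number of NEXT hops: 0 is the latest version, 1 the one before it, and so on.
MATCH path = (:Product {id: 1})-[:LAST]->(:State)<-[:NEXT*0..]-(s:State)
RETURN s.name AS name, length(path) - 1 AS stepsBack
ORDER BY stepsBack
```

This returns the versions from newest to oldest. Reaching a version deep in a long list still requires traversing every element in between, which is the limitation noted in the cons.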
Timeline tree
As mentioned in Modeling designs, the timeline tree is a common modeling design. It can be a useful strategy when you want to track changes over time. In this example, the timeline structure spans from years to days, and the remaining non-time nodes hold the important pieces of data in the graph.
Query examples
If you want to find all purchases that happened in a given time period, such as every purchase in December 2012, you can navigate the timeline tree from 2012 to December and then fetch everything connected to the leaf nodes (nodes with no descendants) under that branch:
MATCH (root:Timeline)-[:IN_YEAR]->(year:Year {value:2012})-[:IN_MONTH]->(month:Month {value:12})
WITH month
MATCH (month)-[:ON_DAY]->(day)
MATCH (purchase:Purchase)-[:OCCURRED]->(day)
RETURN purchase
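Adding data to the tree follows the same path in the other direction: the year, month, and day nodes are reused if they exist and created if they do not. A sketch under assumptions that go beyond the example: a single Timeline root node, a Day label with a value property, and a hypothetical $purchaseId parameter.

```cypher
// Assumptions: one Timeline root exists or is created here; the Day label,
// its value property, and $purchaseId are hypothetical.
MERGE (root:Timeline)
// Each MERGE reuses the existing branch or creates the missing part of it.
MERGE (root)-[:IN_YEAR]->(year:Year {value: 2012})
MERGE (year)-[:IN_MONTH]->(month:Month {value: 12})
MERGE (month)-[:ON_DAY]->(day:Day {value: 24})
// Attach the new data node to the leaf of the timeline.
CREATE (purchase:Purchase {id: $purchaseId})-[:OCCURRED]->(day)
RETURN purchase
```

Using MERGE on the time nodes keeps a single branch per year, month, and day, so the read query above keeps finding all purchases under one path.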
Combined approach
Some complex use cases require a combination of two or more of the previously mentioned modeling techniques, since each has its own advantages and disadvantages.
The right combination depends on the specific use case. Factors such as query times and the frequency of transactions should be considered as well.