Our main mission in the Drivers team is to offer an idiomatic and consistent developer experience, regardless of the target programming language and Neo4j deployment topology.
Consistency manifests itself in the concepts and API the various drivers expose, but also in the way they behave.
The main way we ensure this is through our shared suite of acceptance tests, also known as TestKit.
While adding date- and time-related tests to TestKit, my colleague Rouven noticed something strange going on.
Neo4j History and Bolt Protocol 101
Before diving into the core of the issue, let us recap what purpose the Bolt protocol serves.
Skip this and the next section if you are already familiar with the protocol.
Historically, Neo4j started as a JVM-only embedded database.
Indeed, it would only run co-located with your application, sharing the same environment (shared heap, shared garbage collection cycles, …).
Around Neo4j 1.0 (ca. 2010), a REST API was introduced and Neo4j could be deployed as a standalone server with the default port 7474 that you all know and love.
Clustering appeared shortly after.
In a nutshell, the Bolt protocol specifies how clients and servers interact via specific Bolt messages, exchanging data following the
You can learn more about it here.
Among these data structures, you can find DateTime and DateTimeZoneId.
They both encode a localized point in time, defined in seconds and nanoseconds.
They can be:
- Cypher query parameters
- Results of Cypher queries when invoking one of the temporal functions
- Returned nodes’ and/or relationships’ date/time properties
They differ by the way they localize the point in time:
- DateTime specifies the offset from UTC in seconds.
- DateTimeZoneId specifies the localization with a timezone name.
Let’s illustrate how they work by encoding 1970-01-01T02:15:00.000000042+01:00 as a DateTime.
The corresponding UTC time is 1970-01-01T01:15:00.000000042Z (Z denotes UTC).
The number of seconds since the Unix epoch is 1 hour and 15 minutes, i.e. 4500 seconds.
The offset is 1h, i.e. 3600 seconds.
The localized number of seconds is 4500+3600, i.e. 8100 seconds.
The DateTime will therefore carry 8100 seconds, 42 nanoseconds, and a UTC offset of 3600 seconds.
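The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the dictionary keys are assumptions made for this example, not the field names of the Bolt specification, and the nanoseconds are passed separately because Python's datetime stops at microsecond precision.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def localized_fields(dt: datetime, nanos: int) -> dict:
    """Compute the values discussed above for an offset-aware datetime.

    Hypothetical helper: the key names are illustrative only.
    """
    # Offset from UTC, in seconds (3600 for +01:00).
    offset = int(dt.utcoffset().total_seconds())
    # Whole seconds elapsed since the Unix epoch, in UTC.
    utc_seconds = (dt - EPOCH) // timedelta(seconds=1)
    # The "localized" seconds are the UTC seconds shifted by the offset.
    return {
        "seconds": utc_seconds + offset,
        "nanoseconds": nanos,
        "offset_seconds": offset,
    }


# 1970-01-01T02:15:00.000000042+01:00
dt = datetime(1970, 1, 1, 2, 15, tzinfo=timezone(timedelta(hours=1)))
fields = localized_fields(dt, nanos=42)
print(fields)
# {'seconds': 8100, 'nanoseconds': 42, 'offset_seconds': 3600}
```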
Let’s do the same with
The UTC offset for this timezone at that point in time is +1 hour.
Therefore, the UTC time is
From there, the same computations as above occur and the resulting
DateTimeZoneId is as follows:
Back to Sweden
Rouven noticed a problem with the following point in time:
1980-09-28T02:30:00[Europe/Stockholm], i.e. September 28, 1980 at 02:30 AM in the Europe/Stockholm timezone.
You can try it in a program by running the following Cypher query
RETURN datetime("1980-09-28T02:30:00[Europe/Stockholm]") and extracting the results.
Did something happen on September 28, 1980 in Sweden that could cause an issue?
(Sweden — it has to be said — has some form in this area, as Jon Skeet famously pointed out.)
Here I Come to Save the Day(light)!
The answer is yes!
In 1980, Sweden started to implement Daylight Saving Time (DST), also known as Summer Time.
DST consists of shifting the clock at different times of the year to adjust waking hours to daylight hours.
The clock is usually advanced by one hour in spring, thus creating a gap.
For example, a country could decide to implement time shifts at 2 AM: after 1:59:59 AM, the clock moves forward to 3:00:00 AM — 2:15:00 AM for instance does not occur.
The clock is usually set back by one hour in autumn, thus creating an overlap.
Following the same example, after 2:59:59 AM, the clock is set back to 2:00:00 AM: 2:15:00 AM, for instance, occurs twice, with different UTC offsets.
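Gaps and overlaps can be detected programmatically. Here is a minimal sketch using Python's standard zoneinfo module, assuming the IANA timezone database is available on the machine. The classify helper is hypothetical, and the April 6, 1980 date used for the gap is my reading of the timezone database for Europe/Stockholm, not something stated in this post.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify(naive: datetime, zone: str) -> str:
    """Classify a wall-clock time as 'unambiguous', 'overlap' or 'gap'.

    Hypothetical helper written for this illustration.
    """
    tz = ZoneInfo(zone)
    # fold=0 and fold=1 select the two possible readings of a wall-clock time.
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "unambiguous"
    # A wall-clock time that exists survives a round trip through UTC;
    # a time that falls in a gap comes back shifted.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) == naive:
        return "overlap"
    return "gap"


print(classify(datetime(1980, 4, 6, 2, 30), "Europe/Stockholm"))   # gap
print(classify(datetime(1980, 9, 28, 2, 30), "Europe/Stockholm"))  # overlap
print(classify(datetime(1980, 9, 28, 12, 0), "Europe/Stockholm"))  # unambiguous
```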
The Ambiguity of DateTimeZoneId
Let’s try to convert 1980-09-28T02:30:00[Europe/Stockholm] to a DateTimeZoneId as represented in the Bolt protocol.
First, we need to determine the corresponding UTC time.
On that specific day, the clock was set back to 2:00:00 AM after first reaching 2:59:59 AM.
Therefore, 2:30:00 AM occurred twice, with two different UTC offsets.
Since this time represents an overlap, we cannot know which offset to use!
To make matters worse, most languages will silently resolve this datetime.
The following Go program, when run on my machine, will print an offset of 1 hour.
This will print:
The offset is 3600s.
The Go API
time.Date explicitly documents:
Date returns a time that is correct in one of the two zones involved in the transition, but it does not guarantee which.
The following Python program shows that the programmer has a little more control over the localization process (notice the second
localize call and its parameter
Overall, the ambiguity remains and may cause bugs further down the line.
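To give an idea of what that control can look like, here is a sketch based on the standard-library zoneinfo module and the fold attribute of datetime, which is an assumption on my side and not necessarily the API used by the program mentioned above. It needs the IANA timezone database to be installed.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

stockholm = ZoneInfo("Europe/Stockholm")
wall = datetime(1980, 9, 28, 2, 30)

# Without any hint, fold defaults to 0: the first occurrence is picked silently.
default = wall.replace(tzinfo=stockholm)
default_offset = int(default.utcoffset().total_seconds())

# fold=0 is the first 2:30 AM (before the clock is set back),
# fold=1 is the second one (after the clock is set back).
first = wall.replace(tzinfo=stockholm, fold=0)
second = wall.replace(tzinfo=stockholm, fold=1)
offsets = [int(d.utcoffset().total_seconds()) for d in (first, second)]

print(default_offset)  # 7200
print(offsets)         # [7200, 3600]
```

The caller can choose, but nothing forces them to: the default still resolves the overlap without any warning.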
Time for a Fix
The root cause of this issue is that the seconds field of DateTimeZoneId includes the offset we sometimes cannot resolve.
One way to resolve the ambiguity is to encode the seconds field of DateTimeZoneId as UTC time (as well as the seconds field of DateTime, for consistency’s sake).
Indeed, UTC time is monotonic (it only ever grows) and is therefore non-ambiguous.
Going back to the dreaded September 28, 1980 at 2:30 AM in the Europe/Stockholm timezone, the seconds field of UTCDateTimeZoneId (the UTC-encoded replacement of DateTimeZoneId) will either be:
- 338949000 seconds, i.e. 1980-09-28T00:30:00Z (02:30 at UTC+02:00)
- Or 338952600 seconds, i.e. 1980-09-28T01:30:00Z (02:30 at UTC+01:00)
No more ambiguity — problem solved!
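The difference between the two encodings can be checked with a short Python sketch. The helper names are made up for this example, and it assumes the standard zoneinfo module can find the IANA timezone database. The localized seconds (UTC seconds plus offset) are identical for both occurrences of 2:30 AM, whereas the UTC seconds are not.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

stockholm = ZoneInfo("Europe/Stockholm")
wall = datetime(1980, 9, 28, 2, 30)


def utc_seconds(dt: datetime) -> int:
    """Seconds since the Unix epoch, in UTC (hypothetical helper)."""
    return int(dt.timestamp())


def localized_seconds(dt: datetime) -> int:
    """UTC seconds shifted by the UTC offset (hypothetical helper)."""
    return utc_seconds(dt) + int(dt.utcoffset().total_seconds())


# The two occurrences of 2:30 AM: before and after the clock is set back.
candidates = [wall.replace(tzinfo=stockholm, fold=f) for f in (0, 1)]

legacy = [localized_seconds(d) for d in candidates]
utc = [utc_seconds(d) for d in candidates]

print(legacy)  # [338956200, 338956200] -> the two instants are indistinguishable
print(utc)     # [338949000, 338952600] -> one distinct value per instant
```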
The UTC-aware structures are available in all the Neo4j 5 releases.
They are also available in any Neo4j 4.4 release from 4.4.12 onward, if the driver requests it and the server accepts the request.
That way, older drivers connecting to newer servers or vice-versa will continue to work with the existing datetime structures, thus not causing any disruption.