Nom, nom, nom…
In the last week of June 2022, Neo4j Ops Manager (NOM) was released. Our colleague Chris Shelmerdine published a great post with many insights into functionality but also into technical details about how NOM is built: Introducing Neo4j Ops Manager: The Tool You Need to Boost Your Ops Team’s Productivity.
One part caught my eyes of course:
The Neo4j Ops Manager server is a Java application, which manages the main logic, and also hosts the UI and agent APIs. It requires Java17 to run, and leverages Neo4j Spring integration and SDN (Spring Data Neo4j) to persist the metadata into a Neo4j database (persistence). Ideally this is a dedicated single instance database (for which a limited use, resource limited license is included).
My colleagues Ali and Sascha haven’t cursed me too much while working on the backend which I take as an indicator that things went relatively smooth, which makes me really happy. It validates the approach that the whole SDN team took when developing version 6, which spun off SDN/RX and replaced SDN5+OGM in April 2021.
One of our main reasons to rework SDN was the goal to be fully compatible with an immutable domain-object-approach and be part of the ongoing reactive story.
Now, I need to take a short detour: My own blog post about resilience, preparing for failure and retrying Try. And then retry. There can be failure. has been shared widely and it is an important topic till this day: The way Neo4j Causal Cluster works requires diligence when setting up a connection to it. There are some error scenarios that can safely be retried. All the official drivers provide a built-in feature to do this.
However, those so called “transaction functions” bring also transaction management along: They receive a (hopefully) small unit of work and execute it until it succeeds.
That works well in many scenarios, but not everywhere. While it is a good idea for small applications, to execute a couple of statements and not needing to configure the retry mechanism, it doesn’t play well when your target ecosystem provides its own transaction management, such as Spring Framework and Spring Data in particular do.
Why is that? A transaction in a Spring application usually reaches at least across a service and your preferred data access mechanism of choice, often a repository. While repository based queries alone could, of course, be wrapped into a transactional function, an elaborate service call might even involve external systems.
How could the Neo4j driver then know whether the whole unit of work is safe to retry and even what constitutes the unit of work?
And apart from that: Wrapping repository calls into Neo4j transactional or retryable functions would mean moving the control of the transaction flow from the driving factor (the application framework and the build-in transaction manager guided by the application) to the underlying driver.
This can’t be the goal.
Does that mean Spring Data Neo4j doesn’t work with Neo4j cluster setups? No, of course not!
But you need to be diligent of potential error scenarios, for example in cases where you have a quickly changing cluster setup in the backend. While we generally recommend a retry mechanism (such as Resilience4j), the Neo4j driver is pretty good at catching error scenarios before a session and a transaction is opened and acquired. It will retry those anyway. If problems happen during the execution of a transaction, then not so much.
Back to NOM: It uses the reactive variants of Spring Data Neo4j, which makes it even a bit harder to get retries properly implemented. Luckily, there’s a build-in Retry-Operator in Project Reactor which can be combined with Springs TransactionalOperator. The latter allows making reactive flows fully transactional in the Spring world.
The thing to keep in mind is, that the Retry-Operator works by re-subscribing to the upstream publisher. That means build the whole flow, including transactional aspects and as the last step, apply the retry mechanics.
How does this look like?
First, a utility method is given to create a transactional operator with some defaults that work. It needs a properly configured
In a second step, a factory method named
retrySpec is defined:
Here, Project Reactors Retry operator is configured to use an exponential backoff strategy and to apply Spring Data Neo4j’s predicate to check if something can be retried according to the drivers definition (here checking for an exception type).
Both methods get combined into
ReactiveNeo4jTemplate interaction does roughly look like this:
If there’s already an ongoing, declarative transaction (Yes,
@Transactional will work too in a reactive world), the interaction would look roughly like this:
Yes, assessing the risk of failure and putting proper mitigations in place is a bit of effort, but usually worth it. Like our NOM-team you don’t want to retry a whole, big interaction of which you don’t know if can safely be retried, but only specific cases where the retry operation makes sense.
In the reactive world, additional operators are your tool of choice.
Big shoutout from the drivers and Spring Data Neo4j team at Neo4j towards the NOM team. 🎉