See “Of Farm Topologies and Time-Series Data” with data analyst and engineer Christoph Engelbert, live during Neo4j’s NODES 2022 live streaming webcast at 1420 GMT / 1520 CET Thursday, November 17. Registration for the 24-hour live, worldwide event is free.
Christoph Engelbert, the CTO and co-founder of Emsdetten, Germany-based agricultural software firm Clevabit, is a developer at heart, with strong bonds to the open-source world. As a seasoned speaker at international conferences, he loves to share his experience and ideas, especially in the areas of scalable system architectures and back-end technologies, as well as all things programming languages. In this interview, Chris shares his real-world experience with building scalable solutions, and gives us a sneak peek into his upcoming NODES 2022 session.
“That’s Totally a Graph”
Jennifer Reif, Neo4j: Where did you first encounter graph databases?
Chris Engelbert: I think that was actually Michael Hunger’s fault. I knew they existed before, but I never really got into that kind of stuff. Until we set up Clevabit, I never really thought I had a use case for it.
A little bit of background: What Clevabit did was basically create a management observability tool for animal farms — anything from temperature, humidity, CO2, ammonia levels, all the way to medication and stuff. And when we started out, it was like, “Oh, how complicated can a farm be? There’s probably a barn and a couple of animals and that’s it.”
And then you figure out, well, people have multiple barns, sometimes even multiple stories or floors. They have compartments, they have inlets. . . and that is only the basic stuff. That would all be still a tree. Then you figure out that some water meters are actually for two compartments, but not a third one. And the weather station, which is outside, needs to provide data to all of those. And this is where I asked Michael, “Does that look like a graph to you?” “Oh yeah. That’s totally a graph.” “Alright, alright. I thought so.”
So we dug a little bit deeper, and looked into how building automation systems handle this kind of stuff, because that’s basically what it [our use case] was, in the end. The only difference was, by law, we had to make sure that we understood what the barn looked like three years ago, even if they completely remodeled it. That is where all the building automation kind of falls flat because they don’t support that. If you control HVAC, you don’t care what it looked like last year, you want to understand what it looks like now. But there are a few implementations, and one of them is actually based on a graph solution. So, I looked at that, and that was where we started modeling out the graph database based on Neo4j.
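To make the modeling problem concrete, here is a minimal in-memory sketch in Python of the two requirements Chris describes: a meter that serves more than one compartment (so the structure is a graph, not a tree), and relationships that carry a validity period so the topology can be reconstructed for a past date. All names and dates (`Edge`, `water_meter_1`, the 2021 remodel, and so on) are invented for illustration; this is not Clevabit's actual schema, and a real implementation would store these relationships in Neo4j rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """A 'serves' relationship that is only valid for a period of time."""
    source: str                      # hypothetical meter or weather station id
    target: str                      # hypothetical compartment id
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still valid

    def active_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


# Illustrative topology: one water meter shared by two compartments (not the
# third), a weather station feeding all three, and a remodel on 2021-06-01
# that swaps the meter serving compartment_c.
EDGES = [
    Edge("water_meter_1", "compartment_a", date(2018, 1, 1)),
    Edge("water_meter_1", "compartment_b", date(2018, 1, 1)),
    Edge("water_meter_2", "compartment_c", date(2018, 1, 1), date(2021, 6, 1)),
    Edge("water_meter_3", "compartment_c", date(2021, 6, 1)),
    Edge("weather_station", "compartment_a", date(2018, 1, 1)),
    Edge("weather_station", "compartment_b", date(2018, 1, 1)),
    Edge("weather_station", "compartment_c", date(2018, 1, 1)),
]


def sources_for(target: str, day: date) -> list[str]:
    """Return everything that provided data to `target` on the given day."""
    return sorted({e.source for e in EDGES if e.target == target and e.active_on(day)})


if __name__ == "__main__":
    print(sources_for("compartment_c", date(2020, 1, 1)))  # before the remodel
    print(sources_for("compartment_c", date(2022, 1, 1)))  # after the remodel
```

Closing an edge with `valid_to` instead of deleting it is one common way to keep history queryable; whether Clevabit did exactly this is not stated in the interview.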
Jennifer Reif, Neo4j: Did you look at other solutions as well? Were you working with other tools at that point, and then move to a graph?
Chris Engelbert: We didn’t look at other graph databases, but we tried to model it in Postgres, which did not necessarily work. At first, we started with a simple table. Then we moved to LTree, which is a plugin for Postgres to build out trees and make them efficiently queryable. We sometimes had weird things like, “For this customer, give me all meters.” Meters could be sensors or devices or an RFID tag on a pig’s ear. We wanted to find all of them or some of them. Sometimes you had to search the tree upwards, and sometimes you had to search downwards. Beautiful, right?
That worked for a while before we came up with the water meter challenge. And then it really got complicated. We did not look for other things because I knew Michael. I figured it’s probably the easiest way [to ask], and in general, I had only heard good things about Neo4j so far. So I tried it, and it worked. And same thing for the time-series database. I saw Timescale and liked the idea that it was a Postgres plugin. I tried it out, and it just worked.
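The upward and downward tree searches mentioned above can be pictured with dotted label paths, the idea behind the Postgres ltree extension (which provides the `@>` and `<@` operators for ancestor and descendant tests). The Python sketch below only mimics that idea with string prefixes; the paths are made up, and it is not the ltree API itself. It also shows why the shared water meter was a problem: each node has exactly one path, so it can sit under only one parent.

```python
# Each node is identified by a single dotted path, so it has exactly one parent.
# All paths are hypothetical examples.
PATHS = [
    "farm",
    "farm.barn1",
    "farm.barn1.floor1",
    "farm.barn1.floor1.compartment_a",
    "farm.barn1.floor1.compartment_a.meter_7",
    "farm.barn2",
]


def descendants(path: str) -> list[str]:
    """Search downwards: every node whose path extends `path`."""
    prefix = path + "."
    return sorted(p for p in PATHS if p.startswith(prefix))


def ancestors(path: str) -> list[str]:
    """Search upwards: every proper prefix of `path`, from the root down."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


if __name__ == "__main__":
    print(descendants("farm.barn1.floor1"))
    print(ancestors("farm.barn1.floor1.compartment_a.meter_7"))
```

A meter that belongs to two compartments would need two paths, at which point the data is no longer a tree.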
Jennifer Reif, Neo4j: What did you find was the most complicated piece about integrating the data stores for your project?
Chris Engelbert: I don’t think it was anything specific to Neo4j. It was the common thing of database migration, which is even worse than a version migration of an existing database. At that point in time, we still had only one API layer, and everything was not cleanly separated because we sliced and diced it out of one single prototype. So we had a lot of services going to the same tables and changing data and updating it. The biggest part was getting this clean separation into internal services, making sure that everything had one service that actually mutated data.
And then we started to slowly, but steadily, move one resource type or one element type after the other to Neo4j, until we had all the topology and customers, and stuff like that. It was the pure thing you always have to do with migration projects. There’s a lot of stuff that has to be written and tested. And it’s just no fun.
The Weird Guy at the Weird Party
Jennifer Reif, Neo4j: Could you tell us a bit about your background?
Chris Engelbert: I come from a pretty strong engineering background, mostly performance engineering. I often claim that if you have a party, and you have only technical people, I’m the weird person in the corner that nobody wants to speak to because I’m talking about garbage collection and how you can optimize stuff like that. So I’m the weird guy at the weird party. In general, a lot of everything [my work] comes down to performance optimization. That’s why I mostly stayed in all the back-end development.
Apart from that, I’ve been with the German Ubisoft brand for a while and did some game server programming. I’ve been with HRS [Hotel Reservation Service] writing a booking engine — the thing that actually creates your booking and tries to find the best rates. I’ve been with Hazelcast telling people how awesome in-memory data grids are. I’ve been with Instana talking about observability, and here we are. Now, I’m talking about time-series data.
Jennifer Reif, Neo4j: You mentioned you are a developer with special focus on scalable systems and back-end technologies. What do you find most interesting or challenging about those areas?
Jennifer Reif, Neo4j: Which programming languages do you have the most experience with?
Chris Engelbert: With my own startup, we had these hacking games where you got an exercise, and you had to select a programming language and do it as quickly as possible with as many green unit tests as possible. My engineers loved to challenge me by picking the programming languages that I had never even heard of.
Jennifer Reif, Neo4j: Do you have preferences? Ones you gravitate towards?
Chris Engelbert: It was Java for a long time. With my own startup, we started implementing in Go because we needed subdomains. We wanted the ACME protocol and Let’s Encrypt stuff, so that was only 15 lines of code in Go. We started slicing stuff out into microservices, but we mostly stayed with Go.
These days, I’m certainly going back to Java. I think my personal preference would actually be either Java or Kotlin.
Jennifer Reif, Neo4j: Could you tell us a bit about the inspiration for your NODES presentation on integrating multiple data stores (including Neo4j)? How did you end up submitting for NODES?
Chris Engelbert: There was a lot of history behind it, and we started with, “How hard can it be?” And then we figured out it was a little bit harder. And every single iteration of the implementation, “Oh that’s still not it,” before we finally ended up with a graph that we could model all the ways we ever wanted it. It didn’t matter if it was upstream or downstream or just a search for everything with a specific set of tags and stuff.
I think that is really interesting, because we used the graph database to model the farm topology, to figure out what devices were reachable, how they were connected to certain things, like compartments or barns. But then we took this information and went down to the time-series database, to collect or to gather the actual IoT metrics.
I think the use case is more common in the IoT world, but we’re probably not the only ones that have that kind of issue. Sharing experience is always nice, especially when you can embarrass yourself with all your failures in the past.
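The two-step pattern described in this answer (resolve devices through the topology, then fetch their metrics from the time-series store) might look roughly like the following Python sketch. Both stores are replaced by in-memory stand-ins, and every name and value here (`SERVES`, `READINGS`, `average_metric`, the sensor ids) is an assumption made for illustration. In a setup like the one Chris describes, step 1 would be a graph query against Neo4j and step 2 a SQL query against Timescale.

```python
from datetime import datetime
from statistics import mean
from typing import Optional

# Stand-in for the graph: which devices are connected to which compartment.
SERVES = {
    "compartment_a": ["temp_sensor_1", "temp_sensor_2"],
    "compartment_b": ["temp_sensor_3"],
}

# Stand-in for the time-series table: (device, timestamp, metric, value).
READINGS = [
    ("temp_sensor_1", datetime(2022, 11, 17, 14, 0), "temperature", 20.0),
    ("temp_sensor_2", datetime(2022, 11, 17, 14, 5), "temperature", 22.0),
    ("temp_sensor_3", datetime(2022, 11, 17, 14, 0), "temperature", 30.0),
    ("temp_sensor_1", datetime(2022, 11, 17, 16, 0), "temperature", 25.0),
]


def devices_for(compartment: str) -> list[str]:
    """Step 1: ask the topology which devices are reachable."""
    return SERVES.get(compartment, [])


def average_metric(compartment: str, metric: str,
                   start: datetime, end: datetime) -> Optional[float]:
    """Step 2: aggregate readings of those devices within [start, end)."""
    devices = set(devices_for(compartment))
    values = [
        value
        for device, ts, name, value in READINGS
        if device in devices and name == metric and start <= ts < end
    ]
    return mean(values) if values else None


if __name__ == "__main__":
    window = (datetime(2022, 11, 17, 14, 0), datetime(2022, 11, 17, 15, 0))
    print(average_metric("compartment_a", "temperature", *window))
```

Keeping the device lookup separate from the metric query means the time-series side only ever sees a flat list of device ids and never needs to know about barns or compartments.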
Jennifer Reif, Neo4j: What do you hope attendees learn or discover through your presentation?
Chris Engelbert: For me, it was always interesting to see migration projects happening — how they went, what kind of problems they encountered, even if it wasn’t directly related to my specific use case or migration phase.
The first thing you have to try and figure out is, “Do I actually have clean separation?” If you do, you’re probably good. Otherwise, here’s your first six-month project: clean it up, right? For us, it was a little bit longer. Also, how we approached the old data model and how we transferred it because we sliced data differently. We had way more options to give nodes multiple labels and say, “Hey, you’re a device, but you’re also a meter. You’re a sensor, but also a meter.” At first, we had to search for devices or sensors or ear tags. Then, everything was a meter, and it was nicely tagged, and it [a search] was easy to do.
Those kinds of experiences are valuable, especially because it was the first project with the graph database for me. It was quite a lot of learning involved and trying to figure stuff out. I’m still not 100 percent sure I did it correctly, but it works so far.
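The multi-label idea from this answer can be illustrated with a few lines of Python. The sketch assumes a tiny in-memory node set with invented ids and label names (`Device`, `Sensor`, `EarTag`, `Meter`, `Barn`); in Neo4j the same effect comes from attaching several labels to one node.

```python
# Each node carries a set of labels; a node can be a Sensor and a Meter at once.
# Node ids and label names are hypothetical.
NODES = {
    "feeder_1": {"Device", "Meter"},
    "temp_sensor_1": {"Sensor", "Meter"},
    "ear_tag_42": {"EarTag", "Meter"},
    "barn_1": {"Barn"},
}


def with_label(label: str) -> list[str]:
    """Return the ids of all nodes that carry the given label."""
    return sorted(node for node, labels in NODES.items() if label in labels)


if __name__ == "__main__":
    # One lookup replaces three separate searches for devices, sensors, and ear tags.
    print(with_label("Meter"))
    print(with_label("Sensor"))
```

With the shared `Meter` label, "give me all meters" becomes a single lookup, while the more specific labels remain available for narrower searches.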
NODES 2022 is a free, online graph tech conference that will be live on November 16 and 17. The agenda spans twenty-four hours, providing content for time zones around the globe, and is packed full of beginner, intermediate, and advanced content for technologists and graph-lovers. For those interested in attending Chris’s session (like myself!), register for the event at neo4j.com/nodes-2022 and catch his presentation on November 17, 14:20 GMT / 15:20 CET.