For Tim, it’s all about flexibility, scalability and resilience. As the CEO (and an engineer) at CluedIn, he needed a primary data store to join together a tech stack with a variety of databases. With a graph database, Tim was able to automatically join vast and deep amounts of data and scale right along with a major company growth spurt that went from 15 customers to a database of 280 million nodes with close to a billion relationships in the graph.
In this week’s five-minute interview (conducted at GraphConnect New York) we discuss how Tim’s company uses Neo4j to enhance their polyglot tech stack design with new engineering techniques, including machine learning.
Talk to us about how you guys use Neo4j at CluedIn.
Tim Ward: My background is mainly in engineering. I’ve been a software engineer for 12 years, and I try different platforms out. And that’s really what brought us to Neo4j – it was a different type of technology that was worth investigating.
At CluedIn, we take a very polyglot persistence design to our technology stack. We actually use a whole variety of different databases, and the way that we use Neo4j is actually as our primary data store.
It’s really based off this ethos that we have, that connected data is always more interesting than disconnected data, especially when you’re wanting to do something like we do, where we’re integrating data automatically from different systems.
It requires this kind of database where context, graph theory itself and the design patterns that are in it are really just necessary for solving this problem to a higher precision than the other types of technologies we are used to.
What made you choose Neo4j?
Ward: So I started working with Neo4j close to six years ago, and I started on an early 1.5 release. I think the interesting reason we were looking in the graph space was because of the new possibilities in engineering techniques it gave us.
Graph technology gives us a different data structure, one that inherently solves problems you would typically have to bend other data stores to solve.
The three main points why we chose Neo4j were: first, its ability to join across huge amounts of data, no matter the kind of depth of the connection. The next was the pattern matching techniques and, finally, it was actually the kind of path traversals, the ability for us to kind of take two discrete nodes that were in our graph and to reverse engineer the connections between those two data points.
Neo4j, of course, for us, when we were looking in the market, it just seemed like the obvious choice. It had a fantastic company behind it. It had a lot of growth. It had some funding, which meant we knew the technology could get the attention it needed to fulfill the graph tech story.
So I think that and the combination of it integrating well with our stack and having the APIs available for us to work with it in an agnostic way, no matter what libraries we were using or languages, really helped – it was an easy choice for us to choose Neo4j.
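The third capability Ward mentions, taking two discrete nodes and recovering the connections between them, can be sketched outside of any database as a breadth-first search. This is only a minimal illustration in plain Python, not CluedIn's implementation and not Neo4j's API; the `shortest_path` function, the `GRAPH` dictionary, and every node name in it are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # the two nodes are not connected


# Hypothetical toy data: records from different systems linked through shared entities.
GRAPH = {
    "alice@crm": ["Acme Corp"],
    "Acme Corp": ["alice@crm", "invoice-42"],
    "invoice-42": ["Acme Corp", "bob@billing"],
    "bob@billing": ["invoice-42"],
    "orphan": [],
}

print(shortest_path(GRAPH, "alice@crm", "bob@billing"))
```

A graph database runs this kind of traversal natively over stored relationships, so the connection between two records does not have to be reconstructed through a chain of explicit joins.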
Can you talk to me about some of your most interesting or surprising results you’d had while using Neo4j?
Ward: I think the most interesting result that we’ve had was our scale story.
We started off with 15 customers and grew our company into a database of 280 million nodes right now with close to a billion edges (or relationships) in the graph.
What surprised us, and also challenged us, was the resilience behind the platform and that, with it being a generic graph model, you could really take control of the platform in the parts where the product potentially needed more focus.
That was especially true around things like indexing and scaling using a schema. We went through this era in the NoSQL area where no schema was something that was sold as a big plus. And what you really realize when you go to production is that, well, a schema is actually something that’s extremely necessary.
In Neo4j, it was our ability to influence how the core platform actually works, with things like indexing, that surprised us in a very positive manner.
If you could start over with Neo4j, taking everything you know now, what would you do differently?
Ward: I think what we realized over time is that there are some odd things that you might need to do with your model to cater for some of these scalability complexities, which, to be honest, only really show themselves when you are in production at huge concurrent read and write levels.
You’re also dealing with such a diverse range of data that doesn’t necessarily all fit into the same model. We work with different customer data, and the data from one customer’s industry looks completely different from the data from another industry.
The ability to have a model where, at a later point, we could bend and change, I think that was one of the things we would probably revisit. But in hindsight, maybe we wouldn’t have discovered that if we didn’t go with our original easy way of modeling the data.
What do you think the future of graph technology looks like in your industry or sector?
Ward: We’re on a very similar mission to Neo4j. We’re wanting to connect the enterprise, and we’re using a lot of the same techniques that Neo4j says are part of its vision. So, we’re using machine learning techniques as well.
Where we see the market going is that a lot more people are adopting graphs as just one of the extra types of databases you use to solve problems. And I think where it’s going is the application of machine learning, combined with things like the graph, to be able to produce results where companies actually start to utilize their data.
Companies can become data-driven. We can get out of these archaic ways of integrating systems through a very tedious, manual approach and move towards a company’s data telling us how things are connected.
In the future, maybe models for the graph will be inferred from the data instead of the other way around. I’d like to see if that’s where we’ll go.
Anything else you want to add or say?
Ward: It’s been great to talk to the people from the Neo4j product team and give them feedback from the field.
There’s huge value in being able to talk to the actual engineers changing those things, who make a tangible impact on allowing companies like us to be able to scale to some of the biggest companies in the world. I think that’s kind of priceless.
Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at firstname.lastname@example.org