In this week’s five-minute interview, Julia Astashkina, Partner Marketing Lead for Solution Partners at Neo4j, speaks with Calum Chalmers, Senior Data Scientist at Capgemini, about his views on the future of graph technology and about Capgemini’s UK Graph Guild.
Tell us a little more about Capgemini’s UK Graph Guild.
Calum Chalmers: In the UK, I lead the team of data scientists, data engineers and developers, and we are united by a passion for graphs, graph databases and graph analytics. The Guild is an initiative that aims to offer our clients a range of services and products in the graph space, from advice on what graphs and graph databases are, to helping them choose and implement the most appropriate graph database for their specific use case.
We also help clients develop graph-based POCs. We help clients obtain real insights from their data using graph models and graph analytics. And we love telling data stories with our graph-based visuals and so on. So yes, a fairly wide remit for the guild.
What has the UK Graph Guild been doing recently?
At the moment, I’m pleased and excited to say that we’re developing a graph-based investigative solution to help identify and combat fraud and criminal activity. We’ve named our solution Haystack, and we are currently developing it with our global partners, such as Neo4j and Linkurious. So we use Neo4j for the graph database, and then Linkurious for the graph visuals.
We also spend a lot more time developing other POCs, of course, and when we’re not doing that, we enjoy researching graph issues and graph topics. And myself and a number of my colleagues have been enjoying writing blogs, so there are a number of blogs on our site that discuss topics and trends within the graph space.
What problems are you trying to solve with Haystack?
We realized that there are a number of off-the-shelf investigative solutions out there, but in our view, they tend to suffer from a number of flaws. For instance, they often need multiple interfaces to surface the data; they tend to have a non-customizable tech stack. They tend to be a one-size-fits-all solution.
With Haystack, what we’re trying to do is develop a solution that addresses these and other issues. So for example, we’re aiming for a solution that will pair federated data with a fully customizable tech stack. And we wanted to develop a very intuitive and easy-to-use front end that nevertheless really delivers the insights that are needed.
Why are graph databases so important?
In a nutshell, you can think of graph databases and graph analytics as the enhanced and contemporary version of relational databases and SQL models. The relational data model was developed in the ’70s. Back then, data sizes were teeny tiny – much, much smaller than what we have now. And the data was very highly structured. You could easily shape and mold your data into a dozen or so of these neatly structured tables.
Some of that’s a bit of an oversimplification, but certainly the relational database wasn’t designed with the big data era in mind. Now, we’re in the age of massive volumes of structured, unstructured and semi-structured data coming at us from all sorts of directions. We’ve got IoT devices embedded in almost any and every product you can mention, from fridges to cars to mobile phones.
We’re just surrounded by data. Now for highly structured data, I wouldn’t say that relational databases are obsolete, but they are nevertheless severely under strain. It’s difficult to explore large relational datasets to the full extent.
It’s very difficult to explore the explicit relationships that reside in our data. And in fact, frankly, in my view, it’s almost impossible to unearth the implicit and hidden relationships that also exist. But you can do these things with graph databases.
Graph databases can consume vast quantities of data without loss of performance, but we can also query that data. We can understand those obvious relationships, but we can also dig deeper. We can find those pertinent relationships with ease actually.
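To make the idea of hidden relationships concrete, here is a minimal sketch in Python. The people, phone numbers and addresses are invented for illustration, and a plain breadth-first search stands in for the kind of path query a graph database would run; it is an assumption-laden toy, not a description of how Haystack or Neo4j works internally.

```python
from collections import deque

# Hypothetical toy data: each edge links a person to a detail they registered.
EDGES = [
    ("Alice", "phone:555-0101"),
    ("Bob", "phone:555-0101"),
    ("Bob", "address:12 Elm St"),
    ("Carol", "address:12 Elm St"),
    ("Dave", "phone:555-0199"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(adjacency[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_adjacency(EDGES)
# Alice and Carol share no detail directly, but both connect through Bob.
print(shortest_path(graph, "Alice", "Carol"))
# Dave shares nothing with anyone, so no path exists.
print(shortest_path(graph, "Alice", "Dave"))
```

In a relational model the same question would typically need a chain of self-joins whose length has to be known in advance; in a graph the traversal simply follows edges until it finds a connection or runs out of nodes.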
What is in store for the future of graph technology?
So I think with the graph industry, we’ve so far seen a story similar to what we saw with Hadoop when MapReduce emerged in the mid-2000s. Although those were more or less developed by a few people based at the time, I think, at Yahoo and Google, the Hadoop ecosystem expanded very quickly to include solutions such as YARN, Spark, Pig and Hive.
And as a result there are now numerous companies operating in that space, some of the big players being Amazon, Cloudera, Hortonworks and so on. But that’s just the big players; the market is very busy.
Similarly, with graphs, we saw in the early 2000s, there were a small number of graph vendors offering (back then) a fairly limited number of products and solutions.
Over the last decade, we’ve seen the graph ecosystem explode, and there are now dozens and dozens of companies offering everything from specialized visualization tools to graph computing frameworks to niche graph databases, and pretty much everything else in between.
That said, I don’t think the industry has matured. And I think we’ll see a contraction in the future, maybe something similar to what we saw happen in the 1980s with the relational database industry.
That industry came to be dominated by a small handful of significant players, such as Oracle, Microsoft and IBM. And they came to dominate that market because of the adoption of SQL as the single industry standard query language.
In the graph industry, we’re beginning to see something slightly similar in that there’s a standard graph query language that has been developed and is being backed by a number of the larger players, such as Neo4j. Some of that movement has only just started, so it’ll be interesting to see how it develops in time and whether it causes a contraction in the market.
Do you anticipate a turning point where graph replaces SQL databases?
I think it is good to recognize that there has certainly been consistent growth in the adoption of graph databases over the past decade or so. Certainly there’s been a massive spike in interest in graphs in the last few years.
However, I think it’d be very premature to say that the market’s mature in terms of the adoption of it. I think a lot of companies still see it as a bit of a niche solution and are not quite sure what to do with it.
We are some ways away from graph databases replacing SQL databases. To be honest, in my view at the moment, I don’t see graph databases replacing SQL databases. Instead, I think we’ll maybe see a trend toward them sitting side by side for a while.
That said, longer term, there are developments that will help the adoption of graph and the growth of graph databases. I’ve already mentioned the movements within the industry to create an industry-wide standard graph query language. And I think the development of that is critical.
I also think making graph databases and technologies more easily accessible to clients and to developers is key. For example, we’ve seen the introduction of Neo4j’s own fully managed cloud service, Aura. So I think these sorts of things will really help the adoption and growth of graph.
When will graph databases become more mainstream?
So it is always difficult, in my view, to make a prediction as to when. I’m wary about making specific predictions like that; we’ve seen a trend in the tech industry to make hyped-up claims about whatever new technology or product is the flavor of the month.
We’ve all seen claims that such and such a technology will introduce a new paradigm and revolutionize things. And unless you invest, now, now, now, you’ll be left behind. Invariably those predictions are hype and it’s part of the marketing machine. But that said, I think the claims and noise that have been made about graphs and graph databases and graph analytics are much more accurate and reliable.
The industry isn’t yet mainstream, but equally the industry isn’t immature either. It has to be remembered that graphs and graph databases are not actually new, and I think that’s quite easily forgotten.
From a computer science standpoint, you can actually trace the origin of graph databases back to, I think, the mid-1980s, although commercial graph databases didn’t really take off until the mid to late 2000s. At least, that’s true of graph databases that satisfied ACID guarantees and so on.
From a mathematical perspective, which is one of my key hobbies, you can trace the theory of graphs back to the time of Leonhard Euler and the Seven Bridges of Königsberg problem around 1735 or ’36.
So graph theory has been around for a long time. You can actually turn the question almost all the way around, in my view. Graphs were well ahead of their time. And so the question isn’t whether graph databases would become mainstream or not, but rather, why have they not been mainstream until now.
I think the answer is really that graphs needed our age of big data to arrive to come into their own. They needed that orientation to query these vast volumes of data and to then make sense of that data.
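For readers unfamiliar with the Seven Bridges of Königsberg problem mentioned above, the sketch below works through Euler's argument in Python. The land-mass labels ("north", "south", "island", "east") are names chosen here for illustration. The check relies on the standard result that a connected graph can be walked using every edge exactly once only if it has zero or two nodes of odd degree.

```python
from collections import Counter

# The seven bridges, each joining two of the four land masses.
# Labels are illustrative: two river banks, the central island, the eastern island.
BRIDGES = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("north", "east"), ("south", "east"),
    ("island", "east"),
]


def degrees(bridges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for a, b in bridges:
        count[a] += 1
        count[b] += 1
    return count


def has_euler_walk(bridges):
    """Assumes the graph is connected: a walk crossing every bridge exactly
    once exists only if zero or two land masses have an odd degree."""
    odd = sum(1 for d in degrees(bridges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four land masses have an odd number of bridges (five on the central island, three on each of the others), so no such walk exists, which is the conclusion Euler reached.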
What’s next for Capgemini and the UK Graph Guild?
We’re very excited about the UK Graph Guild and what we can offer our clients and partners such as Neo4j and Linkurious, and of course others.
I mentioned our Haystack solution, and we are currently working very closely with our partners on that. I’m looking forward to taking Haystack to the next stage of its development early next year.
We will be launching a landing page for the UK Graph Guild early in the new year. It’ll be on Capgemini’s main UK site, where clients, partners and other interested parties can find information and useful articles on graphs. The site will provide access to global partners, white papers and more. I’m hoping that’s early next year, and then, possibly toward the end of the year, I would love to hold our first graph analytics conference for clients and hopefully join with a number of partners. A lot of things to look forward to next year.
Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at firstname.lastname@example.org