Neo4j’s Emil Eifrem and FirstMark’s Matt Turck on the Graph Database Explosion

At this year’s Data Driven NYC, Neo4j CEO and Co-Founder Emil Eifrem sat down with Matt Turck of FirstMark, and they discussed the evolution and explosion of the graph database space.

Emil previously spoke at the event in 2015, and it’s exciting to hear them reflect on how graph and its myriad applications have blown up over the last few years.

Emil and Matt jumped right into discussing Neo4j’s Series F investment round, and the conversation took off into a number of different directions from there. Check out their conversation below and learn more about the past, present, and future of Neo4j and graph technology.


The Largest Round of Funding in Database History

Emil Eifrem: We raised this round last summer. We went out with some numbers, like the valuation, for the first time, which was north of two billion. It was actually the largest round in database history. Mongo, for example, who was the early mover in, let’s call it, the broader, modern, non-relational database space – they raised a total of about 300 million cumulatively.

So it was the largest round in database history, which was exciting, because graph databases, which is the category that we’re part of, and that we helped define and evangelize at events such as Data Driven back in 2015, has, by and large, been seen as a really valuable corner of the database market, but also a niche market.

It used to be that people would say, “Yeah, great technology, kick-ass CEO.” Okay, maybe not too much that. But really just useful for social networks. That used to be the thing back in the day. Then fast forward five years, it’s like, “Well, great technology, but really just useful for a few use cases.” Then every year, those use cases start expanding.

Of course, we have the privilege of firsthand information, so we see the breadth of use cases. The perception is always lagging behind, naturally. So the fact that we then went out and raised this big round was one of the signals that the category is truly, truly taking off.

The Explosive Growth of Graph Databases

Matt Turck: Great. Give us a sense of how many are in the company right now. How big is the organization?

Emil Eifrem: We’re just north of 600 people. I have no idea how many we were back in 2015. We actually just, earlier today, we went out with a momentum release where we talked about how we crossed 100 million ARR last year.

Just to give it flavor, I think there are five database companies that have crossed 100 million – let’s call it the NoSQL crowd or modern operational database companies. It’s Mongo, and then it’s us and Redis. We’re on that Mongo path. Then there’s Couchbase and DataStax, who maybe have been traditionally on a little bit of a different path right now. They are growing maybe at a lower pace and plateauing. Maybe they’ll turn and become amazing again. But it’s really down to Mongo and us and Redis who are in that cohort at the moment.

Matt Turck: Yep. Why is the space accelerating, going from niche to much broader acceptance? I’ve seen that chart, that famous chart on DB-Engines, which showed that graph database is by far the fastest growing category in databases. I read somewhere that Gartner calls graph databases the foundation of modern data and analytics. What’s happening?

Emil Eifrem: Yeah. Look. There’s a lot of factors that I think are contributing to and accelerating and enabling the broader shift towards alternative databases, that aren’t specific to graph databases – things like the platform shift to the cloud, and then there’s advancements in architecture like microservices and containers that enable you to more easily swap in a new type of database. Stuff like that is as applicable to any database, as well as to graph databases. The thing that’s specific to us is this broader trend around the world becoming increasingly connected. The fundamental premise behind what we do is super simple, actually. In fact, today, people might even call it simplistic. Which is what I just said. Everything is increasingly connected. Hardly a controversial statement on a Zoom call from New York. I’m in Malmo, Sweden right now. A bunch of people are, I’m sure, calling in from New York, but also elsewhere, probably, on the planet.

Everything is becoming more connected. We all know that intuitively. But the consequence of that is a little bit more subtle. What is data? This is Data Driven New York, right? What is data? Well, data is information that describes the real world. So as the real world is becoming more connected, data is becoming more connected.

That’s neither good nor bad. That’s just an objective observation of what’s going on. But what that implies and the consequence of that is that connected data exerts this massive amount of pressure on the traditional relational database, because the normal relational database works with tables.

You can model connected data in tables. You call them foreign keys. You have a record with an identifier, and then you have another record with another identifier. So Matt, you have ID three, and I have ID seven, and we’re connected, so then there’s a three and a seven showing that we’re connected. You can do it, but it’s really awkward. If you want to query along it, if you want to find patterns, how things fit together, it completely starts breaking down.
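To make the foreign-key modeling concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (person, knows) and the third person, Ada, are assumptions made up for illustration; only the IDs three and seven come from the conversation. The point it shows is that every extra hop along a relationship costs another self-join.

```python
import sqlite3

# In-memory relational model of "who is connected to whom".
# Table names and the extra person (Ada) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE knows (
    src INTEGER REFERENCES person(id),
    dst INTEGER REFERENCES person(id)
);
INSERT INTO person VALUES (3, 'Matt'), (7, 'Emil'), (9, 'Ada');
INSERT INTO knows VALUES (3, 7), (7, 9);
""")


def friends_of_friends(person_id):
    """Return names exactly two hops away; note one join per hop."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM knows AS k1
        JOIN knows AS k2 ON k2.src = k1.dst
        JOIN person AS p ON p.id = k2.dst
        WHERE k1.src = ?
        ORDER BY p.name
        """,
        (person_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(friends_of_friends(3))
```

A three-hop question would need a third join on the knows table, a four-hop question a fourth, and so on, which is the awkwardness described above.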

So what we did 100,000 years ago, when dinosaurs ruled the earth, we came up with this concept of what’s called a native graph database. We’ve optimized every layer in the stack of the database architecture completely around connected data. We’re not built on top of a different database running in the back. It’s a native architecture.

That means that if you want to query along how things are connected, want to find patterns in that, we are frequently not 50 percent faster or 100 percent faster – we’re a thousand times faster. Our customers frequently tell us that we’re a million times faster.

So when you want to do a recommendation engine, you want to find patterns in, “Wait. Who is Matt similar to, and what have they purchased? How are they connected and connected to the product hierarchy?” – that’s typically five, 10, 12, 15 hops in a connected data structure. A graph database is freaking amazing at that.

Emil Eifrem: Connected Data Visionary

Matt Turck: Just to recap, just to make sure it’s clear to everyone. First of all, “graph database” – you coined the term if I remember correctly.

Emil Eifrem: Yeah.

Matt Turck: That’s when you started the company in 2007 or something like that. You’re literally at the origin of the space, which was just your idea and has now become a whole space, with different companies and competitors and all those things. You really pioneered all of this, just to put it in context.

A graph database is a database that elevates relationships as first-class citizens, as opposed to just rows and columns. The product understands how entities are connected to one another, in the most simple layman’s term. Is that correct?

Emil Eifrem: That’s spot on.

Matt Turck: OK. What are the use cases? You just mentioned recommendation engines. I think Airbnb is a classic example of that. Give us a range of the different use cases, including how Neo4j customers use the products.

Emil Eifrem: Right. For example, recommendation engines is one example. Fraud detection is another one. Capturing fraud rings – that’s all about a number of transactions that individually are OK.

Matt Turck: Why is fraud a graph, for example? Why?

Emil Eifrem: Yeah, exactly. Right? Traditionally, you wouldn’t think of it in that way, but what all fraud detection software is doing out there is trying to find anomalies.

Let’s say it’s credit card fraud. You would have two dimensions. One is the number of transactions. The other one is dollar value per transaction. Then you’d kind of draw a scatter plot of that. You would find a band of what’s normal. Everything that’s outside that is an anomaly. So then fraud detection analysts investigate that anomaly. Basically like that, except it’s not two dimensions, it’s like 19 dimensions or something. But conceptually, it’s the same.

So that’s great. That will capture a bunch of different things. What it won’t capture is what if you have a number of transactions that are all within this band of what’s normal, but they’re connected in fraudulent ways, like a fraud ring. The only way you can find that is if you can operate on connected data. That’s what graph databases do. That’s another classic use case.
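As a rough illustration of the fraud-ring idea, the sketch below groups accounts that are linked through shared identifiers, even though no single record looks anomalous. The account names, the choice of devices and phone numbers as identifiers, and the size threshold of three are all assumptions for the example; it uses a plain union-find in Python rather than a graph database.

```python
from collections import defaultdict

# Hypothetical records: (account, identifier the account was seen with).
links = [
    ("acct_a", "device_1"),
    ("acct_b", "device_1"),
    ("acct_b", "phone_9"),
    ("acct_c", "phone_9"),
    ("acct_d", "device_4"),
]


def connected_groups(links):
    """Group accounts that are reachable from one another via shared identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for account, identifier in links:
        union(account, identifier)

    groups = defaultdict(set)
    for account, _ in links:
        groups[find(account)].add(account)
    return sorted(sorted(group) for group in groups.values())


# Flag groups of three or more accounts as candidate rings (threshold is arbitrary).
rings = [group for group in connected_groups(links) if len(group) >= 3]
print(rings)
```

Here acct_a and acct_c never share an identifier directly; they only end up in the same group through acct_b, which is exactly the kind of connection a per-transaction anomaly check cannot see.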

Then you have a bunch of other things like customer 360. How’s my individual customer connected to external social media, but also all of my internal systems? Data lineage, very important in regulated industries. How does an individual data item evolve over time? For GDPR and compliance reasons, you might need to do that. Entitlements or identity and access management. KYC. You go down the list, it turns out that there’s a lot of use cases where the value is in how things fit together.

Then coming back to your original question, why is the category taking off? I say, well, it’s because everything is becoming more connected.

I’ll give you an example of this. When you and I first met in 2015, supply chain was not a use case for Neo4j. Why? Because most companies that produced physical goods, that produced stuff, they might have a supply chain that is two, three levels deep. So if you want to digitalize that and analyze it, you can shove that into a classic relational database. A little bit awkward, your engineers will have to compute some joins and whatever, but doable.

Fast forward to today. In 2020 in particular, at the start of the pandemic, for sure. Today in 2022, any company that is producing physical goods is tapping into this global supply chain spanning continent to continent. That is frequently 20, 30 hops deep. All of a sudden, if you recall last year, the Suez Canal was blocked for a week. How does that cascade across my supply chain? Well, the only way you can figure that out is by digitalizing your supply chain. Then all of a sudden, you’re dealing with this deeply connected data structure.
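The cascade question can be sketched as a breadth-first traversal over a supplier graph. The network below is entirely made up for illustration (node names such as shipping_lane and our_factory are hypothetical), and a real supply chain would be far deeper than this handful of hops.

```python
from collections import deque

# Hypothetical supply network: each key supplies the nodes in its list.
supplies = {
    "shipping_lane": ["port"],
    "port": ["chip_fab", "steel_mill"],
    "chip_fab": ["board_maker"],
    "board_maker": ["our_factory"],
    "steel_mill": ["our_factory"],
    "local_farm": ["cafeteria"],
}


def downstream_impact(graph, disrupted):
    """Return every node affected by a disruption, with its hop distance."""
    hops = {disrupted: 0}
    queue = deque([disrupted])
    while queue:
        node = queue.popleft()
        for customer in graph.get(node, []):
            if customer not in hops:
                hops[customer] = hops[node] + 1
                queue.append(customer)
    del hops[disrupted]  # report only the downstream nodes
    return hops


impact = downstream_impact(supplies, "shipping_lane")
print(impact)
```

In this toy network the factory is hit three hops from the blockage, while the cafeteria, which is supplied through an unrelated branch, is unaffected.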

If we abstract that and we figure out what’s actually happened here, well, what’s happened is that it’s exactly the same use case as back in 2015, when I was on stage in New York. It’s exactly the same use case, but the world is so much more deeply connected now, and therefore data becomes more connected. Therefore it’s now a kickass use case for Neo4j and graph databases. This is just happening across use cases, across industries, across verticals. That’s the wind behind our back.

Cypher as a Solution

Matt Turck: If I’m a technology person at a company, and I have data problems, how do I figure out what I use for different problems? You have key value stores, you have document databases, you have relational databases, you have graph databases. How do I choose the right tool, and how does it all work together?

Emil Eifrem: Yeah. It’s actually pretty simple. You start with the shape of the data, and you look at the query workloads that you want to run in that data. If that data is very tabular, if it’s a payroll system, and you want to record all the individuals, and they’re all well structured, all of them have exactly the same schema, and you want to calculate average salary, and blah, blah, blah, blah, stuff like that, awesome. Relational database. Go.

Or if you have a bunch of JSON documents sitting around, and you don’t really care how they’re connected, document database. Go.

Or if you have a dataset that is highly complex, that is evolving, where the business requirements change, where the value is in how things fit together, like a shopping cart, which is connected to order items. Those order items are connected to a product, which sits in a product hierarchy, and how things fit together, a graph database is your best fit.

So that ends up being the first go-to move. Look at the shape of the data and the queries you want to run on that. That’ll clue you in very rapidly where you should try to evaluate first.

Matt Turck: I think, to be a Neo4j user, you require people to use a different language called Cypher. I’m just curious how that compares to SQL, which is really the language that everybody knows for databases. Why is that a different language, and how steep is the learning curve if you know SQL to know Cypher?

Emil Eifrem: Yeah. The big comparison is probably something like the following. SQL is old and boring. Cypher is new and sexy. Done. That’s it.

No, it’s actually spiritually very similar. It’s a declarative query language, which basically means that you don’t have to write imperative programming language code, depending on how technical the audience is. But you can type it in a very simple… You can describe what pattern you’re looking for. You draw it – some of the people who are older in the audience will recall this – with something called ASCII art, which is basically you end up drawing… You draw nodes using parentheses, and then with arrows you describe the little pattern. Then you throw that to the graph database, and it’s going to find that pattern and return it back to you. So spiritually very similar to SQL, but really pretty astounding…
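For readers who have not seen the ASCII-art style, here is a small sketch. The Cypher text is held as a plain Python string and is not sent to any database; the labels (Person), the relationship type (KNOWS), and the sample data are assumptions for illustration. The function underneath it does by hand, on an in-memory list of edges, roughly what that one pattern asks a graph database to do.

```python
# Nodes are drawn in parentheses, relationships as arrows with square brackets.
# This string is illustrative only; nothing here connects to Neo4j.
CYPHER_QUERY = """
MATCH (a:Person {name: 'Matt'})-[:KNOWS]->(b:Person)
RETURN b.name
"""

# Hypothetical in-memory stand-in for a graph.
nodes = {
    3: {"label": "Person", "name": "Matt"},
    7: {"label": "Person", "name": "Emil"},
}
edges = [(3, "KNOWS", 7)]


def match_knows(start_name):
    """Hand-written equivalent of the pattern above, for comparison."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if rel == "KNOWS"
        and nodes[src]["label"] == "Person"
        and nodes[src]["name"] == start_name
        and nodes[dst]["label"] == "Person"
    ]


print(match_knows("Matt"))
```

The declarative version states only the shape being looked for; the loop and the filtering conditions are what the database works out on the user's behalf.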

One of the biggest things that have happened since 2015 – it’s probably a good thing for us to contrast to what it was like the last time we spoke – is that Cypher is the most popular graph database query language, but what we ended up doing is that we went to the SQL committee, the committee that is standardizing SQL, and we said, “You know what? We don’t want Cypher to be proprietary just to Neo4j. Yes, today it is one of our key competitive advantages to other graph databases out there. But the entire space is better served if there’s a unified standard query language for all graph databases.”

Just as a little bit of a background here. Every single new database paradigm, since the mid-’90s, has gone to the SQL committee and said they want to standardize the query language. Object databases tried that in the mid-’90s. The SQL committee said, “You know what, object databases? You’re just a feature of SQL. So we’re going to incorporate some of your functionality into SQL, but that’s it.” XML databases, in the early 2000s, went to the SQL committee, and the committee said, “You know what? We can just sprinkle some XML syntax into SQL.” Document databases, in the mid-2010s, went to the SQL committee and said, “We want to standardize how you query document databases.” SQL committee, “Nah. You’re just a dialect of SQL. We’re going to spray some JSON into SQL. It’s not needed.”

For the first time ever in the history of databases, the SQL committee looked at Cypher, looked at graph databases, and then said, “You know what? This category is here to last. This is an actual sibling to SQL.” They created the GQL query language, which is, at this point, 98 percent identical to Cypher, our query language. It’s, again, the first time in 40 years this has happened. I think that’s a pretty stark blessing around the future and the value of graph databases as a category.

Why Choose Neo4j in a Flooded Market

Matt Turck: Great. A couple of questions from the audience that very much cover where I was going to go next, so let’s use those.

First question from Balaji. He says, “There’s been a flood of investments in the graph DB space. How does Neo4j differentiate itself? More broadly, is there opportunity for more than one player to exist?”

Emil Eifrem: Yeah. It’s a great question.

A couple of things on that in terms of differentiation. We’re the OG graph database. We’ve been around the longest. If you attend Data Driven New York, you’re probably somewhat clueful about data, so you’ll know that in many product categories, you want to be the new kid, in many ways. For a database, maturity, robustness, stability is actually a key part of the value proposition.

So the fact that we’ve been around, we were the OG, the one that defined it and so on and so forth, is actually a massive advantage. Because what this means is that we have by far the most robust product, by far the biggest developer community, and by far the biggest reference account base, so the most customers by far of all graph databases out there.

We also happen to have this modern, which maybe sounds a bit weird, this native graph architecture, whereas a lot of the more recent entrants, as the graph space has become hotter and hotter, what they try to do is layer graph functionality on top of their existing core. They don’t take the native approach, which takes forever to build. But that’s ultimately the only way to get to the scalability and the performance. That speaks to the first question.

In terms of is there room for more, I absolutely believe so. I think this is an absolutely massive market. Databases is the biggest market in all of enterprise software. It’ll soon be a $100 billion market. I think graph databases can be a significant chunk of that, 20, 30, $40 billion. So obviously, there’s room for more than one company.

Matt Turck: One of Balaji’s next questions was precisely to your point about the established customer base. If you could share some customer growth profile, like how many customers, how fast are you acquiring, in what space, what industries, what verticals – anything you can share.

Emil Eifrem: Yeah. We have over 1,000 customers in production right now, and hundreds of thousands of active developers in our community. That’s just to give you some quantifiable things.

Then over 75 percent of the Fortune 100 are using Neo4j today. All 20 of the biggest banks in North America – all 20 of them are using Neo4j. Seven of the 10 biggest retailers in the world are using Neo4j. Four of the five biggest telcos. That gives you a little bit of a flavor.

99 percent of this will be a data thing, because we’re still, I guess, in the pandemic era. Matthew, we were just on a plane. Anyone who’s ever ordered a flight ticket, 99 percent of all flight ticket calculations – so which route should I go from Point A to Point B when I fly from Paris to New York, is that a direct flight, do I connect in Heathrow, how do I get there – is done with Neo4j. 99% of all airfares.

Matt Turck: That’s a crazy stat. That’s amazing.

Emil Eifrem: Yeah. Then every single room you’ve ever booked at a Marriott or any hotel that is owned by Marriott – so the Ritz Carlton and all that stuff – all of that is calculated with Neo4j. So very likely, you’ve actually used Neo4j, if not today, at the very least this business week. That gives you a little bit of flavor.

Matt Turck: Very cool. Couple of questions from Gaurav. First question is, “Emil, who is your favorite Indian American board member of all time?”

Emil Eifrem: Okay. I assume Gaurav is Gaurav Tuli, who was on my board for the longest time. He’s with a firm called F-Prime Capital. He was, for sure, the MVP of my board, which I’ve been saying publicly and privately.

No offense to any particular VCs on this call, but if you have any opportunity to raise money from F-Prime or, for that matter, FirstMark, I have to add, you should go ahead and do it.

Neo4j as a Category Creator

Matt Turck: All right. Very good.

Second question. “Although graph theory as a math concept is not new, you’ve evangelized a new category of graph databases for a long time. That must have been lonely. Can you talk about some of the highs and lows of the journey? Now that Neo and the category have “made it,” quote, end of quote, can you talk about any secrets to category creation in the data world?”

Emil Eifrem: Yeah. It’s such an interesting… I’m obviously an engineer by background and training, but I’m a student of and a lover of marketing. I think marketing is very interesting. Category creation happens to be one of the areas that I really love in marketing.

One of the reasons that I love category creation is that it’s so counterintuitive. For example, when you start out … We coined the term graph databases back in the days. When we did that, we started thinking, “What does success look like 10 years down the line? What does success look like?” Well, success looks like we have a bunch of big companies that are competing against us. That’s what success looks like.

You look at today, and you see who’s participating in the graph space. It’s Amazon, it’s Microsoft, it’s Oracle, it’s SAP. The entire axis of evil of enterprise software companies are in the space, along with a cohort, one of the previous questions alluded to, a cohort of younger startups.

That’s what success looks like when you do category creation, that you have a thriving category. Because if not, then you’re probably not doing something that is valuable enough.

That’s one of the things that, in the early days, you’re just talking to everyone, and you’re evangelizing. Every single person that knows graph databases and understands the value of them – you either talked to them directly or they’re one hop away.

Then all of a sudden, there’s a tipping point where, “Wait. I have no idea how this person heard about graph databases.” So it’s starting to truly resonate in the market. I think that was a huge tipping point for us. Part of that is, honestly, getting a bunch of competitors in the space, which is a net positive thing for us, as the leader in the space.

Matt Turck: Together there was persistence, lots of talks, lots of content creation.

Emil Eifrem: There’s a ton of that, and then a deep focus on practitioners. We go to market by winning the hearts and minds of developers. Yes, we love to monetize the companies where they work, but we’re open source. We give it away for free. We have a free tier in our cloud service. It’s called Aura. And we have a free tier of that one.

We win the hearts and minds, and then they wake up, and they realize that they work at one of those top 20 biggest banks in North America, and they have a problem, and they have a bunch of connected data, and they realize, “You know what? Graph database would be a great fit for this. I played around with it over the weekend or in the evenings and whatnot. This would be a great fit for it.” That’s when we engage commercially.

The other piece that we haven’t talked about, a real high order bit that has changed since we last spoke back in 2015, is what I just told you is absolutely accurate, the fact that we are so developer centric, but today, and this happened just in the last 12 to 18 to maybe 20, at most 24 months, data scientists are an equally as big of a persona for us as the developer.

If you look at our top line metrics around awareness, or visits to, or leads or engagement – whichever way you want to slice and dice it, data scientists are as prevalent today as developers. It turns out that the initial value prop for developers to build applications on connected data, it’s as true as it ever was, and it’s a massively growing thing and so on and so forth. But data scientists, they’re increasingly realizing that, “You know what? If I can extract how things are connected and use that as a signal, the relationships between data points as a signal into my machine learning, all of a sudden I can increase my level of predictiveness.”

Google moved there five, seven years ago. They spoke publicly about it, graph-based machine learning. It’s kind of true – where Google was 10 years ago is where the rest of the enterprise is today, kind of a thing. Neo4j is by far the best engine for that.

Matt Turck: Yeah. Balaji was asking if you were leveraging graph neural networks.

Emil Eifrem: Awesome. Yeah. That’s fun. That’s exactly what I’m talking about here. This is an area where Neo4j is very unique amongst databases.

You mentioned the site DB-Engines. DB-Engines today tracks over 350 databases, which is crazy. When I grew up as a developer in the mid-’90s, there were four or five databases to choose from. And they were all the same. They were all relational databases. Now there’s over 350.

There’s also … I think there’s a great landscape thing that some guy’s posting every year. That’s a great way to make sense of that. I don’t know if you heard of that, Matt, but I think you might want to look into it.

Matt Turck: Yeah. That sounds like I don’t know why one would do that.

Emil Eifrem: That sounds like a crazy thing to keep track of.

This is a pretty powerful thing. Out of those 350 databases, developers use them and get value from them. Data scientists – they don’t want to use a database. The only reason a data scientist goes to a database is to get data out of it. They go to the database, not for value, but to get the data out of it, and put it in their normal machine learning tool chain. With exactly one exception out of the 350, one exception, Neo4j. They go to Neo4j to put data into Neo4j to be able to use relationships as a signal into their machine learning.

So we built out an entire new stack called GDS, Graph Data Science, that is built on top of the graph database, that is targeting machine learning and AI driven by data science. This is an entire new motion and persona for us. It’s a very unique thing.

If you think about us, fast forward a couple of years – as a public company, we have a deep developer adoption, an OLTP system of record for these core use cases in the enterprise, as well as being this essential must-have ingredient for any machine learning pipeline out there. A deep developer community and data science community, that’s a really powerful combination in one company.

Neo4j’s Go-To-Market Strategy

Matt Turck: Yeah. That’s a good place to be.

Just to finish up on that, on your go-to market motion. A lot of companies that we speak with, a lot of people want to do that open source, bottoms up effort. In many ways it feels like you’re wandering through the desert for a long time, because you talk to individual developers that may or may not want – or may or may not have any budget – to buy your product.

At what point did you switch to targeting the larger enterprises? At what point did you get a sense that this was working, and what did you do? Did you build a sales force to go after the larger enterprises? At what point do you go from bottoms up to tops down, if ever?

Emil Eifrem: Yeah. I was going to say if ever. On some level, we had a bifurcated approach, where we built the community, and that is the long term focus and the right thing to do and so on and so forth, but then we also went out hand in hand with enterprise sales.

We tried to identify, for these core use cases, where people have a lot of connected data today, not where they will have connected data five years from now, because everything is becoming connected, but today – which are really valuable enterprises willing to pay hundreds of thousands of dollars?

We tried to identify them. We knocked on doors through our own personal network, or our graph, as we like to call it, and sold to that. But that was much more to seed the community, get some of those anchor lighthouse accounts and stuff like that. So we had a bifurcated approach like this in the early days.

About five years ago, probably around the time for Data Driven New York, at that point, we had shifted… Over 85 percent of our ARR back then, and still true today, originates with an individual practitioner. It used to be an individual developer. Now it’s an individual developer or a data scientist who found us, used one of the free SKUs, be it the on-prem community edition or the free tier in the cloud, played around with it, and then over time realized, “Oh, I want to put this in production.”

Then there’s an entire monetization path for them and a PLG path for them in the cloud. Then all kinds of monetization triggers to shift mode to the enterprise edition on the on-prem. That’s all a bottom up motion.

Then we have some air cover. We don’t sell top down ever. We don’t go in and knock on a CIO’s door and sell top down. We do provide air cover there through GSIs, through some of the Gartner quotes. There’s an endless list these days of massive validation for the category as a really deeply strategic investment for any Fortune 500 company. That really helps. But the bottom up way of going to market is still the fundamental way that we take it to market.

Cloud Is the Future

Matt Turck: All right. One last question, since we’re over time, but this is fun. A question from Tony. “Has the cloud changed your addressable customer base compared to the on-prem days?”

Emil Eifrem: Oh, totally. If you think a little bit about what we did in the early days, we broadcasted the value proposition of graph databases towards developers initially, and then more recently to data scientists. Where? Data scientists where? Data scientists and developers everywhere, any geography, any size company, hobbyists, professionals, wherever they were.

Then, because we were in the on-prem world, because I think that was the question – “How’s the cloud changed things?” In the on-prem world, we then monetized a very thin slice of that, which is, specifically, you are at an enterprise company, Global 2000 company. You have a use case that is worth hundreds of thousands of dollars. You have access to that type of budget. You’re in North America and Europe. That’s where we monetized in the on-prem world. So a very thin slice of this broader awareness that we had created.

With the cloud product, of course, all of a sudden, we have a free tier, we have a really cheap, tens of dollars per month type low-end offering, the entire spectrum, all the way up to million dollar mission critical deals for an enterprise that is globally available.

So now all of a sudden, none of those constraints are true. It’s all geographies. It’s all sizes of companies, not just Global 2000, but mid-market and small all the way down to individual developers. That’s a massive TAM expansion just on the developer side. Then you add data scientists on top of that. That’s a really big slice of the overall data pie.

Matt Turck: All right. Wonderful. Well, it’s a quarter past midnight your time.

Emil Eifrem: Right.

Matt Turck: You’re remarkably awake and energetic.

Emil Eifrem: It’s called coffee, my friend.

Matt Turck: Yes, exactly. Well, that seems to be working. This conversation was brought to you by Red Bull and coffee.

All right. This was wonderful. Emil, it’s so cool to see the journey over the last few years.

Emil Eifrem: It’s only just begun, my friend.

Matt Turck: Yeah. It feels like it. It feels like you are tackling a market that was already super large and that’s in the process of becoming gigantic. If it becomes the cornerstone of machine learning, that’s as big a mega trend as it gets. So fantastic progress.

Thanks for coming back and telling us a story. We’ll continue to root for you. Maybe by the next Data Driven, you’ll come back as a public company CEO. That would be a lot of fun.

Emil Eifrem: Sounds like a plan, my friend.

Make sure to follow Emil on Twitter and subscribe to Emil’s blog to stay updated on what he does.
