Lean Graph Data Models Drive Fast Innovation: A Fireside Chat with David Fox, Senior Software Engineer at Adobe


“When we overhauled our Cassandra infrastructure, we went from a 50-terabyte Cassandra dataset to a 50-gigabyte Neo4j dataset, and a lot of people are very surprised by that. That’s what attracts the most questions and attention,” said David Fox, Senior Software Engineer at Adobe.

In this fireside chat (conducted at GraphTour NYC), David Fox speaks with Greta Workman, Marketing Programs Manager at Neo4j, about how he brought Neo4j to Adobe’s Behance social media platform and about his vision for the future of Neo4j.



Greta Workman: How did you originally find Neo?

David Fox: In 2013 I worked for an online dating company. There were a lot of use cases in dating at that point. We had a use case where our users had a lot of Facebook-connected data and they were looking to make matches with other users.

We wanted to be able to show friends of friends, meaning the mutual connections you had with the people you were looking at within the app. Then, because there weren’t enough immediate connections, we wanted to go out one more degree, to friends of friends of friends.

So we were looking for ways to pull off that use case way back then. And one of my colleagues found Neo4j; we actually work together at Adobe now. We got into it and saw it was a really powerful tool for that use case.
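The friends-of-friends idea described here is a bounded breadth-first traversal. The sketch below is a minimal illustration in Python, assuming a hypothetical in-memory adjacency map of user names (not the dating app’s actual data or schema); it finds every connection up to a given number of degrees out, plus the mutual connections between two users. In Neo4j itself, this kind of lookup would typically be written as a variable-length Cypher pattern rather than hand-rolled code.

```python
from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected friendship graph as an adjacency map (hypothetical toy data)."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connections_within(graph: dict[str, set[str]], start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first search: map each user reachable within max_depth hops to their degree of separation."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # don't expand past the requested degree
        for friend in graph.get(user, set()):
            if friend not in depth:  # first visit in BFS is the shortest distance
                depth[friend] = depth[user] + 1
                queue.append(friend)
    del depth[start]  # a user is not their own connection
    return depth


def mutual_connections(graph: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Friends that users a and b have in common."""
    return graph.get(a, set()) & graph.get(b, set())


friends = build_graph([("ana", "ben"), ("ben", "cho"), ("cho", "dev"),
                       ("dev", "eli"), ("ana", "fay"), ("fay", "cho")])
print(connections_within(friends, "ana", 3))      # ben and fay at 1, cho at 2, dev at 3
print(mutual_connections(friends, "ana", "cho"))  # ben and fay
```

Here "eli" sits four hops from "ana", so it only appears once the depth limit is raised to 4; the limit is what keeps a "friends of friends of friends" query from wandering across the whole network.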

You were the one to bring Neo4j into Adobe, so can you talk a little about your initial implementation?

We had an activity feed for Behance, which was our main logged-in user experience, and it was running on Cassandra when I joined the company. From the beginning, that implementation had some scaling and performance issues. When I was introduced to the project, I saw a possible graph use case. It took a couple of years, but we ended up implementing Neo4j, which gave us better infrastructure and cost savings.

You’ve got a couple of other implementations where you’ve used Neo4j. Can you talk a little bit more about that?

We implemented Neo4j to replace Cassandra and we were really happy with the results. Slowly, our product team started coming up with ideas they wanted to execute and asking if they were possible. We said yes, the graph could handle this case.

One of those early use cases was Work in Progress. The product team wanted to be able to recommend a larger subset of that content to specific users, and we already had a lot of the data we needed for that kind of algorithm in Neo4j.

To execute that, we added a little more data and were able to build a nice recommendation algorithm for Work in Progress. We saw a 20% increase in people interacting with Work in Progress pages among those shown recommendations, compared with people who didn’t see recommended Work in Progress, so it was a successful feature.

This summer, we overhauled our logged-in user experience. We now have a continuous feed that merges all these different views of content, and it’s a well-thought-out presentation of data that prioritizes what our users see.

At what point during the process did you realize that you could have multiple uses for Neo4j and drive innovation?

We brought it in knowing it could address this specific use case, and then the other features and use cases naturally started to surface. Product would ask for things, and we would say, “Well, that’s a good use case for the graph we already have, and we could probably execute it quickly.” We saw that happen a number of times on these projects, which was pretty cool.


What was it about Neo4j that led you to choose it as your database for these projects?

When we overhauled our Cassandra infrastructure, the deciding factor was really how lean our graph data models could be. I always say that we went from a 50-terabyte Cassandra dataset to a 50-gigabyte Neo4j dataset, and a lot of people are very surprised by that. Even though I like to talk about other things, that’s what attracts the most questions and attention.

It’s a surprising number. It’s like, wait, where did all your data go?

Yes. It shows what happens when you move from a model with a lot of repeated data, where the app stores a copy of every piece of data for each user, to a model where there’s pretty much no repeated data. You see that kind of reduction in a lot of cases.
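To make the repeated-data point concrete, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, a fan-out-on-write feed in which every activity is copied into each follower’s feed, compared with a graph in which each user and activity is stored once and connected by hypothetical FOLLOWS and POSTED relationships. The names and numbers are invented and do not describe Behance’s actual Cassandra schema or data.

```python
# Hypothetical model: `follows` maps each follower to the authors they follow,
# `activities` maps each author to the activity IDs they have posted.
Follows = dict[str, set[str]]
Activities = dict[str, list[str]]


def fan_out_rows(follows: Follows, activities: Activities) -> int:
    """Denormalized feed: one stored copy of each activity per follower of its author."""
    return sum(len(activities.get(author, []))
               for authors in follows.values() for author in authors)


def graph_records(follows: Follows, activities: Activities) -> int:
    """Graph model: each user and activity stored once, plus FOLLOWS and POSTED relationships."""
    users = set(follows) | {a for authors in follows.values() for a in authors} | set(activities)
    activity_nodes = sum(len(acts) for acts in activities.values())
    follows_rels = sum(len(authors) for authors in follows.values())
    posted_rels = activity_nodes  # one POSTED relationship per activity
    return len(users) + activity_nodes + follows_rels + posted_rels


def feed(follows: Follows, activities: Activities, user: str) -> list[str]:
    """Build a feed at read time by traversing follower -> author -> activity."""
    return [act for author in sorted(follows.get(user, set()))
            for act in activities.get(author, [])]


# Toy numbers: one author with 1,000 followers and 100 activities.
follows = {f"user{i}": {"artist"} for i in range(1000)}
activities = {"artist": [f"activity{j}" for j in range(100)]}
print(fan_out_rows(follows, activities))   # 100000
print(graph_records(follows, activities))  # 2201
```

The stored copies grow with followers multiplied by activities, while the graph grows with their sum; the feed is assembled by traversal instead of being precomputed per user. How much this saves in a real system depends on its data and access patterns.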

You’re the one who brought Neo4j into Adobe. What was the buy-in process like?

Adobe is obviously a huge tech company. We have a lot of teams all over the place. One of the first things was finding out if anybody was using Neo4j. And people had played with it in the past. We have a wiki where we can see everything that’s happened historically. Nobody was using it in production at the time.

The actual pitch process was pretty difficult because, in general, I think tech companies and developers are sometimes hesitant to adopt new technologies, especially when it comes to how they model their data. People tend to become complacent with relational databases. The project really blossomed when I built a proof of concept on my laptop with the actual data we would be using and showed it to them, saying, “Hey, look, this is performant. It uses exponentially less data.”

Very cool. What was the development like? What was the implementation like?

It was a good process. We had a small team to implement it: I did most of the development, and a DevOps team did most of the infrastructure provisioning. It was pretty straightforward. One thing I’m proud of is that we had zero downtime when we cut over.

You had a large team that had to learn graph. What was that learning curve like?

I found that people might not know anything about graphs or Cypher, but it’s pretty intuitive. We work with really great developers, and they’re able to pick it up quickly. Our research team, who had no Neo4j experience, built the recommendations use case and learned it very fast.

Were there any surprises, either during the process or after implementation? Did Neo4j let you do anything you didn’t expect?

Even though I knew how well it works, I was somewhat surprised by how intuitive it was to innovate on top of it and how naturally the use cases came to us. We would say, “Product wants this, and we think it’s going to make the user experience better; do we have a way to do it?” With our graph dataset, the answer very often seems to be yes.

Another thing I touched on before is how people who have no experience with Neo4j are able to pick it up pretty quickly if they’re skilled developers and skilled Ops people.

Some of the Ops people who work on our Neo4j infrastructure have taken it to a really cool level. We run it really effectively: we have a whole backup strategy with hourly backups, plus automated restores that test to make sure everything is still working. That process was developed by a DBA I work with, who took to Neo4j really quickly.

How big is the team that’s working on Neo?

The immediate team that actually touches it is small: about three Ops people who work on it occasionally, plus a DBA, my colleague who’s done a lot of the backup strategy and some really cool stuff there.

What’s next?

We have some cool use cases in development. One of them is moving user-facing statistics from a legacy system to Neo4j.

Another one is in our search experience. We want to suggest tags that are similar to the ones users have chosen or that could enhance their search, and we think the collaborative filtering we have in the database could be really powerful for that, so that might go in pretty soon.
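One simple form of item-based collaborative filtering for tags is co-occurrence counting: suggest the tags that most often appear on the same projects as the tags a user has already chosen. The Python sketch below assumes a hypothetical list of per-project tag sets and a hypothetical `suggest_tags` helper; it illustrates the general technique only and is not the approach Behance planned to ship. In a graph, the same counting could be expressed as a traversal from the chosen tag nodes through projects to their other tags.

```python
from collections import Counter


def suggest_tags(projects: list[set[str]], chosen: set[str], k: int = 3) -> list[str]:
    """Rank unchosen tags by how often they co-occur with the chosen tags on the same project."""
    scores: Counter[str] = Counter()
    for tags in projects:
        overlap = len(tags & chosen)
        if overlap == 0:
            continue  # project shares no chosen tag, so it says nothing about co-occurrence
        for tag in tags - chosen:
            scores[tag] += overlap  # weight by how many chosen tags the project shares
    # Highest score first; break ties alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


# Hypothetical toy data: the tag set of each project.
projects = [
    {"illustration", "ink", "portrait"},
    {"illustration", "watercolor"},
    {"ink", "portrait", "sketch"},
    {"3d", "render"},
]
print(suggest_tags(projects, {"ink"}))  # ['portrait', 'illustration', 'sketch']
```

"portrait" ranks first because it shares two projects with "ink", while "illustration" and "sketch" share one each. A production system would likely normalize these counts so that very common tags don’t dominate every suggestion.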

Want to share about your Neo4j project in a future fireside chat or 5-Minute Interview? Drop us a line at content@neo4j.com

