Switching From MongoDB to Neo4j


Written by Nick Manning, originally posted on his Blog

Our Startup


Swig is a mobile app (iOS, Android) that helps you explore new drinks and share them with the world. Take a picture of what you’re drinking, tag it with taste tags, share it, earn rewards and gamification points, follow famous mixologists and drink aficionados, and search for the best drinks nearby.

Our team consists of two people: me (sole engineer) and my partner Harry (growth and business side).

We started off using:
    • Ruby on Rails
    • Postgres
    • Hosting costs: ~$100/mo
We then switched to:
    • Ruby on Rails
    • MongoDB
    • Redis
    • ElasticSearch
    • Heroku + a bunch of Heroku addons
    • Hosting costs: ~$350/mo (multiple environments)
We now use:
    • Ruby
    • Rack
    • Neo4j
    • Neo4j + Spatial
    • Go
    • Private VPS
    • Hosting costs: ~$60/mo (multiple environments)

Postgres to MongoDB

The effort it took to switch to MongoDB was well worth it. I like MongoDB because of its flexibility and schemaless design, and because reads on large amounts of data were very fast. Those were the reasons we decided to switch, and we could also get new features out the door much faster. It also had a great Rails wrapper, so I never really needed to learn how to write raw Mongo queries (nor did I want to spend the time).

In summary, Mongo was:
    • flexible
    • easy to integrate with Rails
    • fast for reads from single collections

Issues with MongoDB

After a while, we started getting more and more users, who posted more and more types of content: not only drink photos but also hashtags, venues and friend tags. All this content had to be shown in a feed based on your own posts as well as the posts of the people you follow, as in any typical social app. The performance of feeds got worse and worse. MongoDB is not meant to be used as a relational database. We knew this when we decided to switch, and we took some steps to mitigate the performance issues:

Solution #1, Denormalize

We addressed performance issues by dumping all posts into one single, time-ordered Mongo collection. Nothing fancy, but the problem was that our codebase and data model increased in complexity, which sucks if you’re the only developer on the team. It sucks because you’re effectively managing two schemas: the original schema and another collection with copied, flattened data. It sucks more when you have production issues to troubleshoot while trying to crank out new features.
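To make the "two schemas" problem concrete, here is a minimal sketch in Python, with plain in-memory dicts and lists standing in for the Mongo collections. The names (users, venues, posts, feed, create_post and so on) are hypothetical and this is not Swig's actual code; it only illustrates why a flattened copy has to be maintained alongside the original data.

```python
# Normalized "collections" (the original schema). All names are hypothetical.
users = {"u1": {"name": "Nick"}, "u2": {"name": "Harry"}}
venues = {"v1": {"name": "Bar One"}}
posts = {}

# Denormalized copy: one flat list of feed entries, kept newest first.
feed = []


def create_post(post_id, user_id, venue_id, drink, created_at):
    """Write the post, then write a flattened copy for fast feed reads."""
    posts[post_id] = {
        "user_id": user_id,
        "venue_id": venue_id,
        "drink": drink,
        "created_at": created_at,
    }
    feed.append({
        "post_id": post_id,
        "user_id": user_id,
        "user_name": users[user_id]["name"],     # copied from users
        "venue_name": venues[venue_id]["name"],  # copied from venues
        "drink": drink,
        "created_at": created_at,
    })
    feed.sort(key=lambda entry: entry["created_at"], reverse=True)


def rename_user(user_id, new_name):
    """A simple change now has to touch both the original and the copy."""
    users[user_id]["name"] = new_name
    for entry in feed:
        if entry["user_id"] == user_id:
            entry["user_name"] = new_name


def load_feed(following, limit=20):
    """Read from the single flat collection: no lookups in users or venues."""
    return [entry for entry in feed if entry["user_id"] in following][:limit]
```

The read path is trivial, which is the point of denormalizing. The cost shows up in rename_user: every piece of copied data needs its own code to keep it in sync.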

Solution #2, Denormalize + Cache with Redis

I love Redis. It’s a non-bloated technology (a huge plus) with great documentation (another huge plus for startups) and a great community for support. We started using it for caching user-specific news feeds. It made our feeds load super fast. We then used it in many more areas in the app (gamification/leaderboards, other feeds, etc).

Complexity Kills

Our codebase became more and more complex with a denormalized database and Redis. See, the downside to Redis is that if you don’t use it in a straightforward manner, then when you do find a bug, reproducing and troubleshooting it is time-consuming. Also, the more places you use Redis, the more code you have to manage: code that decides when and how to cache data and when to invalidate the cache.
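The kind of bookkeeping described here can be sketched roughly as follows. This is an illustrative Python example with a plain dict standing in for Redis; the data and function names are made up for the sketch and are not taken from the Swig codebase.

```python
# A plain dict stands in for Redis here: user_id -> cached feed.
feed_cache = {}

# Hypothetical source-of-truth data.
follows = {"alice": {"bob"}, "bob": set()}
posts_by_author = {"alice": [], "bob": []}


def build_feed(user_id):
    """The slow path: gather posts from the user and everyone they follow."""
    authors = follows[user_id] | {user_id}
    items = [post for author in authors for post in posts_by_author[author]]
    return sorted(items, key=lambda post: post["created_at"], reverse=True)


def get_feed(user_id):
    """Serve from the cache, filling it on a miss."""
    if user_id not in feed_cache:
        feed_cache[user_id] = build_feed(user_id)
    return feed_cache[user_id]


def add_post(author, text, created_at):
    """Every write has to know which cached feeds it makes stale."""
    posts_by_author[author].append({"text": text, "created_at": created_at})
    feed_cache.pop(author, None)
    for reader, followed in follows.items():
        if author in followed:
            feed_cache.pop(reader, None)


def follow(reader, author):
    """Following someone also changes the reader's feed, so invalidate again."""
    follows[reader].add(author)
    feed_cache.pop(reader, None)
```

Forgetting a single invalidation (say, in follow) produces a stale feed that only shows up for some users some of the time, which is the sort of bug that is slow to reproduce.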

Complexity kills productivity, even if you’re just one developer. For a feature-rich social network, managing all types of caches is a royal pain in the ass.

Enter Neo4j

I started learning about Neo4j last year and realized that it was a great choice for social networks. So I decided to roll up my sleeves and switch our entire codebase to use Neo4j instead of MongoDB (the process being the topic of another blog post in the near future). Three things made it a good fit:

  1. We needed to quickly iterate and produce new features. Neo4j is a schemaless database. So, like Mongo, it was easy to alter and build our database schema. Pure flexibility.
  2. Like other startups, our data was diverse and interconnected. If your startup idea sounds simple now, wait a few months: it will soon have some social networking aspect to it, coupled with third-party data you’re going to have to import/integrate. This is what our startup went through. More and more joins, more frequently. For Neo4j, relationships are first-class citizens, so loading something like a news feed was easy and quite fast.
  3. We wanted a simple persistence layer and our schema got much simpler. Making complex queries became straightforward because our data model was so simple. No denormalization. We removed Redis. We removed ElasticSearch. Our codebase shrank significantly.
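As a rough illustration of why a feed maps well onto a graph, the sketch below shows what such a query might look like in Cypher, followed by the same traversal over a tiny in-memory graph in Python so it runs without a database. The labels, relationship types and property names (User, Post, FOLLOWS, POSTED, created_at) are assumptions made for this example, not Swig's actual data model, and for brevity the sketch only covers posts by followed users, not the reader's own posts.

```python
# A Cypher query of the kind described above (hypothetical labels and
# relationship types; shown for illustration and not executed here).
FEED_QUERY = """
MATCH (me:User {id: $user_id})-[:FOLLOWS]->(:User)-[:POSTED]->(p:Post)
RETURN p
ORDER BY p.created_at DESC
LIMIT 20
"""

# The same idea on a tiny in-memory graph: (source, relationship, target).
edges = [
    ("nick", "FOLLOWS", "harry"),
    ("harry", "POSTED", "post1"),
    ("harry", "POSTED", "post2"),
    ("nick", "POSTED", "post3"),
]
post_times = {"post1": 1, "post2": 5, "post3": 3}


def neighbors(node, rel_type):
    """Follow every outgoing relationship of one type from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]


def feed_for(user, limit=20):
    """Two hops: user -FOLLOWS-> followed user -POSTED-> post, newest first."""
    post_ids = [
        post
        for followed in neighbors(user, "FOLLOWS")
        for post in neighbors(followed, "POSTED")
    ]
    post_ids.sort(key=lambda post: post_times[post], reverse=True)
    return post_ids[:limit]
```

The feed is expressed directly as a walk along relationships, with no flattened copy and no cache to invalidate. A graph database stores those relationships natively rather than scanning an edge list as this toy version does.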

Summary

We now use Neo4j as our single database and it works brilliantly. Hosting is easy, backups and restores are straightforward, and our codebase is clean and simple because we have a clearly defined data model that doesn’t rely on denormalization or complex caching mechanisms.

Eventually, once our app grows, we will have to use more technologies for search indexing or caching, but for a modestly sized user base (we’re no Facebook, of course), Neo4j works perfectly well on its own.

I think investing time in learning how to really leverage a graph database is a major asset for any full stack engineer. It’s easy to use for simple projects and if your project grows, it can cope with complexity and perform well.

(More blog posts to come on how I migrated our codebase as well as how we’re hosting Neo4j.)

