[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

As a software engineer, I’ve always enjoyed discussing technology with like-minded devs who could offer alternate perspectives. And, like many engineers, I have moments where I really want to rant to an audience about an infuriating encounter I just had with tech or a project I’m working on.

These experiences encouraged my co-founder and I to create devRant: A community especially crafted with the wants and needs of developers in mind.

Learn How devRant Built a Community for Developers Using Neo4j for Their Backend Database


In this article I’m going to explain how Neo4j allows us to rapidly develop our app, scale and analyze at a rate we wouldn’t be able to achieve with a relational database.

The Very Basic Structure of Our Data


The Graph Data Model for devRant Using Neo4j


Above is a very basic diagram of how our data is structured. Even though a number of properties are omitted, the diagram covers a good chunk of our functionality.

Users post “rants,” which contain some text, a timestamp, an optional image property (e.g., if there’s an image attached to the rant – yay memes!), etc. Users can post comments on rants, and they can also upvote rants and comments.

Why We’re Using Neo4j as Our Primary and Only Datastore


We are a two-person team. I do all the backend development and most of the app development, and our other co-founder does all the UX and design work.

We are strong believers in building highly functional and effective MVPs (minimum viable product), but also know how to speed up the process of getting a product to market. Also, as someone who has done a lot of data engineering, I take app performance very seriously and try to build infrastructure with future scalability in mind.

Neo4j provided us with pretty much everything we needed to rapidly build a solid infrastructure for our app, while simultaneously being a datastore that allows us to quickly add features that our users request. Beyond user-facing features, we can also use it to gather interesting data points on user behavior.

Let’s dive a little deeper.

Planning for Scale, Rapid Prototyping and the Relational Nightmare


In the planning stages with a data format as described above, it quickly becomes clear that a relational database would be problematic for a few reasons.

First, in my opinion, there are some glaring scalability issues. To store the data in a relational database, we would need a table for rants, a table for comments and another table for users. Then, we’d need a bunch of JOIN tables: One for users owning rants, one for users owning comments, one for users upvoting/downvoting rants and one for users upvoting/downvoting comments.

In terms of scalability, the piece that worries me the most about this use case is votes on rants and votes comments. Voting is our simplest action and I expect it to be our highest-volume one too.

I would never want to have a case where we had “too many” votes stored (1) to properly and efficiently query them and (2) to continue to store all votes. I see this as a problem with a relational database because with a nice amount of growth, the vote table becomes highly trafficked in regard to both reads and writes (we’re constantly getting info about which user has voted on a rant, vote counts, etc.) and ends up with many awkward queries targeting it for various use cases.

The data format of Neo4j lends extremely well to having only 3 basic types of nodes:
    • rants
    • comments
    • users
and then expressing simple user actions –- like votes – by just using relationships. I like this for scalability because relationships are so cheap (compared to an indexed JOIN table) and the data is easily queryable in an efficient way.

For ease of implementation and rapid development, I like to try to store as few counters as possible (and if scale dictates they are needed, then add them). So when getting rant information, we might do a series of Cypher queries that look something like this:

MATCH (rant:Rant{id:{rant_id}})
RETURN
	r,
	SIZE((rant)<-[:UPVOTED_RANT]-()) as num_upvotes,
	SIZE((rant)<-[:DOWNVOTED_RANT]-()) as num_downvotes,
	SIZE((rant)-[:HAS_COMMENT]->()) as num_comments

We might run a query like that 10 times in a batch to get info about a batch of rants. In a graph database, all of the data for that query can easily be obtained from the rant node. In a relational database like MySQL, a similar query would require 3 subqueries querying JOIN tables for counts. This seems much less scalable and less flexible.

Beyond efficiency and scalability, the flexibility of a graph database allows us to quickly add features. Even seemingly complex features don’t require a bunch of tables to be set up to create a usable feature.

For instance, being able to report a user or a rant is a pretty important functionality. We were able to add this with the simple addition of a relationship between users, as opposed to a new table and another table to join on.

The Neo4j Advantage in a the Rapid Pace of Startup Development


Founding a startup with a tiny, bootstrapped team is inherently difficult. I believe that for many projects, and especially ones like ours, using a graph database provides an excellent foundation for the needs of a rapidly growing technical startup. Using a graph database is providing us a storage engine that is, as previously mentioned, scalable, very flexible for new features and also queryable for some pretty interesting analytics.

As a very basic example, say we wanted to see how many of our users have voted on at least two rants and have commented on at least two rants. We could just run a Cypher query like this (DISTINCT is added for comments because users can comment more than once on a rant, but can’t vote more than once.):

MATCH (user:HexicalUser)
WHERE
SIZE((user)-[:UPVOTED_RANT|DOWNVOTED_RANT]->()) >= 2
WITH user
MATCH (user)-[:HAS_COMMENT]->(:Comment)<-[:HAS_COMMENT]-(rant:Rant)
WITH
	user,
	SIZE(COLLECT(DISTINCT rant)) as c
WHERE c >= 2
RETURN COUNT(user)

In the future, using Neo4j will allow us to take our product to the next level by adding in more complex algorithms surrounding our data. For example, we think it would be useful to cater a user’s content feed to their actual preferences, which we could implement with some collaborative filtering, the user’s interests, etc.

As we continue to improve devRant, one of our main focuses will be growth which I think is the case for many startups our size. Using a flexible database for our backend will allow us to focus on the things we really need in order to build our company while our storage engine allows our technical platform to grow and flourish.


Want to dive into using Neo4j for your next project or app? Take our free online training course, Introduction to Graph Databases and Neo4j, and learn how to use the world’s leading graph database like a pro.

Take the Course

 

Keywords:  


About the Author

David Fox, Co-Founder & Engineering Lead

David Fox Image

David Fox is the co-founder and engineering lead of devRant. He has over 10 years of experience developing high-performance backend systems and working with a large variety of databases alongside massive datasets. David also spoke at GraphConnect NYC in 2013.


1 Comment

Marty says:

Real interesting use of Neo4j in a startup. I am now encouraged to use a graph db as Davis did.
Thanks for the insight

Leave a Reply

Your email address will not be published. Required fields are marked *

Subscribe

Upcoming Event

 


From the CEO

Emil's Blog


Have a Graph Question?

Stackoverflow
Slack
Contact Us

Share your Graph Story?

Email us: content@neotechnology.com


Popular Graph Topics