From Good to Graph: Choosing the Right Database [GraphConnect Preview]

Learn about Clark Richey’s Journey to Graph Databases and Why He Chose Neo4j for FactGem

For many of us, it feels like software development has well and thoroughly moved into the NoSQL database era. However, recent studies suggest that adoption is still as low as 20%.

Personally, I joined the NoSQL movement in 2008 when I took a position with database vendor MarkLogic. Since that time I have been a big advocate for NoSQL technologies in all of their various flavors.

While it is certainly true that there are classes of problems that remain very well suited to a relational database system, many of today’s problems are better addressed with NoSQL technologies such as XML, key-value, document, and graph databases.

My Own Journey to Graph Databases

When I joined FactGem as Chief Technology Officer in 2013, I was presented with a unique challenge.

Our founder’s vision was to provide an application that would let users upload existing content from Excel files, search the data, and visualize and analyze the results. Users needed to be able to do all of that without hiring consultants, holding a degree in computer science or having any other technical expertise.

As if that wasn’t enough, the application couldn’t know ahead of time what the data would be about, so there couldn’t be a fixed schema. I certainly didn’t have all of the answers on day one, but one thing I knew for sure was that there was no way I could do this with an RDBMS. I had to have a database that didn’t require me to define a schema.

Having just left MarkLogic, I was familiar with that technology and I felt it was an excellent starting point. It would allow me to store information in XML documents without having to pre-define a schema and I could add information as needed.

We built our first proof-of-concept systems on MarkLogic, and it was good. We had excellent performance when searching on properties and the flexibility to change our schema as needed.

The Key Was Data Relationships

However, during this time period, we came to realize that what was truly interesting wasn’t so much the properties of the things people were loading (people, products, companies, etc.) but rather the relationships between those things. Our existing database would let us perform those types of queries but it wasn’t really the right tool.

Right around this time, MarkLogic released support for a new database model based on RDF. Without going into tremendous detail here (more detail at GraphConnect), suffice it to say that RDF is designed to support relationships between items as first-class citizens.
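To make "relationships as first-class citizens" a little more concrete, here is a minimal sketch in plain Python of the subject-predicate-object triple idea that RDF is built around. It is only an illustration: it does not use MarkLogic, Neo4j or any RDF library, and the sample data and the helper names (`build_index`, `related`) are hypothetical, not taken from FactGem.

```python
from collections import defaultdict

# Hypothetical sample data: each fact is a (subject, predicate, object) triple.
# The relationship itself (the predicate) is stored as data, not implied by a schema.
triples = [
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("acme", "sells", "widget"),
    ("alice", "knows", "bob"),
]


def build_index(triples):
    """Index triples by subject so outgoing relationships are a direct lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def related(index, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [obj for pred, obj in index[subject] if pred == predicate]


index = build_index(triples)

# Follow two relationships in a row: person -> employer -> products sold.
employers = related(index, "alice", "works_for")
products = [p for company in employers for p in related(index, company, "sells")]
print(products)
```

The point of the sketch is that a question like "what does Alice's employer sell?" is answered by walking relationships, without looking at the properties of any single item.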

A database that supported this model seemed ideal so we made the transition. At first things looked good, but then we hit some roadblocks. We hadn’t found a great solution yet.

Again, I went in search of a database that would meet my needs. This time I knew that I needed not only a high-performance, schema-free database, but one that fully embraced data relationships.

This, of course, led me to an evaluation of numerous graph databases. As a result of that evaluation I chose Neo4j for its ease of use, performance, flexibility and, of course, the fact that it allows us to focus on the relationships between entities in the database.

Learn More at my GraphConnect Session

During my talk at GraphConnect San Francisco I will go into greater depth on what other databases I evaluated during this process.

In conclusion, moving our application to Neo4j has turned out to be a great decision. Neo4j has not only met our expectations but exceeded them in many regards, providing us access to capabilities and features that our application has yet to grow into.

We have also learned a lot along the way in terms of best (and worst!) practices for working with Neo4j, which I will share during my talk at GraphConnect. I hope to see everyone there!

Click below to register for GraphConnect San Francisco and join Clark Richey and many other Neo4j graphistas at the only conference that focuses on the rapidly growing world of graph databases.