The Diverse World of NoSQL DatabasesNoSQL databases are a spectrum of data storage technologies that are more varied than they are similar so it’s difficult to make sweeping generalizations about their characteristics. In the following weeks, we’ll explore a few types of NoSQL databases. Our tour will encompass the group collectively known as aggregate stores (highlighted in blue below), including key-value stores, column family stores and document stores as well the various types of graph databases (in green), which include property graphs, hypergraphs and RDF triple stores.
An overview of the NoSQL database space. Quadrants in blue are collectively known as aggregate stores.Historically, most enterprise-level web applications ran on top of a relational database (RDBMS). But in the past decade alone, the data landscape has changed significantly and in a way that traditional RDBMS deployments simply can’t manage. The NoSQL database movement has emerged particularly in response to three of these data challenges:
- Data Volume
- Data Velocity
- Data Variety
Data VolumeIt’s no surprise that as data storage has increased dramatically, data volume (i.e., the size of stored data) has become the principal driver behind the enterprise adoption of NoSQL databases. Large datasets simply become too unwieldy when stored in relational databases. In particular, query execution times increase as the size of tables and the number of joins grow (so-called join pain). This isn’t always the fault of the relational databases themselves though. Rather, it has to do with the underlying data model. In order to avoid joins and join pain, the NoSQL world has several alternatives to the relational model. While these NoSQL data models are better at handling today’s larger datasets, most of them are simply not as expressive as the relational model. The only exception is the graph model, which is actually more expressive. (More on that in the weeks to come.)
Data VelocityBut volume isn’t the only problem modern web-facing systems have to deal with. Besides being big, today’s data often changes very rapidly. Thus, data velocity (i.e. the rate at which data changes over time) is the next major challenge that NoSQL databases are designed to overcome. Velocity is rarely a static metric. A lot of velocity measurements depend on the context of both internal and external changes to an application, some of which have considerable system-wide impact. Coupled with high volume, variations in data velocity require a database to not only handle high levels of edits (tech lingo: write loads), but also deal with surging peaks of database activity. Relational databases simply aren’t prepared to handle a sustained level of write loads and can crash during peak activity if not properly tuned. But there’s also another aspect of data velocity NoSQL technology helps us overcome: the rate at which the data structure changes. In other words, it’s not just about the rapid change of specific data points but also the rapid change of the data model. Data structures commonly shift for two major reasons. First is the fast-moving nature of business. As your enterprise changes, so do your data needs. Second is that data acquisition is often experimental. Sometimes your application captures certain data points just in case you might need them later on. The data that proves valuable to your business usually sticks around, but if it isn’t worthwhile, then those data points often fall by the wayside. Consequently, these experimental additions and eliminations affect your data model on a regular basis. Both forms of data velocity are problematic for relational databases to handle. Frequently high write loads come with expensive processing costs, and regular data structure changes come with high operational costs. NoSQL databases address both data velocity challenges by optimizing for high write loads and by having flexible data models.
Data VarietyThe final challenge in today’s data landscape is data variety – that is, it can be dense or sparse, connected or disconnected, regularly or irregularly structured. Today’s data is far more varied than what relational databases were originally designed for. In fact, that’s why many of today’s RDBMS deployments have a number of nulls in their tables and null checks in their code – it’s all to adjust to today’s data variety. On the other hand, NoSQL databases are designed from the bottom up to adjust for a wide diversity of data and flexibly address future data needs.
ConclusionRelational databases can no longer handle today’s data volume, velocity and variety. Yet, understanding how NoSQL databases overcome these challenges is only the prelude of finding the right database for your enterprise. In the coming weeks, we’ll explore the strengths and weaknesses of various NoSQL technologies so you can make the most informed decision possible. Broaden your horizons on the world of graph technologies: Click below to get your free copy of the O’Reilly Graph Databases ebook and discover how to apply graph databases to mission-critical problems at your enterprise.
About the Author
Bryce Merkl Sasaki , Editor-in-Chief, Neo4j
Bryce Merkl Sasaki is the Editor-in-Chief at Neo4j. He studied professional and creative writing for undergrad and has been freelancing for 7 years. Recently, he worked at an inbound marketing agency in Philadelphia as a copywriter before moving to California. When not working, he likes to spend his time working on his novel, looking for pickup soccer games and reading voraciously.