Excerpt from article written by Jim Webber, Chief Scientist, Neo4j for Database Trends and Applications.

Graph databases are increasingly popular. In fact, according to DB-Engines, graph databases have been the fastest-growing database category since 2013. This growth is fueled in part by organizations realizing the value of understanding connections in their data. For companies looking to use a graph database to build behavior and decision-making applications based on real-time evaluation of connected data, there are several key attributes to consider, including integrity, performance, efficiency and scalability.

If not all databases are created equal, which graph database is best for your solution? Fortunately, we have experience to draw upon that can guide us toward a pragmatic technology investment. Primary among the considerations is the choice between a native and a non-native design for the database management system.

As the name suggests, native graph databases are those specifically built to handle graph workloads across the entire computing stack. The alternative, non-native, comes in two types: (1) those that layer a graph API on top of an existing database management system that is native to another data model, and (2) those that claim multi-model semantics, where one system purportedly supports several data models.

There is a considerable difference between the architecture of native graph storage and querying and that of non-native systems. Predictably, native systems tend to perform queries faster, scale better (retaining their query speed as datasets grow in size), and run more efficiently (even on less hardware).

Why Native?

A native graph database is distinguished by an exclusive preference to serve graph workloads across its entire stack. That stack – from query language through to the database management engine and file system considerations, and from clustering to backup and monitoring – epitomizes graph thinking throughout.

The native graph database ensures that end-user application developers can work with the graph productively and humanely. It also needs to ensure that your precious data is safe and that the system as a whole is dependable. To achieve all of this, it must optimize every layer of its stack for graphs – no responsibility is abdicated to non-graph native software. As such, components of the native graph database are continuously “graph-affined” as hardware trends emerge and evolve, because each component in the architecture must make sure that graph workloads run efficiently and safely on that hardware.

Native Graph Storage

Graph storage refers to the underlying structure of connected data persisted (often, but not always) on disk. When the storage system is built specifically for graph data, it’s known as native graph storage.

Native graph databases are designed to use the file system in a way that understands and is sympathetic to graphs, which makes them both highly performant and safe for graph workloads. For example, a traversal across a relationship in such a database has constant cost irrespective of the size of the graph, and that constant cost is minimal because of mechanical sympathy between the software and hardware.
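The constant-cost claim can be illustrated with a small sketch. It is not any particular product's storage format: the `Node` class, the function names and the sorted edge table are all hypothetical, chosen only to contrast following a reference stored on the record with searching a general-purpose index.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    # Direct references to neighbouring nodes, stored on the record itself.
    out: List["Node"] = field(default_factory=list)


def neighbours_direct(node: Node) -> List[str]:
    """Follow references held by the node; no global lookup is involved."""
    return [n.name for n in node.out]


def neighbours_via_index(sorted_edges: List[Tuple[str, str]], name: str) -> List[str]:
    """Find neighbours by binary search over a sorted (source, target) table."""
    # The search cost grows with the total number of edges in the table.
    i = bisect.bisect_left(sorted_edges, (name, ""))
    found = []
    while i < len(sorted_edges) and sorted_edges[i][0] == name:
        found.append(sorted_edges[i][1])
        i += 1
    return found


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.out.extend([bob, carol])
bob.out.append(carol)

# Hypothetical stand-in for an index kept by a non-graph store.
edge_table = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])
```

Both functions return the same neighbours. In the first, the work per hop depends only on how many relationships the node has. In the second, every hop begins with a search whose cost depends on the size of the whole edge table.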

Conversely, graph storage is non-native when it is optimized for any other storage model. To present columnar, relational, document, or key-value data as a graph, the database management system has to perform costly translations to and from the primary model of the database. While implementers can try to amortize these translations through radical denormalization, this non-native approach typically leads to high latency when querying graphs. It also has very well-understood safety risks when persisting graph data, risks that radical denormalization exacerbates.
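To make the translation cost concrete, here is a minimal sketch that assumes a key-value store holding each node as a JSON string. The dictionary `kv`, the `"neighbours"` field and the helper names are illustrative assumptions, not the layout of any real system.

```python
import json
from typing import Dict, List


def put_node(store: Dict[str, str], node_id: str, neighbours: List[str]) -> None:
    """Translate graph structure into the store's primary model (a string value)."""
    store[node_id] = json.dumps({"neighbours": neighbours})


def get_neighbours(store: Dict[str, str], node_id: str) -> List[str]:
    """Translate back: one key lookup plus one decode for every hop."""
    return json.loads(store[node_id])["neighbours"]


def two_hop(store: Dict[str, str], start: str) -> List[str]:
    """Nodes reachable in exactly two hops; each hop repeats the translation."""
    reached: List[str] = []
    for mid in get_neighbours(store, start):
        for end in get_neighbours(store, mid):
            if end not in reached:
                reached.append(end)
    return reached


kv: Dict[str, str] = {}  # hypothetical stand-in for a key-value store
put_node(kv, "alice", ["bob"])
put_node(kv, "bob", ["carol", "dave"])
put_node(kv, "carol", [])
put_node(kv, "dave", [])
```

Here `two_hop(kv, "alice")` decodes a stored value at every step of the traversal. Copying neighbour data into each record would save some of those decodes, but the same relationship would then live in several places that must be kept in agreement.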

The disconnect between graph data and non-graph storage is problematic for both performance and scalability. Our research and development experience indicates that the only way to ensure data safety is to update the graph via ACID transactions. Maintaining relationships between records demands stronger guarantees than weaker-than-ACID consistency models can provide.

Native graph databases include transactional mechanisms to ensure that data safety remains impervious to network blips, server failures, and even contention from competing transactions or scaling decisions. Non-native graph architectures, especially the variants that are built on eventually consistent stores, can (and will eventually) corrupt graph data.
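The risk can be sketched with a toy store in which a relationship is recorded on both of its endpoints. Everything here is hypothetical, including the `SimulatedCrash` exception and the snapshot-based rollback, and the sketch models atomicity only: a real database would rely on a transaction log and would also provide isolation and durability.

```python
import copy
from typing import Dict, Set


class SimulatedCrash(Exception):
    """Stands in for a server failure or network blip between two writes."""


Store = Dict[str, Dict[str, Set[str]]]


def new_store(*names: str) -> Store:
    return {n: {"out": set(), "in": set()} for n in names}


def link_unsafely(store: Store, src: str, dst: str, crash_midway: bool = False) -> None:
    """Write each half of the relationship separately, with no transaction."""
    store[src]["out"].add(dst)
    if crash_midway:
        raise SimulatedCrash()
    store[dst]["in"].add(src)


def link_atomically(store: Store, src: str, dst: str, crash_midway: bool = False) -> None:
    """Apply both writes or neither, restoring a snapshot on failure."""
    snapshot = copy.deepcopy(store)
    try:
        link_unsafely(store, src, dst, crash_midway)
    except SimulatedCrash:
        store.clear()
        store.update(snapshot)
        raise


def is_consistent(store: Store) -> bool:
    """Every outgoing half must have a matching incoming half, and vice versa."""
    forward = all(s in store[d]["in"] for s, rec in store.items() for d in rec["out"])
    backward = all(d in store[s]["out"] for d, rec in store.items() for s in rec["in"])
    return forward and backward
```

If `link_unsafely` fails between its two writes, one node claims a relationship that the other node does not know about, and `is_consistent` returns `False`. With `link_atomically`, the same failure leaves the store as it was before the update began.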

Furthermore, native storage can be adapted to the evolving hardware architectures of tomorrow. As memory and disk technologies evolve, a native graph database implementation evolves to support ever more ambitious graph workloads. In the coming years, we fully expect to see the emergence of native storage models for novel disk storage platforms and memory architectures like non-volatile RAM.
