Review: The NoSQL Matrix
The macrocosm of NoSQL databases is a diverse one, of which graph databases are only a part. Last week, we toured the three blue quadrants of the matrix below, which are collectively known as aggregate stores: key-value, column family and document stores. This week, we’ll be double-clicking on the equally diverse world of graph database technologies, which occupy the green quadrant of the matrix.
The matrix of NoSQL databases. Quadrants in blue are collectively known as aggregate stores.
The Spectrum of Graph Database Technologies
We already walked through a formal definition of a graph database in our first post, but let’s do a quick review. A graph database is an online, operational database management system capable of Create, Read, Update, and Delete (tech lingo: CRUD) processes that operate on a graph data model. There are two important properties of graph database technologies:
- Graph storage: Some graph databases use “native” graph storage that is specifically designed to store and manage graphs, while others use relational or object-oriented databases, which are often slower.
- Graph processing engine: Native graph processing (tech lingo: index-free adjacency) is the most efficient means of processing data in a graph because connected nodes physically “point” to each other in the database. Non-native graph processing engines use other means to process CRUD operations.
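To make the difference between the two processing styles concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how any particular database is implemented; the `Node` class, the `edge_table` list and both helper functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Direct references to neighbouring Node objects: the "pointers"
    # that index-free adjacency relies on (hypothetical structure).
    neighbors: list["Node"] = field(default_factory=list)


def native_neighbors(node: Node) -> list[str]:
    # Native style: follow the references stored on the node itself.
    return [n.name for n in node.neighbors]


def non_native_neighbors(edge_table: list[tuple[str, str]], name: str) -> list[str]:
    # Non-native style: search a separate edge table for matching rows.
    return [dst for src, dst in edge_table if src == name]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.neighbors.extend([bob, carol])

# The same connections stored as rows in a table (illustrative data).
edge_table = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]

print(native_neighbors(alice))
print(non_native_neighbors(edge_table, "Alice"))
```

Both calls return the same neighbors. The difference is in the work done: the first touches only the nodes connected to Alice, while the second has to search through edges that may have nothing to do with her.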
Property Graphs
Property graphs are the type of graph database we’ve already talked about most. In fact, our original definition of a graph database was more precisely about a property graph. Here’s a quick recap of what makes a graph database a property graph:
- Property graphs contain nodes (data entities) and relationships (data connections).
- Nodes can contain properties (tech lingo: key-value pairs).
- Nodes can be labeled with one or more labels.
- Relationships have both names and directions.
- Relationships always have a start node and an end node.
- Like nodes, relationships can also contain properties.
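The checklist above maps fairly directly onto a pair of small data structures. The following Python sketch is one possible way to model it, assuming plain dataclasses; the class names, labels and property values are made up for illustration and do not come from any graph database API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # A node can carry one or more labels and any number of properties.
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship has a name, a direction (start -> end),
    # exactly one start node and one end node, and its own properties.
    name: str
    start: Node
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


# Illustrative data only.
alice = Node(labels={"Person"}, properties={"name": "Alice"})
car = Node(labels={"Vehicle", "Car"}, properties={"make": "Volvo"})
owns = Relationship("OWNS", start=alice, end=car, properties={"since": 2015})

print(f"({owns.start.properties['name']})-[:{owns.name}]->({owns.end.properties['make']})")
```

Because `start` and `end` are each a single `Node`, the structure itself enforces the rule that a relationship always has exactly one start node and one end node.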
Hypergraphs
A hypergraph is a graph model in which a relationship (called a hyperedge) can connect any number of given nodes. While a property graph permits a relationship to have only one start node and one end node, the hypergraph model allows any number of nodes at either end of a relationship. Hypergraphs can be useful when your data includes a large number of many-to-many relationships. Let’s look at the example below.
A property graph needs six OWNS relationships to express what the hypergraph captured with just one hyperedge. Yet, by using six relationships instead of one, we have two distinct advantages:
- First, we’re using a more familiar and explicit data modeling technique (resulting in less confusion for a development team).
- Second, we can also fine-tune the model with properties such as “primary driver” (for insurance purposes), which is something we can’t do with a single hyperedge.
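The trade-off between one hyperedge and six relationships can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the two owners and three vehicles are hypothetical (chosen so that the pairing yields six relationships), and the dictionaries stand in for whatever structures a real database would use.

```python
from itertools import product

# Hypothetical data: two owners and three vehicles.
owners = ["Alice", "Bob"]
vehicles = ["car", "truck", "motorcycle"]

# Hypergraph model: a single OWNS hyperedge with many nodes at each end.
hyperedge = {"name": "OWNS", "start": set(owners), "end": set(vehicles)}

# Property graph model: one OWNS relationship per (owner, vehicle) pair,
# each with its own property map.
relationships = [
    {"name": "OWNS", "start": owner, "end": vehicle, "properties": {}}
    for owner, vehicle in product(owners, vehicles)
]

# Fine-tune exactly one relationship, which a single shared
# hyperedge cannot express for an individual pair.
for rel in relationships:
    if rel["start"] == "Alice" and rel["end"] == "car":
        rel["properties"]["primary_driver"] = True

print(len(relationships))
```

The property graph version is more verbose, but each owner-vehicle pair now has a place to hang its own details.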
Triple Stores
Triple stores come from the Semantic Web movement and store data in a format known as a triple. Triples consist of a subject-predicate-object data structure. Using triples, we can capture facts such as “Ginger dances with Fred” and “Fred likes ice cream.” Individually, single triples aren’t very useful semantically, but en masse, they provide a rich dataset from which to harvest knowledge and infer connections.
Triple stores are modeled around the Resource Description Framework (RDF) specifications laid out by the W3C, using SPARQL as their query language. Data processed by triple stores tends to be logically linked, thus triple stores are included in the category of graph databases. However, triple stores are not “native” graph databases because they don’t support index-free adjacency, nor are their storage engines optimized for storing property graphs.
Triple stores store triples as independent elements, which allows them to scale horizontally but prevents them from rapidly traversing relationships. In order to perform graph queries, triple stores must create connections from individual, independent facts, adding latency to every query. Because of these trade-offs in scale and latency, the most common use case for triple stores is offline analytics rather than online transactions.
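As a rough illustration of why connecting independent facts costs extra work, here is a toy triple store in Python. It is a sketch only: it uses neither RDF nor SPARQL, and the `match` and `partner_likes` helpers and the predicate names are hypothetical. The two facts are the ones quoted above.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Each fact is stored as an independent element.
triples: set[Triple] = {
    ("Ginger", "dances_with", "Fred"),
    ("Fred", "likes", "ice cream"),
}


def match(store: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    # Return every triple that agrees with the given pattern;
    # None acts as a wildcard for that position.
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def partner_likes(store: set[Triple], person: str) -> list[str]:
    # A two-step graph query: the connection between the two facts is
    # not stored anywhere, so it is rebuilt by matching at query time.
    results = []
    for _, _, partner in match(store, s=person, p="dances_with"):
        for _, _, thing in match(store, s=partner, p="likes"):
            results.append(thing)
    return results


print(partner_likes(triples, "Ginger"))
```

Each hop in the query is a fresh search over the store rather than a step along a stored pointer, which is the latency cost described above.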
Conclusion
Just like for other NoSQL databases, every type of graph database is best suited for a different function. Hypergraphs are a good fit for capturing meta-intent, and RDF triple stores are proficient at offline analytics. But for online, transactional processing, nothing beats a property graph for rapid traversal of data connections.
To learn more about the diverse world of graph database technologies, get a copy of the O’Reilly Graph Databases ebook and discover how to apply graph technologies to mission-critical problems at your enterprise.
About the Author
Bryce Merkl Sasaki, Editor-in-Chief, Neo4j
Bryce Merkl Sasaki is the Editor-in-Chief at Neo4j. He studied professional and creative writing for undergrad and has been freelancing for 7 years. Recently, he worked at an inbound marketing agency in Philadelphia as a copywriter before moving to California. When not working, he likes to spend his time working on his novel, looking for pickup soccer games and reading voraciously.