DZone interviews Philip Rathle, Senior Product Director at Neo Technology, about GraphConnect and the latest release of Neo4j

With the release of a new version of the open source graph database Neo4j, and the fast-approaching GraphConnect conference (the first EVER graph database-focused conference, btw), we thought it’d be a good idea to talk to a couple of leaders in the graph database space at Neo Technology, the key commercial backer behind Neo4j.
- Zero-downtime rolling upgrades in HA clusters, for nicer administrative ops
- Streamed responses to REST API requests, for faster remote access
- Bi-directional traversals, branch state and path expanders in the traversal framework, for even faster queries
- Support in the Cypher language for writing graph data and updating auto-indexes, see above 😉
- Support for explicit transactions in neo4j-shell, on the command line and through the web
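To make the bi-directional traversal item in the list above a little more concrete, here is a minimal sketch in plain Python. It is not Neo4j's traversal framework or its API; it only illustrates the general idea of searching from both ends of a path and stopping where the two searches meet. The function name `bidirectional_hops`, the adjacency-dict layout, and the sample graph are all assumptions made for illustration, and the graph is assumed to be undirected and unweighted.

```python
def bidirectional_hops(graph, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable.

    `graph` is a hypothetical adjacency dict for an undirected graph:
    every edge must be listed in both directions.
    """
    if start == goal:
        return 0
    if start not in graph or goal not in graph:
        return None

    # Distance maps and current frontiers for the two searches.
    dist_a, dist_b = {start: 0}, {goal: 0}
    frontier_a, frontier_b = [start], [goal]

    while frontier_a and frontier_b:
        # Always expand the smaller frontier; this is where the savings come from.
        if len(frontier_a) > len(frontier_b):
            frontier_a, frontier_b = frontier_b, frontier_a
            dist_a, dist_b = dist_b, dist_a

        best = None
        next_frontier = []
        for node in frontier_a:
            for neighbor in graph.get(node, ()):
                if neighbor in dist_b:
                    # The two searches meet across the edge node -> neighbor.
                    candidate = dist_a[node] + 1 + dist_b[neighbor]
                    if best is None or candidate < best:
                        best = candidate
                if neighbor not in dist_a:
                    dist_a[neighbor] = dist_a[node] + 1
                    next_frontier.append(neighbor)

        # Finish the whole level before returning so the shortest meeting wins.
        if best is not None:
            return best
        frontier_a = next_frontier

    return None


# Hypothetical sample graph: a chain a-b-c-d-e plus an isolated node x.
sample = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "x": [],
}
print(bidirectional_hops(sample, "a", "e"))
```

On large, highly connected graphs, each of the two searches only has to reach about half the depth of the path, which is the intuition behind why this kind of traversal can be faster than searching from one end only.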
“The 1.8 version improves upon the world’s leading graph database, adding lots of features and enhancements that make Neo4j 1.8 the fastest and most robust database we’ve ever shipped,” said Rathle.

Further on, we discussed the many ‘bashing’ posts about NoSQL databases over the past few years, especially those targeting MongoDB. I asked him if he’d seen similar unfair or user-error-based criticisms of Neo4j. He has seen them only rarely. From Philip’s perspective, it boiled down to these main points:
- 95% of the use cases for Neo4j will work smoothly out of the box.
- For the remaining 5% or so of use cases, he suggested reaching out to Neo Technology to see whether you need app-level sharding, clustering, or other techniques, depending on what you’re trying to scale.
NoSQL Distilled

Neo4j was featured in Martin Fowler’s new book, “NoSQL Distilled,” and I asked Philip for his thoughts on the book. “It’s a great book to understand what NoSQL is all about, what the pieces are, and how they fit,” said Rathle. “But it’s more than simply a book that NoSQL developers should read. It’s an important book for any developer, period, because we’re all dealing with persistence in some form or another.” He said the book drew a clear distinction between aggregate-oriented databases such as key-value stores, document stores, and column stores, which are optimized for atomic intelligence, and graph databases, which are optimized for understanding data connections.

Martin Fowler is also noticing the movement toward polyglot persistence (relational and non-relational databases working together in a single system). I asked Philip for a few examples of companies using Neo4j with other databases. Polyglot persistence usually comes about in one of two ways:

1) Someone with an existing system, often relational, suddenly finds that the connectedness of the data and queries is such that the existing system can’t perform fast enough in real time. While a few customers have replaced the entire system with Neo4j, this is usually an extreme solution, and isn’t as cost-effective as moving just the parts of the system that are highly connected.

2) Someone is building a new system and recognizes that the data in that system fits into distinct categories, for example: huge volumes of simple time-series data that don’t need to be inter-related, giant multimedia files, and then something closely knit and highly interconnected. As the data volumes grow and SLAs become more rigorous, it starts to make sense to store each kind of data in a place that’s optimized for it. In this example, you might use something like Cassandra, S3, and Neo4j.
A few examples: Telenor, one of the world’s 10 largest telcos, replaced part of a Sybase application with Neo4j for hierarchical queries that needed to run very fast, but kept much of their existing database around. On the other hand, we have a life sciences customer who provides gene sequencing as a service and uses an Amazon S3-like filesystem to store gene sequences, which are large binary files, and Neo4j to store and relate metadata about the genome.
Graph Theory in other areas of life

Graph databases are so successful and growing in popularity right now because many businesses are evolving beyond atomic intelligence and making huge competitive gains by leveraging connected intelligence. Graph databases are the best way to do this. Interestingly, Euler’s graph theory, nearly 300 years old, has had an enormous impact on mathematics and the sciences, and has long been proven as a powerful and accurate model for describing many things in nature. Only recently have graphs been used as the basis for a database management system, and the opportunities are just beginning to unfold.

As for graphs and what you can model with them, Philip had several examples:
- The Human Brain – The most powerful device in the known universe (as Philip described it) is made up of neurons and synapses, which are directly comparable to the nodes and edges of graph models.
- Geography/cartography – Path-finding was the problem through which Euler first arrived at graph theory.
- Relationships – Between people, classifications (ontologies), and almost anything being compared and connected.
- Network management – This can overlap with relationships, but we’re talking about machine networks mainly.
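As a rough illustration of the “relationships between people” example above, the sketch below models people as nodes and “knows” relationships as edges, then asks a typical connected-data question: who are the friends of my friends? This is plain Python rather than Neo4j or Cypher, and the names `knows` and `friends_of_friends`, along with the sample data, are made up for the example.

```python
def friends_of_friends(knows, person):
    """Return people exactly two hops away from `person`.

    `knows` is a hypothetical adjacency mapping: person -> set of people
    they know. Direct friends and the person themselves are excluded.
    """
    direct = knows.get(person, set())
    result = set()
    for friend in direct:
        for candidate in knows.get(friend, set()):
            if candidate != person and candidate not in direct:
                result.add(candidate)
    return result


# Hypothetical sample data: nodes are people, edges are mutual "knows" links.
knows = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

print(sorted(friends_of_friends(knows, "alice")))  # ['dave', 'erin']
```

The query only ever follows edges outward from one node, so its cost depends on how many relationships that person has, not on the total number of people stored. That locality is the property the node-and-edge model is meant to exploit.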