Fortunately, graph algorithms are up to the challenge.
In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries. This week, we’ll take a closer look at Neo4j’s Graph Analytics platform and put its performance to the test.
The Neo4j Graph Analytics Platform
Neo4j offers a reliable and performant native-graph platform that reveals the value and maintains the integrity of connected data.
First, we delivered the Neo4j graph database, originally used in online transaction processing with exceptionally fast traversals. Then we added advanced, yet practical, graph analytics tools for data scientists and solutions teams.
Streamline Your Data Discoveries
We offer a growing, open library of high-performance graph algorithms for Neo4j that are easy to use and optimized for fast results. These algorithms reveal the hidden patterns and structures in your connected data around community detection, centrality and pathways. The core set of algorithms is supported and has been tested at scale.
The highly extensible nature of Neo4j made it possible to create this graph library and expose it as procedures without making any modification to the Neo4j database.
These algorithms can be called upon as procedures (from our APOC library), and they’re also customizable through a common graph API. This set of advanced, global graph algorithms is simple to apply to existing Neo4j instances so your data scientists, solutions developers and operational teams can all use the same native graph platform.
Neo4j also includes graph projection, an extremely handy feature that places a logical sub-graph into a graph algorithm when your original graph has the wrong shape or granularity for that specific algorithm.
For example, if you’re looking to understand the relationship between drug results for men versus women, but your graph is not partitioned for this, you can temporarily project a sub-graph, run your algorithm on it and move on to the next step.
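To make the idea of projection concrete, here is a minimal sketch in plain Python. It is not Neo4j's projection API: the function names, the node properties (`kind`, `sex`) and the `RESPONDED_TO` relationship type are all hypothetical, invented for this illustration. It only shows the general pattern of filtering a graph down to a temporary sub-graph and running a simple measure on that sub-graph.

```python
from collections import defaultdict


def project_subgraph(nodes, relationships, node_filter, rel_filter):
    """Return (node_ids, adjacency) for the nodes and relationships that pass the filters.

    The original data is left untouched; the projection is a temporary view.
    """
    kept = {node_id for node_id, props in nodes.items() if node_filter(props)}
    adjacency = defaultdict(set)
    for source, rel_type, target in relationships:
        if source in kept and target in kept and rel_filter(rel_type):
            adjacency[source].add(target)
    return kept, dict(adjacency)


def in_degree(node_ids, adjacency):
    """Count incoming relationships per node in the projected sub-graph."""
    counts = {node_id: 0 for node_id in node_ids}
    for targets in adjacency.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical toy data: patients, drugs and recorded responses.
nodes = {
    "p1": {"kind": "Patient", "sex": "F"},
    "p2": {"kind": "Patient", "sex": "M"},
    "p3": {"kind": "Patient", "sex": "F"},
    "d1": {"kind": "Drug"},
    "d2": {"kind": "Drug"},
}
relationships = [
    ("p1", "RESPONDED_TO", "d1"),
    ("p2", "RESPONDED_TO", "d1"),
    ("p3", "RESPONDED_TO", "d2"),
    ("p3", "RESPONDED_TO", "d1"),
    ("p1", "PRESCRIBED", "d2"),  # filtered out by the relationship filter
]


def responses_for(sex):
    """Project drugs plus patients of one sex, then count responses per drug."""
    node_ids, adjacency = project_subgraph(
        nodes,
        relationships,
        node_filter=lambda p: p["kind"] == "Drug" or p.get("sex") == sex,
        rel_filter=lambda rel_type: rel_type == "RESPONDED_TO",
    )
    return in_degree(node_ids, adjacency)


female_responses = responses_for("F")
male_responses = responses_for("M")
```

The same algorithm (`in_degree` here) runs unchanged on each projection, which is the point: the projection reshapes the data so the algorithm does not have to.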
Example: High Performance of Neo4j Graph Algorithms
Neo4j graph algorithms are extremely efficient so you can analyze billions of relationships using common equipment and get your results in seconds to minutes, and in a few hours for the most complicated queries.
The chart below shows how Neo4j’s optimized algorithms yield results up to three times faster than Apache Spark™ GraphX for Union-Find (Connected Components) and PageRank on the Twitter-2010 dataset with 1.4 billion relationships.
Even more impressive, running the Neo4j PageRank algorithm on a significantly larger dataset with 18 billion relationships and 3 billion nodes delivered results in only 1 hour and 45 minutes (using 144 CPUs and 1TB of RAM).
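For readers who have not met the two benchmarked algorithms, the following is a minimal, single-threaded Python sketch of each. It is a textbook formulation written for this article, not Neo4j's or GraphX's implementation. The damping factor of 0.85, the 20 iterations and the handling of nodes without outgoing relationships are assumptions chosen for illustration.

```python
def connected_components(node_count, edges):
    """Union-Find: label each node with a representative of its component."""
    parent = list(range(node_count))
    size = [1] * node_count

    def find(x):
        # Path halving: point every visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue
        # Union by size: attach the smaller tree under the larger one.
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        size[root_a] += size[root_b]

    return [find(node) for node in range(node_count)]


def pagerank(node_count, edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over directed edges (source, target)."""
    out_links = [[] for _ in range(node_count)]
    for source, target in edges:
        out_links[source].append(target)

    rank = [1.0 / node_count] * node_count
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in range(node_count) if not out_links[n])
        base = (1.0 - damping) / node_count + damping * dangling / node_count
        new_rank = [base] * node_count
        for source in range(node_count):
            if out_links[source]:
                share = damping * rank[source] / len(out_links[source])
                for target in out_links[source]:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: a 3-node cycle, a separate pair, and one isolated node.
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
labels = connected_components(6, edges)
ranks = pagerank(6, edges)
```

Both functions touch every relationship a small number of times per pass, which is why performance at the billion-relationship scale depends so heavily on memory layout and parallelism rather than on the algorithm's logic alone.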
In addition to optimizing the algorithms themselves, we’ve parallelized key areas such as loading and preparing data as well as algorithms like breadth-first search and depth-first search where applicable.
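One common way to parallelize breadth-first search is the level-synchronous approach: all nodes at the current depth are expanded concurrently, and the results are merged before moving one level deeper. The sketch below illustrates that structure in Python. It is an assumption about the general technique, not a description of how Neo4j parallelizes its algorithms, and because of CPython's global interpreter lock the thread pool here shows the shape of the approach rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_bfs(adjacency, start, workers=4):
    """Level-synchronous BFS; returns the hop distance from start to each reachable node."""
    distance = {start: 0}
    frontier = [start]
    level = 0

    def expand(chunk):
        # Workers only read the adjacency map, so no locking is needed.
        return [nb for node in chunk for nb in adjacency.get(node, ())]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            level += 1
            # Split the current frontier into one chunk per worker.
            chunks = [frontier[i::workers] for i in range(workers)]
            next_frontier = []
            # Merging happens on the calling thread, one level at a time.
            for neighbours in pool.map(expand, chunks):
                for nb in neighbours:
                    if nb not in distance:
                        distance[nb] = level
                        next_frontier.append(nb)
            frontier = next_frontier
    return distance


# Hypothetical toy graph for illustration.
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
distances = parallel_bfs(adjacency, "a")
```

Depth-first search is harder to split this way because each step depends on the previous one, which may be why the text qualifies the parallelization with "where applicable".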
As you can see, using graph algorithms helps you surface the hidden connections and actionable insights obscured within your hoards of data. Even more importantly, the right graph algorithms are optimized to keep your computing costs and time investment to a minimum. Those graph algorithms are available to you now via the Neo4j Graph Platform, and they’re waiting to help you with your next data breakthrough.
Next week, we’ll explore specific graph algorithms, describing what they do and how they’re used.
Learn about the power of graph algorithms in the O’Reilly book Graph Algorithms: Practical Examples in Apache Spark and Neo4j, by the authors of this article.