Excerpt from InfoWorld article, Review: Neo4j supercharges graph analytics

Neo4j graph analytics and graph algorithms

Graph analytics and graph algorithms help you understand the organization and dynamics of complex systems. They can be applied globally, to discover the overall nature of networks and model the behavior of intricate systems, and locally (possibly in real time), to provide a focused view of relationships between specific data points, as shown in the figure below.

Neo4j provides five path-finding and traversal algorithms including parallel depth-first and breadth-first searches, four centrality algorithms including PageRank, and six clustering algorithms including Louvain Modularity. Louvain Modularity is often used for fraud ring detection.
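To give a feel for what a centrality algorithm such as PageRank computes, here is a minimal single-threaded sketch in Python. It is not Neo4j's implementation, which runs in parallel inside the database; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-node example graph are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with rank spread evenly over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a three-node cycle plus one extra node pointing into it.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph the node with the most incoming links ends up with the highest score, and the node nobody points to ends up with the lowest, which is the intuition behind using PageRank to find influential nodes.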

Neo4j performance and scalability

While benchmarking Neo4j in a meaningful way is not really possible for me as a reviewer, the company provided several metrics based on its own tests and on customer experience. For example, Neo4j Inc. has compared the performance of the Union-Find and PageRank algorithms in Neo4j and Apache Spark GraphX. The data set contained 1.47 billion relationships and 41.65 million nodes extracted from Twitter. Neo4j outperformed GraphX by roughly a factor of two on Union-Find and roughly a factor of four on PageRank, using clusters of 128 CPUs.
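For readers unfamiliar with Union-Find, the sketch below shows the basic idea in Python: it groups nodes into connected components by merging the sets at either end of each relationship. This is a textbook version with path compression and union by size, not the code Neo4j or GraphX benchmarked; the class name `UnionFind` and the sample relationships are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen nodes as their own one-element set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


# Hypothetical "follows" relationships between accounts.
relationships = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
uf = UnionFind()
for src, dst in relationships:
    uf.union(src, dst)

components = {uf.find(node) for node in list(uf.parent)}
print(len(components), "connected components")
```

Each relationship is processed once and each `find` is close to constant time in practice, which is why Union-Find is a common choice for finding connected components in graphs with billions of relationships.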

In a customer deployment, Neo4j replaced an Oracle RAC cluster to calculate optimum room pricing for Marriott Hotels and demonstrated 10 times the transaction rate on half the hardware. The Neo4j system at Marriott can perform 300 million pricing operations per day.

Every node in a Neo4j high availability cluster contains the database and a cluster management component, and the cluster can be accessed through a load balancer. The full graph is replicated to each instance of the cluster, and the read capacity of each HA cluster increases linearly with the number of server instances. Neo4j can commit tens of thousands of writes per second while maintaining fully ACID transactions.

In a Neo4j causal cluster, a new Neo4j Enterprise feature, a core cluster of read-write servers is combined with one or more asynchronously updated clusters of read replicas. Applications get causal consistency, meaning that a client is guaranteed to read at least its own writes, even when hardware and networks fail. The read replicas in a causal cluster may be geographically distributed to improve query performance for users near the replicas.
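The read-your-own-writes guarantee can be pictured with a toy model in Python. This is only an illustration of the concept, not Neo4j's clustering protocol or driver API: the classes `Core` and `ReadReplica`, the `bookmark` value, and the room-price data are all hypothetical. The assumption is that each write returns a bookmark marking its position in the core's log, and that a replica asked to read with a bookmark first catches up to that position.

```python
class Core:
    """Toy read-write server: an append-only log of (key, value) writes."""

    def __init__(self):
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))
        # The bookmark is simply the position of this write in the log.
        return len(self.log)


class ReadReplica:
    """Toy replica that copies the core's log lazily (asynchronously)."""

    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of log entries applied so far
        self.data = {}

    def catch_up(self, upto):
        # Apply log entries until the replica has seen entry number `upto`.
        while self.applied < upto:
            key, value = self.core.log[self.applied]
            self.data[key] = value
            self.applied += 1

    def read(self, key, bookmark=0):
        # A causally consistent read first catches up to the client's bookmark.
        self.catch_up(bookmark)
        return self.data.get(key)


core = Core()
replica = ReadReplica(core)

bookmark = core.write("room-101", 129)
stale = replica.read("room-101")            # no bookmark: replica may lag behind
fresh = replica.read("room-101", bookmark)  # bookmark: sees the client's own write
print(stale, fresh)
```

Without the bookmark, the lagging replica answers from whatever it has applied so far; with it, the replica cannot answer until it has applied the client's own write, which is the essence of causal consistency.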

 
