When designing your solution, some of your first considerations will concern your functional requirements and the technology choices you make to meet them. Those requirements will likely include the ability to scale to many concurrent users, to maintain consistent uptime, and to recover from a system failure while remaining available. These are important production-related questions that help drive your technical decisions and can ultimately lead you to cluster Neo4j.
This section covers four major advantages of using Neo4j clustering: read scaling, high availability, disaster recovery, and analytics.
Clustering Neo4j allows you to distribute read workload across a number of Neo4j instances. You can take two approaches to scaling your reads with Neo4j:
The first approach relies on the fact that Neo4j’s clustering architecture replicates the entire database to each instance in your cluster. You can therefore direct any read from your application to any instance without much concern for data locality.
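As a rough illustration of directing reads to any instance, the sketch below rotates read requests across cluster members. The hostnames and the `RoundRobinReadRouter` class are hypothetical and exist only for this example; in a real deployment a driver or load balancer would usually do this for you.

```python
from itertools import cycle

# Hypothetical instance addresses; replace with your own cluster members.
READ_INSTANCES = [
    "neo4j-1.example.com:7687",
    "neo4j-2.example.com:7687",
    "neo4j-3.example.com:7687",
]


class RoundRobinReadRouter:
    """Hand out instance addresses in rotation for read queries."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = cycle(list(instances))

    def next_instance(self):
        # Any instance can serve any read, because each one holds
        # a full copy of the database.
        return next(self._instances)


router = RoundRobinReadRouter(READ_INSTANCES)
targets = [router.next_instance() for _ in range(4)]
# The fourth read wraps around to the first instance again.
```

Round-robin is only one possible policy. Because every instance holds the same data, any policy that spreads load evenly would serve the same purpose.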
The second approach is sometimes referred to as "cache sharding". It takes advantage of natural partitions in your data to direct reads to particular instances where the system will already have those datasets in memory. This approach is especially beneficial when your total active dataset is much larger than what fits in memory on any single instance.
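One way to picture cache sharding is a routing function that always sends reads for the same partition to the same instance, so that instance keeps the relevant data warm in memory. This is a minimal sketch, not a Neo4j API: the addresses, the `instance_for` helper, and the `region:...` partition keys are all assumptions made for illustration.

```python
import hashlib

# Hypothetical instance addresses; replace with your own cluster members.
INSTANCES = [
    "neo4j-1.example.com:7687",
    "neo4j-2.example.com:7687",
    "neo4j-3.example.com:7687",
]


def instance_for(partition_key, instances=INSTANCES):
    """Map a partition key to one instance, the same one every time."""
    # hashlib is used instead of the built-in hash() because string
    # hashes are randomized per process, which would break stickiness
    # across application restarts.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Reads for the same (hypothetical) partition always land together.
first = instance_for("region:emea")
second = instance_for("region:emea")
```

Every instance still holds the full database, so a read routed to the "wrong" instance remains correct; it simply misses the warm cache. Note that simple modulo hashing remaps most keys when the number of instances changes.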
A fundamental requirement for any service or application is its overall availability. How much availability you need is usually determined by the demands of the users, the times at which they interact with the solution, the impact downtime would have on the business or on users' ability to do their jobs, and the financial cost of a system failure. The systems in question are not always customer-facing; they can also be critical internal systems.
Availability can often be addressed with various strategies for recovery or mirroring. However, Neo4j’s clustering architecture is an automated solution for ensuring Neo4j is consistently available to your application and end-users.
Disaster recovery, in general terms, defines your ability to recover from major outages of your services. The most common example is whole-datacenter outages where many services are disrupted. In these cases a disaster recovery strategy can define a failover datacenter along with a strategy for bringing services back online.
Neo4j clustering can accommodate disaster recovery strategies that require very short windows of downtime or have a low tolerance for data loss in disaster scenarios. By deploying a cluster instance to an alternate location, you keep an active copy of your database available in your designated disaster recovery location, one that continuously keeps up with the transactions against your database.
Your application needs to access data for its purposes. It reads data, writes data, and generally keeps your application service or end users happy. Then comes the analytics team that wants to collect and aggregate data for their reports. Next thing you know, you have a set of long-running, compute-heavy queries running against your production databases and disrupting your service or your end users' happiness.
You can’t avoid servicing the analytics requests, but you can box in the impact their queries have on your service. With Neo4j clustering you can dedicate separate instances entirely to analytics queries, whether they come from end users or from BI tools. Because those instances are part of the cluster, the data they serve to analytics queries stays up to date as well.
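The isolation described above can be sketched as two separate pools of instances, with each query sent to a pool according to its workload type. The `Workload` enum, the pool contents, and the `instances_for` helper are hypothetical names for this example; how instances are actually set aside for analytics depends on your cluster configuration.

```python
from enum import Enum


class Workload(Enum):
    TRANSACTIONAL = "transactional"
    ANALYTICS = "analytics"


# Hypothetical addresses. The analytics pool shares no instance with
# the pool that serves the application's own traffic.
POOLS = {
    Workload.TRANSACTIONAL: [
        "neo4j-1.example.com:7687",
        "neo4j-2.example.com:7687",
    ],
    Workload.ANALYTICS: [
        "neo4j-analytics-1.example.com:7687",
    ],
}


def instances_for(workload):
    """Return the instances allowed to serve the given workload."""
    # Return a copy so callers cannot modify the shared pool.
    return list(POOLS[workload])


report_targets = instances_for(Workload.ANALYTICS)
app_targets = instances_for(Workload.TRANSACTIONAL)
```

Keeping the two pools disjoint is the point of the design: a long-running report can saturate an analytics instance without competing for resources with application queries.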