Capacity Planning Example

Here is a back of the napkin example of capacity planning for a Neo4j workload for the following list of requirements:

Requirements

Requirement Value

Number of total users

100-200 (end users, most likely accessing via front end applications)

Number of visits (read/queries) per day per user

5

Number of Nodes

50-75 MM

Number of Relationships

100 – 150 MM

Number of Properties per Node

Min 1, Max 50, Avg 5

Number # of Properties per Relationship

Min 0, Max: 20, Avg: 2

Average request time

500 ms

Queries per second at peak

200/second

Frequency of batch inserts and updates

4-5 times daily

Batch size assume 10% of volumes provided

~ 20 GB a day , 5 million nodes

Max processing/ingest for delta volumes

One hour

RR

in US + EU AWS

DR

DR In 2 US availability zones

Analysis

1) Estimating an initial database size of about 38GB (see table below) – assuming:

  • 20% for indexes
  • Max # of nodes and relationships with Avg props per node & relations

Number

Bytes/Object

Space(GBs)

Properties subtotal

Nodes

75000000

15

1.048

Relationships

150000000

34

4.750

Props / node

5

41

14.319

Props / rel

2

41

11.455

25.774

Index (percentage)

20

6.314

Total

37.886

2) Assuming daily loads of 5M nodes per day (or 10% – and let’s assume, we need to accommodate future growth of another 50% for the next year.)

3) We then arrive of an estimating about 100GB of total memory per instance [ 5 GB(OS) + 60 GB(data + indexes + 50%growth) + 30GB(Heap) ~ 100GB of total memory ]

4) Lastly, we estimate we will need about 10 CPU cores(or 20 vCPU cores) to accommodate peak demand of 200 queries/second with a response time of 500ms, among a cluster with 3 cores and 2 RRs(see below):

  • Number of concurrent requests per second 200/sec
  • Workload processing time per second=0.50 sec
  • CPU load factor 0.5 ( Assuming CPU’s will be 50% busy on average)
  • Number of instance failures to design for f=1
  • Core count = r x w x c = 200 x .5 x .5 = 50
  • Cluster size = 3 cores + 2 RR
  • Core count per machine = 50/5= 10 (or 20 vCPU cores)
  • Estimate:
    • Cluster configuration (5 Instances) x (20 vCPU cores per instance with 100GB of RAM) ”’
      • Last Modified: 2020-09-23 21:26:58 UTC by Ali Maddahian.
      • Relevant for Neo4j Versions: 3.5.
      • Relevant keywords storage, disk, filesystem, unix, capacity.