
Capacity Planning Example

Here is a back-of-the-napkin capacity planning example for a Neo4j workload with the following requirements:

Requirements

Requirement                                        Value
Number of total users                              100-200 (end users, most likely accessing via front-end applications)
Number of visits (reads/queries) per day per user  5
Number of nodes                                    50-75 million
Number of relationships                            100-150 million
Number of properties per node                      Min 1, Max 50, Avg 5
Number of properties per relationship              Min 0, Max 20, Avg 2
Average request time                               500 ms
Queries per second at peak                         200/second
Frequency of batch inserts and updates             4-5 times daily
Batch size (assume 10% of the volumes provided)    ~20 GB a day, 5 million nodes
Max processing/ingest time for delta volumes       One hour
Read replicas (RR)                                 In US + EU AWS
Disaster recovery (DR)                             In 2 US availability zones

Analysis

1) We estimate an initial database size of about 38 GB (see the table below), assuming:

  • 20% overhead for indexes

  • The maximum number of nodes and relationships, each with the average number of properties per node and per relationship

Object              Number        Bytes/Object   Space (GB)   Properties subtotal (GB)
Nodes               75,000,000    15             1.048
Relationships       150,000,000   34             4.750
Props / node (avg)  5             41             14.319
Props / rel (avg)   2             41             11.455       25.774
Indexes (%)         20                           6.314
Total                                            37.886
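As a sanity check, the table above can be reproduced with a short Python sketch. It assumes, as the table's numbers imply, that 1 GB = 2^30 bytes and that the 20% index overhead is applied to the sum of the node, relationship, and property stores. The constant and function names are illustrative only.

```python
# Back-of-the-napkin store-size estimate reproducing the table above.
# Per-object byte sizes are the figures used in the table.
GIB = 2**30  # the table's "GB" values match 1 GB = 2**30 bytes

NODES = 75_000_000          # max number of nodes
RELS = 150_000_000          # max number of relationships
AVG_PROPS_PER_NODE = 5
AVG_PROPS_PER_REL = 2
NODE_BYTES, REL_BYTES, PROP_BYTES = 15, 34, 41
INDEX_OVERHEAD = 0.20       # 20% on top of the stores for indexes


def estimate_store_gb():
    """Return each table row (in GB) plus the index overhead and total."""
    nodes_gb = NODES * NODE_BYTES / GIB
    rels_gb = RELS * REL_BYTES / GIB
    node_props_gb = NODES * AVG_PROPS_PER_NODE * PROP_BYTES / GIB
    rel_props_gb = RELS * AVG_PROPS_PER_REL * PROP_BYTES / GIB
    props_subtotal = node_props_gb + rel_props_gb
    base = nodes_gb + rels_gb + props_subtotal
    index_gb = base * INDEX_OVERHEAD
    return {
        "nodes": nodes_gb,
        "relationships": rels_gb,
        "props_per_node": node_props_gb,
        "props_per_rel": rel_props_gb,
        "props_subtotal": props_subtotal,
        "index": index_gb,
        "total": base + index_gb,
    }


if __name__ == "__main__":
    for name, gb in estimate_store_gb().items():
        print(f"{name:>15}: {gb:7.3f} GB")
```

Changing NODES, RELS, or the average property counts is a quick way to re-run the estimate for a different data model.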

2) We assume daily loads of 5M nodes (about 10% of the data volume) and, to accommodate future growth, plan for another 50% of growth over the next year.

3) We then arrive at an estimate of about 100 GB of total memory per instance: 5 GB (OS) + 60 GB (data + indexes + 50% growth, i.e. about 38 GB × 1.5 ≈ 57 GB, rounded up) + 30 GB (heap) = 95 GB, or roughly 100 GB of total memory.
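Steps 2 and 3 come down to a few lines of arithmetic. The sketch below is a minimal illustration, assuming the 60 GB figure (the memory Neo4j uses as page cache for the store files) is the grown store, about 38 GB × 1.5, rounded up to the next 10 GB, and that the total is rounded up the same way. The rounding rule and all names are assumptions, not part of the original estimate.

```python
import math

# Inputs from the estimate; the round-up-to-10-GB rule is an assumption.
STORE_GB = 37.886   # step 1: initial store + 20% index overhead
GROWTH = 0.50       # step 2: another 50% growth over the next year
OS_GB = 5           # reserved for the operating system
HEAP_GB = 30        # JVM heap


def round_up(value, step):
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step


def memory_per_instance_gb(store_gb=STORE_GB, growth=GROWTH,
                           os_gb=OS_GB, heap_gb=HEAP_GB, step=10):
    """Return (grown store, data/page-cache GB, OS+data+heap, rounded total)."""
    grown_store = store_gb * (1 + growth)       # store size after growth
    data_gb = round_up(grown_store, step)       # page cache for data + indexes
    subtotal = os_gb + data_gb + heap_gb        # OS + page cache + heap
    return grown_store, data_gb, subtotal, round_up(subtotal, step)


if __name__ == "__main__":
    grown, data, subtotal, instance = memory_per_instance_gb()
    print(f"grown store {grown:.1f} GB -> {data} GB for data + indexes")
    print(f"{OS_GB} + {data} + {HEAP_GB} = {subtotal} GB -> ~{instance} GB per instance")
```

With these inputs the grown store is about 56.8 GB, which rounds to 60 GB, and 5 + 60 + 30 = 95 GB rounds to about 100 GB per instance.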

4) Lastly, we estimate we will need about 40 CPU cores (or 80 vCPUs) per instance to handle a peak demand of 200 queries/second with a response time of 500 ms, across a cluster of 3 core servers and 2 read replicas (RRs), as follows:

  • Peak request rate: r = 200 requests/second

  • Query processing time: w = 0.5 seconds

  • Target CPU load factor: c = 0.5 (that is, CPUs will be 50% busy on average)

  • Number of instance failures to design for: F = 1

  • Total CPU core count = r × w / c = 200 × 0.5 / 0.5 = 200

  • Cluster size = 3 core servers + 2 read replicas = 5 instances

  • CPU cores per instance = 200 / 5 = 40 (or 80 vCPUs)

  • Estimate:

    • Cluster configuration: 5 instances × (80 vCPUs and 100 GB of RAM per instance)
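The CPU arithmetic above can be sketched in Python. Here r × w is the average number of queries in flight (Little's law), and dividing by the load factor c adds headroom so CPUs average that load. The sketch assumes 2 vCPUs per physical core and that load is spread evenly across all five instances, as the 200 / 5 division implies. The load_after_failures helper is a hypothetical extra check showing one way to read F = 1: how busy the surviving instances would be if one instance were lost at peak.

```python
import math


def total_cpu_cores(r, w, c):
    """Cores needed for r requests/s, each keeping a core busy for w seconds,
    at a target average CPU load c."""
    in_flight = r * w          # average concurrent queries (Little's law)
    return in_flight / c       # extra cores so CPUs average load c


def size_cluster(r=200, w=0.5, c=0.5, instances=5, vcpus_per_core=2):
    """Return (total cores, cores per instance, vCPUs per instance).

    Assumes load is spread evenly over all instances and
    2 vCPUs per physical core (both assumptions)."""
    total = total_cpu_cores(r, w, c)
    per_instance = math.ceil(total / instances)
    return total, per_instance, per_instance * vcpus_per_core


def load_after_failures(r, w, total_cores, instances, failures):
    """Hypothetical check: average CPU load on the surviving instances
    if `failures` instances are lost while still serving r requests/s."""
    surviving_cores = total_cores / instances * (instances - failures)
    return r * w / surviving_cores


if __name__ == "__main__":
    total, per_instance, vcpus = size_cluster()
    print(f"total cores: {total:.0f}, per instance: {per_instance} ({vcpus} vCPUs)")
    print(f"load with F=1: {load_after_failures(200, 0.5, total, 5, 1):.1%}")
```

With the figures above this gives 200 total cores, 40 per instance (80 vCPUs), and a 62.5% average load on the four surviving instances if one instance fails at peak, which stays within the headroom left by the 50% load factor.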