Capacity Planning Example

Here is a back-of-the-napkin capacity planning example for a Neo4j workload with the following requirements:

Requirements

Requirement	Value
Number of total users	100-200 (end users, most likely accessing via front-end applications)
Number of visits (reads/queries) per day per user	5
Number of nodes	50-75 million
Number of relationships	100-150 million
Number of properties per node	Min 1, Max 50, Avg 5
Number of properties per relationship	Min 0, Max 20, Avg 2
Average request time	500 ms
Queries per second at peak	200/second
Frequency of batch inserts and updates	4-5 times daily
Batch size (assume 10% of volumes provided)	~20 GB a day, 5 million nodes
Max processing/ingest time for delta volumes	One hour
Read replicas (RR)	In US + EU on AWS
Disaster recovery (DR)	In 2 US availability zones

Analysis

1) We estimate an initial database size of about 38 GB (see table below), assuming:

20% overhead for indexes
The maximum number of nodes and relationships, each with the average number of properties

Object	Number	Bytes/Object	Space (GB)	Properties subtotal (GB)
Nodes	75,000,000	15	1.048
Relationships	150,000,000	34	4.750
Props / node	5	41	14.319
Props / rel	2	41	11.455	25.774
Index (percentage)	20		6.314
Total			37.886
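The table's arithmetic can be reproduced with a short script. This is a minimal sketch, not an official sizing tool: the function name `estimate_store_size_gb` and its parameter names are hypothetical, and the per-object byte counts (15, 34, 41) are simply the figures from the table. Note that the table's "GB" values only match if bytes are divided by 1024^3.

```python
GIB = 1024 ** 3  # the table's "GB" figures correspond to dividing bytes by 2**30


def estimate_store_size_gb(nodes, rels, props_per_node, props_per_rel,
                           node_bytes=15, rel_bytes=34, prop_bytes=41,
                           index_fraction=0.20):
    """Return the per-row sizes of the table above, in GB (hypothetical helper)."""
    node_gb = nodes * node_bytes / GIB
    rel_gb = rels * rel_bytes / GIB
    node_prop_gb = nodes * props_per_node * prop_bytes / GIB
    rel_prop_gb = rels * props_per_rel * prop_bytes / GIB
    data_gb = node_gb + rel_gb + node_prop_gb + rel_prop_gb
    index_gb = data_gb * index_fraction  # indexes assumed to add 20% on top of data
    return {
        "nodes": node_gb,
        "relationships": rel_gb,
        "node_props": node_prop_gb,
        "rel_props": rel_prop_gb,
        "indexes": index_gb,
        "total": data_gb + index_gb,
    }


# Maximum node/relationship counts with average properties per object.
sizes = estimate_store_size_gb(75_000_000, 150_000_000, 5, 2)
for name, gb in sizes.items():
    print(f"{name:>14}: {gb:7.3f} GB")
```

Changing the inputs (for example, using the maximum of 50 properties per node instead of the average) shows how sensitive the estimate is to property counts, which dominate the total here.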

2) We assume daily loads of 5M nodes (or about 10% of the volume), and that we need to accommodate future growth of another 50% over the next year.

3) We then arrive at an estimate of about 100 GB of total memory per instance: 5 GB (OS) + 60 GB (data + indexes + 50% growth) + 30 GB (heap) = 95 GB, rounded up to about 100 GB.
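The memory estimate in step 3 can be sketched the same way. The function `estimate_memory_gb` and its parameter names are hypothetical. The 5 GB for the OS and the 30 GB heap are the article's assumptions, not values derived from the workload. Without rounding, 38 GB of data plus 50% growth is 57 GB, which the article rounds up to 60 GB.

```python
def estimate_memory_gb(store_gb, growth=0.50, os_gb=5, heap_gb=30):
    """Per-instance memory: OS + (data and indexes with growth) + heap."""
    data_gb = store_gb * (1 + growth)  # room to hold the grown store in memory
    return os_gb + data_gb + heap_gb


total_memory_gb = estimate_memory_gb(38)
print(f"Unrounded estimate: {total_memory_gb:.0f} GB per instance")
```

The unrounded result sits a little under the article's 95 GB because of the 57-to-60 rounding; both round up comfortably to the 100 GB target.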

4) Lastly, we estimate that each instance will need about 40 CPU cores (or 80 vCPUs) to accommodate a peak demand of 200 queries/second with a response time of 500 ms, across a cluster of 3 core members and 2 read replicas (see below):

Number of concurrent requests per second: r = 200/sec
Workload/query processing time: w = 0.50 sec
CPU load factor: c = 0.5 (that is, CPUs will be 50% busy on average)
Number of instance failures to design for: F = 1
Core count = r x w / c = 200 x 0.5 / 0.5 = 200
Cluster size = 3 cores + 2 RRs = 5 instances
Core count per machine = 200 / 5 = 40 (or 80 vCPUs)
Estimate:
- Cluster configuration: 5 instances, each with 80 vCPUs and 100 GB of RAM
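The core-count formula above is easy to check in code. This is a minimal sketch with hypothetical names (`estimate_cores`, `VCPUS_PER_CORE`). The 2 vCPUs per physical core ratio is inferred from the article's "40 (or 80 vCPU)" figure. As in the article's arithmetic, the failure tolerance F is not factored into the division.

```python
import math

VCPUS_PER_CORE = 2  # assumption: matches the article's 40 cores -> 80 vCPUs


def estimate_cores(requests_per_sec, query_time_sec, cpu_load_factor, instances):
    """Return (total cores, cores per instance) using core count = r x w / c."""
    total_cores = requests_per_sec * query_time_sec / cpu_load_factor
    cores_per_instance = math.ceil(total_cores / instances)
    return total_cores, cores_per_instance


# r = 200 requests/sec, w = 0.5 sec, c = 0.5, cluster of 3 cores + 2 read replicas.
total_cores, cores_per_instance = estimate_cores(200, 0.5, 0.5, 3 + 2)
vcpus_per_instance = cores_per_instance * VCPUS_PER_CORE
print(f"{total_cores:.0f} cores total, {cores_per_instance} per instance "
      f"({vcpus_per_instance} vCPUs)")
```

Because the requirement scales linearly with query time, halving the average query time to 250 ms would halve the core count, so query tuning is often a cheaper lever than adding hardware.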
