Knowledge Base

Warm the cache to improve performance from cold start

Note: For 3.5.x forward the details below are no longer applicable as Neo4j will keep record of what is in the pagecache at all times and upon restart of Neo4j the pagecache will be auto-warmed with the data which was previously in the pagecache. In 3.5.x forward the logs\debug.log will report similar to

2020-12-14 21:00:08.486+0000 INFO [o.n.k.i.p.PageCacheWarmer] Page cache warmup started.
2020-12-14 21:00:08.659+0000 INFO [o.n.k.i.p.PageCacheWarmer] Page cache warmup completed. 441 pages loaded. Duration: 173ms.

Note: For Neo4j 2.3+ there is no object cache anymore, so this warms up the page-cache which maps the Neo4j store files into memory.

You may find that some queries run much faster the second time they run. This is because on cold boot, a server node has nothing cached yet, and needs to go to disk for all records. Once some/all of the records are cached, you will see greatly improved performance.

One technique that is widely employed is to "warm the cache". At its most basic level, we run a query that touches each node and relationship in the graph. Assuming the data store can fit into memory, this will cache the entire graph. Otherwise, it will cache as much as it can. Give it a try and see how it helps you!

Cypher (Server,Shell)
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN count(n.prop) + count(r.prop);

In the above example the reference to count(n.prop) + count(r.prop) is used so as to force the optimizer to search for a node/relationship with a property named 'prop'. Replacing this with count(*) would not be sufficient for it would not load all of the node and relationship properties.

Embedded (Java):
@GET @Path("/warmup")
public String warmUp(@Context GraphDatabaseService db) {
  try ( Transaction tx = db.beginTx()) {
    for ( Node n : GlobalGraphOperations.at(db).getAllNodes()) {
      n.getPropertyKeys();
      for ( Relationship relationship : n.getRelationships()) {
        relationship.getPropertyKeys();
        relationship.getStartNode();
      }
    }
  }
  return "Warmed up and ready to go!";
}

With 3.0 forward and the inclusion of APOC one can now warm up the cache by running the stored procedure

CALL apoc.warmup.run()

This can help in many ways. Aside for pure performance improvement, it can also help alleviate upstream issues resulting from lagging queries. For example if the nodes are busy, and your load balancer/proxy has a very short timeout, it can appear that the cluster is not available initially, if none of the graph is in memory yet. If the cache is warmed, the short timeout shouldn’t be a concern on a cold cluster start.