Improving Cypher performance

This page covers a number of steps you can take to improve the Cypher performance of your workload.

Cypher statements with literal values

One of the main causes of poor query performance is running many Cypher statements that contain literal values instead of parameters. Statements that differ only in their literals are processed as separate queries, so you don't benefit fully from the execution plan cache.

The following Cypher queries are identical in form but use different literals:

MATCH (tg:asset) WHERE tg.name = "ABC123"
MERGE (tg)<-[:TAG_OF]-(z1:tag {name: "/DATA01/" + tg.name + "/Top_DOOR"})
MERGE (tg)<-[:TAG_OF]-(z2:tag {name: "/DATA01/" + tg.name + "/Data_Vault"})

In cases like this, query parsing and execution plan generation happen multiple times, which reduces efficiency. One way to solve this is to rewrite the previous example as follows:

// $tgName: the asset name; $tags: a list of maps, each with a "name" key
MATCH (tg:asset) WHERE tg.name = $tgName
WITH tg
UNWIND $tags AS tag
MERGE (tg)<-[:TAG_OF]-(:tag {name: tag.name})

Replacing the literal values in the queries with parameters lets the execution plan be cached and reused. Your application places all the values in a parameter list and then issues a single statement that iterates through them. These changes improve both execution and memory usage.
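As a further illustration, the variant below keeps the name concatenation from the literal version inside the query and passes only the varying parts as parameters. It is a minimal sketch, not the only way to structure the statement: the parameter names $prefix and $suffixes are hypothetical, and the values in the comment are the ones the literal example would have used.

```cypher
// Hypothetical parameter map sent by the application:
// {tgName: "ABC123", prefix: "/DATA01/", suffixes: ["/Top_DOOR", "/Data_Vault"]}
MATCH (tg:asset) WHERE tg.name = $tgName
UNWIND $suffixes AS suffix
// One MERGE per suffix; the query text stays the same for every asset
MERGE (tg)<-[:TAG_OF]-(:tag {name: $prefix + tg.name + suffix})
```

Because the query text never changes between calls, only the parameter map differs from one asset to the next.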

Review queries and model

A good first step is to review and list all your Cypher queries. The best starting point is a good understanding of the sequence and frequency of the Cypher queries submitted.

Additionally, if the queries are generated by a framework, it is essential to log them in Cypher form to make reviewing easier.

You can also profile a Cypher query by prepending it with EXPLAIN (to see the execution plan without running the query) or PROFILE (to run and profile the query). Read more about profiling a query.
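For example, the two prefixes could be applied to a simple lookup like this. The query itself is only an illustration that reuses the asset label and the $tgName parameter from earlier on this page.

```cypher
// EXPLAIN: shows the execution plan; the statement is not executed
EXPLAIN
MATCH (tg:asset) WHERE tg.name = $tgName
RETURN tg;

// PROFILE: executes the statement and reports rows and database hits per operator
PROFILE
MATCH (tg:asset) WHERE tg.name = $tgName
RETURN tg;
```

Since PROFILE actually runs the statement, any writes in a profiled query are applied to the database.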

When using PROFILE, you may need to run the query several times to get a representative measurement. The first time the query runs, it goes through a full cycle of evaluation, planning, and interpreting before making its way into the query cache. Once it is in the cache, subsequent executions are faster. Furthermore, always use parameters instead of literal values to benefit from the cache.

Read more about execution plans and see this detailed guide for the steps on how to capture the execution plans.

To best interpret the output of your execution plan, it is recommended that you get familiar with the terms used in it. See this summary of execution plan operators for more information.

Index specification

As your data volume grows, it is important to define constraints and indexes in order to achieve the best performance for your queries. The query planner needs to estimate the cost associated with a query and, to get the best estimations, it relies on the indexes that already exist. The execution plan will likely show whether an index is missing and which one it is. In some circumstances it might look like an index is not available or possible. In that case, it may make sense to reconsider the model and create an intermediate node or another relationship type just to leverage an index.

Read more about the use of indexes for a more comprehensive explanation.
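As a sketch, an index and a constraint for the asset and tag nodes used earlier on this page could be declared as follows. The names asset_name and tag_name_unique are hypothetical, and the uniqueness constraint assumes that tag names are meant to be unique in your model.

```cypher
// Index supporting lookups such as WHERE tg.name = $tgName
// (index name is hypothetical)
CREATE INDEX asset_name IF NOT EXISTS
FOR (a:asset) ON (a.name);

// Uniqueness constraint, which is also backed by an index
// (assumes tag names are unique; constraint name is hypothetical)
CREATE CONSTRAINT tag_name_unique IF NOT EXISTS
FOR (t:tag) REQUIRE t.name IS UNIQUE;
```

A uniqueness constraint on the property used in a MERGE also helps the lookup part of the MERGE use an index instead of a label scan.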

You can also influence which index is used in your query by adding a planner hint with the USING clause.
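A minimal sketch of such a hint is shown below. It assumes that an index on the name property of asset nodes already exists, and the query is only an illustration.

```cypher
// Planner hint: prefer the index on :asset(name) for this lookup
// (assumes that index has already been created)
MATCH (tg:asset)
USING INDEX tg:asset(name)
WHERE tg.name = $tgName
RETURN tg
```

Hints are best treated as a last resort: compare the plan with and without the hint using PROFILE before keeping it.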

Review metrics and instance size

With Aura, you can keep an eye on some key metrics to see which resource constraints your instance may be experiencing. Follow the steps described in Monitoring to check that information.

At this stage, if the key metrics are too high, you may want to reconsider the instance sizing. A resize operation does not cause any downtime, and you would only pay for what you use.

You should always size your instance against your workload activity peaks.

Consider concurrency

Sometimes individual queries may be optimized on their own and run fine, but the sheer volume and concurrency of operations can overwhelm your Aura instance.

To review what is running at any given time (this makes particular sense if you have a long-running query), you can use these statements and list what is running:
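One statement that can serve this purpose is SHOW TRANSACTIONS. The following is a sketch only: the set of columns yielded here is an assumption about what is useful to inspect, and which transactions you can see depends on your privileges.

```cypher
// List running transactions with the query each one is executing
SHOW TRANSACTIONS
YIELD transactionId, username, currentQuery, elapsedTime, status
```

A transaction with a large elapsedTime and a status that stays unchanged across several checks is a candidate for closer review.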

Runtime engine and Cypher version

The execution plan should show you the runtime that is selected for the execution of your query. Usually, the planner makes the right decision, but it may be worth checking at times whether another runtime performs better. Read more about query tuning on Cypher runtime.

To invoke the use of a given runtime forcibly, prepend your Cypher statement with:

  • CYPHER runtime=pipelined for pipelined runtime

  • CYPHER runtime=slotted for slotted runtime

  • CYPHER runtime=interpreted for interpreted runtime
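For instance, prepending the option to an illustrative lookup would look like the sketch below. The query is not from your workload, and whether a given runtime is available depends on your Neo4j version.

```cypher
// Force the slotted runtime for this statement only
CYPHER runtime=slotted
MATCH (tg:asset) WHERE tg.name = $tgName
RETURN tg
```

The option applies only to the statement it prefixes, so you can compare runtimes query by query without changing any instance-wide setting.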

If you have a Cypher pattern that performs poorly but raises no error, it may be running under a prior Cypher version. You can control the version used to interpret your queries by using these Cypher query options.

Network and the cost of the round-trip

With Aura, it is essential to choose the most suitable cloud and region for your application, because physical distance directly affects the achievable network latency.

When a network disruption occurs between your application and Aura, re-submitting a query costs you additional round-trip network latency. This is particularly important with Aura, because your applications need to use transaction functions when connecting to your instance.