Using Query Best Practices

About this module

You have learned most of the basics for using Cypher to query the graph. As you gain more experience with Cypher and use it to create your application graph, there are some best practices that you can use. Next, you will learn some of these best practices for using Cypher for your application.

The training course Cypher Query Tuning in Neo4j 4.x goes into greater depth about Cypher best practices and query tuning.

At the end of this module, you will be able to:

  • Use parameters in your Cypher statements.

  • Analyze Cypher execution.

  • Monitor queries.

Cypher parameters

Most deployed applications that use Neo4j have client code written in Java, JavaScript, Python, etc. In a deployed application, you must not hard-code values in your Cypher statements.

As you test your Cypher statements, you will use a variety of literal values to ensure that your Cypher queries are correct. But you don’t want to change the Cypher statement every time you test. In fact, any change to a Cypher statement requires a recompilation of the Cypher code, which is expensive. Instead, you create Cypher statements that will not change, except for the substitution of placeholders (parameters) in the query. A best practice is to parameterize values in your Cypher statements.

Using Cypher parameters

In your Cypher statements, a parameter name begins with the $ symbol.

Here is an example where we have parameterized the query:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released, m.title ORDER BY m.released DESC

At runtime, if the parameter $actorName has a value, it will be used in the Cypher statement when it runs in the graph engine.

Setting a parameter

You can set values for Cypher parameters that will be in effect during your session.

You can set the value of a single parameter in the query editor pane as shown in this example where the value Tom Hanks is set for the parameter actorName:

:param actorName => 'Tom Hanks'

You can even specify a Cypher expression to the right of => to set the value of the parameter.
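
As a small sketch of setting a parameter from an expression, the command below stores the result of an arithmetic expression. The parameter name year and the values are illustrative assumptions and are not part of the course example at this point:

```cypher
:param year => 1990 + 10
```

After this command, $year should hold the evaluated result rather than the text of the expression.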

Here is the result of executing the :param command:

[Screenshot: result of the :param command]

Notice here that :param is a client-side command. It takes a name and expression and stores the value of that expression for the name in the session.

Using the parameter

After the actorName parameter is set, you can run the query that uses the parameter:

[Screenshot: query result using the actorName parameter]

Subsequently, you need only change the value of the parameter and not the Cypher statement to test with different values.

After we have changed the actorName parameter to Tom Cruise, we get a different result with the same Cypher query:

[Screenshot: query result with actorName set to Tom Cruise]

Setting multiple parameters

You can also use the JSON-style syntax to set all of the parameters in your Neo4j Browser session at once. The values you can specify in this object are numbers, strings, and booleans. In this example, we set two parameters for our session:

:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}

With the result:

[Screenshot: result of the :params command]

Using multiple parameters

Here is a different query that uses both of these parameters:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName AND m.title = $movieName
RETURN p, m

With the result:

[Screenshot: query result for Tom Cruise and Top Gun]

If you want to remove an existing parameter from your session, use the JSON-style syntax again and leave out the parameter you want to remove.
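
For instance, assuming actorName and movieName are both set as in the earlier example, the following sketch would keep actorName and drop movieName, because :params replaces the full set of session parameters:

```cypher
:params {actorName: 'Tom Cruise'}
```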

If you want to clear all parameters, you can simply type:

:params {}

Viewing parameters

If you want to view the current parameters and their values, simply type :params:

[Screenshot: output of the :params command]

Analyzing queries

The Movie graph that you have been using during this course is a very small graph. As you start working with large datasets, it will be important to not only add appropriate indexes to your graph, but also write Cypher statements that execute as efficiently as possible. This is a combination of good graph data modeling and query tuning.

Graph data modeling is taught in the course Graph Data Modeling for Neo4j. Query tuning is taught in the course Advanced Cypher.

There are two Cypher keywords you can prefix a Cypher statement with to analyze a query:

  • EXPLAIN provides estimates of the graph engine processing that will occur, but does not execute the Cypher statement.

  • PROFILE provides real profiling information for what has occurred in the graph engine during the query and executes the Cypher statement.

Using EXPLAIN

The EXPLAIN keyword provides the Cypher query plan. A Cypher query plan has operations where rows are processed and passed on to the next operation (step). You can compare different Cypher statements to understand the stages of processing that will occur when the Cypher statement executes.

Example: Setting parameters

Here is an example. We have set the actorName and year parameters for our session:

:params {actorName: 'Hugo Weaving', year: 2000}

Here are the parameter settings:

[Screenshot: actorName and year parameter settings]

Example: Using EXPLAIN

Then we execute this Cypher code:

EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName AND
      m.released < $year
RETURN p.name, m.title, m.released

Here is the query plan returned:

[Screenshot: EXPLAIN query plan]

Notice that the query plan involves a sequence of steps. Rows are retrieved from the graph and are passed on to subsequent steps.

You can expand each phase of the Cypher execution to examine what code is expected to run. Each phase of the query presents you with an estimate of the number of rows expected to be returned. With EXPLAIN, the query does not run; the graph engine simply produces the query plan.

Expanding the steps

Here is the query plan with its steps expanded:

[Screenshot: EXPLAIN query plan with its steps expanded]

A major goal of a good graph data model and query is to minimize the number of rows processed. Because we have an index on the released property of Movie nodes, the initial step is simply an index lookup. You want to see the use of indexes in your queries.

For better metrics on how the Cypher statement will run, use the PROFILE keyword, which runs the Cypher statement and gives you run-time performance metrics.
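
This module does not show how that index was created. As a sketch, assuming Neo4j 4.x index syntax and a hypothetical index name MovieReleased, an index on the released property could be created like this:

```cypher
// Assumption: Neo4j 4.x syntax; the index name MovieReleased is hypothetical.
CREATE INDEX MovieReleased FOR (m:Movie) ON (m.released)
```

Running EXPLAIN on a query before and after creating an index is one way to check whether the planner chooses to use it.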

Example: Using PROFILE

Here is the result returned using PROFILE for the previous Cypher statement:

[Screenshot: PROFILE result]

Here we see that for each phase of the graph engine processing, we can view the cache hits and, most importantly, the number of times the graph engine accessed the database (db hits). This is an important metric that will affect the performance of the Cypher statement at run-time. The overall execution time in milliseconds, however, is the measurement that you typically use for query tuning. The elapsed time is affected not only by your query, but also by whether the caches are populated.
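
For reference, the statement profiled here is the earlier EXPLAIN example with the keyword changed to PROFILE. It relies on the actorName and year parameters set earlier in the session:

```cypher
// Unlike EXPLAIN, PROFILE executes the statement and reports actual rows and db hits.
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName AND
      m.released < $year
RETURN p.name, m.title, m.released
```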

Example: PROFILE when no node labels

For example, if we were to change the Cypher statement so that the node labels are not specified, we see these metrics when we profile:

[Screenshot: PROFILE result without node labels]

Here we see more db hits, which makes sense because all nodes need to be scanned to perform this query. And the total elapsed time in milliseconds is greater, which we would expect because it typically correlates with database access.
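
The modified statement itself is not shown in this module. A plausible reconstruction, assuming only the Person and Movie labels were removed from the previous query, is:

```cypher
// Assumption: identical to the earlier query except that the node labels are omitted,
// so the planner cannot narrow the search to Person or Movie nodes.
PROFILE MATCH (p)-[:ACTED_IN]->(m)
WHERE p.name = $actorName AND
      m.released < $year
RETURN p.name, m.title, m.released
```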

Monitoring queries

If you are testing an application and have run several queries against the graph, there may be times when your session hangs with what seems to be a very long-running query.

There are two reasons why a Cypher query may take a long time:

  • The query returns a lot of data. The query completes execution in the graph engine, but it takes a long time to create the result stream.

    • Example A: MATCH (a)--(b)--(c)--(d)--(e)--(f)--(g) RETURN a

  • The query takes a long time to execute in the graph engine.

    • Example B: MATCH (a), (b), (c), (d), (e) RETURN count(id(a))

Killing a query

If the query executes for a long time, you can kill the query by simply closing the result pane in Neo4j Browser.

Here we kill the first running query by closing the result pane in Neo4j Browser:

[Screenshot: killing the first query by closing the result pane]

Note that you must kill it in Neo4j Browser while the query is running. If the query has completed, and it is now returning the results to the client, you will not be able to kill it. All you can do at this point is to close Neo4j Browser.

Here we kill the second running query:

[Screenshot: killing the second query]

Example: Monitoring queries

You might want to understand whether the query is taking a long time to execute or whether it has completed but is returning a lot of results.

You can monitor it by using the :queries command. Here is a screenshot where we are monitoring a long-running (Example A) query in another Neo4j Browser session:

[Screenshot: :queries output while monitoring the Example A query]

The :queries command calls dbms.listQueries() under the hood, which is why we see two queries here.
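
If you want to call the procedure directly, a minimal sketch follows. It assumes the listed column names match the dbms.listQueries() output in your Neo4j 4.x version, and availability of the procedure may depend on your edition and version:

```cypher
// Assumption: these YIELD columns exist in your version's dbms.listQueries() output.
CALL dbms.listQueries()
YIELD queryId, username, query, elapsedTimeMillis, status
RETURN queryId, username, query, elapsedTimeMillis, status
ORDER BY elapsedTimeMillis DESC
```

Sorting by elapsed time puts the longest-running queries at the top of the result.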

Example: A long-running query

After a while, we see that the query has completed, but in the Neo4j Browser client (on the left) it is still trying to return results to the client. In this case, all you can do to stop the query from the client is to close/kill the client.

[Screenshot: :queries output after the Example A query has completed]

Here is a screenshot where we are monitoring a long-running (Example B) query in another Neo4j Browser session:

[Screenshot: :queries output while monitoring the Example B query]

Example: Killing a long-running query

After a while, we see that the query is still running. In the browser window on the right, you can kill the long-running query. Once the query has been killed, the client will receive a message as shown on the left.

[Screenshot: killing the Example B query from the monitoring session]

The :queries command is only available in the Enterprise Edition of Neo4j.
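
As an alternative to the :queries view, a running query can also be terminated with a procedure call. The sketch below uses the built-in dbms.killQuery() procedure. The query ID shown is a hypothetical placeholder, and availability of the procedure may depend on your edition and version:

```cypher
// Assumption: 'query-123' is a placeholder; use the queryId value
// reported for the running query by dbms.listQueries().
CALL dbms.killQuery('query-123')
```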

Cypher query best practices

Here is a very high-level list of some of the best practices to aim for in your Cypher queries:

  • Create and use indexes effectively.

  • Use parameters rather than literals in your queries.

  • Specify node labels in MATCH clauses.

  • Reduce the number of rows processed.

  • Aggregate early in the query, rather than in the RETURN clause, if possible.

  • Use DISTINCT and LIMIT early in the query to reduce the number of rows processed.

  • Defer property access until you really need it.

Many of these guidelines are covered in the course, Advanced Cypher.
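
As an illustration of several items in the list above (specifying labels, aggregating early, limiting early, and deferring property access), here is a sketch against the Movie graph. The query is an assumption made for illustration and does not come from the course exercises:

```cypher
// Find the five actors with the most movies.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)   // labels narrow the nodes considered
WITH p, count(m) AS movieCount            // aggregate early, on nodes rather than properties
ORDER BY movieCount DESC
LIMIT 5                                   // reduce rows before any property access
RETURN p.name AS actor, movieCount        // read the name property only for the remaining rows
```

Prefixing a query like this with PROFILE is one way to compare its db hits against a variant that aggregates in the RETURN clause.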

Exercise 15: Using query best practices

In the query edit pane of Neo4j Browser, execute the browser command:

:play 4.0-intro-neo4j-exercises

and follow the instructions for Exercise 15.

This exercise has 14 steps. Estimated time to complete: 30 minutes.

Check your understanding

Question 1

What Cypher keyword can you use to prefix any Cypher statement to examine how many db hits occurred when the statement executed?

Select the correct answer.

  • ANALYZE

  • EXPLAIN

  • PROFILE

  • MONITOR

Question 2

What commands do you use to set values for parameters in your Neo4j Browser session?

Select the correct answers.

  • :set param

  • :param

  • :set params

  • :params

Question 3

Suppose you are executing queries in Neo4j Browser Session A and monitoring them in Neo4j Browser Session B with the :queries command. What are some ways that you can kill a query?

Select the correct answers.

  • You can close the result pane in Session A, if the query can be seen in Session B.

  • You can close the result pane in Session A, if the query can no longer be seen in Session B.

  • You can kill any running query seen in Session B.

  • You can close the Neo4j Browser that is running Session A.

Summary

You can now:

  • Use parameters in your Cypher statements.

  • Analyze Cypher execution.

  • Monitor queries.