Implementing Graph Data Models in Neo4j 4.0: Profiling Queries
About this module
At the end of this module, you should be able to:
- Profile queries against the graph.
- Determine if a query can be improved.
Understanding the performance of the queries for your use cases is an important part of implementing a graph data model. Here is the workflow you should use:
- Load data into the graph.
- Create queries that answer the application questions.
- Execute the queries against the data to see if they retrieve the correct information.
- PROFILE the query execution.
- Identify problems and weaknesses in the query execution:
  - Can the query be rewritten to perform better?
  - Do we need to refactor the graph?
- If necessary, modify the graph data model and refactor the graph.
- PROFILE the same type of query against the refactored graph.
|The new query will be different due to the change in the graph data model.|
Here is the code to profile a query that retrieves all connections that have a destination of LAS:
PROFILE
MATCH (origin:Airport)-[c:CONNECTED_TO]->(destination:Airport)
WHERE destination.code = 'LAS'
RETURN origin, destination, c
LIMIT 10
To profile a query, simply add PROFILE to the beginning of the Cypher statement.
This executes the query as normal, but gives a different output, which shows every step in the query execution plan, and how much each step cost.
With PROFILE, the main metric we consider is db hits.
Execution time is important, but that is generally a result of administrative factors, like bandwidth, memory, and traffic volume.
db hits, on the other hand, are entirely a function of the data model and query.
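For contrast, Cypher also offers EXPLAIN, which produces the execution plan without running the statement. Because nothing is executed, it reports estimated rows only and no db hits, so it cannot stand in for PROFILE in the analysis that follows. A minimal sketch using the same query:

```cypher
// EXPLAIN plans the query but does not execute it:
// no rows are returned and no db hits are reported.
EXPLAIN
MATCH (origin:Airport)-[c:CONNECTED_TO]->(destination:Airport)
WHERE destination.code = 'LAS'
RETURN origin, destination, c
LIMIT 10
```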
The first step in any query is locating the anchor. In this case, Neo4j will anchor on the destination node set, because that one has a property filter while origin does not.
According to the PROFILE output, the anchor was located by first scanning all nodes with the Airport label, then filtering them to find the desired code property value.
These two steps required 7 and 12 db hits respectively, and got us down to a single node, the perfect anchor set.
This label scan + property scan is much less efficient than an index lookup. We will observe that in a later exercise.
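As a preview of that comparison, the sketch below creates an index on the code property of Airport nodes and then profiles the same query again. It uses the Neo4j 4.0 index syntax; the index name airport_code is a hypothetical choice, not something defined in this course. Whether the planner actually switches to an index seek, and how many db hits that saves, is something to confirm in the PROFILE output rather than assume.

```cypher
// Hypothetical index name (airport_code); Neo4j 4.0 syntax.
CREATE INDEX airport_code FOR (a:Airport) ON (a.code);

// The index is populated in the background; wait until it is online,
// then profile the same query and compare the anchor step's db hits.
PROFILE
MATCH (origin:Airport)-[c:CONNECTED_TO]->(destination:Airport)
WHERE destination.code = 'LAS'
RETURN origin, destination, c
LIMIT 10
```

In Neo4j Browser, run the two statements one at a time unless the multi-statement query editor is enabled.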
With the anchor set identified, Neo4j then expanded along every incoming CONNECTED_TO relationship and found 11 such paths. The expansion cost 13 db hits; the label check described next cost a further 11.
Next, Neo4j checked which of those paths terminated at an origin node labeled Airport. Based on what we know about our model, it is no surprise that all 11 paths qualified. This is an example of providing an unnecessary filter. We could have dispensed with the Airport label on the origin set, and saved 11 db hits.
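A minimal sketch of that rewrite follows. It assumes, as the model in this module implies, that every CONNECTED_TO relationship starts at an Airport node, so removing the label from origin does not change the results. The actual savings should be checked by profiling the query against your own graph.

```cypher
// Same query, but origin carries no label, so the planner
// has no label check to perform on the expanded nodes.
PROFILE
MATCH (origin)-[c:CONNECTED_TO]->(destination:Airport)
WHERE destination.code = 'LAS'
RETURN origin, destination, c
LIMIT 10
```

If the model later allows CONNECTED_TO relationships from other kinds of nodes, the label on origin becomes a meaningful filter again and should be restored.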
With traversal finished, Neo4j then returned the results, limited to 10 rows by the LIMIT clause.
This step operates entirely upon objects already in memory, and so requires no db hits.
When all is said and done, the total cost of this query was 43 db hits (7 + 12 + 13 + 11). We could have saved 11 by not filtering on Airport for the origin nodes. In addition, recall that identifying the anchor required 7 + 12 = 19 db hits, so our best opportunity for improving the performance of this query lies in finding a way to anchor more efficiently. PROFILE made that obvious; without the per-step db hit counts, it would have been very hard to spot.
In the query edit pane of Neo4j Browser, execute the browser command:
and follow the instructions for Exercise 3.
|This exercise has 2 steps. Estimated time to complete: 15 minutes.|
In the previous exercise, we asked this question:
What are the airports and flight information for flight number 1016 for airline WN?
This is our current model: