Query Your World with Cypher: Focus on Data Relationships


Editor’s Note: Last October at GraphConnect San Francisco, Nicole White – Data Scientist at Neo Technology – delivered this presentation on how to write Cypher queries for your most common connected-data questions.

For more videos from GraphConnect SF and to register for GraphConnect Europe, check out graphconnect.com.

Cypher: Neo4j’s Graph Query Language


When you ask someone what they love about Neo4j, Cypher is always at the top. Cypher is essentially ASCII art; you draw out your desired graph pattern in your code.

A node is written as a pair of parentheses, a data relationship as a pair of square brackets, and a pattern joins nodes and relationships together with hyphens.

Drawing Graph Patterns with Nodes and Relationships


If you want to find a “node-relationship-node” pattern, you write a node (parentheses), a hyphen, a relationship (square brackets), another hyphen and then a second node, as indicated below:

Drawing Graph Patterns in Cypher


In Cypher, you specify a relationship’s direction by adding a “greater than” (>) or “less than” (<) sign to a hyphen, turning it into an arrow. The second row above shows that the node on the left has an outgoing relationship to the node on the right, while the third row shows the opposite.
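As a text sketch of the three rows above, the queries below match an undirected, an outgoing and an incoming pattern. The identifiers a, r and b are arbitrary, and LIMIT only keeps the output small; this is an illustration, not the exact content of the figure.

```cypher
// Undirected: match a relationship in either direction
MATCH (a)-[r]-(b) RETURN a, r, b LIMIT 5;

// Outgoing: a has a relationship pointing to b
MATCH (a)-[r]->(b) RETURN a, r, b LIMIT 5;

// Incoming: b has a relationship pointing to a
MATCH (a)<-[r]-(b) RETURN a, r, b LIMIT 5;
```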

An Example Dataset: StarCraft


Let’s examine an example dataset from the game StarCraft.

Below is a hierarchical tree of requirements that shows the different types of buildings you can construct and their required components. For example, the below hierarchy indicates that to build a barracks, you first need to have a command center.

The Terran Technology Tree in StarCraft


We imported this tech tree into Neo4j because it’s very good at storing, modeling and querying tree-like structures. Below is the resulting graph:



In this dataset we have only two node labels, Building and Unit, and the hierarchy of requirements is modeled with the Requires relationship type.

Above, the blue Factory node in the center of the graph is a building that requires a Barracks, which requires a Supply Depot. There is an extensive hierarchy of building requirements that extends to the lowest node, which is typically the Supply Depot.

The graph also indicates that Units have requirements; for example, a Medevac requires a Starport, which requires a Factory.

We also have Builds relationships, which demonstrate what is built by the different Buildings. For example, the Barracks builds a Reaper, Ghost, Marauder and Marine.

Cypher for Build Dependencies in StarCraft
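To experiment with the queries in this post without the full dataset, a minimal sketch like the one below creates a hypothetical subset of the tech tree using only the relationships mentioned above. The relationship type names REQUIRES and BUILDS and the name property are assumptions about how the dataset is modeled, and mineral and gas costs are deliberately left out rather than guessed.

```cypher
// Hypothetical subset of the Terran tech tree, limited to facts stated in the text.
CREATE (depot:Building {name: 'Supply Depot'}),
       (barracks:Building {name: 'Barracks'}),
       (factory:Building {name: 'Factory'}),
       (starport:Building {name: 'Starport'}),
       (marine:Unit {name: 'Marine'}),
       (marauder:Unit {name: 'Marauder'}),
       (reaper:Unit {name: 'Reaper'}),
       (ghost:Unit {name: 'Ghost'}),
       (medevac:Unit {name: 'Medevac'}),
       // Requirement hierarchy: each arrow points from a node to its prerequisite
       (barracks)-[:REQUIRES]->(depot),
       (factory)-[:REQUIRES]->(barracks),
       (starport)-[:REQUIRES]->(factory),
       (medevac)-[:REQUIRES]->(starport),
       // Units the Barracks can build
       (barracks)-[:BUILDS]->(marine),
       (barracks)-[:BUILDS]->(marauder),
       (barracks)-[:BUILDS]->(reaper),
       (barracks)-[:BUILDS]->(ghost);
```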


Additionally, resources such as minerals and gas are required in order to create Units and Buildings. The below example shows the resource requirements for the SCV unit and the Supply Depot building:

Resource Property Labels in StarCraft


Query 1: What Units can be Built at the Barracks? The MATCH and WHERE Clauses

The most important component when writing a query is the MATCH clause, which is where you draw the graph pattern that the query will retrieve. In the below example, we start with a node labeled Building; the label, which is our entity type, follows a colon inside the parentheses:

A Cypher Query for What Units Can Be Built at a Barracks


Here we’ve indicated that we want to find a Building node that has an outgoing Builds relationship to a Unit node. We chose the identifiers b, r and u, written before the colons; they are now bound to the matched entities and can be used in the following clauses.

Below the MATCH clause we have the WHERE clause, which filters for the building named Barracks. The RETURN clause specifies which entities the query returns to us.
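Written out as text, the query described above might look like the following sketch. It assumes the relationship type is stored as BUILDS and that buildings carry a name property; the exact spelling in the original dataset may differ.

```cypher
// Find every Unit that the Barracks builds.
MATCH (b:Building)-[r:BUILDS]->(u:Unit)
WHERE b.name = 'Barracks'   // filter on the building's name
RETURN b, r, u;             // return both nodes and the relationship
```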

Neo4j returns the following visual result of our data.

A Neo4j Graph of What Units Can Be Built at a Barracks


Query 2: Average Unit Cost

To find the average cost of the units at each building, we use the same MATCH clause but drop the identifier on the Builds relationship, because we don’t need it in the RETURN. However, we keep the b and u identifiers because the RETURN clause uses both of them:

A Cypher Query for the Average Cost of Units at Each Building in StarCraft
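A minimal text sketch of this aggregation, assuming unit costs are stored in properties named minerals and gas and the relationship type is BUILDS:

```cypher
// Average unit cost per building. The relationship has no identifier
// because nothing after the MATCH refers to it.
MATCH (b:Building)-[:BUILDS]->(u:Unit)
RETURN b.name AS building,
       avg(u.minerals) AS avg_minerals,
       avg(u.gas) AS avg_gas;
```

Cypher has no GROUP BY clause: every non-aggregated expression in RETURN (here b.name) becomes the grouping key, and avg() is computed once per distinct key. Returning b instead of b.name would group per node rather than per name.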


The below table is the result of our query; it provides us with the name of the Building and the average cost of each Unit at that Building in terms of minerals and gas.

The Cypher Query Results for Asking Average Unit Cost at Each Building in StarCraft


In the above example, b.name happens to be unique, but this doesn’t need to be the case: unless you add a uniqueness constraint on that property (such as “buildings have to be unique by the name property”), you can have multiple buildings with the same name, and your query can match all of them.
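A sketch of such a constraint is shown below, using the syntax of Neo4j 4.4 and later; the constraint name is arbitrary. Older versions, like those current when this talk was given, used the form `CREATE CONSTRAINT ON (b:Building) ASSERT b.name IS UNIQUE` instead.

```cypher
// Reject any second Building node with an existing name (Neo4j 4.4+ syntax).
CREATE CONSTRAINT building_name_unique IF NOT EXISTS
FOR (b:Building) REQUIRE b.name IS UNIQUE;
```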

Query 3: What Buildings and Units are Unlocked by Construction of an Engineering Bay?

In StarCraft, once you’ve constructed the required buildings, you can construct an Engineering Bay, which in turn allows you to build even more components.

To find out what buildings and units can only be built once an Engineering Bay has been constructed, we need to traverse the Requires relationship one level up from the Engineering Bay.

In Query 2, we matched Buildings to Units through a Builds relationship. In this query, we match Buildings to other Buildings through a Requires relationship, which is why we’ve applied the Building label to the nodes on both sides of it. We’ve also restricted the node on the far right to a specific building, the Engineering Bay.

Cypher Query for What Does an Engineering Bay Immediately Unlock
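A text sketch of this one-level lookup, assuming the relationship type is stored as REQUIRES and points from a building to its prerequisite:

```cypher
// Buildings that directly require the Engineering Bay (one level up the hierarchy).
MATCH (b:Building)-[:REQUIRES]->(e:Building)
WHERE e.name = 'Engineering Bay'
RETURN b, e;
```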


Neo4j returns the following, with the Engineering Bay in the middle and the immediate one-step-out buildings that are unlocked once the Engineering Bay has been constructed.

A Neo4j Graph of What an Engineering Bay Immediately Unlocks


Query 4: Which Buildings Have No Dependencies?

Which buildings can we build right away without having any other Buildings or Units on the map?

To answer this question, we include a pattern in the WHERE clause. Here the pattern isn’t used to retrieve relationships; it acts as a filter, so only buildings that have no Requires relationships are returned.

A Cypher Query for Which Buildings Have No Dependencies
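A sketch of the pattern-as-filter idea, assuming an outgoing REQUIRES relationship means “this building has a prerequisite”:

```cypher
// Buildings with no outgoing REQUIRES relationship at all.
MATCH (b:Building)
WHERE NOT (b)-[:REQUIRES]->()   // the empty () matches any node
RETURN b;
```

On newer Neo4j versions the same filter can also be written as `WHERE NOT EXISTS { (b)-[:REQUIRES]->() }`.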


As indicated below, there are three buildings that do not have any Requires relationships attached to them: the Refinery, Command Center and Supply Depot. In other words, these are the only buildings that you can construct from the very start of the game.

A Neo4j Graph of Buildings with No Dependencies


Query 5: Which Units Have Additional Requirements for Construction?

For most units, the only prerequisite is the building that constructs them. However, some units have additional dependencies, which we capture in the below MATCH clause:

A Cypher Query for which Units Have Additional Requirements Beyond their Builder


In the MATCH clause we’ve indicated that we want to find a Building that Builds a Unit, along with any Building that the Unit Requires. In the first line of the MATCH clause, we bind the unit node to u, which we then reuse in the second line.

In the WHERE clause, we use the not-equal operator <> (a less-than sign followed by a greater-than sign) to exclude rows where the two buildings are the same node (i.e., we don’t want units whose Requires relationship points back to their own builder).
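A text sketch of the two-line MATCH and the inequality filter, assuming relationship types BUILDS and REQUIRES:

```cypher
// Units whose requirements go beyond the building that builds them.
MATCH (b:Building)-[:BUILDS]->(u:Unit),
      (u)-[:REQUIRES]->(r:Building)   // u is reused from the first line
WHERE b <> r                          // builder and requirement must differ
RETURN b, u, r;
```

Within a single MATCH, Cypher never binds the same relationship twice, but it can bind the same node to two identifiers, which is why the explicit b <> r check is needed.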

Below are the Neo4j results:

A Neo4j Graph of which Units have Additional Requirements Beyond Their Builder


Query 6: What’s the Most Expensive Unit that Can Be Built at the Factory? Ordering and Limits

The MATCH clause shows that a building builds a unit, and the WHERE clause specifies that the building’s name is Factory. We want to return the name of the Unit, as well as the amount of minerals and gas it takes to build it.

You can add ORDER BY and LIMIT right after the RETURN clause. Using DESC requests the results in descending order, i.e. from most expensive to least expensive, so LIMIT 1 returns only the most expensive unit that can be built at this particular Building.

A Cypher Query for the Most Expensive Unit that can be Built at the Factory
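A text sketch of this query follows. It assumes cost properties named minerals and gas, and it orders by their sum; that ordering expression is an assumption, since the original query may have sorted on a different expression.

```cypher
// Most expensive unit built at the Factory.
MATCH (b:Building)-[:BUILDS]->(u:Unit)
WHERE b.name = 'Factory'
RETURN u.name, u.minerals, u.gas
ORDER BY u.minerals + u.gas DESC   // assumed measure of "expensive"
LIMIT 1;                           // keep only the top row
```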


Query 7: What Do the Barracks Unlock up to Two Levels Deep? Traversing Two-Level Relationships

Our prior queries have all matched single-level relationships (i.e., the entities have been directly related). This one, however, is a “variable length” query: we want to RETURN results that are one or two degrees away from the Barracks. In the MATCH clause, we indicate this by adding an asterisk and an upper bound of 2 to the relationship:

A Cypher Query for Data that is Connected Two Levels Deep
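A text sketch of the variable-length pattern, assuming REQUIRES points from a building to its prerequisite. Restricting unlocked to the Building label is an assumption based on the results listed below; dropping it would also return any units within two hops.

```cypher
// Buildings that require the Barracks directly (1 hop) or via one
// intermediate building (2 hops). *1..2 sets the minimum and maximum hops;
// *..2 is equivalent, because the minimum defaults to 1.
MATCH (unlocked:Building)-[:REQUIRES*1..2]->(b:Building)
WHERE b.name = 'Barracks'
RETURN b, unlocked;
```

Note that `*2` on its own means exactly two hops, which would drop the buildings that depend on the Barracks directly.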


Below are the results in Neo4j:

A Neo4j Graph of Data that is Connected at Two Levels of Relationship


This graph shows that the Barracks unlock the Bunker, Orbital Command, Factory and Ghost Academy. Two steps away, the Barracks unlock the Starport and Armory.

Query 8: What Are All The Dependencies of the Starport Building? What Not To Do

To answer this question, we move from the Starport node all the way down the hierarchy, which you do by adding an asterisk to the relationship without specifying a maximum length:

A Cypher Query for Dependencies of Building a Starport
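A text sketch of the unbounded version, assuming the REQUIRES relationship type. The bare asterisk sets no upper limit on path length, which is the part to be wary of.

```cypher
// Every building reachable from the Starport through any number of
// REQUIRES hops. With no maximum, Neo4j explores every such path.
MATCH (s:Building)-[:REQUIRES*]->(dep:Building)
WHERE s.name = 'Starport'
RETURN s, dep;
```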


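This is not a good query: with no upper bound, Neo4j has to explore every Requires path reachable from the Starport, which can turn into an exhaustive and expensive search in a large graph, and here it only returns the below: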

A Neo4j Graph of the Dependencies of a Starport


It followed the Requires relationships all the way down the hierarchy until there was nowhere left to go. This isn’t a great way to find dependencies, so let’s explore a better approach using the Battlecruiser as an example.

Query 9: What Are All The Buildings Required to Construct a Battlecruiser? Cypher’s Shortest Path Function

To address this query, we rely on Cypher’s shortestPath function, which allows you to find the single shortest path between nodes. The syntax for this function is demonstrated below:

A Cypher Query using the shortestPath Function
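A text sketch of a shortestPath query follows. The endpoint is an assumption: it uses the Supply Depot because the text describes it as the lowest node of the hierarchy, and it assumes REQUIRES points from a node to its prerequisite.

```cypher
// Bind both endpoints first, then ask for the single shortest REQUIRES chain.
MATCH (u:Unit {name: 'Battlecruiser'}), (d:Building {name: 'Supply Depot'})
MATCH p = shortestPath((u)-[:REQUIRES*]->(d))
RETURN p;
```

If several paths tie for shortest, shortestPath returns just one of them; allShortestPaths returns them all.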


This query returns the following Neo4j graph:

A Neo4j Graph of the shortestPath Function in Cypher


Query 10: What Are All The Buildings Required to Construct a Battlecruiser and How Much Will It Cost? Cypher’s UNWIND Clause

To answer this query we rely on Cypher’s UNWIND clause, which expands a collection into a sequence of rows. In the below example, we grab the nodes out of a path, place each node in its own row, and then RETURN the name of each one and the amount of resources required to build it:

A Cypher Query Using the UNWIND Function
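A text sketch of this step, reusing the shortest-path pattern with the Supply Depot as an assumed endpoint and assuming cost properties named minerals and gas:

```cypher
MATCH (u:Unit {name: 'Battlecruiser'}), (d:Building {name: 'Supply Depot'})
MATCH p = shortestPath((u)-[:REQUIRES*]->(d))
UNWIND nodes(p) AS n   // one row per node on the path
RETURN n.name AS name, n.minerals AS minerals, n.gas AS gas;
```

To answer the “how much will it cost” part in a single row, the RETURN could instead aggregate, e.g. `RETURN sum(n.minerals) AS total_minerals, sum(n.gas) AS total_gas`.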


The below table shows the amount of required resources to build each Building in the Neo4j graph from Query 9:

The Cypher Query Results of using the UNWIND Function


Query 11: What Are All the Necessary Components to Build a Battlecruiser, and Where Does it Need to be Built? Multiple MATCH Clauses

To answer this question, we will need to write multiple MATCH and WHERE clauses. In the below example, our first two lines are identical to the previous “shortest path” example, and we’ve bound Unit to the identifier u:

A Cypher Query using Multiple MATCH Clauses
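A text sketch of chaining MATCH clauses, under the same assumptions as the shortest-path sketch (Supply Depot endpoint, REQUIRES and BUILDS relationship types). The third MATCH reuses u to find where the Battlecruiser is built.

```cypher
MATCH (u:Unit {name: 'Battlecruiser'}), (d:Building {name: 'Supply Depot'})
MATCH p = shortestPath((u)-[:REQUIRES*]->(d))
MATCH (b:Building)-[r:BUILDS]->(u)   // u is already bound above
RETURN p, b, r;
```

Each MATCH narrows the rows produced by the previous one, so if no path is found the final MATCH returns nothing either.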


Neo4j then returns the following graph:

A Neo4j Graph of a Cypher Query using Multiple MATCH Clauses


So that is how you can use multiple MATCH clauses within a Cypher query.


Inspired by Nicole’s talk? Register for GraphConnect Europe on April 26, 2016 for more industry-leading presentations and workshops on the evolving world of graph database technology.