10 Things You Can Do With Cypher That Are Hard With SQL
data:image/s3,"s3://crabby-images/c090e/c090eec8617cbb9305de0aaed5523a8f242627fa" alt=""
Head of Product Innovation & Developer Strategy, Neo4j
11 min read
data:image/s3,"s3://crabby-images/b9507/b95075ef0ebe44a04f3deb3194b2799d16f57ee5" alt=""
Cypher vs. SQL
SQL as a query language has been around more than 50 years and was developed with the involvement of many vendors and ISO committees. Initially, SQL started out as a query language for single tables. Later, JOINs and support for geospatial data for XML and JSON were added. Here is a typical SQL query which selects top-selling products from a retail database.SELECT p.product_name, count(*) as cnt FROM customers c JOIN orders o ON (c.customer_id = o.customer_id) JOIN order_details od on (od.order_id = o.order_id) JOIN products p ON (od.product_id = p.product_id) WHERE p.product_name LIKE '%G%' GROUP BY p.product_name ORDER BY cnt DESC LIMIT 5;I used four JOINs in the SQL query to produce the following results. product_name | cnt ———————————+—– Guaraná Fantástica | 51 Gorgonzola Telino | 51 Gnocchi di nonna Alice | 50 Mozzarella di Giovanni | 38 Gumbär Gummibärchen | 32 As you know, relational databases are built on tables and connections are made by JOINs. Cypher, on the other hand, is a language designed for graphs. For comparison, here is a query in Cypher.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie) WITH a, count(m) as movies WHERE movies > 2 MATCH (a)-[:DIRECTED]->(m) RETURN a.name, collect(m.title) as movies, count(*) as count ORDER BY count DESC LIMIT 5;This query has the building blocks of a graph, meaning entities like nodes and their relationships are built into the code. Cypher is easy to read and understand, even for people who are not developers. It is a powerful, modern language and borrows from other query and programming languages like Python. Subject matter experts or business analysts can read these queries, provide feedback, or even write the queries themselves. A popular feature in graph databases is that database searches are schema free or schema optional. That means you don’t have a fixed schema that represents your data, and you can instead evolve your schema data, adding new attribute types and relationships as you go. This makes it very flexible for developing an application.
Datasets
In today’s blog, I’ll be using two built-in example datasets in Neo4j: one called Movies and another called Northwind. We use an IMDb style database and the Northwind Retail database as examples, because there are lots of materials for those in SQL available. Here you can see the data models for both.data:image/s3,"s3://crabby-images/26747/2674777dd45d9dab420227d9b79302b8afa139f5" alt=""
data:image/s3,"s3://crabby-images/03ff9/03ff9bdb5eaeda97c98b6d69666d6d25613e28ec" alt=""
Graph Patterns vs. JOINS
In Cypher, we look at visual patterns. Whatever you would draw on a whiteboard or on a diagram as circles and arrows, we are able to represent in the query language. Nodes All nodes or entities are in parentheses. You can also put stuff into the parentheses, for instance, a type of node or additional attribute:()
(:Person)
(p:Person {name:'Emil'})
Relationships
Similarly, the simplest way to represent a relationship is just two dashes, but you can also add two dashes and a greater than sign for an arrow, or even use square brackets to add more information into a relationship:
--
-->
-[:RATED]->
-[:RATED {stars:5}]->
Combining nodes and relationships gives us Cypher patterns.
(:Person {name:'Emil'})-[:RATED {stars:5}]->(:Movie {title:'The Matrix'})I can imagine seeing this drawn out on a whiteboard: a person named Emil, with an arrow pointing to a movie, titled The Matrix. All of these little half dashes are actually like a JOIN in a relational database. However, in a relational database we need to JOIN from your entity to the JOIN table and again from a JOIN table to the other entity, which adds computational steps.
FROM customers c1 JOIN orders o1 ON (c1.customer_id = o1.customer_id) JOIN order_details od1 ON (od1.order_id = o1.order_id) JOIN products p1 ON (od1.product_id = p1.product_id) JOIN orders o2 ON (o2.customer_id = o1.customer_id) JOIN order_details od2 ON (od2.order_id = o2.order_id) JOIN products p2 ON (od2.product_id = p2.product_id) WHERE c1.customer_name = 'Emil' AND p1.product_name = 'Philz Coffee'This customer-product relationship SQL example here could be used to build a recommendation engine. To make a relational database work for this, you need many JOINs, each one slowing the application down. (As you’ve probably gathered, Emil really likes Philz Coffee.) Now that we’ve covered some key points, we can dive into ten things you can do easily in Cypher that are difficult with SQL!
1. Path Patterns
Path Patterns is an advanced feature to start with, but I thought we’d just go for it.:param product=>"Pavlova";Here I set a parameter called product to use a bit later in my queries. Now let’s take a look at the path patterns.
MATCH path = (p:Product {productName:$product})-[:PART_OF*]->(root:Category) RETURN pathWe have a product pointing to a list of categories. If you don’t know how deep your hierarchy of categories is, then you can just put a star and it will find arbitrary, linked relationships up to the root category.
MATCH path = (p:Product {productName:$product})-[:PART_OF*1..10]->(root:Category) RETURN pathYou can also limit this by saying find a path of up to 10 relationships. Below is just one category, Confections, but I can add more categories.
MATCH path = (c:Category {categoryName: "Confections"}) CREATE (c)-[:PART_OF]->(:Category {categoryName: "Sweets"})-[:PART_OF]->(:Category {categoryName: "Unhealthy Food"}) RETURN pathUsing a relationship type PART_OF, we add another category with a categoryName of “Sweets”. We can even add an additional category with a categoryName of “Unhealthy Food”. Here’s what our path would look like.
data:image/s3,"s3://crabby-images/a120d/a120da03eb3e116fba504a6b89a7b621c21f3ddb" alt=""
MATCH path = (p:Person)-[*1..5]-(p) RETURN [n in nodes(path) | coalesce(n.name, n.title)] as path, length(path) LIMIT 10Here I am looking for a path of one to five hops, and I want to end up with the person I started with. This data can be shown tabularly, but also visually, which is what graph is all about. Here’s a longer loop with its visual.
MATCH path = (p:Person)-[*3..10]-(p) RETURN path LIMIT 1
data:image/s3,"s3://crabby-images/db069/db0691896e402bf9cc8a5175de54603e05d55f0c" alt=""
2. Shortest Path
The second Cypher feature is shortest paths. You’ve probably seen on LinkedIn that a person is so many hops away from this other person. This is something that can be done easily in a graph database.:param name=>"Meg Ryan";I begin with our starting person, and in our case I’ll use the name “Meg Ryan”. I want to find the shortest path from Meg Ryan to Kevin Bacon.
MATCH path = shortestPath( (:Person {name:$name})-[:ACTED_IN*..10]-(:Person {name:'Kevin Bacon'}) ) RETURN path, length(path);I set the ACTED_IN linkages to be up to 10 hops away, and then I return the path and the length of the path. I can make this path an arbitrary length– one hop, two, three, four, five.
data:image/s3,"s3://crabby-images/8ab33/8ab330341139d4ea5fb232e1ccccfb795f659f51" alt=""
3. Aggregation Without GROUP BY
This next feature is something that I’m personally very happy about. I always hated GROUP BY in SQL because it is superfluous having to spell out grouping keys, both on the field selection and again in the GROUP BY statement. This is something that we wanted to avoid in Cypher because it is obvious if I have a GROUP BY statement or the non-aggregated fields grouping key.MATCH (m:Movie)<-[:ACTED_IN]-(a:Person) // in-between aggregation, only passing on declared fields WITH a, count(m) as movies // with filter on the aggregated value WHERE movies > 2 // next query part MATCH (a)-[:DIRECTED]->(m) // aggregation at the end by a.name RETURN a.name, collect(m.title) as movies, count(*) as count ORDER BY count DESC LIMIT 5;In this case, we find people that acted in a movie and then use an in-between aggregation. Using the WITH clause, we see the aggregation by person. Person is automatically our grouping key because it’s the only field that’s not aggregated in our return results.
4. Built-In Data Structure Support: Lists and Maps
If you come from a programming language background, it’s quite useful and common to use lists and maps as first class functions. Cypher has full support for lists. You can have ranges, maps, and slices. We have index access for lists, list comprehensions, and the quantifiers on lists. Along with conditions on lists, for maps we have dot-access, key access, and map projections.// list comprehension WITH [id IN range(1,100) WHERE id%13 = 0 | // literal map construction {id:id, name:'Joe '+id, age:id%100}] as people // list quantor predicate on list elements WHERE ALL(p in people WHERE p.age % 13 = 0) RETURN peopleThis example of list and map usage is a bit contrived. We started with a range of 100 numbers, so from one to 100, we iterate over the list. Their IDs are in this list. Then, because 13 is my lucky number, we add the filter and pick every 13 numbers, everything that’s divisible by 13 with no remainder. For each of those IDs, we create a map with an ID field, a name field that is a connected string with the ID, and an age field, which is the ID modulo 100. So we loop through people from one to 99. Here’s what this looks like.
data:image/s3,"s3://crabby-images/4ad2a/4ad2a8d77509b23c3b0c17c8bd520d27138d5b22" alt=""
5. UNWIND and Collect
This next feature is something that I was involved in building, which is UNWIND and Collect. The Collect function turns a number of values into a real list. UNWIND is the inverse of that. Collect takes a number of rows and turns it into a list, and UNWIND takes a list and turns it into a number of rows. UNWIND is especially helpful with data creation and updates based on lists. If you pass in a list of data, such as a list of maps or arrays, you can iterate over this list of elements with UNWIND and for every element you can create or update an entity.// turn list into 10k rows UNWIND range(1,10000) as id CREATE (p:Person {id:id, name:"Joe "+id, age:id%100}) // aggregate people into age groups, collect per age-group RETURN p.age, count(*) as ageCount, collect(p.name)[0..5] as ageGroupThis example takes a list of 10,000 people, turns them into rows where the ID field is numbered 1 all the way through 10,000. Then, for each of those, we create a person with an ID. For the inverse Collect function, we RETURN the age as the grouping key and then the count of how many people are in each age group. We can Collect any expression, including the person’s name, which we do in this case. Then we do a list slice on Collect to only show the first five values.
data:image/s3,"s3://crabby-images/09152/09152fb8646e8cc280600c9252c93331ffdd386f" alt=""
6. Pattern Comprehensions & Map Projections
Neo4j has been looking at GraphQL, which is an API query language for graph shape domain data. About six years ago, we thought of some critical features in GraphQL that we could add to Cypher. One is called Map Projections, and the other is called Pattern Comprehension.MATCH (m:Movie) // map projection, with property access RETURN m { .*, // pattern comprehension actors: [(m)<-[r:ACTED_IN]-(a) | // property access, nested expressions a { .name, roles: r.roles, movies: size([()<-[:ACTED_IN]-(a)|a]) } ][0..5], // list slice // pattern comprehension with filter & expression directors: [(m)<-[:DIRECTED]-(d) WHERE d.born < 1975 | d.name] } as movieDocument LIMIT 10For Map Projections, we use our Movies database, and then for a movie that we have to find, we RETURN all of its properties. That’s the .*, in the example above. You can then add other fields to this map as well. For instance, you can have an actual field in a map, which is a nested structure. This is good for when you want to RETURN a document-like element, such as with a front-end application or JavaScript application, and get a JSON object back with all the data in a nested document. Finally, Pattern Comprehension in the WHERE clause is similar to list comprehension, but for related graph patterns instead of lists of elements.
7. Combining Reads and Writes
If you’ve used SQL, you might have seen that you need to support Reads and Writes. You can do insert statements or insert from select, but that’s about it. This is why it’s nice to combine Reads and Writes in the same query. With GraphQL, while updating the database, you can also fetch data. In our example, just set the parameter with a movie title, a person’s name, and some rating stars. To query this parameter, we first find a movie through Read and then have a Write statement.param rating=>({title:'The Matrix',name:'Emil Eifrem', stars:5}) MATCH (m:Movie {title:$rating.title}) // possibly write MERGE (u:User {name:$rating.name}) // possibly write MERGE (u)-[r:RATED]->(m) // write SET r.stars = $rating.stars WITH * MATCH (m)<-[:ACTED_IN]-(a:Person) // read & return RETURN u, m, r, collect(a)If the User doesn’t exist, we create a User with the name ‘Emil Eifrem’. If the RATED relationship between User and Movie does not exist, we create it, and if it already exists, we access it. Then we set the new star rating. Here it is 5. After the update, we do another Read and find all the actors for this movie and RETURN our user, movie, rating, as well as all the actors from this movie. When I run this, here is the result below.
data:image/s3,"s3://crabby-images/8037f/8037f88197b113bc9f08ce66eb0b86eea2b135a4" alt=""
8. Transactional Batching
If you do large updates on a database, you need to deal with the fact that in a transaction, the information in your update is kept in memory until you commit your transaction. Transactional Batching allows you to limit the amount of memory you use at one time.:auto MATCH (o:Order) // imagine 100M CALL { WITH o MATCH (o)-[r:ORDERS]->() WITH o, sum(r.count) as products, sum(toFloat(r.unitPrice)*r.quantity) as total SET o.count = products, o.total = total } IN TRANSACTIONS OF 100000 ROWSI want to compute two new fields from our total orders. For each ORDER, I get the number of products and the total price. The unit price multiplied by the quantity is our total price. We set these two attributes on the ORDER node. Here, you would make 100,000 updates for each sub-transaction. If you run this, you would see how the data gets updated in one transaction after another.
9. Built-In Data Loading: CSVs
In Neo4j, you can access CSV files via a URL and then use the CSV files to make updates to your graph.WITH "https://data.neo4j.com/importing/ratings.csv" AS url LOAD CSV WITH HEADERS FROM url AS row MATCH (m:Movie {movieId:row.movieId}) MERGE (u:User {userId:row.userId}) ON CREATE SET u.name = row.name MERGE (u)-[r:RATED]->(m) SET r.rating = toFloat(row.rating) SET r.timestamp = toInteger(row.timestamp)I take a CSV file, and using my URL, I turn my CSV into a map structure. Each row is actually a map structure that I can use within my query tool. When a user doesn’t exist, we MERGE the user to create them. Here are the nodes this query creates for our movies, users, and relationships.
data:image/s3,"s3://crabby-images/bc0ec/bc0ec6c87b21345297f6e366a7c4b237b7252020" alt=""
10. Built-In Data Loading: APIs
Here we have a user-defined procedure that can load JSON data straight from an API.WITH "https://api.stackexchange.com/2.2/questions?pagesize=2&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url // load the json data from the URL CALL apoc.load.json(url) YIELD value // turn element list into rows UNWIND value.items AS item // deconstruct row data RETURN item.title, item.owner, item.creation_date, keys(item) LIMIT 5;Using YIELD, we take the data and UNWIND it, turning this JSON list into rows.
data:image/s3,"s3://crabby-images/3e7aa/3e7aa172e085f244ec0d8758f98e0bc399a1405e" alt=""