Tutorial: Build a Cypher Recommendation Engine

Important - page not maintained

This page is no longer being maintained and its content may be out of date. For the latest guidance, please visit the Getting Started Manual .

Goals
Gather insights and generate recommendations with simple cypher queries, by navigating the graph
Prerequisites
You should have a basic understanding of the property graph model.

Beginner

Graphs are everywhere. By following the meaningful relationships between the people and movies, you can determine occurences of actors working together, the frequency of actors working with one another, and the movies they have in common in the graph. This is one way we can recommend movies to users, based on what they liked before, and their favorite actors. We will step you through everything you need to get started with AuraDB and Cypher, to solve a real-world problem.

Neo4j Aura

First, Create your AuraDB Free Instance

Free forever, no credit card required.

Setting Up

When you’ve created your AuraDB account, click "Create a Database" and select a free database

Then, fill out the name, and choose a cloud region for your database and click "Create Database". Make sure "Learn about graphs with a movie dataset" is selected, so you’ll start with a dataset.

AuraDB will prompt you with the password for your new instance while it being set up. Make sure to save the password for later steps.

Once your database is running, open browser as shown below.

Now you’ve arrived inside of Neo4j Browser. Use your username and password (the one you captured above) to log in. You’ll immediately notice a guide on the left-hand side that you can tab through to start out with some experimental queries. Any of these queries you see can be automatically put into the query execution box and run on the right hand side of the screen by clicking the little "play" button.

This first query just shows a few movies in the database to prove there’s something there. Congratulations, you’ve got some data in a new database, and we’re ready to get started.

The next section will show you how to write some queries to explore the data you just created.

Basic Queries

Before we start recommending things, we need to find out what is interesting in our data to see what kinds of things we can and want to recommend. To start, let us run a query like this to find a single actor like Tom Hanks.

MATCH (tom:Person {name: 'Tom Hanks'})
RETURN tom

Now that we found an actor we are interested in, we can retrieve all his movies by starting from the Tom Hanks node and following the ACTED_IN relationships. Your results should look like a graph.

MATCH (tom:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(movie:Movie)
RETURN tom, r, movie

Of course, Tom has colleagues who acted with him in his movies. A statement to find Tom’s co-actors looks like this:

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name

Recommendations with Collaborative Filtering

We can now turn the co-actor query above into a recommendation query by following those relationships another step out to find the "co-co-actors", i.e. the second-degree actors in Tom’s network. This will show us all the actors Tom may not have worked with yet, and we can specify a criteria to be sure he hasn’t directly acted with that person.

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name

You probably noticed that a few names appear multiple times. This is because there are multiple paths to follow from Tom Hanks to these actors.

To see which co-co-actors appear most often in Tom’s network, we can take frequency of occurrences into account by counting the number of paths between Tom Hanks and each coCoActor and ordering them by highest to lowest value.

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(coCoActor) as frequency
ORDER BY frequency DESC
LIMIT 5

One of those "co-co-actors" is Tom Cruise. Now let’s see which movies and actors are between the two Toms so we can find out who can introduce them.

Exploring the Paths

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(cruise:Person {name: 'Tom Cruise'})
WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(cruise)
RETURN tom, movie1, coActor, movie2, cruise

As you can see, this returns multiple paths. If you have ever played the six degrees of Kevin Bacon game, this concept of seeing how many hops exist between people is exactly what graphs depict. You will notice that our results even return a path with Kevin Bacon himself.

With these two simple Cypher statements, we already created two recommendation algorithms - who to meet/work with next and how to meet them.

Other Recommendations

You could apply the same ideas you learned here to many other uses for recommending products and services, finding restaurants or activities you might like, or connecting with other colleagues who share similar interests of skills. We will mention a few specifically here with resources you can use to find more information.

Restaurant Recommendations

We have a graph of a few friends with their favorite restaurants, cuisines, and locations.

restaurant recommendation

A practical question to answer here, formulated as a graph search, is:

What Sushi restaurants are in New York that my friends like?

How could we translate that into the appropriate Cypher statement?

MATCH (person:Person {name: 'Philip'})-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant:Restaurant)-[:LOCATED_IN]->(loc:Location {location: 'New York'}),
      (restaurant)-[:SERVES]->(type:Cuisine {type: 'Sushi'})
RETURN restaurant.name, count(*) AS occurrence
ORDER BY occurrence DESC
LIMIT 5

Other factors that can be easily integrated in this query are favorites, allergies, ratings, and distance from my current position.

Resources

Are you struggling?
If you need help with any of the information contained on this page, you can reach out to other members of our community. You can ask questions in the Cypher category on the Neo4j Community Site.