Tutorial: Build a Cypher Recommendation Engine

Goals
This guide shows how to use the relationships in your data to gather insights and to recommend entities that are not yet directly connected, based on the other relationships in the graph.
Prerequisites
You should have a basic understanding of the property graph model. Having Neo4j Desktop downloaded and installed will allow you to code along with the examples.
Beginner

This tutorial explains how to use a basic dataset of Actors acting in Movies in a graph database to show recommendations for other actors to work with or similar movies to watch.

By following the meaningful relationships between the people and movies, you can determine occurrences of actors working together, the frequency of actors working with one another, and the movies they have in common in the graph. This structure forms the basis for many recommendation engines.

Setting Up

You can follow along by starting Neo4j Desktop and opening Neo4j Browser. For a complete walkthrough of those steps, you can read this blog post.

Once you have a running instance of Neo4j, populate the movie dataset by typing :play movie graph into the Neo4j Browser command line and clicking the play button (the Cypher run button).

Go to the second slide (using the right arrow) of the pane that appears below the command line, click the query, and run it.

You should see a result pane in Neo4j Browser showing the nodes and relationships that were just created.

This data has been created and stored in the database so we can query it. The next section will show you how to write some queries to explore the data you just created.
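
Before exploring, it can help to confirm that the load worked. The query below is a minimal sketch that counts nodes per label. The exact counts depend on the version of the movie dataset, so none are listed here.

```cypher
// Count the nodes per label to confirm the dataset was created
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC
```

If the result is empty, the creation query on the second slide has probably not been run yet.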

Basic Queries

Before we start recommending things, we need to find out what is interesting in our data to see what kinds of things we can and want to recommend. To start, run the following query to find a single actor, Tom Hanks.

MATCH (tom:Person {name: 'Tom Hanks'})
RETURN tom

Now that we found an actor we are interested in, we can retrieve all his movies by starting from the Tom Hanks node and following the ACTED_IN relationships. Your results should look like a graph.

MATCH (tom:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(movie:Movie)
RETURN tom, r, movie

Of course, Tom has colleagues who acted with him in his movies. A statement to find Tom’s co-actors looks like this:

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name
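
The co-actor query returns one row per shared movie, so an actor who appeared with Tom several times is listed several times. As a sketch, the same pattern can be aggregated to show how often each co-actor worked with Tom and in which movies. It assumes the `title` property that the movie dataset stores on `Movie` nodes.

```cypher
// Co-actors of Tom Hanks, ranked by the number of movies they share with him
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(coActor:Person)
RETURN coActor.name AS name,
       count(movie) AS sharedMovies,
       collect(movie.title) AS titles
ORDER BY sharedMovies DESC, name
```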

Recommendations with Collaborative Filtering

We can now turn the co-actor query above into a recommendation query by following those relationships another step out to find the “co-co-actors”, i.e. the second-degree actors in Tom’s network. This will show us all the actors Tom may not have worked with yet, and we can specify a criterion to be sure he hasn’t directly acted with that person.

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name

You probably noticed that a few names appear multiple times. This is because there are multiple paths to follow from Tom Hanks to these actors.

To see which co-co-actors appear most often in Tom’s network, we can take frequency of occurrence into account by counting the number of paths between Tom Hanks and each coCoActor and ordering the results from highest to lowest count.

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name, count(coCoActor) AS frequency
ORDER BY frequency DESC
LIMIT 5
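
A frequency on its own does not say who connects Tom to each recommended actor. As a possible variant (a sketch, not part of the original guide), the query can also collect the names of the intermediate co-actors. The aliases `recommended` and `connectedVia` are illustrative names.

```cypher
// Top second-degree actors, with the co-actors who link them to Tom Hanks
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor:Person)
WHERE tom <> coCoActor
  AND NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(coCoActor)
RETURN coCoActor.name AS recommended,
       count(*) AS frequency,                           // number of paths from Tom
       collect(DISTINCT coActor.name) AS connectedVia   // who could make the introduction
ORDER BY frequency DESC
LIMIT 5
```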

One of those “co-co-actors” is Tom Cruise. Now let’s see which movies and actors are between the two Toms so we can find out who can introduce them.

Exploring the Paths

MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie1:Movie)<-[:ACTED_IN]-(coActor:Person)-[:ACTED_IN]->(movie2:Movie)<-[:ACTED_IN]-(cruise:Person {name: 'Tom Cruise'})
WHERE NOT (tom)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(cruise)
RETURN tom, movie1, coActor, movie2, cruise

As you can see, this returns multiple paths. If you have ever played the six degrees of Kevin Bacon game, this concept of seeing how many hops exist between people is exactly what graphs depict. You will notice that our results even return a path with Kevin Bacon himself.
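
To measure how many hops separate two people without fixing the path length in advance, Cypher's shortestPath function can be used. The following is a minimal sketch, assuming a Neo4j version that supports shortestPath and following only ACTED_IN relationships. Any two names from the dataset can be swapped in.

```cypher
// Shortest chain of ACTED_IN relationships between Kevin Bacon and Tom Hanks
MATCH p = shortestPath(
  (bacon:Person {name: 'Kevin Bacon'})-[:ACTED_IN*]-(hanks:Person {name: 'Tom Hanks'})
)
// Each movie on the path contributes two relationships (actor -> movie <- actor)
RETURN p, length(p) / 2 AS moviesBetween
```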

With these two simple Cypher statements, we have already created two recommendation algorithms: who to meet or work with next, and how to meet them.

Other Recommendations

You could apply the same ideas you learned here to many other use cases, such as recommending products and services, finding restaurants or activities you might like, or connecting with colleagues who share similar interests or skills. We will mention a few specifically here with resources you can use to find more information.

Restaurant Recommendations

We have a graph of a few friends with their favorite restaurants, cuisines, and locations.

A practical question to answer here, formulated as a graph search, is:

What Sushi restaurants are in New York that my friends like?

How could we translate that into the appropriate Cypher statement?

MATCH (person:Person {name: 'Philip'})-[:IS_FRIEND_OF]->(friend)-[:LIKES]->(restaurant:Restaurant)-[:LOCATED_IN]->(loc:Location {location: 'New York'}),
      (restaurant)-[:SERVES]->(type:Cuisine {type: 'Sushi'})
RETURN restaurant.name, count(*) AS occurrence
ORDER BY occurrence DESC
LIMIT 5

Other factors that can be easily integrated in this query are favorites, allergies, ratings, and distance from my current position.
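
As an illustration of one such factor, the sketch below filters and ranks by ratings. It assumes each LIKES relationship carries a numeric `rating` property. That property and the threshold of 4 are hypothetical and are not part of the graph described above.

```cypher
// ASSUMPTION: LIKES relationships have a numeric `rating` property (hypothetical)
MATCH (person:Person {name: 'Philip'})-[:IS_FRIEND_OF]->(friend)-[likes:LIKES]->(restaurant:Restaurant)-[:LOCATED_IN]->(loc:Location {location: 'New York'}),
      (restaurant)-[:SERVES]->(type:Cuisine {type: 'Sushi'})
WHERE likes.rating >= 4                 // keep only restaurants a friend rated highly
RETURN restaurant.name,
       count(*) AS occurrence,          // how many friends like it
       avg(likes.rating) AS avgRating   // average of those friends' ratings
ORDER BY occurrence DESC, avgRating DESC
LIMIT 5
```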