
Goals
There are many example datasets for Neo4j. This guide outlines some of them.
Prerequisites
You should be comfortable installing and importing data into Neo4j.
Intermediate


The Dataset

The movie database is a traditional dataset for graph databases. Similar to IMDb, it contains movies along with their actors, directors, producers, and so on. It could be extended with genres, publishers, ratings, and more.

You can find a similar but smaller dataset in the Neo4j Browser of your Neo4j installation by entering :play movie graph in the command line.

This dataset was sourced from TheMovieDB.org. Thanks so much to them for allowing us to use it for educational purposes. It was originally used for the Spring Data Neo4j tutorial project Cineasts.

The data model is straightforward:

  • (:Movie {title, released, …})
  • (:Person {name, born, …})
  • (:Person)-[:ACTS_IN|:DIRECTED|:PRODUCED]->(:Movie)

Figure: data model for movies and actors

Download

The dataset consists of 12862 movies and 50179 people (44943 actors and 6037 directors).

Make sure to download the correct version of the dataset for your Neo4j installation.

After the download, unzip the dataset, move the graph.db folder into your /path/to/neo4j/data directory, and replace the graph.db folder that was previously there.

Installation in Detail:

  • Stop your Neo4j server with bin/neo4j stop or your control app
  • Unzip the downloaded file
  • Replace graph.db in /path/to/neo4j/data
  • Start the server again with bin/neo4j start
  • Open the Neo4j Web Interface on http://localhost:7474
  • Read about the Cypher Query Language
  • Follow the source links for some example Cypher queries
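
The folder swap in the steps above can be sketched in Ruby as follows. This is a minimal sketch, not part of the original instructions: the method name `replace_graph_db` and the `.bak-<timestamp>` suffix are assumptions made for illustration, and it keeps the previous store as a backup instead of discarding it. Stop the server before running anything like this.

```ruby
require 'fileutils'

# Move the extracted graph.db into data_dir, keeping any existing store
# as graph.db.bak-<stamp>. Returns the backup path, or nil if there was
# no previous store. (Hypothetical helper, not part of Neo4j.)
def replace_graph_db(extracted_db, data_dir, stamp = Time.now.strftime('%Y%m%d%H%M%S'))
  raise ArgumentError, "not a directory: #{extracted_db}" unless File.directory?(extracted_db)

  target = File.join(data_dir, 'graph.db')
  backup = nil
  if File.exist?(target)
    backup = "#{target}.bak-#{stamp}"
    FileUtils.mv(target, backup) # keep the old store instead of deleting it
  end
  FileUtils.mkdir_p(data_dir)
  FileUtils.mv(extracted_db, target)
  backup
end

# Example call: ruby swap_db.rb /path/to/unzipped/graph.db /path/to/neo4j/data
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  backup = replace_graph_db(ARGV[0], ARGV[1])
  puts backup ? "Old store kept at #{backup}" : 'No previous store found'
end
```

Keeping a backup makes it easy to go back if the downloaded store does not match the installed Neo4j version.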

You can also point the neo4j-shell to the extracted directory to run Cypher queries directly:

/path/to/neo4j/bin/neo4j-shell -path /path/to/graph.db

Example Queries

Actors who played in a given movie
MATCH (m:Movie {title: 'Forrest Gump'})<-[:ACTS_IN]-(a:Actor)
RETURN a.name, a.birthplace
Find the most prolific actors
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
RETURN a, count(*)
ORDER BY count(*) DESC LIMIT 10;
Find actors who have been in fewer than 3 movies
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
WITH a, count(m) AS movie_count
WHERE movie_count < 3
RETURN a, movie_count
ORDER BY movie_count DESC LIMIT 5;
Find the actors with 20+ movies, and the movies in which they acted
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
WITH a, collect(m.title) AS movies
WHERE size(movies) >= 20
RETURN a, movies
ORDER BY size(movies) DESC LIMIT 10;
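
For readers less used to Cypher aggregation, the query above groups rows per actor, collects the titles, filters on the size of the collected list, and sorts by that size. A rough Ruby analogue on toy data may make the order of those steps clearer. The `ACTS_IN` array and the method name `actors_with_at_least` are illustrative assumptions, not part of the dataset or of any Neo4j API.

```ruby
# Toy (actor, title) pairs standing in for (:Actor)-[:ACTS_IN]->(:Movie) matches.
ACTS_IN = [
  ['Tom Hanks', 'Forrest Gump'],
  ['Tom Hanks', 'Apollo 13'],
  ['Tom Hanks', 'Cast Away'],
  ['Robin Wright', 'Forrest Gump'],
  ['Robin Wright', 'Unbreakable'],
  ['Gary Sinise', 'Forrest Gump']
].freeze

def actors_with_at_least(pairs, min_movies)
  pairs
    .group_by { |actor, _title| actor }                                # grouping key: the actor
    .map { |actor, rows| [actor, rows.map { |_actor, title| title }] } # collect(m.title)
    .select { |_actor, movies| movies.size >= min_movies }             # WHERE on the list size
    .sort_by { |_actor, movies| -movies.size }                         # ORDER BY size DESC
end

actors_with_at_least(ACTS_IN, 2).each do |actor, movies|
  puts "#{actor}: #{movies.join(', ')}"
end
```

The filter runs after the grouping, which is what the WITH ... WHERE pair expresses in Cypher.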
Find prolific actors (10+) who have directed at least two films, count films acted in and list films directed
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)
WITH a, count(m) AS acted
WHERE acted >= 10
WITH a, acted
MATCH (a:Director)-[:DIRECTED]->(m:Movie)
WITH a, acted, collect(m.title) AS directed
WHERE size(directed) >= 2
RETURN a.name, acted, directed
ORDER BY size(directed) DESC, acted DESC;
Rewritten to filter both :Actor and :Director labels up front
MATCH (a:Actor:Director)-[:ACTS_IN]->(m:Movie)
WITH a, count(1) AS acted
WHERE acted >= 10
WITH a, acted
MATCH (a:Actor:Director)-[:DIRECTED]->(m:Movie)
WITH a, acted, collect(m.title) AS directed
WHERE size(directed) >= 2
RETURN a.name, acted, directed
ORDER BY size(directed) DESC, acted DESC;
Using the lowest cardinality label, :Director
MATCH (a:Director)-[:ACTS_IN]->(m)
WITH a, count(m) AS acted
WHERE acted >= 10
WITH a, acted
MATCH (a)-[:DIRECTED]->(m)
WITH a, acted, collect(m.title) AS directed
WHERE size(directed) >= 2
RETURN a.name, acted, directed
ORDER BY size(directed) DESC, acted DESC;
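
The counts given in the Download section suggest why :Director is the cheaper label to start from. The short calculation below is only a back-of-the-envelope sketch based on those three numbers. How much the query planner actually benefits depends on the Neo4j version and its statistics, which this sketch does not model.

```ruby
# Counts quoted in the Download section.
PEOPLE    = 50_179
ACTORS    = 44_943
DIRECTORS = 6_037

# Inclusion-exclusion: the two groups together cannot exceed the number of
# people, so at least this many people must carry both labels.
both_at_least = ACTORS + DIRECTORS - PEOPLE

# How many :Actor nodes there are for every :Director node.
ratio = (ACTORS.to_f / DIRECTORS).round(1)

puts "at least #{both_at_least} people are both actor and director"
puts "about #{ratio} actors per director"
```

Starting the match from roughly 6000 :Director nodes instead of roughly 45000 :Actor nodes leaves far fewer candidates to expand.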
User ratings
MATCH (u:User {login: 'Michael'})-[r:RATED]->(m:Movie)
WHERE r.stars > 3
RETURN m.title, r.stars, r.comment
Recommendations including counts, grouping and sorting
MATCH (u:User {login: 'Michael'})-[:FRIEND]-()-[r:RATED]->(m:Movie)
RETURN m.title, avg(r.stars), count(*)
ORDER BY avg(r.stars) DESC, count(*) DESC
Ratings by like-minded people
MATCH (u:User)-[r:RATED]->(m:Movie)<-[r2:RATED]-(likeminded),
      (u)-[:FRIEND]-(friend)
WHERE r.stars > 3 AND r2.stars >= 3
RETURN likeminded, count(*)
ORDER BY count(*) DESC LIMIT 10;
Mutual Friend recommendations
MATCH (u:User {login: 'Michael'})-[:FRIEND]-(f:Person)-[r:RATED]->(m:Movie)
WHERE r.stars > 3
RETURN f.name, m.title, r.stars, r.comment

Import Instructions

TheMovieDB.org offers a JSON API for accessing movies and their cast.

We use that API with a Ruby script to turn each movie JSON object into a Cypher statement. Make sure to get an API key up front.

require 'rubygems'
require 'rest-client'
require 'json'
require 'fileutils'

URL = "http://api.themoviedb.org/3"
# get the api key at https://www.themoviedb.org/faq/api and set it as environment variable
KEY = ENV['THE_MOVIE_DB_KEY']

# fetch one object from the API, cache the raw JSON in json/<id>.json and return it parsed
def get(type,id)
  url = "#{URL}/#{type}/#{id}?api_key=#{KEY}&append_to_response=casts"
  res = RestClient.get url
  FileUtils.mkdir_p("json")
  File.open("json/#{id}.json", 'w') {|f| f.write(res.to_str) }
  JSON.parse(res.to_str)
end

def person(id)
  get("person",id)
end

# strip quote characters so values can be embedded in single-quoted Cypher strings
def clean(name)
  name.to_s.gsub(/['"]/,"")
end

def movie(id)
  parse_movie(get("movie",id),id)
end

# reduce the API response to the properties, genres, actors and directors we import
def parse_movie(res,id)
#  puts res.inspect
  movie = [:tagline,:released,:genres].reduce({:movie_id => id}) { |r, prop| r[prop] = res[prop.to_s] if res[prop.to_s] && res[prop.to_s].length>0; r }
  movie[:title] = res["title"]
  # genres is absent from the hash when the response has none
  movie[:genres] = (movie[:genres] || []).collect { |g| g["name"] }
  movie[:actors] = res["casts"]["cast"].collect { |g| { :id => g["id"], :name => clean(g["name"]) , :role => clean(g["character"]) }}
  movie[:directors] = res["casts"]["crew"].find_all {|a| a["job"]=="Director" } .collect { |g| { :id => g["id"], :name => clean(g["name"]) }}
  movie
end

# index statements, printed once before the import transaction
def setup
  ["CREATE INDEX ON :Movie(title)",
  "CREATE INDEX ON :Movie(movie_id)",
  "CREATE INDEX ON :Person(id)",
  "CREATE INDEX ON :Person(name)",
  "CREATE INDEX ON :Genre(genre)",""].join(";\n")
end

# node auto-index for movie_id, id, name, title, genre, type
# build the Cypher statement that creates one movie with its genres, actors and directors
def create_movie(movie)
  props=[:movie_id, :title,:tagline,:released].find_all{ |p| movie[p] }.collect { |p| "#{p}:'#{clean(movie[p])}'" }.join(', ')
  actors = movie[:actors].collect { |a| "CREATE UNIQUE (movie)<-[:ACTS_IN {role:'#{clean(a[:role])}'}]-(:Person:Actor {id:'#{a[:id]}', name: '#{a[:name]}'})-[:PERSON]->(people) "}.join(" \n") + "\n"
  directors = movie[:directors].collect { |a| "CREATE UNIQUE (movie)<-[:DIRECTED]-(:Person:Director {id:'#{a[:id]}', name:'#{a[:name]}'})-[:PERSON]->(people) "}.join(" \n") + "\n"
  genres = movie[:genres].collect { |g| "CREATE UNIQUE (movie)-[:GENRE]->(:Genre {genre:'#{clean(g)}'})-[:GENRE]->(genres) "}.join(" \n") + "\n"
  " MERGE (genres:Genres)
    MERGE (movies:Movies)
    MERGE (people:People)
    CREATE (movie:Movie {#{props}})
   " + genres + actors + directors + ";"
end

# main part: it comes last because Ruby only knows a method once its def has been executed
puts setup
puts "BEGIN"
[19995 , 194, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 13, 20526, 11, 1893, 1892,
 1894, 168, 193, 200, 157, 152, 201, 154, 12155, 58, 285, 118, 22, 392, 5255, 568, 9800, 497, 101, 120, 121, 122].each do |id|
 puts create_movie(movie(id))
end
puts "COMMIT"

The script writes the Cypher statements to stdout, so the output can be piped straight into the neo4j-shell or saved to a file that the neo4j-shell reads later.
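
The script's `clean` method removes quote characters, so a name like O'Toole is stored as OToole. A possible alternative, sketched below, is to escape quotes with a backslash, which Cypher string literals accept. This is a sketch under assumptions, not the script's own method: `cypher_string`, `movie_props`, and the `SAMPLE` fixture are hypothetical names, and the fixture is hand-written rather than a real API response. It runs offline, so the statement-building step can be checked without an API key.

```ruby
require 'json'

# Hand-written fixture with only a few of the fields the script reads.
SAMPLE = <<~'JSON'
  {"title": "Ocean's Example", "tagline": "It's only a fixture", "genres": []}
JSON

# Quote a value as a Cypher string literal, escaping backslashes and
# single quotes instead of dropping them. (Hypothetical helper.)
def cypher_string(value)
  "'" + value.to_s.gsub(/[\\']/) { |c| "\\#{c}" } + "'"
end

# Build the property list for a :Movie node, skipping empty values.
def movie_props(res, movie_id)
  props = { movie_id: movie_id, title: res['title'], tagline: res['tagline'] }
  props.reject { |_key, value| value.nil? || value.to_s.empty? }
       .map { |key, value| "#{key}: #{cypher_string(value)}" }
       .join(', ')
end

res = JSON.parse(SAMPLE)
puts "CREATE (movie:Movie {#{movie_props(res, 1)}})"
```

Escaping keeps the original spelling of names and titles, at the cost of one more helper to maintain.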