Guide: Example Datasets
This Guide introduces different example datasets for Neo4j and demonstrates how to import and explore them.
You should be comfortable installing Neo4j (Desktop, Docker) or spinning up an instance in the cloud.
When getting started with Neo4j, it helps to use example datasets relevant to your domain and use cases. For each dataset we aim to provide a description, the graph model, and some use-case queries.
Neo4j Browser comes with two built-in examples, which you can create and explore using interactive slideshows.
The "Movies" example is launched via the
:play movie-graph command and contains a small graph of movies and people related to those movies as actors, directors, producers, etc.
The "Northwind" example is run via
:play northwind-graph and contains a traditional retail system with products, orders, customers, suppliers and employees.
It walks you through the import of the data and increasingly complex queries using the available data.
To explore a wide variety of datasets in an online setup without a local installation, you can use the Neo4j Sandbox.
Each sandbox is available for at least 3 days after creation and can also be remotely accessed from applications using any Neo4j driver.
Except for the "blank" sandbox, all sandboxes come prepopulated with domain data and focus on use-case specific queries.
The use-cases range from
investigative data from the ICIJ Panama Papers to
crime investigation and
social networks optionally using your own Twitter account.
Other examples that you can quickly run within your own Neo4j Browser are:
:play got - Game of Thrones interactions
:play nasa - NASA knowledge graph example
:play ukcompanies - UK company registration, property ownership, political donations
:play stackoverflow - Stack Overflow users, tags and Q&A data
` ` - BBC Good Foods recipe data
:play listings - Airbnb listings data
:play football_transfers - Football (Soccer) transfer data
Even broader is the selection of graph examples that have been provided by Neo4j users.
Disclaimer: These examples are not curated and might not always represent the best graph data model.
You can find a featured selection grouped by industry and use case at https://neo4j.com/graphgists
Those examples are presented in a more long-form style that also discusses data modeling, and they use a temporary Neo4j store in the background.
To execute these examples within your Neo4j Desktop, install the "Graph Gallery" app from: https://install.graphapp.io
Then you can search and browse all available examples locally and run them against your local databases.
The most reliable way to get a dataset into Neo4j is to import it from the raw sources. That makes you independent of database versions, which you might otherwise have to upgrade. For this reason we provide raw data (CSV, JSON, XML) for several of the datasets, accompanied by import scripts in Cypher.
You can run the Cypher script using a command-line client like:
./bin/cypher-shell -u neo4j -p "password" < import-file.cypher
You can also drag and drop or paste the script into Neo4j Browser (check that multi-statement editor is enabled in the settings) and run it from there.
CSV data can be imported using either
the LOAD CSV clause in Cypher or
neo4j-admin import --mode csv for initial bulk imports of large datasets.
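As a rough illustration of the script-based route, the sketch below writes a tiny CSV file and a matching Cypher import script, then pipes the script into cypher-shell with the same invocation shown above. The file name people.csv, the sample rows, the Person label and the property names are all hypothetical, and the sketch assumes that it runs from the Neo4j home directory and that file:/// URLs resolve against its import/ directory, which depends on your configuration.

```shell
#!/bin/sh
# Sketch only: file names, sample rows, label and properties are hypothetical.

# 1. A tiny CSV file standing in for a dataset's raw data.
cat > people.csv <<'EOF'
name,born
Keanu Reeves,1964
Carrie-Anne Moss,1967
EOF

# 2. An import script: one LOAD CSV statement, terminated by a semicolon
#    so that cypher-shell can tell where the statement ends.
cat > import-file.cypher <<'EOF'
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {name: row.name})
SET p.born = toInteger(row.born);
EOF

# 3. Run it, assuming the current directory is the Neo4j home directory
#    and that file:/// URLs resolve against its import/ directory.
if [ -x ./bin/cypher-shell ] && [ -d ./import ]; then
  cp people.csv ./import/
  ./bin/cypher-shell -u neo4j -p "password" < import-file.cypher
else
  echo "cypher-shell or import/ not found; generated files only" >&2
fi
```

Using MERGE rather than CREATE means the script can be re-run without duplicating the Person nodes, which is convenient while you are still adjusting an import.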
Other datasets are provided as a dump of a Neo4j datastore.
To load one, first stop your Neo4j server.
Then import the file using the
./bin/neo4j-admin load --force true --from file.dump command.
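The stop-then-load sequence can be wrapped in a small script. The sketch below is one possible arrangement, not an official procedure: NEO4J_HOME, DRY_RUN and the run and load_dump helpers are hypothetical names introduced here, and the final restart step is an assumption about what you would want to do next. It defaults to a dry run that only prints the commands, because --force replaces an existing database.

```shell
#!/bin/sh
# Sketch: restore a dump into a stopped Neo4j instance.
# NEO4J_HOME and DRY_RUN are hypothetical helper variables, not Neo4j settings.
NEO4J_HOME="${NEO4J_HOME:-.}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the server, load the dump with the command from the text, restart.
load_dump() {
  dump_file="$1"
  run "$NEO4J_HOME/bin/neo4j" stop
  run "$NEO4J_HOME/bin/neo4j-admin" load --force true --from "$dump_file" || return 1
  run "$NEO4J_HOME/bin/neo4j" start
}

load_dump file.dump
```

Setting DRY_RUN=0 in the environment executes the commands instead of printing them; reviewing the printed commands first is a cheap safeguard before overwriting a database.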
The Neo4j version of some of the datasets might be older than your Neo4j version.
Then you might need to configure Neo4j to upgrade your database automatically, by setting
This is a graph import of the Stack Overflow archive with 16.4M questions, 52k tags and 8.9M users (Stack Overflow Dump (6.2GB)). This graph is fairly large; for full-scale querying you’d need a page-cache and heap of
Here is an article explaining the data model and some exploratory analysis we ran on the data.
The database is also available as a Neo4j Online Database with username "stackoverflow" and password "stackoverflow".
These are not prebuilt data-stores but existing datasets (mostly CSV) to be imported.
The linked articles and repositories also provide instructions for the import.