Knowledge Base

Importing CSV Files: Neo4j Aura, Desktop and Sandbox

Loading various kinds of files into Neo4j requires different locations depending on the tool you are using.

Import methods we will cover:

  1. Remote: Neo4j Aura and Neo4j Sandbox

  2. Local: Neo4j Server and Neo4j Desktop

Neo4j Aura and Neo4j Sandbox

Cloud hosted versions of Neo4j can only access remote http(s) URLs. Because they are hosted in the cloud, the security settings do not allow sandboxes to access local file settings on a desktop. Any files that need to be imported must be stored or placed in a remote location that the instance can access

  • GitHub

  • Pastebin

  • Cloud Provider Storage

  • a website

  • Google Drive

  • Dropbox.

We will look at examples for some of the locations for importing into a Cloud hosted Neo4j Instance.

GitHub

If you find or place a file in a GitHub repository or GitHub Gist, others can access the content in a raw format. You just need to navigate to the place that contains the file and go to the file. Once there, you should see a menu bar like the one below right above the file contents.

Click on the Raw button in the button list on the right and copy the url path when the page loads (url should start like https://raw.githubusercontent.com/…​;). Now you should be able to use the data in your Neo4j Browser session with a statement like this one.

LOAD CSV FROM 'https://raw.githubusercontent.com/<yourFileRepositoryPath>' AS row
RETURN row
LIMIT 20

Website

If the file is hosted on a website, Neo4j can access it there with a public URL. For example, in the Neo4j Cypher Manual, the LOAD CSV page uses a csv file on neo4j.com.

LOAD CSV FROM 'http://data.neo4j.com/northwind/products.csv' AS row
RETURN row

The same is true for any cloud providers storage

  • AWS S3

  • GCP Buckets

  • Azure Blob storage

You can upload the files there using your credentials and the UI or CLI and make them (temporarily) publicly accessible and then use the HTTPS URL of the file.

Google Sheets

You can access files uploaded to Google Sheets, if they are published to the web. Once the file is imported into a tab of a Google Sheet, you can follow the screenshots below to walk through the rest of the process.

The red boxes and numbering show where to click and what order to do the steps.

Step 1 - Publish to Web

image

Step 2 - Select Tab Name and Format CSV

image

Step 3 - Confirm Publication and Copy the Link

image

Step 4 - Run Cypher LOAD CSV command
LOAD CSV WITH HEADERS FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSx5-mHPUs7hQ3292zrLL_FeNzo85iC83TiezRcPl_SUv4NpW0e2VZilCUH9KbCWExAfE7OAELgdCW8/pub?gid=0&single=true&output=csv' AS row
RETURN row

Dropbox

A file uploaded to Dropbox works similarly to the process with Google Drive. Again, you will need to ensure permissions are set appropriately. Screenshots below step through the rest of the steps.

Step 1 - Review file permissions

image

Step 2 - Ensure permissions are open

image

Step 3 - Download Dropbox file

image

Step 4 - Go to browser downloads and copy link address

image

Step 5 - Run Cypher LOAD CSV command
LOAD CSV WITH HEADERS FROM 'https://<fileId>.dl.dropboxusercontent.com/cd/0/get/<yourFilePath>/file#' AS row
RETURN row

Local Neo4j Install and Neo4j Desktop

Your local Neo4j installations can of course also access the remote files via URL as described above.

If you want to import files from the local disk for privacy or performance reasons you have to place them into the import folder. It is generally located relative to your Neo4j server installation.

Those files can then be accessed via file:///filename.csv URLs, e.g.

LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
RETURN row

Neo4j Desktop

In Neo4j Desktop you can open the folder in your file-manager (explorer, finder, etc) via the UI by clicking on the "Open (Folder)" dropdown or menu.

Then place the files there and access them directly from Neo4j.

The CSV import developer guide walks through loading local CSV files to Neo4j Desktop.

Custom Import Folder

If you require a file location different from the default, you can update the following setting in the neo4j.conf file. We recommend specifying a directory path, rather than commenting out the setting, to avoid the security issue mentioned in the configuration comment.

# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=import

Resources

You can find the full list of file locations by operating system (does not include Sandbox) in the operations manual.

Andy Jefferson explores different methods of loading files (securely) into a remote Neo4j instance * Part 1 - ngrok and python webserver * Part 2 - Cloud Storage