Load CSV data into Neo4j from CSV files in an Amazon S3 bucket

Neo4j provides the LOAD CSV Cypher command to load data from CSV files into Neo4j, either from local files or from files served over HTTPS, HTTP, and FTP. But how do you load CSV files stored in an AWS S3 bucket, where reading a file normally requires signing in to an AWS account that has access to it? You can do this with a presigned URL for the CSV file in the S3 bucket.

We will quickly walk through how to create a presigned URL for a file in an AWS S3 bucket. This requires the aws command line utility (AWS CLI). Once it is installed, set it up with the aws configure command.

$ aws configure
AWS Access Key ID [****************KSRQ]:
AWS Secret Access Key [****************t9gZ]:
Default region name [us-east]: us-east-2
Default output format [None]:

For this example, the actors.csv file is stored in the rohank S3 bucket. Run the command below to create a presigned URL for actors.csv.

$ aws s3 presign s3://rohank/actors.csv
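
If the URL has to be generated from a script rather than typed by hand, the same command can be wrapped as in the sketch below. It is only an illustration: the helper names presign_command and presign are hypothetical, and it assumes the AWS CLI is on the PATH and already configured. The --expires-in option sets the lifetime of the URL in seconds (the CLI default is 3600), and --region overrides the configured default region.

```python
import subprocess


def presign_command(bucket, key, expires_in=3600, region=None):
    """Build the argument list for `aws s3 presign` without running it."""
    cmd = [
        "aws", "s3", "presign", f"s3://{bucket}/{key}",
        "--expires-in", str(expires_in),  # URL lifetime in seconds
    ]
    if region is not None:
        cmd += ["--region", region]  # override the configured default region
    return cmd


def presign(bucket, key, expires_in=3600, region=None):
    """Run the AWS CLI and return the presigned URL it prints.

    Assumes the AWS CLI is installed and set up with `aws configure`.
    """
    result = subprocess.run(
        presign_command(bucket, key, expires_in, region),
        check=True,           # raise CalledProcessError if the CLI fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return result.stdout.strip()
```

For example, presign("rohank", "actors.csv", expires_in=600, region="us-east-2") would request a URL for the file in this article that is valid for ten minutes.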

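Then use the presigned URL to read the file from the S3 bucket with LOAD CSV. The URL is valid only for a limited time (X-Amz-Expires=3600 seconds in this example), so run the query before it expires: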

LOAD CSV WITH HEADERS FROM "https://rohank.s3.amazonaws.com/actors.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAICM6A3RO53KOKSRQ%2F20190404%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20190404T215301Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=61cb485af12daa60bb8cb7a91fb503797311c8e178d9bfa3c7ff49770e4535b5" AS row
RETURN count(row)
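
A presigned URL carries its own signing time and lifetime in the X-Amz-Date and X-Amz-Expires query parameters, so it can be checked before being passed to LOAD CSV. The following is a minimal sketch of such a check using only the Python standard library. The function names presigned_url_expiry and is_expired are hypothetical, and it assumes a URL in the query-string format shown above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC time at which the presigned URL stops being valid."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in UTC, e.g. 20190404T215301Z
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the lifetime of the URL in seconds
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at + lifetime


def is_expired(url, now=None):
    """Return True if the URL's lifetime has elapsed at `now` (default: current time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

For the URL in this article, signed at 21:53:01 UTC on 2019-04-04 with a lifetime of 3600 seconds, presigned_url_expiry returns 22:53:01 UTC on the same day. After that time the URL has to be regenerated with aws s3 presign.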