Loading Data from Web-APIs

Supported protocols are file, http, https, s3, gs, hdfs with redirect allowed.

If no procedure is provided, this procedure will try to check whether the URL is actually a file.

As apoc.import.file.use_neo4j_config is enabled, the procedures check whether file system access is allowed and possibly constrained to a specific directory by reading the two configuration parameters dbms.security.allow_csv_import_from_file_urls and dbms.directories.import respectively. If you want to remove these constraints please set apoc.import.file.use_neo4j_config=false

CALL apoc.load.json('http://example.com/map.json', [path], [config]) YIELD value as person CREATE (p:Person) SET p = person

load from JSON URL (e.g. web-api) to import JSON as stream of values if the JSON was an array or a single value if it was a map

CALL apoc.load.xml('http://example.com/test.xml', ['xPath'], [config]) YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _children fields.

CALL apoc.load.xmlSimple('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text fields and _<childtype> collections per child-element-type.

CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, strings, map, stringMap

load CSV fom URL as stream of values
config contains any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['aColumn'],arraySep:';',results:['map','list','strings','stringMap'],
nullValues:[''],mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false,nullValues:['n.A.']}}

CALL apoc.load.xls('url','Sheet'/'Sheet!A2:B5',{config}) YIELD lineNo, list, map

load XLS fom URL as stream of values
config contains any of: {skip:1,limit:5,header:false,ignore:['aColumn'],arraySep:';'+ nullValues:[''],mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false,nullValues:['n.A.']}}

Load Single File From Compressed File (zip/tar/tar.gz/tgz)

When loading data from compressed files, we need to put the ! character before the file name. For example:

Loading a compressed CSV file
apoc.load.csv("pathToFile!csv/fileName.csv.tgz")
Loading a compressed JSON file
apoc.load.json("https://github.com/neo4j-contrib/neo4j-apoc-procedures/tree/3.4/src/test/resources/testload.tgz?raw=true!person.json");

Using S3 protocol

When using the S3 protocol we need to download and copy the following jars into the plugins directory:

Once those files have been copied we’ll need to restart the database.

The S3 URL must be in the following format:

  • s3://accessKey:secretKey@endpoint:port/bucket/key or

  • s3://endpoint:port/bucket/key?accessKey=accessKey&secretKey=secretKey

Using Google Cloud Storage

In order to use Google Cloud Storage you need that the following jars are included into the plugin dir:

  • api-common-1.8.1.jar

  • failureaccess-1.0.1.jar

  • gax-1.48.1.jar

  • gax-httpjson-0.65.1.jar

  • google-api-client-1.30.2.jar

  • google-api-services-storage-v1-rev20190624-1.30.1.jar

  • google-auth-library-credentials-0.17.1.jar

  • google-auth-library-oauth2-http-0.17.1.jar

  • google-cloud-core-1.90.0.jar

  • google-cloud-core-http-1.90.0.jar

  • google-cloud-storage-1.90.0.jar

  • google-http-client-1.31.0.jar

  • google-http-client-appengine-1.31.0.jar

  • google-http-client-jackson2-1.31.0.jar

  • google-oauth-client-1.30.1.jar

  • grpc-context-1.19.0.jar

  • guava-28.0-android.jar

  • opencensus-api-0.21.0.jar

  • opencensus-contrib-http-util-0.21.0.jar

  • proto-google-common-protos-1.16.0.jar

  • proto-google-iam-v1-0.12.0.jar

  • protobuf-java-3.9.1.jar

  • protobuf-java-util-3.9.1.jar

  • threetenbp-1.3.3.jar

But we prepared an uber-package in order simplify the process, you can download it from here and place it into the plugin directory

You can use Google Cloud storage via the following url format:

gs://<bucket_name>/<file_path>

moreover you can also define the authorization type:

Ex:

gs://andrea-bucket-1/test-privato.csv?authenticationType=SERVICE

failOnError

Adding the config parameter failOnError:false (by default true), will mean that in the case of an error the procedure will not fail, but just return zero rows.