Load CSV

If we have large files, stored locally or on http(s)/S3/GCP storage, that we want to import into Neo4j, we can use this procedure to read only part of a file (e.g. a CSV) as a string, starting at a given offset. The optional third parameter sets a limit on how much is read; if it is omitted or set to 0, the procedure reads until the end.

apoc.load.stringPartial(urlOrBinary :: ANY?, offset :: LONG, limit :: LONG = 0, config = {} :: MAP?) :: (value :: STRING?)

For reading from files you’ll have to enable the config option:

apoc.import.file.enabled=true

The procedure supports the following config parameters:

Table 1. Config parameters
name	type	default	description
headers	Map<String, Object>	Empty map	Additional HTTP headers to add to, or to replace in, the default ones
archiveLimit	int	1024*1024*10 (10MB)	Size limit to locate ZIP entries and buffers
bufferLimit	int	1024*1024*10 (10MB)	Buffer read limit
compression	Enum[NONE, GZIP, BZIP2, DEFLATE, BLOCK_LZ4, FRAMED_SNAPPY]	NONE	Compression algorithm to use when reading from a byte array

Usage examples

We can read a portion of a string from a local file URL, a remote URL (e.g. http(s)/GCP/S3/Azure/HDFS), a local or remote file inside an archive, or a byte array. Compared to other load procedures, this one accesses the file more efficiently: where possible it goes directly to the requested offset instead of opening a stream and reading up to that location.

That is:

In case of a local file, a RandomAccessFile is created under the hood.
In case of an http(s) URL, an additional HTTP header Range: bytes=<offset> is sent; if a limit is set, the header is Range: bytes=<offset>-<httpLimit>, where httpLimit equals offset + limit - 1.
In case of an S3 location, GetObjectRequest.range() is used.
In all other cases, InputStream.skip() is executed.
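The Range arithmetic for the http(s) case can be sketched as follows. This is a minimal illustration in Python, not the procedure's actual code: the function name range_header is hypothetical, and the trailing dash in the open-ended form (bytes=<offset>-) is an assumption taken from the standard HTTP range syntax.

```python
def range_header(offset: int, limit: int = 0) -> dict:
    """Build an HTTP Range header for a partial read.

    limit == 0 means "read until the end", mirroring the procedure's default.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    if limit == 0:
        # Open-ended range: everything from offset onwards (assumed syntax).
        return {"Range": f"bytes={offset}-"}
    # HTTP ranges are inclusive, hence the "- 1" on the last position.
    http_limit = offset + limit - 1
    return {"Range": f"bytes={offset}-{http_limit}"}


print(range_header(17))      # {'Range': 'bytes=17-'}
print(range_header(17, 15))  # {'Range': 'bytes=17-31'}
```

With offset 17 and limit 15 the last requested position is 17 + 15 - 1 = 31, so exactly 15 bytes are requested.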

If we have the following CSV file

test.csv

name,age
Selma,8
Rana,11
Selina,18

We can execute:

CALL apoc.load.stringPartial("path/to/localfile/test.csv", 17, 15)

Table 2. Results
value
Rana,11 Selina,

Or, without setting a limit:

CALL apoc.load.stringPartial("path/to/localfile/test.csv", 17)

Table 3. Results
value
Rana,11 Selina,18
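To see where offset 17 and limit 15 land, here is a small Python sketch of a seek-based read, analogous in spirit to the random-access read used for local files. It is an illustration only: read_partial is a hypothetical helper, and the file contents are assumed to be the two-column name,age sample, chosen because its rows match the values in Tables 2 and 3. Note that the line break between the two rows shows up as a space in the result tables.

```python
import os
import tempfile

# Assumed file contents: a two-column sample whose rows match Tables 2 and 3.
CSV_TEXT = "name,age\nSelma,8\nRana,11\nSelina,18"


def read_partial(path: str, offset: int, limit: int = 0) -> str:
    """Read `limit` bytes starting at `offset`; limit 0 reads until the end."""
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the offset without reading what precedes it
        data = f.read() if limit == 0 else f.read(limit)
    return data.decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    with open(path, "wb") as f:
        f.write(CSV_TEXT.encode("utf-8"))
    with_limit = read_partial(path, 17, 15)
    to_end = read_partial(path, 17)

print(repr(with_limit))  # 'Rana,11\nSelina,'
print(repr(to_end))      # 'Rana,11\nSelina,18'
```

The header line plus the first data row, including both line breaks, take up 9 + 8 = 17 characters, so offset 17 starts exactly at "Rana".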

We can read from a remote URL in the same way, with a similar result, for example:

CALL apoc.load.stringPartial("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/refs/heads/dev/extended/src/test/resources/test.csv", 17)

We can also read from an archive file, using the syntax <pathToArchive>!<fileToRead>, for example:

CALL apoc.load.stringPartial("https://www3.stats.govt.nz/2018census/Age-sex-by-ethnic-group-grouped-total-responses-census-usually-resident-population-counts-2006-2013-2018-Censuses-RC-TA-SA2-DHB.zip!Data8277.csv", 17)
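The <pathToArchive>!<fileToRead> convention can be illustrated with a short Python sketch. Both helpers (split_archive_location, read_entry_partial) and the example.zip name are hypothetical, and the archive is built in memory with assumed sample contents; the sketch shows only the idea of splitting the location and skipping to the offset inside the entry, not how the procedure itself is implemented.

```python
import io
import zipfile


def split_archive_location(location: str) -> tuple[str, str]:
    """Split '<pathToArchive>!<fileToRead>' into its two parts."""
    archive, sep, entry = location.partition("!")
    if not sep or not entry:
        raise ValueError("expected '<pathToArchive>!<fileToRead>'")
    return archive, entry


def read_entry_partial(archive_bytes: bytes, entry: str, offset: int, limit: int = 0) -> str:
    """Read part of one ZIP entry; limit 0 reads until the end of the entry."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        with zf.open(entry) as stream:
            stream.read(offset)  # discard the first `offset` bytes of the entry
            data = stream.read() if limit == 0 else stream.read(limit)
    return data.decode("utf-8")


# Build a tiny in-memory archive with assumed sample contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("test.csv", "name,age\nSelma,8\nRana,11\nSelina,18")

archive, entry = split_archive_location("example.zip!test.csv")
print(archive, entry)  # example.zip test.csv
print(repr(read_entry_partial(buffer.getvalue(), entry, 17, 15)))  # 'Rana,11\nSelina,'
```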

We can also read from a byte array, optionally setting the compression type (the default is 'NONE', i.e. not compressed). For example, using apoc.util.compress (available in APOC Core):

WITH apoc.util.compress('testFooBar', {compression: 'DEFLATE'}) AS compressed
CALL apoc.load.stringPartial(compressed, 5, 17, {compression: 'DEFLATE'}) YIELD value RETURN value

Table 4. Results
value
testFooBar