Load Excel (XLS)

Library Requirements

For loading XLS we’re using the Apache POI library, which works well with old and new Excel formats, but is quite large. That’s why we decided not to include it into the apoc jar, but make it an optional dependency.

These dependencies are included in apoc-xls-dependencies-4.4.0.33.jar, which can be downloaded from the releases page. Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.

Alternatively, you can download these jars from Maven Repository (putting them into plugins directory as well):

For XLS files:

Usage

The usage of apoc.load.xls is similar to apoc.load.csv with the main difference the ability to select a worksheet or a range from a sheet to load.

You can either select the sheet by name like 'Kids', or offset like 'Results!B2:F3'

CALL apoc.load.xls({url}, {Name of sheet}, {config})

The {config} parameter is a map

name description

mapping

{mapping:{'<sheet>':{type:'<type>', dateFormat: '<format>', dateParse: [<formats>]}}}

<sheet>

name of the sheet

<type>

Default String, The type of the conversion requested (STRING, INTEGER, FLOAT, BOOLEAN, NULL, LIST, DATE, DATE_TIME, LOCAL_DATE, LOCAL_DATE_TIME, LOCAL_TIME, TIME)

dateFormat: <format>

Convert the Date into String (only String is allowed)

dateParse: [<formats>]

Convert the String into Date (Array of strings are allowed)

header

indicates if file has a header (default: true)

skipNulls

To map header value as "Empty__<NumberRow>" string in case of column header without value, and to map empty row (placed before the end of the Excel file content) as a Map with null values. The default is false, that is: in case of header it will fail with error java.lang.IllegalStateException: Header at position N doesn’t have a value, while in case of empty row it will stop the iteration

firstCellNum

to force the initial cell row number in the case of headers with one or more empty initial values. For example: emptyCell | emptyCell | valueFoo | valueBar | valueBaz. The default is Row.getFirstCellNum()

lastCellNum

to force the initial cell row number in the case of headers with one or more empty final values. For example: valueFoo | valueBar | valueBaz | emptyCell | emptyCell. The default is Row.getLastCellNum()

In dateParse the first format matched return the date formatted, otherwise it will return an error

In format config you can use the pattern describe as the Temporal functions: temporal functions

Examples for apoc.load.xls

CALL apoc.load.xls('file:///path/to/file.xls','Full',{mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})
apoc.load.xls
CALL apoc.load.xls('http://bit.ly/2nXgHA2','Kids')

Some examples with type/dateFormat and dateParse:

CALL apoc.load.xls('test_date.xlsx','sheet',{mapping:{Date:{type:'String'}}})
apoc.load.xls 1
Figure 1. results
CALL apoc.load.xls('test_date.xlsx','sheet',{mapping:{Date:{type:'String',dateFormat:'iso_date'}}})
apoc.load.xls 2
Figure 2. results
CALL apoc.load.xls('test_date.xlsx','sheet',{mapping:{Date:{type:'String',dateParse:["wrongPath", "dd-MM-yyyy", "dd/MM/yyyy", "yyyy/MM/dd", "yyyy/dd/MM", "yyyy-dd-MM'T'hh:mm:ss"]}}})
apoc.load.xls 3
Figure 3. results