APOC Hidden Gems: What You Could Have Missed

CTO, Larus BA Italy

- Transform MongoDB collections automagically into Graphs
- Efficient Neo4j Data Import Using Cypher-Scripts
apoc.load.directory
Sometimes you may need to ingest files located in sub-directories with the same business rules (i.e. the same Cypher query). In the past you had to execute the query once for each specific file; now you can leverage the apoc.load.directory procedure in order to search recursively into a directory. So instead of doing this:

WITH 'path/to/directory/' AS baseUrl
CALL apoc.load.csv(baseUrl + 'myfile.csv', {results:['map']}) YIELD map
MERGE (p:Person{id: map.id}) SET p.name = map.name
RETURN count(p);

WITH 'path/to/directory/sub/' AS baseUrl
CALL apoc.load.csv(baseUrl + 'myfile1.csv', {results:['map']}) YIELD map
MERGE (p:Person{id: map.id}) SET p.name = map.name
RETURN count(p);

WITH 'path/to/directory/sub/sub1/' AS baseUrl
CALL apoc.load.csv(baseUrl + 'myfile2.csv', {results:['map']}) YIELD map
MERGE (p:Person{id: map.id}) SET p.name = map.name
RETURN count(p);

you can simply do this:

CALL apoc.load.directory('*.csv', 'path/to/directory', {recursive: true}) YIELD value AS url
CALL apoc.load.csv(url, {results:['map']}) YIELD map
MERGE (p:Person{id: map.id}) SET p.name = map.name
RETURN count(p)

Very nice, right?! As you can see, the procedure takes 3 parameters:
- The file name pattern
- The root directory
- A configuration map that at this very moment (May 2021) manages just one property, recursive=true/false, which allows searching recursively from the root directory (a variant is sketched below)
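For instance, here is a minimal sketch of the same pattern applied to JSON files, without descending into sub-directories; the myfile layout and the id/name fields are just assumptions for illustration:

CALL apoc.load.directory('*.json', 'path/to/directory', {recursive: false}) YIELD value AS url
CALL apoc.load.json(url) YIELD value
MERGE (p:Person{id: value.id}) SET p.name = value.name
RETURN count(p);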
apoc.load.directory.async.*
Raise your hand if you hate Cron and other similar mechanisms! Unfortunately, they are quite commonly used when you need to ingest data from files, so to overcome this we built a “proactive” feature that lets you define a listener on the file system that is executed every time a file is created, modified, or deleted. These procedures are extremely helpful when you’re ingesting data via files like CSV, JSON, GraphML, Cypher and so on, because they don’t rely on a Cron job but are constantly active. This has several upsides: if a job fails for some reason (network issues that block the upload of the file into the import dir, and so on…) you don’t need to wait for the next Cron loop; you can upload the file whenever you want, and the ingestion starts right after! It works in a very similar way to APOC Triggers, so the first thing to do is define the listener:

CALL apoc.load.directory.async.add(
  'insert_person',
  'CALL apoc.load.csv($filePath, {results:["map"]}) YIELD map MERGE (p:Person{id: map.id}) SET p.name = map.name',
  '*.csv',
  'path/to/directory',
  { listenEventType: ['CREATE', 'MODIFY'] }
)

As you can see, the procedure takes 5 parameters:
- The listener name
- The Cypher query that has to be executed every time the listener catches a new event
- The file name pattern
- The root directory
- A configuration map that at this very moment (May 2021) manages just one property, listenEventType=CREATE/MODIFY/DELETE, which defines the type of file system event that you want to listen for
The Cypher query can leverage the following parameters, which are bound automatically by the listener (see the sketch after this list):
- fileName: the name of the file which triggered the event
- filePath: the absolute path of the file which triggered the event if apoc.import.file.use_neo4j_config=false, otherwise the relative path starting from Neo4j’s Import Directory
- fileDirectory: the absolute path directory of the file which triggered the event if apoc.import.file.use_neo4j_config=false, otherwise the relative path starting from Neo4j’s Import Directory
- listenEventType: the triggered event (“CREATE”, “DELETE” or “MODIFY”). The event “CREATE” happens when a file is inserted in the folder, “DELETE” when a file is removed from the folder and “MODIFY” when a file in the folder is changed (N.B. if a file is renamed, 2 events are triggered: first “DELETE” and then “CREATE”)
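For example, here is a minimal sketch of a listener that uses these parameters to keep track of which file and event produced each change; the ImportEvent label is a hypothetical name used only for illustration:

CALL apoc.load.directory.async.add(
  'track_imports',
  // ImportEvent is a hypothetical label, used only to show the parameters bound by the listener
  'CREATE (e:ImportEvent {fileName: $fileName, filePath: $filePath, directory: $fileDirectory, eventType: $listenEventType, at: datetime()})',
  '*.csv',
  'path/to/directory',
  { listenEventType: ['CREATE', 'MODIFY', 'DELETE'] }
)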
Besides the add procedure, the following procedures are available (usage is sketched below):
- apoc.load.directory.async.list(): returns the list of all running listeners
- apoc.load.directory.async.remove('<listener_name>'): removes a single listener
- apoc.load.directory.async.removeAll(): removes all the listeners
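For instance, a minimal sketch of inspecting and cleaning up listeners, assuming the 'insert_person' listener registered above:

// list the currently registered listeners
CALL apoc.load.directory.async.list();
// remove the listener added earlier in this section
CALL apoc.load.directory.async.remove('insert_person');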
apoc.periodic.truncate
This procedure is useful when you’re in the prototyping phase and are defining your graph model or your ingestion strategies, because it lets you very easily wipe the entire database:

CALL apoc.periodic.truncate({dropSchema: true})

As you can see, we manage a configuration map that at this very moment (May 2021) manages just one property, dropSchema=true/false, which optionally also drops indexes and constraints.
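For comparison, a minimal sketch of both variants, assuming the default configuration keeps the schema:

// wipe nodes and relationships, keeping indexes and constraints
CALL apoc.periodic.truncate();
// wipe nodes and relationships, and drop indexes and constraints as well
CALL apoc.periodic.truncate({dropSchema: true});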
apoc.util.(de)compress
Sometimes you may need to store large string values on nodes or relationships, so we added two functions to convert between a string and a compressed byte array in a very easy way. You can compress the data in the following way:

CREATE (p:Person{name: event.name, bigStringCompressed: apoc.util.compress(event.bigString, {compression: 'FRAMED_SNAPPY'})})

You can decompress the data in the following way:

MATCH (p:Person{name: 'John'})
RETURN apoc.util.decompress(p.bigStringCompressed, {compression: 'FRAMED_SNAPPY'}) AS bigString

Supported compression types are (a round-trip sketch follows this list):
- GZIP
- BZIP2
- DEFLATE
- BLOCK_LZ4
- FRAMED_SNAPPY
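And a minimal round-trip sketch, assuming the default charset and using GZIP as an example of the types above:

WITH 'a very long string that we want to store compressed' AS bigString
// compress and immediately decompress, then check we get the original string back
RETURN apoc.util.decompress(apoc.util.compress(bigString, {compression: 'GZIP'}), {compression: 'GZIP'}) = bigString AS roundTripOk;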
Other Minor Changes
We made tons of minor changes; here are some of them:
- We updated the Couchbase procedures to the latest driver version in order to have state-of-the-art support for the database.
- We added stats to the result of apoc.periodic.iterate, so now you’ll get the number of entities and properties affected by the procedure (see the sketch after this list).
- The UUID and TTL features now work per database.
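As a minimal sketch of the new stats, assuming they are exposed through the updateStatistics field yielded by the procedure:

CALL apoc.periodic.iterate(
  'MATCH (p:Person) RETURN p',
  'SET p.processed = true',
  {batchSize: 1000}
) YIELD batches, total, updateStatistics  // updateStatistics is assumed to carry the affected-entities counters
RETURN batches, total, updateStatistics;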
The Road Ahead
APOC growth is not stopping! We’re working hard to add new cool features that you can leverage in the near future:
- Support for reading/writing the Apache Arrow file format!
- Support for reading/writing data from/to Redis!