The APOC Spring Release



Since version 3.0 you can extend Neo4j with user-defined procedures and functions and, going forward, aggregate functions as well. About a year ago, during the 3.0 milestone phase, I started to work on the first set of graph refactoring procedures. These evolved into the APOC library, which featured about 100 procedures at the release of Neo4j 3.0, about 250 procedures and functions with Neo4j 3.1, and has now reached about 300.

April is APOC Awareness Month

This month we reward articles that demonstrate how to use APOC to do cool stuff with Graphs, Neo4j and Cypher. Read all about it in the announcement blog post.

Releases

With the beginning of spring we gathered the contributions made during the long winter nights and released three new versions for your pleasure.

You can find the releases for the different Neo4j versions here:

If you want to learn more about the existing APOC feature set, please visit the procedures gallery on neo4j.com, read the APOC documentation, run :play https://guides.neo4j.com/apoc in your Browser, or read the past blog articles on the topic.

New Feature Contributions

But let’s look at some of the new features since the last release in December:

Stefan Armbruster

Stefan Armbruster worked on automating the “manual” index updates, which you can enable with apoc.autoUpdate.enabled=true in your neo4j.conf. You also need an autoUpdate:true configuration setting in your manual index definition. He also added support for mixed content to apoc.load.xml, and provided the apoc.text.regexGroups function for extracting the groups matched by a regular expression.
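To illustrate the regular-expression helper, here is a minimal sketch. It assumes the function is registered as apoc.text.regexGroups(text, regex) and returns one list per match, holding the full match followed by its capture groups. The input string and the pattern are made up for illustration.

```cypher
// Hypothetical input: pull the attribute and the text out of each <link> element.
// Backslashes are doubled because Cypher string literals unescape them first.
WITH 'abc <link xxx1>yyy1</link> def <link xxx2>yyy2</link>' AS text
RETURN apoc.text.regexGroups(text, '<link (\\w+)>(\\w+)</link>') AS groups
```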

Andrew Bowman

Andrew Bowman only started contributing this month but has already added:

  • apoc.coll functions: shuffle(), randomItem(), randomItems(), containsDuplicates(), duplicates(), duplicatesWithCount(), occurrences(), reverse()

  • apoc.path procedures: subgraphNodes(), subgraphAll(), and spanningTree()

  • apoc.date functions: convert() and add()

  • apoc.algo functions: cosineSimilarity(), euclideanDistance(), euclideanSimilarity()

  • Extended the capabilities for the apoc.path.expand procedure with new operators for filtering end nodes, limits, excluding start node from filters and more.
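As a rough sketch of how a few of these additions might be called: the literal list, the Person label, the name property and the KNOWS relationship type are assumptions for illustration, not part of the release.

```cypher
// Collection helpers applied to a literal list
WITH [1, 2, 2, 3, 3, 3] AS list
RETURN apoc.coll.containsDuplicates(list) AS hasDuplicates,
       apoc.coll.duplicates(list)         AS duplicates,
       apoc.coll.occurrences(list, 3)     AS countOfThrees,
       apoc.coll.reverse(list)            AS reversed,
       apoc.coll.shuffle(list)            AS shuffled;

// Distinct nodes reachable from a start node within two KNOWS hops
MATCH (start:Person {name: 'Alice'})
CALL apoc.path.subgraphNodes(start, {maxLevel: 2, relationshipFilter: 'KNOWS'})
YIELD node
RETURN node;
```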

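Comparing each employee's skills with the skills a role requires, using cosineSimilarity:
MATCH (p1:Employee)
MATCH (p2:Role {name:'Role 1-Analytics Manager'})
// every skill the role requires, with its required proficiency
MATCH (sk:Skill)<-[y:REQUIRES_SKILL]-(p2)
// the employee's proficiency in that skill, if any
OPTIONAL MATCH (p1)-[x:HAS_SKILL]->(sk)
// build two aligned vectors per employee; a missing skill counts as 0
WITH p1, p2,
     collect(coalesce(x.proficiency,0)) AS xprof,
     collect(coalesce(y.proficiency,0)) AS yprof
RETURN p1.name AS name,
       apoc.algo.cosineSimilarity(xprof, yprof) AS cosineSim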
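For intuition about what the similarity score means: cosine similarity is the dot product of two vectors divided by the product of their lengths. A plain-Cypher sketch of that formula on two made-up vectors of equal length might look like this. It illustrates the math only, not how APOC implements the function.

```cypher
// Two hypothetical proficiency vectors pointing in the same direction
WITH [3.0, 0.0, 4.0] AS a, [6.0, 0.0, 8.0] AS b
WITH reduce(dot = 0.0, i IN range(0, size(a) - 1) | dot + a[i] * b[i]) AS dot,
     sqrt(reduce(s = 0.0, x IN a | s + x * x)) AS lengthA,
     sqrt(reduce(s = 0.0, x IN b | s + x * x)) AS lengthB
RETURN dot / (lengthA * lengthB) AS cosineSim
```

Vectors that point in the same direction score 1, and the score drops towards 0 as the skill profiles diverge.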

Florent Biville

Florent Biville added a new feature to the procedure compiler that lets us automatically generate the tabular information about procedures and functions for inclusion in the documentation. That includes the really nice, searchable table at the beginning of the docs.

Tomaz Bratanic

Tomaz Bratanic contributed an improvement to the Gephi streaming capability: it can now include a weight property. He also wrote a really nice blog post about it.

The Larus Team

I’m also very happy to announce that our partner Larus BA from Venice, Italy will support me going forward in working on APOC in a more focused manner. With the help of their team, we will take care of the open issues and feature requests and also add new cool stuff to APOC. They have already addressed a number of issues, and those fixes are included in this release: for example, honoring Neo4j’s import directory configuration, handling ElasticSearch scroll results, and following redirects when loading from files.

Michael Hunger

I spent some time fixing bugs (graphml export, TTL, setting array properties, more robust startup). I also worked on improving the documentation: separate versions of the docs are now published for the different APOC versions.

Something I had wanted to do for a long time was to improve the performance of apoc.periodic.iterate, which is used for managing large-scale updates or data creation with batched transactions. If you now provide iterateList:true, it executes the inner statement only once per batch, prepending an UNWIND over the batch. Prefixing your inner statement with WITH {foo} AS foo for each return value is also no longer necessary.
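A minimal sketch of such a batched update. The Person and Actor labels, the ACTED_IN relationship type and the batch size are example values only.

```cypher
// The first statement streams the nodes to update; the second is applied to them in batches.
// With iterateList:true each batch of rows is handed to the inner statement as one list,
// so the inner statement needs no "WITH {p} AS p" prefix.
CALL apoc.periodic.iterate(
  'MATCH (p:Person) WHERE (p)-[:ACTED_IN]->() RETURN p',
  'SET p:Actor',
  {batchSize: 10000, iterateList: true})
YIELD batches, total
RETURN batches, total
```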

For conflicting queries, you can now configure retries, for instance retries:5. See also my blog post about performant updates with Cypher.
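A hedged sketch of how the retry setting might be combined with parallel execution, assuming retries goes into the same config map as the other apoc.periodic.iterate options. The Order and Customer labels, the customerId and id properties and the PLACED relationship type are hypothetical.

```cypher
// Parallel batches that attach orders to shared customer nodes can block each other on locks.
// retries:5 is meant to let a failed batch be attempted again, up to five times.
CALL apoc.periodic.iterate(
  'MATCH (o:Order) RETURN o',
  'MATCH (c:Customer {id: o.customerId}) MERGE (c)-[:PLACED]->(o)',
  {batchSize: 1000, parallel: true, retries: 5})
YIELD batches, total
RETURN batches, total
```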

For quite a while I had wanted to add JSON path support to APOC's load.json procedure and the different JSON functions. This now allows you to reach into a JSON document and pull out only the data you’re interested in:

Question authors from StackOverflow using json-path:
WITH "https://bit.ly/so_neo4j" AS url
CALL apoc.load.json(url,'$.items.*owner.display_name') YIELD value
UNWIND value.result as name
RETURN name, count(*)
ORDER BY count(*) DESC LIMIT 5;

Bitwise operations were turned into a function. I added apoc.text.format, .lpad and .rpad, as well as new functions for creating virtual nodes and relationships. Some missing procedures for updating and removing labels, properties, and relationships were also added. I also added support for gzipped streams for load csv and load xml. In the future we want to add more protocols here, e.g. “hdfs://”, and allow URLs to follow redirects, so stay tuned.
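A short sketch of the text and bitwise functions mentioned above. The argument orders are assumptions to check against the documentation for your APOC version: text, width and padding string for the pad functions; a format string plus a list of values for format; and operand, operator, operand for the bitwise function, assumed here to be named apoc.bitwise.op. All literal values are made up.

```cypher
RETURN apoc.text.lpad('42', 5, '0')                       AS leftPadded,   // pad on the left to width 5
       apoc.text.rpad('42', 5, '0')                       AS rightPadded,  // pad on the right to width 5
       apoc.text.format('%s scored %.1f', ['Alice', 9.5]) AS formatted,    // Java-style format string
       apoc.bitwise.op(60, '&', 13)                       AS bitwiseAnd    // bitwise AND of 60 and 13
```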

If you have any feedback on existing functionality, bug reports or feature requests, please let us know by filing an issue in the repository.

And if you like APOC please don’t forget to star it on GitHub 🙂

Cheers, Michael

PS: All the new Neo4j Sandboxes also come with APOC installed.

PPS: And don’t forget to write and publish for the APOC Awareness Month Challenge.