What’s Waiting for You in the Latest Release of the APOC Library [March 2018]


The last release of the APOC library was just before GraphConnect New York, and in the meantime quite a lot of new features have made their way into our little standard library.

We also crossed 500 GitHub stars. Thanks, everyone, for giving us a nod!

What’s New in the Latest APOC Release



If you haven’t used APOC yet, you have one less excuse: it just became much easier to try. In Neo4j Desktop, just navigate to the Plugins tab of your Manage Database view and click “Install” for APOC. Then your database is restarted, and you’re ready to rock.

APOC wouldn’t be where it is today without the countless people contributing code, reporting ideas and issues, and telling their friends. Please keep up the good work.

I also added a code of conduct and contribution guidelines to APOC, so every contributor feels welcome and safe and also quickly knows how to join our efforts.

For this release again, our friends at LARUS BA did a lot of the work. Besides many bugfixes, Angelo Busato also added S3 URL support, which is really cool. Andrea Santurbano also worked on the HDFS support (read / write).

With these, you can use S3 and HDFS URLs in every procedure that loads data, like apoc.load.json/csv/xml/graphml, apoc.cypher.runFile, etc. Writing to HDFS is possible with all the export functions, like apoc.export.cypher/csv/graphml.

Andrew Bowman worked on a number of improvements around path expanders, including:
    • Added support for repeating sequences of labels and/or rel-types to express more complex paths
    • Support for known end nodes (instead of end nodes based only on labels)
    • Support for compound labels (such as :Person:Manager)
I also found some time to code and added a bunch of things. 🙂

Aggregation Functions


I have wanted to add aggregation functions ever since Neo4j 3.2, when Pontus added the capability, but I just never got around to it. Below is one of the patterns that we used to rely on to get the first (few) elements of a collect. It is quite inefficient, because the full collect list is built up even if you’re only interested in the first element:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, collect(m)[0] as firstMovie

Now you can just use:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, apoc.agg.first(m) as firstMovie
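
To make the difference concrete, here is a minimal plain-Python model of the two approaches. It is not APOC's implementation; the class names are hypothetical and only illustrate why an aggregator that keeps a single value per group needs less memory than one that stores the whole list.

```python
class CollectAggregator:
    """Models collect(x): stores every value seen for the group."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def result(self):
        return self.values


class FirstAggregator:
    """Models a 'first' aggregation: keeps only one value per group."""

    _EMPTY = object()  # sentinel so that a stored None is not mistaken for "nothing yet"

    def __init__(self):
        self._first = self._EMPTY

    def update(self, value):
        if self._first is self._EMPTY:
            self._first = value

    def result(self):
        return None if self._first is self._EMPTY else self._first


# Made-up release years, already sorted as they would be after ORDER BY.
released = [1999, 2003, 2003, 2008]

collector, first = CollectAggregator(), FirstAggregator()
for year in released:
    collector.update(year)
    first.update(year)

first_via_collect = collector.result()[0]  # whole list built, then indexed
first_direct = first.result()              # only one value was ever stored
print(first_via_collect, first_direct)
```

Both approaches return the same value; the second simply never materializes the list.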

There are also some more statistics functions, including apoc.agg.statistics, which computes everything at once and returns a map with {min,max,sum,median,avg,stdev}. The new aggregation functions include:
    • More efficient variants of collect(x)[a..b]: apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice
    • apoc.agg.median(x)
    • apoc.agg.percentiles(x,[0.5,0.9])
    • apoc.agg.product(x)
    • apoc.agg.statistics() provides a full numeric statistic
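
As an illustration of what such a statistics map contains, here is a small Python sketch that builds a map with the same keys. It is only a model: the function name is hypothetical, and using the population standard deviation and the midpoint median are assumptions, since the exact definitions APOC uses are not spelled out here.

```python
import statistics


def summarize(values):
    """Return a map with the keys listed above for a list of numbers."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "median": statistics.median(values),   # midpoint of the two middle values for even counts
        "avg": statistics.fmean(values),
        "stdev": statistics.pstdev(values),    # assumption: population standard deviation
    }


stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```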

Indexing


I implemented an idea from my colleague Ryan Boyd to allow indexing of full “documents”, i.e. map structures per node or relationship that can also contain information from the neighborhood or computed data. Later, those can be searched by the keys and values of the indexed data.

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WITH p, p {.name, .age, roles: r.roles, movies: collect(m.title) } as doc
CALL apoc.index.addNodeMap(p, doc);

Then, later you can search:

CALL apoc.index.nodes('Person','name:K* movies:Matrix roles:Neo');

The new procedures are:

apoc.index.addNodeMap(node, {map})
apoc.index.addRelationshipMap(relationship, {map})

As part of that work, I also wanted to add support for deconstructing complex values or structs, such as:
    • apoc.map.values to select the values of a subset of keys into a mixed type list
    • apoc.coll.elements is used to deconstruct a sublist into typed variables (this can also be done with WITH, but requires an extra declaration of the list to be concise)

RETURN apoc.map.values({a:'foo', b:42, c:true}, ["a","c"]) // returns ['foo', true]

CALL apoc.coll.elements([42, 'foo', person]) 
YIELD _1i as answer, _2s as name, _3n as person

Path Expander Sequences


You can now define repeating sequences of node labels or relationship types during expansion. Just use commas in the relationshipFilter and labelFilter config parameters to separate the filters that should apply for each step in the sequence.

relationshipFilter:'OWNS_STOCK_IN>, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'

The above will continue traversing only the given sequence of relationships.

labelFilter:'Person|Investor|-Cleared, Company|>Bank|/Government:Company'

All filter types are allowed in label sequences. The above repeats a sequence of a :Person or :Investor node (but not with a :Cleared label), and then a :Company, :Bank, or :Government:Company node (where :Bank nodes will act as end nodes of an expansion, and :Government:Company nodes will act as end nodes and terminate further expansion).
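
To show how such a filter string breaks down, here is a small Python sketch that splits a labelFilter into its sequence steps and filter kinds. It is not how APOC parses the string internally; the function name and the output shape are hypothetical, and the symbol meanings (no symbol or + for whitelist, - for blacklist, > for end node, / for termination node, : for compound labels) are taken from the descriptions in this post.

```python
def parse_label_filter(label_filter):
    """Split a labelFilter string into one dict of filter groups per sequence step."""
    kinds = {"+": "whitelist", "-": "blacklist", ">": "end", "/": "terminator"}
    steps = []
    for step in label_filter.split(","):          # commas separate the steps of the sequence
        groups = {kind: [] for kind in kinds.values()}
        for entry in step.strip().split("|"):     # pipes separate alternatives within a step
            entry = entry.strip()
            if not entry:
                continue
            if entry[0] in kinds:
                kind, labels = kinds[entry[0]], entry[1:]
            else:
                kind, labels = "whitelist", entry  # no symbol means whitelisted
            groups[kind].append(tuple(labels.split(":")))  # compound labels become tuples
        steps.append(groups)
    return steps


steps = parse_label_filter("Person|Investor|-Cleared, Company|>Bank|/Government:Company")
for index, groups in enumerate(steps):
    print(index, groups)
```

Read this way, the example has two steps: the first whitelists Person and Investor and blacklists Cleared, the second whitelists Company, marks Bank as an end node and the compound label Government:Company as a termination node.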

sequence:'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank|/Government:Company,
         <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'

The new sequence config parameter above lets you define both the label filters and relationship filters to use for the repeating sequence (and ignores labelFilter and relationshipFilter if present).

Path Expansion Improvements


    • Compound labels (like Person:Manager) allowed in the label filter, applying only to nodes with all of the given labels.
    • endNodes and terminatorNodes config parameters, for supplying a list of the actual nodes that should end each path during expansion (terminatorNodes end further expansion down the path, endNodes allow expansion to continue)
    • For labelFilter, the whitelist symbol + is now optional. Lack of a symbol is interpreted as a whitelisted label.
    • Some minor behavioral changes to the end node > and termination node / filters, specifically when it comes to whitelisting and behavior when below minLevel depth.

Path Functions


(This one came from a request in neo4j.com/slack.)

    • apoc.path.create(startNode, [rels])
    • apoc.path.slice(path, offset, length)
    • apoc.path.combine(path1, path2)
MATCH (a:Person)-[r:ACTED_IN]->(m)
...
MATCH (m)<-[d:DIRECTED]-()
RETURN apoc.path.create(a, [r, d]) as path

MATCH path = (a:Root)<-[:PARENT_OF*..10]-(leaf)
RETURN apoc.path.slice(path, 2,5) as subPath

MATCH firstLeg = shortestPath((start:City)-[:ROAD*..10]-(stop)),
             secondLeg = shortestPath((stop)-[:ROAD*..10]->(end:City))
RETURN apoc.path.combine(firstLeg, secondLeg) as route

Text Functions


    • apoc.text.code(codepoint), apoc.text.hexCharAt(), apoc.text.charAt() (thanks to Andrew Bowman)
    • apoc.text.bytes/apoc.text.byteCount (thanks to Jonatan for the idea)
    • apoc.text.toCypher(value, {}) for generating valid Cypher representations of nodes, relationships, paths and values
    • Sørensen–Dice similarity (thanks Florent Biville)
    • Roman <-> Arabic conversions (thanks Marcin Cylke)
    • New email and domain extraction functions (thanks David Allen)
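
For readers unfamiliar with the Sørensen–Dice coefficient, here is a minimal Python sketch of the general formula, 2·|A ∩ B| / (|A| + |B|), computed over character bigrams. The tokenization (lowercased character bigrams) and the function names are assumptions made for illustration; the APOC function may tokenize its input differently and so return different scores.

```python
from collections import Counter


def bigrams(text):
    """Multiset of adjacent character pairs, case-insensitive."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def sorensen_dice(a, b):
    """Sørensen–Dice coefficient of two strings over character bigrams."""
    first, second = bigrams(a), bigrams(b)
    total = sum(first.values()) + sum(second.values())
    if total == 0:
        # Neither string has a bigram (length 0 or 1): fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((first & second).values())  # multiset intersection
    return 2 * overlap / total


print(sorensen_dice("night", "nacht"))
print(sorensen_dice("graph", "Graph"))
```

In the first call the two words share one bigram ("ht") out of four each, so the score is 2·1 / (4 + 4) = 0.25.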

Data Integration


    • Generic XML import with apoc.import.xml() (thanks Stefan Armbruster)
    • Pass Cypher parameters to apoc.export.csv.query
    • MongoDB integration (Thanks Gleb Belokrys)
      • Added paging parameter in the get and find procedure
    • Stream apoc.export.cypher script export back to client when no file name is given
    • apoc.load.csv
      • Handling of converted null values and/or null columns
      • Explicit nullValues option to define values that will be replaced by null (global and per field)
      • Explicit results option to determine which output columns are provided
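
The idea behind a nullValues option can be sketched in a few lines of Python: any cell whose text matches one of the configured markers is turned into a real null. This is a plain-Python model with made-up data, not apoc.load.csv itself; the function name and its null_values parameter are hypothetical, and only the global variant (not the per-field one) is modeled.

```python
import csv
import io


def load_csv(text, null_values=()):
    """Yield one dict per CSV row, replacing configured marker strings with None."""
    markers = set(null_values)
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield {key: (None if value in markers else value) for key, value in row.items()}


# Made-up sample data: "NA" and the empty string both stand for a missing value.
data = "name,age\nAlice,30\nBob,NA\nCarol,\n"

rows = list(load_csv(data, null_values=["", "NA"]))
for row in rows:
    print(row)
```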

Collection Functions


    • apoc.coll.combinations(), apoc.coll.frequencies() (Thanks Andrew)
    • Update/remove/insert value at collection index (Thanks Brad Nussbaum)
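
As a rough picture of what combination and frequency helpers compute, here is a Python sketch using the standard library. The function names, the min/max selection parameters, and the item/count output shape are assumptions made for illustration and are not taken from this post; check the APOC documentation for the actual signatures and return values.

```python
from collections import Counter
from itertools import combinations


def combinations_between(items, min_select, max_select):
    """All combinations of items with sizes from min_select to max_select inclusive."""
    result = []
    for size in range(min_select, max_select + 1):
        result.extend(list(combo) for combo in combinations(items, size))
    return result


def frequencies(items):
    """One {item, count} entry per distinct value, in order of first appearance."""
    return [{"item": item, "count": count} for item, count in Counter(items).items()]


pairs = combinations_between(["a", "b", "c"], 2, 2)
counts = frequencies([1, 3, 5, 7, 9, 9])
print(pairs)
print(counts)
```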

Graph Refactoring


    • Per property configurable merge strategy for mergeNodes
    • Means to skip properties for cloneNodes

Other Additions


    • Added apoc.date.field UDF

Bugfixes and smaller improvements in this release of the APOC library include:
    • apoc.load.jdbc (type conversion, connection handling, logging)
    • apoc.refactor.mergeNodes
    • apoc.cypher.run*
    • apoc.schema.properties.distinctCount
    • Composite indexes in Cypher export
    • ElasticSearch integration for ES 6
    • Made larger parts of APOC not require the unrestricted configuration
    • apoc.json.toTree (also config for relationship-name casing)
    • Warmup improvements (dynamic properties, rel-group)
    • Compound index using apoc.schema.assert (thanks Chris Skardon)
    • Explicit index reads don’t require read-write-user
    • Enable parsing of lists in GraphML import (thanks Alex Wilson)
    • Change CYPHER_SHELL format from upper case to lower case. (:begin,:commit)
    • Allowed apoc.node.degree() to use untyped directions (thanks Andrew)

Feedback


As always, we’re very interested in your feedback, so please try out the new APOC releases, and let us know if you like them and if there are any issues.

Please refer to the documentation or ask in the #neo4j-apoc channel of the neo4j-users Slack if you have any questions.

Enjoy the new release(s)!

