We also crossed 500 GitHub stars. Thanks, everyone, for giving us a nod!
What’s New in the Latest APOC Release
If you haven’t used APOC yet, you have one less excuse: it just became much easier to try. In Neo4j Desktop, navigate to the Plugins tab of your Manage Database view and click “Install” for APOC. The database then restarts, and you’re ready to rock.
APOC wouldn’t be where it is today without the countless people contributing, reporting ideas and issues, and telling their friends. Please keep up the good work.
I also added a code of conduct and contribution guidelines to APOC, so every contributor feels welcome and safe and also quickly knows how to join our efforts.
For this release again, our friends at LARUS BA did a lot of the work. Besides many bugfixes, Angelo Busato also added S3 URL support, which is really cool. Andrea Santurbano also worked on the HDFS support (read / write).
With these, you can use S3 and HDFS URLs in every procedure that loads data, like apoc.load.json/csv/xml/graphml, apoc.cypher.runFile, etc. Writing to HDFS is possible with all the export functions, like apoc.export.cypher/csv/graphml.
Andrew Bowman worked on a number of improvements around path expanders, including:
- Added support for repeating sequences of labels and/or rel-types to express more complex paths
- Support for known end nodes (instead of end nodes based only on labels)
- Support for compound labels (such as :Person:Manager)
Aggregation Functions
I have wanted to add aggregation functions ever since Neo4j 3.2, when Pontus added the capability, but I never got around to it. Below is one of the patterns we used to get the first (few) elements of a collect. It is quite inefficient, because the full collect list is built up even if you’re only interested in the first element:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, collect(m)[0] as firstMovie
Now you can just use:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p,m ORDER BY m.released
RETURN p, apoc.agg.first(m) as firstMovie
There are also some more statistics functions, including apoc.agg.statistics, which computes all at once and returns a map with: {min,max,sum,median,avg,stdev}. The other statistics functions include:
- More efficient variants of collect(x)[a..b]: apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice
- apoc.agg.median(x)
- apoc.agg.percentiles(x,[0.5,0.9])
- apoc.agg.product(x)
- apoc.agg.statistics() provides a full numeric statistic
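As a rough sketch of how these aggregation functions might be combined, the query below computes several aggregates per actor in one pass. It assumes the movies example graph used above (Person, Movie, and the name, title and released properties), and it assumes that apoc.agg.statistics takes the value to aggregate as its first argument. Check the signatures in your APOC version before relying on them.

```cypher
// Sketch only: assumes the movies example graph and the argument
// shapes shown here; verify against your installed APOC version.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, m ORDER BY m.released
RETURN p.name AS actor,
       apoc.agg.first(m.title) AS firstTitle,      // earliest movie, no full list built
       apoc.agg.last(m.title) AS latestTitle,      // most recent movie
       apoc.agg.median(m.released) AS medianYear,
       apoc.agg.percentiles(m.released, [0.5, 0.9]) AS percentiles,
       apoc.agg.statistics(m.released) AS stats    // map of summary statistics
```

Because the rows are ordered before aggregation, the first and last functions pick the earliest and latest release without collecting every movie into a list.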
Indexing
I implemented an idea from my colleague Ryan Boyd to allow indexing of full “documents”, i.e. map structures per node or relationship that can also contain information from the neighborhood or computed data. Later, the keys and values of the indexed data can be searched.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WITH p, p {.name, .age, roles: r.roles, movies: collect(m.title)} as doc
CALL apoc.index.addNodeMap(p, doc);
Then, later you can search:
CALL apoc.index.nodes('Person','name:K* movies:Matrix roles:Neo');
apoc.index.addNodeMap(node, {map})
apoc.index.addRelationshipMap(node, {map})
As part of that work, I also wanted to add support for deconstructing complex values or structs, such as:
- apoc.map.values to select the values of a subset of keys into a mixed type list
- apoc.coll.elements is used to deconstruct a sublist into typed variables (this can also be done with WITH, but requires an extra declaration of the list to be concise)
RETURN apoc.map.values({a:'foo', b:42, c:true}, ["a","c"]) -> ['foo', true]
CALL apoc.coll.elements([42, 'foo', person]) YIELD _1i as answer, _2s as name, _3n as person
Path Expander Sequences
You can now define repeating sequences of node labels or relationship types during expansion. Just use commas in the relationshipFilter and labelFilter config parameters to separate the filters that should apply for each step in the sequence.
relationshipFilter:'OWNS_STOCK_IN>, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'
The above will continue traversing only the given sequence of relationships.
labelFilter:'Person|Investor|-Cleared, Company|>Bank|/Government:Company'
All filter types are allowed in label sequences. The above repeats a sequence of a :Person or :Investor node (but not with a :Cleared label), and then a :Company, :Bank, or :Government:Company node (where :Bank nodes will act as end nodes of an expansion, and :Government:Company nodes will act as end nodes and terminate further expansion).
sequence:'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank|/Government:Company, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED'
The new sequence config parameter above lets you define both the label filters and relationship filters to use for the repeating sequence (and ignores labelFilter and relationshipFilter if present).
Path Expansion Improvements
- Compound labels (like Person:Manager) allowed in the label filter, applying only to nodes with all of the given labels.
- endNodes and terminatorNodes config parameters, for supplying a list of the actual nodes that should end each path during expansion (terminatorNodes end further expansion down the path, endNodes allow expansion to continue)
- For labelFilter, the whitelist symbol + is now optional. Lack of a symbol is interpreted as a whitelisted label.
- Some minor behavioral changes to the end node > and termination node / filters, specifically when it comes to whitelisting and behavior when below minLevel depth.
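To make the sequence and end-node options more concrete, here is a minimal sketch using apoc.path.expandConfig. The node names ('Alice', 'Example Bank'), the name property and the maxLevel values are hypothetical and only serve the illustration; the filter strings are the ones shown in the previous section.

```cypher
// Sketch 1: repeat a label/relationship sequence from a start node.
// 'Alice' and maxLevel: 8 are made-up values for illustration.
MATCH (start:Person {name: 'Alice'})
CALL apoc.path.expandConfig(start, {
  sequence: 'Person|Investor|-Cleared, OWNS_STOCK_IN>, Company|>Bank|/Government:Company, <MANAGES, LIVES_WITH>|MARRIED_TO>|RELATED',
  maxLevel: 8
}) YIELD path
RETURN path;

// Sketch 2: end each path at a known node instead of at a label.
// 'Example Bank' is a hypothetical node; terminatorNodes stops
// further expansion once that node is reached.
MATCH (start:Person {name: 'Alice'}), (bank:Bank {name: 'Example Bank'})
CALL apoc.path.expandConfig(start, {
  relationshipFilter: 'OWNS_STOCK_IN>, <MANAGES',
  terminatorNodes: [bank],
  maxLevel: 6
}) YIELD path
RETURN path;
```

Swapping terminatorNodes for endNodes in the second query would still return paths ending at the bank, but would let expansion continue past it.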
Path Functions
(This one came from a request in neo4j.com/slack.)
apoc.path.create(startNode, [rels])
apoc.path.slice(path, offset, length)
apoc.path.combine(path1, path2)
MATCH (a:Person)-[r:ACTED_IN]->(m) ... MATCH (m)<-[d:DIRECTED]-()
RETURN apoc.path.create(a, [r, d]) as path

MATCH path = (a:Root)<-[:PARENT_OF*..10]-(leaf)
RETURN apoc.path.slice(path, 2,5) as subPath

MATCH firstLeg = shortestPath((start:City)-[:ROAD*..10]-(stop)), secondLeg = shortestPath((stop)-[:ROAD*..10]->(end:City))
RETURN apoc.path.combine(firstLeg, secondLeg) as route
Text Functions
- apoc.text.code(codepoint), apoc.text.hexCharAt(), apoc.text.charAt() (thanks to Andrew Bowman)
- apoc.text.bytes/apoc.text.byteCount (thanks to Jonatan for the idea)
- apoc.text.toCypher(value, {}) for generating valid Cypher representations of nodes, relationships, paths and values
- Sørensen–Dice similarity (thanks Florent Biville)
- Roman <-> Arabic conversions (thanks Marcin Cylke)
- New email and domain extraction functions (thanks David Allen)
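A few of the text additions above could be tried out roughly like this. The post does not spell out the names for the similarity and Roman numeral functions, so apoc.text.sorensenDiceSimilarity, apoc.number.romanToArabic and apoc.number.arabicToRoman are assumptions here, as are the argument shapes; confirm them against the documentation for your APOC version.

```cypher
// Sketch only: function names not mentioned in the post, and all
// argument shapes, are assumptions to be checked against the docs.
RETURN apoc.text.charAt('Neo4j', 0) AS firstCodepoint,
       apoc.text.code(78) AS charFromCodepoint,
       apoc.text.byteCount('Sørensen') AS byteCount,
       apoc.text.sorensenDiceSimilarity('belly', 'jolly') AS similarity,
       apoc.number.romanToArabic('MCMXCIX') AS arabic,
       apoc.number.arabicToRoman(1999) AS roman
```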
Data Integration
- Generic XML import with apoc.import.xml() (thanks Stefan Armbruster)
- Pass Cypher parameters to apoc.export.csv.query
- MongoDB integration (thanks Gleb Belokrys)
  - Added paging parameter in the get and find procedure
- Stream apoc.export.cypher script export back to client when no file name is given
- apoc.load.csv
  - Handling of converted null values and/or null columns
  - Explicit nullValues option to define values that will be replaced by null (global and per field)
  - Explicit results option to determine which output columns are provided
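The apoc.load.csv options listed above might be used along these lines. This is a sketch under assumptions: the file name, the column names (name, age), the null markers, and the mapping key used for the per-field setting are all hypothetical and not taken from the post.

```cypher
// Sketch only: 'people.csv', its columns and the null markers are
// made up; the 'mapping' key for per-field options is an assumption.
CALL apoc.load.csv('file:///people.csv', {
  nullValues: ['', 'NA'],                                  // global: these become null
  mapping: { age: { type: 'int', nullValues: ['unknown'] } }, // per-field override
  results: ['map']                                         // only produce the map column
}) YIELD lineNo, map
RETURN lineNo, map.name AS name, map.age AS age
```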
Collection Functions
- apoc.coll.combinations(), apoc.coll.frequencies() (thanks Andrew)
- Update/remove/insert value at collection index (thanks Brad Nussbaum)
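As a small illustration of the two new collection functions, assuming that apoc.coll.combinations takes the list plus a minimum and maximum selection size and that apoc.coll.frequencies takes just the list:

```cypher
// Sketch only: argument shapes are assumptions; the sample list
// is made up for illustration.
WITH ['a', 'b', 'a', 'c'] AS items
RETURN apoc.coll.frequencies(items) AS frequencies,       // how often each item occurs
       apoc.coll.combinations(['a', 'b', 'c'], 2, 2) AS pairs // all 2-element selections
```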
Graph Refactoring
- Per property configurable merge strategy for mergeNodes
- Means to skip properties for cloneNodes
Other Additions
- Added apoc.date.field UDF
- apoc.load.jdbc (type conversion, connection handling, logging)
- apoc.refactor.mergeNodes
- apoc.cypher.run*
- apoc.schema.properties.distinctCount
- Composite indexes in Cypher export
- ElasticSearch integration for ES 6
- Made larger parts of APOC not require the unrestricted configuration
- apoc.json.toTree (also config for relationship-name casing)
- Warmup improvements (dynamic properties, rel-group)
- Compound index using apoc.schema.assert (thanks Chris Skardon)
- Explicit index reads don’t require read-write-user
- Enable parsing of lists in GraphML import (thanks Alex Wilson)
- Change CYPHER_SHELL format from upper case to lower case (:begin, :commit)
- Allowed apoc.node.degree() to use untyped directions (thanks Andrew)
Feedback
As always, we’re very interested in your feedback, so please try out the new APOC releases, and let us know if you like them and if there are any issues.
Please refer to the documentation or ask in neo4j-users Slack in the #neo4j-apoc channel if you have any questions.
Enjoy the new release(s)!