Py2neo v4: The Next Generation

Nigel Small

Neo4j Drivers Team Lead

As both a busy dad and the team lead of the Neo4j Drivers Team, the amount of spare time I’ve had to work on personal projects over the past couple of years has been somewhat reduced. But with all the interest from Python users in keeping Py2neo going, I couldn’t resist starting work on it again. So I’m delighted to finally be able to deliver a brand new major release with a number of shiny new features and improvements. Please welcome Py2neo v4!

As usual, you can find the library at the Python Package Index and install it with pip (pip install py2neo). Take a look at the comprehensive documentation and, if you encounter any problems, shout out in the #neo4j-python channel in the neo4j-users Discord.

Py2neo now wraps the 1.6 release of the official Python driver, which takes care of all the low-level, nitty-gritty, binary-winery things a database driver needs to handle. This allows Py2neo to focus on higher-level features and on a properly Pythonic API and integrations.

Grabbing a Graph

As with every previous version of Py2neo, the main way into the library is through the Graph class. The constructor can accept a range of settings and the code behind this has now been completely overhauled to fix several former issues, such as a failure to recognise custom port numbers. The default protocol is also now Bolt, rather than HTTP, and the hard requirement on HTTP has been completely removed.

So, to get connected, simply create a Graph object:

>>> from py2neo import Graph
>>> graph = Graph("bolt://myserver:7687", auth=("neo4j", "psswrd"))

Along with a regular connection URI, the full set of settings accepted by the Graph constructor is as follows:

  • auth – a 2-tuple of (user, password)
  • user
  • password
  • secure (boolean flag)
  • scheme
  • host
  • port
  • user_agent
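
The keyword settings can stand in for the URI. The following is only a sketch under stated assumptions: bolt_settings and connect are hypothetical helper names that are not part of Py2neo, and the host and password are the placeholders used earlier. The import is deferred so the settings can be assembled without a running server.

```python
def bolt_settings(host, port=7687, user="neo4j", password=None, secure=False):
    """Collect keyword settings of the kind listed above into one dict.

    Hypothetical helper for illustration; not part of py2neo.
    """
    return {
        "scheme": "bolt",
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "secure": secure,
    }


def connect(settings):
    """Open a Graph from keyword settings instead of a URI."""
    from py2neo import Graph  # deferred: only needed when actually connecting
    return Graph(**settings)


settings = bolt_settings("myserver", password="psswrd")
# graph = connect(settings)  # requires py2neo and a reachable Neo4j server
```

Keeping the settings in a plain dict makes it easy to log or override individual values before connecting.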

API Overview

Py2neo exposes several logical layers of API on top of the official Python driver. The lowest level Cypher API provides Cypher execution facilities very similar to those in the driver, but with a few extras such as coercion to a Table object:

>>> graph.run("MATCH (a:Person) "
...           "RETURN a.name, a.born LIMIT 3").to_table()
 a.name             | a.born 
--------------------|--------
 Laurence Fishburne |   1961 
 Hugo Weaving       |   1960 
 Lilly Wachowski    |   1967
>>> graph.evaluate("MATCH (a:Person) RETURN count(a)")
142

The next level up, the Entity API, wraps Cypher in convenience functions that provide a full set of CRUD operations on Node and Relationship objects. This can make for clearer application code at the expense of fine-grained control. The NodeMatcher, for example, constructs and executes a Cypher MATCH statement and returns Node objects:

>>> [(a["name"], a["born"])
...  for a in graph.nodes.match("Person").limit(3)]
[('Laurence Fishburne', 1961),
 ('Hugo Weaving', 1960),
 ('Lilly Wachowski', 1967)]

Other Entity API methods include Graph.create, Graph.delete and Graph.merge (as well as similar transactional variants). Note that Graph.merge has now been completely rewritten to use Cypher’s UNWIND clause internally. This addresses some previous performance issues for the method when used at scale.
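
To give a feel for why UNWIND helps, here is a minimal sketch of the general batching technique: one statement and one list parameter replace a round trip per node. This is not the query that Graph.merge generates. The helper name unwind_merge, the rows parameter and the sample data are assumptions made for illustration.

```python
def unwind_merge(label, key):
    """Build a Cypher statement that merges one node per row of $rows.

    Illustrative only: label and key are interpolated inside backticks
    (no escaping is attempted), while the data travels as a parameter.
    """
    return (
        "UNWIND $rows AS row "
        "MERGE (n:`{label}` {{`{key}`: row.`{key}`}}) "
        "SET n += row "
        "RETURN count(n)"
    ).format(label=label, key=key)


statement = unwind_merge("Person", "name")
rows = [
    {"name": "Keanu Reeves", "born": 1964},
    {"name": "Hugo Weaving", "born": 1960},
]
# With a connected Graph, the whole batch would go in a single call:
# graph.evaluate(statement, rows=rows)
```

The point of the pattern is that the statement text stays constant however many rows are sent, so the server can plan it once.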

The topmost level of API is Py2neo’s OGM API. This allows creation of GraphObjects that wrap nodes in native classes and provide attributes to model their relationships and properties.

>>> from py2neo.ogm import GraphObject, Property
>>> class Person(GraphObject):
...     name = Property()
...     born = Property()
...
>>> [(a.name, a.born) for a in Person.match(graph).limit(3)]
[('Laurence Fishburne', 1961),
 ('Hugo Weaving', 1960),
 ('Lilly Wachowski', 1967)]

More about Matchers

The old py2neo.selection module has been renamed to py2neo.matching and the NodeSelector is now called NodeMatcher. There’s also a new RelationshipMatcher, which is an evolution of the old Graph.match method implementation.

A NodeMatcher offers a DSL that can be used to locate nodes which fulfil a specific set of criteria. Typically, a single node can be identified by passing a specific label and property key-value pair. However, any number of labels and any condition supported by the Cypher WHERE clause are allowed.

For a simple match by label and property, use the match method:

>>> graph.nodes.match("Person", name="Keanu Reeves").first()
(_224:Person {born:1964,name:"Keanu Reeves"})

For a more comprehensive match using Cypher expressions, the where method can be used to further refine the selection. Here, the underscore character can be used to refer to the nodes being filtered:

>>> from py2neo import NodeMatcher
>>> matcher = NodeMatcher(graph)
>>> list(matcher.match("Person").where("_.name =~ 'K.*'"))
[(_57:Person {born: 1957, name: 'Kelly McGillis'}),
 (_80:Person {born: 1958, name: 'Kevin Bacon'}),
 (_83:Person {born: 1962, name: 'Kelly Preston'}),
 (_224:Person {born: 1964, name: 'Keanu Reeves'}),
 (_226:Person {born: 1966, name: 'Kiefer Sutherland'}),
 (_243:Person {born: 1957, name: 'Kevin Pollak'})]

Orders and limits can also be applied:

>>> list(matcher.match("Person").where("_.name =~ 'K.*'")
...      .order_by("_.name").limit(3))
[(_224:Person {born: 1964, name: 'Keanu Reeves'}),
 (_57:Person {born: 1957, name: 'Kelly McGillis'}),
 (_83:Person {born: 1962, name: 'Kelly Preston'})]

And if only a count of matched entities is required, the length of a match can be evaluated:

>>> len(matcher.match("Person").where("_.name =~ 'K.*'"))
6

The underlying query is only evaluated when the selection undergoes iteration or when a specific evaluation method (such as first) is called. This means that a NodeMatch instance may be reused before and after data changes for different results.
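
The deferred behaviour can be pictured with a toy model. The class below is not Py2neo’s NodeMatch: LazyMatch, its internals and the fake_run function are all assumptions, written only to show a query description that is chainable and is executed afresh on each iteration.

```python
class LazyMatch:
    """Toy stand-in for a lazily evaluated match (not py2neo's NodeMatch)."""

    def __init__(self, run, label, conditions=(), limit=None):
        self._run = run              # callable that executes a Cypher string
        self._label = label
        self._conditions = tuple(conditions)
        self._limit = limit

    def where(self, condition):
        # Return a new object; nothing is sent to the database here.
        return LazyMatch(self._run, self._label,
                         self._conditions + (condition,), self._limit)

    def limit(self, amount):
        return LazyMatch(self._run, self._label, self._conditions, amount)

    def query(self):
        parts = ["MATCH (_:%s)" % self._label]
        if self._conditions:
            parts.append("WHERE " + " AND ".join(self._conditions))
        parts.append("RETURN _")
        if self._limit is not None:
            parts.append("LIMIT %d" % self._limit)
        return " ".join(parts)

    def __iter__(self):
        # The query runs only now, and again on every new iteration.
        return iter(self._run(self.query()))


executed = []                        # records each query that actually ran


def fake_run(cypher):
    executed.append(cypher)
    return []                        # a real runner would return nodes


match = LazyMatch(fake_run, "Person").where("_.name =~ 'K.*'").limit(3)
ran_before_iteration = len(executed)   # building the match runs nothing
list(match)
list(match)                            # reuse: the query is issued again
```

Because each iteration issues the query again, the same object sees whatever data is present at that moment.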

Relationship matching is similar:

>>> keanu = graph.nodes.match("Person", name="Keanu Reeves").first()
>>> list(graph.relationships.match((keanu, None), "ACTED_IN")
...      .limit(3))
[(Keanu Reeves)-[:ACTED_IN {roles: ['Neo']}]->(_6),
 (Keanu Reeves)-[:ACTED_IN {roles: ['Neo']}]->(_158),
 (Keanu Reeves)-[:ACTED_IN {roles: ['Julian Mercer']}]->(_151)]

And lastly, the Node and Relationship objects received can be reused, along with new instances, for further operations. Note that Relationship objects are now always dynamic subclasses of the Relationship base class and can be created via those subclasses:

>>> from py2neo import Node, Relationship
>>> mary_poppins = Node("Movie", title="Mary Poppins")
>>> ACTED_IN = Relationship.type("ACTED_IN")
>>> graph.create(ACTED_IN(keanu, mary_poppins))
>>> graph.match((keanu, mary_poppins)).first()
(Keanu Reeves)-[:ACTED_IN {}]->(_189)

Reporting, Analytics and Data Science Integrations

Version 4 brings some new opportunities for reporting and data analysis as well as integration with several popular data science libraries.

The new Table class provides methods for multiple styles of output, including GitHub-flavored markdown, HTML, CSV and TSV. It also has a _repr_html_ method attached, which allows results to be rendered elegantly in Jupyter.

Oh, and keep an eye out for more Jupyter support. Exciting things are in the pipeline!

The Cursor object now has methods to export to numpy, pandas and sympy objects. For example, the numpy.ndarray and the pandas.DataFrame:

>>> graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 3").to_ndarray()
array([['Laurence Fishburne', '1961'],
       ['Hugo Weaving', '1960'],
       ['Lilly Wachowski', '1967']], dtype='<U18')
>>> graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 3").to_data_frame()
   a.born              a.name
0    1961  Laurence Fishburne
1    1960        Hugo Weaving
2    1967     Lilly Wachowski

The Cypher Lexer

For good measure I’ve also added a Cypher lexer for Pygments. This can be used to tokenise Cypher statements or split multiple statements that are separated by semicolons.

>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("py2neo.cypher")
>>> list(lexer.get_tokens("MATCH (a) RETURN a"))
[(Token.Keyword, 'MATCH'),
 (Token.Text.Whitespace, ' '),
 (Token.Punctuation, '('),
 (Token.Name.Variable, 'a'),
 (Token.Punctuation, ')'),
 (Token.Text.Whitespace, ' '),
 (Token.Keyword, 'RETURN'),
 (Token.Text.Whitespace, ' '),
 (Token.Name.Variable, 'a'),
 (Token.Text.Whitespace, '\n')]

The lexer is used in the new interactive console, which I cover in a second.
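
As a rough picture of what statement splitting involves, the sketch below separates statements at semicolons while leaving semicolons inside quoted strings alone. It is a simplified, standard-library stand-in and not the Pygments lexer itself: the function name split_statements is hypothetical, and comments and backtick-quoted names are deliberately ignored.

```python
def split_statements(text):
    """Split Cypher text on semicolons that sit outside string literals."""
    statements = []
    current = []
    quote = None                     # the open quote character, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep an escaped character together with its backslash.
                current.append(text[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    statements.append("".join(current).strip())
    return [s for s in statements if s]


parts = split_statements(
    "CREATE (a:Note {text: 'one; two'}); MATCH (a) RETURN a;"
)
```

A token-based lexer handles the cases this sketch skips, which is one reason to prefer it over ad hoc string splitting.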

Managing Neo4j Instances

The old neokit module and versions.txt have been rolled into py2neo.admin.dist and py2neo.admin.install — these modules contain information about Neo4j distributions as well as download and install facilities. They are also used by the command line py2neo get subcommand (more on that later).

The py2neo.admin.dist module contains details of Neo4j server versions as well as a Distribution class with a download method.

The py2neo.admin.install module then brings facilities for installing, starting, stopping and configuring Neo4j instances, as well as classes for manipulating auth files.

The majority of the functionality in these modules is exposed through the py2neo command line tool.

Command Line Usage

Finally, and excitingly, there is a new command line tool called (somewhat unimaginatively) py2neo. This comes with multiple subcommands and is currently only tested on Linux. If you try it on OSX, BSD or Windows and it works there, please add a comment.

py2neo get

The get subcommand can be used to download Neo4j tarballs. When used without arguments, the latest version is downloaded; alternatively, a version can be specified.

py2neo auth

The auth subcommand hides a set of sub-subcommands all related to auth file management. These can be used to list, update and remove users in a Neo4j auth file.

$ py2neo auth update data/dbms/auth alice
Password:  *******
Repeat for confirmation:  *******
$ py2neo auth list data/dbms/auth
neo4j
alice

py2neo run

The run subcommand gives a simple way to run Cypher at the command line, sending the results to stdout. Connection parameters are managed through environment variables, such as NEO4J_URI=bolt://myserver:7687 and NEO4J_PASSWORD=P4ssw0rd.
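
The same convention is easy to mirror in a script. The sketch below reads the two variables named above and falls back to defaults. The helper name settings_from_env, the fallback URI and the fixed neo4j user name are assumptions for illustration, not documented behaviour of the py2neo tool.

```python
import os


def settings_from_env(environ=None):
    """Read connection details from NEO4J_URI and NEO4J_PASSWORD.

    Hypothetical helper; the defaults are assumptions for illustration.
    """
    if environ is None:
        environ = os.environ
    uri = environ.get("NEO4J_URI", "bolt://localhost:7687")
    password = environ.get("NEO4J_PASSWORD")
    return {"uri": uri, "auth": ("neo4j", password)}


example = settings_from_env({
    "NEO4J_URI": "bolt://myserver:7687",
    "NEO4J_PASSWORD": "P4ssw0rd",
})
# With py2neo installed and a server running:
# from py2neo import Graph
# graph = Graph(example["uri"], auth=example["auth"])
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.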

py2neo console

The console subcommand kicks off an interactive Cypher console with full syntax highlighting. This rolls my separate (and now obsolete) console project, n4, back into Py2neo.

In a similar way to the run subcommand, environment variables can be used to manage the connection details.

Here’s a demonstration of the console in action…

[Image: Py2neo v4 console demonstration]

Conclusion…

So there are the new features! I hope that Py2neo v4 is useful for you and I look forward to hearing how it’s being used.

Finally, if you’d like to chip in with the project, I could do with help on improving the docs, building example projects and providing more third-party integrations. Oh, and helping to make it all work better on Windows!


Py2neo v4: The Next Generation was originally published in the Neo4j Developer Blog on Medium.