Py2neo v4: The Next Generation


As both a busy dad and the team lead of the Neo4j Drivers Team, the amount of spare time I’ve had to work on personal projects over the past couple of years has been somewhat reduced. But with all the interest by Python users in keeping Py2neo going, I couldn’t resist starting work on it again. So I’m delighted to finally be able to deliver a brand new major release with a number of shiny new features and improvements. Please welcome Py2neo v4!

As usual you can find the library at the Python Package Index and can install it with pip (pip install py2neo). Take a look at the comprehensive documentation and if you encounter any problems, shout out in the #neo4j-python channel in the neo4j-users Slack.

Py2neo now wraps the 1.6 release of the official Python driver, which takes care of all the low-level, nitty-gritty things a database driver needs to handle. This allows Py2neo to focus on higher-level features, a properly Pythonic API and integrations.

Grabbing a Graph

As with every previous version of Py2neo, the main way into the library is through the Graph. The constructor can accept a range of settings and the code behind this has now been completely overhauled to fix several former issues, such as a failure to recognise custom port numbers. The default protocol is also now Bolt, rather than HTTP, and the hard requirement on HTTP has been completely removed.

So, to get connected, simply create a Graph object:

>>> from py2neo import Graph
>>> graph = Graph("bolt://myserver:7687", auth=("neo4j", "psswrd"))

Along with a regular connection URI, the full set of settings accepted by the Graph constructor is as follows:

  • auth – a 2-tuple of (user, password)
  • user
  • password
  • secure (boolean flag)
  • scheme
  • host
  • port
  • user_agent
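To illustrate how these individual settings relate to the parts of a connection URI, here is a rough sketch that uses only the standard library. The helper name settings_from_uri is hypothetical and this is not how Py2neo parses URIs internally; it just collects the pieces under the setting names listed above.

```python
from urllib.parse import urlsplit


def settings_from_uri(uri):
    """Split a connection URI into individual settings (hypothetical helper)."""
    parts = urlsplit(uri)
    settings = {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
    }
    # Drop anything the URI did not specify so that defaults can apply.
    return {key: value for key, value in settings.items() if value is not None}


settings = settings_from_uri("bolt://neo4j:psswrd@myserver:7687")
print(settings)
# With Py2neo installed and a server running, these values could then be
# passed as keyword arguments, assuming the list above: Graph(**settings)
```

Keeping the settings separate like this can be handy when the host, port and credentials come from different places, such as a config file and a secrets store.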

API Overview

Py2neo exposes several logical layers of API on top of the official Python driver. The lowest level Cypher API provides Cypher execution facilities very similar to those in the driver, but with a few extras such as coercion to a Table object:

>>> graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 3").to_table()
 a.name             | a.born 
--------------------|--------
 Laurence Fishburne |   1961 
 Hugo Weaving       |   1960 
 Lilly Wachowski    |   1967 
>>> graph.evaluate("MATCH (a:Person) RETURN count(a)")
142

The next level up, the Entity API, wraps Cypher in convenience functions that provide a full set of CRUD operations on Node and Relationship objects. This can make for clearer application code at the expense of fine-grained control. The NodeMatcher, for example, constructs and executes a Cypher MATCH statement and returns Node objects:

>>> [(a["name"], a["born"])
...  for a in graph.nodes.match("Person").limit(3)]
[('Laurence Fishburne', 1961),
 ('Hugo Weaving', 1960),
 ('Lilly Wachowski', 1967)]

Other Entity API methods include Graph.create, Graph.delete and Graph.merge (as well as similar transactional variants). Note that Graph.merge has now been completely rewritten to use Cypher’s UNWIND clause internally. This addresses some previous performance issues for the method when used at scale.
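To give a feel for why UNWIND helps at scale, here is a minimal sketch of a batched merge expressed as a single parameterised statement. This is not the statement Py2neo generates internally; the helper name unwind_merge and the exact Cypher are my own assumptions for illustration.

```python
def unwind_merge(label, key, rows):
    """Build one parameterised Cypher statement that merges many nodes.

    Illustrative only: label and key are interpolated directly, so they
    must come from trusted code rather than user input.
    """
    statement = (
        "UNWIND $rows AS row "
        "MERGE (n:`{label}` {{`{key}`: row.`{key}`}}) "
        "SET n += row"
    ).format(label=label, key=key)
    return statement, {"rows": rows}


people = [
    {"name": "Keanu Reeves", "born": 1964},
    {"name": "Hugo Weaving", "born": 1960},
]
statement, parameters = unwind_merge("Person", "name", people)
print(statement)
# With a live connection, this could be sent in one round trip:
# graph.run(statement, parameters)
```

The point is that the whole batch travels as one parameter, so the server handles one statement rather than one per node.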

The topmost level of API is Py2neo’s OGM API. This allows creation of GraphObjects that wrap nodes in native classes and provide attributes to model their relationships and properties.

>>> from py2neo.ogm import GraphObject, Property
>>> class Person(GraphObject):
...     name = Property()
...     born = Property()
...
>>> [(a.name, a.born) for a in Person.match(graph).limit(3)]
[('Laurence Fishburne', 1961),
('Hugo Weaving', 1960),
('Lilly Wachowski', 1967)]
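If you are curious how class attributes can be mapped onto node properties, the general technique is Python's descriptor protocol. The toy classes here (ToyProperty and ToyGraphObject, both hypothetical names) use a plain dict as a stand-in for a node; this is a sketch of the idea rather than Py2neo's actual OGM code.

```python
class ToyProperty:
    """Descriptor that maps a class attribute to a key in a property dict."""

    def __set_name__(self, owner, name):
        # Called when the owning class is created; remember the attribute name.
        self.key = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance._node.get(self.key)

    def __set__(self, instance, value):
        instance._node[self.key] = value


class ToyGraphObject:
    def __init__(self, **properties):
        # A plain dict stands in for the wrapped node.
        self._node = dict(properties)


class Person(ToyGraphObject):
    name = ToyProperty()
    born = ToyProperty()


keanu = Person(name="Keanu Reeves")
keanu.born = 1964
print(keanu.name, keanu.born, keanu._node)
```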

More about Matchers

The old py2neo.selection module has been renamed to py2neo.matching and the NodeSelector is now called NodeMatcher. There’s also a new RelationshipMatcher, which is an evolution of the old Graph.match method implementation.

A NodeMatcher offers a DSL that can be used to locate nodes which fulfil a specific set of criteria. Typically, a single node can be identified by passing a specific label and property key-value pair. However, any number of labels and any condition supported by the Cypher WHERE clause are allowed.

For a simple match by label and property use the match method:

>>> graph.nodes.match("Person", name="Keanu Reeves").first()
(_224:Person {born:1964,name:"Keanu Reeves"})

For a more comprehensive match using Cypher expressions, the where method can be used to further refine the selection. Here, the underscore character can be used to refer to the nodes being filtered:

>>> from py2neo.matching import NodeMatcher
>>> matcher = NodeMatcher(graph)
>>> list(matcher.match("Person").where("_.name =~ 'K.*'"))
[(_57:Person {born: 1957, name: 'Kelly McGillis'}),
(_80:Person {born: 1958, name: 'Kevin Bacon'}),
(_83:Person {born: 1962, name: 'Kelly Preston'}),
(_224:Person {born: 1964, name: 'Keanu Reeves'}),
(_226:Person {born: 1966, name: 'Kiefer Sutherland'}),
(_243:Person {born: 1957, name: 'Kevin Pollak'})]

Orders and limits can also be applied:

>>> list(matcher.match("Person").where("_.name =~ 'K.*'")
...      .order_by("_.name").limit(3))
[(_224:Person {born: 1964, name: 'Keanu Reeves'}),
 (_57:Person {born: 1957, name: 'Kelly McGillis'}),
 (_83:Person {born: 1962, name: 'Kelly Preston'})]

And if only a count of matched entities is required, the length of a match can be evaluated:

>>> len(matcher.match("Person").where("_.name =~ 'K.*'"))
6

The underlying query is only evaluated when the match undergoes iteration or when a specific evaluation method is called (such as first). This means that a NodeMatch instance may be reused before and after data changes for different results.
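The lazy behaviour is easier to see in a toy version. The LazyMatch class here is hypothetical and much simpler than the real thing: each refinement returns a new object, and the Cypher text is only assembled when to_cypher is called. The generated statement is an assumption about what such a query could look like, with the underscore as the node variable.

```python
class LazyMatch:
    """Toy stand-in for a match object: builds Cypher only when asked."""

    def __init__(self, label, conditions=(), order=None, limit=None):
        self.label = label
        self.conditions = tuple(conditions)
        self.order = order
        self.limit_value = limit

    def where(self, condition):
        # Return a new object so the original match can be reused.
        return LazyMatch(self.label, self.conditions + (condition,),
                         self.order, self.limit_value)

    def order_by(self, field):
        return LazyMatch(self.label, self.conditions, field, self.limit_value)

    def limit(self, amount):
        return LazyMatch(self.label, self.conditions, self.order, amount)

    def to_cypher(self):
        # Nothing is built until this point.
        clauses = ["MATCH (_:%s)" % self.label]
        if self.conditions:
            clauses.append("WHERE " + " AND ".join(self.conditions))
        clauses.append("RETURN _")
        if self.order:
            clauses.append("ORDER BY " + self.order)
        if self.limit_value is not None:
            clauses.append("LIMIT %d" % self.limit_value)
        return " ".join(clauses)


people = LazyMatch("Person")
k_people = people.where("_.name =~ 'K.*'").order_by("_.name").limit(3)
print(k_people.to_cypher())
print(people.to_cypher())  # the base match is unchanged
```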

Relationship matching is similar:

>>> keanu = graph.nodes.match("Person", name="Keanu Reeves").first()
>>> list(graph.relationships.match((keanu, None), "ACTED_IN")
...      .limit(3))
[(Keanu Reeves)-[:ACTED_IN {roles: ['Neo']}]->(_6),
(Keanu Reeves)-[:ACTED_IN {roles: ['Neo']}]->(_158),
(Keanu Reeves)-[:ACTED_IN {roles: ['Julian Mercer']}]->(_151)]

And lastly, the Node and Relationship objects received can be reused, along with new instances, for further operations. Note that Relationship objects are now always dynamic subclasses of the Relationship base class and can be created via those subclasses:

>>> from py2neo import Node, Relationship
>>> mary_poppins = Node("Movie", title="Mary Poppins")
>>> ACTED_IN = Relationship.type("ACTED_IN")
>>> graph.create(ACTED_IN(keanu, mary_poppins))
>>> graph.match((keanu, mary_poppins)).first()
(Keanu Reeves)-[:ACTED_IN {}]->(_189)
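For anyone wondering what a "dynamic subclass" means in practice, Python can create classes at runtime with the three-argument form of the built-in type. The ToyRelationship class here is a hypothetical, stripped-down illustration of that technique and not Py2neo's implementation.

```python
class ToyRelationship:
    """Toy base class; the real Relationship does far more."""

    def __init__(self, start, end, **properties):
        self.start, self.end = start, end
        self.properties = properties

    @classmethod
    def type(cls, name):
        # Inside a method body, "type" still refers to the built-in;
        # type(name, bases, namespace) creates a new class at runtime.
        return type(name, (cls,), {})

    def __repr__(self):
        return "(%s)-[:%s %r]->(%s)" % (
            self.start, type(self).__name__, self.properties, self.end)


ACTED_IN = ToyRelationship.type("ACTED_IN")
rel = ACTED_IN("Keanu Reeves", "Mary Poppins")
print(rel)
print(isinstance(rel, ToyRelationship))
```

The relationship type lives in the class name, so no separate type argument is needed when creating instances.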

Reporting, Analytics and Data Science Integrations

Version 4 brings some new opportunities for reporting and data analysis as well as integration with several popular data science libraries.

The new Table class provides methods for multiple styles of output, including GitHub-flavored Markdown, HTML, CSV and TSV. It also has a _repr_html_ method attached, which allows results to be rendered elegantly in Jupyter.
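As a rough idea of how this kind of output works, here is a tiny stand-in class (MiniTable, a hypothetical name) that renders rows as a Markdown table and exposes the _repr_html_ hook that Jupyter looks for when displaying an object. The real Table class is richer; the method name to_markdown is my own choice for this sketch.

```python
from html import escape


class MiniTable:
    """Toy table with Markdown output and a Jupyter HTML hook."""

    def __init__(self, keys, rows):
        self.keys = list(keys)
        self.rows = [list(row) for row in rows]

    @staticmethod
    def _markdown_row(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    def to_markdown(self):
        lines = [self._markdown_row(self.keys),
                 self._markdown_row(["---"] * len(self.keys))]
        lines.extend(self._markdown_row(row) for row in self.rows)
        return "\n".join(lines)

    def _repr_html_(self):
        # Jupyter calls this method, if present, to render the object.
        head = "".join("<th>%s</th>" % escape(str(k)) for k in self.keys)
        body = "".join(
            "<tr>%s</tr>" % "".join(
                "<td>%s</td>" % escape(str(cell)) for cell in row)
            for row in self.rows)
        return "<table><tr>%s</tr>%s</table>" % (head, body)


table = MiniTable(["a.name", "a.born"], [["Hugo Weaving", 1960]])
print(table.to_markdown())
```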

Oh, and keep an eye out for more Jupyter support. Exciting things are in the pipeline!

The Cursor object now has methods to export to numpy, pandas and sympy objects. For example, the numpy.ndarray and the pandas.DataFrame:

>>> graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 3").to_ndarray()
array([['Laurence Fishburne', '1961'],
       ['Hugo Weaving', '1960'],
       ['Lilly Wachowski', '1967']], dtype='<U18')
>>> graph.run("MATCH (a:Person) RETURN a.name, a.born LIMIT 3").to_data_frame()
   a.born              a.name
0    1961  Laurence Fishburne
1    1960        Hugo Weaving
2    1967     Lilly Wachowski

The Cypher Lexer

For good measure I’ve also added a Cypher lexer for Pygments. This can be used to tokenise Cypher statements or split multiple statements that are separated by semicolons.

>>> from pygments.lexers import get_lexer_by_name
>>> lexer = get_lexer_by_name("py2neo.cypher")
>>> list(lexer.get_tokens("MATCH (a) RETURN a"))
[(Token.Keyword, 'MATCH'),
(Token.Text.Whitespace, ' '),
(Token.Punctuation, '('),
(Token.Name.Variable, 'a'),
(Token.Punctuation, ')'),
(Token.Text.Whitespace, ' '),
(Token.Keyword, 'RETURN'),
(Token.Text.Whitespace, ' '),
(Token.Name.Variable, 'a'),
(Token.Text.Whitespace, '\n')]

The lexer is used in the new interactive console, which I'll cover shortly.
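Splitting on semicolons is less trivial than it sounds, because a semicolon inside a quoted string must not end the statement. The following is a deliberately naive sketch of the idea in plain Python, with a hypothetical function name; it ignores escaped quotes and comments, which is exactly the sort of thing a proper lexer handles for you.

```python
def split_statements(source):
    """Split Cypher source on semicolons that sit outside quotes (naive)."""
    statements, current, quote = [], [], None
    for char in source:
        if quote:
            # Inside a quoted section: keep everything until the quote closes.
            current.append(char)
            if char == quote:
                quote = None
        elif char in "'\"`":
            quote = char
            current.append(char)
        elif char == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    statements.append("".join(current).strip())
    # Discard empty fragments, such as the one after a trailing semicolon.
    return [statement for statement in statements if statement]


result = split_statements("MATCH (a) RETURN a; RETURN 'x;y' AS s;")
print(result)
```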

Managing Neo4j Instances

The old neokit module and versions.txt have been rolled into py2neo.admin.dist and py2neo.admin.install — these modules contain information about Neo4j distributions as well as download and install facilities. They are also used by the command line py2neo get subcommand (more on that later).

The py2neo.admin.dist module contains details of Neo4j server versions as well as a Distribution class with a download method.

The py2neo.admin.install module then brings facilities for installing, starting, stopping and configuring Neo4j instances, as well as classes for manipulating auth files.

The majority of the functionality in these modules is exposed through the py2neo command line tool.

Command Line Usage

Finally, and excitingly, there is a new command line tool called (somewhat unimaginatively) py2neo. This comes with multiple subcommands and is currently only tested to work on Linux. If you try it on OSX, BSD or Windows and it works there, please add a comment.

py2neo get

The get subcommand can be used to download Neo4j tarballs. When used without arguments, the latest version is downloaded; alternatively, a version can be specified.

py2neo auth

The auth subcommand hides a set of sub-subcommands all related to auth file management. These can be used to list, update and remove users in a Neo4j auth file.

$ py2neo auth update data/dbms/auth alice
Password: *******
Repeat for confirmation: *******
$ py2neo auth list data/dbms/auth
neo4j
alice

py2neo run

The run subcommand gives a simple way to run Cypher at the command line, sending the results to stdout. Connection parameters are managed through environment variables, such as NEO4J_URI=bolt://myserver:7687 and NEO4J_PASSWORD=P4ssw0rd.
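The same environment-driven approach is easy to reuse in your own scripts. This sketch reads the two variables mentioned above from a mapping; the helper name connection_from_env and the fallback URI are assumptions of mine, not something the py2neo tool is documented to do.

```python
import os


def connection_from_env(environ=os.environ):
    """Collect connection details from environment-style variables."""
    return {
        # The localhost fallback is an assumption made for this sketch.
        "uri": environ.get("NEO4J_URI", "bolt://localhost:7687"),
        "password": environ.get("NEO4J_PASSWORD"),
    }


# A plain dict is used here so the example does not depend on the real
# environment; call connection_from_env() with no arguments to read it.
details = connection_from_env({
    "NEO4J_URI": "bolt://myserver:7687",
    "NEO4J_PASSWORD": "P4ssw0rd",
})
print(details["uri"])
```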

py2neo console

The console subcommand kicks off an interactive Cypher console with full syntax highlighting. This rolls my separate (and now obsolete) console project, n4, back into Py2neo.

In a similar way to the run subcommand, environment variables can be used to manage the connection details.

Here’s a demonstration of the console in action…

[Image: Py2neo v4 console demonstration]

Conclusion…

So there are the new features! I hope that Py2neo v4 is useful for you and I look forward to hearing how it’s being used.

Finally, if you’d like to chip in with the project, I could do with help on improving the docs, building example projects and providing more third party integrations. Oh, and helping to make it all work better on Windows!


Py2neo v4: The Next Generation was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.