REST::Neo4p – A Perl “OGM”



This is a guest post by Mark A. Jensen, a DC area bioinformatics scientist. Thanks a lot, Mark, for writing the impressive Neo4j Perl library and taking the time to document it thoroughly.
You might call REST::Neo4p an “Object-Graph Mapping”. It uses the Neo4j REST API at its foundation to interact with Neo4j, but what makes REST::Neo4p “Perly” is the object-oriented approach. Creating node, relationship, or index objects is equivalent to looking them up in the graph database and, for nodes and relationships, creating them if they are not present. Updating the objects by setting properties or relationships results in the same actions in the database, and returns errors when these actions are proscribed by Neo4j. At the same time, object creation attempts to be as lazy as possible, so that only the portion of the database you are working with is represented in memory.
The idea is that working with the database is accomplished by using Perl5 objects in a Perl person’s favorite way. Despite the modules’ “REST” namespace, the developer should almost never need to deal with the actual REST calls or the building of URLs herself. The design uses the amazingly complete and consistent self-describing information in the Neo4j REST API responses to keep URLs under the hood.

The Backstory

I am a bioinformatics scientist contracted to a large US government research initiative. Our clients were interested in alternatives to RDBMS to represent the big data we manage. As a long-time object-oriented Perler with only a smattering of Java, I wanted to investigate Neo4j, but on my own terms. My CPAN and other searches came up with some experimental approaches to working with the Neo4j REST service, but there was little true object functionality and robustness in place.
I was surprised to see so little Neo4j activity in the open-source Perl domain, in the face of many pleading requests in the Neo4j community for a decent Perl interface. So I took on the challenge, and I am definitely hoping for positive feedback and constructive criticism.

The Basics

Download and install

REST::Neo4p is really a family of Perl modules/classes. Get it and install it like this:
 $ cpan
cpan> install REST::Neo4p
That’s it. The cpan utility comes with every Perl installation. If you haven’t used it before, you will be asked some setup questions, for most of which the defaults will be fine.
Tests: The installation process will run a complete suite of tests. To take advantage of these, enter the full API URL and port of your (running!) Neo4j engine when prompted. (It defaults to http://127.0.0.1:7474.)
If any tests fail, please report a bug right here.
The cpan utility will also help you install the dependencies, other CPAN modules that make REST::Neo4p go. There are only a few. A trick to get them all to install automatically is the following:
 $ cpan
cpan> o conf prerequisites_policy follow
cpan> install REST::Neo4p

Connect and manipulate

I’m going to assume you’re familiar with Perl 5 objects. If you’re not, check out the Perl documentation on objects, straight from the camel’s mouth.
I’ll walk through a simple model from the bioinformatics domain, nucleotides in DNA (follow link for a nice introduction).
Nucleotides are the letters which spell the words (genes) encoded in DNA. There are four that are most important, and are referred to as A, C, G, and T. These letters stand for their chemical names. DNA can change or mutate when one of these letters is changed to another. The letters are the nodes in our model.
Mutations are changes in the DNA from one letter to another. These changes themselves are classified, and are called either transitions or transversions. The details don’t matter here, except that mutations from one letter to another are the relationships in our model.
First, include the REST::Neo4p modules, then connect to the database:
#-*- perl -*-
use REST::Neo4p;
use strict;
use warnings;

eval {
    REST::Neo4p->connect('http://127.0.0.1:7474');
};
ref $@ ? $@->rethrow : die $@ if $@;

Errors, including communication errors, are transmitted as exception objects from a hierarchy (see REST::Neo4p::Exceptions for a full description). The last line here just checks whether an exception was thrown at connect time and, if so, dies with a message. More sophisticated handling is possible and encouraged.
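As a rough sketch of what slightly more sophisticated handling could look like, the pattern below wraps an action in eval and distinguishes exception objects from plain die strings. It uses only core Perl so it runs without a database. The My::DemoException class and the try_action helper are hypothetical stand-ins for illustration, not part of REST::Neo4p.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in exception class, for illustration only. In real code the
# objects would come from the REST::Neo4p exception hierarchy instead.
{
    package My::DemoException;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub message { return $_[0]{message} }
    sub throw   { my ($class, %args) = @_; die $class->new(%args) }
}

# Hypothetical helper: run a code reference and return a short
# description of the outcome instead of letting the program die.
sub try_action {
    my ($action) = @_;
    my $ok = eval { $action->(); 1 };
    return 'ok' if $ok;

    my $err = $@;
    if (blessed($err) && $err->can('message')) {
        # Exception object: report its class and message.
        return ref($err) . ': ' . $err->message;
    }
    # Plain string from die().
    my $text = "$err";
    chomp $text;
    return "plain error: $text";
}

my $fine   = try_action(sub { return 1 });
my $object = try_action(sub { My::DemoException->throw(message => 'server unreachable') });
my $string = try_action(sub { die "something else went wrong\n" });

print "$_\n" for $fine, $object, $string;
```

In a real program the code reference would wrap the connect call, and the branch for exception objects could test the object's class with isa against the classes listed in REST::Neo4p::Exceptions to decide whether to retry, report, or rethrow.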
Now to create nodes along with indexes to easily handle them. The new constructor does the creation and returns the objects for the Neo4p entity classes Index, Node, and Relationship.
my @node_defs = (
    { name => 'A', type => 'purine' },
    { name => 'C', type => 'pyrimidine' },
    { name => 'G', type => 'purine' },
    { name => 'T', type => 'pyrimidine' },
);
my $nt_types = REST::Neo4p::Index->new('node','nt_types');
my $nt_names = REST::Neo4p::Index->new('node','nt_names');
my @nts = my ($A,$C,$G,$T) = map { REST::Neo4p::Node->new($_) } @node_defs;

$nt_names->add_entry($A, 'fullname' => 'adenine');
$nt_names->add_entry($C, 'fullname' => 'cytosine');
$nt_names->add_entry($G, 'fullname' => 'guanine');
$nt_names->add_entry($T, 'fullname' => 'thymine');

for ($A,$G) {
    $nt_types->add_entry($_, 'type' => 'purine');
}

for ($C,$T) {
    $nt_types->add_entry($_, 'type' => 'pyrimidine');
}

In general, you provide a hash reference that maps properties to values to the Node and Relationship constructors. To create an Index, the first argument is the index type (‘node’ or ‘relationship’), followed by the index name. Use the add_entry method to add an object to an index with a tag => value pair.
On to relationships. We create a relationship index to corral the mutation types, and express the mutation types as relationship objects. Using the relate_to method from node objects, we create relationships between pairs of nodes with a pretty natural syntax:
my $nt_mutation_types = REST::Neo4p::Index->new('relationship','nt_mutation_types');

my @all_pairs;
my @a = @nts;
while (@a) {
    my $s = shift @a;
    push @all_pairs, [$s, $_] for @a;
}

for my $pair ( @all_pairs ) {
    if ( $pair->[0]->get_property('type') eq
         $pair->[1]->get_property('type') ) {
        $nt_mutation_types->add_entry(
            $pair->[0]->relate_to($pair->[1],'transition'),
            'mut_type' => 'transition'
        );
        $nt_mutation_types->add_entry(
            $pair->[1]->relate_to($pair->[0],'transition'),
            'mut_type' => 'transition'
        );
    }
    else {
        $nt_mutation_types->add_entry(
            $pair->[0]->relate_to($pair->[1],'transversion'),
            'mut_type' => 'transversion'
        );
        $nt_mutation_types->add_entry(
            $pair->[1]->relate_to($pair->[0],'transversion'),
            'mut_type' => 'transversion'
        );
    }
}

The relate_to method returns the relationship object that is created. Here we pass that return value directly to the add_entry method of the index.
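To see what the pairing loop produces without touching a database, here is a minimal sketch of the same logic on plain hashes. The hashes stand in for the node objects and the %edges hash stands in for the relationship index; both are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Plain hashes standing in for the node objects (no database needed).
my @nts = (
    { name => 'A', type => 'purine' },
    { name => 'C', type => 'pyrimidine' },
    { name => 'G', type => 'purine' },
    { name => 'T', type => 'pyrimidine' },
);

# Enumerate each unordered pair of letters exactly once.
my @all_pairs;
my @rest = @nts;
while (@rest) {
    my $first = shift @rest;
    push @all_pairs, [$first, $_] for @rest;
}

# Classify each pair and record the mutation in both directions.
my %edges;    # mutation type => list of "X->Y" strings
for my $pair (@all_pairs) {
    my ($x, $y) = @$pair;
    my $mut_type = $x->{type} eq $y->{type} ? 'transition' : 'transversion';
    push @{ $edges{$mut_type} },
        join('->', $x->{name}, $y->{name}),
        join('->', $y->{name}, $x->{name});
}

for my $mut_type (sort keys %edges) {
    print "$mut_type: @{ $edges{$mut_type} }\n";
}
```

Four letters give six unordered pairs. Two of them (A with G, C with T) share a type and are transitions, and the other four are transversions, so the real loop creates twelve directed relationships in total.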
If you prefer, you can use a relationship constructor:
 my $transition = REST::Neo4p::Relationship->new($A, $G, 'transition');
Relationship properties can be added in the constructor, or after the fact:
 $transition->set_property('involved_types' => 'purines');

Perl garbage collection removes objects from memory, but does not delete them from the database. You must do this explicitly, using the remove method on any entity:
for my $reln ($A->get_relationships) {
    $reln->remove;
}
$A->remove;
$nt_types->remove;
# etc.

Retrieve and query

The REST::Neo4p module itself contains a few methods for retrieving database items directly. The most useful is probably get_index_by_name. Index objects have find_entries for retrieving the items in the index.
use REST::Neo4p;
use strict;
use warnings;

REST::Neo4p->connect('http://127.0.0.1:7474');
my $idx = REST::Neo4p->get_index_by_name('nt_names','node');
my ($node) = $idx->find_entries(fullname => 'adenine');
my @nodes = $idx->find_entries('fullname:*');

Note that find_entries always returns a list, and it supports either an exact search or a Lucene search.
REST::Neo4p also supports the Cypher query REST API. Entities are returned as REST::Neo4p objects. Query results are always sent to disk, and results are streamed via an iterator that is meant to imitate the commonly used Perl database interface, DBI.
Here we print a table of nucleotide pairs that are involved in transversions:
my $query = REST::Neo4p::Query->new(
    'START r=relationship:nt_mutation_types(mut_type = "transversion")
     MATCH a-[r]->b
     RETURN a,b'
);
$query->execute;
while (my $result = $query->fetch) {
    print $result->[0]->get_property('name'),'->',
          $result->[1]->get_property('name'),"\n";
}

The query is created with the Cypher code, then executed. The fetch iterator retrieves the returned rows (as array references) one at a time until the result is exhausted. Again, the result is not held in memory, so queries returning many rows should not present a big problem.

Production-quality Features

My goal for REST::Neo4p is to go beyond the Perl experiments with Neo4j that are out there to create modules that are robust enough for production use (yes, people DO use Perl in production!). This meant several things:
  • Be robust and feature-rich enough that people will want to try it.
  • Be responsive enough to bugs that people will see it maintained.
  • Incorporate unit and integration tests into the user installation.
  • Have complete documentation with tested examples.
  • Create bindings to as many of the Neo4j REST API functions as is possible for a guy with a real job.
  • Be concerned with performance by being sensitive to memory use, and taking advantage of streaming and batch capabilities.
  • Capture both Perl and Neo4j errors in a catchable way.

There isn’t space here to discuss these points in detail, but here are some highlights and links:
  • REST::Neo4p::Agent is the class where the REST calls get done and the endpoints are captured. It subclasses the widely-used LWP::UserAgent and can be used completely independently of the object handling modules, if you want to roll your own Neo4j REST interface.
  • The batch API can be used very simply by including the REST::Neo4p::Batch mixin. Visit the link for detailed examples.
  • When paths are returned by Cypher queries, they are rolled up into their own simple container object, REST::Neo4p::Path, that collects the nodes and relationships with some convenience methods.
  • You can choose to have REST::Neo4p create property accessors automatically, allowing the following:
    $node->set_property( name => 'Fred' );
    print "Hi, I'm ".$node->name;
  • REST::Neo4p::Index allows index configuration (e.g., fulltext Lucene) as provided by the Neo4j REST API.

I hope Perlers will give REST::Neo4p a try and find it useful. Again, I appreciate the time you take to report bugs.