Introducing Drivine: Graph Database Client for Node.js and TypeScript

Owner, Liberation Data

July 7, 2020

Learn about Drivine, a database client for Node.js and TypeScript

Last week, Liberation Data launched the v2.x stream of Drivine.

Drivine is a client library for Node.js and TypeScript. It is designed to support multiple graph databases, simultaneously if you wish, and to scale to hundreds or thousands of transactions per second. Drivine allows you to meet these goals without compromising architectural integrity.

Drivine provides a sweet-spot level of abstraction, with management and light-weight object to graph mapping (OGM) features. The library helps to implement flexible graph-powered systems on Node.js that have the highest performance characteristics, without sacrificing architectural concerns.

Coupled with Neo4j AuraDB, a cloud graph database as a service, we have a compelling replacement for traditional relational databases (RDBMS) in online transactional systems.

This blog outlines the technical rationale for this thesis, and shows that Neo4j’s value proposition entails not just the database engine, but the maturity of the tooling within the ecosystem.

Let’s Begin

It is usually best to start at the beginning, so before I talk about what makes Drivine better, let’s recap what graph databases do better. And before I do that, let’s take a moment to make the first-timers among us welcome.

What Is a Graph Database?

What do Google, Twitter and LinkedIn have in common? One similarity is that each deals with networks of information – an information, professional and social network respectively.

A graph database stores information in a mathematical structure known as a network, which is based on a 300-year-old field of mathematics, known as graph theory.

A network can be visualized like this:

While Google and Twitter invested great sums in proprietary technology, LinkedIn uses an off-the-shelf solution because, well, such technology is available as a commodity now.

Why?

Commodity, you say? Isn’t commodification achieved through economy of scale as a result of high demand? It is. In fact Gartner predicted that “graph processing and graph DBMSs will grow at 100 percent annually through 2022.” But what is driving that demand?

What do graph databases do better?

A graph database introduces new possibilities to online transactional systems. Let’s enumerate briefly.

Performance

Yes, speed.

Consider a typical use-case for an online commerce site, where a customer looks up a product from the inventory: They’ll need to know the price, some related products that can be recommended, reviews, delivery options and other information related to the potential purchase.

We start with a point of interest, in this case the product, and then span out to other related pieces of information. In a traditional relational database, each of these look-ups involves an index hit (typically a B-tree) and thus entails a performance cost that grows with the size of the table. With a graph database, only the initial point of interest does. From there we traverse the network of related data by following stored relationships, and each hop takes near-constant, O(1), time regardless of the total size of the graph.
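To make the difference concrete, here is a minimal in-memory sketch. It is an analogy only, not how any database engine is implemented: a `Map` stands in for a relational index, plain object references stand in for graph relationships, and all names and data are hypothetical.

```typescript
// Relational-style storage: rows point at each other by key, so every hop
// from a product to its related rows goes through an index lookup.
interface ReviewRow { id: number; productId: number; text: string }

const reviewRows: ReviewRow[] = [
    { id: 1, productId: 42, text: "Great" },
    { id: 2, productId: 42, text: "Solid" },
    { id: 3, productId: 7, text: "Meh" },
];

// Stand-in for a database index on reviews.productId.
const reviewsByProductId = new Map<number, ReviewRow[]>();
for (const row of reviewRows) {
    const bucket = reviewsByProductId.get(row.productId) ?? [];
    bucket.push(row);
    reviewsByProductId.set(row.productId, bucket);
}

// Graph-style storage: a node holds direct references to its neighbours.
interface ReviewNode { text: string }
interface ProductNode { id: number; title: string; reviews: ReviewNode[] }

const espressoMachine: ProductNode = {
    id: 42,
    title: "Espresso machine",
    reviews: [{ text: "Great" }, { text: "Solid" }],
};

// One index hit locates the starting node...
const productIndex = new Map<number, ProductNode>([[espressoMachine.id, espressoMachine]]);
const startNode = productIndex.get(42)!;

// ...after which related data is reached by following references.
const graphReviews = startNode.reviews.map((review) => review.text);

// The relational version needs another index lookup for each kind of related data.
const relationalReviews = (reviewsByProductId.get(42) ?? []).map((row) => row.text);
```

Both approaches return the same reviews. The point of the sketch is where the lookups happen: once at the start for the graph version, and once per related table for the relational one.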

In fact, when graph databases were still a niche technology, the ability to very quickly process deeply nested data, such as in recommendation engines, was an area that spurred adoption.

Today, the fact that they can handle these not uncommon cases with ease is referenced as one of the ways that they’re better in general.

After all, how many enterprise integration or digital transformation use-cases can you think of like the following? “Start with a point of reference and factor in a neighborhood of related concerns.” A lot, right? In fact there is a common phrase among graph database professionals, using the de facto graph database query language:

(graphs)-[:ARE]->(everywhere)

Expressiveness

Which brings us to the next point. Graph database query languages, especially openCypher (currently the de facto standard), are enjoyably expressive.

Consider the following query, which states: Given a person named “Tom Hanks” and all of the media (movies, films, TV shows, commercials) they’ve acted in, return a list of all those who acted alongside them.

MATCH (:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name

The equivalent query in SQL starts to get long-winded, to the point of becoming a cognitive burden. In Cypher, something like this is easy to express. You could learn how to work with Cypher after just a day of training. Maybe you get the gist of it already.

Analytics

Many computer science and statistical algorithms deal with some kind of annotated graph. In a graph database, the data is already in the native format, so Degree, Centrality and Path Finding (Google Maps, anyone?) are all available at the drop of a hat.
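As a small illustration of how directly such measures follow from graph-shaped data, here is a sketch of degree centrality over a hypothetical edge list. A graph database would compute this for you; the code only shows what the measure means.

```typescript
// Edges of a small, hypothetical undirected graph.
const edges: Array<[string, string]> = [
    ["Ann", "Bob"],
    ["Ann", "Cat"],
    ["Ann", "Dan"],
    ["Bob", "Cat"],
];

// Degree centrality: the number of relationships attached to each node.
function degreeCentrality(edgeList: Array<[string, string]>): Map<string, number> {
    const degree = new Map<string, number>();
    for (const [a, b] of edgeList) {
        degree.set(a, (degree.get(a) ?? 0) + 1);
        degree.set(b, (degree.get(b) ?? 0) + 1);
    }
    return degree;
}

// Ann has 3 relationships, Bob and Cat have 2 each, Dan has 1.
const degrees = degreeCentrality(edges);
```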

Adaptability

In a relational database, one designs a schema to serve a given set of use-cases. Accommodating new requirements often requires a complex migration.

This is not the case with a graph database. Given the fluid nature of networks, one can enrich the data with new information as needed. While it is possible to refactor a graph database for performance or logistic reasons, it isn’t necessary to perform complex migrations just to add new properties, entities or relationships.

How?

This is where Drivine fits in.

At Liberation Data, our customers, for a variety of reasons, love Node.js. They’d come to us with a business case and it would be clear that they could benefit from a graph database. We’d lay it out to them, and after addressing the “what” and the “why,” of course the next question was: “How?”

The case for graph databases is compelling, but getting a clear picture of return on investment (ROI) entails factoring in not just the cost-benefit of the graph database itself, but other factors such as the maturity of tooling, maintainability and of course, using the technology in a correct way.

The Choices Until Now

Until recently, the choices for working with a graph database in the Node.js space came down to using a client library with either:

High level of abstraction and productivity, traded for vendor lock-in and a prescriptive technology stack.
Low level of abstraction – working at the database driver level, resulting in the implementation of a lot of tedious and error prone “boiler-plate” code. In this approach software architectural concerns are left to the end-user, with plenty of scope to go in a wrong direction.

While a prescriptive stack might be a great choice, it is not always feasible as part of an evolving story in the enterprise. There might be existing technology to integrate with, for example. This leaves the second approach, which in a complex, resource constrained project, has obvious drawbacks. Nobody wants to reinvent the wheel, when the bottom-line is at stake.

Thus Drivine now provides a third option – what we call the “sweet spot level of abstraction.” It is the culmination of research, over a number of years, in finding flexible and effective approaches for using graph databases in high performance, high throughput Node.js systems.

A key concern for us has been to meet performance goals, while ensuring the solution is maintainable. Cost of ownership is a critical factor for clients, especially after hand-over of a solution.

Credentials

Drivine is based on an architecture that was put in place for MSTS, after the company signed several Fortune 100 clients, including Alibaba, and needed to scale its payment service. The result was not only an improved architecture: in testing, response times improved dramatically, from just under 2000ms (unacceptable) to around 63ms (very satisfactory).

The initial ideas emerged while building Vampr – a social network targeting millions of musicians and music-lovers. This app needed to serve hundreds of transactions per second 24/7. During this project I came face-to-face with how clean architecture (maintainable at a reasonable cost) and performance can compete.

My experience as a past committer to the Spring Framework, including Spring Data Neo4j, helped as well.

Drivine Features

Supports multiple graph databases
Manages infrastructure concerns, such as obtaining and releasing connections and sessions
Facilitates implementation of repositories, which can be injected into services. Your code adheres to Single Responsibility Principle (SRP)
Supports declarative transactions
Supports streaming, without backpressure problems. RxJS-compatible streams are only half of the picture: RxJS is a push-model streaming library and can result in backpressure, which is why the companion IxJS project exists. Drivine uses pure Node.js duplex streams, which can easily be turned into either kind of stream
Maps and transforms query results to typed entities or models. In order to get the best of what a graph database can offer, Drivine encourages focusing on use-case specific graph projections, not a one-size-fits-all model

Supports Multiple Graph Databases

We know that Neo4j is the clear leader when it comes to a general purpose graph database, with a mature ecosystem. Nonetheless our customers want assurance that there is a healthy amount of competition and choice. The Drivine sample application shows how one database can be seamlessly swapped for another, just by changing configuration settings.

Moreover, Neo4j 4.x supports multi-tenancy and our customers love how Drivine makes it easy to work with multiple Neo4j databases, in a single server.

Manages Infrastructure Concerns

The library contains a PersistenceManager interface.

export interface PersistenceManager {

    query<T>(spec: QuerySpecification<T>): Promise<T[]>;

    getOne<T>(spec: QuerySpecification<T>): Promise<T>;

    maybeGetOne<T>(spec: QuerySpecification<T>): Promise<T | undefined>;

    openCursor<T>(spec: QuerySpecification<T>): Promise<Cursor<T>>;
}

It is the job of the persistence manager to obtain a connection, using database details that are registered when the library is bootstrapped, or at runtime. It will use pooling if this entails a performance benefit on the given database platform.
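To get a feel for the first three methods without a database, here is a sketch built on simplified stand-ins. The `QuerySpecification` class, the `InMemoryPersistenceManager`, the `City` data and the query string below are all hypothetical, not Drivine's real classes. The sketch also assumes that `getOne` expects exactly one row while `maybeGetOne` tolerates none, which the method names suggest but this article does not spell out.

```typescript
// Simplified stand-in for a query specification (an assumption, not the real class).
class QuerySpecification<T> {
    statement = "";
    parameters: unknown[] = [];
    mapRow: (row: Record<string, unknown>) => T = (row) => row as unknown as T;

    withStatement(statement: string): this {
        this.statement = statement;
        return this;
    }

    bind(parameters: unknown[]): this {
        this.parameters = parameters;
        return this;
    }
}

// The first three methods of the interface shown above.
interface PersistenceManager {
    query<T>(spec: QuerySpecification<T>): Promise<T[]>;
    getOne<T>(spec: QuerySpecification<T>): Promise<T>;
    maybeGetOne<T>(spec: QuerySpecification<T>): Promise<T | undefined>;
}

// In-memory implementation: "runs" a query by filtering rows on the first bound parameter.
class InMemoryPersistenceManager implements PersistenceManager {
    constructor(private readonly rows: Array<Record<string, unknown>>) {}

    async query<T>(spec: QuerySpecification<T>): Promise<T[]> {
        const [name] = spec.parameters;
        return this.rows.filter((row) => row["name"] === name).map((row) => spec.mapRow(row));
    }

    async getOne<T>(spec: QuerySpecification<T>): Promise<T> {
        const results = await this.query(spec);
        // Assumed semantics: anything other than exactly one row is an error.
        if (results.length !== 1) {
            throw new Error(`Expected exactly one result, received ${results.length}`);
        }
        return results[0] as T;
    }

    async maybeGetOne<T>(spec: QuerySpecification<T>): Promise<T | undefined> {
        const results = await this.query(spec);
        // Assumed semantics: zero rows is fine, more than one is an error.
        if (results.length > 1) {
            throw new Error(`Expected at most one result, received ${results.length}`);
        }
        return results[0];
    }
}

interface City { name: string; population: number }

const manager: PersistenceManager = new InMemoryPersistenceManager([
    { name: "Springfield", population: 30000 },
]);

// Builds a specification for a hypothetical "city by name" query.
function cityNamed(name: string): QuerySpecification<City> {
    return new QuerySpecification<City>()
        .withStatement("MATCH (c:City {name: $1}) RETURN c")
        .bind([name]);
}
```

With this in place, `await manager.getOne(cityNamed("Springfield"))` resolves to the city, `await manager.maybeGetOne(cityNamed("Atlantis"))` resolves to `undefined`, and `getOne` for a missing city rejects.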

Repositories

Repositories are a common pattern for structuring object-oriented code in order to adhere to the Single Responsibility Principle (SRP). They logically group database operations for a particular type of entity. Simply by using composition, the PersistenceManager can be used to implement repositories.

Here is an example:

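import { Injectable } from '@nestjs/common';
import {
    CypherStatement, InjectCypher, InjectPersistenceManager,
    PersistenceManager, QuerySpecification, Transactional
} from '@liberation-data/drivine';
// Route is the application's own model class; its import is omitted here.

@Injectable()
export class RouteRepository {

    constructor(
        @InjectPersistenceManager() readonly persistenceManager: PersistenceManager,
        @InjectCypher(__dirname, 'routesBetween') readonly routesBetween: CypherStatement) {}

    // Returns the single fastest route between the two named places.
    @Transactional()
    async findFastestBetween(start: string, destination: string): Promise<Route> {
        return this.persistenceManager.getOne(
            new QuerySpecification<Route>()
                .withStatement(this.routesBetween)
                .bind([start, destination])
                .limit(1)
                .transform(Route)
        );
    }
}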

Declarative Transactions

Just as repositories can be composed using a PersistenceManager, so services can be composed using repositories. But what about transactions?

When analyzing the functional and non-functional requirements of a system, transactions fall under what is known as a cross-cutting concern. They are so called because they’re required in many places. In other words, they cut across many modules.

Requirements like these don’t fit well with pure object-oriented programming because they can compromise efforts to adhere to SRP. Imagine implementing a service method that is all about transferring funds, and then adding transaction behaviors. Now add security. And then audit. Before too long, the class that neatly represented a single role has become a mess.

Fortunately, transactional concerns can easily be modularized in TypeScript using Decorators, a kind of higher-order function that wraps the original function, in our case with transactional concerns applied.

With Drivine it only takes a code decorator to declare that a unit of code should start or participate in a transaction scoped to a thread of execution, such as an HTTP request.

@Transactional()
async findFastestBetween(start: string, destination: string): Promise<Route> {
    return this.persistenceManager.getOne(
        new QuerySpecification<Route>()
            .withStatement(this.routesBetween)
            .bind([start, destination])
            .limit(1)
            .transform(Route)
    );
}
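To illustrate the wrapping idea behind a decorator like this, here is a minimal sketch written as a plain higher-order function. It is not Drivine's implementation. It assumes Node's `AsyncLocalStorage` as the way to scope a transaction to one asynchronous call chain, and the `Transaction` record, `transactional` function and funds-transfer functions are hypothetical.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical transaction record; a real one would hold a database session.
interface Transaction { id: number; log: string[] }

const currentTransaction = new AsyncLocalStorage<Transaction>();
const committed: Transaction[] = [];
let nextTransactionId = 1;

// Higher-order function: wraps `fn` so that it starts a transaction, or joins
// the one already active in the current asynchronous call chain.
function transactional<A extends unknown[], R>(
    fn: (...args: A) => Promise<R>
): (...args: A) => Promise<R> {
    return async (...args: A): Promise<R> => {
        if (currentTransaction.getStore()) {
            return fn(...args); // participate in the outer transaction
        }
        const tx: Transaction = { id: nextTransactionId++, log: ["begin"] };
        return currentTransaction.run(tx, async () => {
            try {
                const result = await fn(...args);
                tx.log.push("commit");
                committed.push(tx);
                return result;
            } catch (error) {
                tx.log.push("rollback");
                throw error;
            }
        });
    };
}

// The business functions stay free of transaction-handling code.
const debit = transactional(async (amount: number) => {
    currentTransaction.getStore()?.log.push(`debit ${amount}`);
});
const credit = transactional(async (amount: number) => {
    currentTransaction.getStore()?.log.push(`credit ${amount}`);
});
const transfer = transactional(async (amount: number) => {
    await debit(amount);
    await credit(amount);
});
```

Calling `await transfer(100)` begins one transaction, lets `debit` and `credit` join it, and commits once at the end. A decorator applies the same kind of wrapper to a method declaratively.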

Object Mapping

Drivine provides graph-to-object mapping facilities. However, due to its design goal of optimal performance, it differs from the most typical approaches. In a typical application that uses an ORM or OGM:

Generalized entities (a network of classes that represent things in our domain) are defined.
Instances of these domain entities are loaded. The OGM tool generates queries for this on our behalf.
Having loaded instances of our generalized entities, information is plucked and mapped onto use-case specific payload objects.

Trouble Ahead

The approach can work. However, when an application needs to be highly scalable, several problems arise:

There is an overhead to the mapping, both in terms of implementation effort as well as runtime performance cost.
Because the queries are system-generated, there is limited opportunity to efficiently undertake performance tuning. In a system where performance and throughput are critical, profiling and tuning of queries will be necessary.
Similarly, because the entities are generalized, it is unlikely that the generated queries will be the optimal ones for a specific use case.

Object Mapping: Drivine Approach

In order to benefit from cleanly architected code that can scale to many thousands of transactions per second, Drivine takes the following approach:

Queries are based on use-case specific scenarios defined by you in a self-contained file. Because they are self-contained, they can be modified, formatted, tested and tuned using the tool of your choice.
Queries are injected to be used in repositories with a decorator.

The premise of an ORM tool is based on an impedance mismatch between an object graph and relational table structures. With a graph database such a mismatch isn’t a problem because it is easy to write performant queries that return an object-graph for a given usage scenario:

MATCH (actor:Person {name: $name})
WITH actor, [(actor)-[:ACTED_IN]-(movie:Movie) |
    movie {.title, .tagline, .released}] as movies
RETURN
{
     name: actor.name,
     born: actor.born,
     movies: movies
}

Consequently there is no need for a heavy framework, and the duty of a mapper can be relegated to mostly type conversion concerns.
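As a rough sketch of what "mostly type conversion" can look like, the hand-written mapper below turns a plain record, shaped like the result of the query above, into typed model instances. The `Actor` and `Movie` classes, the `toActor` function and the sample record are hypothetical, and this is not how Drivine's own transform step is implemented.

```typescript
// Shape of the plain record that a query like the one above might return.
interface MovieRecord { title: string; tagline?: string; released: number }
interface ActorRecord { name: string; born: number; movies: MovieRecord[] }

class Movie {
    constructor(
        readonly title: string,
        readonly released: number,
        readonly tagline?: string
    ) {}
}

class Actor {
    constructor(
        readonly name: string,
        readonly born: number,
        readonly movies: Movie[]
    ) {}

    // Behaviour lives on the typed model, not on the raw record.
    ageAtRelease(movie: Movie): number {
        return movie.released - this.born;
    }
}

// The query already returned the right object graph, so the mapper only
// converts plain records into class instances.
function toActor(record: ActorRecord): Actor {
    const movies = record.movies.map((m) => new Movie(m.title, m.released, m.tagline));
    return new Actor(record.name, record.born, movies);
}

const sampleRecord: ActorRecord = {
    name: "Tom Hanks",
    born: 1956,
    movies: [{ title: "Cast Away", released: 2000 }],
};
const actor = toActor(sampleRecord);
```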

Streaming

Drivine supports streaming, and helps you avoid backpressure problems.

For some use-cases the volume of data is too large to be buffered in a single result set. Drivine’s PersistenceManager has the ability to open a Cursor, which provides two kinds of streaming capabilities.

Cursors Are AsyncIterable

Cursor implements AsyncIterable. This means that it can be used with a for await...of statement. During the execution of the loop, results will be pulled in batches, until the upstream is depleted.

const cursor = await repo.asyncRoutesBetween('Cavite Island', 'NYC');
for await (const item of cursor) {
    // The cursor will read more results, executing
    // database reads, until the stream is consumed.
}
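For readers curious how batch-wise pulling can sit behind an `AsyncIterable`, here is a minimal sketch. The `BatchCursor` class and its `fetchBatch` callback are hypothetical stand-ins, not Drivine's Cursor, and an in-memory array plays the role of the database.

```typescript
// Hypothetical batch reader: returns up to `size` rows starting at `offset`.
type FetchBatch<T> = (offset: number, size: number) => Promise<T[]>;

class BatchCursor<T> implements AsyncIterable<T> {
    constructor(
        private readonly fetchBatch: FetchBatch<T>,
        private readonly batchSize: number
    ) {}

    // Called by for await...of; reads the next batch only when the
    // consumer has worked through the previous one.
    async *[Symbol.asyncIterator](): AsyncGenerator<T> {
        let offset = 0;
        while (true) {
            const batch = await this.fetchBatch(offset, this.batchSize);
            yield* batch;
            if (batch.length < this.batchSize) {
                return; // a short batch means the upstream is depleted
            }
            offset += batch.length;
        }
    }
}

// Stand-in data source that counts how often it is read.
const allRows = ["r1", "r2", "r3", "r4", "r5"];
let fetchCalls = 0;
const batchCursor = new BatchCursor<string>(async (offset, size) => {
    fetchCalls++;
    return allRows.slice(offset, offset + size);
}, 2);

// Drains any AsyncIterable into an array.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
    const items: T[] = [];
    for await (const item of source) {
        items.push(item);
    }
    return items;
}
```

With five rows and a batch size of two, `await collect(batchCursor)` yields all five rows after three reads of sizes two, two and one.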

Cursors Pose as a Readable Stream

Besides being AsyncIterable, cursors can turn themselves into a Readable stream. Why would we need this? AsyncIterable is helpful, but it may lead to problems when a tight loop is pushing data into a stream, such as a file stream. Even though data is pulled in batches, pushing into a stream faster than it can drain fills its buffer and can exhaust memory. The following example could potentially crash:

const cursor = await repo.asyncRoutesBetween('Cavite Island', 'NYC');
for await (const item of cursor) {
    // The return value of write(), which signals a full buffer, is ignored.
    fileStream.write(item);
}

Don’t overload stream buffers. Use readable streams, as follows:

cursor.asStream().pipe(fileStream);
await StreamUtils.untilClosed(fileStream);

Now new information will be pulled from the cursor as needed. It is the sink stream (fileStream) that coordinates this, at the rate it can handle.

Many Node.js libraries use RxJS for streaming, without supporting the companion IxJS library. While the former is push, the latter is for pull-through scenarios. Drivine uses native Node.js streams, which are duplex in nature, and such streams can easily be converted to RxJS or IxJS streams.
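The same idea can be tried with nothing but Node's built-in stream APIs. In the sketch below, an async generator stands in for a cursor and a deliberately slow `Writable` stands in for a file stream. Both are hypothetical, and none of this is Drivine's own code. It shows two ways to respect the sink's pace: waiting for the `drain` event inside a loop, or handing the flow over to `pipeline`.

```typescript
import { once } from "node:events";
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Stand-in for a cursor: any AsyncIterable will do for this sketch.
async function* rows(count: number): AsyncGenerator<string> {
    for (let i = 0; i < count; i++) {
        yield `row ${i}\n`;
    }
}

// A deliberately slow sink with a tiny buffer, so backpressure kicks in quickly.
function slowSink(received: string[]): Writable {
    return new Writable({
        highWaterMark: 16,
        decodeStrings: false,
        write(chunk, _encoding, callback) {
            received.push(String(chunk));
            setTimeout(callback, 1);
        },
    });
}

// Option 1: keep the loop, but pause until 'drain' whenever write() returns false.
async function writeWithDrain(source: AsyncIterable<string>, sink: Writable): Promise<void> {
    for await (const item of source) {
        if (!sink.write(item)) {
            await once(sink, "drain");
        }
    }
    sink.end();
    await once(sink, "finish");
}

// Option 2: wrap the source in a Readable and let pipeline coordinate the flow.
async function writeWithPipeline(source: AsyncIterable<string>, sink: Writable): Promise<void> {
    await pipeline(Readable.from(source), sink);
}
```

Either function delivers every row in order. The second is closer in spirit to piping a cursor's stream into a file stream, because the sink decides when more data is read.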

Test Driven Development

Drivine supports test driven development, including integration testing of database related code. Tests can be run inside an optional rollback transaction to leave the database in a pristine state. This helps when testing against real production data scenarios or for running tests concurrently.

RunWithDrivine();
describe('RouteRepository', () => {
    let repo: RouteRepository;

    beforeAll(async () => {
        const app: TestingModule = await Test.createTestingModule({
            imports: [AppModule],
            providers: [RouteRepository],
            controllers: []
        }).compile();
        repo = app.get(RouteRepository);
    });

    it('should find routes between two cities, ordered by most expedient', async () => {
        const results = await repo.findRoutesBetween('Cavite Island', 'NYC');
        expect(results.length).toBeGreaterThan(0);
        expect(results[0].travelTime).toEqual(26);
    });

});

Next Steps

Graph databases started as a niche proprietary technology. Now, with Neo4j and others offering a pay-as-you-go cloud service we have a compelling RDBMS replacement.

You can start using Drivine along with Neo4j AuraDB today. The best way to start benefiting from graph technology is to begin. The best place to start with Drivine is to visit the website.
