Implementing Graph-Based Applications

Graphs have proven widely applicable for modelling a range of business problems and domains. Yet the flexibility that graphs bring requires additional attention to implementation, and an adaptation of familiar programming idioms, to maximise the benefits while avoiding common pitfalls.

The following topics summarise patterns and strategies I have used across a number of Neo4j projects. In my case, I used the native Java API, Cypher and even some JavaScript for custom graph rendering. To make the ideas more generally useful, I have tried to extract what seem to be reusable concepts rather than focusing on any one programming model in particular.

Choosing an Implementation Strategy

One of the primary considerations when developing a graph-based application is the choice of an implementation strategy and of the frameworks and tools that will support it. On some platforms, Object Graph Mapping (OGM) frameworks are available, such as Spring Data Neo4j (SDN). In a nutshell, OGM frameworks provide a simple and consistent means of mapping between graph resources and application objects, within a programming model that is familiar to most developers. Given these benefits, a fair question is whether we should routinely pick an OGM framework every time we implement an application backed by a graph database. The answer is, as usual, that it depends.

The most important factor that should guide this choice is the nature of the domain and of the available data. Broadly speaking, applications can be domain-centric or data-centric. For more background on this distinction, please refer to the previous part: Designing Graph-Based Applications.

If the application is domain-centric, it is sensible to consider using an OGM framework. In domain-centric applications, the domain and the data that compose it are clearly defined and are under the control of the application. We are therefore able to predict the quality and structure of the data that the application will manipulate, and to design upfront the basis of a stable object domain model that can evolve gracefully when new requirements arise. In such cases, using an OGM framework offers a number of advantages over a manual mapping approach. Firstly, an OGM framework guarantees that the mapping between graph resources and the application’s objects and data types is always consistent. This is an essential quality, given that graph resources are typically more malleable than their object counterparts. For example, the order of nodes returned by a traversal can differ from their natural order in the domain model, and the same node can appear several times on a given path depending on the query. Sloppy mapping can introduce inconsistencies here, for instance by using an inadequate collection type such as a set, which discards both the ordering and the repeated visits of a path, and so widen the gap between the graph and the object model in a way that is hard to reconcile. The primary challenge, though, is keeping the two models in sync as more and more functionality is built on top of the object model. OGM frameworks alleviate this issue significantly, provided that the domain and the data lend themselves to the approach. Used sensibly, an OGM framework can therefore result in a powerful and productive programming model.

There are naturally situations where this approach does not work as well, primarily when an application is data-centric, for example when it relies on multiple external data sources that are not under the developer’s control. In these cases, it is desirable to preserve the graph representation for as long as possible. Application logic can be implemented on top of a lightweight representation that matches the underlying graph model, which improves the flexibility and durability of the application’s design: algorithms are implemented directly on top of graph entities, and the object model is typically used for presentation purposes only. Conversely, a rigid object model founded on inaccurate or premature assumptions about the data will break as soon as those assumptions are challenged, which is most likely to happen as we build a better understanding of the domain or when new requirements arise.
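
The point about inadequate collection types can be made concrete with a small, self-contained Java sketch. It uses no OGM or Neo4j API: `NodeRef` and `samplePath` are hypothetical stand-ins for a mapped node and for the path a traversal might return, in which one node is visited twice. Mapping that path into a `List` keeps it intact, while mapping it into a `Set` silently loses information.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PathMapping {

    // Hypothetical, minimal stand-in for a mapped graph node.
    // Records get value-based equals/hashCode, so two copies of the
    // same node compare equal, as they would after mapping.
    record NodeRef(long id, String name) {}

    // A path as a traversal might return it: the order follows the query,
    // not the domain model, and "alice" is visited twice.
    static List<NodeRef> samplePath() {
        NodeRef alice = new NodeRef(1, "alice");
        NodeRef bob = new NodeRef(2, "bob");
        NodeRef carol = new NodeRef(3, "carol");
        return List.of(alice, bob, alice, carol);
    }

    // Faithful mapping: a List keeps both the traversal order and repeated nodes.
    static List<NodeRef> toList(List<NodeRef> path) {
        return new ArrayList<>(path);
    }

    // Lossy mapping: a Set drops the second visit to "alice",
    // and a HashSet makes no promise about iteration order.
    static Set<NodeRef> toSet(List<NodeRef> path) {
        return new HashSet<>(path);
    }

    public static void main(String[] args) {
        List<NodeRef> path = samplePath();
        System.out.println("path as List: " + toList(path).size() + " nodes"); // 4 nodes
        System.out.println("path as Set:  " + toSet(path).size() + " nodes");  // 3 nodes
    }
}
```

A `List` (or a dedicated path type) is the safer default whenever order or repetition carries meaning; a `Set` only fits when the query guarantees distinct results whose order does not matter.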

The Graph as a Fundamental Data Structure

The fundamental assumption behind the OGM programming model is that the object model serves as the main representation of the domain. Accordingly, the underlying graph is abstracted away as much as practically possible.

A contrasting but equally compelling perspective is to consider the graph as the principal application domain. Essentially, graphs are simple data structures for which a vast range of well-understood and widely applicable algorithms is readily available.

Pushing this idea further, there is no need to wrap the graph in custom domain APIs and data structures: application logic and behaviour can simply be defined as functions evaluated directly on the domain’s graph. This conception is aligned with a functional programming mindset. It is also much closer to the data-driven nature of graphs, and it encourages simplicity and the reuse of existing general-purpose graph algorithms.
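
As a rough illustration of this functional style, the Java sketch below keeps a hypothetical `KNOWS` graph as plain data and defines behaviour as a generic function evaluated directly on it, with no domain classes in between. It is not the Neo4j API: `shortestPath` is an ordinary breadth-first search written for this example, and the node names are made up. The graph is passed in as a neighbour function, so the algorithm knows nothing about the domain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class GraphFunctions {

    // Generic breadth-first shortest path, measured in hops.
    // The graph is supplied as a function from a node to its neighbours,
    // so the same algorithm works over any representation and any domain.
    // Node types must implement equals/hashCode consistently.
    static <N> Optional<List<N>> shortestPath(N start, N goal, Function<N, List<N>> neighbours) {
        Map<N, N> parent = new HashMap<>();   // visited set plus back-pointers
        Deque<N> queue = new ArrayDeque<>();
        parent.put(start, start);
        queue.add(start);
        while (!queue.isEmpty()) {
            N current = queue.poll();
            if (current.equals(goal)) {
                // Follow the back-pointers from goal to start, then reverse.
                List<N> path = new ArrayList<>();
                for (N n = goal; !n.equals(start); n = parent.get(n)) {
                    path.add(n);
                }
                path.add(start);
                Collections.reverse(path);
                return Optional.of(path);
            }
            for (N next : neighbours.apply(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Optional.empty();   // goal not reachable from start
    }

    public static void main(String[] args) {
        // Hypothetical KNOWS graph held as plain data.
        Map<String, List<String>> knows = Map.of(
                "alice", List.of("bob", "carol"),
                "bob", List.of("dave"),
                "carol", List.of("dave"),
                "dave", List.of("erin"),
                "erin", List.of());
        Function<String, List<String>> friends = n -> knows.getOrDefault(n, List.of());

        System.out.println(shortestPath("alice", "erin", friends)); // Optional[[alice, bob, dave, erin]]
        System.out.println(shortestPath("erin", "alice", friends)); // Optional.empty

        // The same function, reused unchanged over a graph that is computed, not stored.
        Function<Integer, List<Integer>> step = n -> List.of(n + 1, n * 2);
        System.out.println(shortestPath(1, 10, step));              // Optional[[1, 2, 4, 5, 10]]
    }
}
```

Because behaviour lives in functions over the graph rather than in methods on domain objects, the same algorithm applies to any domain that can expose its neighbours, which is the kind of reuse described above.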

Cypher is an excellent example of this approach. It already provides a number of functional programming idioms and graph algorithms, and I expect it to continue evolving into a more complete functional DSL capable of describing a wider range of complex graph operations.

Querying the Graph

First-class support for high-level query languages is much more prevalent today among non-relational databases. Neo4j has Cypher, in addition to the REST interface and the native Java API (available in embedded mode and for plugins). For most use cases, I would favour Cypher over the other available APIs, mainly because it is the better all-round tool for getting the job done. From my perspective, Cypher operates at the right level of abstraction for designing and implementing graph queries. It enables an exploratory and iterative approach to problem-solving, as well as rapid prototyping in collaboration with domain experts, who often lack the technical skills to understand lower-level APIs. This is often key to building solutions that are both valuable and lasting. From a performance perspective, while the native API is presumably faster, Cypher’s declarative nature leaves more room for optimisations that are transparent to the application: because a query states what to retrieve rather than how, the engine can improve its execution without the query having to change.
