[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]


Graph databases, and Neo4j in particular, have proven to be a great fit for analyzing closely related data. This is why biotechnology research groups are looking into such technologies, and why we’ve given birth to a new Bolt driver. A Haskell one.

My name is Pavel Yakovlev, and I am a director of the Computational Biology Department at BIOCAD — a leading Russian biotechnology company. Our department is involved in R&D of anti-cancer drugs of several types: small molecules, monoclonal antibodies and gene therapy.

One of the keys to our company’s success is the wide use of computational technologies such as docking-based rational drug design, wet-lab automation and data analysis of automated experiments. Most of our computations are very expensive, so we cannot afford runtime errors, and growing data flows require easy parallelism. These factors have led us to functional programming and its emphasis on strong typing, purity and immutable data structures.

But algorithms on isolated data cannot solve every problem: we need relationships. For example, each drug candidate should have many experimental or predicted ADME(T) parameters and be linked to its target.

With data structured this way, we can do predictive analytics, such as identifying which molecular subunits matter most for achieving the quality profile of an active drug component. So we looked into graph databases and their ability to solve our tasks.

My background is in applied math and physics, and I used to write a lot of scientific code, but I no longer write any for our systems. Nevertheless, I began a one-week pet project to dive into graph technology using the Bolt driver and some toy programs (with real biological data) around it.

I started with boltkit, especially driver.py. The guide is great, but it’s aimed at imperative programming languages.

When you use Haskell, you have to think about how to organise the code in a type-driven way. Also, I did not clearly understand the structure concept at first. I implemented a first version, but then I found the Bolt Protocol specification and completely rewrote my implementation.

The Hasbolt Neo4j Bolt Driver

To use Neo4j via Bolt from Haskell, we want to support an API like the one in this example:

{-# LANGUAGE OverloadedStrings #-}

import Data.Default  (def)
import Database.Bolt

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }

main :: IO ()
main = do pipe <- connect myConfiguration
          records <- run pipe $ query "MATCH (n:Person) WHERE n.name CONTAINS \"Tom\" RETURN n"
          -- 'head' assumes the query returned at least one record
          let first = head records
          cruise <- first `at` "n" >>= exact :: IO Node
          print cruise
          close pipe

For this to work, we need some building blocks in Hasbolt, which I’ll explain in the following sections. If you’re into Haskell, I hope you enjoy this and can give me some good feedback.

The Hasbolt driver has two main low-level concepts:

  1. Value: serialization and deserialization of primitive types, strings, lists, maps and structures. It also introduces Neo4j types like node, relationship, unbound relationship and path.
  2. Connection: network overlay for sending and receiving data.

In Data.Value.Type, we can find the definition of all Bolt-able types:

data Value = N ()               -- Null 
           | B Bool             -- Boolean 
           | I Int              -- 64-bit integer 
           | F Double           -- 64-bit float 
           | T Text             -- UTF8 strings 
           | L [Value]          -- lists 
           | M (Map Text Value) -- maps with string keys 
           | S Structure        -- bolt structures 
  deriving (Show, Eq)

As we have to unpack lots of values from a single bytestring, I used the StateT monad transformer to keep the current state, which is the suffix of the bytestring that has not been unpacked yet:

type UnpackT = StateT ByteString

All these values can be packed and unpacked to bytestrings using the BoltValue type-class:

class BoltValue a where
  pack :: a -> ByteString
  unpackT :: Monad m => UnpackT m a
  unpack :: Monad m  => ByteString -> m a
  unpack = evalStateT unpackT

Of course, all Values already have BoltValue implementations.
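
To see how the StateT-based unpacking fits together, here is a minimal sketch that handles booleans only. It is not Hasbolt's actual code: nextByte, packBool, unpackBoolT and unpackPair are names invented for this illustration, and it uses a MonadFail constraint where the class above uses a plain Monad. The marker bytes 0xC2 and 0xC3 are the PackStream encodings of false and true.

```haskell
import           Control.Monad.Trans.Class (lift)
import           Control.Monad.Trans.State (StateT, evalStateT, get, put)
import           Data.ByteString           (ByteString)
import qualified Data.ByteString           as B
import           Data.Word                 (Word8)

-- The state is the part of the input that has not been consumed yet.
type UnpackT = StateT ByteString

-- Hypothetical helper: take one byte off the front of the remaining input,
-- failing in the underlying monad when the input is exhausted.
nextByte :: MonadFail m => UnpackT m Word8
nextByte = do
  rest <- get
  case B.uncons rest of
    Nothing      -> lift (fail "unexpected end of input")
    Just (b, bs) -> put bs >> pure b

-- PackStream encodes booleans as a single marker byte.
packBool :: Bool -> ByteString
packBool False = B.singleton 0xC2
packBool True  = B.singleton 0xC3

unpackBoolT :: MonadFail m => UnpackT m Bool
unpackBoolT = do
  marker <- nextByte
  case marker of
    0xC2 -> pure False
    0xC3 -> pure True
    _    -> lift (fail "not a boolean marker")

-- Two values are unpacked in sequence; the second read starts where the
-- first one stopped because the remaining suffix is threaded through StateT.
unpackPair :: MonadFail m => ByteString -> m (Bool, Bool)
unpackPair = evalStateT ((,) <$> unpackBoolT <*> unpackBoolT)
```

In the Maybe monad, unpackPair (packBool True <> packBool False) gives Just (True, False), while an input that is too short gives Nothing.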

Nodes, relationships and paths as well as protocol requests and responses are structures. So, we have to convert these types to structures and back. I have a Structable type-class for this purpose:

class Structable a where
  toStructure :: a -> Structure
  fromStructure :: Monad m => Structure -> m a

I use a monadic context for every unpacking operation since it can fail: the bytestring can be invalid, or a structure can have an unknown signature. Failure is a side effect, so you can use any monad to handle it.
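
As an illustration of the conversion and of failure in an arbitrary monad, here is a sketch of a Structable instance for a cut-down node. The Value, Structure and Node types below are simplified stand-ins defined only for this example (a real Bolt node also carries a property map), and MonadFail replaces the plain Monad constraint used above. The signature byte 0x4E is the one the Bolt protocol assigns to nodes.

```haskell
import Data.Word (Word8)

-- Simplified stand-ins for the driver's types (assumptions for this sketch).
data Value = I Int | T String | L [Value]
  deriving (Show, Eq)

data Structure = Structure { signature :: Word8, fields :: [Value] }
  deriving (Show, Eq)

-- A cut-down node: identifier and labels only, no properties.
data Node = Node { nodeId :: Int, nodeLabels :: [String] }
  deriving (Show, Eq)

class Structable a where
  toStructure   :: a -> Structure
  fromStructure :: MonadFail m => Structure -> m a

-- Bolt marks node structures with the signature byte 0x4E.
nodeSignature :: Word8
nodeSignature = 0x4E

instance Structable Node where
  toStructure (Node i ls) = Structure nodeSignature [I i, L (map T ls)]

  -- Succeeds only for the node signature with the expected field layout.
  fromStructure (Structure sig [I i, L ls])
    | sig == nodeSignature = Node i <$> traverse label ls
    where
      label (T t) = pure t
      label _     = fail "node label is not a string"
  fromStructure _ = fail "not a node structure"
```

Because the failure is abstract, the same fromStructure call produces Nothing in Maybe, an empty list in the list monad, and an exception in IO.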

Connection concepts are less interesting, so I will describe just a few important types. First of all, to create a new connection, you need to fill in a configuration record:

data BoltCfg = BoltCfg { magic         :: Word32  -- '6060B017' value 
                       , version       :: Word32  -- '00000001' value 
                       , userAgent     :: Text    -- Driver user agent 
                       , maxChunkSize  :: Word16  -- Maximum chunk size of request 
                       , socketTimeout :: Int     -- Driver socket timeout 
                       , host          :: String  -- Neo4j server hostname 
                       , port          :: Int     -- Neo4j server port 
                       , user          :: Text    -- Neo4j user 
                       , password      :: Text    -- Neo4j password 
                       }

Of course, most of these values will not change in the near future, so BoltCfg implements the Default type-class, which points the configuration to localhost with an empty user and password. As a result, you only need to fill in the fields of interest:

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }
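
The pattern behind this is a default value combined with record update syntax. The sketch below shows it on a reduced record. The Default class here is a local stand-in for the one from the data-default package, Cfg is a hypothetical subset of BoltCfg, and port 7687 (the standard Bolt port) is an assumption about what the driver's default would be.

```haskell
import Data.Word (Word32)

-- Local stand-in for the Default class from the data-default package.
class Default a where
  def :: a

-- Hypothetical subset of the configuration record, for illustration only.
data Cfg = Cfg
  { magic    :: Word32
  , version  :: Word32
  , host     :: String
  , port     :: Int
  , user     :: String
  , password :: String
  } deriving (Show, Eq)

instance Default Cfg where
  def = Cfg
    { magic    = 0x6060B017  -- handshake preamble, as in the record comments
    , version  = 1
    , host     = "localhost"
    , port     = 7687        -- assumed default: the standard Bolt port
    , user     = ""
    , password = ""
    }

-- Record update syntax overrides only the named fields; everything else
-- keeps its default value.
remoteCfg :: Cfg
remoteCfg = def { host = "example.com", user = "neo4j" }
```

Here remoteCfg keeps the default port and the empty password while pointing at a different host.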

To create a new connection, just pass this configuration to the connect function in any MonadIO monad:

pipe <- connect myConfiguration

And to close the connection, run:

close pipe

The other important type is the BoltActionT monad transformer. It lets you chain queries using one connection. So, every query function returns a computation inside this monad:

-- |Runs Cypher query and ignores response 
query_ :: MonadIO m => Text -> BoltActionT m ()
-- |Runs Cypher query and returns list of obtained Records 
query :: MonadIO m => Text -> BoltActionT m [Record]
-- |Runs Cypher query with parameters and returns list of obtained Records 
queryP :: MonadIO m => Text -> Map Text Value -> BoltActionT m [Record]

To run this transformer, use the run function:

run :: MonadIO m => Pipe -> BoltActionT m a -> m a
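
One plausible way to build a transformer with this shape is a ReaderT that carries the connection, so that every query in a chain sees the same pipe. The sketch below illustrates the idea only and makes no claim about how Hasbolt is implemented internally. Pipe here is a fake connection that merely logs the statements it is given, and ActionT, connect and sentStatements are names made up for the example.

```haskell
import Control.Monad.IO.Class     (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Data.IORef                 (IORef, modifyIORef', newIORef, readIORef)

-- Fake connection (an assumption for this sketch): instead of a socket it
-- holds a log of the statements that were "sent".
newtype Pipe = Pipe { sentLog :: IORef [String] }

-- The action type is a reader over the connection.
type ActionT = ReaderT Pipe

connect :: IO Pipe
connect = Pipe <$> newIORef []

-- "Runs" a statement by appending it to the connection's log.
query_ :: MonadIO m => String -> ActionT m ()
query_ cypher = do
  ref <- asks sentLog
  liftIO (modifyIORef' ref (++ [cypher]))

-- Supplies the connection to a whole chain of actions at once.
run :: Pipe -> ActionT m a -> m a
run pipe action = runReaderT action pipe

-- Inspect what went over the fake connection, in order.
sentStatements :: Pipe -> IO [String]
sentStatements = readIORef . sentLog
```

With this, run pipe (query_ "CREATE (a:Person)" >> query_ "MATCH (n) RETURN n") sends both statements over the same pipe, in that order.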

If you are interested in the response, you can receive a list of Records. You can think of a Record as a map from strings to arbitrary data, with the ability to extract any strongly typed value via the RecordValue type-class:

class RecordValue a where
  exact :: Monad m => Value -> m a

The implementation is provided for all Value types, nodes, relationships and paths.
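
The following sketch shows the idea of typed extraction on a simplified model. Value is reduced to four constructors, Record is a plain Data.Map, the at function is re-implemented locally, and MonadFail stands in for the Monad constraint above. The names sampleRecord and releasedIn are hypothetical.

```haskell
import qualified Data.Map as M

-- Reduced value type and record model (assumptions for this sketch).
data Value = N () | B Bool | I Int | T String
  deriving (Show, Eq)

type Record = M.Map String Value

class RecordValue a where
  exact :: MonadFail m => Value -> m a

-- Each instance accepts exactly one constructor and fails on the others.
instance RecordValue Int where
  exact (I i) = pure i
  exact v     = fail ("expected an integer, got " ++ show v)

instance RecordValue Bool where
  exact (B b) = pure b
  exact v     = fail ("expected a boolean, got " ++ show v)

-- Look up a field by name, failing if the record has no such field.
at :: MonadFail m => Record -> String -> m Value
at record key = maybe (fail ("no field named " ++ key)) pure (M.lookup key record)

sampleRecord :: Record
sampleRecord = M.fromList [("released", I 1999), ("title", T "The Matrix")]

-- The requested result type selects the RecordValue instance.
releasedIn :: MonadFail m => Record -> m Int
releasedIn record = record `at` "released" >>= exact
```

In the Maybe monad, releasedIn sampleRecord is Just 1999, while asking for the "title" field as an Int yields Nothing, as does looking up a field that is not there.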

With these building blocks, we can now write the full code of our original example:

{-# LANGUAGE OverloadedStrings #-}

import Data.Default  (def)
import Database.Bolt

myConfiguration :: BoltCfg
myConfiguration = def { user = "neo4j", password = "neo4j", host = "example.com" }

main :: IO ()
main = do pipe <- connect myConfiguration
          records <- run pipe $ query "MATCH (n:Person) WHERE n.name CONTAINS \"Tom\" RETURN n"
          -- 'head' assumes the query returned at least one record
          let first = head records
          cruise <- first `at` "n" >>= exact :: IO Node
          print cruise
          close pipe

GitHub repository: https://github.com/zmactep/hasbolt
Docs: https://hackage.haskell.org/package/hasbolt

Example Application

The example movie application, which is also used for many of the other Neo4j drivers, is a single-page web app that uses jQuery to talk to three endpoints of a backend implemented in the language and stack of the given driver.

The three endpoints provide movie search, a single movie with its cast listing, and a graph visualization of the whole example movie database. The front end just consumes the responses from these three endpoints and renders the results in place.

The HTTP backend uses the lightweight Scotty web framework. To store the internal state with the connection pool, we can use a ReaderT monad transformer over a pool from the resource-pool package. Both packages are installed from Stackage by the stack build command.

To deploy on Heroku, just follow these steps:

export app=neo4j-movies-haskell-`whoami`
heroku apps:create $app

Add the Neo4j addon and make it available from the application:

heroku addons:add graphenedb:chalk --app $app

Set the Haskell Stack buildpack:

heroku buildpacks:set https://github.com/mfine/heroku-buildpack-stack

Deploy the app to Heroku:

git push heroku master

Open the application:

heroku open --app $app

Open the application on GrapheneDB:

heroku addons:open graphenedb

In the GrapheneDB user interface, use “Launch Neo4j Admin UI”. Then, in the Neo4j Browser, run :play movies and import the dataset.

You can find the Hasbolt example application with a very detailed README here: https://github.com/neo4j-examples/neo4j-movies-haskell-bolt

About the Author

Pavel Yakovlev, Director of Computational Biology, BIOCAD

Pavel Yakovlev is the Director of the Computational Biology Department at BIOCAD. He is based in St. Petersburg, Russia.
