Graph Databases in Drupal: A Neo4j Module with Rules Integration


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Intro to Drupal


Drupal, an open source CMS written in PHP, is the second-most used content management system (CMS) after WordPress. What makes Drupal special is that a large active community of developers have developed a range of extensions for Drupal (modules) that solve numerous specific requirements and that integrate it with a large number of services. Hundreds of thousands of websites have been built with it. Typically chosen when building projects with extensive integration requirements and complex business logic, it competes in the enterprise market with CMSs like Adobe and Sitecore.

Drupal’s key advantage is its large number of contributed modules that accelerate development and that, when done well, to some degree help standardise development so that experienced Drupal developers can be onboarded faster. Drupal has a range of features that make it a great fit for community portals, social network sites and other systems that require groups of people to collaborate in spaces that have granular access control.

Drupal is also a powerful data modelling tool that allows administrators to build custom content types using the graphical user interface. Besides a documented API, most contributed modules also expose their functionality through a GUI, this makes it an interesting fit for prototyping purposes: you don’t need to know how to write code to do so.

Why We Built the Neo4j-Drupal Module


We wanted to create a basic integration that makes it possible for non-developers to work with Neo4j. Combined with Drupal’s content modelling capabilities, we believe it could be a powerful tool for people to explore graph database using a GUI. There is also a case to be made for the use of graph databases in the Drupal ecosystem: sites that already use Drupal could benefit from its capabilities.

Drupal already had a module for Neo4j, that was originally built by Peter Arato, a former colleague. The module, developed a few years ago, however used the HTTP-only connector, and not the superiour Bolt binary protocol. That is why we decided to reimplement the module, under a separate namespace.

With the Bolt PHP driver, the new modules we created for Drupal 7 and 8 have a better performance. For both Drupal versions, the base module is an API module that provides settings for the connection and that supplies other modules with the connection object.

This way developers can use the API module to build their own integrations (e.g., for applications that are contributed to the open source community or for integrations specific to an organisation).

Compatibility with Drupal 7 & 8


The Neo4j module has support for both Drupal 7 and Drupal 8, however the versions have slightly different feature sets. Out of the box, both versions have a page logging functionality that can help you discover browsing patterns of your visitors.

The main difference is currently the Rules support, as at this point we found it to be too much work to create a Rules integration for Drupal 8. In the long term, however, there will be more possibilities with the Drupal 8 version, because it offers much richer APIs, allowing module developers to do even more.

At present, the Drupal 8 integration only provides the default page logging functionality that stores the browsing journeys of your site’s visitors. It can however also be extended with a custom module to store other data. The Drupal 7 module has this page logging functionality and the Rules integration.

Link & Installation Instructions


The module is available on Drupal.org here. For both versions, the PHP library dependencies are specified in a composer.json file. For D7, use Composer Manager, for D8, you can manage the composer dependencies globally. Consult the Drupal documentation on composerfor more information.

Integration with the Rules Module


The Rules module is a powerful Drupal module that enables site builders to define actions that will be executed when they get triggered by an event and when their conditions are met. With the help of Rules, it is possible to set up complex site behavior without having to write code. In other words, Rules turns clickers into developers.

By building the Drupal connector module with a Rules integration, we’ve added a powerful content modelling tool to the Neo4j ecosystem that empowers non-coders to do complex things with graph databases.

For now, the Rules integration in the Neo4j module requires you to execute a custom Cypher query as an action. In the future we could imagine creating a GUI that lets you compose the Cypher query so you could select the data sources and the way you want to store them in the database. This is something we could do if there is sufficient demand for it.

At this point, you can use data sources from Drupal by including the appropriate tokens as query parameters. You could for example use the {node__nid} token to refer to the currently active page context (more details in the example below). Using tokens, the Rules module allows you to hook into a wide range of processes that happen in Drupal, that can trigger almost any data available in the Drupal context to be stored in a graph database.

Try Out the Neo4j Module


In our tutorial, we’ve used the Drupal 7 module, since it has for now more interesting capabilities.

First, download the module into your local Drupal installation, and install the connector with composer. Download Neo4j, install it, start it, open the web-interface on localhost:7474 and set the initial password. After you have enabled the Neo4j module, go to the configuration page, and enter the details for the connection.

After enabling the module, go to the Configuration page and click on Neo4j.

Neo4j-Drupal module configuration page


On the “General” tab, enter the credentials for your database. We recommend using the Bolt protocol instead of HTTP if it is available.

HTTP vs Bolt protocol in Neo4j-Drupal module


To enable page logging, check the “Page log” checkbox on the “Logging” tab and click on the “Save Configuration” button.

Neo4j-Drupal module with page logging option


The page log feature will save three types of graph nodes: Page, User and Visit. The Page contains very basic information about the page itself. The User contains the user ID. These two are only recorded once. The Visit node contains data for the specific page visit, like the timestamp or the user’s IP address. The Visit nodes are also connected for the same user, storing the user’s path on the site.

To set up the Rules example, go to the Rules admin page and create a new rule.

Drupal Rules module admin page


Add an event. For this example, use a content view of a content type.

Drupal Rules module with option to run Cypher query


When you add an action, select “Run Cypher query”, and in the settings, enter the following query:

MERGE (e:Entity { entity_id: {node__nid}, entity_type: "node" })
WITH e
MATCH (v:Visit) WHERE id(v) = {visit}
CREATE (v)-[:SEEN]->(e)

This query uses two parameters. The first one {node__nid} is generated from Drupal’s replacement tokens. The second one, {visit} is a special pattern that references the ID of the Visit node in the graph database (not the Drupal node), so it is easy to connect future nodes in the graph database to the current visit. To get the full list of available parameters (like {node__title} or field values), open the “Replacement patterns” fieldset.

In this example, MERGE (get-or-create) ensures that there is a node in the graph database for the Drupal node that is being visited and that a connection is created between that node and the current visit.

After visiting a site with this plugin enabled, the graph that is created will look something like this:

Learn about the Drupal module for the Neo4j graph database with a Rules module integration


How to Create an Integration Module


The following tutorial shows how to create a simple module for Drupal 7. The Drupal 8 version of the API module is very similar, the main difference is that the Neo4j client object is available through a service, not as a global function.

Our example module will recommend content based on user data.

Create a directory called neo4j_hello_world in sites/all/modules/custom, and a file called neo4j_hello_world.info.

name = Neo4j Hello World
package = Neo4j
core = 7.x
dependencies[] = neo4j

In the module file, create the info hook for the block:

/**
 * Implements hook_block_info().
 */
function neo4j_hello_world_block_info() {
  return [
    'neo4j_recommendation' => [
      'info' => t('Neo4j content recommendation'),
      'cache' => DRUPAL_NO_CACHE,
    ],
  ];
}

Following that, implement the block view hook:

/**
 * Implements hook_block_view().
 */
function neo4j_hello_world_block_view($delta = '') {
  $block = [];

  switch ($delta) {
    case 'neo4j_recommendation':
      $block['subject'] = t('Recommended content');
      $block['content'] = [];
      if (($page_node = menu_get_object())) {
        $recommended_nodes = _neo4j_hello_world_get_recommendations($page_node->nid);
        $links = array_map(function ($node) {
          return l($node->title, "node/{$node->nid}");
        }, $recommended_nodes);

        $block['content'] = [
          '#theme' => 'item_list',
          '#items' => $links,
        ];
      }
      break;
  }

  return $block;
}

This block only has content on node pages. The helper function _neo4j_hello_world_get_recommendations() returns the list of node objects, and the list gets transformed into links by array_map().

Finally implement the recommendation function that executes the Cypher query to get the recommended pages from the Neo4j database:

/**
 * Retrieves a list of nodes that are usually visited by users who visit node/$nid.
 *
 * @param $nid
 *   Page node id.
 * @return array
 *   List of recommended nodes.
 */
function _neo4j_hello_world_get_recommendations($nid) {
  $client = neo4j_get_client();

  if ($client) {
    try {
      $res = $client->run("
        MATCH (p:Page)<-[:OF]-(:Visit)-[:SEEN]->(pe:Entity { entity_type: {entity_type}, entity_id: {entity_id} })
        MATCH (p)<-[:OF]-(:Visit)-[prev:PREV*..10]-(:Visit)-[:SEEN]->(e:Entity)
        WHERE NOT e = pe AND e.entity_type = {entity_type}
        RETURN e.entity_id AS entity_id, avg(size(prev)) AS distance
        ORDER BY distance ASC
        LIMIT 10
      ", [
        'entity_id' => $nid,
        'entity_type' => 'node',
      ]);

      $nids = [];

      foreach ($res->records() as $record) {
        $nids[] = $record->get('entity_id');
      }

      $nodes = node_load_multiple($nids);

      $ordered_nodes = [];
      foreach ($nids as $nid) {
        $ordered_nodes[] = $nodes[$nid];
      }

      return $ordered_nodes;
    } catch (Exception $e) {
      watchdog_exception('neo4j hw', $e);
    }
  }

  return [];
}

In this function, first we make sure that the client is available. Then make sure that when you interact with the client object, you wrap the code in a try-catch block.

The most interesting part of this function is the recommendation query itself. Let’s have a look at the graph model again.

Neo4j-Drupal module recommendation engine data model


MATCH (p:Page)<-[:OF]-(:Visit)-[:SEEN]->(pe:Entity { entity_type: {entity_type}, entity_id: {entity_id} })

First, it matches all pages for the Drupal node we’re currently visiting which we already have visit nodes in the database:

MATCH (p)<-[:OF]-(:Visit)-[prev:PREV*..10]-(:Visit)-[:SEEN]->(e:Entity)

The second match matches the entities (Drupal Nodes) that can be connected to the pages from the previous match with a distance of maximum 10 page visits.

WHERE NOT e = pe AND e.entity_type = {entity_type}

The WHERE statement filters the results from the previous matches, excluding the current entity, and narrowing the results to nodes.

RETURN e.entity_id AS entity_id, avg(size(prev)) AS distance

In the RETURN statement, a simple weight is generated, from an average of the distance between the entities. This weight will be too simple for most real-world applications, but to keep things simple, it is sufficient for the scope of this example:

ORDER BY distance ASC
LIMIT 10

Finally, order by the distance and take the first 10 items.

In an example graph starting from Entity 1 on the left, all these Entities (2,3,4) on the right would be returned as recommendations:

Neo4j-Drupal module recommendation engine result graph


After the query, a FOREACH loop takes a list of node ids, and with a node_load_multiple() we load the nodes from the database that are then resorted.

Closing Thoughts


These were very basic examples of how the module can be used to start building a graph database. Please try it out and let us know how you would like to use the module; we would love to hear about any applications you think this could be used for. Also, if you run into any errors you can report them in the project’s issue queue on Drupal.org.

Neo4j is already used in production on Drupal sites, but all those implementations were one-off integrations. We hope that this project could be the start of a community effort for a full-featured Neo4j-Drupal integration that enables even complex use cases without much coding.

We at Pronovix are a Drupal consultancy. If you need a deeper integration, or if you would like to extend the functionality of the Neo4j module please let us know us. We are specialised in documentation systems and developer portals, systems that need to deliver content in the context of a user’s activities and problems.

That is why we started working with Neo4j, because we want to use it to deliver personalised help content proactively. We would like to become the go-to company for organisations that need help with their Neo4j project on Drupal. That is why we invest in this module and actively share knowledge about interesting problems graph databases can solve in the CMS world. If you want to get notified when we publish a new blog post, sign up for our Documentation Automation & Personalisation, AI, and Graph Databases mailing list.

If you want to chat with us, you can find us on the #neo4j-drupal channel at the neo4j.com/slack. You can also contact us on our company website, or connect with Kristof on Linkedin.


Want to learn more about graph databases and Neo4j? Click below to register for our online training class, Introduction to Graph Databases and master the world of graph technology in no time.