Creating a Graph of Chemical Reactions


Photo by Omkar Jadhav on Unsplash

A chemical reaction produces substances that can act as inputs for other chemical reactions. This chain of inputs and outputs is naturally represented as a graph, and that’s enough to get me excited about exploring its potential in Neo4j!

What Is ORD?

The Open Reaction Database (ORD) is an open-access database for storing and sharing information about organic reactions. The project aims to create a standardized system for data representation and infrastructure, enabling the development of applications that will significantly enhance the current state of computer-aided synthesis planning, reaction prediction, and other predictive chemistry tasks. By providing a centralized repository, ORD hopes to facilitate the sharing of data and promote collaboration within the scientific community.

Logo of Open Reaction Database

Loading It In

The full code is available on GitHub (wagenrace).

In our quest to understand the intricacies of chemical reactions, we focus on three key aspects: input components, output components, and the reaction itself. We use the International Chemical Identifier (InChI) to connect these elements. What sets the InChI apart is that it is deterministic: it is derived from the molecular structure itself, so we can generate it in milliseconds when needed, and no institute has to assign it.

In ORD, not every component has an InChI. If one is missing, we can use SMILES (a simple text-based representation of a molecular structure) to calculate the InChI instead. To do this, we use RDKit, a popular open-source cheminformatics toolkit available in Python:

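from rdkit import Chem

chem = Chem.MolFromSmiles(smiles)  # returns None if the SMILES cannot be parsed
inchi = Chem.MolToInchi(chem)  # InChI string calculated from the parsed molecule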
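The fallback described above can be sketched as a small helper. This is only an illustration, not the code from the linked repository: the dictionary keys `inchi` and `smiles` and both function names are assumptions made for this example. The converter is passed in as an argument so the fallback logic can be tried out even without RDKit installed.

```python
from typing import Callable, Optional


def rdkit_smiles_to_inchi(smiles: str) -> Optional[str]:
    """Convert SMILES to InChI with RDKit; return None if that fails."""
    from rdkit import Chem  # imported lazily so the rest runs without RDKit

    mol = Chem.MolFromSmiles(smiles)  # None when the SMILES cannot be parsed
    if mol is None:
        return None
    return Chem.MolToInchi(mol) or None  # treat an empty result as a failure


def resolve_inchi(
    component: dict,
    smiles_to_inchi: Callable[[str], Optional[str]] = rdkit_smiles_to_inchi,
) -> Optional[str]:
    """Prefer the stored InChI; otherwise derive one from the SMILES."""
    if component.get("inchi"):
        return component["inchi"]
    if component.get("smiles"):
        return smiles_to_inchi(component["smiles"])
    return None  # neither identifier is available
```

Components for which `resolve_inchi` returns `None` cannot be connected through an InChI and would have to be skipped or handled separately.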

We tested how accurately SMILES can be used to calculate the InChI on 7,672,422 components in ORD. The calculated InChI differed from the recorded one in only 0.069 percent of cases, which indicates that the method is highly reliable. This suggests that SMILES is a safe way to obtain the InChI when it is missing.
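A check of this kind could be sketched as follows, assuming each tested component yields a pair of the InChI recorded in ORD and the InChI calculated from its SMILES. The function name and the pair layout are hypothetical; the last line only puts the reported percentage into absolute numbers.

```python
def mismatch_percentage(pairs):
    """Percentage of (recorded, calculated) InChI pairs that differ."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair to compare")
    mismatches = sum(1 for recorded, calculated in pairs if recorded != calculated)
    return 100 * mismatches / len(pairs)


# Scale of the reported figure: 0.069 percent of 7,672,422 components
# corresponds to roughly 5,294 components with a differing InChI.
approx_mismatches = round(7_672_422 * 0.069 / 100)
```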

If we only take the reactions where all components have InChI, we have 1,460,743 reactions. However, if we include the reactions where all components have InChI or SMILES, we get 1,955,165 reactions.
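Allowing SMILES as a fallback therefore adds 494,422 reactions, roughly a third more than with InChI alone. The two selection rules can be sketched like this, assuming each reaction is a list of component dictionaries with optional `inchi` and `smiles` keys (a layout chosen for illustration, not ORD's actual schema):

```python
def count_usable_reactions(reactions):
    """Count reactions under both selection rules.

    Returns (all components have an InChI,
             all components have an InChI or a SMILES).
    """
    inchi_only = 0
    inchi_or_smiles = 0
    for components in reactions:
        # Note: a reaction without any components passes both checks.
        if all(c.get("inchi") for c in components):
            inchi_only += 1
        if all(c.get("inchi") or c.get("smiles") for c in components):
            inchi_or_smiles += 1
    return inchi_only, inchi_or_smiles


# Difference between the two counts reported in the article.
extra_reactions = 1_955_165 - 1_460_743
```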

Our final model is simple. We have two types of nodes: components and reactions. We also have two relationship types: input and output.
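One way such a model could be loaded is sketched below. The `Reaction` label and the `INPUT` and `OUTPUT` relationship types match the query used later in this article; the `Component` label, the `id` and `inchi` property names, the relationship directions, and the `to_row` helper are assumptions made for this example. The Python part only prepares the parameters, so the query can be run with any Neo4j client.

```python
# Cypher that loads one batch of reactions; pass the list of rows
# built with to_row() below as the `rows` query parameter.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (r:Reaction {id: row.reaction_id})
FOREACH (inchi IN row.inputs |
  MERGE (c:Component {inchi: inchi})
  MERGE (c)-[:INPUT]->(r))
FOREACH (inchi IN row.outputs |
  MERGE (c:Component {inchi: inchi})
  MERGE (r)-[:OUTPUT]->(c))
"""


def to_row(reaction_id, input_inchis, output_inchis):
    """Build one parameter row, with duplicate InChIs removed."""
    return {
        "reaction_id": reaction_id,
        "inputs": sorted(set(input_inchis)),
        "outputs": sorted(set(output_inchis)),
    }
```

Because components are merged on their InChI, a substance that is the output of one reaction and the input of another ends up as a single node connecting the two.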

Exploring Cycles

With our data now stored in our Neo4j instance, we can begin analyzing it. To start, let’s locate some cycles within the graph. We can use the APOC library and its apoc.nodes.cycles procedure to accomplish this. The following query returns up to three of the cycles present in the graph:

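// Collect every Reaction node as a starting point
MATCH (m1:Reaction) WITH collect(m1) as nodes
// Search for cycles that only follow INPUT and OUTPUT relationships
CALL apoc.nodes.cycles(nodes, {relTypes: ["INPUT", "OUTPUT"]})
YIELD path
RETURN path LIMIT 3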

We find many cycles. A nice, small one runs between reaction d096 and reaction e6bb.

Example of a reaction cycle in Neo4j
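To make the idea of a cycle concrete, here is a minimal plain-Python sketch of a depth-first search that finds one cycle through a given node. It is not how APOC works internally, and the node names in the usage note are hypothetical; it only illustrates what "two reactions feeding each other" means in a directed graph of reactions and components.

```python
def find_cycle(edges, start):
    """Return one directed cycle through `start` as a list of nodes, or None."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    def walk(node, path, seen):
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [start]  # closed the loop back to the start
            if nxt not in seen:
                seen.add(nxt)
                found = walk(nxt, path + [nxt], seen)
                if found:
                    return found
        return None

    return walk(start, [start], {start})
```

For example, with the edges `("R1", "B")`, `("B", "R2")`, `("R2", "A")`, and `("A", "R1")`, calling `find_cycle(edges, "R1")` walks from reaction R1 through component B, reaction R2, and component A back to R1. The recursion makes this suitable for small examples only; for a full database, the APOC procedure above is the better tool.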

Summary

Creating a graph of chemical reactions using tools like Neo4j and databases such as ORD can greatly enhance our understanding of chemical processes and their interdependencies. By representing reactions and components as nodes and connecting them with input and output relationships, we can visualize complex reaction networks and explore their potential applications.


Create a Graph of Chemical Reactions was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.