Syngenta Transforms Chemical Intuition into Digital Precision with Neo4j

Tens of thousands

Reactions modeled in the Neo4j knowledge graph

10,000 in <10 seconds

Synthetic routes extracted via a custom Neo4j route-mining plugin

99%

Reaction data now graph-ready, up from 1% before structured validation


Syngenta operates where biology, chemistry, and data meet. With 30,000 employees in more than 100 countries, the company develops crop protection solutions that help farmers feed a growing population. The core of this work lies in chemical synthesis: the intricate process of building target molecules from available starting materials.

For years, the life sciences industry treated reaction data as a byproduct of experimentation — notes scribbled in lab notebooks or buried in PDFs. Syngenta recognized that to accelerate discovery, they needed to treat this data as a primary asset.

Syngenta is building an informatics platform that supports scientists in synthesizing active ingredients. The “Data for Synthesis” platform uses Neo4j to organize millions of disconnected data points into a unified knowledge graph. Scientists will use this system to increase synthetic efficiency and optimize supply chains before a single beaker is filled.

The Challenge: The Lab Notebook Problem

A synthetic route is a complete recipe for building a complex molecular compound from simpler, commercially available starting materials. A chemist must choose a particular sequence of reaction steps toward the target, balancing multiple criteria: cost, efficiency, speed, sustainability, and more. Each step can usually be run under several sets of reaction conditions (temperature, pressure, and so on), and these must be accounted for as well. Designing a synthetic route is a logic puzzle of immense complexity, often with more than one valid solution.

In a traditional relational database, answering questions like these becomes computationally difficult:

“Show me all synthetic routes to this intermediate that use a specific catalyst.”

“Are there alternative routes to synthesize this target molecule?”

“What is the shortest route to this target?”

“Which steps are common across multiple routes to this compound?”

Path-based questions are awkward to express in a tabular model. As the volume of chemical reaction data grows, it becomes increasingly hard to identify and reason about connections across multiple reactions. Tables are simply not well suited to representing interconnected data such as synthesis routes.

“The synthetic route is the main business object we work with,” explains Nataliya Lopanitsyna, a Graph Specialist and Backend Developer at Syngenta. “It is a sequence of chemical transformation steps. In the past, information about those steps was spread across different database tables, Word documents, and PowerPoint presentations. How do you make the connection? It’s difficult with anything other than graphs.”

The limitations of tabular data created a provenance gap. If a colleague in a different country had already failed at a specific reaction condition, there was no easy way to flag that risk automatically. The data existed, but it wasn’t connected.

“We needed to shift from viewing reaction data as a byproduct to viewing it as an asset,” says Lopanitsyna. “We needed a system that starts from the design phase, captures data clearly, and feeds it back to our chemists so they can make a more informed decision.”

The Solution: Mapping Chemistry as a Graph

Syngenta chose Neo4j as a pillar of its Data for Synthesis platform. The decision was driven by the natural alignment between chemical structures and graph topology. A chemical reaction network is inherently a graph: molecules and reactions are the nodes, and the edges record the role each molecule plays in a reaction (reactant or product).
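To make the idea concrete, here is a minimal Python sketch of a reaction network held as a graph. It is an illustration only: the class name, the molecule and reaction identifiers, and the REACTANT and PRODUCT edge labels are assumptions made for this example, not Syngenta's schema.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model; not Syngenta's schema or the Noctis API.
@dataclass
class ReactionNetwork:
    reactants: dict = field(default_factory=dict)  # reaction id -> set of molecules
    products: dict = field(default_factory=dict)   # reaction id -> set of molecules

    def add_reaction(self, rxn_id, reactants, products):
        self.reactants[rxn_id] = set(reactants)
        self.products[rxn_id] = set(products)

    def edges(self):
        """Yield (source, role, target) triples linking molecules and reactions."""
        for rxn_id in self.reactants:
            for mol in sorted(self.reactants[rxn_id]):
                yield (mol, "REACTANT", rxn_id)
            for mol in sorted(self.products[rxn_id]):
                yield (rxn_id, "PRODUCT", mol)

    def reactions_producing(self, molecule):
        """Return the ids of all reactions that list `molecule` as a product."""
        return sorted(r for r, prods in self.products.items() if molecule in prods)


network = ReactionNetwork()
network.add_reaction("R1", reactants=["A", "B"], products=["C"])
network.add_reaction("R2", reactants=["C", "D"], products=["T"])
network.add_reaction("R3", reactants=["E"], products=["T"])

for edge in network.edges():
    print(edge)
print(network.reactions_producing("T"))
```

Because both molecules and reactions are nodes, a question such as "which reactions make T?" becomes a one-hop lookup rather than a join across tables.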

The team began by analyzing the daily workflows of their chemists. They mapped the questions scientists asked most frequently and realized they translated almost directly into Cypher, Neo4j’s query language.

“Chemists can now ask questions like, ‘How can I find the shortest path from this target to the starting materials?’ or ‘Are there alternative paths to synthesize this specific compound?’” says Lopanitsyna. “These are graph traversal questions. In a relational database, you are limited to a few steps before the joins become hard to manage at scale. With Neo4j, we can traverse complex reaction networks across 10+ steps without a hard-coded depth limit and in an interactive way.”
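The shortest-route question can be sketched in plain Python as a backward traversal from the target. This is a simplified illustration, not the Cypher the team runs: the `producers` mapping, the molecule names, and the definition of "shortest" as the fewest sequential steps are all assumptions made for this example.

```python
import math


def min_depth(molecule, producers, starting_materials, _path=frozenset()):
    """Fewest sequential steps needed to make `molecule` from starting materials.

    `producers` maps a product to a list of (reaction id, reactants) pairs.
    Returns math.inf when no route exists.
    """
    if molecule in starting_materials:
        return 0
    if molecule in _path:  # cycle guard: do not revisit a molecule on this path
        return math.inf
    best = math.inf
    for _rxn_id, reactants in producers.get(molecule, []):
        # A step is usable only if every reactant is reachable, so the
        # slowest reactant determines the depth of this step.
        depth = 1 + max(
            (min_depth(r, producers, starting_materials, _path | {molecule})
             for r in reactants),
            default=0,
        )
        best = min(best, depth)
    return best


# Hypothetical network: T can be made through R2 (two steps deep) or R3 (three).
producers = {
    "T": [("R2", ["C", "D"]), ("R3", ["E"])],
    "C": [("R1", ["A", "B"])],
    "E": [("R4", ["F"])],
    "F": [("R5", ["G"])],
}
starting_materials = {"A", "B", "D", "G"}

print(min_depth("T", producers, starting_materials))
```

The traversal carries no fixed depth limit; it simply follows the network until it reaches starting materials. The sketch does not cache intermediate results, so it is suited to small examples rather than production-scale networks.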

Bridging Code and Chemistry with “Noctis”

Syngenta built and open-sourced a Python package called Noctis that lets researchers and developers turn their own reaction datasets into navigable knowledge graphs, ready to be explored in Neo4j.

Noctis acts as a bridge between reaction data, the Neo4j database, and the Pythonic world where data scientists live. It converts graph objects — nodes and relationships — into Python classes that can be easily manipulated, validated, and analyzed. This allows the team to build complex validation logic and schema expansions without rewriting the core database interaction code.
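The general pattern of lifting graph objects into Python classes and validating them against a schema might look like the sketch below. The class names, the schema set, and the `schema_violations` helper are hypothetical and written for this example; they are not the Noctis API.

```python
from dataclasses import dataclass, field


# Hypothetical classes for illustration; these are not the Noctis API.
@dataclass
class Node:
    label: str
    uid: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start: Node
    end: Node


# Assumed base schema for synthesis data: (start label, relationship type, end label).
SYNTHESIS_SCHEMA = {
    ("Molecule", "REACTANT", "Reaction"),
    ("Reaction", "PRODUCT", "Molecule"),
}


def schema_violations(relationships, schema):
    """Return the relationships whose (start, type, end) pattern is not in the schema."""
    return [
        rel for rel in relationships
        if (rel.start.label, rel.type, rel.end.label) not in schema
    ]


acid = Node("Molecule", "m1", {"name": "acid"})
amine = Node("Molecule", "m2", {"name": "amine"})
amide = Node("Molecule", "m3", {"name": "amide"})
coupling = Node("Reaction", "r1")

relationships = [
    Relationship("REACTANT", acid, coupling),
    Relationship("REACTANT", amine, coupling),
    Relationship("PRODUCT", coupling, amide),
    Relationship("PRODUCT", acid, amide),  # malformed on purpose: molecule to molecule
]

bad = schema_violations(relationships, SYNTHESIS_SCHEMA)
print([(r.start.uid, r.type, r.end.uid) for r in bad])
```

Passing the schema in as an argument keeps the validation logic independent of any one domain, which is one plausible way to read "schema-agnostic with a chemistry base layer."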

“Noctis allows us to translate graph objects into Python classes so we can manipulate them,” Lopanitsyna explains. “At its core, Noctis is schema-agnostic. We’ve added a base layer tailored to chemical synthesis, but anyone can extend it however they need. We released it as open source to give back to the community and to show that we are doing work that stands up to peer review.”

From Design to Experimentation

The platform supports a workflow that links design, experimentation, and analytics. When a chemist designs a route, the system captures that intent. As experiments run, the data is linked back into the graph, enriching the network.

This connectivity is intended to address the data silo problem. A route designed for a manufacturing plant, for example, can inform a research chemist working on a similar molecule in the lab. Similarly, graph-based views make it possible to identify dependencies on specific intermediate chemicals, opening the door to analyses such as supply chain risk.

“Consider a sequence of reactions done for Purpose A and another for Purpose B,” says Lopanitsyna. “In the graph, they collapse into each other. You can see that if someone has done optimization for this route, you can use that same optimized path in another project.”
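A small Python sketch shows how two routes "collapse into each other" once each step is identified by its chemistry rather than by the project that recorded it. The route names, molecule names, and the choice of (reactants, product) as the step key are assumptions made for this example.

```python
from collections import defaultdict


def shared_steps(routes):
    """Map each step to the routes that use it, keeping only steps used by several routes.

    A step is keyed by its set of reactants and its product, so the same
    transformation recorded in two projects lands on the same key.
    """
    usage = defaultdict(set)
    for name, steps in routes.items():
        for reactants, product in steps:
            usage[(frozenset(reactants), product)].add(name)
    return {step: names for step, names in usage.items() if len(names) > 1}


# Hypothetical routes: both projects make intermediate C from A and B.
routes = {
    "purpose_A": [(("A", "B"), "C"), (("C", "D"), "T1")],
    "purpose_B": [(("B", "A"), "C"), (("C", "E"), "T2")],
}

shared = shared_steps(routes)
for (reactants, product), names in shared.items():
    print(sorted(reactants), "->", product, "used by", sorted(names))
```

In a graph database the same effect comes from merging on a node key; the dictionary here plays that role for the illustration.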

The Cultural Shift: Connected Data

The transition to a graph-based workflow required a cultural shift, as chemists were asked to trust a digital system.

“It was a challenge to change a process that has been optimized by humans for years,” admits Lopanitsyna. “Chemists are often driven by experience, translating known solutions to present problems.”

Today, Syngenta’s proof-of-concept graph includes tens of thousands of reaction records covering unique molecules. Beyond core chemical entities, the team implemented an extended schema layer designed specifically for chemists, enabling advanced searches and flexible groupings that mirror how scientists actually reason about synthesis.

Performance has reinforced confidence. A custom route-mining query — developed as a Java plugin in collaboration with Neo4j — can extract 10,000 potential synthetic routes in under 10 seconds, enabling chemists to explore complex reaction pathways interactively rather than waiting for batch-style analysis.
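For readers unfamiliar with route mining, the sketch below enumerates every route to a target in a toy network. It is a plain Python illustration of the idea, not the Java plugin described above, and it says nothing about that plugin's algorithm or performance. The `producers` mapping and all identifiers are assumptions made for this example.

```python
from itertools import product


def enumerate_routes(target, producers, starting_materials, _path=frozenset()):
    """Return every route to `target` as a frozenset of reaction ids.

    `producers` maps a product to a list of (reaction id, reactants) pairs.
    A route is complete when all of its leaves are starting materials.
    """
    if target in starting_materials:
        return [frozenset()]  # nothing to make: one empty route
    if target in _path:
        return []  # cycle guard
    routes = []
    for rxn_id, reactants in producers.get(target, []):
        per_reactant = [
            enumerate_routes(r, producers, starting_materials, _path | {target})
            for r in reactants
        ]
        # Combine one sub-route per reactant; if any reactant has no
        # sub-route, product() yields nothing and the step is skipped.
        for combo in product(*per_reactant):
            routes.append(frozenset({rxn_id}).union(*combo))
    return routes


# Hypothetical network: C can be made two ways; E cannot be made at all.
producers = {
    "T": [("R2", ["C", "D"]), ("R3", ["E"])],
    "C": [("R1", ["A", "B"]), ("R6", ["A"])],
    "E": [("R4", ["F"])],
}
starting_materials = {"A", "B", "D"}

for route in enumerate_routes("T", producers, starting_materials):
    print(sorted(route))
```

The number of routes can grow combinatorially with the number of alternatives per step, which is why a purpose-built, database-side implementation matters at the scale described here.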

Equally important was a breakthrough in data readiness. Before introducing a structured validation gate into the workflow, only about 1% of historical reaction data could be converted into graph form without manual curation. Today, nearly 99% of reaction data can be represented directly in the graph, transforming what was once fragmented and inconsistent into a structured, reusable knowledge asset.
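The case study does not describe what the validation gate checks, so the following Python sketch is purely illustrative. The record fields (`reactants`, `products`, `yield_percent`) and the three rules are assumptions chosen to show the pattern: reject incomplete records before they reach the graph, and say why.

```python
def validate_record(record):
    """Return a list of problems found in one reaction record (empty if none).

    The rules below are hypothetical examples, not Syngenta's actual checks.
    """
    problems = []
    if not record.get("reactants"):
        problems.append("no reactants")
    if not record.get("products"):
        problems.append("no products")
    reported_yield = record.get("yield_percent")
    if reported_yield is not None:
        if not isinstance(reported_yield, (int, float)) or not 0 <= reported_yield <= 100:
            problems.append("yield out of range")
    return problems


def gate(records):
    """Split records into those ready for the graph and those needing curation."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


records = [
    {"id": "e1", "reactants": ["A", "B"], "products": ["C"], "yield_percent": 82},
    {"id": "e2", "reactants": [], "products": ["C"]},
    {"id": "e3", "reactants": ["C"], "products": ["T"], "yield_percent": 140},
]

accepted, rejected = gate(records)
print([r["id"] for r in accepted])
print([(r["id"], problems) for r, problems in rejected])
```

Applying such checks at the point of capture, rather than during later cleanup, is what lets new records enter the graph without manual curation.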

The turning point came when chemists realized the graph could reveal “negative results” — experiments that failed. In traditional publishing and internal reporting, failures are rarely indexed, leading to doomed repetition. The knowledge graph captures the full context of experimentation.
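One way to picture how indexed failures prevent repeated work is the Python sketch below, which looks up earlier failed attempts for a given reaction under given conditions. The experiment fields, identifiers, and exact-match lookup are assumptions made for this example; the source does not describe how the platform stores or matches conditions.

```python
from collections import defaultdict


def _key(reaction, conditions):
    # Sort the condition items so that dictionary ordering does not matter.
    return (reaction, tuple(sorted(conditions.items())))


def index_failures(experiments):
    """Group the ids of failed experiments by reaction and conditions."""
    failures = defaultdict(list)
    for exp in experiments:
        if exp["outcome"] == "failed":
            failures[_key(exp["reaction"], exp["conditions"])].append(exp["id"])
    return failures


def prior_failures(failures, reaction, conditions):
    """Return ids of earlier failed attempts at this reaction under these conditions."""
    return failures.get(_key(reaction, conditions), [])


# Hypothetical experiment log covering two sites.
experiments = [
    {"id": "x1", "reaction": "R1", "outcome": "failed",
     "conditions": {"solvent": "THF", "temp_c": 80}},
    {"id": "x2", "reaction": "R1", "outcome": "success",
     "conditions": {"solvent": "DMF", "temp_c": 60}},
    {"id": "x3", "reaction": "R1", "outcome": "failed",
     "conditions": {"temp_c": 80, "solvent": "THF"}},
]

failures = index_failures(experiments)
print(prior_failures(failures, "R1", {"solvent": "THF", "temp_c": 80}))
print(prior_failures(failures, "R1", {"solvent": "DMF", "temp_c": 60}))
```

A chemist planning R1 in THF at 80 °C would see two earlier failures before starting, which is the kind of automatic flag the tabular setup could not provide.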

“Now, when they go back and compare it to the previous system, they say, ‘Actually, now it is much nicer,’” says Lopanitsyna. “It frees them from the search for data so they can focus on the science.”

Impact and Future: The AI Frontier

The Data for Synthesis platform is building the foundation for advanced AI and machine learning applications. By structuring data as a graph, Syngenta creates a clean, verified dataset that can train the next generation of chemical AI models.

By open-sourcing the Noctis package, Syngenta has positioned itself as a technology leader in the scientific community. It signals a move away from secretive, proprietary data handling toward a collaborative, standardized approach to chemical informatics.

“We have the ambition of attracting the best talent by contributing to the scientific community,” notes Lopanitsyna. “When you publish your work openly, the chemists gain confidence too — it shows that this is a scientifically valid approach.”

Syngenta’s move to Neo4j is a restructuring of how chemical knowledge is stored and accessed. By mapping the relationships between molecules, reactions, and results, Syngenta transforms disconnected data points into a navigational chart for discovery. As Syngenta looks ahead, the knowledge graph will only grow in importance.

“We are moving from being data-driven to being knowledge-driven,” says Lopanitsyna. “When you can see the connections, you can optimize the entire journey from idea to product.” The company plans to expand the graph to include more experimental data, linking chemical properties to biological outcomes. This will enable scientists to design more effective and sustainable crop protection solutions.

Partners

  • Amazon Web Services (AWS)

Use Cases

  • Knowledge Graph
  • Research & Development

Industry

  • Healthcare & Life Sciences

Products Used

  • Neo4j Graph Database

Region

  • Europe
