QIAGEN Turns 25 Years of Biomedical Curation into an AI-Ready Graph Intelligence Platform with Neo4j

25+ years

Continuous manual review of full-text literature, methods, and supplementary data by PhD-level scientists.

4 hours

Time PharmaEssentia needed to identify a new acute myeloid leukemia (AML) indication opportunity for an existing interferon drug using QIAGEN’s Neo4j-powered knowledge graph.

643M

Curated links between genes, diseases, drugs, pathways, and omics data in QIAGEN’s Biomedical Knowledge Base (BKB). The dataset can be loaded in minutes.

QIAGEN Digital Insights (QDI) sits at a critical junction in life sciences. Its mission is to take customers from biological sample to actionable insight. Customers range from discovery scientists and translational researchers to clinical trial designers and physicians interpreting complex reports.

They all face the same problem.

“Our customers today face just a deluge of data,” says Venkatesh Moktali, director of product at QIAGEN Digital Insights. “Ultimately they are faced with lots of different forms of data, and it’s non-trivial to go from there to answers they can digest and use.”

Genomic profiles, omics datasets, public databases, trial outputs, and constant new publications all compete for attention. Somewhere in that mix are the connections that matter: a gene that links two diseases, a pathway that explains response to a drug, a target that makes a trial worth running.

For more than 25 years, QIAGEN’s scientists have purchased full-text biomedical papers and manually extracted validated, empirical relationships between genes, diseases, drugs, and pathways. The result is the Biomedical Knowledge Base (BKB) — a curated asset that few organizations could rebuild from scratch. This depth of harmonized, expert-validated knowledge is difficult to replicate and increasingly serves as the foundation for AI-driven drug discovery systems, where model quality depends on the quality of underlying biological truth.

PharmaEssentia is a Taiwan-headquartered biopharma company focused on blood disorders. Its flagship product, Ropeginterferon alfa-2b, is an FDA-approved interferon therapy for polycythemia vera, a rare condition in which the body makes too many red blood cells. Identifying opportunities to treat acute myeloid leukemia (AML) with the drug could have meant years of research. Using QIAGEN’s Neo4j-powered Biomedical Knowledge Base, researchers instead uncovered validated connections between the drug’s mechanisms, affected genes, and disease pathways in just four hours.

For pharmaceutical companies, that’s the difference between beating competitors to a new indication and watching research investment become obsolete. For patients, it can mean access to life-saving medication years sooner. The breakthrough comes from QIAGEN’s decision to transform more than 25 years of PhD-level curation into a graph intelligence platform powered by Neo4j. Structured into a graph-native architecture, those curated biomedical relationships become a programmable intelligence layer that pharma teams can embed directly into their analytics platforms and emerging AI systems.

SQL Joins and Static Tables Couldn’t Keep Up with Biology

Before graph, QIAGEN delivered curated content through custom formats and SQL databases. Customers had high-value data, but three recurring problems slowed them down.

  • Expertise bottlenecks
    Answering realistic questions—such as “find all drugs targeting genes upregulated in a specific cancer that also interact with proteins in the apoptosis pathway”—meant writing multi-join SQL across several tables. Only the most technical scientists could do this, and even they relied on QIAGEN’s support and templates. Time went into query mechanics instead of biology.
  • Invisible networks
    Drug discovery often advances when a scientist spots an unexpected link. In relational databases and document stores, those networks stay hidden in tables or nested JSON. There’s no natural way to see how one gene connects to a disease, a drug, and a pathway in a single view or to freely traverse multiple hops.
  • Schemas that can’t keep up
    Biology changes quickly. New data sources, disease classifications, and measurement types appear constantly. With relational tables, adding a new source or entity type late in a project meant revisiting schema design and migrations instead of focusing on science.

QIAGEN needed a way to represent biology that matched its real structure and let customers ask complex questions without rebuilding their data model every time.
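As a rough sketch of how the multi-join question quoted above might read as a graph pattern, the Python snippet below holds a Cypher query and runs it through any session-like object, such as a session from the official Neo4j Python driver. The labels (Drug, Gene, Disease, Protein, Pathway), the relationship types, and the property names are assumptions made for illustration; they are not the actual BKB schema.

```python
from typing import Any, Dict, List

# Hypothetical schema: labels, relationship types, and properties are
# illustrative assumptions, not the real BKB model.
CANDIDATE_DRUGS_QUERY = """
MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:UPREGULATED_IN]->(:Disease {name: $cancer})
MATCH (d)-[:INTERACTS_WITH]->(:Protein)-[:PARTICIPATES_IN]->(:Pathway {name: $pathway})
RETURN d.name AS drug, collect(DISTINCT g.name) AS upregulated_targets
ORDER BY drug
"""


def find_candidate_drugs(session: Any, cancer: str, pathway: str) -> List[Dict[str, Any]]:
    """Run the pattern query and return one plain dict per drug.

    `session` is any object with a `run(query, **params)` method whose
    results yield records exposing `.data()`, as Neo4j driver sessions do.
    """
    result = session.run(CANDIDATE_DRUGS_QUERY, cancer=cancer, pathway=pathway)
    return [record.data() for record in result]
```

With the official driver, the session would typically come from `neo4j.GraphDatabase.driver(uri, auth=(user, password)).session()`. The two MATCH clauses stand in for what would otherwise be a chain of joins across several tables.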

Graph Modeling Matches How Biology Actually Works

Graph modeling matches the shape of QIAGEN’s core asset: millions of many-to-many relationships between biological entities that researchers need to traverse quickly and explain. By deploying BKB on Neo4j, QIAGEN transformed its curated knowledge from static content into an operational intelligence backbone, one that can power enterprise analytics, custom applications, and graph-grounded AI workflows.

Unstructured literature, structured databases, omics experiments, and ontologies become nodes and relationships: gene–disease associations, drug–target links, disease–pathway connections, cross-experiment relationships. Graph databases make it straightforward to follow those links several steps out and ask, “show all FDA-approved drugs that touch any gene in this pathway and have evidence in AML-related studies.”
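To make the node-and-relationship idea concrete, here is a minimal in-memory sketch in plain Python, with no database involved. It answers a toy version of the question quoted above. Every node name, relationship type, and approval flag is a placeholder invented for the example; none of it is curated data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relationship type, target node)

# Placeholder nodes and links for illustration only; none are curated facts.
EDGES: List[Edge] = [
    ("DRUG_X", "TARGETS", "GENE_A"),
    ("DRUG_Y", "TARGETS", "GENE_B"),
    ("DRUG_Z", "TARGETS", "GENE_A"),
    ("GENE_A", "MEMBER_OF", "PATHWAY_1"),
    ("GENE_B", "MEMBER_OF", "PATHWAY_1"),
    ("DRUG_X", "HAS_EVIDENCE_IN", "STUDY_1"),
    ("DRUG_Z", "HAS_EVIDENCE_IN", "STUDY_2"),
    ("STUDY_1", "ABOUT", "AML"),
    ("STUDY_2", "ABOUT", "AML"),
]
FDA_APPROVED: Set[str] = {"DRUG_X", "DRUG_Y"}


def index_edges(edges: Iterable[Edge]) -> Dict[Tuple[str, str], Set[str]]:
    """Group targets by (source, relationship type) for quick hops."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source, rel, target in edges:
        index[(source, rel)].add(target)
    return index


def approved_drugs_with_evidence(
    edges: Iterable[Edge], approved: Set[str], pathway: str, disease: str
) -> List[str]:
    """Approved drugs that target a gene in `pathway` and have a study about `disease`."""
    index = index_edges(edges)
    hits = []
    for drug in sorted(approved):
        # Hop 1-2: drug -> gene -> pathway
        touches_pathway = any(
            pathway in index.get((gene, "MEMBER_OF"), set())
            for gene in index.get((drug, "TARGETS"), set())
        )
        # Hop 1-2: drug -> study -> disease
        has_evidence = any(
            disease in index.get((study, "ABOUT"), set())
            for study in index.get((drug, "HAS_EVIDENCE_IN"), set())
        )
        if touches_pathway and has_evidence:
            hits.append(drug)
    return hits
```

On this toy data, `approved_drugs_with_evidence(EDGES, FDA_APPROVED, "PATHWAY_1", "AML")` keeps only DRUG_X: DRUG_Y has no study linked to it, and DRUG_Z is not in the approved set.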

QIAGEN chose Neo4j for three main reasons:

  • Customer pull and ecosystem
    Many pharma customers were already using Neo4j, and many leading pharmaceutical organizations were standardizing on it as internal graph infrastructure. “We wanted to meet our customers where they were, not force them onto a different stack,” says Moktali. Aligning BKB with that ecosystem reduced integration friction and allowed QDI’s curated intelligence to plug directly into customer data science and AI environments.
  • Cypher and usability
    “With SQL, there is a long learning curve,” notes Moktali. “One of our field scientists had no experience with Neo4j or Cypher a year ago. Now he’s doing fluent presentations, building complex queries, and showing all the ways we can answer questions.” That shorter ramp lets domain experts—not just data engineers—build and demonstrate graph workflows.
  • Visualization and ‘aha’ moments
    Neo4j Browser and Neo4j Bloom give scientists a way to see networks of genes, diseases, and drugs, then overlay graph algorithms to surface important nodes. “Data is abstract until you visualize it,” says Moktali. “Neo4j made those ‘aha moments’ possible.”

Building a Living Knowledge Graph from 25 Years of PhD-Level Curation

QIAGEN’s BKB captures: genes with functional annotations, diseases with genetic and clinical links, drugs with molecular targets and mechanisms, biological pathways, and curated omics datasets from public initiatives.

Every relationship traces back to empirical evidence in literature or high-quality datasets. QIAGEN scientists read articles end to end, including methods and supplementary data, and encode only relationships that meet strict curation standards. BKB intentionally avoids patient identifiers and protected health information; customers connect it to proprietary data under their own controls. In practice, this means BKB functions not merely as a dataset, but as a trusted biological substrate on which downstream AI systems and scientific applications can reliably operate.

QIAGEN adds new relationships daily and ships quarterly BKB releases. Customers deploy the graph on Neo4j Enterprise or Neo4j Aura, in their cloud or on-premises, then blend it with internal datasets such as omics experiments, screening hits, trial arms, and portfolio plans. QIAGEN licenses this continuously updated biomedical intelligence layer as a foundation that customers integrate into internal platforms, discovery pipelines, and increasingly, AI copilots and agent-based systems.

In practice, the workflow looks like this:

  1. Curate and harmonize – QIAGEN scientists continuously review publications and datasets, identify empirical relationships, and map them into a harmonized ontology.
  2. Ship the Neo4j graph – Curated entities and relationships are loaded into Neo4j as nodes and edges, enriched with metadata and schema, and released quarterly.
  3. Extend with proprietary data – Customer teams add nodes and relationships for internal compounds, experiments, and ontologies without redesigning rigid schemas.
  4. Explore and model
    • Scientists use Neo4j Bloom or Browser for interactive exploration and hypothesis generation.
    • Internal applications call Neo4j via the Python driver, often running on AWS Lambda.
    • Features from Neo4j Graph Data Science feed into models in Amazon SageMaker.
    • LLMs in GraphRAG prototypes query Neo4j first to ground generated answers in factual relationships.
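Step 3 above, extending the graph with proprietary data, could look roughly like the following. This sketch assumes version 5 or later of the Neo4j Python driver, where sessions provide `execute_write`. The `InternalCompound` label, the `compound_id` and `symbol` properties, and the `TARGETS` relationship are hypothetical names chosen for the example, not part of any BKB release.

```python
from typing import Any, Dict, List

# Hypothetical labels and properties; `InternalCompound` is a customer-side
# node type invented for this sketch, not part of the BKB release.
LINK_COMPOUND_QUERY = """
MERGE (c:InternalCompound {compound_id: $compound_id})
SET c.name = $name
WITH c
MATCH (g:Gene {symbol: $gene_symbol})
MERGE (c)-[r:TARGETS]->(g)
SET r.source = $source
RETURN c.compound_id AS compound_id, g.symbol AS gene
"""


def _link_compound_tx(tx: Any, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Consume the result inside the transaction function, before it closes.
    return [record.data() for record in tx.run(LINK_COMPOUND_QUERY, params)]


def link_compound_to_gene(
    session: Any,
    compound_id: str,
    name: str,
    gene_symbol: str,
    source: str = "internal-screen",
) -> List[Dict[str, Any]]:
    """Create (or reuse) a compound node and link it to an existing gene node."""
    params = {
        "compound_id": compound_id,
        "name": name,
        "gene_symbol": gene_symbol,
        "source": source,
    }
    return session.execute_write(_link_compound_tx, params)
```

Because MERGE creates the node or relationship only when it is missing, rerunning the call does not duplicate data, and introducing the new label needs no schema migration. If no gene matches `gene_symbol`, the compound node is still created but the function returns an empty list.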

PharmaEssentia: From Drug Approval to a New Cancer Indication in Four Hours

PharmaEssentia shows how this setup changes the pace of research. The company had an FDA-approved drug, Ropeginterferon alfa-2b, for treating polycythemia vera, and the team wanted to know whether the same molecule could help patients with other hematological malignancies.

Using QIAGEN’s Neo4j-powered BKB, they:

  1. Represented their drug in the graph with links to known targets and mechanisms.
  2. Queried from that node across curated gene–disease, gene–pathway, and drug–disease relationships.
  3. Focused on connections relevant to acute myeloid leukemia (AML) biology.

In about four hours, they identified compelling evidence that Ropeginterferon alfa-2b could play a role in AML, surfacing connections that would have been extremely difficult to see in disconnected tables or document searches. The work led to a poster at the American Society of Hematology (ASH) conference on indication expansion opportunities for the drug.
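The general idea behind the three steps above, starting from a drug node and following typed relationships until a disease of interest is reached, can be sketched as a bounded path search. This is not PharmaEssentia’s actual query or data. The toy graph, node names, and relationship types below are placeholders, and a real deployment would run the traversal in Neo4j and not in plain Python.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str, str]               # (from node, relationship, to node)
Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(relationship, neighbor)]

# Placeholder graph for illustration only; not curated biology.
TOY_GRAPH: Graph = {
    "DRUG_X": [("TARGETS", "GENE_A"), ("MODULATES", "PATHWAY_1")],
    "GENE_A": [("ASSOCIATED_WITH", "DISEASE_AML")],
    "PATHWAY_1": [("INVOLVES", "GENE_B")],
    "GENE_B": [("ASSOCIATED_WITH", "DISEASE_AML")],
}


def find_paths(graph: Graph, start: str, goal: str, max_hops: int) -> List[List[Step]]:
    """Enumerate simple paths from `start` to `goal` using at most `max_hops` edges."""
    paths: List[List[Step]] = []

    def walk(node: str, trail: List[Step], visited: Set[str]) -> None:
        if node == goal:
            paths.append(trail)
            return
        if len(trail) >= max_hops:
            return  # hop budget spent without reaching the goal
        for rel, neighbor in graph.get(node, []):
            if neighbor not in visited:  # never revisit a node on the same path
                walk(neighbor, trail + [(node, rel, neighbor)], visited | {neighbor})

    walk(start, [], {start})
    return paths
```

Each returned path is a list of (node, relationship, node) steps, so every suggested drug-to-disease link can be read back hop by hop. Raising `max_hops` widens the search at the cost of more, and weaker, candidate paths.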

“Every time you’re doing drug repurposing, you’re asking: is my approved drug playing a role in positively affecting another condition we’ve never looked at?” says Moktali. “In the five years since a drug gets approved, a lot happens. New studies get published. New relationships get established. We capture all of that. When you bring in your drug and start seeing all its connections, you can identify new opportunities.”

For pharma companies, that speed can be the difference between leading an indication and arriving late. For patients with rare or difficult diseases, it can mean a viable treatment enters trials years sooner.

What Comes Next: GraphRAG, Grounded LLMs, and the End of Hallucinated Biology

Repurposing is just one application customers run on QIAGEN’s graph. Others include target identification and prioritization, where teams use the graph to move from long gene lists to a focused set of high-confidence targets, and clinical trial optimization, where connected knowledge helps refine inclusion criteria and combination strategies. Large pharma organizations also embed the BKB into internal platforms that serve researchers across therapeutic areas.

As more customers experiment with LLMs and agentic systems, QIAGEN’s focus on curation is becoming central. “The challenge with AI in drug discovery is hallucination,” says Moktali. “LLMs generate plausible answers that are factually wrong. In an industry where incorrect information derails years of research, that’s unacceptable.”

Working with Neo4j and NVIDIA, QIAGEN is developing graph-grounded AI architectures in which Cypher queries against Neo4j provide a factual backbone for generated outputs. In these systems, large language models do not rely on unstructured text alone — they reason over curated biological relationships encoded in the graph. This approach reduces hallucination risk and enables AI systems that can explain conclusions by pointing to specific evidence paths.
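One simple way to picture graph grounding is a prompt builder that passes the model only the relationships retrieved from the graph and asks it to cite them. The sketch below is a generic illustration under that assumption, not the architecture QIAGEN, Neo4j, and NVIDIA are building. The `EvidencePath` type, its fields, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvidencePath:
    """One retrieved relationship; all field names are hypothetical."""
    subject: str
    relation: str
    target: str
    source_id: str  # placeholder ID for the supporting publication


def build_grounded_prompt(question: str, paths: List[EvidencePath]) -> str:
    """Build an LLM prompt restricted to facts retrieved from the graph."""
    if not paths:
        # Nothing retrieved: instruct the model to decline and not to guess.
        return (
            f"Question: {question}\n"
            "No curated evidence was retrieved. "
            "Say that the graph holds no supporting evidence; do not guess."
        )
    facts = "\n".join(
        f"[{i}] {p.subject} -{p.relation}-> {p.target} (source: {p.source_id})"
        for i, p in enumerate(paths, start=1)
    )
    return (
        "Answer using only the numbered facts below and cite them by number.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}"
    )
```

In a full pipeline the `paths` list would be filled from the rows of a Cypher query, and the numbered citations in the answer would map back to specific relationships and their sources in the graph.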

QDI is now bringing all of its subscription data products into Neo4j. The curated literature graph was the first step; standardized omics comparisons and AI-curated content are next.

“Potentially by next year, we would have integrated all of our subscription-based data products into the Neo4j platform,” says Moktali. “Our customers can then use that foundation for whatever comes next—whether it’s advanced AI, new trial designs, or questions we haven’t thought of yet.”

Every time a pharma team moves from months of manual literature review to hours of graph exploration, more time is freed for experiments, design, and patient care. “Our expertise is in curation and data quality,” says Moktali. “We’ve spent more than 25 years building this so our customers don’t have to. Together, QIAGEN’s curated biomedical intelligence and Neo4j’s graph infrastructure provide a scalable foundation for AI-driven drug discovery at enterprise scale.”

Partners

  • Amazon Web Services (AWS)

Use Cases

  • Drug Discovery
  • GenAI

Industry

  • Healthcare & Life Sciences

Products Used

  • Neo4j Bloom
  • Neo4j Graph Database
  • Global
