Gilead Sciences Combats $431 billion Pharmaceutical Fraud Threat with Neo4j Graph Analytics
Global Pharmaceutical Leader Deploys Neo4j AuraDB on AWS to Protect Patient Safety and Preserve Program Integrity
1000x
faster fraud pattern detection compared to relational databases
Up to 20%
improvement in fraud detection rates using graph analytics
Up to 50%
reduction in false positives for fraud investigations

Above: Gilead Sciences Headquarters in Foster City, CA
Thousands of counterfeit bottles line the shelves of a locked war room at Gilead Sciences headquarters. These pills and prescription medications are confiscated evidence in an invisible war against fraud that now represents a $431 billion global threat.
“When we’re preventing fraud, it’s because criminals are defrauding programs meant to benefit folks who can’t afford medication,” explains Thomas Luu, Associate Director of Global Product Security at Gilead. “Everything goes back to patient safety. When fraud happens, that’s who it affects the most.”
Counterfeiting is never just a financial crime. The human cost becomes devastatingly clear when patients receive antipsychotic medication instead of Gilead’s life-saving HIV drugs, or bottles filled with rocks instead of medicine.
In 2023, Thomas Luu’s anti-fraud team worked with authorities in Florida to dismantle a $230 million counterfeiting ring, detailed in CNBC’s “Fraud in a Bottle.” The fraudulent activities Luu uncovered were networked, fast-moving, and designed to hide across prescribers, pharmacies, clinics, and distributors. Signals lived in different systems and evidence was scattered. Much of the analysis depended on Luu meticulously stitching data together with pivot tables in Excel.
The challenge has intensified with the expansion of telemedicine. Some providers now write as many as 150,000 prescriptions annually, within the legal limits of their states. That’s roughly one script every three and a half minutes, 24 hours a day, every day of the year. Automated questionnaire systems bypass traditional physician-patient interaction requirements, with prescribed drugs delivered out of state. At this scale, fraud becomes nearly impossible to distinguish from legitimate telemedicine.
Luu recognized that existing approaches could not match the speed or intensity of the evolving fraud tactics he saw firsthand. Gilead needed a way to identify and act on the relationships between these crimes in real time. Patient safety and program integrity are on the line in each investigation, where legal teams must build defensible, explainable narratives that stand up to scrutiny.
“Excel sheets had taken us as far as they could. Neo4j offered us a way to pull down the barriers between our data sets and build something entirely new,” says Luu.
Breaking out of the Excel Bottleneck
“I was the bottleneck,” Luu admits. “I tried to teach many people over the years to do this analysis, but the data sets are complex enough where you’d have to really know both the data and the business reasons why things happen.”
In the Florida “Fraud in a Bottle” investigation, criminal networks sent buses to homeless encampments, offering cigarettes and gift cards for free medical checkups. These vulnerable individuals were shuttled to clinics, subjected to tests billed at inflated rates, and prescribed HIV medications they neither wanted nor needed, all while the drugs disappeared into black markets. These multi-layered operations create exactly the kind of complex relationship patterns that traditional databases struggle to detect.
Traditional fraud detection methods rely on siloed datasets and rule-based systems. These approaches are ineffective against multi-party networks. International fraud operations often involve cryptocurrency transactions, virtual private server masking, and nominee ownership structures. These tactics require multi-dimensional analysis that traditional row-and-column databases can’t provide.
Neo4j’s graph intelligence platform transforms this scattered information into knowledge by storing relationships as first-class citizens, making it easy to traverse connections in real time. When fraudsters use the same phone number to register multiple fake profiles at clinics, graph algorithms surface these patterns so they can be flagged for investigation.
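The shared-phone-number pattern described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Gilead’s implementation: the `registrations` records, the profile IDs, and the `shared_phone_clusters` helper are all hypothetical, and in a graph database the same question would be answered by traversing from a phone node to the profiles connected to it.

```python
from collections import defaultdict


def shared_phone_clusters(registrations, min_profiles=2):
    """Return phone numbers that are attached to several patient profiles.

    `registrations` is a hypothetical list of (profile_id, phone) pairs.
    """
    profiles_by_phone = defaultdict(set)
    for profile_id, phone in registrations:
        # Keep digits only so "(555) 010-1234" and "555.010.1234" compare equal.
        digits = "".join(ch for ch in phone if ch.isdigit())
        if digits:
            profiles_by_phone[digits].add(profile_id)
    return {
        phone: sorted(profiles)
        for phone, profiles in profiles_by_phone.items()
        if len(profiles) >= min_profiles
    }


# Made-up sample data for illustration only.
registrations = [
    ("profile-1", "(555) 010-1234"),
    ("profile-2", "555-010-1234"),
    ("profile-3", "555.010.1234"),
    ("profile-4", "555-010-9999"),
]
flagged = shared_phone_clusters(registrations)
```

Here `flagged` maps the one reused number to the three profiles that share it, which is the kind of cluster an investigator would want to review.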
Unifying Data Sources in Neo4j AuraDB
While Gilead had initially explored Amazon Neptune as the incumbent graph solution within its AWS environment, Luu determined that Neo4j AuraDB offered a more mature, end-to-end graph platform.
“Our mission is fraud detection, and speed to value was critical. We evaluated Neptune, self-hosted solutions, or managed services, but the decision came down to efficiency. Each alternative would require significant upfront infrastructure work before we could focus on our core objective. Neo4j AuraDB on AWS eliminated those barriers. We could immediately focus on what matters: building sophisticated fraud detection capabilities, rather than spending months building and managing the platform.”
Luu’s first challenge was unifying data sources that had never been designed to work together. The Gilead fraud detection graph brings together five critical datasets: patient assistance program (PAP) claims, copay assistance records, commercial sales data, prescription claims, and geographic information. Each dataset told only part of the story.
“Previously, we could only look at one data set at a time,” Luu explains. “The folks in operations processing rebates weren’t looking for fraud. But the fraud signals become clear when you can see sales declining while claims from the same pharmacy are increasing.”
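The signal Luu describes, sales falling while claims from the same pharmacy rise, can be expressed as a simple comparison once both datasets are keyed by the same pharmacy. The sketch below is a simplified stand-in: the `sales` and `claims` dictionaries, their monthly figures, and the `diverging_pharmacies` helper are hypothetical, and a real check would likely use a more robust trend measure than first-versus-last period.

```python
def diverging_pharmacies(sales, claims):
    """Flag pharmacies whose sales fell while their claims rose.

    `sales` and `claims` are hypothetical dicts mapping a pharmacy ID to a
    list of per-period totals, ordered from oldest to newest.
    """
    flagged = []
    # Only pharmacies present in both datasets can be compared.
    for pharmacy in sorted(sales.keys() & claims.keys()):
        sales_change = sales[pharmacy][-1] - sales[pharmacy][0]
        claims_change = claims[pharmacy][-1] - claims[pharmacy][0]
        if sales_change < 0 and claims_change > 0:
            flagged.append(pharmacy)
    return flagged


# Made-up sample data for illustration only.
sales = {"pharm-1": [120, 100, 80], "pharm-2": [90, 95, 100]}
claims = {"pharm-1": [40, 60, 90], "pharm-2": [30, 32, 35]}
suspects = diverging_pharmacies(sales, claims)
```

Neither dataset is suspicious alone; the flag only appears when the two are joined on the same pharmacy.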
Another breakthrough came through entity resolution: the ability to match records across systems despite inconsistent spellings and deliberately obscured identities. A prescriber might appear as “Dr. John Smith” in one system, “J. Smith, MD” in another, and “John A. Smith” in a third. Traditional databases would treat these as separate entities, yet Neo4j users can implement string similarity and phonetic matching functions such as Levenshtein distance, Jaro-Winkler, or Soundex to identify likely matches and link these entities together.
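As a rough illustration of the string-similarity step, the sketch below normalizes prescriber names and scores them with Levenshtein distance in plain Python. It is an assumption-laden example rather than the matching logic Gilead uses: the `TITLES` set, `normalize_name`, and `similarity` are hypothetical helpers, and a production pipeline would typically combine several measures with other attributes such as addresses or license numbers.

```python
import re

# Hypothetical list of titles and credentials to ignore when comparing names.
TITLES = {"dr", "md", "do", "phd"}


def normalize_name(raw):
    """Lowercase a name, drop punctuation, and remove titles."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return " ".join(t for t in tokens if t not in TITLES)


def levenshtein(a, b):
    """Count the single-character edits needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Return a score in [0, 1] for two raw names; 1.0 means identical."""
    a, b = normalize_name(a), normalize_name(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


score_full = similarity("Dr. John Smith", "John A. Smith")
score_abbrev = similarity("Dr. John Smith", "J. Smith, MD")
score_other = similarity("Dr. John Smith", "Dr. Maria Lopez")
```

The abbreviated form “J. Smith, MD” scores lower than “John A. Smith” under plain edit distance, which is one reason measures such as Jaro-Winkler or phonetic keys are often used alongside it.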
The system now processes prescription patterns, pharmacy relationships, physician networks, and patient locations as interconnected entities rather than isolated data points. When a criminal network spans multiple states with dozens of participants, the graph can trace these relationships through the entire network. Geographic data becomes particularly powerful. The system can immediately flag when a physician’s patients all fill prescriptions at a single pharmacy, or when supposed local patients are scattered across multiple states for routine medications that should be prescribed locally.
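One of the flags mentioned above, a physician whose patients all fill prescriptions at a single pharmacy, can be sketched as a concentration check. The record layout, the thresholds, and the `single_pharmacy_prescribers` helper are hypothetical choices made for this example, not values taken from Gilead’s system.

```python
from collections import defaultdict


def single_pharmacy_prescribers(fills, min_patients=3, threshold=0.9):
    """Flag prescribers whose patients concentrate at one pharmacy.

    `fills` is a hypothetical list of (prescriber, patient, pharmacy) records.
    Returns {prescriber: (pharmacy, share_of_patients)}.
    """
    patients_by_prescriber = defaultdict(set)
    patients_by_pair = defaultdict(set)
    for prescriber, patient, pharmacy in fills:
        patients_by_prescriber[prescriber].add(patient)
        patients_by_pair[(prescriber, pharmacy)].add(patient)

    flagged = {}
    for (prescriber, pharmacy), patients in patients_by_pair.items():
        total = len(patients_by_prescriber[prescriber])
        share = len(patients) / total
        # Skip prescribers with too few patients to draw any conclusion.
        if total >= min_patients and share >= threshold:
            flagged[prescriber] = (pharmacy, share)
    return flagged


# Made-up sample data for illustration only.
fills = [
    ("dr-a", "p1", "pharm-1"), ("dr-a", "p2", "pharm-1"),
    ("dr-a", "p3", "pharm-1"), ("dr-a", "p4", "pharm-1"),
    ("dr-b", "p5", "pharm-1"), ("dr-b", "p6", "pharm-2"),
    ("dr-b", "p7", "pharm-3"), ("dr-b", "p8", "pharm-2"),
]
concentrated = single_pharmacy_prescribers(fills)
```

A high share is only a prompt for review; a rural prescriber near the only pharmacy in town would produce the same pattern legitimately, which is where the geographic data adds context.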

Above: Gilead’s AWS Architecture
Data flows from multiple sources within Gilead’s AWS ecosystem through Starburst for processing before loading into Neo4j AuraDB. This hybrid approach lets Gilead keep its existing AWS data infrastructure while adding purpose-built graph capabilities. The Neo4j Connector for Apache Spark enables efficient data movement between AWS services and the graph database. Amazon Location Service provides geocoding capabilities that enhance fraud detection algorithms. The architecture supports real-time analysis through Neo4j Bloom’s visualization interface and batch processing through Apache Spark integration. Databricks notebooks provide the development environment for data scientists to build and refine graph algorithms, while the Unity Catalog ensures data governance across the entire pipeline.
Behind the scenes, Gilead deploys Graph Data Science algorithms including GraphSAGE, Louvain community detection, and PageRank centrality measures. When investigators need to trace a suspicious network, they’re working with algorithms designed specifically for relationship analysis rather than forcing relational databases to perform tasks they weren’t built for.
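To give a sense of what a centrality measure does, here is a toy PageRank computed by power iteration in plain Python. It is a teaching sketch, not the Neo4j Graph Data Science implementation: the `pagerank` function, the edge list, and the node names are hypothetical, and the damping factor of 0.85 is simply the conventional default.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores for a directed graph given as (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each node passes its rank evenly along its outgoing links.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up referral edges for illustration only.
edges = [
    ("prescriber-1", "pharmacy-x"),
    ("prescriber-2", "pharmacy-x"),
    ("prescriber-3", "pharmacy-x"),
    ("prescriber-3", "pharmacy-y"),
]
scores = pagerank(edges)
most_central = max(scores, key=scores.get)
```

In this toy graph the pharmacy that most prescribers point to ends up with the highest score, which is the intuition behind using centrality to find the hubs of a suspicious network.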
Building a Foundation for Agentic Fraud Detection
Today, Gilead’s graph analytics achieve 1000x faster fraud pattern detection compared to relational databases, with 15-20% improvement in detection rates and 20-50% reduction in false positives. Investigation times that previously required weeks of manual analysis now complete in hours or minutes. For patients, this speed translates to life-saving protection. Criminal networks distributing dangerous counterfeits get shut down before thousands more people receive bottles filled with counterfeit HIV medication.
The platform has saved the equivalent labor of a full-time six-person team, significantly reducing manual effort while making data analysis easier for investigators without deep data science backgrounds. Gilead’s legal team funds its operations through successful fraud recoveries — so speed and accuracy are paramount. AuraDB provides enterprise-ready solutions for identifying fraud and building strong recovery cases. And as telemedicine reshapes healthcare delivery, Gilead can match this evolution, with flexible schemas that accommodate new fraud patterns without database refactoring.
Gilead’s fraud graph implementation represents just the beginning of a larger transformation. Luu envisions an agentic AI approach where investigators can simply ask, “What are the ten most likely fraudulent entities in the last three months?” and receive case packages including supporting documentation, transaction histories, and relationship maps.
“We’re democratizing the data,” Luu explains. “Investigators who have no idea how to work with data can now see connections clearly.”
With counterfeiting incidents increasing 35-fold since 2002 and current anti-counterfeiting measures only 50% effective according to PwC research, traditional approaches need to adapt. As Luu observes, “All of the future AI is going to be based upon this model because it’s the only framework that will be fast enough to keep up.”