Digital twins that learn: connected asset intelligence with Neo4j and Databricks
Global Cloud Partnership Director

Manufacturing plants are instrumented better than ever, producing sensor data at a scale that was unimaginable a decade ago. Yet when something fails, engineers still spend hours tracing the fault through disconnected systems to understand what caused it and what else is at risk.
Digital twins were built to close that gap, but most were architected as monitoring systems rather than reasoning systems. They surface anomalies, trigger alerts, and flag readings outside tolerance. The upstream cause, the propagation path through connected systems, and the fleet-wide failure pattern stay hidden.
The next generation of digital twins needs to do more than mirror an asset’s current state. It needs to understand how that asset is connected, reason across those relationships when something changes, and learn from every resolved fault. That is where Databricks and Neo4j come together: Databricks provides the governed operational data foundation, while Neo4j adds the connected asset intelligence layer that helps the twin explain what happened, what is at risk, and what action to take next.
Why asset topology is the layer that matters
Physical assets are deeply relational. A single aircraft has hundreds of systems, thousands of components, and sensor readings generated continuously across months of operations. Those components don’t fail in isolation. A bearing degrades, stressing a connected shaft and affecting a downstream assembly. The failure propagates through the asset’s topology, and the path it takes depends on configuration, maintenance history, and operating conditions that no single sensor can capture. Telemetry captures the signal, but not the structure that determines where it leads.
Databricks handles this data at scale. Time-series telemetry, maintenance events, flight records, and operational snapshots land in Delta tables, governed and queryable at volume. Anomaly detection and trend analysis run across the full operational history of the fleet.
What telemetry alone cannot express is structure. Neo4j complements this data foundation with a graph-native topology layer that makes it possible to traverse how assets are connected and how faults propagate.
Neo4j stores relationships as physical entities in the graph, so traversing a multi-hop dependency chain across hundreds of components follows pointers rather than executing joins at query time. Fault propagation queries that are expensive in a relational system can resolve in milliseconds in Neo4j. The asset topology (how components relate, depend on each other, and propagate failure) lives in the Knowledge Graph as a set of natively stored relationships that can be traversed directly.
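To make the traversal idea concrete, here is a minimal sketch in plain Python rather than the accelerator's own code. The component names and the `AFFECTS` map are hypothetical. In Neo4j, the same question would typically be expressed as a variable-length path pattern over stored relationships, not an in-memory dictionary.

```python
from collections import deque

# Hypothetical topology: each key directly affects the components in its list.
AFFECTS = {
    "bearing_1": ["shaft_1"],
    "shaft_1": ["gearbox_1", "seal_1"],
    "gearbox_1": ["generator_1"],
    "seal_1": [],
    "generator_1": [],
}

def downstream(start, affects, max_hops=None):
    """Breadth-first walk returning {component: hops from start}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Stop expanding once the optional hop limit is reached.
        if max_hops is not None and hops[node] >= max_hops:
            continue
        for nxt in affects.get(node, []):
            if nxt not in hops:  # visit each component once
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]  # report only the components downstream of the start
    return hops

at_risk = downstream("bearing_1", AFFECTS)
print(at_risk)
```

Each step follows a stored neighbor list instead of recomputing a join, which is the property the paragraph above attributes to native graph storage.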
The Aircraft Digital Twin
That capability is what the Aircraft Digital Twin solution accelerator is built to demonstrate, using a commercial aviation fleet where the stakes of a missed fault are high and the asset topology is genuinely complex. Databricks holds the telemetry layer, optimized for aggregations, trend analysis, and anomaly detection at scale. Neo4j stores the aircraft Knowledge Graph, with component, system, and assembly nodes connected by relationships that reflect the asset’s actual physical dependencies.

A supervisor agent built on Databricks Mosaic AI classifies each question and routes it to the right agent. Operational and statistical questions go to the Genie Space Agent, which queries the Delta tables directly. Structural and dependency questions go to the Neo4j Knowledge Graph Agent, which uses Neo4j’s Graph Data Science library to traverse fault propagation paths, run similarity algorithms to identify components with matching failure profiles across the fleet, and apply community detection to group assets with common risk patterns. The supervisor combines their outputs into one coherent response, giving the engineer a single interface across both data layers.
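The routing pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the accelerator's implementation: the keyword sets, the `classify` function, and the two stub agents are all hypothetical, and a production supervisor would more likely use a language model than keyword matching to classify questions.

```python
# Hypothetical keyword hints standing in for a real question classifier.
GRAPH_HINTS = {"downstream", "upstream", "depends", "connected", "propagate", "similar"}
TABLE_HINTS = {"average", "trend", "count", "maximum", "reading", "readings"}

def classify(question: str) -> str:
    """Return 'graph', 'tables', or 'both' for a natural-language question."""
    words = set(question.lower().replace("?", "").split())
    wants_graph = bool(words & GRAPH_HINTS)
    wants_tables = bool(words & TABLE_HINTS)
    if wants_graph and wants_tables:
        return "both"
    if wants_graph:
        return "graph"
    return "tables"  # default route when no structural hint is present

def genie_agent(question: str) -> str:
    # Stub for the agent that would query the Delta tables.
    return f"[tables] {question}"

def graph_agent(question: str) -> str:
    # Stub for the agent that would query the Knowledge Graph.
    return f"[graph] {question}"

def supervisor(question: str) -> str:
    """Route the question and combine whatever the chosen agents return."""
    route = classify(question)
    parts = []
    if route in ("tables", "both"):
        parts.append(genie_agent(question))
    if route in ("graph", "both"):
        parts.append(graph_agent(question))
    return "\n".join(parts)

print(supervisor("What is downstream of the bearing?"))
```

The useful design point is that the supervisor owns both the routing decision and the merge, so the engineer never has to know which store answered.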
A dual-database architecture for the digital twin
The Aircraft Digital Twin accelerator splits the workload across two platforms, each handling what it does best. Databricks Lakehouse stores hourly sensor readings across a 90-day operational window for the aircraft fleet: columnar, time-series data that suits aggregations, trend analysis, and anomaly detection over wide windows. Neo4j AuraDB stores the topology that gives those readings meaning: aircraft, their systems, components, and sensors, together with the maintenance events, flights, delays, and routes that relate to them.
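As a rough illustration of the kind of windowed analysis that fits the telemetry side, the sketch below flags hourly readings that deviate sharply from a trailing window. It is plain Python with hypothetical values, a hypothetical `flag_anomalies` helper, and an assumed 24-hour window and 3-sigma threshold. It does not represent the accelerator's detection logic, which would run against Delta tables at fleet scale.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=24, threshold=3.0):
    """Return indexes of readings far from the mean of the preceding window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Flag values more than `threshold` sample standard deviations away.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical hourly vibration values: a steady pattern with one spike.
hourly = [1.0 + 0.01 * (h % 5) for h in range(48)]
hourly[40] = 2.5

print(flag_anomalies(hourly))  # prints [40]
```

A flagged index tells you that something changed, but not what it is connected to. That second question is what the topology layer answers.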
An anomalous bearing reading appears on Aircraft 7. In a traditional monitoring environment, that alert starts a manual investigation: an engineer pulls maintenance logs, checks whether similar components have failed on other aircraft, reviews operating history, and tries to determine whether the fault is isolated or part of a broader pattern across the fleet. That process takes hours and depends on the engineer knowing where to look.
In the Aircraft Digital Twin, that investigation starts automatically. Neo4j Agent Memory traverses the Knowledge Graph and surfaces that an identical component failed on Aircraft 3 eight months ago. The Genie Space agent retrieves the maintenance record from the Delta tables — the corrective action, the parts used, and the time to resolution. The Neo4j Knowledge Graph Agent then runs a similarity query across the fleet to identify every aircraft sharing the same component configuration, surfacing four aircraft at elevated risk. A prioritized maintenance schedule is returned before the engineer opens a ticket.
That full sequence completes in seconds: anomaly detected, history retrieved, fleet-wide risk assessed, action recommended. Each step that previously required separate manual investigation across disconnected systems now runs as part of a single connected workflow across the graph and the Delta tables.
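The fleet-wide similarity step in the scenario above can be approximated with Jaccard overlap, a common metric for node similarity. The sketch below is plain Python with hypothetical aircraft identifiers, part numbers, and a hypothetical `similar_aircraft` helper. It is meant to show the shape of the comparison, not the query the accelerator actually runs.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical fleet: aircraft id -> set of installed part numbers.
FLEET = {
    "aircraft_3": {"BRG-100", "SHF-200", "GBX-300"},
    "aircraft_7": {"BRG-100", "SHF-200", "GBX-300"},
    "aircraft_9": {"BRG-100", "SHF-200", "GBX-310"},
    "aircraft_12": {"BRG-150", "SHF-250", "GBX-400"},
}

def similar_aircraft(target, fleet, min_score=0.5):
    """Rank other aircraft by configuration overlap with the target."""
    reference = fleet[target]
    scored = [
        (other, jaccard(reference, parts))
        for other, parts in fleet.items()
        if other != target
    ]
    matches = [(other, score) for other, score in scored if score >= min_score]
    # Highest overlap first; ties broken by aircraft id for stable output.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

print(similar_aircraft("aircraft_7", FLEET))
```

With this sample data, `aircraft_3` matches the target configuration exactly and `aircraft_9` shares two of four distinct parts, so both would appear on the elevated-risk list at the assumed 0.5 cutoff.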
That difference compounds over time. Neo4j Agent Memory stores resolved faults, past queries, and learned patterns as connected data in the Knowledge Graph alongside the structural asset topology. Each fault the agent resolves narrows the search space for the next one. The agent accumulates a structured record of how this specific fleet fails, drawing on every incident it has resolved and every engineer’s decision stored in the graph. This is what transforms a digital twin from a monitoring tool into an asset intelligence platform that improves with use.
In this context, “learning” does not mean replacing engineers or making opaque decisions. It means preserving the context of prior incidents — what failed, what else was affected, what action was taken, and what the outcome was — so future investigations start from accumulated operational knowledge rather than a blank slate.
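A minimal sketch of what preserving incident context could look like is shown below. The `Incident` and `IncidentMemory` classes and all field values are hypothetical and held in memory for illustration. The article describes this record as connected data in the Knowledge Graph, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Incident:
    aircraft: str
    part_number: str       # what failed
    affected: List[str]    # what else was affected
    action: str            # what action was taken
    outcome: str           # what the outcome was

@dataclass
class IncidentMemory:
    incidents: List[Incident] = field(default_factory=list)

    def record(self, incident: Incident) -> None:
        """Store a resolved fault for later investigations."""
        self.incidents.append(incident)

    def recall(self, part_number: str) -> List[Incident]:
        """Return prior incidents involving the same part number."""
        return [i for i in self.incidents if i.part_number == part_number]

memory = IncidentMemory()
memory.record(
    Incident("aircraft_3", "BRG-100", ["shaft_1"], "replaced bearing", "returned to service")
)

# A new anomaly on the same part starts from the earlier resolution.
prior = memory.recall("BRG-100")
for incident in prior:
    print(incident.aircraft, incident.action, incident.outcome)
```

Every resolved fault added through `record` makes later `recall` results richer, which is the narrow sense of "learning" described above.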
Outcomes
For maintenance teams, the practical difference is that problems get smaller before they get bigger. Faults that previously escalated into unplanned groundings are caught earlier because the agent surfaces fleet-wide risk before secondary failures occur. Engineers spend less time on manual investigation and more time on the decisions that require human judgment. And maintenance planning improves over time because every resolved fault adds to the graph’s record of how the fleet actually behaves in operation.
| Outcome | Business impact | What enables it |
| --- | --- | --- |
| Governed fleet intelligence | Sensor readings across the full fleet are consolidated, governed, and ready for anomaly detection and trend analysis at volume | Databricks Lakehouse stores time-series telemetry in Delta tables, queried through Genie Space |
| Faster root-cause analysis | Engineers see what sits downstream of a degrading component before a secondary failure forces an unplanned grounding | Neo4j stores asset relationships as native pointers, resolving multi-hop fault propagation paths in milliseconds |
| Proactive maintenance planning | Maintenance planning targets every aircraft sharing the failing configuration, not just the one that triggered the alert | Similarity queries across the knowledge graph identify matching component configurations fleet-wide |
| Unified engineer experience | Engineers ask in natural language and get a combined response across both telemetry and topology | A supervisor agent on Databricks routes to the Genie agent or the Neo4j agent and combines the result |
Get started
Digital twins are only as useful as the context they can reason across. By combining Databricks’ governed telemetry and operational data with Neo4j’s graph-based asset topology, teams can move beyond monitoring toward connected asset intelligence that explains what happened, identifies what else is at risk, and improves with every resolved fault.
Explore the Aircraft Digital Twin solution accelerator on GitHub to see how Databricks and Neo4j bring telemetry, topology, agentic workflows, and graph memory together in a working reference architecture.








