Enhancing Data Trust & Transparency: Governance and Migration with IBM Manta Data Lineage and Neo4j

Financial organizations need to be able to trust their data and work with it effectively. That includes the ability to:

  • Optimize and modernize the way data moves through their organization
  • Deliver data and cloud migration projects faster, at a lower cost
  • Comply with regulatory requirements around data governance and privacy

To effectively govern and analyze data, businesses must understand how data flows and transforms across various systems, including cloud databases, operational systems, data warehouses, and mainframe sources. Data quality and explainability become even more critical as businesses integrate AI into their workflows and data becomes more complex.

IBM Manta Data Lineage is a world-class data lineage platform. It helps customers answer these questions about how data flows and transforms by automatically scanning code, custom SQL, ETL jobs, and business intelligence reports. It then presents data flows in an interactive, color-coded map that shows end-to-end lineage.

The first generation of the Manta platform was built on a relational database management system (RDBMS), and the limitations of that technology quickly became apparent. Relational databases aren’t built or optimized to handle connected data, so the system struggled with the relationship-based queries needed to see how objects are linked together and to track their data flow.

Graph technology could provide the performance, flexibility, and agility Manta needed, but initial attempts using an open-source Titan graph database failed to keep up with growing demands. Customers needed a partner in lineage who could handle the full scope and complexity of their changing data environment.

Manta conducted a six-month proof of concept involving multiple commercial graph database vendors, ultimately choosing Neo4j for its performance, maturity, and responsive support team.

Multi-Hop Graph Queries Deliver Faster, Deeper Understanding of Data

The Neo4j property graph data model delivers high performance for multi-hop queries, which follow a chain of relationships across several connected nodes in a single traversal. This avoids the slow and expensive join operations that a relational database needs to answer the same question. Companies can now address lineage questions in real time, enhancing the platform’s holistic visibility.
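To make the idea of a multi-hop query concrete, here is a minimal sketch in plain Python rather than in Neo4j itself. It walks a small, hypothetical lineage graph breadth-first and records how many hops separate each downstream asset from the starting point. The asset names, the FLOWS mapping, and the downstream function are assumptions made for illustration; they are not part of IBM Manta Data Lineage or of any Neo4j API.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds data into the listed targets.
FLOWS = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn", "report.revenue"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["report.revenue"],
}


def downstream(start, max_hops=None):
    """Return {asset: hop_count} for every asset reachable from `start`."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Stop expanding once the optional hop limit is reached.
        if max_hops is not None and hops[node] >= max_hops:
            continue
        for target in FLOWS.get(node, []):
            if target not in hops:
                hops[target] = hops[node] + 1
                queue.append(target)
    del hops[start]  # the starting asset is not its own downstream
    return hops


if __name__ == "__main__":
    for asset, distance in downstream("crm.customers").items():
        print(f"{asset} is {distance} hop(s) downstream")
```

Each hop here is a lookup of a node's direct neighbors, so the work done depends on the part of the graph that is actually visited. A relational model would typically need one additional self-join per hop to answer the same question.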

In addition to displaying a graphical map of data and relationships as of the current moment, IBM Manta Data Lineage allows users to run retrospective queries to understand what has happened to data in the past. For example, customers can view the lineage for a Tableau report for the current quarter, at the start of their fiscal year, or any other point in time. Without Neo4j, these queries would take far longer to run.
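One way to picture a point-in-time lineage query is to give every flow a validity period and filter on it while traversing. The sketch below is plain Python under that assumption; the source does not describe how the platform actually stores history. The EDGES list, the asset names, the dates, and both functions are hypothetical.

```python
from datetime import date

# Hypothetical versioned edges: (source, target, valid_from, valid_to).
# valid_to=None means the flow is still active.
EDGES = [
    ("warehouse.sales_v1", "tableau.quarterly_report", date(2023, 1, 1), date(2023, 9, 30)),
    ("warehouse.sales_v2", "tableau.quarterly_report", date(2023, 10, 1), None),
    ("erp.orders", "warehouse.sales_v1", date(2023, 1, 1), date(2023, 9, 30)),
    ("erp.orders", "warehouse.sales_v2", date(2023, 10, 1), None),
    ("crm.accounts", "warehouse.sales_v2", date(2024, 2, 1), None),
]


def is_active(valid_from, valid_to, as_of):
    """True if a flow existed on the given day."""
    return valid_from <= as_of and (valid_to is None or as_of <= valid_to)


def upstream_as_of(asset, as_of):
    """Return every source feeding `asset` as the lineage stood on `as_of`."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for source, target, valid_from, valid_to in EDGES:
            if (
                target == current
                and source not in seen
                and is_active(valid_from, valid_to, as_of)
            ):
                seen.add(source)
                stack.append(source)
    return sorted(seen)


if __name__ == "__main__":
    report = "tableau.quarterly_report"
    print(upstream_as_of(report, date(2023, 6, 1)))
    # ['erp.orders', 'warehouse.sales_v1']
    print(upstream_as_of(report, date(2024, 3, 1)))
    # ['crm.accounts', 'erp.orders', 'warehouse.sales_v2']
```

The same report resolves to different upstream sources depending on the date supplied, which is the behavior the Tableau example describes.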

Neo4j Graph Database Powers Key Data Lineage Use Cases

Regulatory Compliance and Governance

Financial industry and data privacy compliance standards such as Basel II, BCBS 239, and GDPR impose strict requirements for data governance and stewardship, including being able to show data lineage. Neo4j Graph Database helps businesses fulfill this custodial duty and produce evidence for auditors, such as where personal data is stored and how it moves through the organization.


DataOps teams must keep data pipelines flowing smoothly to support a data-driven business. Before a database administrator makes a change to a Microsoft SQL Server system, for example, they first need to make sure they’re not going to disrupt downstream Power BI reports that analysts depend on. Thanks to graph technology, DataOps professionals can quickly trace lineage both forward and backward, which streamlines debugging and makes applications easier to keep reliable and maintain.
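The forward and backward tracing described here can be sketched with two indexes over the same set of edges, one per direction. This is an illustrative Python example, not the platform's implementation. The column and report names, the "powerbi." naming prefix, and the functions reach, impacted_reports, and root_sources are all assumptions.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target).
EDGES = [
    ("sqlserver.sales.orders.amount", "etl.load_orders"),
    ("etl.load_orders", "warehouse.fact_orders.amount"),
    ("warehouse.fact_orders.amount", "powerbi.revenue_dashboard"),
    ("warehouse.fact_orders.amount", "powerbi.margin_report"),
    ("sqlserver.sales.customers.region", "powerbi.margin_report"),
]

# Index the edges once in each direction.
forward = defaultdict(set)
backward = defaultdict(set)
for source, target in EDGES:
    forward[source].add(target)
    backward[target].add(source)


def reach(start, index):
    """Return every asset reachable from `start` by following `index`."""
    seen = set()
    stack = [start]
    while stack:
        for neighbor in index.get(stack.pop(), ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def impacted_reports(asset):
    """Forward lineage: reports that a change to `asset` could affect."""
    return sorted(a for a in reach(asset, forward) if a.startswith("powerbi."))


def root_sources(asset):
    """Backward lineage: original sources, meaning assets with no inputs."""
    return sorted(a for a in reach(asset, backward) if not backward.get(a))


if __name__ == "__main__":
    print(impacted_reports("sqlserver.sales.orders.amount"))
    # ['powerbi.margin_report', 'powerbi.revenue_dashboard']
    print(root_sources("powerbi.margin_report"))
    # ['sqlserver.sales.customers.region', 'sqlserver.sales.orders.amount']
```

A database administrator would consult the forward view before changing a column, while an analyst debugging a wrong number in a report would start from the backward view.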

Cloud Migration

Clear visibility into data lineage helps companies make accurate decisions about cloud migration. Data lineage helps customers determine which tools and systems are still in active use, so they can reduce licenses for unneeded environments, optimize their migration, and speed up their cloud ROI.
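As a rough illustration of how lineage can reveal which systems are still in use, the sketch below flags systems that take part in no recently observed data flow. It assumes that each flow carries the date it last ran, which the source does not state. The system names, the dates, and the inactive_systems function are hypothetical.

```python
from datetime import date

# Hypothetical system inventory.
SYSTEMS = {
    "oracle_legacy",
    "sqlserver_sales",
    "teradata_dw",
    "snowflake_dw",
    "informatica_old",
}

# Hypothetical observed flows: (source_system, target_system, last_run).
FLOWS = [
    ("sqlserver_sales", "snowflake_dw", date(2024, 5, 30)),
    ("oracle_legacy", "teradata_dw", date(2022, 11, 2)),
    ("informatica_old", "teradata_dw", date(2023, 1, 15)),
    ("teradata_dw", "snowflake_dw", date(2024, 5, 28)),
]


def inactive_systems(cutoff):
    """Return systems that appear in no flow that ran on or after `cutoff`."""
    active = set()
    for source, target, last_run in FLOWS:
        if last_run >= cutoff:
            active.update((source, target))
    return sorted(SYSTEMS - active)


if __name__ == "__main__":
    print(inactive_systems(date(2024, 1, 1)))
    # ['informatica_old', 'oracle_legacy']
```

Systems on the resulting list are candidates for review before migration rather than confirmed retirements, since a system can be in use without appearing in any scanned flow.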

Above: A data lineage query highlights the downstream impact of a field edit in red.


Industry-Leading Graph Technology Drives Competitive Advantage

The impact of Neo4j on Manta’s business has been considerable:

  • Enterprise-class performance. Neo4j meets the performance and reliability expectations of IBM Manta Data Lineage’s large-scale banking, insurance, manufacturing, and government customers, whose environments contain tens of millions of relationships between assets.
  • Rapid innovation. From broad API support to 100% Java development, the Manta engineering team has found Neo4j much easier to code with than other solutions. As a result, Manta can quickly add new technologies, capabilities, and enhancements to its platform.
  • Future-proof business. Neo4j’s potential to scale with techniques like sharding was a critical factor in IBM’s choice of Neo4j.

Manta sees exciting new use cases for data lineage as generative AI and large language models (LLMs) transform businesses.

Get in Touch

Curious about what insights you could unlock for your business with graph-powered solutions? Let’s talk — reach out, and we’ll get in touch.