Data Lineage Tool Improves Risk Management, Drives Compliance
Challenge
UBS needed to comply with regulations put in place to strengthen systems for risk data
aggregation and internal risk reporting in the wake of the 2007 global financial crisis. Specifically, UBS sought compliance with standard 239 issued by the Basel Committee on Banking Supervision (BCBS 239).
Under this regulation, banks need to provide transparency into the data flows that feed their risk reporting. This requires broad data governance and detailed data lineage.
Data lineage is an essential component of risk management. Data lineage involves tracking the entire lifecycle of information – its origin, evolution and movement through the organization.
With data lineage, organizations can track information as it flows through the enterprise,
monitor its quality, discover errors and trace them to the source, minimize damage and reduce
data duplication.
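As a rough illustration of tracing an error back to its source, the sketch below models lineage as a directed graph and walks upstream from a report. The dataset names and the FEEDS mapping are hypothetical and are not taken from UBS's systems.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed in its value.
FEEDS = {
    "trade_capture": ["positions"],
    "market_data": ["positions", "var_model"],
    "positions": ["var_model"],
    "var_model": ["risk_report"],
}

def upstream_sources(target, feeds):
    """Return every dataset that directly or indirectly feeds `target`."""
    # Invert the edges so we can walk from a consumer back to its producers.
    producers = {}
    for source, consumers in feeds.items():
        for consumer in consumers:
            producers.setdefault(consumer, []).append(source)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in producers.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Calling upstream_sources("risk_report", FEEDS) lists every dataset that could have introduced an error seen in the report, which is the kind of question a lineage tool is meant to answer.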
UBS built an application called Group Data Dictionary (GDD) as its data lineage and data
governance tool.
The first iteration was built on Oracle, but UBS soon discovered limitations with an RDBMS
approach, which relies on JOINs to connect data across tables. UBS decided it needed a
solution better suited to creating real-time data lineage visualizations and exporting lineage
information for ad hoc analysis via Excel.
Solution
Data lineage is fundamentally a connected-data problem.
“Data lineage is a series of highly connected data, and is more naturally persisted in a graph
database,” explained Sidharth Goyal, a senior software engineer and technical lead at UBS.
Neo4j offered several advantages over a relational database, including its Cypher query language.
“Cypher allowed us to much more easily traverse connected data, especially compared to PL/SQL, which relies on JOINS across multiple tables to generate the lineage in a relational database format, add a processing layer to format this as an object and then visualize it. Cypher and Neo4j are a much more natural fit for the work we’re trying to do,” said Goyal.
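The contrast Goyal describes can be sketched roughly as follows. The relational half uses an in-memory SQLite table as a stand-in for Oracle, and the table name, column names and dataset names are all hypothetical. The Cypher pattern is shown as text only and is not executed; its label, relationship type and property name are likewise assumptions.

```python
import sqlite3

# Hypothetical lineage edges stored relationally: one row per "source feeds target" link.
EDGES = [
    ("trade_capture", "positions"),
    ("positions", "var_model"),
    ("var_model", "risk_report"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feeds (source TEXT, target TEXT)")
conn.executemany("INSERT INTO feeds VALUES (?, ?)", EDGES)

# Relational style: a fixed three-hop lineage needs one self-JOIN per extra hop.
THREE_HOP_SQL = """
SELECT a.source, a.target, b.target, c.target
FROM feeds a
JOIN feeds b ON b.source = a.target
JOIN feeds c ON c.source = b.target
WHERE c.target = ?
"""
sql_path = conn.execute(THREE_HOP_SQL, ("risk_report",)).fetchone()

# Graph style: a variable-length Cypher pattern covers any number of hops.
# Label, relationship type and property are assumptions; the query is not run here.
LINEAGE_CYPHER = (
    "MATCH path = (src:DataElement)-[:FEEDS*]->(dst:DataElement {name: $name}) "
    "RETURN path"
)
```

The SQL has to be rewritten whenever the lineage depth changes, whereas the variable-length pattern in the Cypher text does not depend on how many hops separate a source from the report.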
The new data lineage and data governance tool would need to smoothly integrate with
the legacy system. All UBS workflows and auditing capabilities remained on Oracle, so
synchronization was essential.
UBS synchronized Neo4j with the Oracle system, starting with an initial data load and then
performing an incremental sync in which transactions were read from an Oracle table and
written into Neo4j in real time.
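A minimal sketch of the incremental step is shown below, assuming each change is recorded as a row with an increasing id and an operation type. The source does not describe the actual transaction format, so the field names (id, op, source, target) are hypothetical, and a plain Python set stands in for Neo4j.

```python
def apply_transactions(graph, transactions, last_synced_id=0):
    """Apply change rows newer than `last_synced_id` to an in-memory graph.

    `graph` is a set of (source, target) edges standing in for Neo4j;
    `transactions` stands in for rows read from the Oracle table.
    Returns the new high-water mark.
    """
    for tx in sorted(transactions, key=lambda t: t["id"]):
        if tx["id"] <= last_synced_id:
            continue  # already applied in an earlier run
        edge = (tx["source"], tx["target"])
        if tx["op"] == "ADD":
            graph.add(edge)
        elif tx["op"] == "REMOVE":
            graph.discard(edge)
        last_synced_id = tx["id"]
    return last_synced_id

# Initial load followed by one incremental pass (hypothetical data).
graph = {("trade_capture", "positions")}
changes = [
    {"id": 1, "op": "ADD", "source": "positions", "target": "var_model"},
    {"id": 2, "op": "REMOVE", "source": "trade_capture", "target": "positions"},
]
watermark = apply_transactions(graph, changes)
```

Keeping a high-water mark means a repeated run over the same rows is harmless, which matters when the sync job is restarted.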
UBS used Neo4j to evaluate data lineages and express the results in GraphJSON. This output flows into a D3.js visualizer, which renders it as a lineage diagram.
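To give a sense of what a nodes-and-edges document for a visualizer might look like, here is a small sketch. The exact schema UBS used is not given in the source, so the field names (id, caption, source, target) and the sample lineage are assumptions.

```python
import json

def to_graph_json(edges):
    """Convert (source, target) pairs into a nodes-and-edges dictionary."""
    # Collect each distinct dataset name once, in a stable order.
    names = sorted({name for edge in edges for name in edge})
    return {
        "nodes": [{"id": name, "caption": name} for name in names],
        "edges": [{"source": s, "target": t} for s, t in edges],
    }

# Hypothetical lineage result serialized for a front-end visualizer.
lineage = [("positions", "var_model"), ("var_model", "risk_report")]
payload = json.dumps(to_graph_json(lineage), indent=2)
```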
Having all the metadata available in the graph simplifies reporting. The data can be used for ad hoc reporting
when specific questions arise, and entire lineages can be exported to Excel.
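One simple way to hand a lineage to Excel is a CSV file, which Excel opens directly. The sketch below assumes that approach; the source does not say which export mechanism UBS used, and the column names are hypothetical.

```python
import csv
import io

def lineage_to_csv(edges):
    """Render (source, target) lineage edges as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()
```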