Advantages of a Graph-Based Metadata Repository [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Many higher education institutions, like the University of Washington (UW), are implementing large, complex Software-as-a-Service solutions for HR, payroll and other administrative systems. While these enterprise systems provide useful new tools and resources, organizations face a significant challenge in helping users understand what’s changing and how to prepare for that change.

The UW is in the process of replacing its 30+ year-old HR and payroll system with a Software-as-a-Service human capital management tool. This multi-year effort is the largest administrative transformation in UW’s history, touching every person and department at the University. Knowledge Navigator (KN) was created to serve as a metadata repository and to facilitate the system migration. KN represents concepts and data relationships both visually and textually, highlighting linkages between the old and new systems in a web-based, interactive platform, as seen below.
The University of Washington metadata repository used by the Knowledge Navigator app
KN keeps users informed and engaged throughout the enterprise system migration by providing self-service access to conceptual and technical descriptions, definitions, lineage, interactive relationship maps, and impact analysis information.

Picking the Right Solution

When effectively governed, a metadata repository establishes a common understanding and shared expectations across the University. It provides a view into the flow of data, the ability to perform impact analysis, a common business vocabulary and accountability for its terms and definitions. Comprehensive metadata management is vital to enabling an organization to oversee changes while delivering trusted, secure data in a complex data integration environment. Solid metadata management tools play a central role in holistic system management, including system migration.

We evaluated multiple metadata solutions, from custom SharePoint repositories to custom relational database (RDBMS) solutions to commercial vendor tools. None worked well: they did not offer the amount of customization we required, and they could not “stitch” together various data sources into a complete before-and-after picture. In 2014, at a national information management conference, the University of Notre Dame presented on graph databases. A subsequent collaboration with Notre Dame inspired the UW team to move to Neo4j as the database of choice.

Keep It Simple

Many metadata and data governance efforts fail because they attempt to accomplish too much at once. We focused on a specific problem we wanted to solve: how to most easily demonstrate change from the “old-to-new” perspective. Solving it could return tangible value without boiling the ocean in terms of data volume or organizational and architectural complexity. The data model for Knowledge Navigator is relatively simple, as seen in the partial data model below for databases and business intelligence (BI) reports:
The data model for the UW metadata repository

Relevance

Knowledge Navigator stands out among metadata repositories because it communicates changes directly to the University’s data end users. Built into KN is help for understanding the impact of migrating from a mainframe system to a cloud-based system, and for preparing for the changes. Data lineage maps illustrate the parallels between the old and new systems. Interactive diagrams invite users to select objects and artifacts to understand the relationships between them. KN explains the new concepts and definitions users should know to work effectively within the new system. Users can view high-level conceptual models as well as technical metadata about tables and columns. For example, users can see that the legacy table called Person is related to Person in the new system, that 14 columns from legacy Person relate to columns in the new Person, and that there is a brand-new, yet related, table called AcademicAppointee.
The Knowledge Navigator change management tool
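The old-to-new lineage described above can be pictured as a set of directed links from legacy objects to their counterparts in the new system. The following is a minimal Python sketch of that idea, not KN's actual implementation, which stores these relationships in Neo4j. Only the table names Person and AcademicAppointee come from the post; the `legacy.`/`new.` prefixes, the column names and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage links: (legacy object, new-system object).
# Only the Person and AcademicAppointee table names appear in the post;
# the prefixes and column names are invented for illustration.
LINEAGE = [
    ("legacy.Person", "new.Person"),
    ("legacy.Person", "new.AcademicAppointee"),
    ("legacy.Person.last_name", "new.Person.legal_last_name"),
    ("legacy.Person.birth_dt", "new.Person.birth_date"),
]


def build_lineage_index(edges):
    """Group links by legacy object so each one lists its new-system counterparts."""
    index = defaultdict(list)
    for old, new in edges:
        index[old].append(new)
    return index


def successors(index, legacy_name):
    """Return the new-system objects related to a legacy object, sorted by name."""
    return sorted(index.get(legacy_name, []))


index = build_lineage_index(LINEAGE)
print(successors(index, "legacy.Person"))
```

Looking up `legacy.Person` returns both the new Person table and AcademicAppointee, which mirrors the kind of answer the interactive lineage maps give a user.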

Data Lineage and Impact Analysis

KN identifies affected tables and columns in the reporting operational data store, as well as affected business intelligence reports. In the screenshot below, all source tables are displayed for the Academic Personnel Appointment Report.
An Academic Personnel Appointment Report in Knowledge Navigator
The report dependency data can be exported to a CSV file and then compared with report traffic information to perform report impact analysis:
An impact analysis of reports via the metadata repository
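As a rough illustration of that comparison, the sketch below joins a report-dependency export with report traffic data and ranks the impacted reports by usage. It assumes two small CSV files; the column names (`report`, `source_table`, `views_last_90_days`), the figures, and every report or table name other than the Academic Personnel Appointment Report are invented for the example and are not taken from KN.

```python
import csv
import io

# Hypothetical CSV exports; column names and values are invented for illustration.
DEPENDENCIES_CSV = """report,source_table
Academic Personnel Appointment Report,Person
Academic Personnel Appointment Report,Appointment
Payroll Summary,Paycheck
"""

TRAFFIC_CSV = """report,views_last_90_days
Academic Personnel Appointment Report,420
Payroll Summary,15
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def rank_impacted_reports(dependencies, traffic, affected_tables):
    """Return (report, views) pairs for reports that read any affected table, busiest first."""
    views = {row["report"]: int(row["views_last_90_days"]) for row in traffic}
    impacted = {
        row["report"] for row in dependencies if row["source_table"] in affected_tables
    }
    # Sort by descending views, then by name so ties have a stable order.
    return sorted(
        ((name, views.get(name, 0)) for name in impacted),
        key=lambda pair: (-pair[1], pair[0]),
    )


ranked = rank_impacted_reports(
    load_rows(DEPENDENCIES_CSV), load_rows(TRAFFIC_CSV), {"Person", "Paycheck"}
)
for report, view_count in ranked:
    print(f"{report}: {view_count}")
```

Sorting by traffic puts the most heavily used affected reports first, which is one plausible way to prioritize remediation work.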
Looking ahead, on the heels of the HR/Payroll Modernization (HRPM) program, the University will replace its equally outmoded finance system. The Finance Business Transformation (FBT) program promises to be even larger and more complex than HRPM, and we are already working with FBT leadership to plan the data that will go into KN to facilitate impact analysis and user guidance throughout the project.

In addition, internal project teams use KN to document metadata that is used in the development process and is not accessible to the general public, such as internal glossaries and databases, internal notes and links, and source-to-target mappings of databases, APIs and other data sources. Likewise, new data buildout projects benefit from the exposed source-to-target mappings and data transformations. Developers and testers save time and effort through ready access to these critical underlying details.
The data relationships in impact analysis
Using all the impact relationships, we can easily query Neo4j to show which tables have the most dependencies, as shown below:
Learn about the advantages of a metadata repository backed by a graph database in this UW case study
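The post does not include the query itself. As a stand-in, here is a minimal in-memory Python sketch of the same aggregation: count how many distinct objects depend on each table and list the tables with the most dependents first. In KN this would be a query against Neo4j; the relationship list, the function name and all names other than Person and the Academic Personnel Appointment Report are hypothetical.

```python
from collections import Counter

# Hypothetical impact relationships: (dependent object, table it depends on).
IMPACTS = [
    ("Academic Personnel Appointment Report", "Person"),
    ("Academic Personnel Appointment Report", "Appointment"),
    ("Payroll Summary", "Person"),
    ("Headcount Dashboard", "Person"),
    ("Headcount Dashboard", "Appointment"),
    ("Payroll Summary", "Paycheck"),
]


def most_depended_on(impacts, limit=10):
    """Count distinct dependents per table and return the top `limit` tables."""
    # set() drops duplicate relationships so each dependent counts once per table.
    counts = Counter(table for _, table in set(impacts))
    # Sort by descending count, then by table name so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


top_tables = most_depended_on(IMPACTS)
for table, dependents in top_tables:
    print(f"{table}: {dependents}")
```

With this sample data, Person comes out on top because three different reports read from it.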

Conclusion

Knowledge Navigator has become an essential tool for data users and metadata repository managers to understand the meaning, usage and impact of data and business concepts at the University of Washington. Using Neo4j as our database, we are well-positioned to expand its capabilities to include the metadata necessary to build it into a powerful enterprise tool.