Rethink Your Master Data: The Limits of Relational Databases

Senior Director of Global Solutions, Neo4j

June 29, 2020

Data is both our most valuable asset and our biggest ongoing challenge. As data grows in volume, variety and complexity, across applications, clouds and siloed systems, traditional ways of working with data no longer work.

Increasingly, businesses are recognizing a need to harness all of their data, particularly their data around customers, products, partners and more – often called master data. Pressing business priorities such as compliance and digital transformation require a holistic view of this master data.

Achieving that holistic view requires connecting data across a myriad of sources and silos. Connecting data using flexible graph technology offers a proven approach to solving these data challenges, capturing not only data but an unlimited number of connections and relationships between data.

This blog series describes the power of connecting your most important data about customers, products, employees, business partners and more using graph technology. Along the way, real-world use cases from global enterprises to disruptive startups illustrate the power of connected data.

This week, in blog two of our five-part blog series, we examine the limits of relational databases for analyzing connected data and delve into the way graph technology overcomes those limitations.

The Limits of Relational Databases

Most MDM systems rely on relational databases with grid-like structures that are not optimized for traversing relationships. Despite the name, relational databases are not designed to capture relationships between data points. Even with all the recent advances in computer processing and high-speed networks, the performance of relational database applications continues to lag when it comes to ad hoc, multi-hop queries.

The root cause usually boils down to one factor: queries about data relationships.

Relational databases were not built to handle connected information, so queries about data relationships require numerous JOIN operations. These operations are costly in terms of computing and memory – and the burden rises exponentially with the size and complexity of queries. Lengthy SQL statements are required to accomplish even simple operations. Performance degrades sharply with the number and levels of data relationships (hops) and the size of the database.

While relational databases continue to serve many purposes, they do not serve connected data use cases effectively. Because JOINs are expensive, relational databases struggle to analyze relationships beyond about three hops. Such multi-hop queries are time-consuming and may even hang, never returning an answer.
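To make the JOIN burden concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `person` and `knows` tables, their columns and the helper names are hypothetical, invented purely for illustration. The point it shows is that every additional hop adds another self-join to the SQL statement.

```python
import sqlite3


def build_hop_query(hops: int) -> str:
    """Build SQL that finds people exactly `hops` steps away from a start person.

    Assumes hypothetical tables person(id, name) and knows(src_id, dst_id).
    Every hop after the first adds one more self-join on `knows`.
    """
    if hops < 1:
        raise ValueError("hops must be >= 1")
    parts = ["SELECT DISTINCT p.name FROM knows k1"]
    for i in range(2, hops + 1):
        parts.append(f"JOIN knows k{i} ON k{i}.src_id = k{i - 1}.dst_id")
    parts.append(f"JOIN person p ON p.id = k{hops}.dst_id")
    parts.append("WHERE k1.src_id = ? ORDER BY p.name")
    return " ".join(parts)


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database: Ann -> Bob -> Cy -> Dee."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE knows (src_id INTEGER, dst_id INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
        INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4);
        """
    )
    return conn


def people_at_hops(conn: sqlite3.Connection, start_id: int, hops: int) -> list:
    """Run the generated multi-join query and return the matching names."""
    rows = conn.execute(build_hop_query(hops), (start_id,)).fetchall()
    return [name for (name,) in rows]
```

Calling `people_at_hops(make_demo_db(), 1, 3)` runs a statement with three JOINs to answer a three-hop question. On a toy dataset this is instant; the concern raised above is what happens when each of those joins has to match against tables with millions of rows.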

The Power of Graph Technology

Graph databases connect all types of data stores – both flexibly and at scale – providing a sweet spot that complements existing databases. Graphs enable next-generation approaches that connect master data wherever it is by building a metadata fabric that weaves connections in the underlying data.

Graph queries are fast, nimble and able to identify and exploit the natural connections hidden in data – and this advantage increases with scale and complexity. With graph databases, queries on connected data are much faster: ten times faster is normal, and in some cases performance is a thousand or even a million times faster than with a relational database.

The advantages of graph technology include:

Support for any query
Lightning fast, no matter how many connections (hops)
Simple query language
Complements existing systems; no need to rip and replace
AI/ML on connected data using graph algorithms
Visualization and communication (whiteboard style structure)

Introducing Neo4j

Neo4j is the leading graph database platform. Hundreds of organizations from industries such as financial services, government, energy, software, retail, media and manufacturing have turned to Neo4j.

Neo4j stores and queries data as nodes (entities) and relationships (connections). Nodes linked by relationships form a network. Think of nodes as nouns and relationships as verbs. Properties can be attached to both nodes and relationships, akin to adjectives and adverbs, respectively.
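The noun-and-verb analogy can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not Neo4j's API or storage format; the class names, the `PURCHASED` relationship and the sample properties are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """An entity (a "noun"), with a label and descriptive properties."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A connection (a "verb") from one node to another, with its own properties."""
    start: Node
    type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def describe(rel: Relationship) -> str:
    """Render a relationship as a small arrow diagram using each node's name."""
    start_name = rel.start.properties.get("name")
    end_name = rel.end.properties.get("name")
    return f"({start_name})-[{rel.type}]->({end_name})"


# Hypothetical master data: a customer who purchased a product.
ann = Node("Customer", {"name": "Ann", "segment": "enterprise"})
laptop = Node("Product", {"name": "Laptop", "category": "hardware"})
purchase = Relationship(ann, "PURCHASED", laptop, {"quantity": 2})
```

Here `describe(purchase)` yields `(Ann)-[PURCHASED]->(Laptop)`, and the `quantity` property lives on the relationship itself rather than in a separate join table.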

Relational databases force data into a pre-defined model; in contrast, graphs capture the natural structure of a given dataset. Information is stored according to how it is retrieved – thus revealing how individual entities are naturally connected.

In a graph, the relationships between data are as important as the data points themselves, so they are stored alongside the data rather than derived later. By contrast, relational databases compute relationships at query time through expensive JOIN operations.

Graph databases excel at managing highly connected data and complex queries. Neo4j uses the Cypher query language (similar to SQL but designed for graphs). With a native graph database, you can traverse millions of connections per second.
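The difference between traversing and joining can be illustrated with a short Python sketch. It is a simplified stand-in, not a description of how Neo4j is implemented: connections are kept in an in-memory adjacency dictionary, and a breadth-first walk follows them hop by hop. The function names and sample data are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Store each connection once, keyed by the node it starts from."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Return {node: hop_count} for every node reachable within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own neighbor
    return distances


# Hypothetical connections between four people, forming a cycle.
edges = [("Ann", "Bob"), ("Bob", "Cy"), ("Cy", "Dee"), ("Dee", "Ann")]
adjacency = build_adjacency(edges)
```

Raising `max_hops` here changes a single argument, and each step only touches the connections of nodes already reached, whereas the relational approach needs an extra JOIN per hop.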

Conclusion

As we have shown, most MDM systems rely on relational databases with grid-like structures that are not optimized for traversing relationships. But graph databases connect all types of data stores – both flexibly and at scale – providing a sweet spot that complements existing databases. The limitations of relational databases for connected data can be durably overcome through the use of graph technology.

Next week, in blog three of this series, we will show how graphs and MDM intersect and give examples of how companies across industries use Neo4j to get more value from their connected data.
