
What is a graph, or graph database and why should I care?

A graph database is a type of NoSQL database optimized for use cases that involve connected data. Connected data is prevalent in social networking, logistics networks (for package routing), financial transaction graphs (for detecting fraud), telecommunications networks, optimization, recommendation engines, bioinformatics, and many other places. Companies like Adobe, Cisco and Deutsche Telekom have found that Neo4j (which is the most popular graph database in the world) frequently outperforms a traditional database by a factor of one thousand when it comes to queries on connected data.

How does it work?

Graph databases assume that the relationships are as important as the records. This difference has numerous consequences for ease of use as well as for performance. A table-based system is a good fit for static and simple data structures, while a graph-based system is a better fit for complex and dynamic data.

What are its benefits?

  • Minutes-to-Milliseconds Performance: Compared with relational databases and other NoSQL alternatives, Neo4j turns complex joins into simple, fast graph traversals. Neo4j can be over 1000x faster than relational databases for connected data queries.
  • Fully ACID, Robust & Scalable: Neo4j is an enterprise database with full support for high-availability clustering and transactions.
  • Drastically Accelerated Development Cycles: Thanks to its flexible data model and Cypher, an intuitive graph query language, Neo4j is easy to use.
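The difference between a join and a traversal can be sketched in a few lines of Python. This is only an illustration of the two access patterns, not a benchmark and not how Neo4j is implemented; the `knows` data and both function names are hypothetical. The first function answers "friends of friends" by matching rows of an edge table against each other, the way a self-join would; the second follows adjacency sets directly from the starting person.

```python
from collections import defaultdict

# Hypothetical "knows" edge table: (person, friend) rows.
knows = [("ann", "bob"), ("bob", "cara"), ("bob", "dev"), ("eli", "fay")]

def friends_of_friends_join(person):
    """Self-join style: compare every row against every other row."""
    return {
        b2
        for a1, b1 in knows
        for a2, b2 in knows
        if a1 == person and b1 == a2 and b2 != person
    }

# Graph style: each person points directly at their neighbours.
adjacency = defaultdict(set)
for a, b in knows:
    adjacency[a].add(b)

def friends_of_friends_traversal(person):
    """Traversal style: only touch the neighbourhood of the start node."""
    return {
        fof
        for friend in adjacency.get(person, ())
        for fof in adjacency.get(friend, ())
        if fof != person
    }
```

Both functions return the same answer; the traversal simply never looks at rows unrelated to the starting person, which is the intuition behind the performance claim for connected data queries.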

What makes it different?

It used to be that databases were just tasked with digitizing forms and automating business processes. The data were often tabular (take an accounting ledger, for example) and the processes being modeled were reasonably static. Today, the types of data that we are interested in are much more diverse and dynamic. We are interested in capturing information about all sorts of things that are happening around us, which requires us to deal with dynamic systems that often generate large quantities of data that are semi-structured and volatile, where the connections between the discrete data points are as important as the data points themselves.

Is Facebook the first to use it?

The first organizations to develop solutions for these problems were Internet giants faced with extreme growth in a highly dynamic and rapidly evolving world. Amazon developed DynamoDB and SimpleDB to deal with the problem of product catalogs, shopping carts, and order data. This made it possible for them to quickly evolve the business as it grew to sell an increasing variety of items: from books to music to videos, to software and household goods. Facebook developed Cassandra to back its growing messaging platform, and Google developed Bigtable to handle the variety of data that is the World Wide Web. There is no doubt that these pioneers were (and still are) pushing the limits. However, as organizations embrace new challenges that entail volume, velocity, variety, and connectedness, more companies are coming up against the same limitations when they try to apply relational technology to these problems. The way was paved for the emergence of new database technologies, leading to the rich choice of technologies available in today’s market.

How do Graphs address complexity?

To understand how graphs address data complexity, we need first to understand the nature of the complexity itself. In practical terms, data gets more complex as it gets bigger, more semi-structured, and more densely connected.

What does this have to do with Big Data?

The volume of net new data being created each year is growing exponentially, a trend that is set to continue for the foreseeable future. But increased volume isn’t the only force we have to contend with today: on top of this staggering growth in the volume of data, we are also seeing an increase in both the amount of semi-structure and the degree of connectedness present in that data.

What is Semi-structured data?

Semi-structured data is messy data: data that doesn’t fit into a uniform, one-size-fits-all, rigid relational schema. It is characterized by the presence of sparse tables and lots of null-checking logic, all of it necessary to produce a solution that is fast enough and flexible enough to deal with the vagaries of real-world data. Increased semi-structure, then, is another force with which we have to contend, besides increased data volume. As data volumes grow, we trade insight for uniformity; the more data we gather about a group of entities, the more that data is likely to be semi-structured.
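A small Python sketch may make the "sparse tables and null checking" point concrete. Everything here is made up for illustration: the column names, the sample people, and both helper functions. The first half forces every entity into the same fixed columns, so most cells are empty and the reading code has to filter them out; the second half lets each entity carry only the properties it actually has.

```python
# Fixed schema: every row must supply every column, so most cells are None.
COLUMNS = ("name", "email", "twitter", "fax")
rows = [
    ("Ann", "ann@example.com", None, None),
    ("Bob", None, "@bob", None),
]

def contact_channels_from_row(row):
    """Null-checking logic: skip every column that has no value."""
    return {
        column: value
        for column, value in zip(COLUMNS[1:], row[1:])
        if value is not None
    }

# Property-map style: each entity stores only what is known about it.
people = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "twitter": "@bob"},
]

def contact_channels(person):
    """No null checks needed: absent properties are simply absent."""
    return {key: value for key, value in person.items() if key != "name"}
```

Adding a new kind of contact channel to the second representation touches only the entities that have one, whereas the fixed schema would need a new, mostly empty column.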

What is Connected Data?

Insight and end-user value do not simply result from ramping up volume and variation in our data. Many of the more important questions we want to ask of our data require us to understand how things are connected. Insight depends on us understanding the relationships between entities and, often, the quality of those relationships.

What are some examples of connected data queries?

Here are some examples, taken from different domains, of the kinds of important questions we ask of our data:
  • Which friends and colleagues do we have in common?
  • What’s the quickest route between two stations on the metro?
  • What do you recommend I buy based on my previous purchases?
  • Which products, services and subscriptions do I have permission to access and modify? Conversely, given this particular subscription, who can modify or cancel it?
  • What’s the most efficient means of delivering a parcel from A to B?
  • Who has been fraudulently claiming benefits?
  • Who owns all the debt? Who is most at risk of poisoning the financial markets?
To answer each of these questions, we need to understand how the entities in our domain are connected. In other words, these are graph problems.
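Two of the questions above can be sketched directly as operations on a graph. The following Python is a minimal illustration with invented data (`friends`, `metro`) and invented function names; it is not a graph database, and the route search treats every hop as equal, so it finds the route with the fewest stops rather than the shortest travel time.

```python
from collections import deque

# "Which friends do we have in common?" as a set intersection.
friends = {
    "ann": {"bob", "cara", "dev"},
    "eli": {"bob", "dev", "fay"},
}

def friends_in_common(a, b):
    return friends.get(a, set()) & friends.get(b, set())

# "What's the quickest route between two stations?" as a breadth-first search.
metro = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def fewest_stops(graph, start, goal):
    """Return a path with the fewest hops from start to goal, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        station = queue.popleft()
        if station == goal:
            path = []
            while station is not None:
                path.append(station)
                station = previous[station]
            return path[::-1]
        for neighbour in graph.get(station, []):
            if neighbour not in previous:
                previous[neighbour] = station
                queue.append(neighbour)
    return None
```

In both cases the answer comes from following connections between entities rather than from inspecting any single record, which is what makes these graph problems.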

Why are these Graph Problems?

Because graphs are the best abstraction we have for modeling and querying connectedness. Moreover, the malleability of the graph structure makes it ideal for creating high-fidelity representations of a semi-structured domain. Traditionally relegated to the more obscure applications of computer science, graph data models are today proving to be a powerful way of modeling and interrogating a wide range of common use cases. Put simply, graphs are everywhere.

How is a Graph Database different from a Relational Database?

Today, if you’ve got a graph data problem, you can tackle it using a graph database—an online transactional system that allows you to store, manage and query your data in the form of a graph. A graph database enables you to represent any kind of data in a highly accessible, elegant way using nodes and relationships, both of which may host properties:
  • Nodes are containers for properties, which are key-value pairs that capture an entity’s attributes. In a graph model of a domain, nodes tend to be used to represent the things in the domain. The connections between these things are expressed using relationships.
  • A relationship has a name and a direction, which together lend semantic clarity and context to the nodes connected by the relationship. Like nodes, relationships can also contain properties: attaching one or more properties to a relationship allows us to weight that relationship, or describe its quality, or otherwise qualify its applicability for a particular query.
The key thing about such a model is that it makes relations first-class citizens of the data, rather than treating them as metadata. As real data points, they can be queried and understood in their variety, weight and quality: important capabilities in a world of increasing connectedness.
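The node and relationship model described above can be sketched as two small Python classes. This is an illustrative in-memory sketch only, not the storage format or API of any particular graph database; the class names, the `outgoing` helper, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A container for properties (key-value pairs)."""
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """A named, directed connection that can carry its own properties."""
    start: Node
    name: str
    end: Node
    properties: dict = field(default_factory=dict)

alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})

# The relationship is a data point in its own right: it has a name,
# a direction (alice -> bob), and a property that qualifies it.
knows = Relationship(alice, "KNOWS", bob, {"since": 2010})

def outgoing(node, relationships, name):
    """Follow relationships of a given name that start at the node."""
    return [r.end for r in relationships if r.start is node and r.name == name]
```

Because the relationship carries its own name, direction, and properties, a query can filter on it directly, for example by following only `KNOWS` relationships or only those with a particular `since` value.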

Who else has used it before?

Today, the most innovative organizations are leveraging graph databases as a way to solve the challenges around their connected data. These include major names such as Google, Twitter, Adobe, and American Express. Graph databases are also being used by organizations in a range of fields including finance, education, web, ISV, and telecom and data communications.

What are some examples?

  • Adobe Systems currently leverages a graph database to provide social capabilities to its Creative Cloud – a new array of services to media enthusiasts and professionals. A graph offers clear advantages in capturing Adobe’s rich data model fully, while still allowing for high performance queries that range from simple reads to advanced analytics. It also enables Adobe to store large amounts of connected data across three continents, all while maintaining high query performance.
  • Europe’s number one professional network, Viadeo, has integrated a graph database to store all of its users and relationships. Viadeo currently has 40 million professionals in its network and requires a solution that is easy to use and capable of handling major expansion. Upon integrating a graph model, Viadeo has accelerated its system performance by more than 200 percent.
  • Telenor Group, one of the top ten wireless telecom companies in the world, uses a graph database to manage its customer organizational structures. The ability to model and query complex data such as customer and account structures with high performance has proven to be critical to Telenor’s ongoing success.
  • Deutsche Telekom leverages a graph database for its highly scalable social soccer fan website, which attracts tens of thousands of visitors during each soccer match. The graph database provides painless data modeling, seamless data model extensibility, and high performance and reliability.
  • Squidoo is the popular social publishing platform where users share their passions. They recently created a product called Postcards, which are single page, beautifully designed recommendations of books, movies, music albums, quotes, and other products and media types. A graph database ensures that users have an awesome experience as it provides a primary data store for the Postcards taxonomy and the recommendation engine for what people should be doing next.
Such examples prove the pervasiveness of connections within data and the power of a graph model to optimally map relationships. A graph database allows you to further query and analyze such connections to provide greater insight and end-user value. In short, graphs are poised to deliver true competitive advantage by offering deeper perspective into data as well as a new framework to power today’s revolutionary applications.

What does this mean for business?

Graphs are a new way of thinking for explicitly modeling the factors that make today’s big data so complex: semi-structure and connectedness. As more and more organizations recognize the value of modeling data with a graph, they are turning to the use of graph databases to extend this powerful modeling capability to the storage and querying of complex, densely connected structures. The result is the opening up of new opportunities for generating critical insight and end-user value, which can make all the difference in keeping up with today’s competitive business environment.
