Neo4j as an Embedded Database: When Does Embedding a Graph DB Make Sense?



An embedded database is a database used inside another company’s application, providing added value and functionality. It enhances the functionality of the “host” application, usually without the end user realizing they are engaging with the embedded database.

In this blog series, we’ll discuss how Neo4j can be used as an embedded database. In part 2 of the series, we’ll help you identify when embedding a graph database makes sense.

For all of the reasons stated in the first blog, graph databases are a foundational part of a huge number of applications. It is natural that ISVs and other product developers would seek to embed this capability into their products.

When examining the question of embedding, it is crucial to understand when graphs will make a transformational difference in your application.

Graph database use cases and scenarios run the gamut. Common use cases include human capital management; semantic search and data lineage; cybersecurity and fraud prevention; supply chain visibility; and monitoring IT infrastructure. By studying these you can understand some of the ways a graph database can fit into your application.

The Big Picture of How Graph Databases Create Value


The high-level story about why graph databases are so attractive can be told quickly by examining two insights:
    • Graphs unlock the power of your data
    • Graphs are everywhere, and growing fast

Unlocking the Power of Connected Data


Since Neo4j introduced its first open source database in 2007, it has been used in hundreds of different use cases as a catalyst for making applications more effective.

Graph databases open a new, richer avenue of understanding and insight for applications by capitalizing on the inherent relationships in the whole dataset. In contrast, more traditional structured data models, which strip away much of this context, often hit a plateau, as shown in Figure 2. Without a graph database, an application's ability to help users in new ways stalls.

With the increased data context that a graph database provides, new questions can be asked and answered, new types of analytics can be created, and new patterns can be revealed. Analysis of the connections in a graph database starts telling you more about everything.

As a result, the power of the applications, and the effectiveness of the use cases supported, starts to grow in new ways.

Connected Data Elevates Use Case Effectiveness

Graph Ubiquity


Graphs also dramatically expand the amount of data that can be captured, organized, and analyzed. We have seen applications become more powerful not only by making better use of the data they have, but also by incorporating vast new quantities of data in the three categories shown below.

Graph Scaling of Neo4j

Graphs of things. Graphs of things help organizations to build a complete picture of their customers, products, services, and so on. Graphs of things are a great way to start to explore how graphs can add value. For example, you can build a better profile and a deeper understanding of your customers as they connect across multiple touchpoints and lines of business.

The largest Neo4j databases have billions of nodes and connections, representing all the things relevant to a business, whether they are people, products, devices, sensors, or anything else. When they get started, most organizations have anywhere from a few thousand nodes and connections on up. Without a graph, many organizations can’t connect their customers across all touchpoints, products, purchases, interactions, and engagements.

Gaining a comprehensive view across these disparate data points will help an organization use these connections to gain a better understanding of who their customers are and how to market to them more effectively.

A telecom might use the same principle, but rather than knitting together the connections between customers and products, it connects all of the things associated with the network. Along the same lines, an IT department might create a map showing all the devices and apps across different networks to improve an organization’s cybersecurity posture.

Graphs of transactions. The next step is a graph of transactions. A graph of transactions shows all the transactions for each customer, with all the products and services a person buys over time. It could be a payment network that connects people sending money to each other. The graph of transactions is a few orders of magnitude larger than a graph of things.

Graphs of activity and behavior. At the largest scale, a business can use graph technology to capture activity and behavior. In this instance, any single atom of data is minuscule in itself. The advantage lies in being able to analyze masses of data to draw conclusions.

Companies haven’t mastered the ability to fit millions or billions of data connections into an affordable repository that supports powerful and speedy queries and analytics. Embedding Neo4j into a product extends these capabilities to any organization using standard machinery.

Tactical Benefits of Embedding a Graph Database


The high-level benefits just discussed have their origin in the capabilities of graph databases to do more work and make programmers and analysts more productive. We call these the tactical benefits of a graph database – the numerous advantages of graph databases for performing specific tasks when creating applications and analyzing data.

We can divide these tactical benefits into two categories: modeling and performance. On the modeling side, Neo4j enhances a solution by providing greater flexibility, agility, and ease of connecting disparate sources. With respect to performance, Neo4j can reduce query times from days or minutes to milliseconds; it also minimizes the computing footprint needed to run queries while improving developer productivity.

Modeling: Communication and Collaboration


Neo4j provides benefits that both a technologist and a businessperson will understand and appreciate. An organization typically has two very different models in operation. The business and data analysts work on the conceptual, logical side of the house. Database administrators (DBAs), developers, and IT staff work on the physical side. The root cause of the divide between the two is the disconnect between how the data is used by the business and how it is stored in the database.

In a graph database, the logical, conceptual data model is nearly identical to the physical data model used to store the data. As a result, the usual disconnects between the business and IT become less pronounced.

A graph data model aligns conversations between the business and IT: you can show stakeholders how it works.

Modeling: Flexibility


A malleable data model that is easily changed and accommodates sparse or missing data solves myriad development challenges. The inflexibility of the traditional RDBMS model – rigid, schema-driven, and tabular – requires predefining a schema for the data. But, in a fast-moving, dynamic real-world application, you need to start working with the data to find out its characteristics, ways to enrich it, and additional feeds that might be incorporated. Using a predefined relational model puts you at a disadvantage because it hinders agility and fast iteration.

With a graph database, instead of defining a schema up front, you create and evolve the data model as you go. You apply a schema when and where desired to ensure that the stored data is complete and consistent.
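As a minimal sketch of this schema-optional style, the following plain Python models a toy property graph. It is not the Neo4j API: the `Graph` class, the `require_property` method, and the sample labels and values are hypothetical names chosen purely for illustration. Nodes carry whatever properties are known, and a rule is enforced only for the labels where one has been declared.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy in-memory property graph; illustrative only, not Neo4j's API."""
    nodes: dict = field(default_factory=dict)     # node id -> (label, properties)
    edges: list = field(default_factory=list)     # (source id, relationship type, target id)
    required: dict = field(default_factory=dict)  # label -> set of required property names

    def require_property(self, label, name):
        """Opt in to a rule: every node with this label must carry `name`."""
        # Reject the rule if data already in the graph would violate it.
        for node_id, (node_label, props) in self.nodes.items():
            if node_label == label and name not in props:
                raise ValueError(f"{label} node {node_id!r} has no {name!r}")
        self.required.setdefault(label, set()).add(name)

    def add_node(self, node_id, label, **props):
        """Store a node, checking only the rules declared for its label."""
        missing = self.required.get(label, set()) - set(props)
        if missing:
            raise ValueError(f"{label} node {node_id!r} is missing {sorted(missing)}")
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel_type, target):
        """Connect two existing nodes with a typed relationship."""
        self.edges.append((source, rel_type, target))


g = Graph()
g.add_node("c1", "Customer", name="Ada", email="ada@example.com")
g.add_node("c2", "Customer", name="Grace")  # sparse data: no email known yet
g.add_node("p1", "Product", sku="X-100")
g.add_edge("c1", "BOUGHT", "p1")

# Later, once the data is better understood, constrain only what matters.
g.require_property("Customer", "name")
```

After the last line, adding a `Customer` without a `name` raises an error, while `Product` nodes and the optional `email` property remain unconstrained.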

Modeling: Agility in Adapting the Data Model


For people who have been trained to solve problems using tables, thinking in graphs can feel a bit alien. Yet after a short time working with a graph database, organizations realize that graphs are a natural approach to working with data, and often find many valuable uses for them. Graphs are closer to how human beings think and work.

Using a relational database, a developer builds an application as version 1. Soon after, a flurry of new business requirements comes in. These requirements might spring from creativity, responding to a competitor, or reacting to a market condition, whether large or small. Organizations need the flexibility to adapt quickly.

Neo4j’s flexible data model enables rapid evolution without disruption. As a result, projects can deliver remarkable outcomes in less time and with less risk.

Having a flexible model allows an organization to add data easily. A new data source such as a cloud data warehouse adds one or more nodes to the graph. It is simple to add properties, nodes, and relationships to the graph data model.

Because of this agility, a graph database like Neo4j thrives on change, making it ideal for high-risk data migration scenarios like shifting to the cloud, as well as for iterating on next-generation applications with input from business and IT stakeholders.

Modeling: Insights from Connected Data


Another modeling advantage lies in the new opportunities an organization gains simply by using connected data. The connections between data can be the foundation for new levels of insight about what the data really means. The additional context delivered by a graph database, coupled with graph algorithms and AI/ML techniques, provides a powerful way to capture and harvest these insights. Databases based on non-graph architectures lack this additional context and can leave organizations at a disadvantage.

Performance: Scalability


Graph databases shine when the relationships between data provide value and information. Graph databases store relationships, adding a new dimension of information that is always available. And while the number of customers grows linearly, the number of transactions per customer and the number of interactions with a customer may grow exponentially.

This large and growing volume of data can provide valuable insights for recommending products, enhancing customer service, and providing timely offers, all in real time at the point of sale. But within the world of relational databases, deriving these insights is bogged down by rejoining data, complex queries involving multiple hops, inflexible data structures, and analytics processing that can often take minutes.

Neo4j enables users to reduce these types of queries from minutes to milliseconds, dramatically shrinking the time it takes to run high-value queries, as shown in Figure 4. As a result, organizations can take advantage of new forms of value creation across an increasingly complex data landscape.

The more data you have and the more connected it is, the deeper your queries go: past one hop to two, three, or more. For these multi-hop queries, organizations can process results thousands of times faster than with even the most robust alternative technologies. The speed amounts to much more than efficiency; it opens up new possibilities for the business.
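To make "hops" concrete, here is a small sketch in plain Python. The `neighbors_within` helper and the sample data are hypothetical, and the code illustrates the general idea of graph traversal rather than how Neo4j is implemented. Each node keeps direct references to its neighbors, so a multi-hop question is answered by walking outward from the start node instead of re-joining whole tables.

```python
from collections import deque


def neighbors_within(adjacency, start, max_hops):
    """Return {node: hop distance} for every node reachable from `start`
    in at most `max_hops` hops, using breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:  # first visit is the shortest path
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not its own neighbor
    return distance


# Hypothetical sample data: customers linked to the products they bought.
adjacency = {
    "alice": ["laptop"],
    "laptop": ["alice", "bob"],
    "bob": ["laptop", "dock"],
    "dock": ["bob", "carol"],
    "carol": ["dock"],
}

one_hop = neighbors_within(adjacency, "alice", 1)
three_hops = neighbors_within(adjacency, "alice", 3)
```

In this toy data, one hop from `alice` reaches only her own purchase, while three hops reach `dock`, a product bought by another customer who shares a purchase with her. That is the shape of a simple recommendation query.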

As an example, Marriott International’s commerce site processes more than 100 million pricing quotes a day. In the past, every one of those pricing quotes consisted of tens or hundreds of queries.

When Marriott moved their complex pricing workload to Neo4j, they were able to process the same queries a thousand times faster using one-tenth of the computing footprint.
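For a rough sense of scale, the figures above can be turned into per-second rates. This is back-of-the-envelope arithmetic only: it assumes quotes are spread evenly across the day and reads "tens or hundreds" as the illustrative bounds of 10 and 100 queries per quote. The source states neither assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

quotes_per_day = 100_000_000   # "more than 100 million", so this is a lower bound
queries_per_quote_low = 10     # assumption: low end of "tens"
queries_per_quote_high = 100   # assumption: low end of "hundreds"

# Average rates, assuming an even spread over the day (real traffic has peaks).
quotes_per_second = quotes_per_day / SECONDS_PER_DAY
queries_per_second_low = quotes_per_second * queries_per_quote_low
queries_per_second_high = quotes_per_second * queries_per_quote_high

print(f"{quotes_per_second:,.0f} quotes/s")
print(f"{queries_per_second_low:,.0f} to {queries_per_second_high:,.0f} queries/s")
```

Under these assumptions the average works out to roughly 1,157 quotes per second, or somewhere between about 11,600 and 115,700 underlying queries per second.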

Neo4j also demonstrated its scalability by creating a graph with a trillion relationships, as described in this video.

Performance: Computing Efficiency


Neo4j often requires 10x less compute footprint than other databases when carrying out graph workloads. This includes relational, NoSQL, and RDF databases as well as other graph databases built atop one of those other technologies. As a result, organizations can shave their bottom-line costs, reducing their total cost of ownership (TCO).

Minutes-to-Milliseconds Performance Advantage

Performance: Developer Productivity


Neo4j significantly improves developer productivity and efficiency by leveraging the most widely used graph database query language, as well as by providing an extensive array of tools, language drivers, frameworks, and integrations. Writing queries in the Cypher query language often requires 10x less code than SQL. Cypher originated with Neo4j but is an open standard, used by dozens of database and tooling vendors.

Conclusion


Graphs unlock the power of your data. They’re everywhere, and growing fast. Especially when you’re dealing with graphs of things, transactions, and activity and behavior, it makes sense to embed a graph database in your application to gain an edge on data modeling and performance.

In the third part of the blog series, we’ll cover what to look for in an embedded graph database.


Want to learn more about embedded graph databases? Get this free white paper Neo4j Inside: A Guide to Neo4j as an Embedded Database now.
