Slow SQL Queries Are Killing Your Recommendation Engine

Discover Why Slow SQL Queries of Your Relational Database Are Killing Your Recommendation EngineIn today’s fast-paced world, users won’t wait minutes for your recommendation engine to query the database.

In fact, you’re lucky if they’re willing to wait more than a few seconds.

The question is then: Is your relational database performing at the speed required to offer relevant and insightful recommendations to your users? If not, your slow SQL queries could be to blame.

In this series on SQL strain, we’ll dive into the causes – and cures – of relational database (RDBMS) performance issues, including the future-proof alternative of graph databases.

Previously, we’ve covered five sure signs of SQL strain and a comparison between RDBMS and graph database models. This week, we’ll dive into the specific example of a recommendation engine and compare relational databases with graph databases in terms of both data modeling and writing queries.

Data Modeling in a Relational Database Versus a Graph Database

In a relational database, data modeling for a simple recommendation engine requires creating multiple tables. You’d create separate tables for customers, orders and products, as well as intermediate JOIN tables that represent which customers purchased which products in which order.

In contrast, data models for a graph database can be easily sketched on the back of a napkin. A customer purchased a certain product. The customer-to-product relationship is stored as part of storing the order. Other customers may have purchased the same product.

With this simple database model, you can easily ask, “What other products did those people purchase?”

Modeling a recommendation engine with a graph database is whiteboard-friendly. It isn’t just intuitive for developers but for everyone familiar with the business domain.

In other words, with a graph database the logical model – the way we think about the problem – corresponds to the physical model – the way the data is stored, queried and visualized by the database.

For many interesting data problems people are facing today – especially building relevant recommendation engines – it’s not enough to know that two entities (products, users, ratings, reviews, etc.) are connected. After all, a relational database can tell us that much.

It’s also important to know about those data connections: their meaning, their significance and their strength, weight or quality. In our recommendation engine example these might be questions like, Which products are related and what’s the strength of that relationship? Which users have left a high-quality (or low-quality) reviews for a particular product? How many friends-of-friends bought a similar product?

A fully featured graph database answers these types of data relationship questions, enabling you to incorporate relevant information about the characteristics or strength of those connections.

SQL Queries Vs. Graph Database Queries

Unlike relational SQL queries, graph database queries are straightforward to write and understand.

Graph databases often have their own syntax for such queries. In the case of Neo4j this syntax is governed by a simple but expressive language called Cypher that is purpose-built for traversing data relationships.

Cypher queries are much simpler than SQL queries. In fact, a long SQL query can frequently be compressed to many fewer lines in Cypher.

Here’s a sample Cypher query for a hypothetical recommendation engine:

MATCH (u:Customer {customer_id:’customer-one’})-[:BOUGHT]->(p:Product)
      <- [:BOUGHT]-(peer:Customer)-[:BOUGHT]->(reco:Product) 
WHERE not (u)-[:BOUGHT]->(reco) 
RETURN reco as Recommendation, count(*) as Frequency 

This Cypher query says that for each customer who bought a product, the recommendation engine should look at the products that peer customers have purchased and then suggest them to the current user. The WHERE clause removes products that the customer has already purchased, since we don’t want to recommend something the customer has already bought.

Each of the arrows in the MATCH clause of the Cypher query represents a relationship that would be modeled as a many-to-many JOIN table in a relational model with two JOINs each. So even this simple query encompasses six JOINs across tables.

Here’s the equivalent SQL query:

SELECT product.product_name as Recommendation, count(1) as Frequency 
FROM product, customer_product_mapping, (SELECT cpm3.product_id, cpm3.customer_id 
          FROM Customer_product_mapping cpm, Customer_product_mapping cpm2, 
               Customer_product_mapping cpm3 
          WHERE cpm.customer_id = ‘customer-one’ 
          and cpm.product_id = cpm2.product_id 
          and cpm2.customer_id != ‘customer-one’ 
      and cpm3.customer_id = cpm2.customer_id 
      and cpm3.product_id not in (select distinct product_id 
          FROM Customer_product_mapping cpm 
          WHERE cpm.customer_id = ‘customer-one’) 
   ) recommended_products 
WHERE customer_product_mapping.product_id = product.product_id 
and customer_product_mapping.product_id in recommended_products.product_id 
and customer_product_mapping.customer_id = recommended_products.customer_id 
GROUP BY product.product_name 
ORDER BY Frequency desc

This SQL query not only suffers from performance issues due to the JOIN complexity but it will also degrade in performance as the given dataset gets larger. When it comes to building your recommendation engine, this level of performance is simply unacceptable.

If your recommendation engine is powered by a relational database, users won’t wait for your suggestions to load, even if they are precisely relevant. And if your users won’t wait, your business suffers.

It’s time to choose the right tool for the job, moving your recommendation engine to a graph database that reduces database query times from minutes to milliseconds. And with so many ways to get started quickly, adding a graph database to your recommendation engine is the best investment of time you can make.

Is your recommendation engine suffering from the limited capabilities of a relational database? Click below to download a free copy of this white paper, Overcoming SQL Strain and SQL Pain and discover how to harness connected data like never before.

Catch up with the rest of the SQL Strain series: