Orita Powers Customer 360 with AuraDS Resulting in a 500x Increase in Speed
Startups need scalability, especially when they offer a service that everyone needs. When Orita wanted to build a platform to solve the problem of dirty data, it found a scalable solution in Neo4j AuraDS, a fully managed Graph Data Science offering.
The Orita data confidence platform helps businesses of all sizes get a better understanding of customers, products, and processes. The platform does identity resolution, making sense of messy, siloed data, using powerful graph algorithms offered in Neo4j Graph Data Science.
By the numbers: Orita
- Graph scale: Tens of millions of nodes, hundreds of millions of relationships
- Sample customer metrics: 3M customer records from 6 sources resolved to 1.5M identities
- Platform: Neo4j AuraDS on Google Cloud Platform
Every customer counts. Companies need to know who their customers are and how they respond to promotions and products. Those that do this well pull ahead, and investors are watching.
Orita Co-Founders Daniel Brady and Zack Gow have deep data science backgrounds as well as hands-on experience at startups. They wanted to start a consulting agency that would empower e-commerce companies to be more data-driven.
They reached out to associates to work with some live data. “We did all sorts of projects, from dynamic pricing to getting business intelligence tools online, to helping hire a team of people,” said Brady. “But for almost every single client, we built a data warehouse.”
But creating that data warehouse required a major chore: bringing order to the chaos of messy, uneven data. In every instance, the clients had numerous data sources and duplicate, erroneous, or incomplete data within those sources.
After unifying customer data with simple methods five or six times, the founders saw a viable product idea. “We keep doing this; why don't we think about a smarter way to do it?” said Brady.
Messy Data Is Bad for Business
Born-digital companies need to know their customers well for many reasons. Customer loyalty, revenue, offers: all of these depend on knowing your customer.
And if a customer creates multiple accounts – whether deliberately for another free trial code or because they forget that they already have an account under another email address – the view of the customer becomes incomplete, skewed, and fragmented.
The result is that reporting is wrong, marketing money is misspent, and recommendations don’t work well. A full customer profile drives greater engagement, among other benefits.
Brady observed that every customer dataset he encountered was “really, really messy.” “We decided that there's a real opportunity for us to make a nice standalone product that handles that specifically,” he said.
“We stumbled upon the world of entity resolution and identity resolution,” said Brady.
Too Much for NetworkX
Initially, Brady and Gow ran graph algorithms for identity resolution in NetworkX, an open source Python library. That worked reasonably well for smaller datasets, but as data sizes increased, so did runtimes. “All of a sudden it was taking forever, and it was eating up all of our memory,” recalls Brady. The more the team considered the challenge, the clearer it became that this approach would not scale.
The team wanted to store the graph. “Why don't we just make these nodes and edges once instead of redoing it every single time, especially now that our datasets are in the millions and millions of nodes and millions of edges?” said Brady.
“We wanted a convenient way to store the graph as opposed to loading everything in memory every time,” said Gow. At this point, they looked for something like NetworkX that worked with the Neo4j Graph Database. They found Neo4j Graph Data Science. From an architecture perspective, Neo4j Graph Data Science is a unified surface, one component that fits seamlessly with the rest of their technology stack.
Making Messy Data Nice with Graph Data Science
Orita’s product is powered by Neo4j Graph Data Science. Orita runs graph algorithms on what it calls an “identity resolution pipeline” to unify and clean data for businesses.
Orita’s graph is filled with nodes representing specific pieces of customer information, and it uses algorithms to find connections in the data that it can leverage for entity and identity resolution, deduplication, and other data hygiene and organization tasks.
Identity resolution using a graph data model (Source: Orita.ai)
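To make the idea concrete, here is a minimal sketch of identity resolution as a graph problem. The article does not say which algorithms Orita runs, so this is an assumption: it uses connected components, a technique available in both NetworkX and Neo4j Graph Data Science, implemented here with a small union-find in plain Python. The record IDs, field names, and values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records from different sources; all values are made up.
RECORDS = [
    {"id": "shop-1", "email": "ana@example.com", "phone": "555-0100"},
    {"id": "crm-7", "email": "ana.work@example.com", "phone": "555-0100"},
    {"id": "mail-3", "email": "ana.work@example.com", "phone": None},
    {"id": "shop-2", "email": "bo@example.com", "phone": "555-0199"},
]


def resolve_identities(records):
    """Group record ids linked, directly or transitively, by a shared email or phone."""
    parent = {}

    def find(node):
        # Walk up to the root, halving the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for record in records:
        record_node = ("record", record["id"])
        find(record_node)  # register the record even if it has no identifiers
        for field in ("email", "phone"):
            value = record.get(field)
            if value:
                # Each identifier is its own node; sharing it links the records.
                union(record_node, (field, value))

    groups = defaultdict(list)
    for record in records:
        groups[find(("record", record["id"]))].append(record["id"])
    return sorted(sorted(ids) for ids in groups.values())


identities = resolve_identities(RECORDS)
print(identities)  # [['crm-7', 'mail-3', 'shop-1'], ['shop-2']]
```

Here shop-1 and crm-7 share a phone number, and crm-7 and mail-3 share an email address, so all three records collapse into one identity even though shop-1 and mail-3 have nothing directly in common. A production pipeline would likely need fuzzier matching and safeguards against over-merging on shared identifiers.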
The Benefits of Graph Data Science in the Cloud
Orita started out using the on-premises version of Neo4j Graph Data Science, but was an early adopter of the cloud offering, AuraDS. Brady said making the switch was “super easy.”
He cited many benefits to graph data science in the cloud. “I love being able to pause instances when we're not running any client data,” he said, adding that the cost savings are important for a startup.
Given the nature of its business, Orita can quickly find itself with a lot of data to process. It can scale its instance of Neo4j in a snap. “We rescale it temporarily when we're going to process a ton of data,” said Brady. “A couple hours from now after we process it, we scale it back down. That is really useful for us.”
Highly Scalable Cloud Graph Data Science Platform: ‘As A Startup That’s What We Want’
Orita chose to run its product on Neo4j AuraDS because it's convenient, flexible, and fast.
“It felt like the proper way to tackle this problem,” Brady said, noting the low barrier to entry to get the system running. “It was the least amount of work for us and we knew that we could scale easily with a few clicks of a button.”
‘One Line of Code Solved All of Our Problems’
Neo4j offers straightforward ways to bring data in, which makes it easy to incorporate into a technology stack.
“Once you know a little Cypher and how to use the tools, Neo4j is just this nice little contained component that we can just think about very cleanly,” Brady said. “It’s really easy for us to design the rest of our stack around it.”
And when it comes to support, the Neo4j community is an invaluable resource. Brady said he has crowdsourced his way through tricky Cypher queries. “There's a whole community with a thousand people who have done this before, ready to help you sort out queries,” he said.
Orita also leans on Neo4j support. When Orita ran into an obstacle with a data projection it needed to run, Brady said the solution was as simple as reaching out to Neo4j support to figure out the issue. “It was literally one line of code and it solved all of our problems,” he said.
Delivering Clean Customer Data, High LTV
Orita does more than identity resolution: it also standardizes and validates customer data, giving its clients a clean dataset along with metrics on the dirty data it filtered out. “We go through all the different information from customers and filter out spam emails, fake addresses, and fake phone numbers,” said Brady.
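A standardize-and-validate step of this kind might look roughly like the sketch below. It is illustrative only: the article does not describe Orita's actual rules, so the disposable-domain list, the email pattern, the phone-length threshold, and all function names are assumptions.

```python
import re

# Illustrative rules only; the domains and thresholds below are assumptions.
DISPOSABLE_DOMAINS = {"mailinator.com", "example.invalid"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_email(raw):
    """Return a trimmed, lowercased email, or None if it looks invalid or disposable."""
    email = (raw or "").strip().lower()
    if not EMAIL_PATTERN.match(email):
        return None
    if email.rsplit("@", 1)[1] in DISPOSABLE_DOMAINS:
        return None
    return email


def clean_phone(raw):
    """Return digits only, or None if the number is too short or one repeated digit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) < 10 or len(set(digits)) == 1:
        return None
    return digits


def clean_records(records):
    """Standardize each record and count how many values were filtered out."""
    rejected = {"email": 0, "phone": 0}
    cleaned = []
    for record in records:
        email = clean_email(record.get("email"))
        phone = clean_phone(record.get("phone"))
        # Only count a rejection when a value was supplied but failed the checks.
        if record.get("email") and email is None:
            rejected["email"] += 1
        if record.get("phone") and phone is None:
            rejected["phone"] += 1
        cleaned.append({**record, "email": email, "phone": phone})
    return cleaned, rejected


sample = [
    {"id": 1, "email": "  Ana@Example.com ", "phone": "(212) 555-0100"},
    {"id": 2, "email": "spam@mailinator.com", "phone": "000-000-0000"},
    {"id": 3, "email": "not-an-email", "phone": None},
]
cleaned, rejected = clean_records(sample)
print(cleaned[0])  # {'id': 1, 'email': 'ana@example.com', 'phone': '2125550100'}
print(rejected)    # {'email': 2, 'phone': 1}
```

Returning the rejection counts alongside the cleaned records mirrors the idea of handing back both a clean dataset and metrics on what was filtered out.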
Using the remaining clean data, Orita calculates business metrics such as customer lifetime value (LTV) and repeat purchase rate, runs cohort analysis, and provides customers with a detailed report (and the data itself). For those curious to learn more, Orita’s website includes both a sample report and a demo.
For smaller businesses with limited developer and analytics support, Orita gives the customer back a CSV file that they can further analyze using pivot tables in Excel. Tech-savvy companies can feed their data directly into Orita’s API and receive clean customer data the same way.
After Orita runs its special blend of graph algorithms on client data, that cleaning typically means fewer customer entries in the database. But those customers are both real and valuable. “One of the funky things about doing entity resolution is that your LTV can only go up or stay the same,” said Brady.
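The reasoning behind that remark can be checked with simple arithmetic. Merging duplicate records leaves total revenue unchanged while the number of distinct customers can only shrink or stay the same, so revenue per customer cannot fall. The sketch below assumes a deliberately simple definition of LTV (historical revenue divided by customer count), which may differ from the one Orita uses; the order amounts and the identity mapping are made up.

```python
from collections import defaultdict

# Made-up order totals keyed by raw customer record id.
ORDERS = [("shop-1", 40.0), ("crm-7", 60.0), ("mail-3", 20.0), ("shop-2", 30.0)]

# Hypothetical output of identity resolution: record id -> resolved identity.
IDENTITY_OF = {
    "shop-1": "person-A",
    "crm-7": "person-A",
    "mail-3": "person-A",
    "shop-2": "person-B",
}


def average_ltv(orders, key=lambda record_id: record_id):
    """Average historical revenue per customer, grouping orders by key(record_id)."""
    revenue = defaultdict(float)
    for record_id, amount in orders:
        revenue[key(record_id)] += amount
    return sum(revenue.values()) / len(revenue)


before = average_ltv(ORDERS)                       # four "customers"
after = average_ltv(ORDERS, key=IDENTITY_OF.get)   # two resolved identities
print(before, after)  # 37.5 75.0
```

The same 150.0 in revenue is spread over four records before resolution and two identities after, so the average doubles. If no duplicates were found, the two figures would be equal.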
With a mission of turning numerous messy customer data sources into clean, reliable data, Orita is in a growth market. Given that this challenge is nearly universal, the company’s prospects are bright.