Neo4j Helps Photo Organization Service Monetize Relationships in 1.2 Petabytes of Data
albelli provides users with the ability to sync photos from multiple sources to one location. The software automatically organizes the photos based on a variety of factors — such as date, time and location — into events that are displayed in a user-friendly timeline, keeping customers from having to sort through and organize thousands of photos. Customers have complete control over sharing options and can determine which albums are public or private.
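The grouping of photos into events can be illustrated with a minimal sketch. This is not albelli's actual algorithm, which the article does not describe. It assumes that a new event starts whenever the gap in time or distance to the previous photo exceeds a threshold. The `Photo` class, the `group_into_events` function, the thresholds, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    name: str
    taken_at: datetime
    lat: float
    lon: float


def distance_km(a: Photo, b: Photo) -> float:
    """Great-circle (haversine) distance between two photos, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def group_into_events(photos, max_gap=timedelta(hours=6), max_km=50.0):
    """Walk the photos in time order and start a new event whenever the
    time gap or the distance to the previous photo exceeds its threshold."""
    events = []
    for photo in sorted(photos, key=lambda p: p.taken_at):
        if events:
            prev = events[-1][-1]
            close_in_time = photo.taken_at - prev.taken_at <= max_gap
            if close_in_time and distance_km(prev, photo) <= max_km:
                events[-1].append(photo)
                continue
        events.append([photo])
    return events


# Hypothetical sample data: two photos on the same morning, one five days later.
photos = [
    Photo("beach_1.jpg", datetime(2015, 7, 4, 10, 0), 52.37, 4.89),
    Photo("dinner.jpg", datetime(2015, 7, 9, 19, 0), 52.37, 4.89),
    Photo("beach_2.jpg", datetime(2015, 7, 4, 11, 30), 52.38, 4.90),
]
timeline = group_into_events(photos)
for event in timeline:
    print(event[0].taken_at.date(), [p.name for p in event])
```

In this sketch the two beach photos land in one event and the dinner photo in another, which is the kind of timeline entry the service presents to the user.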
After photos are uploaded and organized, users have the ability to purchase a variety of products that range from photo albums to calendars, mugs and canvas portraits.
Because of the large number of photos — there are over a million users with an average of 2,000 photos each — as well as the large size of the files in combination with all of the descriptors associated with each photo, the company was dealing with huge volumes of data.
The company was still using Microsoft SQL Server, a relational database that was cumbersome and slow because it required such a huge number of JOINs for relationship-based queries. It was becoming clear that they needed to add the capabilities of another database to the mix to overcome these data relationship challenges.
The development team started from scratch and recognized that their domain was made up of graph-like relations between photos and users, and they knew translating this graph-like domain to SQL Server wouldn’t be a good fit. So, they decided to start a proof-of-concept project with Neo4j.
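To make the idea of a graph-like domain concrete, here is a small in-memory sketch in plain Python. It does not use Neo4j or its query language, and the node names and relationship types (`UPLOADED`, `PART_OF`, `TAKEN_AT`) are assumptions made for illustration, not albelli's actual model. It shows how a question such as "which events does this user have?" becomes a walk along stored relationships rather than a series of table JOINs.

```python
from collections import defaultdict

# Adjacency list: node -> list of (relationship type, neighbouring node).
graph = defaultdict(list)


def relate(start, rel_type, end):
    """Store a directed, typed relationship between two nodes."""
    graph[start].append((rel_type, end))


def neighbours(node, rel_type):
    """Follow all outgoing relationships of one type from a node."""
    return [end for rel, end in graph[node] if rel == rel_type]


# Hypothetical data: one user, two photos, one event, one place.
relate("user:anna", "UPLOADED", "photo:1")
relate("user:anna", "UPLOADED", "photo:2")
relate("photo:1", "PART_OF", "event:beach")
relate("photo:2", "PART_OF", "event:beach")
relate("photo:1", "TAKEN_AT", "place:coast")


def events_for_user(user):
    """Two hops: user -UPLOADED-> photo -PART_OF-> event."""
    events = []
    for photo in neighbours(user, "UPLOADED"):
        for event in neighbours(photo, "PART_OF"):
            if event not in events:
                events.append(event)
    return events


print(events_for_user("user:anna"))
```

Each hop only touches the relationships attached to the current node, which is the property that makes this kind of query a natural fit for a graph database.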
"If the average photo collection user has 2,000 photos, how do we then cope with tens of nodes on each of those photos and keep the product fast and flowable?" said Josh Marcus, Chief Technology Officer at albelli. "That’s where we found our biggest technical challenge."
At peak, the team used up to 700 EC2 instances running at maximum speed for one and a half months to transfer 1.2 petabytes of data, which included 500 million images plus customer data, from on-premises storage to cloud storage. The migration involved up to five heavily equipped Neo4j servers and saw over 10 billion messages sent across the system. During this process they also extracted the date, time and location information from each photo so that it could then be related to others.
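A back-of-the-envelope calculation gives a feel for the scale of the migration. The figures below come from the paragraph above, with three assumptions that are not in the source: a petabyte is taken as 10^15 bytes, "one and a half months" as 45 days, and all 700 instances as running for the whole period. Since 700 was the peak count, the per-instance figure is a lower bound. The per-image figure is an upper bound, because the 1.2 petabytes also includes customer data.

```python
PETABYTE = 10**15                 # assumption: decimal petabyte
total_bytes = 12 * PETABYTE // 10  # 1.2 PB, kept as an exact integer
images = 500_000_000              # 500 million images
days = 45                         # assumption: one and a half months
instances = 700                   # peak instance count from the text

seconds = days * 24 * 60 * 60

avg_bytes_per_image = total_bytes / images
aggregate_bytes_per_sec = total_bytes / seconds
per_instance_bytes_per_sec = aggregate_bytes_per_sec / instances

print(f"{avg_bytes_per_image / 10**6:.1f} MB per image (upper bound)")
print(f"{aggregate_bytes_per_sec / 10**6:.1f} MB/s sustained across the fleet")
print(f"{per_instance_bytes_per_sec / 10**6:.2f} MB/s per instance (lower bound)")
```

Under these assumptions the fleet sustained roughly 300 MB/s for the entire period, at about 2.4 MB per image on average.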
"The huge advantage with Neo4j was that we were able to focus on modeling our data and how to best serve our customers instead of agonizing how to structure tables and JOINs," said Marcus. "It also required very little coding, so we were able to keep our focus on our customers."
In the end, their database boasted a total of one billion nodes, 4.1 billion properties and 2.6 billion relationships with Neo4j acting as the central database. Alongside Neo4j, albelli used other database technologies in their architecture, including Redis, DynamoDB, Aurora and Microsoft SQL Server – allowing them to take advantage of a polyglot persistence approach.