Back in 2013, I was contacted by my friend Randy who’s a long-time Bay Area tech recruiter.
He was looking for sales and marketing referrals for a company called MapR. He really couldn’t have been more enthusiastic.
“I haven’t seen one like this in 20 years,” he said. “They have so much interest, they literally can’t keep up with it. They’re adding sales people as fast as they can, and every salesperson they have made their annual quota in the first quarter.”
To this day, I believe everything that he told me was true at the time.
Since then, Hadoop and its ecosystem have spawned billions in investment, purchasing and company exits. Global Systems Integrators all built big data practices. My friend and former co-worker James Dixon coined the term “data lake” to describe this new data management approach, and massive events like Hadoop Summit and Strata popped up to address seemingly endless interest in Hadoop-powered data lakes.
What Could Possibly Go Wrong?
A lot, it turns out.
The “data swamp” concept isn’t new, but now we’re seeing the effects at scale across both the user and vendor landscape. From a user perspective, Gartner recently revised their estimated failure rate on data lake projects from 60% up to 85% – nearly 9 out of 10 customer projects failing is obviously unsustainable.
On the vendor side, the rumors that once-white-hot MapR is now on the verge of a shutdown have been confirmed. And last year, in the largest consolidation in the big data space, Hortonworks and Cloudera merged in what many saw as a sign of a weakening market – or even Hadoop’s obituary – rather than a “1+1=3” kind of synergistic tech merger.
Now the CEO who put them together is stepping down, which sent Cloudera’s stock tumbling more than 25% the day that news was announced.
Big Data… Big Whoop?
Lots of people with more credible technical opinions than mine can debate the merits and shortcomings of Hadoop as a data platform. Having personally lived through the many early failures of “Data Warehousing 1.0” (and seen many eventual successes), it’s clear there’s one major issue: We’ve transitioned a failed approach from one platform to another, and it’s still failing.
Throwing all of your data into one place and hoping that things happen – often called “build it and they will come” – didn’t work before with relational data warehouses and isn’t working now with Hadoop.
Another interesting parallel I’ve observed is the naiveté of using data volumes as a proxy for business value.
People used to brag about terabytes, and now they’re bragging about petabytes. However, they’re still in a frustrating search for business value, as evidenced by the fact that the same people didn’t brag about how many active users they had, or what business changes they made from their new-found insights. But wow, did they manage to stick a ton of data into one place.
We’ve built enormous collections of data, but have barely scratched the surface of connections in data.
You’ve Collected It, Now What?
Bolstered by recent enthusiasm around AI and machine learning, the world is increasingly waking up to the value of data relationships.
As an example, academic research has shown that a person’s social network and relationships are a far stronger predictor of whether they will smoke than education, income or other individual attributes. Connections frequently tell a more insightful story than individual data points without connected context.
But connected data is more than just an academic exercise or a social media thought experiment. Connected data is accelerating NASA’s and humanity’s journey to Mars. It’s also how Google broke away from the pack in the early search market, using graph-powered relevance rankings (rather than simple word counts and “tag clouds” for those who remember) to build a digital advertising empire.
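To make that graph-powered ranking idea concrete, here is a minimal PageRank-style sketch – a toy power iteration over a link graph, not Google’s production algorithm. The `pagerank` function and the tiny `web` graph are invented for illustration: the intuition is simply that a page’s importance flows to the pages it links to, so relevance comes from the structure of connections rather than from word counts.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform rank

    for _ in range(iterations):
        # every page keeps a small baseline rank (the "random surfer" jump)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # a page shares its rank equally among the pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # dangling page: spread its rank evenly across all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# A three-page toy web: a links to b and c, b links to c, c links back to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

In this tiny graph, page `c` ends up ranked highest because it collects rank from both `a` and `b` – a result no per-page word count could produce, which is the essence of connected data beating isolated data points.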
Based on these trends, Michael Moore, Ph.D. and Executive Director of Data and Analytics at EY recently predicted that 50% of database workloads would move to graph platforms in the next 10 years. Technology and market prognostication is a tricky business, but that’s a powerful prediction from a person who works with both the C-suite and the data architects at the world’s biggest and best companies.
We’ve had all kinds of step-function evolutions in the database market over the years from object databases and shared-nothing architectures to the emergence of a wide array of NoSQL platforms. But we’ve never seen a change of that magnitude in such a timeframe.
Where Do We Go from Here?
So does Hadoop go the way of the dinosaur when it’s barely even a teenager? Not likely.
It will continue to play a key role in enterprise data architectures. Just like they did with Data Warehousing 1.0, buyers and implementers will get better at implementing to real business requirements – and get better at identifying which data problems are “Hadoop problems” and which ones are better suited to different data architectures.
As Gartner’s Merv Adrian pointed out recently, at 4% of the database market, Hadoop is still larger in market size than many well-known NoSQL vendors combined. From my view, this is exactly why Gartner’s Hype Cycle for technology includes a “Trough of Disillusionment” stage. Technologies emerge, people think they may have found a universal cure-all for every problem they can imagine, and they become disillusioned when they realize it only helps with some problems. That’s what the market is now figuring out with Hadoop.
Especially in the medium term, buyers will continue to see graph technology as a way to connect and contextualize their data – often alongside Hadoop, or within Spark, or other technologies – to get back to success and value in their data projects.
Hadoop isn’t going away, but market attention is increasingly expanding its focus from collecting data to connecting data. There is ample market evidence including some great examples of Neo4j customers that demonstrate the value of connected data to bottom-line results. That said, I think we can expect a bumpy ride as vendors, buyers and practitioners adjust.