Blog | April 15, 2021

How the Data Lake’s Failure Led to the CDP

A data lake is a regular fixture in many enterprises' technology stacks today. Per Gartner, two in five companies have data lakes to support their data infrastructure and analytics needs.

Some companies' IT or analytics teams build and own in-house, (relatively) cost-effective repositories to store data (often raw, unprocessed data) from across the business.

Other organizations' technology decision-makers instead invest in a cloud-based offering, such as Microsoft's Azure Data Lake or Snowflake, to store various types of data.

Whether a business buys or builds, it uses the solution in much the same way, whatever its niche or industry: Data scientists process the customer data stored in the system so that non-technical, growth-focused teams can act on the insights they find.

But for many large-scale businesses, a data lake alone can't solve their biggest problems.

Marketing, ecommerce, digital product/experience, and other growth teams want to take advantage of their first-party data in a scalable, streamlined manner. But their use cases are often limited by the speed and flexibility of their tools and processes.

Data lakes are the de facto repository for many enterprises' big data and hold a lot of value for data scientists, in particular. But the solution fails to help business technology users capitalize on their first-party data efficiently and in real time.

As customer expectations grow, companies need to bring pertinent customer data closer to the systems that execute on customer experiences. While data lakes serve their purpose, you need a customer data platform (CDP) to deliver the best experiences today.

The data lake: Ideal for handling volume and variety of data — but not velocity

For many enterprises, a data lake is a critical component of their data management efforts.

Executives and IT leaders alike rely on data lakes as the single source of truth for business intelligence and customer insights, and they consider them vital for carrying out core tasks:

  • Building and deploying custom machine learning models

  • Constructing, analyzing, and updating customer segments

  • Executing personalized campaigns to high-value audiences

  • Providing growth teams with recommendations on how to engage customers

The reality? Having first-party data in one place is helpful for these tasks. But any hope of activating that data in real time comes to a screeching halt when teams are forced to rely on a data lake.

One issue? Getting utility out of it requires SQL query knowledge and lots of processing time. That need for technical expertise, combined with data latency, leads to slow, inadequate data-driven processes.

Case in point: A recent Arcadia Data study found that only half of business users can "create virtual data sets or semantic views," "blend data sets," and "view complex correlations" in their data lake. (Nor can these users activate data from the system with ease, if at all.)

"Conventional wisdom assumes that if a company loads all its data into a data lake ... they’ll be able to correlate all their data sets," MIT Sloan School of Management contributor Sara Brown noted. "But they often end up with data swamps, not data lakes."

When this occurs, data scientists need to spend time cleaning and processing data.

When a company relies on a data lake, there is a fundamental tension between the speed at which customer experience teams (marketing, ecommerce, etc.) want to act and data scientists' ability to query data quickly. The limitation lies not only in the technology but also in the processes around it.

Relying on data scientists to build segments and calculate customer scores means these requests join a long queue, further lengthening the time from insight to action.

For example, if you want to use engagement scores to target "highly engaged" customers in an email campaign, what good are those scores if you don’t receive them for two weeks?

Marketing and other teams trying to reach these customers at the moment they are most likely to purchase lose that opportunity.

Where a data lake fits in the modern business technology stack today

Data lakes have downsides. But that's not to say the technology (and similar sources of truth, such as master data management systems and data warehouses) lacks benefits.

For instance, data that is not directly related to a customer, like financial data used for auditing purposes, should continue to live in systems such as data lakes and data warehouses.

Because these systems weren't designed to help teams act in real time, they shouldn't be the central solution that growth-focused teams rely on for day-to-day execution.

That designation should go to pure-play customer data platforms like BlueConic that can handle the volume, variety, and velocity of companies' first-party data.

Writing for CMSWire, BlueConic COO Cory Munchbach noted that a CDP is now an essential element of modern tech stacks. But a CDP doesn't always need to supplant existing databases.

"A CDP can indeed replace some of your databases, especially the legacy ones you pay an arm and a leg to marketing services providers to maintain for you, but it’s unlikely the CDP is the last and only database you have," Cory indicated.

That said, BlueConic CEO Bart Heilbron explained to Computer Business Review how a customer data platform (BlueConic, specifically) surpasses the data lake (and similar legacy databases) from both a capability and an enablement standpoint.

"We store [customer data] in a more structured way, something that data lakes don't do," Bart stated. "Then we cluster that data around a single customer entity, so the customer is the central object and the data is structured around that customer."

Bart added that the "lag between initial data collection, unification, and activation of the data" means data lakes simply don't enable growth teams to act on their data in true real time.

Your company may be heavily invested (financially and otherwise) in your data lake, given the considerable resources you have presumably poured into building or implementing it.

But growth teams can only extract value from data in the system when that data is readily accessible to them — meaning cleaned, unified, and stored in customer profiles in a CDP.

How a CDP streamlines companies' data management and activation efforts

Data science expert Michael Lukianoff said roughly 60-85% of companies' big-data initiatives fail. The primary reason? Reliance on data lakes to solve business users' problems.

Data lakes take substantial resources to build and maintain, and they rely on skilled data scientists to extract insights. But they don't ultimately help business technology users accomplish their goals.

("[N]o one wants to admit about their overpriced data lake investment," Lukianoff said.)

The good news? Executives and their IT leaders can rectify their data management issues, and get their businesses back on track in terms of growth, by investing in a CDP.

Compare BlueConic's capabilities to those of the typical data lake, and it becomes apparent that our pure-play CDP makes life much easier, and work far more efficient, for growth teams:

  • For instance, BlueConic standardizes and normalizes first-party data for our customers before storing it in persistent customer profiles. This enables business technology users to readily access and use customer data and, in turn, leads to better business outcomes.

  • Whereas raw data is 'dumped' into data lakes and left there for data scientists to make sense of (a laborious task), BlueConic offers a number of features that clean data and maintain good data hygiene (e.g., data type validation, profile merging). It also stores that data in a way that makes it readily available to activation systems (email service providers, campaign management tools, etc.) and to systems that produce insights (business intelligence tools, data lakes, etc.).

  • Data cleaning improves identity resolution, and this storage mechanism, combined with a business-user-friendly UI, allows growth-focused teams to deliver more timely and relevant content, products, and messages and, ultimately, a better customer experience (CX).

  • With BlueConic in place, business users no longer face long waits for insights to come back from their data science team. Meanwhile, data scientists are no longer torn between the competing priorities of deep analytical work and the speed at which growth teams want to act on insights.
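To make the hygiene steps in the list above more concrete (normalizing values, validating data types, and merging records into one profile per customer), here is a minimal, hypothetical Python sketch. It is not BlueConic's implementation or API. It assumes each incoming record is a dictionary, that an email address is the customer identifier, and that "normalizing" means trimming and lowercasing emails and coercing a lifetime-value field to a number. The names `SCHEMA`, `normalize_record`, `merge_into_profiles`, `email`, `first_name`, and `lifetime_value` are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any, Iterable, Optional

# Hypothetical schema (an assumption for illustration): expected type per profile property.
SCHEMA: dict[str, type] = {"email": str, "first_name": str, "lifetime_value": float}


def normalize_record(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Standardize values and validate types; return None if the record has no usable identifier."""
    clean: dict[str, Any] = {}
    for field, expected in SCHEMA.items():
        value = record.get(field)
        if value is None:
            continue
        if expected is str:
            value = str(value).strip()
            if field == "email":
                value = value.lower()  # treat email identifiers case-insensitively
        elif expected is float:
            try:
                value = float(value)
            except (TypeError, ValueError):
                continue  # drop values that fail data type validation
        clean[field] = value
    # Without an identifier, the record cannot be attached to a customer profile.
    return clean if clean.get("email") else None


def merge_into_profiles(records: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Cluster cleaned records around a single customer entity, keyed by email."""
    profiles: dict[str, dict[str, Any]] = {}
    for raw in records:
        record = normalize_record(raw)
        if record is None:
            continue
        profile = profiles.setdefault(record["email"], {})
        profile.update(record)  # assumed policy: later values overwrite earlier ones
    return profiles


# Usage: two records for the same customer, one invalid value, one record with no identifier.
raw_records = [
    {"email": " Ana@Example.com ", "first_name": "Ana"},
    {"email": "ana@example.com", "lifetime_value": "120.5"},
    {"email": "bo@example.com", "lifetime_value": "n/a"},
    {"first_name": "No Email"},
]
profiles = merge_into_profiles(raw_records)
print(profiles)
```

The two "Ana" records end up in one profile because their emails match after normalization, while the invalid lifetime value and the record without an email are dropped. Last-write-wins is only one possible merge policy; a production system might instead rank sources or compare timestamps.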

Simply put, when many growth-focused professionals hear the term 'data lake,' they hear 'inaccessible,' 'time- and labor-intensive,' and, most importantly, 'just not useful.'

In fact, a recent request for proposal we received from a multi-brand, enterprise company said, "Tell me why a customer data platform isn’t just another data lake."

With BlueConic, these tech users can access always-accurate, constantly updated first-party data they can analyze, segment, model, and activate whenever and wherever they need.

That is, they can liberate their customer data in a streamlined, scalable, real-time manner to grow the business. No (over)reliance on data scientists required.

Learn how you can accelerate business growth and increase operational efficiency with our pure-play customer data platform. Schedule your BlueConic demo today.
