The ways poor data quality can sabotage your projects and undermine your organization could fill a book. In fact, we’ve already devoted plenty of posts on this blog to them.
Often, data quality problems are hard to address because of pipeline debt: the older a pipeline gets, and the harder it is to reach the engineers who originally set it up, the harder it becomes to make sense of any strange results or behavior the pipeline starts producing.
Data pipelines—and the decisions underlying their implementation—need to be revisited and adjusted periodically:
If your organization adopts new standards and formats for displaying data, someone needs to update any pipelines that were created before that change to match.
If software bugs start to rear their heads, you need a clear record of how the system was built so you can figure out what’s causing them.
If you’re planning to integrate with a new system, you need to make any updates to your pipeline that might be necessary to make it compatible with that new system—before the integration starts.
Go too long without doing this work, and fixing the root causes of the inevitable data issues down the line is like getting gum out of hair.
Any of these problems can cause inconsistencies, inaccuracies, and gaps in your data. But what do they actually cost the average organization?
The answer depends on context and on what factors are at play, but brace yourself, because in pretty much any case, we’re dealing with figures that will make you weak in the knees.
The impact on US businesses
Issues like outdated code or deprecated systems—two common manifestations of pipeline debt—can lead to widespread data inaccuracies that, in turn, lead to widespread inefficiencies in your organization. No surprise, then, that the National Law Review puts the overall cost of bad data for businesses across the United States at $611 billion per year.
Behind this figure is a wide range of factors. The report estimates, for example, that IT teams spend half of their budget, on average, rehabilitating bad data.
It also notes that bad data can put an organization out of compliance with data privacy laws like GDPR, and that the penalties for these violations can amount to up to 4% of an organization’s annual revenue.
The broader economic cost
If we consider the impact of poor data quality downstream from the businesses that own the data, the costs really start to balloon.
Forbes cited an IBM study putting the overall cost of poor data borne by the US economy as a whole at $3.1 trillion per year. That’s ‘trillion’, with a T.
For perspective, federal government revenue in fiscal year 2023 was $4.4 trillion. That means the annual economic cost of bad data quality is roughly 70% of the total amount of money the United States government collected in a year.
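If you want to check that comparison yourself, here is a minimal sketch of the arithmetic in Python. The variable names are ours, chosen for illustration, and the two inputs are simply the figures quoted above.

```python
# Figures quoted in this article, in trillions of US dollars.
bad_data_cost = 3.1      # IBM estimate cited by Forbes (2016 study)
federal_revenue = 4.4    # US federal government revenue, fiscal year 2023

# Cost of bad data as a share of federal revenue.
share = bad_data_cost / federal_revenue
print(f"{share:.1%}")    # prints 70.5%
```

Keep in mind that the two figures come from different years, so this is a rough sense of scale rather than a precise ratio.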
And here’s the kicker: the study was from 2016. With businesses only becoming more data-centric and data-reliant since then, that number has surely grown.
Problems for AI adopters
An artificial intelligence model isn’t actually all that intelligent if the underlying data is full of holes.
SDX Central reports on a survey from the market research firm Vanson Bourne in which over 40% of respondents ‘experienced data inaccuracies, hallucinations and data biases in their AI outputs.’
The report estimates that data quality issues in AI can cost enterprises hundreds of millions of dollars.
Sunk costs and abandoned projects
While we’re on the topic of AI, Gartner predicts that, by the end of 2025, 30% of generative AI products will be abandoned.
First on its list of reasons: data quality problems. Whatever a company has invested in its AI product by the time it shuts that product down goes down the drain. Considering how many such products are currently in development, the total sunk costs across the industry could be staggering.
Pipeline debt equals poor data quality equals dollars and cents gone from your company’s bottom line. Try GX Cloud to learn how you can tackle the problem before it starts costing you.