For Better user experience please view the website in portrait Mode

Big, bad data: What’s the big deal over data quality?

Jul 25, 2014

Mike Wheatley | siliconANGLE | July 25th, 2014 -- Big bad data: Big Data is rapidly expanding, and the nature of this data is more complex and varied than ever before. The concept brings with it a new world of insights and possibilities for organizations, but it also poses many challenges, most especially when it comes to the quality and management of data.

Every single byte of data has potential value if an organization can collect, analyze and use that data to gain insights, but care needs to be taken to manage and exploit this data. The term “bad data” might sound a little bit melodramatic, but the fact remains that if your data is corrupted or incorrect, your business is going to suffer because of it.

After all, even the most powerful Big Data analytics tools are only as useful as the data they crunch, and many times intelligence that’s built on bad data can be worse than not having made any analysis at all. Nevertheless, many companies willingly make important decisions, or formulate objectives based on data they know is flawed or incomplete.

The big cost of bad data

It’s hard to put a finger on the cost of bad data, but recent research from Experian Data Quality shows that inaccurate data affects the bottom line of some 88 percent of organizations, causing them to lose as much as 12 percent of their revenues. This loss of revenue is calculated according to wasted resources, wasted marketing spend, and wasted staff time.

But in actual fact, the true cost of bad data might be even higher than this. According to Experian, 28 percent of companies surveyed said their bad data resulted in problems delivering email, causing their customer service to suffer. Another 21 percent said they had experienced reputational damage as a result of bad data.

Bad Data

“Bad data as a result of outdated or incomplete data, or duplicate, incorrect entries impacts the ability to make quality decisions for the business, as well as provide enhanced and personalized customer service and experience,” said Ben Parker, Principal Technologist at Guavus, in an interview. “The business cost of bad data may be as high as 10-25% of an organizations revenue,” he adds, citing this report.

Where does bad data come from?

According to Dennis Moore, the senior vice president and general manager of Informatica’s MDM business, the main problem lies in the fragmentation of business-critical data that occurs as organizations grow.

“There is no big picture because it’s scattered across applications, including on premise applications (such as SAP, Oracle and PeopleSoft) and cloud applications (such as Salesforce, Marketo, and Workday),” writes Moore on the Informatica blog. “But it gets worse. Business-critical data changes all the time…. Business-critical data becomes inconsistent, and no one knows which application has the most up-to-date information. This costs companies money. It saps productivity and forces people to do a lot of manual work outside their best-in-class processes and world-class applications.”

Fixing bad data

If your organization relies on Big Data, the bad data is certainly going to cost you. So the question is, what can businesses do to avoid making key decisions based on inaccurate facts, and therefore avoid exposing themselves to increased risks?

Dr. Mark Whitehorn, a business intelligence consulatant and senior lecturer at the School of Computing at the University of Dundee, writes that there are three approaches to solving the bad data conundrum:

  1. Fix the existing data in the transactional systems.
  2. Improve the applications so no more bad data can be entered.
  3. Tolerate bad data in the source systems and fix it when we perform the extract, transform and load (ETL) process to move it into the data warehouse.

All well and good, but each one has its limitations. Fixing your existing data doesn’t do anything to stop more bad data arriving in the system later, and improving your applications to eliminate bad data is, in Dr. Whitehorn’s words, “often impossible”. As for the third option, this is probably the most effective way to remedy the situation, typically done using some kind of tool with “good built-in mechanisms for cleansing data”.

Another method is to use a relatively new technique called “streaming analytics” to ‘clean’ the bad data as it arrives in the system.

“To prevent too much big data from accumulating, organizations can leverage streaming analytics,” says Parker. “Streaming analytics collects and reduces the large array of real-time data in motion from multiple sources into a consistent set of usable data—the right and clean data on which to perform analysis.”

At the end of the day, data quality is the key enabler when it comes to getting actionable insights from Big Data. Bad data can lead to big problems, creating a mess of conflicting information that leads to poor decision making, disappointing results and other negative consequences.

For organizations that wish to leverage Big Data, the onus is on them to ensure that their data quality initiatives are up to speed.