Data quality: get it right now because it will be more expensive to get it right later
The fourth critical sin of businesses (see Alchemy, the Philosopher's Stone and the Ten Capital Sins of businesses) is disregarding data quality. As I noted in How to handle big data when discussing the impact of a lack of data veracity, inaccurate, incomplete and, at times, outright incorrect data will not allow a firm to produce insights on which it can rely.
One of the most cited mantras in data analysis is the phrase: garbage in, garbage out.
A study conducted by Acharya and Zhaoxia in 2017, along with another study by LaValle, Lesser, Shockley, Hopkins, and Kruschwitz in 2010, highlights how one-third of business leaders do not trust the big data information they use to make business decisions.
According to Gartner's Data Quality Market Survey, "nearly 60% of organizations don't measure the annual financial cost of poor quality data".
Poor-quality data constitutes a costly sin that is not easily rectifiable.
Dun & Bradstreet evaluates the cost of data at $1 per record; however, it estimates the cost to correct the same record at $100. They state: "It is far more cost-efficient to prevent data issues than to resolve them. If a company has 500,000 records and 30% are inaccurate, then it would need to spend $15 million to correct the issues versus $150,000 to prevent them".
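The arithmetic behind that comparison is straightforward to sanity-check. The short sketch below (plain Python; the per-record costs and record counts are taken directly from the quote, nothing else is assumed) reproduces the figures.

```python
# Back-of-the-envelope check of the Dun & Bradstreet figures quoted above.
total_records = 500_000
inaccurate_share = 0.30
cost_to_prevent_per_record = 1      # dollars per record, as quoted
cost_to_correct_per_record = 100    # dollars per record, as quoted

bad_records = int(total_records * inaccurate_share)          # 150,000 records
prevention_cost = bad_records * cost_to_prevent_per_record   # $150,000
correction_cost = bad_records * cost_to_correct_per_record   # $15,000,000

print(f"Records needing attention: {bad_records:,}")
print(f"Cost to prevent: ${prevention_cost:,}")
print(f"Cost to correct: ${correction_cost:,}")
```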
Lack of data integration, another capital sin in business, becomes even harder to fix when data quality is poor: low-quality data complicates integration efforts and hinders the ability to merge data from different sources for comprehensive analysis.
Having data the business cannot trust is a waste of resources leading to incorrect analysis and, ultimately, misguided business decisions.
Several ways in which companies can try to improve data quality include:
Data Cleansing: Regularly cleanse and preprocess data to remove duplicates, inconsistencies, and inaccuracies. This can involve techniques such as standardization, validation, and transformation (a minimal sketch illustrating this and the next point follows this list).
Validation: Implement validation rules to ensure that data conforms to predefined standards. This can help catch errors at the time of data entry.
Quality Control Processes: Establish quality control measures and review processes to identify and rectify data issues before they affect analyses.
Data Governance: Establish clear guidelines for data collection, entry, and maintenance to maintain consistency and accuracy across the organization.
Automated Tools: Utilize data quality tools and automated scripts to detect and address issues in large datasets efficiently.
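To make the cleansing and validation items concrete, here is a minimal sketch using pandas. The column names, sample records, and validation rules are illustrative assumptions, not a prescribed standard; real pipelines would draw rules from the organization's data governance guidelines.

```python
import pandas as pd

# Illustrative customer records; in practice these would come from a database or file.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "email": ["a@example.com", "b@example.com", "b@example.com", "not-an-email"],
    "country": ["us", "GB", "gb ", None],
})

# Cleansing: standardize values, trim whitespace, drop exact duplicates.
df["country"] = df["country"].str.strip().str.upper()
df = df.drop_duplicates(subset=["customer_id", "email"])

# Validation: simple rules of the kind that could also run at data entry.
valid_email = df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False)
has_country = df["country"].notna()

issues = df[~(valid_email & has_country)]
clean = df[valid_email & has_country]

print(f"{len(issues)} record(s) failed validation:")
print(issues)
```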
Meaningful and useful business insights can only come from high-quality data that can be trusted. It is imperative to establish robust quality control processes and meticulously verify ETL (Extract, Transform, Load) processes for accuracy. Particular care must be taken when merging data from different sources to ensure the integrity and quality of the data. Attempting to cut corners on these processes will ultimately result in much higher costs in the long run.
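Verifying an ETL load need not be elaborate; a few automated reconciliation checks after each run already catch many issues. The sketch below is one hedged illustration (pandas-based, with hypothetical table shapes and a hypothetical verify_load helper), not a reference implementation.

```python
import pandas as pd

def verify_load(source: pd.DataFrame, loaded: pd.DataFrame, key: str) -> list[str]:
    """Run a few illustrative post-load checks; the rules here are assumptions."""
    problems = []
    # Row-count reconciliation between extract and target.
    if len(source) != len(loaded):
        problems.append(f"row count mismatch: source={len(source)}, loaded={len(loaded)}")
    # Key uniqueness in the loaded table.
    if loaded[key].duplicated().any():
        problems.append(f"duplicate keys found in '{key}' after load")
    # Unexpected nulls introduced by the transform or load step.
    null_counts = loaded.isna().sum()
    for col, n in null_counts[null_counts > 0].items():
        problems.append(f"column '{col}' has {n} null value(s)")
    return problems

# Example: compare an extract with what actually landed in the target table.
source = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
loaded = pd.DataFrame({"customer_id": [1, 2, 2], "amount": [10.0, 20.0, None]})
for problem in verify_load(source, loaded, key="customer_id"):
    print("CHECK FAILED:", problem)
```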
“Data quality is free. It’s not a gift, but it’s free. What costs money are the unquality things – all the actions that involve not getting data quality right the first time and all the actions to correct these data quality issues” - Philip Crosby (adapted by J. Schwarzenbach)
The myriad data sins I have encountered over decades in the global capital markets make my head spin. And this is before we get to the appallingly poor data quality processes at the government level. It is a small miracle the lights stay on. Managers should pay good heed to what S. Zocca is saying.