The growing volume of data has fueled an era of data-driven innovation. Industries are adopting big data, data science, analytics, and artificial intelligence to become leaders in their markets. You may have invested in these technologies hoping to improve customer satisfaction, streamline operations, and set strategy. Yet becoming a data-driven company remains difficult.
One foundation of your success is your data, and many companies have a lot of bad data. The cost of bad data is well established. IBM estimated that bad data costs the US economy $3.1 trillion per year. Gartner Research estimated the financial impact of poor data quality on a single business at between $9.7 million and $13.5 million.
Recognizing that you have bad data is difficult because the problems are hidden throughout the organization. Managers are making decisions with bad data. Leaders are missing opportunities because of missing data. Team members are correcting data instead of using it. Because these problems are scattered, it is hard to acknowledge that you do not have high-quality data.
What do we mean by data?
Data comes in many forms. For marketers, data can be click-through rates (CTR), social media posts, or cost per thousand impressions (CPM). For salespeople, data can be customer profiles, market data, and order history. For doctors, data can be health records, x-ray images, and dosage history. For security services, data can be live video streams, photos, and motion sensor streams. Your data helps you understand what is happening today so you can decide what to do tomorrow. The better your data, the better your decisions.
How to assess your data quality
You can become more aware of your data quality by engaging in productive conversations with the people who depend on data. To begin, identify a business priority and the metrics you are using to measure the outcome. Then gather the people involved in this outcome and the data they use to inform decisions. Collectively look at your data from these various perspectives:
1. Is the data complete?
Complete data means your team members have the information they need to do their work. For example, does the customer support team have access to recent transactions? Does your team have access to enough data?
2. Is the data clean?
Clean data contains no redundant records. For example, a report does not include duplicate or unnecessary records.
3. Is the data accurate?
Accurate data reflects reality and does not contain errors.
4. Is the data valid?
Valid data follows formats that everyone has agreed on, with standard definitions of type, size, and format. For example, timestamps use the same format and time zone, and measurements use centimeters rather than inches.
5. Is the data understood?
Understandability means the information is stored and used consistently across systems and teams. The people who use the data comprehend it easily and correctly.
6. Is the data relevant?
Relevant data means the information is provided at the level of detail needed to inform a decision. For example, do you need total transactions per month, or do you need every line item? Also ask whether any of the data is irrelevant to the decision.
7. Is the data timely?
Timely data means the information is available when it is expected and is up to date for the decision it informs. For example, inventory status might be updated hourly, account balances in real time, and hours worked daily.
8. Is the data trustworthy?
Trustworthy data is collected accurately, and its provenance is understood. It is unbiased and impartial.
9. Is the data compliant?
Compliant data means you are using information according to applicable laws, industry standards, and customer agreements.
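Some of these questions, namely completeness, cleanliness, validity, and timeliness, can also be spot-checked programmatically to give the conversation concrete examples. The following is a minimal Python sketch under stated assumptions, not a prescribed method: the field names (order_id, customer_id, updated_at, length, unit), the centimeter convention, the UTC timestamp rule, and the one-hour freshness threshold are all hypothetical placeholders for whatever your team agrees on. Questions about accuracy, understandability, relevance, trustworthiness, and compliance still need the people who know the data.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical conventions, chosen for illustration only.
REQUIRED_FIELDS = ("order_id", "customer_id", "updated_at", "length", "unit")
EXPECTED_UNIT = "cm"           # agreed measurement unit
MAX_AGE = timedelta(hours=1)   # agreed freshness threshold


def parse_utc(timestamp):
    """Return an aware datetime if the value is ISO 8601 with a UTC offset, else None."""
    try:
        parsed = datetime.fromisoformat(timestamp)
    except (TypeError, ValueError):
        return None
    # Naive timestamps (offset None) and non-UTC offsets both fail the agreed format.
    if parsed.utcoffset() != timedelta(0):
        return None
    return parsed


def profile_records(records, now):
    """Flag record indexes that fail simple completeness, cleanliness, validity, and timeliness checks."""
    report = {"incomplete": [], "duplicates": [], "invalid": [], "stale": []}
    seen_ids = set()
    for index, record in enumerate(records):
        # Complete: every required field is present and non-empty.
        missing = [field for field in REQUIRED_FIELDS if record.get(field) in (None, "")]
        if missing:
            report["incomplete"].append((index, missing))

        # Clean: the same order_id should not appear more than once.
        order_id = record.get("order_id")
        if order_id is not None:
            if order_id in seen_ids:
                report["duplicates"].append(index)
            else:
                seen_ids.add(order_id)

        # Valid: agreed timestamp format and time zone, and agreed unit.
        updated_at = parse_utc(record.get("updated_at"))
        if updated_at is None or record.get("unit") != EXPECTED_UNIT:
            report["invalid"].append(index)

        # Timely: only checkable when the timestamp itself is valid.
        if updated_at is not None and now - updated_at > MAX_AGE:
            report["stale"].append(index)
    return report


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"order_id": 1, "customer_id": "A", "updated_at": "2024-01-01T11:30:00+00:00", "length": 12.0, "unit": "cm"},
    {"order_id": 2, "customer_id": "", "updated_at": "2024-01-01T11:45:00+00:00", "length": 3.5, "unit": "cm"},
    {"order_id": 1, "customer_id": "A", "updated_at": "2024-01-01T11:30:00+00:00", "length": 12.0, "unit": "cm"},
    {"order_id": 3, "customer_id": "B", "updated_at": "2024-01-01T11:50:00+00:00", "length": 4.0, "unit": "in"},
    {"order_id": 4, "customer_id": "C", "updated_at": "2024-01-01T08:00:00+00:00", "length": 7.0, "unit": "cm"},
]
print(profile_records(records, now))
# {'incomplete': [(1, ['customer_id'])], 'duplicates': [2], 'invalid': [3], 'stale': [4]}
```

A script like this only flags candidates for discussion. Whether a flagged record is actually a problem, and what the agreed formats and thresholds should be, are questions for the people who depend on the data.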
You have a data quality problem when your team hesitates on any of these questions or expresses misgivings. The feedback may be difficult to hear, but it allows you to acknowledge the problem with the quality of your data and begin efforts to improve it.