Beginner’s Guide to Data Quality
The importance of data for businesses is currently one of the most talked-about issues. Organizations are constantly told that data analytics can drive better decision-making and that big data is crucial for business success. But is all data useful?
This guide looks at data in terms of data quality. We’ll explain what data quality means and how your business could suffer from low-quality data. We’ll then look at the benefits of data quality and explain the core elements of efficient data quality management.
WHAT IS DATA QUALITY?
Data quality describes whether data is reliable enough to serve a specific purpose. Certain characteristics of the data determine whether its quality is sufficient.
Whilst data quality can be measured in numerous ways, a handful of main dimensions are usually used. These include:
- Completeness – The extent to which the expected data attributes are present. This doesn’t mean data has to be 100% complete to be high quality, but rather the completeness is measured in terms of user expectation and data availability.
Incomplete data records are among the biggest challenges organizations must deal with. This is especially the case with data provided by customers, as customers don’t always see the benefit of providing detailed information to companies.
- Validity – Looks at whether the references are valid. That is especially important when multiple datasets are being connected with each other.
- Accuracy – Data has to be accurate to be high quality. Accuracy means the data reflects reality. It’s important to note that data can be complete and still be inaccurate. For example, you might have the addresses of your entire clientele, but some of these addresses might be misspelled.
- Consistency – In certain instances, you also need to determine the consistency of the data. If you have multiple datasets, you can measure the consistency of facts across these different sets.
- Availability – The extent to which the data is available. Data shouldn’t be hard to access; it should be readily available to everyone who requires it.
- Timeliness – Determines how up-to-date the data is in terms of the current task. This is usually assessed by comparing the date on which the data is used with the date of the data source.
All data should ideally be time-stamped to ensure that timeliness is achieved. You should be able to understand when certain data was provided, as it can help to better understand the validity and accuracy of that information.
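As a rough illustration, some of these dimensions can be expressed as simple ratios over a dataset. The sketch below is one possible way to do it, not a standard method: the record layout, the reference list of country codes and the one-year freshness limit are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "country": "DE", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "country": "FR", "updated": date(2021, 1, 15)},
    {"id": 3, "email": "bob@example.com", "country": "XX", "updated": date(2024, 4, 20)},
    {"id": 4, "email": "eve@example.com", "country": "DE", "updated": date(2019, 7, 9)},
]
KNOWN_COUNTRIES = {"DE", "FR", "US"}  # assumed reference dataset for the validity check


def completeness(rows, field):
    """Share of rows in which the expected attribute is present."""
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)


def validity(rows, field, allowed):
    """Share of rows whose value refers to an entry in the reference set."""
    return sum(1 for r in rows if r.get(field) in allowed) / len(rows)


def timeliness(rows, field, use_date, max_age_days):
    """Share of rows whose time stamp is recent enough for the current task."""
    return sum(1 for r in rows if (use_date - r[field]).days <= max_age_days) / len(rows)


use_date = date(2024, 6, 1)  # the date on which the data is used
scores = {
    "completeness (email)": completeness(records, "email"),
    "validity (country)": validity(records, "country", KNOWN_COUNTRIES),
    "timeliness (updated)": timeliness(records, "updated", use_date, 365),
}
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

What counts as a good enough score depends on the purpose of the data, so the thresholds would be set per task rather than fixed in advance.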
The above dimensions are generally used for determining whether data qualifies as ‘high quality’. Overall, data quality is often measured through its ability to serve a specific purpose, as mentioned above. The specific purpose of data is often to support:
- Operations – Data quality is measured by how well it helps to achieve different operational tasks.
- Decision-making – Data quality can be an integral part of decision-making.
- Planning – Data quality is also crucial for corporate planning.
THE DANGERS OF INACCURATE DATA
Data quality is a crucial part of data analytics, and any business involved with big data must understand the implications of inaccurate data. You would expect businesses to recognize how important data quality is, but unfortunately it is often not a top priority.
Data alone is not enough; it must be high quality for the business to reap the benefits. If you don’t look after your data, then gathering it becomes meaningless. Consider the example of owning a luxury yacht. You can have the yacht, wash it regularly and give it a fresh coat of wax every year. But if you never look inside it or service it by changing the oil or checking the motors, you won’t be able to enjoy the yacht. If you don’t look after it and you go out to sea, you might just get into trouble.
In fact, many businesses are not looking after their data quality as they should. Halo Business Intelligence’s data shows nearly 40% of all company data is inaccurate. Perhaps more worryingly, over 90% of companies admit the contact data they have is not accurate.
Furthermore, the same data shows companies are aware of the problems of inaccurate data to a certain extent. Around 66% of the surveyed companies acknowledged the possibility that inaccurate data has negatively affected the business. In fact, the surveyed companies estimated the cost of inaccurate data stood on average at $8,200,000.
How does low quality data affect companies and cause such havoc? The dangers of inaccurate data can be divided into two main issues: financial costs and loss of reputation.
First, as the above survey results show, inaccurate data can cost money. Without data quality, you are making decisions, implementing operational strategies and planning your next moves based on wrong facts. For example, you might use data in order to create a marketing campaign. But if the data is inaccurate, you might end up marketing the product to the wrong target market or make wrong assumptions about consumer preferences. Hence, you might end up spending money on marketing which won’t have the desired effect.
On the other hand, inaccurate data can increase your operational costs. If you have inaccurate contact data for customers or third-party suppliers, you might spend too much time trying to find the correct information. In business, time is money.
But doesn’t implementing data quality cost money as well? Initially you might notice a spike in spending, as implementing a new data strategy can be expensive. But the cost of inaccurate data is likely to be much higher than the cost of any data quality strategy. This is because fixing mistakes tends to be costlier than limiting the risk of mistakes.
A 2011 study by Anders Haug, Frederik Zachariassen, and Dennis van Liempd found that businesses should consider calculating an optimal level of data maintenance. Businesses should find a balance where the cost of inaccurate data would not be more or less than the cost of data maintenance. The optimal level can depend on the business, as well as the industry it operates in. Nonetheless, the study did find the cost of inaccurate data tends to be higher than the cost of data quality maintenance.
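One way to picture this balance is to add the two costs together at different levels of maintenance effort and look for the lowest total. The sketch below is only a toy calculation: the effort levels and all cost figures are invented for illustration and are not taken from the study.

```python
# Hypothetical yearly costs (in thousands) at five levels of maintenance effort.
# More effort costs more to run but leaves fewer errors to pay for.
effort_levels = ["none", "low", "medium", "high", "maximum"]
maintenance_cost = [0, 50, 120, 260, 500]
inaccuracy_cost = [900, 400, 150, 80, 60]

# Total cost at each level is the sum of both cost types.
total_cost = [m + e for m, e in zip(maintenance_cost, inaccuracy_cost)]

# The level with the lowest total cost is the candidate for the optimum.
best = min(range(len(effort_levels)), key=lambda i: total_cost[i])

for level, total in zip(effort_levels, total_cost):
    print(f"{level:>8}: total cost {total}")
print("lowest total cost at:", effort_levels[best])
```

With these made-up figures the total is lowest at the point where the two costs are roughly equal, and both doing nothing and doing the maximum turn out more expensive.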
Finally, inaccurate data can be damaging to the company’s brand. Inaccurate data can directly impact how you communicate with your consumers and your third-party suppliers. Examples such as sending an e-mail with the greeting “Dear Rich Bastard” might seem extreme, but they aren’t imaginary.
If you repeatedly provide customers with misinformation or call them accidentally due to inaccurate data, your business reputation is going to suffer as a consequence. Customers might find your data inaccuracy a nuisance, or simply start treating your information as untrustworthy and look elsewhere for high-quality data and customer service.
THE BENEFITS OF DATA QUALITY
Ignoring data quality can result in serious damage to the business. But aside from avoiding the obvious drawbacks of low-quality data, implementing data quality brings further benefits.
First, data quality can help an organization to reduce costs across different departments. Halo Business Intelligence research found organizations that introduced a data quality initiative managed to reduce:
- Corporate expenses by 10% to 20%
- IT costs by 40% to 50%
- Operating costs by 40%
Again, many of the above cost reductions are a result of the accurate use of data. Planning, decisions and actions are all more effectively conducted when the organization uses appropriate data. Your business can more efficiently remove inaccurate information from its database and therefore save time and money.
But data quality does more than reduce costs across the organization. The cost reductions and better utilization of data can also increase revenue and sales. Your organization won’t make costly mistakes that could potentially hurt the business. This can improve brand image and strengthen customer loyalty. Furthermore, since you are using accurate data, your marketing campaigns and sales strategies are able to achieve superior results. As mentioned in the example in the previous section, you are less likely to end up marketing to the wrong target market if you ensure data quality checks are in place.
Overall, this can help improve risk management. Mistakes become less likely as you can rely on the information you are using to make decisions. Data quality can reduce the risk in a variety of sectors from customer service to product development.
This can all lead to better strategic planning. Your organization can plan efficiently, conveniently and accurately because:
- You don’t spend time chasing the correct information, as data quality ensures data accuracy.
- You can easily access and find the necessary data, as data quality assures data is available and easy to use.
- You can make effective decisions, as data quality guarantees the information you use is correct.
As well as improving finances, data quality also provides plenty of support for organizations. Company effectiveness improves, as the organization uses accurate information. Customer service is based on better data and accountability is much easier to achieve. Once you have a data quality plan in place, finding the reason for mistakes is much easier. If data quality is ensured, you can’t blame the data for problems and must look elsewhere in operations for accountability. This makes life easier for employees as well, since they don’t have to worry about double-checking every piece of information they use to make decisions.
Data quality can provide more information about the organization, which can help you prioritize the most sought-after services. Data quality helps to identify data gaps, data inaccuracies and even how the data is used. Overall, you are better able to direct resources to the areas most in need of attention. It’s important to understand that the quality of data is more important than the quantity of data. Data quality not only helps you to ensure data is correct, but also ensures you don’t waste energy on gathering information that is not useful to your business objectives.
THE CORE ELEMENTS OF EFFICIENT DATA QUALITY MANAGEMENT
How can you get your data in order? There are different ways to approach data quality. A business should first consider its needs carefully, as understanding its data goals and objectives is crucial for data quality success.
Overall, there are three core elements data quality management must consider: data governance, data quality assurance and data quality control.
Data governance
An organization should appoint a data governance team to monitor data quality. You want the organization to have a team in charge of data quality, with clearly defined roles and responsibilities. This ensures data is up-to-date and that there are sufficient procedures to guarantee this. Appropriate data governance will also guarantee the team is supported and accountability is at the core of your data quality process. Nowadays, it has become increasingly popular among companies to appoint a Chief Data Officer (CDO) who makes sure the board of directors is aware of all data issues within the organization.
The data management team should focus on business objectives, strategic goals and business drivers. You should ask questions such as: What are the key objectives for your business? How can your business meet them? The answers will help define the data your business needs to thrive and it can help prioritize data quality goals.
Once you’ve defined the objectives for your business in terms of the important data and datasets, you need to establish a proper database. A number of different data quality programs can be helpful at this point, or you could outsource this task to a remote database administrator. The most important thing is to focus on including data that leads towards the goal and stripping out irrelevant and unnecessary data. Organizations are often wary of deleting data, but data quality requires you to hold on only to essential and accurate data.
Overall, remember that data quality is not only about implementing the strategies with the most sophisticated methods. As discussed above, you need to understand the trade-off between implementing data quality and the loss of finances and reputation due to inaccurate data. This means you must understand data and how your organization wants to use it. This could mean that you don’t need a costly data quality program, but simply better manual checking of data.
Since human error is generally the biggest problem behind inaccurate data, data governance should focus on educating employees about the importance of high-quality data. Rather than confining data quality to a separate department, ensure everyone in the organization understands why data quality matters. This can be much more beneficial in removing inaccuracies and inconsistencies than any sophisticated software.
Data quality assurance
Another essential aspect of data quality is data quality assurance (QA). This refers to the process of profiling your data in order to identify inaccuracies in it. When you are implementing data quality assurance, you are performing the following tasks:
- Profiling the data to find anomalies
- Cleansing the data by removing incorrect data
During data quality assurance, you are ensuring the data in use is of the highest quality in terms of achieving the objectives outlined in the section on data governance. The process could be considered the ‘deep-cleanse’ or ‘spring cleaning’ of data. You are ensuring your datasets are focused on achieving the objectives and that the data you use for this purpose is of high quality. Data quality assurance is the process of optimizing your data.
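The two QA tasks, profiling and cleansing, can be sketched in a few lines of code. This is a minimal sketch under assumptions of our own: the order records and the three anomaly rules (non-positive quantity, missing e-mail, repeated order ID) are hypothetical, and a real project would define its own rules from its business objectives.

```python
# Hypothetical order records; the rules below are illustrative assumptions.
orders = [
    {"order_id": "A1", "quantity": 2, "email": "ann@example.com"},
    {"order_id": "A2", "quantity": -5, "email": "bob@example.com"},
    {"order_id": "A3", "quantity": 1, "email": ""},
    {"order_id": "A1", "quantity": 2, "email": "ann@example.com"},
    {"order_id": "A4", "quantity": 3, "email": "eve@example.com"},
]


def profile(rows):
    """Profiling: map each problematic row index to the anomalies found in it."""
    seen_ids = set()
    issues = {}
    for index, row in enumerate(rows):
        found = []
        if row["quantity"] <= 0:
            found.append("non-positive quantity")
        if not row["email"]:
            found.append("missing email")
        if row["order_id"] in seen_ids:
            found.append("duplicate order_id")
        seen_ids.add(row["order_id"])
        if found:
            issues[index] = found
    return issues


def cleanse(rows, issues):
    """Cleansing: keep only the rows for which profiling found no anomaly."""
    return [row for index, row in enumerate(rows) if index not in issues]


issues = profile(orders)
clean_orders = cleanse(orders, issues)

for index, found in issues.items():
    print(f"row {index}: {', '.join(found)}")
print("rows kept:", [row["order_id"] for row in clean_orders])
```

Keeping the profiling result separate from the cleansing step means the list of anomalies can be reviewed before anything is removed.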
Data quality control
Finally, you’ll also need to implement data quality control protocols. Data quality control (QC) is performed after quality assurance has taken place, as it guarantees data is correct and consists only of the important elements. Data quality control is essentially the process for controlling the use of data and ensuring it’s appropriately used within the organization.
During data quality assurance, you’ll learn the following information:
- The level of inconsistencies within the data
- The level of data incompleteness
- The level of data accuracy
During data quality control, this information is used to decide whether the data can be used. For example, if QA discovers the data is full of inconsistencies, QC would prevent the data from being used. Your organization could have an online form where customers enter their phone numbers. During quality assurance, the phone numbers could be found to be incomplete in many instances. Data quality control would then prevent the data from being used, for instance, by the customer service department. This would prevent your customer service representatives from wasting their time ringing numbers that don’t exist.
Therefore, the data quality control process helps to prevent the incorrect use of data. It allows your organization and data management team to fix the inconsistencies and inaccuracies before they are used for planning, decision-making or operations.
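The phone number example can be sketched as a QA measurement followed by a QC decision. The sample numbers, the minimum of seven digits and the 90% release threshold below are assumptions chosen for illustration; an organization would set its own limits.

```python
import re

# Hypothetical phone numbers collected through an online form.
phone_numbers = ["+44 20 7946 0958", "7946", "", "+1 202 555 0143", "555-01"]

MIN_DIGITS = 7        # assumed minimum length of a usable number
REQUIRED_SHARE = 0.9  # assumed QC threshold: 90% of numbers must be usable


def is_complete(number):
    """QA check: a number counts as complete if it contains enough digits."""
    return len(re.sub(r"\D", "", number)) >= MIN_DIGITS


def qa_report(numbers):
    """QA step: measure how many of the numbers are complete."""
    complete = sum(1 for n in numbers if is_complete(n))
    return {"total": len(numbers), "complete": complete, "share": complete / len(numbers)}


def qc_release(report, required_share=REQUIRED_SHARE):
    """QC step: release the dataset only if the QA result meets the threshold."""
    return report["share"] >= required_share


report = qa_report(phone_numbers)
print(report)
print("released to customer service:", qc_release(report))
```

Here only two of the five numbers pass the QA check, so the QC step holds the dataset back until it has been fixed. A softer variant would release the complete numbers and withhold only the incomplete ones.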
Finally, you can watch the video below to understand data quality through the example of Chrysler. The video highlights the benefits of data quality and the processes it involves in a simple manner.
CONCLUSION
Data is important to businesses and your organization can benefit from data analysis in a number of different ways. But you should understand that data alone isn’t necessarily a benefit; it can provide advantages only if data quality is ensured. Inappropriate data management can become a liability for businesses and data quality can limit the risk of this.
Data quality can help ensure you use your data appropriately, and it helps guarantee that decisions and strategies are based on accurate information. But instead of simply implementing costly and time-consuming data quality programs, the organization must understand the objectives of its data use. Data quality will be best guaranteed when you understand what data is crucial for business success and you eliminate incorrect and ineffective data from your datasets.