Here’s How Bad Data Affects Your Bottom Line



By:  David Kashmer, MD MBA MBB (@DavidKashmer)



Hello, and welcome to the Healthcare Quality Podcast. My name is David Kashmer, and my background is as a Lean Six Sigma Master Black Belt. I am also a surgeon and an MBA, and my passion is data: data collection and using data to improve quality in healthcare. Today I want to talk with you about data fidelity and share some thoughts on data quality.


40% Of Your Company’s Data Is Probably Inaccurate


When we look at data by the numbers, it turns out that approximately 40% of all company data is inaccurate. This figure comes from Halo Business Intelligence, as reported on Data Science Central. About 92% of businesses admit that their contact data is not accurate, and about 66% of organizations believe they are negatively affected by inaccurate data.


Seems Like A Bigger Problem In Healthcare


In healthcare, in our experience, the numbers are much higher. Routinely, when we go to pull charts and data, data fidelity and validation turn out to be a big problem. Now that we’re looking at just how good our data are, you may ask yourself whether this impacts your business’s bottom line. How does it affect our business, whether we’re a hospital or some other part of industry? It turns out there is evidence about what happens with a data fidelity or data quality initiative per  There’s a link to this article at our blog, and that’s  If you go there, you’ll find more information on data quality under an entry called “17,000 men are pregnant.” We’ll get to more on that in just a moment.


Again, it turns out that data quality initiatives produce large changes in business end points, including a 10-20% reduction in corporate budget, about a 40-50% reduction in IT budget, and a 40% reduction in operating costs. According to Data Science Central, increases in both revenue and sales are also typically seen in industries where those end points apply.


Dirty data can be damaging. For example, the title of that blog entry, “17,000 men are pregnant,” comes from the fact that, because of incorrectly entered medical codes in certain British hospitals, thousands of men appeared to be pregnant and to require obstetric and prenatal exams. Those errors caused disastrous results in billing claims and compliance per the century at  So, many factors go into having quality data: recording it accurately and making it accessible and useful. Again, in our experience, this is something we teach and talk about when we discuss the use of statistical process control in our hospitals.
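To illustrate how a simple consistency check can catch an error like the pregnant-men example before it reaches billing, here is a minimal Python sketch. It is a hedged illustration, not the method used by the hospitals described above. The record layout (`sex` and `diagnosis_code` fields) is hypothetical. The rule that pregnancy-related codes begin with "O" is also an assumption, modeled on the ICD-10 chapter for pregnancy and childbirth.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """Hypothetical minimal encounter record; real registries carry many more fields."""
    patient_id: str
    sex: str             # assumed coding: "M" or "F"
    diagnosis_code: str  # assumed ICD-10-style code, e.g. "O80"


def is_pregnancy_code(code: str) -> bool:
    # Assumption: pregnancy-related codes start with "O",
    # following the ICD-10 chapter on pregnancy and childbirth.
    return code.strip().upper().startswith("O")


def flag_implausible(encounters: list[Encounter]) -> list[Encounter]:
    """Return records where a male patient carries a pregnancy-related code."""
    return [
        e for e in encounters
        if e.sex.strip().upper() == "M" and is_pregnancy_code(e.diagnosis_code)
    ]


if __name__ == "__main__":
    sample = [
        Encounter("001", "F", "O80"),
        Encounter("002", "M", "O80"),    # implausible: male patient with a pregnancy code
        Encounter("003", "M", "S72.0"),  # plausible: non-pregnancy code
    ]
    for e in flag_implausible(sample):
        print(f"Check record {e.patient_id}: sex={e.sex}, code={e.diagnosis_code}")
```

A rule like this only flags records for human review. It cannot say which field is wrong, which is one reason collecting data carefully at the source still matters.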


How Do We Fix This Problem?


Let me tell you how we do that. One of our main focuses when we teach and work on quality in hospitals is getting data directly from the process. When I say directly from the process, I don’t mean pulling data from a data warehouse the next day or querying a registry. Those are the things we typically do. We have registries, and it’s great to have them, but it is much more valuable to go to where the process is occurring and collect continuous or discrete data, depending on how you’ve set up your particular data collection plan, right at the process. For more information about continuous versus discrete data, you can visit us on the blog,, and also where we talk about discrete versus continuous data. The point here is that, whichever data end point you use, the key is to get the data directly from the process, at the point where it is occurring, and to do so prospectively. This matters because, by the time data leave the process and reach the registry you’re using, a lot of different things happen to them.


First, the operational definition for the end point you want to look at, the one that has meaning for your quality improvement project, doesn’t always line up with what the registry asks for. Because we know how often those definitions and required fields fail to line up, we focus on going right to the process to collect data. Now, there are many challenges in that, and one of them is resources. In hospitals I’ve helped out and hospitals where I’ve worked in the past, one of the things we typically hear is, “Boy, staffing to collect data is very challenging.” Data collection often just isn’t valued, and labour accounts for a large share of the cost of running a hospital.


If you accept that approximately 60% of hospital costs are labour costs, which is broadly true across organizations, it’s very challenging to argue for having a full-time equivalent (FTE) or a part-time employee go to where the process is occurring and collect data. It’s a hard argument to make. But given the numbers we just shared on how poor data fidelity affects business end points, I think you can agree that it’s very worthwhile to have the best data you can. If your decisions are based on data and you run a very data-driven shop, you can probably intuit that it’s key for the data we use to be accurate. So, if you think it’s too expensive to collect good data, you’re likely already incurring the cost of not having good data, and that tends to be much more significant than you anticipate.


Again, as noted, when data quality projects are done (projects that focus on the quality of what goes into our registries, or of what reaches us in time to make decisions), we see reductions in corporate budget of 10-20%, reductions in IT budget and operating costs, and increases in revenue.


So, for today’s entry, I wanted to talk a little bit about data fidelity and how it impacts our bottom line. Again, a lot of what we talked about can be found on and there is a link to this article. You can find the article with the link at, where we have a short gloss on it followed by a link to the article itself.


Again, if you think it’s too expensive to collect good data, try not collecting good data, because that’s a lot more expensive. So, in summary, we wanted to highlight some of the really dramatic costs associated with bad data. My advice, from having done many quality improvement projects over the years, and a typical teaching in healthcare quality improvement, is to go right to the place where the process you’ve teased out occurs. Go right to it, take the stopwatch, the clipboard or what have you, and take a look. Collect your data right from the process. The data won’t be as cleaned up as the registry may make them, but they won’t be fraught with the challenges of taking the operational definition you want to look at and somehow shoehorning it into what the registry wants.


So, good luck with your data collection and your quality improvement projects. If you have any questions or stories about the use of data in your healthcare system, whether it’s a success story, a question about how to create a success story, or a warning for other data users out there, feel free to visit us at and share your experiences. We’re always happy to hear from you. Have a great day!