Use Continuous Data (!)




For the purposes of quality improvement projects, I prefer continuous to discrete data.  Here, let’s discuss the importance of classifying data as discrete or continuous and the influence this can have over your quality improvement project.  For those of you who want to skip to the headlines: continuous data is preferable to discrete data for your quality improvement project because you can do a lot more with a lot less of it.


First let’s define continuous versus discrete data.  Continuous data is data that is infinitely divisible: you can keep dividing it into smaller and smaller units.  Time is a good example.  One hour can be divided into two groups of thirty minutes, minutes can be divided into seconds, and seconds can continue to be divided on down the line.  Contrast this with discrete data:  discrete data is data which is, in short, not continuous.  (Revolutionary definition, I know.)  Things like percentages of yes-or-no outcomes, levels and colors comprise data that comes in divided packets and so can be called discrete.


Now that we have our definitions sorted, let’s talk about why discrete data can be so challenging.  First, when we go to sample directly from a system, discrete data often demands a larger sample size.  Consider our simple sample size equation for how big a sample of discrete data we need to detect a change:


sample size = (p)(1 - p)(2 / delta)^2


This sample size equation for discrete data has several important consequences.  First, consider the terms.  The equation is for percentage-type data where we have a yes or no, go or stop, etc.  Here p is the probability of the event occurring and delta is the smallest change we want to be able to detect with our sample.


The 2 in the equation comes from the (approximate) z-score at the 95% level of confidence.  The true value of z is about 1.96; we round it up to 2 because that keeps the arithmetic simple and gives us a sample slightly larger than what’s required.  We also round the final answer up to a whole number.  (How do you have 29.2 of a patient, for example?)  Rounding up is important because rounding down would yield a sample that is slightly too small.


In truth, there are many other factors in sampling besides sample size.  However, here, notice what happens when we work through this sample size equation for discrete data.  Let’s say we have an event that has a 5% probability of occurring.  This would be fairly typical for many things in medicine, such as wound infections in contaminated wounds.  To detect a change of 2 percentage points in that rate, we have 0.05 x 0.95 x (2 / 0.02)^2.  This gives us 475 samples required to detect a change as small as 2 percentage points.  In other words, we need a fairly large sample to see a reasonable change.  We can’t detect a change smaller than that with this sample size, so if we think we see 4.8% as the new rate after interventions to reduce wound infections…well, with that sample size we can’t really say whether anything has changed.
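For anyone who wants to check the arithmetic, here is a minimal Python sketch of the simplified equation.  The function name discrete_sample_size is my own label for illustration, not a standard library call, and the z of 2 is the rounded value used in this post.

```python
import math


def discrete_sample_size(p, delta, z=2.0):
    """Simplified sample size for yes/no data: p(1 - p)(z / delta)^2.

    p     -- estimated probability of the event (0.05 for 5%)
    delta -- smallest change we want to detect, as a proportion (0.02)
    z     -- z-score; 2 is the rounded 95% value used in the text
    """
    n = p * (1 - p) * (z / delta) ** 2
    # Round away floating point noise, then round UP to a whole subject.
    return math.ceil(round(n, 6))


# The wound infection example: 5% baseline, 2 percentage point change.
print(discrete_sample_size(p=0.05, delta=0.02))  # 475
```

Swapping in z=1.96 instead of 2 gives a slightly smaller answer, which is why rounding z up is the cautious choice.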


One more thing:  don’t know the probability of an event because you’ve never measured it before?  Well, make your best guess.  Many of us use 0.5 as the p if we really have no idea.  Some sample size calculation is better than none, and you can always revise the p as you start to collect data from a system and you get a sense of what the p actually is.
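To see how much the guess for p matters, here is a small sketch that runs the same simplified equation over a few values of p.  The helper name and the list of p values are mine, chosen only for illustration.  Because p(1 - p) is largest at p = 0.5, guessing 0.5 gives the biggest sample of all, which makes it a cautious default.

```python
import math


def discrete_sample_size(p, delta, z=2.0):
    """Simplified sample size for yes/no data: p(1 - p)(z / delta)^2."""
    n = p * (1 - p) * (z / delta) ** 2
    # Round away floating point noise, then round up to a whole subject.
    return math.ceil(round(n, 6))


# Same delta (2 percentage points), different guesses for p.
for p in (0.05, 0.10, 0.25, 0.50):
    print(p, discrete_sample_size(p, delta=0.02))
```

With a delta of 0.02, the guess of 0.5 calls for 2,500 samples versus 475 at p = 0.05, so it pays to revise p once real data starts coming in.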


Now let’s consider continuous data.  For continuous data, sample size required to detect some delta at 95% level of confidence can be represented as



sample size = (2 x [historic standard deviation of the data] / delta)^2


When we plug numbers into this simplified sample size equation we see very quickly that much smaller samples are required to show significant change.  This is one of the main reasons why I prefer continuous to discrete data.  Smaller sample sizes can show meaningful change.  However, for many of the different endpoints you will be collecting in your quality project, you will need both continuous and discrete data.  Remember, as with the discrete data equation, you set the delta as the smallest change you want to be able to find with your data collection project.
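Here is a minimal Python sketch of the continuous data equation.  The function name and the numbers are made up for illustration: assume a process time with a historic standard deviation of 10 minutes, where we want to detect a change of 5 minutes.

```python
import math


def continuous_sample_size(std_dev, delta, z=2.0):
    """Simplified sample size for continuous data: (z * std_dev / delta)^2.

    std_dev -- historic standard deviation, in the units of the data
    delta   -- smallest change we want to detect, in the same units
    z       -- z-score; 2 is the rounded 95% value used in the text
    """
    n = (z * std_dev / delta) ** 2
    # Round away floating point noise, then round up to a whole subject.
    return math.ceil(round(n, 6))


# Hypothetical: standard deviation of 10 minutes, detect a 5 minute change.
print(continuous_sample_size(std_dev=10, delta=5))  # 16
```

The answer depends entirely on how the standard deviation compares to the delta you pick, so treat the 16 as an illustration of the arithmetic rather than a typical result.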


Interesting trick:  if you don’t know the historic standard deviation of your data (or you don’t have one), take the highest value of your continuous data and subtract the lowest.  Then divide what you get by 3.  Voilà…an estimate of historic standard deviation.
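The trick above can be sketched in a few lines of Python and fed straight into the continuous sample size equation.  The function names and the six sample times are hypothetical, and the divide-by-3 rule is the rough estimate described here, not a formal statistic.

```python
import math


def estimate_std_dev(values):
    """Rough estimate from the text: (highest - lowest) / 3."""
    return (max(values) - min(values)) / 3


def continuous_sample_size(std_dev, delta, z=2.0):
    """Simplified sample size for continuous data: (z * std_dev / delta)^2."""
    n = (z * std_dev / delta) ** 2
    return math.ceil(round(n, 6))


# Hypothetical pilot measurements, in minutes.
pilot_times = [32, 45, 51, 38, 60, 47]

rough_sd = estimate_std_dev(pilot_times)  # (60 - 32) / 3
print(round(rough_sd, 2))
print(continuous_sample_size(rough_sd, delta=5))
```

Once you have collected enough real data, replace the rough estimate with the measured standard deviation and rerun the calculation.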


Another reason why continuous data is preferable to discrete data is the number of powerful tools it unlocks.  Continuous data allows us to use many other quality tools such as the Cpk, data power transforms, and useful hypothesis testing.  This can be more challenging with discrete data.  One of the best ways we have seen to represent discrete data is a Pareto diagram.  For more information regarding a Pareto diagram visit here.


Other than the Pareto diagram and a few other useful ways, discrete data presents us with more challenges for analysis.  Yes, there are statistical tests such as the chi squared proportions test that can determine statistical significance.  However, continuous data plainly opens up a wider array of options for us.


Having continuous data often allows us to make better visual representations and gives our team a picture of the robustness of the process along with the current level of variation in the process.  This can be more challenging with discrete data endpoints.
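As one example of a tool that continuous data unlocks, here is a rough sketch of a Cpk calculation using the common textbook form: the distance from the mean to the nearer specification limit, divided by three standard deviations.  The function name, the sample times and the specification limits are all hypothetical, and I am assuming the sample standard deviation is an acceptable stand-in for the process standard deviation.

```python
import statistics


def cpk(values, lower_spec, upper_spec):
    """Cpk = min(upper_spec - mean, mean - lower_spec) / (3 * std dev)."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    return min(upper_spec - mean, mean - lower_spec) / (3 * std_dev)


# Hypothetical process times in minutes, with made-up limits of 20 and 70.
times = [32, 45, 51, 38, 60, 47]
print(round(cpk(times, lower_spec=20, upper_spec=70), 2))
```

A higher Cpk means the process sits more comfortably inside its limits.  There is no equivalent one-line summary of variation against limits when all you have is a yes or no count.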


In conclusion, I like continuous data more than discrete data and I use it wherever I can in a project.  Continuous data endpoints often allow better visualization of variation in a process.  They also require smaller sample sizes and unlock a fuller cabinet of tools which we can use to demonstrate our current level of performance.  In your next healthcare quality improvement project be sure to use continuous data points where possible and life will be easier!


Disagree?  Do you like discrete data better or argue “proper data for proper questions”?  Let us know!