How To Take Charge Of Your Transformed Data

By: David M. Kashmer MD MBA MBB (@DavidKashmer)


No Need To Tell You About The Importance Of Data Again…

You will see many posts on here that describe why it’s important to manage ourselves and our systems with data. You have likely also read about other techniques used to drive improvement, including Kotter’s eight steps to culture change, tools like SIPOC diagrams, and project charters. If you’ve opened an entry like this, odds are there is no need to tell you how much more useful good data are than the more typical way we do process improvement in healthcare, such as reviewing only individual cases. You likely already understand why working with data has many advantages over those alternatives. No need to beat that horse again!


Now take a moment to focus on a finer and more interesting point about managing systems with data: non-normal data and the meaning behind transformed data. Here, let’s wrestle with some of the interesting things that happen when we examine healthcare systems with data and wind up with seemingly odd quantities to manage. For example, what happens if we transform data and end up having to manage something counterintuitive as a result?


What does it mean to the team, exactly, when you wind up having to manage some variable like time squared?


Uh-oh, I Used Data & Found A Non-normal Distribution

Have you ever measured a system only to find that the data are not normally distributed? We have talked about this issue earlier here. As you may remember, we have several options for treating non-normal data. One is distribution fitting: we try to find which distribution, if any, the data follow. Many tools for this exist, including Minitab and SigmaXL. Nowadays, good quality improvement often requires software and knowledge of how to use it. For more, look here.
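As an illustration of distribution fitting outside a point-and-click tool, here is a minimal Python sketch using SciPy. It is not how Minitab or SigmaXL do it; those packages have their own distribution-identification routines. Instead, it fits a few candidate families by maximum likelihood and ranks them by AIC (lower is better). The function name rank_distributions, the candidate list, and the simulated lognormal "response times" are assumptions made for the example, not real data.

```python
import numpy as np
from scipy import stats


def rank_distributions(data, candidates=("norm", "lognorm", "gamma", "weibull_min")):
    """Fit each candidate family by maximum likelihood and rank by AIC (lower is better)."""
    data = np.asarray(data, dtype=float)
    scores = {}
    for name in candidates:
        dist = getattr(stats, name)
        if name == "norm":
            params = dist.fit(data)              # (loc, scale): 2 estimated parameters
            k = len(params)
        else:
            # Pin the location at 0 for the positive-only families, as is common
            # for times; the fixed loc is not counted as an estimated parameter.
            params = dist.fit(data, floc=0)
            k = len(params) - 1
        log_lik = np.sum(dist.logpdf(data, *params))
        scores[name] = 2 * k - 2 * log_lik
    return sorted(scores.items(), key=lambda item: item[1])


# Simulated, right-skewed "response times" in minutes (not real data).
rng = np.random.default_rng(0)
times = rng.lognormal(mean=2.0, sigma=0.5, size=200)
for name, aic in rank_distributions(times):
    print(f"{name:12s} AIC = {aic:8.1f}")
```

AIC only ranks the candidates against each other; it does not say whether the best one fits well, which is why tools usually pair fitting with a goodness-of-fit test.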


What about situations where the data don’t fit any of the common non-normal distributions either? In those cases, you may opt to transform the data. More on data transforms here. One of the most commonly used data transforms is the Box-Cox transformation.



Box-Cox Can Create Some Things To Measure That Feel Strange


This transformation tries to find a power that, when each data point is raised to it, turns the data set into an approximately normally distributed one. Let me say that again: the data are changed to fit the normal distribution according to some power function that the software estimates from the data. (Box-Cox works only on strictly positive data, which response times are.) Usually, the program will then run the Anderson-Darling test on the new data set to check whether the transformed data are consistent with a normal distribution. (More here.)
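For readers who want the exact form: the standard Box-Cox family is a shifted and rescaled power rather than a bare power, and it switches to the natural log at λ = 0. It is defined only for positive y.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
\ln y, & \lambda = 0,
\end{cases}
\qquad y > 0,
\qquad\text{so for } \lambda = 2:\quad t^{(2)} = \frac{t^{2} - 1}{2}.
```

With λ = 2 the transformed value is just time squared, shifted and rescaled. A linear shift and rescale does not change whether data look normal (the Anderson-Darling normality test with estimated mean and standard deviation gives the same answer either way), so it is fair to describe the result simply as "time squared."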


So, for example, imagine you are measuring time in a certain scenario, such as the response of trauma providers to the trauma bay. Let’s say the data are not normally distributed and do not fit any of the typical distributions. What now? Let’s say you’ve opted for data transformation. The Box-Cox transformation tells you that, although the time data are not normally distributed, the square of the time variable is (in other words, the estimated power is 2).
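Here is a minimal sketch of that transform-then-test sequence, assuming SciPy is available: scipy.stats.boxcox estimates λ by maximum likelihood, and scipy.stats.anderson runs the Anderson-Darling test. The helper name boxcox_then_check, the 5% cutoff, and the simulated times (built so that time squared is normal) are hypothetical. With data like these we would expect λ to land somewhere near 2, though sampling noise moves it around.

```python
import numpy as np
from scipy import stats


def boxcox_then_check(times, level=5.0):
    """Box-Cox transform positive data, then Anderson-Darling test the result."""
    times = np.asarray(times, dtype=float)
    if np.any(times <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    transformed, lam = stats.boxcox(times)        # lambda chosen by maximum likelihood
    ad = stats.anderson(transformed, dist="norm")
    # Look up the critical value for the chosen significance level (in percent).
    crit = ad.critical_values[list(ad.significance_level).index(level)]
    # True means "normality not rejected", which is not proof of normality.
    return transformed, lam, ad.statistic, ad.statistic < crit


# Hypothetical data built so that time squared (not time) is normal.
rng = np.random.default_rng(42)
squared = rng.normal(loc=60.0, scale=15.0, size=300)
squared = squared[squared > 0]                    # times must be positive
times = np.sqrt(squared)                          # simulated response times, minutes

transformed, lam, stat, passes_ad = boxcox_then_check(times)
print(f"lambda = {lam:.2f}, A^2 = {stat:.3f}, consistent with normal at 5%: {passes_ad}")
```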


Okay, now what? What does it mean to manage time squared? After all, it’s hard to feel time squared, so it may not have much intuitive meaning. Every instance of a provider responding to the trauma bay is measured in minutes and seconds. So, what exactly does it mean to measure time squared, and how does the team go about paying attention to it? This is an interesting philosophical question, one I have struggled with in project teams. Let me share how we usually resolve it…
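One practical option (an assumption on my part, not a step from the original approach) is to keep the team talking in minutes and let software handle the conversion. SciPy’s scipy.special.boxcox and scipy.special.inv_boxcox map values between the two scales; the helpers to_transformed and to_minutes and the 15-minute goal below are hypothetical.

```python
import numpy as np
from scipy import special


def to_transformed(minutes, lam):
    """Map a value in minutes onto the Box-Cox scale, e.g. to place a goal line."""
    return special.boxcox(minutes, lam)


def to_minutes(value, lam):
    """Map a value on the Box-Cox scale back to minutes for reporting."""
    return special.inv_boxcox(value, lam)


lam = 2.0                  # hypothetical: the "time squared" case
target_minutes = 15.0      # hypothetical response-time goal
target_t = to_transformed(target_minutes, lam)   # (15**2 - 1) / 2, i.e. about 112
print(target_t, to_minutes(target_t, lam))

# Caution: back-transforming the mean of transformed data does not give the mean time.
times = np.array([5.0, 8.0, 12.0])
mean_back = to_minutes(to_transformed(times, lam).mean(), lam)
print(round(float(mean_back), 3), round(float(times.mean()), 3))
```

The second half shows a caution worth sharing with the team: with λ = 2, back-transforming the average of the transformed values gives the root-mean-square time, which sits a bit above the ordinary mean. Quantiles such as the median carry over more cleanly under a monotone transform like this one.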


“Transform” Is A Bad Word…It Can Make It Sound Like Cheating


One step involves the word “transform” itself. It’s important for the team to know that the data aren’t somehow subverted or cheated when they are transformed. “Transform” just means that the tool finds the power function that makes the variable under study (such as time) approximately normally distributed.


In fact, the word “transform” itself seems, at times, to make people feel that the data are no longer what they were. However, when we use data transforms, I try to impress upon the team that transforming the data simply puts them in a form that allows routine statistical tests. That’s the whole point of the transform: we get to use tests that are comfortable and straightforward. As you may remember from earlier entries here, many of the typical statistical tests we use assume the data are normally distributed. Therefore, when the data are not normally distributed, we cannot rely on those typical tests. For more on tests that do and don’t require normal data, look here.
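The decision described above, normal data to a routine test and non-normal data to something else, can be sketched in Python with SciPy. The pairing of Welch’s t-test with the Mann-Whitney U test, the 5% Anderson-Darling cutoff, and the names looks_normal and compare_groups are assumptions for illustration; a team’s statistical plan may call for different tests.

```python
import numpy as np
from scipy import stats


def looks_normal(x, level=5.0):
    """True if Anderson-Darling does not reject normality at `level` percent."""
    result = stats.anderson(np.asarray(x, dtype=float), dist="norm")
    crit = result.critical_values[list(result.significance_level).index(level)]
    return bool(result.statistic < crit)


def compare_groups(before, after):
    """Use Welch's t-test if both groups look normal, else Mann-Whitney U."""
    if looks_normal(before) and looks_normal(after):
        return "welch_t", stats.ttest_ind(before, after, equal_var=False).pvalue
    return "mann_whitney", stats.mannwhitneyu(before, after, alternative="two-sided").pvalue


# Simulated, skewed times before and after a change (not real data).
rng = np.random.default_rng(7)
before = rng.exponential(scale=8.0, size=150)
after = rng.exponential(scale=6.0, size=150)
print(compare_groups(before, after))
```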


Therefore, when the data are non-normal and don’t follow any typical distribution, one of the few options left to make the data set workable is a power transform such as the Box-Cox transformation. We try to focus the team on the fact that we’ll be comparing variables such as time squared to the same apples (time squared again) that we collect after changes to the system. We will be comparing time squared to time squared. Apples to apples.
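Apples to apples has a concrete consequence in code: estimate λ once, from the baseline data, and apply that same λ to the post-change data, rather than letting each data set pick its own power (which would put the two on different scales). A minimal SciPy sketch, with a hypothetical helper compare_on_baseline_scale and simulated before/after times:

```python
import numpy as np
from scipy import special, stats


def compare_on_baseline_scale(baseline, post):
    """Transform both samples with the lambda estimated from the baseline only."""
    baseline = np.asarray(baseline, dtype=float)
    post = np.asarray(post, dtype=float)
    base_t, lam = stats.boxcox(baseline)   # estimate lambda once, on baseline data
    post_t = special.boxcox(post, lam)     # reuse the same lambda: apples to apples
    test = stats.ttest_ind(base_t, post_t, equal_var=False)  # Welch's t-test
    return lam, test.statistic, test.pvalue


# Hypothetical before/after times whose squares are normal; the change shortens them.
rng = np.random.default_rng(3)
before_sq = rng.normal(60.0, 15.0, 150)
after_sq = rng.normal(45.0, 12.0, 150)
before = np.sqrt(before_sq[before_sq > 0])
after = np.sqrt(after_sq[after_sq > 0])

lam, stat, p = compare_on_baseline_scale(before, after)
print(f"lambda = {lam:.2f}, t = {stat:.2f}, p = {p:.3g}")
```

A positive t statistic here means the baseline times were longer on the shared transformed scale, which, because the transform is increasing, also means longer in minutes.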


Obviously, the longer it takes to get to the trauma bay, the longer the time and thus the larger the time squared value. Importantly, we try to impress on the team this more intuitive way of thinking about variables like time: does it really matter whether we talk about time or time squared if, whenever one is larger, so is the other? Because times are always positive, squaring never changes which response was faster. We see the same logic with many transformed process variables.
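A tiny check of that intuition, using made-up response times: for positive numbers, squaring keeps the ranking of responses identical, and with an odd number of observations the median of the squared times is exactly the square of the median time.

```python
import numpy as np

times = np.array([4.5, 7.0, 3.2, 9.8, 6.1])   # hypothetical response times (minutes)
squared = times ** 2

# Squaring positive numbers preserves order, so the ranking is identical.
same_order = np.array_equal(np.argsort(times), np.argsort(squared))
# With an odd count the median is an actual data point, so it maps exactly.
median_matches = np.isclose(np.median(squared), np.median(times) ** 2)
print(same_order, median_matches)   # True True
```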


Keeping The Team Aligned With Reassurance Is The Key


In the end, we find that the typical issues with managing system variables like time squared after a power transformation evaporate when we focus the group and educate them about what these variables indicate about the system. It’s an interesting philosophical question, but one that rarely changes what we do or the outcomes of quality improvement projects. As mentioned, however, it is key to know whether data are normally distributed and how to use power transforms effectively with non-normal data. Keeping the team aligned is important when we run up against challenging concepts like data transforms that create unusual variables to manage, such as time squared.