Let’s Talk About Data


By:  DMKashmer, MD MBA MBB (@DavidKashmer)



A Car Isn’t Just About The Wheels…But It Needs The Wheels To Go.


We have spent a great deal of time running through the background of a typical Six Sigma project. This is because I have heard colleagues estimate that 80% of the Six Sigma process is about the people and the teamwork, and that seems about right to me.  Without a supportive team and a receptive administration, the Six Sigma pathway is slow to succeed if it succeeds at all. However, despite the fact that much of Six Sigma is not about the mathematics and statistics, the mathematics and statistics are a huge difference maker. Alone they are not sufficient for project success; however, they are a necessary ingredient. After all, a lot of the car is not about the wheels–yet you absolutely need the wheels. Now let’s take the time to discuss Six Sigma data collection plans and other important considerations so that we can get those wheels turning.


West Meets East


There are some important differences between classic Western management styles and Eastern management styles, and these play into what we do with data. First, as we have mentioned, the rebuilding of Japan post World War 2 allowed Western modern management thinkers to implement advanced techniques in a receptive culture. (West went East, and it flourished.)


This deployment resulted in an interesting blend of statistical process control and Eastern philosophy. Western classic management, by contrast, is very person and individual focused. It has often been described as being less data driven. Of course, counterexamples spring to mind immediately: for instance, the time-motion studies performed in early America and Henry Ford’s production line. We are speaking in generalities here, not hard and fast rules.


That said, modern Western management styles are incorporating data driven decision making more and more. Not only is “big data” a catch phrase, but Six Sigma and Lean deployments have really changed the face of what we describe as Western management philosophy. Here, let’s describe some of how adding a data layer to your project really gets it moving in the right direction.  Let’s go to the nuts and bolts.


The Data Collection Plan Is Based On The SIPOC Diagram


We have previously discussed the SIPOC diagram as a high level process map (and beyond) for whichever process you are trying to improve. We mentioned the SIPOC diagram as one of the Six Sigma tollgates in the define phase. Now let’s move on to how this SIPOC diagram is woven into creating a data collection plan.


No Matter Which Data Points You Choose To Collect, Get The Data Right From The Process


First, some broad philosophic points. When we collect data for Six Sigma projects, we recommend avoiding the use of data from data warehouses or trauma registries whenever possible. Why? This is because data in warehouses and similar registries has often been cleaned, edited, or otherwise filtered. Whenever possible, we recommend getting data directly from the process. Spending time ‘on the factory floor’ is useful in that it leverages the Hawthorne effect and gives the team (as well as managers) a real feel for how the process works. We recommend going to the gemba whenever possible.


Let’s take a moment to continue the philosophy of why we collect data the way we do. First, the question has often come up of whether leveraging the Hawthorne effect is appropriate for these projects.


The answer:  we will take quality improvement any way we can get it…as long as it is sustainable.


Hawthorne Effect?  We’ll Take It!


Let me explain what I mean. We don’t mind if the Hawthorne effect is a driver for quality improvement, as long as this quality improvement is sustainable. That’s why, whether the Hawthorne effect is at play or not, we look for improvement that is quantifiable, reproducible, and persists in the control phase (a later Six Sigma project phase) and beyond. We don’t care if the Hawthorne effect is one of the players that improves things; however, we want to make sure the improvement is sustainable. So, the first two philosophic points are: 1. Take data directly from the process in real time if possible and 2. Hawthorne effect? So what…if it’s sustainable!


Specific Nuts And Bolts


Next let’s move into the nuts and bolts of how to turn your SIPOC diagram into your data collection scheme. Remember, the SIPOC diagram highlights Suppliers, Inputs, Process, Outputs, and Customers. In general, to fully characterize your system you will require six or seven endpoints. (Yup.  That’s it.  NOT 20 data points.  NOT 400, just six or seven for quality improvement projects.  We’re not talking about getting novel insights from data a la “big data” here.)


This includes one or two input endpoints, one process endpoint, and two or three output endpoints. These endpoints are key to adequately describing your system.  Where do you get them?  You look at the SIPOC diagram to see each element of the process, and you choose endpoints that the group agrees have meaning for that element.  In healthcare, we suffer from an issue here:  we often can’t believe (with all the data we see and track) that we need so few endpoints to make dramatic improvement.  Guess what, healthcare colleagues:  we don’t need to be data rich and information poor for our quality improvement projects…even though we may be used to it in the rest of our work.  It’s always a challenge to focus healthcare teams on narrowing our view to the six or seven required endpoints.  It’s challenging…but rewarding when the project succeeds.


Another important note:  sometimes, we are able to ‘double dip’.  “Double dipping”, here, is a term used when one endpoint can actually represent two elements of your system. Yes, sometimes we do need to use trauma registry data or other data that has already been collected (because it’s very handy, easy to get, and does what we need). Sometimes an element from the registry can serve as both an input measure and a process measure.  Again, we try to avoid registry data whenever possible…yet, sometimes, not only do we use it but we allow it to help us double dip.
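
For readers who like to see things written down, here is one way the endpoint selection and double dipping described above could be tracked. This is only a minimal sketch in Python: the `Endpoint` class, the helper functions, and every endpoint name in the example plan are hypothetical and made up for illustration, not taken from a real project.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The parts of the SIPOC diagram we attach endpoints to in this sketch.
SIPOC_ELEMENTS = ("input", "process", "output")


@dataclass(frozen=True)
class Endpoint:
    """One measured endpoint and the SIPOC element(s) it represents."""
    name: str
    elements: Tuple[str, ...]


def coverage(plan: List[Endpoint]) -> Dict[str, int]:
    """Count how many endpoints describe each SIPOC element."""
    counts = {element: 0 for element in SIPOC_ELEMENTS}
    for endpoint in plan:
        for element in endpoint.elements:
            counts[element] += 1
    return counts


def double_dippers(plan: List[Endpoint]) -> List[str]:
    """Names of endpoints that stand in for more than one element."""
    return [ep.name for ep in plan if len(ep.elements) > 1]


# Hypothetical plan: all endpoint names below are invented examples.
plan = [
    Endpoint("referral completeness score", ("input",)),
    Endpoint("arrival-to-triage time, minutes", ("input", "process")),
    Endpoint("length of stay, hours", ("output",)),
    Endpoint("patient satisfaction, Likert 1-5", ("output",)),
]

print(coverage(plan))
print(double_dippers(plan))
```

Here four collected measures cover two inputs, one process step, and two outputs, because the triage time does double duty.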


How Much Data Do We Need?


The next question is: how much data do you need? We have included sample size equations for discrete and continuous data here. These equations can help you determine how much data you will need to characterize your system in order to detect a change of a certain size. It is useful at this point to decide how small a change you would like to be able to find and then to calculate what sample size you will need. Figure those things out before you collect data.  Of course, positioning a sample to detect a smaller change means you need many more data points. Read more about it here.
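
To make the tradeoff between change size and sample size concrete, here is a minimal sketch in Python. It assumes the precision-based forms commonly taught in Six Sigma courses, n = (z·s/Δ)² for continuous data and n = p(1−p)(z/Δ)² for discrete data, with z = 1.96 for roughly 95% confidence. These may not be exactly the equations referenced above, and the function names and example numbers are made up for illustration.

```python
import math

Z_95 = 1.96  # assumed z-value for roughly 95% confidence (two-sided)


def sample_size_continuous(std_dev: float, delta: float, z: float = Z_95) -> int:
    """n = (z * s / delta)^2, rounded up to a whole data point.

    std_dev: estimated standard deviation of the measure
    delta:   smallest change you want to be able to find (same units)
    """
    if std_dev <= 0 or delta <= 0:
        raise ValueError("std_dev and delta must be positive")
    return math.ceil((z * std_dev / delta) ** 2)


def sample_size_discrete(p: float, delta: float, z: float = Z_95) -> int:
    """n = p * (1 - p) * (z / delta)^2, rounded up to a whole data point.

    p:     estimated proportion (for example, defect rate), between 0 and 1
    delta: smallest change in that proportion you want to be able to find
    """
    if not 0 < p < 1 or delta <= 0:
        raise ValueError("p must be in (0, 1) and delta must be positive")
    return math.ceil(p * (1 - p) * (z / delta) ** 2)


# Illustrative numbers only: a measure with standard deviation 10 minutes.
print(sample_size_continuous(std_dev=10, delta=2))   # 97
print(sample_size_continuous(std_dev=10, delta=1))   # 385
print(sample_size_discrete(p=0.5, delta=0.05))       # 385
```

Under these assumptions, halving the change you want to detect (from 2 minutes to 1 minute) roughly quadruples the data you need, which is why it pays to settle on the target before collecting anything.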


The next important philosophic take-home message is that we try to use continuous data whenever possible. Why do we use continuous data? Read more about it here. As mentioned in that post, we can do a lot more with a lot less continuous data as opposed to discrete data. We encourage the use of continuous data whenever possible. And, if need be, it makes sense to turn your discrete data endpoints into a more continuous type of data with something like a Likert scale.


That’s The Setup, & More To Come


So, now you have the setup. You need to take your SIPOC diagram and establish which endpoints have meaning for you. It is a great idea to take endpoints that are easy to collect right from your process. This often makes data collection feasible. All of that said, there are some other challenges to data collection.  More on that in the next post!


See you soon.  Questions?  Comments?  Let me know below.