By: David Kashmer MD MBA MBB (@DavidKashmer)
Lean Six Sigma Master Black Belt
Quick, what’s worse: to work in a healthcare system that pretends everything is ok when things obviously are not ok, or to work in a healthcare system that looks to correct problems that may not even exist? Both are difficult scenarios, and each makes meaningful quality improvement difficult for different reasons. In this entry, let’s explore some tools to help recognize when a problem actually exists and how to guard against each of the extremes mentioned above. Where is the balance between tampering with a system and under-controlling for problems? Would you know whether you were in either situation?
If you’ve seen a process improvement system that seems to be based on someone’s gut feeling, one that is based on who complains the loudest, or one that is based more in politics than in actually improving things, read on…
It Starts With Data
One of the toughest elements of deciding whether an issue is real and needs correction is getting good data. Unfortunately, many times we stumble at this initial step in guarding against tampering with good systems or under-recognizing bad ones. We’ll talk more about this in the last section (A Tangible Example) but, for now, realize that there are barriers to being able to make an intelligent decision about whether the issue you’re looking to improve is real or imagined.
Using data from your system, as we’ll describe later, is much better than using just your gut feeling about whether you’re tampering with a system.
You Need Some Knowledge
It often takes some extra training and knowledge to understand the issue of tampering versus under-controlling. You may get the training in a statistics course or in lean six sigma coursework (that’s where I got it). Wherever you get the training, it may go something like this:
Statistical testing helps us guard against two important errors. One is thinking there is a difference in system performance after an intervention when there is no difference. The other is thinking there is NO difference in system performance after we’ve made changes when something has, in fact, changed. These errors have names.
A type 1 error is thinking a difference exists when there isn’t one (tampering), and a type 2 error is thinking there is no difference when in fact one exists (under-controlling). The risk of a type 1 error, also known as alpha risk, is kept in check by using statistical testing and by YOU making an important choice. More on that beneath.
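To make the two errors concrete, here’s a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the times are treated as roughly bell-shaped with a known spread of 60 minutes, the comparison is a simple z-test on means, and the function names `two_sided_p` and `rejection_rate` are hypothetical helpers, not part of any package. It repeats a before/after comparison many times, once when nothing truly changed (to count type 1 errors) and once when a real 25 minute improvement exists (to count type 2 errors).

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(before, after, sigma):
    """Two-sided p value from a z-test on two means, assuming a known shared SD."""
    se = sigma * (1 / len(before) + 1 / len(after)) ** 0.5
    z = (fmean(before) - fmean(after)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_shift, n=50, sigma=60.0, alpha=0.10, trials=2000, seed=1):
    """Fraction of simulated studies that declare 'there is a difference'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical ED times in minutes, centered on 240 before the change.
        before = [rng.gauss(240.0, sigma) for _ in range(n)]
        after = [rng.gauss(240.0 - true_shift, sigma) for _ in range(n)]
        if two_sided_p(before, after, sigma) < alpha:
            hits += 1
    return hits / trials


# No real change: every "difference" we declare is a type 1 error.
type1_rate = rejection_rate(true_shift=0.0)

# Real 25 minute improvement: every study that misses it is a type 2 error.
type2_rate = 1 - rejection_rate(true_shift=25.0)

print(f"type 1 error rate: {type1_rate:.3f}")
print(f"type 2 error rate: {type2_rate:.3f}")
```

The type 1 rate should land near the alpha you chose, which is the point: you pick that risk yourself. The type 2 rate depends on how many patients you sample, which is why sample size matters so much.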
Use Some Tools
The tools to help you avoid tampering with a system that is ok include statistical testing…but which test? Previously on the blog, I shared the most useful file that I use routinely to decide which statistical test to use for which data. You can see it here.
A great tool for understanding the problem with tampering with systems is here. This video (thanks Villanova) drove home for me the issues that occur (and how we make things much worse) when we tamper. If you’ve never heard of the quincunx (cool word) check out the video!
Tampering often happens when a healthcare system that is well-meaning and wants to do better, to grow or achieve greatness, embarks on making changes ad infinitum without meaningful improvement. The heart is in the right place, for sure, and it’s a very difficult problem to avoid in healthcare systems that really care. It’s a much better problem to have, I’d say, than the under-controlling (head-in-sand ostrich syndrome) that is sometimes seen in systems.
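Here’s a small sketch of why tampering makes things worse. It is not taken from the video; it’s a simplified version of the classic funnel experiment, under the assumption that the process is already stable and misses the target only by random noise. The function name `simulate_drops` and the numbers are made up for illustration. One run leaves the aim alone; the other "corrects" the aim after every single result by the amount of the last miss.

```python
import random
from statistics import pvariance


def simulate_drops(adjust, drops=10000, sigma=1.0, seed=7):
    """Simulate results around a target of 0 with purely random noise.

    If adjust is True, the aim is moved after every drop to compensate
    for the last miss (tampering). If False, the aim is left alone.
    """
    rng = random.Random(seed)
    aim = 0.0
    results = []
    for _ in range(drops):
        landed = aim + rng.gauss(0.0, sigma)
        results.append(landed)
        if adjust:
            aim -= landed  # shift the aim by the size of the last miss
    return results


hands_off = pvariance(simulate_drops(adjust=False))
tampered = pvariance(simulate_drops(adjust=True))

print(f"variance, hands off: {hands_off:.2f}")
print(f"variance, tampered:  {tampered:.2f}")
print(f"ratio:               {tampered / hands_off:.2f}")
```

Under these assumptions the well-meaning adjustments roughly double the variation, because each correction adds the previous round’s noise on top of the current one.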
A Tangible Example
Pretend your process improvement system is very focused on an issue with time. Time for what? Let’s say it’s the time patients stay in the emergency department. You and the team decided to get some real data, and it wasn’t easy. There were all sorts of problems with deciding what to measure, and how, but (finally) you and the team decided on a clear operational definition: time in the ED would be measured from the time the patient is first logged in triage until the time the patient physically arrived at their disposition, whether that was the OR, floor bed, ICU, or something similar. No using the time the orders to transfer went in for you! You go by actual arrival time. The team likes the definition and you proceed.
You measured the data for patients for a month because, after all, the meetings with the higher-ups were once a month or so and they wanted data and the monthly meetings were the routine. So, ok, sure…you measured the data for a month and had some cases where patients were in the ED for more than six hours and those felt pretty bad. You discussed the cases at the meeting-of-the-month and some eminent physicians and well-regarded staff weighed in. You made changes based on what people thought and felt about those bad cases.
A few more months went by and finally the changes were achieved. You still collected the data and reported it. There were, you think, fewer patients in the ED over six hours but you aren’t too sure. I mean, after all, some patients were still in the ED too long. Eventually, other more important issues seemed to come up, and the monthly meetings didn’t focus too much on the patient time in ED. Did the changes work? Should you make more changes because, after all, you think things may be a little better? Hmmm…
Welcome, my friend, to the problem: how do you know whether things are better or worse? Should you do more? Do less? Wait and see? Is the time patients spend in the ED really better? Yeeesh, it’s a lot of questions. Let’s go back to the beginning and see how to tell…
Back In Time…
You and a group are looking at how long patients are in the ED. After getting together, you decide to measure time in ED in a certain way: the clock starts when the patient arrives at triage and ends when the patient arrives at his/her destination.
Because you’ve learned about DMAIC and data collection plans, you decide to calculate how big a sample you will eventually need to detect whether you have improved time in the ED for patients by at least 10 minutes overall. This sample size also tells you that if you see only a 5 minute decrease in your time in ED after you make changes to the system, well, it may just be fantasy. The sample size math you do calculates the size of sample that will allow you to detect ten minutes as the smallest meaningful change and no smaller. How do you do that calculation? Look here.
Ok so now you know how many patients you will need to look at after you make changes. By the way, how long will that take? It may not coincide with your monthly meetings. It may take longer than a month to get enough data. That would tell you not to make any more changes to the system until you’ve given any changes you already made enough time to show how they work. This is part of how sample size protects us from too many changes too fast. We need to collect enough data to tell whether any changes we’ve already made work!
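As a rough illustration of the sample size step, here’s a sketch that uses the standard normal-approximation formula for comparing two means. It is not necessarily the calculation linked above. The standard deviation of 60 minutes, the 80% power, and the 40 eligible patients per day are all hypothetical numbers chosen only to show the mechanics, and `sample_size_per_group` is a made-up helper name. Real ED times are usually skewed, so treat the result as a ballpark.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.10, power=0.80):
    """Patients needed before AND after the change (each group) to detect
    a shift of `delta` minutes, given a spread of `sigma` minutes.

    Normal approximation, two-sided alpha, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type 1 error
    z_beta = NormalDist().inv_cdf(power)           # guards against type 2 error
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Smallest change we care about: 10 minutes. Assumed spread: 60 minutes.
n = sample_size_per_group(delta=10, sigma=60)

# Assumed volume of 40 eligible patients per day (hypothetical).
days_needed = ceil(n / 40)

print(f"patients needed per group: {n}")
print(f"days of data collection:   {days_needed}")
```

Notice how the required sample grows quickly as the change you want to detect gets smaller: halving `delta` roughly quadruples the number of patients, and with it the time you should wait before touching the system again.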
So let’s say you and the project team make some changes and now know the sample size you need to collect to see how you’re doing. Well, after those data are collected, now what? Now it’s time to pick a test!
You can use the tools listed here, but be careful! The type of tool you pick (and what it means) may take some extra training. Not sure which tool to use? Email me or use the comment section below and I’ll help.
So you pick the proper tool and compare your data pre and post changes, and you notice that your new median patient time in ED shows a decrease of 25 minutes! When you run a statistical test on the central tendency of your data (maybe a nice Mood’s median test), it gives you a p value < 0.05. Does that mean you’re doing better regarding the median time in the ED? Yup…especially because you chose ahead of time that you would accept (at most) a 10% chance of a type 1 error (tampering).
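For the curious, here’s a bare-bones sketch of what a Mood’s median test does under the hood, written with only the Python standard library. The helper `moods_median_test` is my own simplified version (no continuity correction, and values equal to the grand median are counted as "not above"), so a statistics package may give slightly different numbers. The pre and post ED times are simulated, not real data.

```python
import math
import random
from statistics import median


def moods_median_test(pre, post):
    """Simplified Mood's median test for two groups.

    Returns (grand_median, chi_square, p_value). Uses a 2x2 table of
    counts above / not above the grand median and a 1-df chi-square.
    """
    grand = median(list(pre) + list(post))
    a = sum(x > grand for x in pre)    # pre, above grand median
    b = len(pre) - a                   # pre, not above
    c = sum(x > grand for x in post)   # post, above grand median
    d = len(post) - c                  # post, not above
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("test undefined: a row or column of the table is empty")
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return grand, chi2, p_value


# Simulated, skewed ED times in minutes (made up for illustration).
rng = random.Random(42)
pre_times = [rng.lognormvariate(5.5, 0.4) for _ in range(450)]
post_times = [rng.lognormvariate(5.4, 0.4) for _ in range(450)]

alpha = 0.10  # the type 1 error risk chosen before looking at the data
grand, chi2, p = moods_median_test(pre_times, post_times)

print(f"pre median:  {median(pre_times):.0f} min")
print(f"post median: {median(post_times):.0f} min")
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
print("difference is statistically significant" if p < alpha else "no significant difference detected")
```

A median-based test is a reasonable choice here because time data tends to have a long right tail, and a few very long stays would otherwise drag the average around.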
That’s what the p value and alpha risk (tampering) are all about: you say, before checking the data, that there is some risk of thinking there’s a difference when there is no difference. You further say that you will accept, let’s say, a 10% risk of making that wrong conclusion. You could be wrong in one of two ways: thinking the new median is significantly higher than the old one or significantly lower. So, to be fair, you split the 10% risk between the two tails (too high and too low) of the distribution, 5% in each. And voilà! If your test reports a one-tailed p value (one direction only), you compare it to 5%. If it reports a two-tailed p value, which most software does by default, the two tails are already added together and you compare it to the full 10%. Either way, a p value < 0.05 (5%) sits comfortably inside the 10% alpha risk you set at the beginning, so you say the difference you see is likely to be legitimate: if nothing had really changed, a difference this big would turn up less than 5% of the time by chance alone.
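The tail arithmetic can be sketched in a few lines. This assumes a test statistic that follows a standard normal (z) distribution, which is a simplification for illustration, and `tail_p_values` is a hypothetical helper name. It shows that "one tail under 5%" and "two tails together under 10%" are the same decision.

```python
from statistics import NormalDist


def tail_p_values(z):
    """Return (lower_tail, upper_tail, two_sided) p values for a z statistic."""
    lower = NormalDist().cdf(z)       # chance of a result this low or lower
    upper = 1 - NormalDist().cdf(z)   # chance of a result this high or higher
    two_sided = 2 * min(lower, upper) # both directions counted together
    return lower, upper, two_sided


alpha = 0.10  # total type 1 error risk, split as 5% per tail

# Example: the new median time is lower, giving a z statistic of -1.8 (made up).
lower, upper, two_sided = tail_p_values(-1.8)

print(f"one-tailed p (lower): {lower:.4f}  vs per-tail limit {alpha / 2:.2f}")
print(f"two-tailed p:         {two_sided:.4f}  vs total alpha {alpha:.2f}")
print("same decision either way:", (lower < alpha / 2) == (two_sided < alpha))
```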
So the bottom line is you set the alpha risk (tampering risk / thinking there’s a difference when there isn’t one) waaayyyy back when as you set up what you’re doing with patient time in the ED.
If you slogged through that, you may start to see the headlines on using statistics from the Six Sigma process to prevent tampering:
(1) calculate a sample size required to sort out the minimum amount of data you need to be able to detect a certain size change in the system. The calculation will tell you, given your choice about the smallest change you want to be able to detect, how many data points you need. It will make you wait and get that data after you make changes to a system rather than constantly adjusting it or tampering with the system owing to pressure of having the latest meeting come up, etc. etc.
(2) perform statistical testing to determine whether you’re really seeing something different or whether it’s just a fantasy. This will help disabuse you of your gut telling you you’re doing better when you aren’t (and thus missing opportunities) or charging at windmills with lance drawn as you misidentify when things have actually improved and continue to charge on.
Call To Action
Now that you’ve heard about some strategies to guard against layering change after change on a poor, unsuspecting process, imagine how you can use tools to avoid making a lot of extra work that (as the quincunx teaches) can actually hurt your system performance. Want more info on how to get it done? Get in touch!
Have you seen any examples of tampering with a system or failing to appreciate issues? Let me know about your experiences with or thoughts on tampering and under-controlling beneath!