By now, you know how I feel about responding to one (or a few) cases with big changes to your system. Now I know that sometimes cases are so extra-ordinary, dramatic, or horrific that something has to be done owing to public pressure, legal issues, or some extenuating circumstances. But, guess what, it turns out that responding to one (or just a few) cases leads to a worsening of your system’s performance.
Don’t believe me? I don’t expect you to! I didn’t believe it either until I learned about statistical process control.
I remember a video I watched that really drove this point home for me. And I’ve spent more than 30 hours online looking for a copy of that video to share with you, my healthcare friends, so that you’d share in this revelation with me!
Beneath, I’ve shared a link to this incredible video. Remember, as you watch, that the process involved could be manufacturing a part, caring for a patient, or arriving on time for a trauma activation…the processes and data involved don’t know (or care) when it comes to using the powerful tools of statistical process control!
Bottom line…adjusting a system based on one extraordinary case leads to worse outcomes overall. That’s why it’s so important to use the tools of statistical process control and to improve your system as a whole…because doing that will get us much further and will eliminate those extra-ordinary cases in the future. Adjusting based on an attention-getting case will often just make things worse!
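One way to see this effect for yourself, separate from the video, is a small simulation. The sketch below is an illustration under assumed conditions and is not the demonstration from the video: it assumes a stable process whose results vary only by random noise, and a hypothetical `simulate` function that either leaves the process alone or "corrects" the setting after every single result.

```python
import random
import statistics

def simulate(n_cases, adjust_after_each_case, seed=1):
    """Simulate a stable process aimed at a target of 0.

    Assumption: each result is the current setting plus random noise
    (standard deviation 1). If adjust_after_each_case is True, the
    setting is moved after every result to "compensate" for that one
    result, which mimics reacting to a single case.
    """
    rng = random.Random(seed)
    target = 0.0
    setting = 0.0
    results = []
    for _ in range(n_cases):
        result = setting + rng.gauss(0.0, 1.0)
        results.append(result)
        if adjust_after_each_case:
            # Shift the setting by the full miss on this one case.
            setting -= result - target
    return results

left_alone = simulate(5000, adjust_after_each_case=False)
tampered = simulate(5000, adjust_after_each_case=True)

print("variance, left alone:", round(statistics.pvariance(left_alone), 2))
print("variance, adjusted after every case:", round(statistics.pvariance(tampered), 2))
```

Under these assumptions, the process that gets adjusted after every case ends up with roughly twice the variance of the one left alone, because each correction chases noise rather than a real shift in the system.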
Click the link or photo beneath to view this incredibly useful demonstration. (Source: by the way, this video is from a Villanova University course on Lean Six Sigma…and it’s the best I’ve ever seen on this topic!)
Remember when you first heard of the Swiss-Cheese model of medical error? It sounds so good, right? When a bunch of items line up, we get a defect. It’s satisfying. We all share some knowledge of Swiss Cheese–it’s a common image.
That’s what makes it so attractive–and, of course, the Swiss Cheese model is a much better mental model than what came before, which was some more loose concept of a bunch of items that made an error or, worse yet, a profound emphasis on how someone screwed up and how that produced a bad outcome.
Models supplant each other over time. The Sun-goes-around-the-Earth (geocentric) model was supplanted by the Sun-at-the-center (heliocentric) model–thank you Kepler and Copernicus!
Now, we can do better with our model of medical error and defect, because medical errors really don’t follow a Swiss Cheese model. So let’s get a better one and develop our shared model together.
In fact, medical errors are more like Frogger. Now that we have more Millennials in the workplace than ever (who seem to be much more comfortable as digital natives than I am as a Gen Xer), we can use a more refined idea of medical error that will resonate with the group who staff our hospitals. Here’s how medical errors are more like Frogger than Swiss Cheese:
(1) In Swiss Cheese, the holes stay still. That’s not how it is with medical errors. In fact, each layer of a system that a patient passes through has a probability of having an issue. Some layers have a lower probability, and some a higher one. Concepts like Rolled Throughput Yield reflect this and are much more akin to how things actually work than the illusion that we have fixed holes…thinking of holes gives the illusion that, if we could only identify and plug those holes, life would be perfect!
In Frogger, there are gaps in each line of cars that pass by. We need to get the frog to pass through each line safely and oh, by the way, the holes are moving and sometimes not in the same place. That kind of probabilistic thinking is much more akin to medical errors: each line of traffic has an associated chance of squishing us before we get to the other side. The trick is, of course, we can influence and modify the frequency and size of the holes…well, sometimes anyway. Can’t do that with Swiss Cheese for sure and, with Frogger, we can select Fast or Slower modes. (In real life, we have a lot more options. Sometimes I even label the lines of traffic as the 6Ms.)
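To make the probabilistic view concrete, here is a minimal sketch of Rolled Throughput Yield: the chance of passing every layer without a defect is the product of each layer's individual yield. The layer names and yields below are hypothetical numbers chosen for illustration, not data from any real system, and the calculation assumes the layers fail independently.

```python
import math

# Hypothetical probability of getting through each "lane of traffic"
# without a defect. These values are illustrative assumptions only.
layer_yields = {
    "triage": 0.98,
    "medication order": 0.95,
    "pharmacy check": 0.99,
    "bedside administration": 0.97,
}

def rolled_throughput_yield(yields):
    """Probability of passing every layer defect-free,
    assuming the layers fail independently of one another."""
    return math.prod(yields)

rty = rolled_throughput_yield(layer_yields.values())
print(f"Rolled Throughput Yield: {rty:.3f}")
print(f"Chance of at least one defect: {1 - rty:.3f}")
```

Even though every layer in this made-up example works at least 95% of the time, only about 89% of patients would make it across all four lanes untouched.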
(2) In the Swiss Cheese Model, we imagine a block of cheese sort of sitting there. There’s no inherent urgency in cheese (unless you’re severely lactose intolerant or have a milk allergy I guess). It’s sort of a static model that doesn’t do much to indicate safety.
But ahhh, Frogger, well there’s a model that makes it obvious. If you don’t maneuver the frog carefully that’s it–you’re a goner. Of course, we have the advantage of engineering our systems to control the flow of traffic and both the size and presence of gaps. We basically have a cheat code. And, whether your cheat code is Lean, Lean Six Sigma, Six Sigma, Baldrige Excellence Framework, ISO, Lean Startup, or some combination…you have the ultimate ability unlike almost any Frogger player to change the game to your patient’s advantage. Of course, unlike Frogger, your patient only gets one chance to make it through unscathed–that’s very different than the video game world and, although another patient will be coming through the system soon, we’ll never get another chance to help the current patient have a perfect experience.
All of that is highlighted by Frogger and is not made obvious by a piece of cheese.
(3) In Frogger, the Frog starts anywhere. Meaning, not only does the traffic move but the Frog starts anywhere along the bottom of the screen. In Frogger we can control that position, but in real life the patients enter the system with certain positions we cannot control easily and that, for the purposes of their hospital course anyway, are unable to be changed. It may be their 80 pack-year history of smoking, their morbid obesity, or their advanced age. However, the Frogger model recognizes the importance of initial position (which unlike real life we can control more easily) while the Swiss Cheese model doesn’t seem to make clear where we start. Luckily, in real life, I’ve had the great experience of helping systems “cheat” by modifying the initial position…you may not be able to change initial patient comorbid conditions but you can sometimes set them up with a better initial position for the system you’re trying to improve.
Like you, I hear about the Swiss Cheese model a lot. And, don’t get me wrong, it’s much better than what came before. Now, however, in order to recognize the importance of probability, motion, initial position, devising a safe path through traffic, and a host of other considerations, let’s opt for a model that recognizes uncertainty, probability, and safety. With more Millennials than ever in the workplace (even though Frogger predates them!) we have digital natives with whom game imagery is much more prone to resonate than a static piece of cheese.
Use Frogger next time you explain medical error because it embodies how to avoid medical errors MUCH better than cheese.
Dr. David Kashmer, a trauma and acute care surgeon, is a Fellow of the American College of Surgeons and is a nationally known healthcare expert. He serves as a member of the Board of Reviewers for the Malcolm Baldrige National Quality Award. In addition to his Medical Doctor degree from MCP Hahnemann University, now Drexel University College of Medicine, he holds an MBA degree from George Washington University. He also earned a Lean Six Sigma Master Black Belt Certification from Villanova University. Kashmer contributes to TheHill.com, Insights.TheSurgicalLab.com, and The Healthcare Quality Blog.com where the focus is on quality improvement and value in surgery and healthcare.
Of the many barriers we face while trying to improve quality in healthcare, none is perhaps more problematic than the lack of good data. Although everyone seems to love data (I see so much written about healthcare data) it is often very tough to get. And when we do get it, much of the data we get are junk. It’s not easy to make meaningful improvements based on junk data. So, what can we do to get meaningful data for healthcare quality improvement?
In this entry, I’ll share some tools, tips, & techniques for getting meaningful quality improvement data from your healthcare system. I’ll share how to do that by telling a story about Super Bowl LI…
The Super Bowl Data Collection
About ten minutes before kickoff, I had a few questions about the Super Bowl. I was wondering if there was a simple way to gauge the performance of each team and make some meaningful statements about that performance.
When we do quality improvement projects, it’s very important to make sure it’s as easy as possible to collect data. I recommend collecting data directly from the process rather than retrospectively or from a data warehouse. Why? For one, I was taught that the more filters the data pass through the more they are cleaned up or otherwise altered. They tend to lose fidelity and a direct representation of the system. Whether you agree or not, my experience has definitely substantiated that teaching.
The issue with that is how do I collect data directly from the system? Isn’t that cumbersome? We don’t have staff to collect data (!) Like you, I’ve heard each of those barriers before–and that’s what makes the tricks and tools I’m about to share so useful.
So back to me then, sitting on my couch with a plate of wings and a Coke ready to watch the Super Bowl. I wanted data on something that I thought would be meaningful. Remember, this wasn’t a DMAIC project…it was just something to see if I could quickly describe the game in a meaningful way. It would require me to collect data easily and quickly…especially if those wings were going to get eaten.
Decide Whether You’ll Collect Discrete or Continuous Data
So as the first few wings disappeared, I decided what type of data I’d want to collect. I would definitely collect continuous data if at all possible. (Not discrete.) That part of the deciding was easy. (Wonder why? Don’t know the difference between continuous and discrete data? Look here.)
Ok, the next issue was these data had to be very easy for me to get. They needed to be something that I had a reasonable belief would correlate with something important. Hmmm…ok, scoring touchdowns. That’s the whole point of the game after all.
Get A Clear Operational Definition Of What You’ll Collect
As wings number three and four disappeared, and the players were introduced, I decided on my data collection plan:
collect how far away each offense was from scoring a touchdown when possession changed
each data point would come from where the ball was at the start of 4th down
interceptions, fumbles, or other changes of possession before 4th down would NOT be recorded (I’ll get to why in a minute.)
touchdowns scored were recorded as “0 yards away”
a play where a field goal was attempted would be recorded as where the ball was at the start of the down
Of course, for formal quality projects, we would collect more than just one data point. Additionally, we specify exactly the operational definition of each endpoint.
We’d also make a sample size calculation. Here, however, I intended to collect every touchdown and change of possession where a team kicked away on fourth down or went for it but didn’t make it. So this wasn’t a sample of those moments. It was going to be all of them. Of course, they don’t happen that often. That was a big help here, because they can also be anticipated. That was all very important so I could eat those wings.
Items like interceptions, fumbles, and other turnovers cannot be anticipated as easily. They also would make me have to pay attention to where the ball was spotted at the beginning of every down. It was tough enough to pay attention to the spot of the ball for the downs I was going to record.
With those rules in mind, I set out to record the field position whenever possession changed. I thought that the position where the offense wound up its possession, over time, might correlate with who won the game. Less overall variance in final position might mean that team had fewer moments where it under-performed and lost possession nearer to its own endzone.
Of course, it could also mean that the team never reached the endzone for a touchdown. In fact, if the offense played the whole game between its opponent’s 45 and 50 yard lines it would have little variation in field position…but also probably wouldn’t score much. Maybe a combination of better field position (higher median field position) and low variation in field position would indicate who won the game. I thought it might. Let’s see if I was right.
Data Collection: Nuts and Bolts
Next, I quickly drew a continuous data collection sheet. It looked like this:
Sounds fancy, but obviously it isn’t. That’s an important tool for you when you go to collect continuous data right from your process: the continuous data collection sheet can be very simple and very easy to use.
Really, that was about it. I went through the game watching, like you, the Patriots fall way behind for the first two quarters. After some Lady Gaga halftime show (with drones!) I looked at the data and noticed something interesting.
The Patriots’ data on distance from the endzone seemed to demonstrate less variance than the Falcons’. (I’ll show you the actual data collection sheets in a moment.) It was odd. Yes, they were VERY far behind. Yes, there had been two costly turnovers that led to the Falcons opening up a huge lead. But, strangely, in terms of moving the ball and getting closer to the endzone based on their own offense, the Patriots were actually doing better than the Falcons. Three people around me pronounced the Patriots dead and one even said we should change the channel.
If you’ve read this blog before, you know that one of the key beliefs it describes is that data is most useful when it can change our minds. These data, at least, made me unsure if the game was over.
As you know (no spoiler alert really by now) the game was far from over and the Patriots executed one of the most impressive comebacks (if not the most impressive) in Super Bowl history. Data collected and wings eaten without difficulty! Check and check.
Here are the final data collection sheets:
Notice the number in parentheses next to the distance from the endzone when possession changed? That number is the possession number the team had. So, 52(7) means the Falcons were 52 yards away from the Patriots’ endzone when they punted the ball on their seventh possession of the game. An entry like 0(10) would mean that the team scored a touchdown (0 yards from opposing team’s endzone) on their tenth possession.
Notice that collecting data this way and stacking similar numbers on top of each other makes a histogram over time. That’s what let me see how the variation of the Patriots’ final field position was smaller than the Falcons’ by about halfway through the game.
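The stacking idea can be sketched in a few lines. This is only an illustration of how a check sheet turns into a histogram as entries accumulate: the `stack_by_bin` helper and the 10-yard bin width are assumptions, and apart from the 52(7) entry mentioned above, the values are made up rather than taken from the actual sheets.

```python
from collections import defaultdict

BIN_WIDTH = 10  # assumed bin width in yards

# Hypothetical check-sheet entries: (yards_from_endzone, possession_number).
# Only 52(7) comes from the text; the other entries are invented.
entries = [(75, 1), (0, 2), (48, 3), (52, 7), (0, 4), (71, 5), (45, 6)]

def stack_by_bin(entries, bin_width=BIN_WIDTH):
    """Group entries into yardage bins so similar values stack together."""
    bins = defaultdict(list)
    for yards, possession in entries:
        lower = (yards // bin_width) * bin_width
        bins[lower].append(f"{yards}({possession})")
    return dict(sorted(bins.items()))

for lower, stacked in stack_by_bin(entries).items():
    upper = lower + BIN_WIDTH - 1
    print(f"{lower:>3}-{upper:<3} | " + "  ".join(stacked))
```

Each printed row is one bin, and the longer rows show where a team's possessions tended to end.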
Anything To Learn From The Data Collection?
Recently, I put the data into Minitab to see what I could learn. Here are those same histograms for each offense’s performance:
Notice a few items. First, each set of data do NOT deviate from the normal distribution per the Anderson-Darling test. (More info on what that means here.) However, a word of caution: there are so few data points in each set that it can be difficult to tell which distribution they follow. I even performed distribution fitting to demonstrate that testing will likely show that these data do not deviate substantially from other distributions either. Again, it’s difficult to tell a difference because there just aren’t that many possessions for each team in a football game. In a Lean Six Sigma project, we would normally protect against this with a good sampling plan as part of our data collection plan but, hey, I had wings to eat! Here’s an example of checking the offense performance against other data distributions:
Just as with the initial Anderson-Darling test, we see here that the data do not deviate from many of these other distributions either. Bottom line: we can’t be sure which distribution it follows. Maybe the normal distribution, maybe not.
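For readers without Minitab, the statistic behind the Anderson-Darling normality check can be sketched with the standard library. This is a hand-rolled illustration, not Minitab's procedure: it computes the small-sample-adjusted statistic with the mean and standard deviation estimated from the data, and compares it with 0.752, the commonly tabulated 5% critical value for that case. Minitab reports a p-value instead. The yardage values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

CRITICAL_5_PERCENT = 0.752  # tabulated value for the adjusted statistic

def anderson_darling_normal(data):
    """Adjusted Anderson-Darling statistic for a normality check,
    with mean and standard deviation estimated from the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - total / n
    # Small-sample adjustment used when both parameters are estimated.
    return a_squared * (1.0 + 0.75 / n + 2.25 / n**2)

# Hypothetical yards-from-endzone values (not the real game data).
yards = [0, 0, 35, 41, 48, 52, 60, 71, 75]
statistic = anderson_darling_normal(yards)
print(f"adjusted A-squared = {statistic:.3f}")
if statistic < CRITICAL_5_PERCENT:
    print("No evidence at the 5% level that the data deviate from normal.")
else:
    print("The data deviate from normal at the 5% level.")
```

With only a handful of points, the statistic will often stay under the critical value for many candidate distributions, which is the caution raised above.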
In any event, we are left with some important questions. Notice the variance exhibited by the Patriots offense versus the Falcons offense: this highlights that the Patriots in general were able to move the ball closer to the Falcons endzone by the time the possession changed (remember that turnovers aren’t included). Does that decreased variation correlate with the outcome of every football game? Can it be used to predict outcomes of games? I don’t know…at least not yet. After all, if stopping a team inside their own 10 yard line once or twice was a major factor in predicting who won a game, well, that would be very useful! If data is collected by the league on field position, we could apply this idea to previous games (maybe at half time) and see if it predicts the winner routinely. If it did, we could apply it to future games.
In the case of Super Bowl LI, the Patriots offense demonstrated a better median field position and less variation in overall field position compared to the Falcons.
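A comparison like this one comes down to two summary numbers per team. The sketch below uses hypothetical yardage values (the real sheets are not reproduced here) and a hypothetical `summarize` helper. Because the values are distances from the opposing endzone, a lower median means better field position.

```python
import statistics

# Hypothetical yards from the opposing endzone at change of possession.
team_a = [0, 35, 41, 0, 48, 30, 0, 44]
team_b = [75, 0, 52, 80, 0, 68, 71, 15]

def summarize(yards):
    """Median and sample variance of the recorded distances."""
    return {
        "median": statistics.median(yards),
        "variance": statistics.variance(yards),
    }

for name, yards in (("Team A", team_a), ("Team B", team_b)):
    summary = summarize(yards)
    print(f"{name}: median = {summary['median']}, variance = {summary['variance']:.1f}")
```

In this made-up example Team A ends its possessions closer to the endzone and with less spread, which is the pattern described for the Patriots.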
Of course, remember this favorite quote:
All models are wrong, but some are useful. — George E.P. Box (of Box-Cox transform fame)
Final Recommendations (How To Eat Wings AND Collect Data)
More importantly, this entry highlights a few interesting tools for data collection for your healthcare quality project. At the end of the day, in order to continue all the things you have to do and collect good data for your project, here are my recommendations:
(1) get data right from the process, not a warehouse or after it has been cleaned.
(2) use continuous data!
(3) remember the continuous data check sheet can be very simple to set up and use
(4) when you create a data collection plan, remember the sample size calculation & operational definition!
(5) reward those who collect data…maybe with wings!
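For recommendation (4), one common textbook way to size a sample for continuous data is to ask how many observations are needed to estimate the mean within a chosen margin. The sketch below assumes that approach; the function name, the standard deviation estimate, and the margin are all illustrative inputs, and your project may call for a different calculation.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(std_dev, margin, confidence=0.95):
    """Observations needed to estimate a mean within +/- margin,
    given an estimated standard deviation (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin) ** 2)

# Example: wait times with an estimated standard deviation of 10 minutes,
# and a goal of estimating the average wait within 2 minutes.
print(sample_size_for_mean(std_dev=10, margin=2))
```

Halving the margin roughly quadruples the required sample, which is worth knowing before you promise a level of precision.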
In the last entry, you saw a novel, straightforward metric to capture the value provided by a healthcare service called the Healthcare Value Process Index (HVPI). In this entry, let’s explore another example of exactly how to apply the metric to a healthcare service to demonstrate how to use the index.
At America’s Best Hospital, a recent quality improvement project focused on time patients spent in the waiting room of a certain physician group’s practice. The project group had already gone through the steps of creating a sample plan and collecting data that represents how well the system is working.
From a patient survey, sent out as part of the project, the team learned that patients were willing to wait, at most, 20 minutes before seeing the physician. So, the Voice of the Customer (VOC) was used to set the Upper Specification Limit (USL) of 20 minutes.
A normality test (the Anderson-Darling test) was performed, and the data collected follow the normal distribution as per Figure 1 beneath. (Wonder why the p >0.05 is a good thing when you use the Anderson-Darling test? Read about it here.)
The results of the data collection and USL were reviewed for that continuous data endpoint “Time Spent In Waiting Room” and were plotted as Figure 2 beneath.
The Cpk value for the waiting room system was noted to be 0.20, indicating that (long term) the system in place would produce more than 500,000 Defects Per Million Opportunities (DPMO) with the accompanying Sigma level of < 1.5. Is that a good level of performance for a system? Heck no. Look at how many patients wait more than 20 minutes in the system. There’s a quality issue there for sure.
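Here is a minimal sketch of where a Cpk like that comes from when only an upper specification limit applies. The 20-minute USL is from the example, and a mean of about 18 minutes matches the central tendency described for this waiting room, but the standard deviation is an assumed value picked so the result lands near 0.20. The sketch covers only the fraction of waits over the limit in a normal model and does not attempt to reproduce the long-term DPMO figure.

```python
from statistics import NormalDist

def cpk_upper(mean, std_dev, usl):
    """Cpk when only an upper specification limit applies."""
    return (usl - mean) / (3 * std_dev)

def fraction_over_limit(mean, std_dev, usl):
    """Share of a normal distribution that falls above the USL."""
    return 1 - NormalDist(mean, std_dev).cdf(usl)

USL_MINUTES = 20          # from the patient survey in the example
mean_wait = 18            # approximate central tendency from the example
std_dev_wait = 10 / 3     # assumed, chosen to give a Cpk near 0.20

print(f"Cpk = {cpk_upper(mean_wait, std_dev_wait, USL_MINUTES):.2f}")
print(f"Share waiting over 20 minutes = "
      f"{fraction_over_limit(mean_wait, std_dev_wait, USL_MINUTES):.1%}")
```

Notice that a mean under the limit does not rescue the system: with this much spread, a large share of patients still waits longer than 20 minutes.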
What about the Costs of Poor Quality (COPQ) associated with waiting in the waiting room? Based on the four buckets of the COPQ, your team determines that the COPQ for the waiting room system (per year) is about $200,000. Surprisingly high, yes, but everyone realizes (when they think about it) that the time Ms. Smith fell in the waiting room after being there 22 minutes because she tried to raise the volume on the TV had gotten quite expensive. You and the team take special note of which items you included from the Profit and Loss statement as part of the COPQ because you want to be able to go back and look after changes have been made to see if waste has been reduced.
In this case, for the physician waiting room you’re looking at, you calculate the HVPI as
(100)(0.20) / (200) or 0.1
That’s not very good! Remember, the COPQ is expressed in thousands of dollars to calculate the HVPI.
Just then, at the project meeting to review the data, your ears perk up when a practice manager named Jill says: “Well our patients never complain about the wait in our waiting room which I think is better than that data we are looking at. It feels like our patients wait less than 20 minutes routinely, AND I think we don’t have much waste in the system. Maybe you could do some things like we do them in our practice.”
As a quality improvement facilitator, you’re always looking for ideas, tools, and best practices to apply in projects like this one. So you and the team plan to look in on the waiting room run by the practice manager.
Just like before, the group samples the performance of the system. It runs the Anderson-Darling test on the data and they are found to be normally distributed. (By the way, we don’t see that routinely in waiting room times!)
Then, the team graphs the data as beneath:
Interestingly, it turns out that this system has a central tendency very similar to the first waiting room you looked at–about 18 minutes. Jill mentioned how most patients don’t wait more than 18 minutes and the data show that her instinct was spot on.
…but, you and the team notice that the performance of Jill’s waiting room is much worse than the first one you examined. The Cpk for that system is 0.06–ouch! Jill is disappointed, but you reassure her that it’s very common to see that how we feel about a system’s performance doesn’t match the data when we actually get them. (More on that here.) It’s ok because we are working together to improve.
When you calculate the COPQ for Jill’s waiting room, you notice that (although the performance is poor) there’s less waste as measured by the costs to deliver that performance. The COPQ for Jill’s waiting room system is $125,000. (It’s mostly owing to the wasted time the office staff spend trying to figure out who’s next, and some other specifics to how they run the system.) What is the HVPI for Jill’s waiting room?
(100)(0.06) / (125) = 0.048
Again, not good!
So, despite having lower costs associated with poor quality, Jill’s waiting room provides less value for patients than does the first waiting room that you all looked at. It doesn’t mean that the team can’t learn anything from Jill and her team (after all, they are wasting less as measured by the COPQ) but it does mean that both Jill’s waiting room and the earlier one have a LONG way to go to improve their quality and value!
Fortunately, after completing the waiting room quality improvement project, the Cpk for the first system studied increased to 1.3 and Jill’s waiting room Cpk increased to 1.2–MUCH better. The COPQ for each system decreased to $10,000 after the team made changes and went back to calculate the new COPQ based on the same items it had measured previously.
The new HVPI (with VOC from the patients) for the first waiting room? That increased to 13 and the HVPI for Jill’s room rose to 12. Each represents an awesome increase in value to the patients involved. Now, of course, the challenge is to maintain those levels of value over time.
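The arithmetic in this example is easy to script so that the same calculation can be rerun as Cpk and COPQ change. The sketch below uses the Cpk and COPQ values from the two waiting rooms; the function name `hvpi` and the choice to pass COPQ in dollars and convert it to thousands inside the function are my own conventions for illustration.

```python
def hvpi(cpk, copq_dollars):
    """Healthcare Value Process Index = 100 * Cpk / COPQ,
    with COPQ expressed in thousands of dollars."""
    return 100 * cpk / (copq_dollars / 1000)

# (Cpk, COPQ in dollars) for each system, taken from the example.
systems = {
    "first waiting room, before": (0.20, 200_000),
    "Jill's waiting room, before": (0.06, 125_000),
    "first waiting room, after": (1.3, 10_000),
    "Jill's waiting room, after": (1.2, 10_000),
}

for name, (cpk, copq) in systems.items():
    print(f"{name}: HVPI = {hvpi(cpk, copq):.3f}")
```

Keeping the COPQ line items fixed between the before and after calculations, as the team did, is what makes the two HVPI values comparable.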
This example highlights how value provided by a healthcare system for any continuous data endpoint can be calculated and compared across systems. It can be tracked over time to demonstrate increases. The HVPI represents a unique value measure comprised of a system capability measure and the costs of poor quality.
Questions or thoughts about the HVPI? Let me know & let’s discuss!
You’ve probably heard the catchphrase “volume to value” to describe the current transition in healthcare. It’s based on the idea that healthcare volume of services should no longer be the focus when it comes to reimbursement and performance. Instead of being reimbursed a fee per service episode (volume of care), healthcare is transitioning toward reimbursement with a focus on value provided by the care given. The Department of Health and Human Services (HHS) has recently called for 50% or more of payments to health systems to be value-based by 2018.
Here’s a recent book I completed on just that topic: Volume to Value. Do you know what’s not in that book, by the way? One clear metric on how exactly to measure value across services! That matters because, after all
If you can’t measure it, you can’t manage it. –Peter Drucker
An entire book on value in healthcare and not one metric which points right to it! Why not? (By the way, some aren’t sure that Peter Drucker actually said that.)
Here’s why not: in healthcare, we don’t yet agree on what “value” means. For example, look here. Yeesh, that’s a lot of different definitions of value. We can talk about ways to improve value by decreasing cost of care and increasing value, but we don’t have one clear metric on value (in part) because we don’t yet agree on a definition of what value is.
In this entry, I’ll share a straightforward definition of value in healthcare and a straightforward metric to measure that value across services. Like all entries, this one is open for your discussion and consideration. I’m looking for feedback on it. An OVID, Google, and Pubmed search revealed nothing similar to the metric I propose beneath.
First, let’s start with a definition of value. Here’s a classic, arguably the classic, from Michael Porter (citation here).
Ok so there are several issues that prevent us from easily applying this definition in healthcare. Let’s talk about some of the barriers to making something measurable out of the definition. Here are some now:
(1) Remarkably, we often don’t know how much (exactly) everything costs in healthcare. Amazing, yes, but nonetheless true. With rare exception, most hospitals do not know exactly how much it costs to perform a hip replacement and perform the after-care in the hospital for the patient. The time spent by FTE employees, the equipment used, all of it…nope, they don’t know. There are, of course, exceptions to this. I know of at least one health system that knows how much it costs to perform a hip replacement down to the number and amount of gauze used in the OR. Amazing, but true.
(2) We don’t have a standardized way for assessing health outcomes. There are some attempts at this, such as QALYs, but one of the fundamental problems is: how do you express quality in situations where the outcome you’re looking for is different than quality & quantity of life? The QALY measures outcome, in part, in years of life, but how does that make sense for acute diseases like necrotizing soft tissue infections that are very acute (often in patients who won’t be alive many more years whether the disease is addressed or not), or other items to improve like days on the ventilator? It is VERY difficult to come up with a standard to demonstrate outcomes–especially across service lines.
(3) The entity that pays is not usually the person receiving the care. This is a huge problem when it comes to measuring value. To illustrate the point: imagine America’s Best Hospital (ABH) where every patient has the best outcome possible.
No matter what patient with what condition comes to the ABH, they will have the BEST outcome possible. By every outcome metric, it’s the best! It even spends little to nothing (compared to most centers) to achieve these incredible outcomes. One catch: the staff at ABH is so busy that they just never write anything down. ABH, of course, would likely not be in business for long. Why? Despite these incredible outcomes for patients, ABH would NEVER be reimbursed. This thought experiment shows that valuable care must somehow include not just the attention to patients (the Voice of the Patient or Voice of the Customer in Lean & Six Sigma parlance), but also to the necessary mechanics required to be reimbursed by the third party payors. I’m not saying whether it’s a good or bad thing…only that it simply is.
So, while those are some of the barriers to creating a good value metric for healthcare, let’s discuss how one might look. What would be necessary to measure value across different services in healthcare? A useful value metric would
(1) Capture how well the system it is applied to is working. It would demonstrate the variation in that system. In order to determine “how well” the system is working, it would probably need to incorporate the Voice of the Customer or Voice of the Patient. The VOP/VOC often is the upper or lower specification limit for the system as my Lean Six Sigma and other quality improvement colleagues know. The ability to capture this performance would be key to represent the “health outcomes” portion of the definition.
(2) Be applicable across different service lines and perhaps even different hospitals. This requirement is very important for a useful metric. Can we create something that captures outcomes as disparate as time spent waiting in the ER and something like patients who have NOT had a colonoscopy (but should have)?
(3) Incorporate cost as an element. This item, also, is required for a useful metric. How can we incorporate cost if, as said earlier, most health systems can’t tell you exactly how much something costs?
With that, let’s discuss the proposed metric called the “Healthcare Value Process Index”:
Healthcare Value Process Index = (100) Cpk / COPQ
where Cpk = the Cpk value for the system being considered, COPQ is the Cost of Poor Quality for that same system in thousands of dollars, and 100 is an arbitrary constant. (You’ll see why that 100 is in there under the example later on.)
Yup, that’s it. Take a minute with me to discover the use of this new value metric.
First, Cpk is well-known in quality circles as a representation of how capable a system is at delivering a specified output long term. It gives a LOT of useful information in a tight package. The Cpk, in one number, describes the number of defects a process is creating. It incorporates the element of the Voice of the Patient (sometimes called the Voice of the Customer [VOC] as described earlier) and uses that important element to define what values in the system are acceptable and which are not. In essence, the Cpk tells us, clearly, how the system is performing versus specification limits set by the VOC. Of course, we could use sigma levels to represent the same concepts.
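For readers who want the symbols, the usual textbook form of Cpk and the proposed index can be written as follows. The notation is an assumption on my part: \mu and \sigma are the process mean and standard deviation, and USL and LSL are the upper and lower specification limits set by the VOC. When only one limit applies, only the corresponding term is used.

```latex
C_{pk} = \min\left( \frac{\mathrm{USL} - \mu}{3\sigma},\; \frac{\mu - \mathrm{LSL}}{3\sigma} \right),
\qquad
\mathrm{HVPI} = \frac{100 \cdot C_{pk}}{\mathrm{COPQ}\ \text{(in thousands of dollars)}}
```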
Weaknesses? Yes. For example, some systems follow non-normal data distributions. Box-Cox transformations or other tools could be used in those circumstances. So, for each Healthcare Value Process Index, it would make sense to specify where the VOC came from. Is it a patient-defined endpoint or a third party payor one?
That’s it. Not a lot of mess or fuss. That’s because when you say the Cpk is some number, we have a sense of the variation in the process compared to the specification limits of the process. We know how whatever process you are talking about is performing, from systems as different as time spent peeling bananas to others like time spent flying on a plane. Again, healthcare colleagues, here’s the bottom line: there’s a named measure for how well a system represented by continuous data (eg time, length, etc.) is performing. This system works for continuous data endpoints of all sorts. Let’s use what’s out there & not re-invent the wheel!
(By the way, wondering why I didn’t suggest the Cp or Ppk? Look here & here and have confidence you are way beyond the level most of us in healthcare are with process centering. Have a look at those links and pass along some comments on why you think one of those other measures would be better!)
Ok, and now for the denominator of the Healthcare Value Process Index: the Cost of Poor Quality. Remember how I said earlier that health systems often don’t know exactly how much services cost? They are often much more able to tell when costs decrease or something changes. In fact, the COPQ captures the Cost of Poor Quality very well according to four buckets. It’s often used in Lean Six Sigma and other quality improvement systems. With a P&L statement, and some time with the Finance team, the amount the healthcare system is spending on a certain system can usually be sorted out. For more info on the COPQ and 4 buckets, take a look at this article for the Healthcare Financial Management Association. The COPQ is much easier to get at than trying to calculate the cost of an entire system. When the COPQ is high, there’s lots of waste as represented by cost. When low, it means there is little waste as quantified by cost to achieve whichever outcome you’re looking at.
So, this metric checks all the boxes described earlier for exactly what a good metric for healthcare value would look like. It is applicable across service lines, captures how well the system is working, and represents the cost of the care that’s being rendered in that system. Let’s do an example.
Pretend you’re looking at a sample of the times that patients wait in the ER waiting room. The Voice of the Customer says that patients, no matter how not-sick they may seem, shouldn’t have to wait any more than two hours in the waiting room.
(Of course, it’s just an example. That upper specification limit for wait time could have been anything that the Voice of the Customer said it was. And, by the way, who is the Voice of the Customer that determined that upper spec limit? It could be a regulatory agency, hospital policy, or even the director of the ER. Maybe you sent out a patient survey and the patients said no one should ever have to wait more than two hours!)
When you look at the data you collected, you find that 200 patients came through the ER waiting room in the time period studied, and 2 of them waited longer than the two-hour upper spec limit. That means 2 defects per 200 opportunities, which is a DPMO (Defects Per Million Opportunities) of 10,000. Let’s look at the Cpk level associated with that level of defect:
Ok, that’s a Cpk of approximately 1.3 as per the table above. Now what about the costs?
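If you don’t have a conversion table handy, the lookup can be approximated in a few lines. This is only a sketch: it assumes normally distributed data, a single upper spec limit, and the conventional 1.5-sigma long-term shift that many Six Sigma tables build in (an assumption about the table, not something stated above). The function names are made up for illustration.

```python
from statistics import NormalDist


def dpmo(defects: int, opportunities: int) -> float:
    """Defects Per Million Opportunities."""
    return defects / opportunities * 1_000_000


def cpk_from_dpmo(dpmo_value: float, sigma_shift: float = 1.5) -> float:
    """Approximate Cpk for a one-sided spec limit.

    Assumes normal data and a 1.5-sigma long-term shift (the usual
    Six Sigma table convention). Pass sigma_shift=0 to leave it out.
    """
    defect_rate = dpmo_value / 1_000_000
    # Z value that leaves `defect_rate` in the upper tail
    z_long_term = NormalDist().inv_cdf(1 - defect_rate)
    sigma_level = z_long_term + sigma_shift
    return sigma_level / 3


# ER waiting room example: 2 defects in 200 opportunities
er_dpmo = dpmo(defects=2, opportunities=200)
er_cpk = cpk_from_dpmo(er_dpmo)
print(round(er_dpmo), round(er_cpk, 1))  # prints: 10000 1.3
```

Without the shift (sigma_shift=0) the same defect rate gives a noticeably lower value, so it’s worth stating which convention you used whenever you report a Cpk.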
We look at each of the four buckets associated with the Cost of Poor Quality. (Remember those four buckets?) First, the surveillance bucket: an FTE takes 10 minutes of their time every shift to check how long people have been waiting in the waiting room. (In real life, there are probably more surveillance costs than this.) Ok, so those are the costs required to check in on the system because of its level of function.
What about the second bucket, the cost of internal failures? That bucket includes all of the costs associated with issues that arise in the system but do not make it to the patient. In this example, it would be the costs attributed to problems with the amount of time a person is in the waiting room that don’t cause the patient any problems. For example, were there any events when one staff member from the waiting room had to walk back to the main ED because the phone didn’t work and so they didn’t know if it was time to send another patient back? Did the software crash and require IT to help repair it? These are problems with the system which may not have made it to the patient and yet did have legitimate costs.
The third bucket, often the most visible and high-profile, includes the costs associated with defects that make it to the patient. Did someone with chest pain somehow wind up waiting in the waiting room for too long, and require more care than they would have otherwise? Did someone wait more than the upper spec limit and then the system incurred some cost as a result? Those costs are waste and, of course, are due to external failure of waiting too long.
The last bucket, my favorite, is the costs of prevention. As you’ve probably learned before, this is the only portion of the COPQ that generates a positive Return On Investment (ROI) because money spent on prevention usually goes very far toward preventing many more costs downstream. In this example, if the health system spent money on preventing defects (eg some new computer system or process that freed up the ED to get patients out of the waiting room faster) that investment would still count in the COPQ and would be a cost of prevention. Yes, if there were no defects there would be no need to spend money on preventative measures; however, again, that does not mean funds spent on prevention are a bad idea!
After all of that time with the four buckets and the P&L, the total COPQ is discovered to be $325,000. Yes, that’s a very typical size for many quality improvement projects in healthcare.
Now, to calculate the Healthcare Value Process Index, we take the system’s performance (Cpk of 1.3), multiply it by 100, and divide by 325 (the COPQ in thousands of dollars). We see a Healthcare Value Process Index of 0.4. We carefully remember that the upper spec limit was 120 minutes and came from the VOC, which we name when we report the index. The 100 is there only to scale the typical answer to a size that’s easier to remember.
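For anyone who’d rather see the arithmetic as code, here is a minimal sketch of the calculation just described. The function and parameter names are my own shorthand, not part of any standard library, and the COPQ is passed in dollars and converted to thousands inside the function.

```python
def hvpi(cpk: float, copq_dollars: float, constant: float = 100.0) -> float:
    """Healthcare Value Process Index = constant * Cpk / COPQ,
    with COPQ expressed in thousands of dollars."""
    if copq_dollars <= 0:
        raise ValueError("COPQ must be a positive dollar amount")
    copq_thousands = copq_dollars / 1_000
    return constant * cpk / copq_thousands


def hvpi_report(value: float, usl_minutes: int, voc_source: str) -> str:
    """Format the index together with the spec limit and its source."""
    return (
        f"Healthcare Value Process Index of {value:.1f} "
        f"with VOC of {usl_minutes} min from {voc_source}"
    )


# ER waiting room example: Cpk of 1.3, COPQ of $325,000
er_hvpi = hvpi(cpk=1.3, copq_dollars=325_000)
print(hvpi_report(er_hvpi, usl_minutes=120, voc_source="state regulation"))
```

Keeping the spec limit and its source in the report string is deliberate: two indexes are only comparable when they were built against the same VOC.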
We would report this Healthcare Value Process Index as “Healthcare Value Process Index of 0.4 with VOC of 120 min from state regulation” or whomever (whichever VOC) gave us the specification limits to calculate the Cpk. Doing that allows us to compare a Healthcare Value Process Index from institution to institution, or to know when they should NOT be compared. It keeps it apples to apples!
Now imagine the same system performing worse: a Cpk of 0.7. It even costs more, with a COPQ of $425,000. The Healthcare Value Process Index (HVPI)? That’s about 0.165 (100 × 0.7 / 425). Easy to see it’s bad!
How about a great system for getting patients screening colonoscopies in less than a certain amount of time or age? It performs really well with a Cpk of 1.9 (wow!) and has a COPQ of $200,000. Its HVPI? That’s 0.95. Much better than those other systems!
Perhaps even more useful than comparing systems with the HVPI is tracking the HVPI for a service process. After all, no matter what costs were initially assigned to a service process, watching them change over time with improvements (or worsening of the costs) would likely prove more valuable. If the Cpk improves and costs go down, expect a higher HVPI next time you check the system.
At the end of the day, the HVPI is a simple, intuitive, straightforward measure to track value across a spectrum of healthcare services. The HVPI helps clarify when value can (and cannot) be compared across services. Calculating the HVPI requires knowledge of system capability measures and clarity in assigning COPQ. Regardless of the initial values for a given system and the different ways in which costs may be assigned, trending the HVPI over time may be the more valuable way to follow value for that system.
Questions? Thoughts? Hate the arbitrary 100 constant? Leave your thoughts in the comments and let’s discuss.
OR turnaround time is a classic opportunity for quality improvement in hospitals. The surgeons typically say it takes way too long to clean and prepare the ORs. The materials management and housekeeping staff often add that they’re doing everything they can to go as quickly as possible–without sacrificing their safety or doing a bad job for the patient. Anesthesia colleagues may add that they too are going as fast as possible while completely preparing the rooms and maintaining patient safety. However, the rest of administration will remind the team of an estimated cost of OR time so as to put a face on the costs associated with that downtime when no one is operating in the ORs. I’ve seen these range from as low as $50/minute to as high as $100/min!
Here’s a classic quality improvement project
Here, then, is a classic project that involves many stakeholders, shared OR governance, and an obvious opportunity to decrease what many hospitals consider non-value-added time. I bet it’s a project that your healthcare system has performed before, will perform soon, or is eyeing as a potential for significant quality improvement.
And you know what? Even if you’ve gotten this challenging project done in your healthcare system, the issue may not be behind you my friend. Let me tell you why…
Once upon a time, at one hospital, the goal of an important quality improvement project was to reduce that turnaround time in the operating rooms. And wow had it ever worked!
The team had adopted a clear definition of turnaround time, and had used a DMAIC project to significantly decrease that time–it was almost like a NASCAR pit crew in there. It was safe, orchestrated, complete, and really helped the rest of the staff improve OR flow. The time required to turn over a room had also become much more predictable, and this decreased variation in turnover times was also a big help to patient flow and scheduling.
The team used several classic tools, including a spaghetti diagram to decrease wasted motion by the “pit crew” team, a kanban inventory system, and a visual control board to notify all of the players in the process (Anesthesia, Surgery, Pre-op Nursing, & the holding room) when the operating room was ready to go. They saved days worth of wasted motion (time spent walking) for the OR prep crew when projected out over a year’s worth of turnovers. The OR staff could complete about one extra case per room per day. Truly amazing.
…but only three months later, the turnaround time had crept back up again to where it had been before the changes–a median of 25 minutes per case.
Good quality projects never die. And if you plan them right, they don’t even fade away. –Anonymous
Nobody noticed, at first, that the turnaround times were slowing down from great to just pretty good again, until one day the OR got very backed up because a couple of turnarounds took 40 minutes. The Chief Surgeon wasn’t happy and didn’t hesitate to tell anyone she could how she felt.
What had kept the gains from being sustained? (You’ve probably seen these culprits before.) It was a combination of factors. Two new people started in the OR; one longtime employee in the facilities-services department had retired. The new people weren’t educated all that well about the turnaround system, and they also didn’t know exactly where everything was yet. But that wasn’t the real problem.
Failure is much more likely when there’s no control plan
In fact, the quality-improvement team hadn’t built a control plan into the system. The first sign they may have had a problem was when the Chief Surgeon fired off an angry e-mail to the rest of administration and most of the staff. The signal should’ve come much earlier, when the variation in turnover times increased unexpectedly. That signal could’ve been noticed weeks before.
How? The team could’ve used an ImR control chart (more on that here) to notice that the range of times for room turnover had gone out of control. The team could’ve had someone, a process owner like the OR administration, positioned to sound the alarm that the process needed to be solidified when, weeks earlier, several other turnovers took an unexpectedly long time.
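To make that concrete, here’s a rough sketch of the ImR (individuals and moving range) check described above. The turnaround times are invented for illustration (they are not the hospital’s data), and the sketch applies only the simplest rule: a point outside the control limits. The constants 2.66 and 3.267 are the standard ImR factors for a moving range of two consecutive points.

```python
from statistics import mean


def imr_limits(baseline):
    """Control limits for an individuals and moving range (ImR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    x_bar = mean(baseline)
    mr_bar = mean(moving_ranges)
    return {
        "x_center": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,  # 2.66 = 3 / d2, with d2 = 1.128
        "x_lcl": x_bar - 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # D4 for subgroups of size 2
    }


def flag_signals(baseline, new_values):
    """Return (index, value, reason) for new points outside the limits."""
    limits = imr_limits(baseline)
    signals = []
    previous = baseline[-1]
    for i, value in enumerate(new_values):
        if value > limits["x_ucl"] or value < limits["x_lcl"]:
            signals.append((i, value, "individual value outside limits"))
        if abs(value - previous) > limits["mr_ucl"]:
            signals.append((i, value, "moving range above UCL"))
        previous = value
    return signals


# Hypothetical turnaround times in minutes (illustration only)
baseline_minutes = [15, 17, 16, 14, 18, 16, 15, 17, 16, 15]
recent_minutes = [16, 18, 40, 17]

for index, value, reason in flag_signals(baseline_minutes, recent_minutes):
    print(f"turnover {index}: {value} min -> {reason}")
```

With these made-up numbers, the 40-minute turnover trips both charts, and the following turnover trips the moving range chart as the process snaps back. A process owner watching the chart would see the first of those signals the day it happened, not weeks later in an angry e-mail.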
Fortunately, in this case, the project team recovered. They quickly deployed an ImR chart and also reviewed their data. The Chief Surgeon had been correct: yes, those cases did take an unexpectedly long time when viewed in the context of the OR’s data. A root cause analysis was performed and the quality team quickly realized that several issues lined up to make those times take so much longer.
After addressing the issues, the team was back in full swing only a week or two later. The pit crew was back at it, and the NASCAR-like precision had returned.
The lesson: creation of a control phase plan to maintain the good work you & the team have done is an essential part of quality improvement projects. Without an excellent control plan, it is very difficult to maintain the improvements you’ve made as a foundation for future improvements. Failure to plan a control phase is, unfortunately, planning to fail.
Have you ever worked at a hospital that wanted to improve its ED throughput? I bet you have, because almost all do! Here’s a story of how advanced quality tools led a team to find at least one element that added 20 minutes to almost every ED stay…
Once upon a time…
At one hospital where I worked, a problem with admission delays in the emergency department led us far astray when we tried to solve it intuitively. In fact, we made the situation worse. Patients were spending too much time in the emergency room after the decision to admit them was made. There was a lot of consternation about why it took so long and why we were routinely running over the hospital guidelines for admission. We had a lot of case-by-case discussion, trying to pinpoint where the bottleneck was. Finally, we decided to stop discussing and start gathering data.
Follow a patient through the value stream…
We did a prospective study and had one of the residents walk through the system. The observer watched each step in the system after the team mapped out exactly what the system was. What we discovered was that a twenty-minute computer delay was built into the process for almost every patient that came through the ED.
The doctor would get into the computer system and admit the patient, but the software took twenty minutes to tell the patient-transport staff that it was time to wheel the patient upstairs. That was a completely unexpected answer. We had been sitting around in meetings trying to figure out why the admission process took too long. We were saying things like, “This particular doctor didn’t make a decision in a timely fashion.” Sometimes that was actually true, but not always. It took using statistical tools and a walk through the process to understand at least one hidden fact that cost almost every patient 20 minutes of waiting time. It’s amazing how much improvement you can see when you let the data (not just your gut) guide process improvement.
The issue is not personal
We went to the information-technology (IT) people and showed them the data. We asked what we could do to help them fix the problem. By taking this approach, instead of blaming them for creating the problem, we turned them into stakeholders. They were able to fix the software issue, and we were able to shave twenty minutes off most patients’ times in the ER. Looking back, we should probably have involved the IT department from the start!
Significant decrease in median wait time and variance of wait times
Fascinatingly, not only did the median time until admission decrease, but the variation in times decreased too. (We made several changes to the system, all based on the stakeholders’ suggestions.) In the end, we had a much higher quality system on our hands…all thanks to DMAIC and the data…
I was recently part of a team that was trying to decide how well residents in our hospital were supervised. The issue is important, because residency programs are required to have excellent oversight to maintain their certification. Senior physicians are supposed to supervise the residents as the residents care for patients. There are also supposed to be regular meetings with the residents and meaningful oversight during patient care. We had to be able to show accrediting agencies that supervision was happening effectively. Everyone on the team, myself included, felt we really did well with residents in terms of supervision. We would answer their questions, we’d help them out with patients in the middle of the night, we’d do everything we could to guide them in providing safe, excellent patient care. At least we thought we did . . . .
We’d have meetings and say, “The resident was supervised because we did this with them and we had that conversation about a patient.” None of this was captured anywhere; it was all subjective feelings on the part of the senior medical staff. The residents, however, were telling us that they felt supervision could have been better in the overnight shifts and also in some other specific situations. Still, we (especially the senior staff doing the supervising) would tell ourselves in the meetings, “We’re doing a good job. We know we’re supervising them well.”
We weren’t exactly lying to ourselves. We were supervising the residents pretty well. We just couldn’t demonstrate it in the ways that mattered, and we were concerned about any perceived lack in the overnight supervision. We were having plenty of medical decision-making conversations with the residents and helping them in all the ways we were supposed to, but we didn’t have a critical way to evaluate our efforts in terms of demonstrating how we were doing or having something tangible to improve.
When I say stop lying to ourselves, I mean that we tend to self-delude into thinking that things are OK, even when they’re not. How would we ever know? What changes our ability to think about our performance? Data. When good data tell us, objectively and without question, that something has to change–well, at least we are more likely to agree. Having good data prevents all of us from thinking we’re above average . . . a common misconception.
To improve our resident supervision, we first had to agree it needed improvement. To reach that point, we had to collect data prospectively and review it. But before we even thought about data collection, we had to deal with the unspoken issue of protection. We had to make sure all the attending physicians knew they were protected against being blamed, scapegoated, or even fired if the data turned out to show problems. We had to reassure everyone that we weren’t looking for someone to blame. We were looking for ways to make a good system better. There are ways to collect data that are anonymous. The way we chose did not include which attending or resident was involved at each data point. That protection was key (and is very important in quality improvement projects in healthcare) to allowing the project to move ahead.
I’ve found that it helps to bring the group to the understanding that, because we are so good, data collection on the process will show us that we’re just fine—maybe even that we are exceptionally good. Usually, once the data are in, that’s not the case. On the rare occasion when the system really is awesome, I help the group to go out of its way to celebrate and to focus on what can be replicated in other areas to get that same level of success.
When we collected the data on resident supervision, we asked ourselves the Five Whys. Why do we think we may not be supervising residents well? Why? What tells us that? The documentation’s not very good. Why is the documentation not very good? We can’t tell if it doesn’t reflect what we’re doing or if we don’t have some way to get what we’re doing on the chart. Why don’t we have some way to get it on the chart? Well, because . . . .
If you ask yourself the question “why” five times, chances are you’ll get to the root cause of why things are the way they are. It’s a tough series of questions. It requires self-examination. You have to be very honest and direct with yourself and your colleagues. You also have to know some of the different ways that things can be—you have to apply your experience and get ideas from others to see what is not going on in your system. Some sacred cows may lose their lives in the process. Other times you run up against something missing from a system (absence) rather than presence of something like a sacred cow. What protections are not there? As the saying goes, if your eyes haven’t seen it, your mind can’t know it.
As we asked ourselves the Five Whys, we asked why we felt we were doing a good job but an outsider wouldn’t be able to tell. We decided that the only way an outsider could ever know that we were supervising well was to make sure supervision was thoroughly documented in the patient charts.
The next step was to collect data on our documentation to see how good it was. We decided to rate it on a scale of one to five. One was terrible: no sign of any documentation of decision-making or senior physician support in the chart. Five was great: we can really see that what we said was happening, happened.
We focused on why the decision-making process wasn’t getting documented in the charts. There were lots of reasons: Because it’s midnight. Because we’re not near a computer. Because we were called away to another patient. Because the computers were down. Because the decision was complicated and it was difficult to record it accurately.
We developed a system for scoring the charts that I felt was pretty objective. The data were gathered prospectively; names were scrubbed, because we didn’t care which surgeon it was and we didn’t want to bias the scoring. To validate the scoring, we used a Gage Repeatability and Reproducibility (Gage R&R) test, which (among other things) helps determine how much variability in the measurement system is caused by differences between operators. We chose thirty charts at random and had three doctors check them and give them a grade with the new system. Each doctor was blinded to the chart they rated (as much as you could be) and rated each chart three times. We found that most charts were graded at 2 or 2.5.
Once we were satisfied that the scoring system was valid, we applied it prospectively and scored a sample of charts according to the sample size calculation we had performed. Reading the chart to see if it documented supervision correctly only took about a second. We found, again, our score was about 2.5. That was a little dismaying, because it showed we weren’t doing as well as we thought, although we weren’t doing terribly, either.
Then we came up with interventions that we thought would improve the score. We made poka-yoke changes—changes that made it easier to do the right thing without having to think about it. In this case, the poka-yoke answer was to make it easier to document resident oversight and demonstrate compliance with Physicians At Teaching Hospitals (PATH) rules; the changes made it harder to avoid documenting actions. By making success easier, we saw the scores rise to 5 and stay there. We added standard language and made it easy to access in the electronic medical record. We educated the staff. We demonstrated how, and why, it was easier to do the right thing and use the tool instead of skipping the documentation and getting all the work that resulted when the documentation was not present.
The project succeeded extremely well because we stopped lying to ourselves. We used data and the Five Whys to see that what we told ourselves didn’t align with what was happening. We didn’t start with the assumption that we were lying to ourselves. We thought we were doing a good job. We talked about what a good job looked like, how we’d know if we were doing a good job, and so on, but what really helped us put data on the questions was using a fishbone diagram. We used the diagram to find the six different factors of special cause variation…
Want to read more about how the team used the tools of statistical process control to vastly improve resident oversight? Read more about it in the Amazon best-seller: Volume To Value here.
Yup, Healthcare is going through a major transition and we all know it. Whether you’ve followed along with the blog, or even if you haven’t, you probably know that Health & Human Services is transitioning us to a focus on value delivered to patients rather than volume of services we deliver in healthcare. If you haven’t heard exactly what’s coming, look here.
So, in order to help prepare, I’m sharing tools and experiences with quality improvement that lead to improvements in value delivered to patients. Take a look at Volume to Value, coming soon on Amazon.
Now, more than ever, a clear focus on well-known quality improvement tools is paramount for success.
Have you ever wondered how to work with non-normal data sets? Data that are not normally distributed can present special problems, and this podcast helps explain how to deal with that special situation in healthcare quality improvement. Audio podcast version here or click the icon above. Video podcast version here.