Maybe not perfect, but still worthwhile.
It has been an educational experience to examine the Hospital Safety Scores released last month by the well-respected Leapfrog Group. Leapfrog was originally organized by large employers and payers in an effort to ensure that the large sums they were shelling out for healthcare were in fact buying something worth paying for. Such efforts have been incorporated into our national health policy.
The results for Louisville were not as good as we would all have liked, and for some hospitals worse than they would like to have to defend. Kentucky as a state ranked in the middle. Twenty percent of scored Kentucky hospitals received an A, and in this regard as a state we ranked 28th. The American Hospital Association wrote a very critical letter in defense of its members and speculated publicly how it could even be possible that Yale-New Haven Hospital, one of the most famous teaching hospitals in the world, was awarded only a C.
Both for the academic purpose of exploring the robustness of the Safety Scores and because I too was surprised by some of the results in Kentucky, I undertook to analyze in more detail the individual measurements underlying the composite letter scores. I entered the individual scores for all four Louisville hospitals, as well as two of the several hospitals in Kentucky that received a Safety Score of A: the teaching hospital in Madisonville, which I hold in high regard, and the Appalachian Regional Hospital in Harlan, which has been much in the news lately because of its legal struggles over Medicaid Managed Care. Because the AHA had made an issue of Yale-New Haven, I included both it and St. Raphael, the other major hospital in New Haven. To provide a comparison for UofL, I included the University of Kentucky Hospital.
I was particularly interested in what happened when a hospital did not participate in Leapfrog’s own Hospital Survey, the primary source of the first 10 components that included all of the Process and Structural Measures. (Only 10 of Kentucky’s 100+ hospitals were willing to participate in Leapfrog’s evaluation.) I also wanted to see how the hospitals did on the more objective Outcome Measures that were taken from Medicare’s Hospital Compare. This latter database comprises the core of most national attempts to assess hospital quality, and its contents are currently being used to both reward and penalize hospitals for their performance. Obviously the stakes are high that the measurements actually mean something and are relatively impervious to hospitals’ attempts to game the system. I do not yet have a strong opinion on either matter, but neither do I believe the effort is worthless.
I abstracted the actual component scores from the Hospital Safety Score website and entered them into a spreadsheet for easier comparison. You can review it yourself here. Each column represents a hospital, and each row one of the 26 measurements. I highlighted in green the best measurement in each row, and in red, the worst. Each second-worst I highlighted in orange. Nine hospitals are too few to support any firm statistical comparisons, but for the purposes of this summary article, I offer the following observations:
How much of a penalty was it not to participate in Leapfrog’s Survey?
Only the individual Norton Hospitals participated in the Survey. When a hospital chose not to participate, it received a “Not Available” score for some ten items. A hospital was not penalized for this missing data, but it meant that the remaining items were given greater weight. Thus, ARH Harlan was given its A on the basis of only 10 of 26 items. Because it received scores of excellent or better-than-average on these items, it got a very high aggregate score. Comparison with the New Haven hospitals is instructive. Neither of those completed Leapfrog’s survey. Most of Yale-New Haven’s remaining outcome scores were below the national average, earning it a C. St. Raphael Hospital, which got better numbers than Yale in almost all the measures, received an A.
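To make the re-weighting effect concrete, here is a minimal sketch of how a composite might be computed when "Not Available" items are simply dropped and the remaining weights are rescaled. The function name, the equal weights, and the scores are all hypothetical and chosen only for illustration; this is not Leapfrog's published scoring formula.

```python
def composite_score(scores, weights):
    """Weighted mean over the measures that are available.

    `scores` maps a measure name to a number, or to None for "Not Available".
    Missing measures are skipped, so the weights of the remaining measures
    are effectively scaled up to fill the gap.
    (Hypothetical scheme for illustration, not Leapfrog's actual method.)
    """
    available = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no measures available")
    return sum(weights[name] * s for name, s in available.items()) / total_weight


# Hypothetical equal weights for two survey items and two outcome items.
weights = {"survey_a": 1.0, "survey_b": 1.0, "outcome_a": 1.0, "outcome_b": 1.0}

# Same outcome scores; only survey participation differs.
non_participant = {"survey_a": None, "survey_b": None,
                   "outcome_a": 0.9, "outcome_b": 0.8}
participant = {"survey_a": 0.5, "survey_b": 0.5,
               "outcome_a": 0.9, "outcome_b": 0.8}

score_without_survey = composite_score(non_participant, weights)
score_with_survey = composite_score(participant, weights)
print(round(score_without_survey, 3), round(score_with_survey, 3))
```

Under these made-up numbers, the hospital that skipped the survey is judged on its two outcome items alone, each now carrying half the total weight instead of a quarter. Whether skipping helps or hurts depends entirely on how the hospital would have scored on the missing items.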
If a hospital both failed to complete Leapfrog’s Survey and got terrible scores, it got a bad grade, as did both University Hospitals of Kentucky. I cannot render an opinion on whether participating in the Leapfrog Survey might improve a hospital’s chances. Only one of my comparison hospitals participated. Norton had worse than the national average scores in 9 of the 16 outcome measures, and above average in only 4. It received a C, the lowest discrete score awarded in this first iteration of the Safety Scores. I do not know how many combined Ds and Fs were awarded in the category “Score Pending.”
Why do our principal University Hospitals appear to be so bad?
This demands additional study. Sadly for Kentucky, our two “flagship” teaching hospitals did very poorly in the outcomes measures. Most of the lowest and second lowest scores in my comparison group were claimed by UofL and UK Hospitals. We knew this was already likely for UofL, but UK was a disappointing surprise to me. Something is rotten in the state of Kentucky. Bucks for Brains (a.k.a. Earmarks for Business) may not have translated over into Doctors or Teachers for Bucks. Before another dollar of state education money is given out, our governments owe it to the rest of us to honestly and independently evaluate how our Flagship Universities have been using taxpayer money. I fear we are paying a heavy price for misdirected priorities.
I grant that it is conceivable that teaching hospitals will inherently do worse on such measures due to their patient populations or the fact that much care is provided by the least experienced physicians. I will ask Leapfrog if they can help me gain access to data that will address this issue specifically. Even a superficial review reveals that many large teaching and safety net hospitals with more problems than ours could receive an A.
Most hospitals are neither all bad nor all good.
Highs and lows were distributed among my 9 comparison hospitals. Every hospital held at least one of the extremes, some more than others. Some numbers seemed almost too good or too bad to be true! Jewish Hospital & St Mary’s scored in the worst one percent of all 2652 evaluated hospitals in the percent of their patients who had respiratory failure after surgery. Baptist East had higher than average numbers of surgical incisions that opened up again post-operatively. ARH Harlan had some perfect scores. If I were running these hospitals or designing a better healthcare system, I would want to check that the data were being collected properly, and if they were, to learn what was being done right or wrong so all can benefit.
Are all the measures worthwhile collecting?
I confess to having my doubts. They were selected by experts in the field. I am willing to be convinced. Take for instance “Air Embolism,” which occurs when a major intravenous line is allowed to suck air into a person’s circulatory system. I am not even sure how such a thing would be observed. In any event, the national average is zero per 1000 patients and none of my 9 hospitals had any during the most recent reporting year. Is it worthwhile counting this event? I do not have enough knowledge to render an opinion. Foreign bodies (e.g., a clamp or sponge) left inside a person after an operation are also relatively rare, with an average of 2 per 100,000. Sixty-eight percent of all hospitals will have a retained-object rate of between 0.5 and 9 per 100,000 patients. Although rare, even a single instance is, in theory, completely preventable when proper protocol and checklists are used. I can see why this item might be retained because it addresses how a hospital follows its own guidelines, if indeed it has any.
I first worried about how valuable some of these national measures are when I began to look at mortality rates after a myocardial infarction. (“Turn here for your best chance of surviving a heart attack!”) Of all the several thousand hospitals in the nation that were reporting, even after the best possible risk adjustment for disease severity was made, it could be stated that only a handful of hospitals had more deaths than were statistically predicted. Cardiac mortality is not part of the Safety Scores. However, if that is the technical state of affairs, it is worth asking the question, “Is all the time and expense of collecting the data worth it?” I would like very much to be able to have an opinion on this!
When multiple hospitals are lumped together.
Although Leapfrog allows individual hospitals in systems to take its hospital survey separately, most of the safety indicators come from Medicare, which lumps together all hospitals that use a single provider number. Thus, the four Norton Hospitals (5 with Kosair Children’s) are reported as if they are one, as are Jewish Hospital and St. Mary’s. There is no reason to believe that the respective sister hospitals are equally safe or effective, and in my opinion they are not. Thus the value of the Safety Score to the public is diminished. The good work of one hospital may be dragged down by a weaker sibling. [I am sure there are other business reasons, but hospitals may decide to share a Medicare provider number to maximize their federal revenue from graduate medical education payments or disproportionate share bonuses for the care of the uninsured. Falling to the lowest common denominator in quality reports may be an undesirable consequence if the reimbursement system is gamed in this way.]
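A small worked example shows how a shared provider number can blur two very different hospitals into one unremarkable figure. The event counts and patient volumes below are invented for illustration and do not describe any hospital named in this article.

```python
def rate_per_1000(events, patients):
    """Complication rate expressed per 1000 patients."""
    return 1000 * events / patients


# Hypothetical sister hospitals reporting under one provider number:
# (number of events, number of patients)
hospital_a = (2, 4000)    # the stronger performer
hospital_b = (18, 2000)   # the weaker sibling

rate_a = rate_per_1000(*hospital_a)
rate_b = rate_per_1000(*hospital_b)

# A pooled report sees only the combined totals.
pooled_rate = rate_per_1000(hospital_a[0] + hospital_b[0],
                            hospital_a[1] + hospital_b[1])

print(f"A: {rate_a:.2f}  B: {rate_b:.2f}  pooled: {pooled_rate:.2f}")
```

With these made-up numbers the pooled rate falls between the two individual rates, so the public sees neither the good performance of the first hospital nor the poor performance of the second.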
Is a “perfect” score always better?
The short answer is absolutely not! For example, one of the safety scores is the frequency of a pneumothorax (collapsed lung) following a medical procedure such as placing a central intravenous line in the jugular or subclavian vein. This is frequently necessary for emergency resuscitation or critical care treatment. The vein is very close to the pleural space where the lung is, which is why a chest X-ray is often taken following the procedure. Anyone who says they have never missed the vein and caused a collapsed lung may be lying to themselves and to us. Obviously it is better to have as low a rate as possible, but if it is too low, then it becomes likely that the hospital and its doctors are not placing enough central lines to meet the needs of their patients! A teaching hospital in which such procedures are commonly done by trainees may look bad in this quality measurement and others in which experience and skill count, and where senior physicians may be thin on the ground.
Potential problems using billing data for quality measurements.
Some if not most of the outcome measures gathered by Medicare come from the diagnoses on the bills submitted by the hospitals. This has the advantage of collecting all the available information and much can be learned. However, hospitals have a long and not completely untarnished history of manipulating their bills to maximize their reimbursement. For example, hospitals get paid more when their patients are sicker or when complications are present. I learned this first-hand when I did a study of hospital charges for simple pneumonia in Kentucky. An entire consulting business sprang up to teach hospitals how to make their patients seem as sick and complicated as possible to maximize revenue. I would like to say that such practices no longer occur, but I cannot. Obviously the tension between maximizing revenue while at the same time reporting clinical data accurately to reflect true quality and efficiency is still with us. It will be as long as we doctors and hospitals get paid by the case.
How important is cost?
Note that none of the above makes reference to how much is being charged by hospitals. Indeed, the worst scoring hospital in Louisville is also the most expensive. It is still an article of faith that better medical care will turn out to be economically more efficient if not cheaper. This is a contemporary presumption that to my knowledge remains to be proven. Even so, a search for more effective medical care that is also safely and efficiently delivered is a quest worth making.
So what should we make of all this? Would I personally consider these Safety Scores in any way if I had a choice of where to go? Yes! These scores, rightly or wrongly, confirm my other knowledge that there are hospitals I would rather not use. Would the scores be my only consideration? No! I still value the recommendation of my physicians whom I trust. Am I glad that the discussion is out in the open? Absolutely! Indeed, I believe that is all that Leapfrog is asking of us. What is absolutely certain about this process is that no matter how much money hospitals and other medical providers put into their advertising budgets, they can no longer make unsubstantiated marketing claims about their services to the neglect of their medical competence. I regret that some feel they will have to justify themselves in the bright light of outside review that they may not feel is justified. I am angry that some plaintiff attorneys are already using these Safety Scores to encourage people to consult them. This however is the world we have built for ourselves, not the system that I or most of you would prefer. There are better ways than the hyper-competitive, uncoordinated, wasteful, expensive, and yes, unsafe healthcare system we have been frightened into accepting.
What do you think? Do you care?
Peter Hasselbacher, MD
Emeritus Professor of Medicine, UofL
July 3, 2012