Medical Evidence Blog: regression to the mean

Thursday, March 20, 2014

Sepsis Bungles: The Lessons of Early Goal Directed Therapy

On March 18th, the NEJM published early online three original trials of therapies for the critically ill that will serve as fodder for several posts. Here, I focus on the ProCESS trial of protocol-guided therapy for early septic shock. This trial is in essence a multicenter version of the landmark 2001 trial of Early Goal Directed Therapy (EGDT) for severe sepsis by Rivers et al. That trial showed a stunning 16% absolute reduction in sepsis mortality, attributed to the use of a protocol based on physiological goals for hemodynamic management. That absolute reduction in mortality is perhaps the largest for any therapy in critical care medicine. If such a reduction were confirmed, it would make EGDT the single most important therapy in the field. If such a reduction cannot be confirmed, there are several reasons why the Rivers results may have been misleading:

As I have blogged in the case of intensive insulin therapy, single-center studies inflate treatment effects when compared to multicenter studies for reasons that are unclear, but which may be related to bias, especially in unblinded studies. (The revelation that Rivers was an investor in one of the devices used in the trial raised additional concerns about bias.)
Regression to the mean may lead to reduced effect sizes when trials are repeated, especially when the index trial has a very large effect size. In a similar vein, since large absolute mortality reductions are statistically unlikely in critical care medicine, Bayesian inference means that trials reporting large reductions are likely to represent type I statistical errors.
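
To make the Bayesian point concrete, here is a minimal sketch of the arithmetic. The priors, the 80% power, and the 5% false positive rate are illustrative assumptions of mine, not figures from any of the trials discussed, and the function name is hypothetical.

```python
def prob_real_given_significant(prior, power=0.80, alpha=0.05):
    """Bayes' rule: P(effect is real | trial is statistically significant).

    prior : assumed probability, before the trial, that an effect of the
            hypothesized size exists (an illustrative guess, not data)
    power : probability the trial is significant if the effect is real
    alpha : probability the trial is significant if there is no effect
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# The less plausible a large effect is beforehand, the more likely a
# "positive" trial reporting it is a false positive.
for prior in (0.50, 0.10, 0.02):
    posterior = prob_real_given_significant(prior)
    print(f"prior {prior:.2f} -> P(real | significant) = {posterior:.2f}")
```

Under these assumptions, a therapy given even odds beforehand is very probably real after a positive trial, whereas one given a 2% prior is still more likely than not a false alarm. The prior carries most of the weight, which is why a very large mortality reduction warrants extra skepticism.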

There were other concerns about the Rivers study and how it was later incorporated into practice, but I won't belabor them here. The ProCESS trial randomized about 1350 patients among three groups: one simulating the original Rivers protocol, one following a modified Rivers protocol, and one representing "standard care," that is, care directed by the treating physician without a protocol. The study had 80% power to demonstrate a mortality reduction of 6-7%. Before you read further, please wager: will the trial show any statistically significant differences in outcome that favor EGDT or protocolized care?

Over Easy? Trials of Prone Positioning in ARDS

Published May 20 in the NEJM to coincide with the ATS meeting is the (latest) Guerin et al study of Prone Positioning in ARDS. The editorialist was impressed. He thinks that we should start proning patients similar to those in the study. Indeed, the study results are impressive: a 16.8% absolute reduction in mortality between the study groups with a corresponding P-value of less than 0.001. But before we switch our tastes from sunny side up to over easy (or in some cases, over hard - referred to as the "turn of death" in ICU vernacular) we should consider some general principles as well as about a decade of other studies of prone positioning in ARDS.

First, a general principle: regression to the mean. Few, if any, therapies in critical care (or in medicine in general) confer a mortality benefit this large. I refer the reader (again) to our study of delta inflation, which tabulated over 30 critical care trials in the top 5 medical journals over 10 years and showed that few critical care trials show mortality deltas (absolute mortality differences) greater than 10%. Almost all those that do are later refuted. Indeed, it was our conclusion that searching for deltas greater than or equal to 10% is akin to a fool's errand, so low is the probability of finding such a difference. Jimmy T. Sylvester, my attending at JHH in late 2001, had already recognized this. When the now infamous sentinel trial of intensive insulin therapy (IIT) was published, we discussed it at our ICU pre-rounds lecture and he said something like "Either these data are faked, or this is revolutionary." We now know that there was no revolution (although many ICUs continue to practice as if there had been one). He could have just as easily said that this is an anomaly that will regress to the mean, that there is inherent bias in this study, or that "trials stopped early for benefit...."
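
For readers who like to see regression to the mean rather than take it on faith, here is a small simulation sketch. Every number in it is an assumption chosen for illustration (a true absolute mortality reduction of 3%, 40% control mortality, 200 patients per arm); none comes from the prone positioning trials or from our delta inflation study.

```python
import math
import random


def simulate_significant_trials(true_arr=0.03, control_mortality=0.40,
                                n_per_arm=200, n_trials=2000, seed=1):
    """Simulate many small two-arm trials of a therapy with a modest true effect.

    Returns the observed absolute risk reductions (ARRs) of only those
    trials that reach significance in favor of treatment (z > 1.96).
    All default values are hypothetical.
    """
    rng = random.Random(seed)
    treated_mortality = control_mortality - true_arr
    significant_arrs = []
    for _ in range(n_trials):
        deaths_c = sum(rng.random() < control_mortality for _ in range(n_per_arm))
        deaths_t = sum(rng.random() < treated_mortality for _ in range(n_per_arm))
        observed_arr = (deaths_c - deaths_t) / n_per_arm
        # Standard error for a two-proportion z-test with pooled mortality.
        pooled = (deaths_c + deaths_t) / (2 * n_per_arm)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
        if se > 0 and observed_arr / se > 1.96:
            significant_arrs.append(observed_arr)
    return significant_arrs


winners = simulate_significant_trials()
mean_winner_arr = sum(winners) / len(winners)
print(f"significant trials: {len(winners)} of 2000")
print(f"true ARR: 0.03, mean observed ARR among significant trials: {mean_winner_arr:.3f}")
```

In this sketch, the only trials that cross the significance threshold are those that happened to overshoot the true effect, so the "positive" trials report deltas several times larger than the truth. A faithful repeat of any one of them would be expected to show a smaller effect.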

The CORTICUS Trial: Power, Priors, Effect Size, and Regression to the Mean

The long-awaited results of another trial in critical care were published in a recent NEJM: (http://content.nejm.org/cgi/content/abstract/358/2/111). Similar to the VASST trial, the CORTICUS trial was "negative" and low dose hydrocortisone was not demonstrated to be of benefit in septic shock. However, unlike VASST, in this case the results are in conflict with an earlier trial (Annane et al, JAMA, 2002) that generated much fanfare and which, like the Van den Berghe trial of the Leuven Insulin Protocol, led to widespread [and premature?] adoption of a new therapy. The CORTICUS trial, like VASST, raises some interesting questions about the design and interpretation of trials in which short-term mortality is the primary endpoint.

Jean Louis Vincent presented data at this year's SCCM conference with which he estimated that only about 10% of trials in critical care are "positive" in the traditional sense. (I was not present, so this is basically hearsay to me - if anyone has a reference, please e-mail me or post it as a comment.) Nonetheless, this estimate rings true. Few are the trials that show a statistically significant benefit in the primary outcome; fewer still are the trials that confirm the results of those trials. This raises the question: are critical care trials chronically, consistently, and woefully underpowered? And if so, why? I will offer some speculative answers to these and other questions below.

The CORTICUS trial, like VASST, was powered to detect a 10% absolute reduction in mortality. Is this reasonable? At all? What is the precedent for a 10% ARR in mortality in a critical care trial? There are few, if any. No large, well-conducted trials in critical care that I am aware of have ever demonstrated (least of all consistently) a 10% or greater reduction in mortality with any therapy, at least not as a PRIMARY PROSPECTIVE OUTCOME. Low tidal volume ventilation? 9% ARR. Drotrecogin-alfa? 7% ARR in all-comers. So I therefore argue that all trials powered to detect an ARR in mortality of greater than 7-9% are ridiculously optimistic, and that the trials that spring from this unfortunate optimism are woefully underpowered. It is no wonder that, as JLV purportedly demonstrated, so few trials in critical care are "positive". The prior probability is exceedingly low that ANY therapy will deliver a 10% mortality reduction. The designers of these trials are, by force of pragmatic constraints, rolling the proverbial trial dice and hoping for a lucky throw.
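
A back-of-the-envelope sketch shows what this optimism costs. It uses the standard normal-approximation sample size formula for comparing two proportions; the 40% control mortality is a hypothetical figure of mine, not the CORTICUS or VASST design assumption, and the function names are made up for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, arr, alpha=0.05, power=0.80):
    """Patients per arm needed to detect an absolute risk reduction `arr`
    (two-sided alpha, normal approximation for two proportions)."""
    p_treat = p_control - arr
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat)))
    return math.ceil(numerator ** 2 / arr ** 2)


def achieved_power(p_control, arr, n, alpha=0.05):
    """Approximate power of a trial with `n` patients per arm if the
    true absolute risk reduction is `arr`."""
    p_treat = p_control - arr
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = ((arr * math.sqrt(n) - z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)))
         / math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat)))
    return NormalDist().cdf(z)


baseline = 0.40  # hypothetical control-arm mortality
n_for_10 = n_per_arm(baseline, 0.10)
n_for_5 = n_per_arm(baseline, 0.05)
print(f"per-arm n to detect a 10% ARR: {n_for_10}")
print(f"per-arm n to detect a  5% ARR: {n_for_5}")
print(f"power of the 10%-sized trial if the true ARR is 5%: "
      f"{achieved_power(baseline, 0.05, n_for_10):.2f}")
```

Because the required sample size scales roughly with the inverse square of the delta, halving the hoped-for ARR about quadruples the enrollment. Under these assumptions, a trial sized for a 10% delta has well under even odds of detecting a true 5% effect.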

Then there is the issue of regression to the mean. Suppose that the alternative hypothesis (Ha) is indeed correct in the generic sense that hydrocortisone does beneficially influence mortality in septic shock. Suppose further that we interpret Annane's 2002 data as consistent with Ha. In that study, a subgroup of patients (non-responders) demonstrated a 10% ARR in mortality. We should be excused for getting excited about this result, because after all, we all want the best for our patients and eagerly await the next breakthrough, and the higher the ARR, the greater the clinical relevance, whatever the level of statistical significance. But shouldn't we regard that estimate with skepticism since no therapy in critical care has ever shown such a large reduction in mortality as a primary outcome? Since no such result has ever been consistently repeated? Even if we believe in Ha, shouldn't we also believe that the 10% Annane estimate will regress to the mean on repeated trials?

It may be true that therapies with robust data behind them become standard practice, equipoise dissipates, and the trials of the best therapies are not repeated - so they don't have a chance to be confirmed. But the knife cuts both ways - if you're repeating a trial, it stands to reason that the data in support of the therapy are not that robust and you should become more circumspect in your estimates of effect size - taking prior probability and regression to the mean into account.

Perhaps we need to rethink how we're powering these trials. And funding agencies need to rethink the budgets they will allow for them. It makes little sense to spend so much time, money, and effort on underpowered trials, and to establish the track record that we have established, where the majority of our trials are "failures" in the traditional sense and all include a sentence in the discussion section about how the current results should influence the design of subsequent trials. Wouldn't it make more sense to conduct one trial that is so robust that nobody would dare repeat it in the future? One that would provide a definitive answer to the question that is posed? Is there something to be learned from the long arc of the steroid pendulum that has been swinging with frustrating periodicity for many a decade now?

This is not to denigrate in any way the quality of the trials that I have referred to. The Canadian group in particular as well as other groups (ARDSnet) are to be commended for producing work of the highest quality which is of great value to patients, medicine, and science. But in keeping with the advancement of knowledge, I propose that we take home another message from these trials - we may be chronically underpowering them.

Friday, May 31, 2013

Over Easy? Trials of Prone Positioning in ARDS

Monday, March 10, 2008

The CORTICUS Trial: Power, Priors, Effect Size, and Regression to the Mean