Sunday, August 27, 2017

Just Do As I (Vaguely) Say: The Folly of Clinical Practice Guidelines

If you didn't care to know anything about finance and you hired a financial adviser (paid hourly, not through commissions, of course), you would be happy to have him simply tell you to invest all of your assets in a Vanguard life cycle fund.  But you might then be surprised to learn that a different adviser told one of your contemporaries that this approach is oversimplified and that a portfolio should include several classes of assets not found in the life cycle funds, such as gold or commodities.  In light of the discrepancy, you may conclude that to make the best economic choices for yourself, you need to understand finance and the data upon which the advisers base their recommendations.

Making medical decisions optimally is akin to making economic decisions, and it rests on a simple framework:  EUT, or Expected Utility Theory.  To decide whether to pursue one course of action over another, we add up the benefits of a course multiplied by their probabilities of accruing (the result is the positive utility of the course of action) and then subtract the costs of the course of action multiplied by their probabilities of accruing (the negative utility).  If the net utility is positive, we pursue the course of action, and if several options are available, we pursue the course with the highest net utility.  Ideally, anybody helping you navigate such a decision framework would tell you the numbers so you could do the calculus yourself.  Using the finance analogy again, if the adviser told you "Stocks have positive returns.  So do bonds.  Stocks are riskier than bonds" - without any quantification - you might conclude that a portfolio full of bonds is the best course of action - and usually it is not.
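
As a toy illustration (mine, not drawn from any trial or guideline), here is what that calculus looks like in a few lines of Python; every probability and utility value below is invented:

    # Hypothetical expected utility comparison of two courses of action.
    # All probabilities and utility values are invented for illustration.

    def expected_utility(benefits, costs):
        """Each argument is a list of (probability, magnitude) pairs."""
        positive = sum(p * u for p, u in benefits)   # benefits weighted by their probability
        negative = sum(p * u for p, u in costs)      # costs weighted by their probability
        return positive - negative

    # Course A: a treatment with a chance of a large benefit and a smaller chance of harm.
    course_a = expected_utility(benefits=[(0.20, 100)], costs=[(0.05, 50)])

    # Course B: doing nothing - no benefit, no cost.
    course_b = expected_utility(benefits=[], costs=[])

    print(course_a, course_b)   # 17.5 and 0 - pursue the course with the higher net utility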

I regret to report that this is exactly what clinical practice guideline writers do:  provide summary information without any numerical data to support it, leaving the practitioner with two choices:

  1. Just do as the guideline writer says
  2. Go figure it out for herself with a primary data search

Thursday, April 6, 2017

Why Most True Research Findings Are Useless

In his provocative essay in PLOS Medicine over a decade ago, Ioannidis argued that most published research findings are false, owing to a variety of problems such as p-hacking, data dredging, fraud, selective publication, and researcher degrees of freedom, among others.  In my permutation of his essay, I will go a step further and suggest that even if we limit our scrutiny to tentatively true research findings (scientific truth being inherently tentative), most research findings are useless.

My choice of the word "useless" may seem provocative, and even untenable, but it is intended to have an exquisitely specific meaning:  I mean useless in the economic sense of "having zero or negligible net utility", in the tradition of Expected Utility Theory (EUT), for individual decision making.  This does not mean that true findings are useless for the incremental accrual of scientific knowledge and understanding.  True research findings may be very valuable from the perspective of scientific progress, but still useless for individual decision making, whether it is the individual trying to determine what to eat to promote a long, healthy life, or the physician trying to decide what to do for a patient in the ICU with delirium.  When evaluating a research finding that is thought to be true, and may at first blush seem important and useful, it is necessary to make a distinction between scientific utility and decisional utility.  Here I will argue that while many "true" research findings may have scientific utility, they have little decisional utility, and thus are "useless".

Friday, May 31, 2013

Over Easy? Trials of Prone Positioning in ARDS

Published May 20 in the NEJM to coincide with the ATS meeting is the (latest) Guerin et al study of Prone Positioning in ARDS.  The editorialist was impressed.  He thinks that we should start proning patients similar to those in the study.  Indeed, the study results are impressive:  a 16.8% absolute reduction in mortality between the study groups, with a corresponding P-value of less than 0.001.  But before we switch our tastes from sunny side up to over easy (or in some cases, over hard - referred to as the "turn of death" in ICU vernacular), we should consider some general principles as well as about a decade of other studies of prone positioning in ARDS.
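
It is worth translating that absolute difference into a number needed to treat.  A one-line back-of-the-envelope calculation (using only the 16.8% figure quoted above, not a re-analysis of the trial data):

    # Number needed to treat implied by the reported absolute risk reduction.
    arr = 0.168           # 16.8% absolute mortality reduction, as reported above
    nnt = 1 / arr         # patients proned to prevent one death
    print(round(nnt, 1))  # ~6 - an enormous effect by critical care standards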

First, a general principle:  regression to the mean.  Few, if any, therapies in critical care (or in medicine in general) confer a mortality benefit this large.  I refer the reader (again) to our study of delta inflation, which tabulated over 30 critical care trials in the top 5 medical journals over 10 years and showed that few critical care trials show mortality deltas (absolute mortality differences) greater than 10%.  Almost all those that do are later refuted.  Indeed, it was our conclusion that searching for deltas greater than or equal to 10% is akin to a fool's errand, so low is the probability of finding such a difference.  Jimmy T. Sylvester, my attending at JHH in late 2001, had already recognized this.  When the now infamous sentinel trial of intensive insulin therapy (IIT) was published, we discussed it at our ICU pre-rounds lecture and he said something like "Either these data are faked, or this is revolutionary."  We now know that there was no revolution (although many ICUs continue to practice as if there had been one).  He could have just as easily said that this is an anomaly that will regress to the mean, that there is inherent bias in this study, or that "trials stopped early for benefit...."

Monday, March 10, 2008

The CORTICUS Trial: Power, Priors, Effect Size, and Regression to the Mean

The long-awaited results of another trial in critical care were published in a recent issue of the NEJM: (http://content.nejm.org/cgi/content/abstract/358/2/111). Similar to the VASST trial, the CORTICUS trial was "negative", and low-dose hydrocortisone was not demonstrated to be of benefit in septic shock. However, unlike VASST, in this case the results conflict with an earlier trial (Annane et al, JAMA, 2002) that generated much fanfare and which, like the Van den Berghe trial of the Leuven Insulin Protocol, led to widespread [and premature?] adoption of a new therapy. The CORTICUS trial, like VASST, raises some interesting questions about the design and interpretation of trials in which short-term mortality is the primary endpoint.

Jean Louis Vincent presented data at this year's SCCM conference from which he estimated that only about 10% of trials in critical care are "positive" in the traditional sense. (I was not present, so this is basically hearsay to me - if anyone has a reference, please e-mail me or post it as a comment.) Nonetheless, this estimate rings true. Few are the trials that show a statistically significant benefit in the primary outcome; fewer still are the trials that confirm those results. This raises the question: are critical care trials chronically, consistently, and woefully underpowered? And if so, why? I will offer some speculative answers to these and other questions below.

The CORTICUS trial, like VASST, was powered to detect a 10% absolute reduction in mortality. Is this reasonable? At all? What is the precedent for a 10% ARR in mortality in a critical care trial? There are few, if any. No large, well-conducted trials in critical care that I am aware of have ever demonstrated (least of all consistently) a 10% or greater reduction in mortality from any therapy, at least not as a PRIMARY PROSPECTIVE OUTCOME. Low tidal volume ventilation? 9% ARR. Drotrecogin-alfa? 7% ARR in all-comers. I therefore argue that all trials powered to detect an ARR in mortality of greater than 7-9% are ridiculously optimistic, and that the trials that spring from this unfortunate optimism are woefully underpowered. It is no wonder that, as JLV purportedly demonstrated, so few trials in critical care are "positive". The prior probability is exceedingly low that ANY therapy will deliver a 10% mortality reduction. The designers of these trials are, by force of pragmatic constraints, rolling the proverbial trial dice and hoping for a lucky throw.
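
For readers who want to see the arithmetic behind "woefully underpowered", here is a rough sketch of the standard two-proportion sample size calculation (normal approximation); the 35% control-arm mortality is my own illustrative assumption, not a figure taken from CORTICUS or VASST.

    # Approximate per-arm sample size for comparing two proportions
    # (normal approximation, two-sided alpha = 0.05, power = 0.80).
    from math import sqrt, ceil
    from scipy.stats import norm

    def n_per_arm(p_control, arr, alpha=0.05, power=0.80):
        p1, p2 = p_control, p_control - arr
        p_bar = (p1 + p2) / 2
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                     + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
        return ceil(numerator / arr ** 2)

    # Assume (hypothetically) 35% control-arm mortality.
    print(n_per_arm(0.35, 0.10))  # ~330 patients per arm to detect a 10% ARR
    print(n_per_arm(0.35, 0.05))  # ~1380 per arm for a 5% ARR - roughly four times as many

A trial sized for a 10% ARR therefore has little chance of detecting the 5-7% effects that, historically, are the best we can hope for.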

Then there is the issue of regression to the mean. Suppose that the alternative hypothesis (Ha) is indeed correct in the generic sense that hydrocortisone does beneficially influence mortality in septic shock. Suppose further that we interpret Annane's 2002 data as consistent with Ha. In that study, a subgroup of patients (non-responders) demonstrated a 10% ARR in mortality. We should be excused for getting excited about this result, because, after all, we all want the best for our patients and eagerly await the next breakthrough, and the higher the ARR, the greater the clinical relevance, whatever the level of statistical significance. But shouldn't we regard that estimate with skepticism, since no therapy in critical care has ever shown such a large reduction in mortality as a primary outcome? Since no such result has ever been consistently repeated? Even if we believe in Ha, shouldn't we also believe that the 10% Annane estimate will regress to the mean on repeated trials?
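
One way to see why a 10% estimate from a single trial (or subgroup) should be expected to shrink is a simple simulation of the "winner's curse": when the true effect is modest, the trials that happen to cross the significance threshold overestimate it on average. The parameters below (a 3% true ARR, 40% control mortality, 150 patients per arm) are arbitrary illustrative choices, not estimates from the steroid literature.

    # Regression to the mean / winner's curse in trials of a modest true effect.
    # All parameters are invented for illustration.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    true_arr, p_control, n = 0.03, 0.40, 150   # hypothetical values

    observed, significant = [], []
    for _ in range(20000):
        deaths_c = rng.binomial(n, p_control)
        deaths_t = rng.binomial(n, p_control - true_arr)
        arr_hat = deaths_c / n - deaths_t / n
        p_pool = (deaths_c + deaths_t) / (2 * n)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
        z = arr_hat / se if se > 0 else 0.0
        observed.append(arr_hat)
        significant.append(z > norm.ppf(0.975))   # "positive" in the direction of benefit

    observed, significant = np.array(observed), np.array(significant)
    print(observed.mean())               # close to the true 3% ARR
    print(observed[significant].mean())  # far larger - the "winners" systematically overestimate the effect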

It may be true that therapies with robust data behind them become standard practice, equipoise dissipates, and the trials of the best therapies are not repeated - so they don't have a chance to be confirmed. But the knife cuts both ways - if you're repeating a trial, it stands to reason that the data in support of the therapy are not that robust, and you should become more circumspect in your estimates of effect size - taking prior probability and regression to the mean into account.

Perhaps we need to rethink how we're powering these trials. And funding agencies need to rethink the budgets they will allow for them. It makes little sense to spend so much time, money, and effort on underpowered trials, and to build the track record we have built, in which the majority of our trials are "failures" in the traditional sense and all include a sentence in the discussion section about how the current results should influence the design of subsequent trials. Wouldn't it make more sense to conduct one trial that is so robust that nobody would dare repeat it in the future? One that would provide a definitive answer to the question that is posed? Is there something to be learned from the long arc of the steroid pendulum that has been swinging with frustrating periodicity for many a decade now?

This is not to denigrate in any way the quality of the trials that I have referred to. The Canadian group in particular, as well as other groups (ARDSnet), are to be commended for producing work of the highest quality, which is of great value to patients, medicine, and science. But in keeping with the advancement of knowledge, I propose that we take home another message from these trials - we may be chronically underpowering them.