Showing posts with label prior probability. Show all posts

Thursday, February 4, 2016

Diamox Results in Urine: General and Specific Lessons from the DIABOLO Acetazolamide Trial

The trial of acetazolamide to reduce duration of mechanical ventilation in COPD patients was published in JAMA this week.  I will use this trial to discuss some general principles about RCTs and make some comments specific to this trial.

My arguable but strong prior belief, before I even read the trial, is that Diamox (acetazolamide) is ineffectual in acute and chronic respiratory failure, or that it is harmful.  Its use is predicated on a "normalization fallacy" which leads practitioners to attempt to achieve euboxia (normal numbers).  In chronic respiratory acidosis, the kidneys conserve bicarbonate to maintain normal pH.  There was a patient we saw at OSU in about 2008 who had severe COPD with a PaCO2 in the 70s and chronic renal failure with a bicarbonate under 20.  A well-intentioned but misguided resident checked an ABG and the patient's pH was on the order of 7.1.  We (the pulmonary service) were called to evaluate the patient for MICU transfer and intubation, and when we arrived we found him sitting at the bedside comfortably eating breakfast.  So it would appear that even if the kidneys can't conserve enough bicarbonate to maintain normal pH, patients can get along with acidosis; but obviously evolution has created systems to maintain normal pH.  Why anyone would think that interfering with this highly conserved system would increase minute ventilation and help wean a COPD patient is beyond the reach of my imagination.  It just makes no sense.

This brings us to a major problem with a sizable proportion of RCTs that I read:  the background/introduction provides woefully insufficient justification for the hypothesis that the RCT seeks to test.  In the background of this paper, we are sent to references 4-14.  Here is a summary of each:

4.)  A review of metabolic alkalosis in a general population of critically ill patients
5.)  An RCT of acetazolamide for weaning COPD patients showing that it doesn't work
6.)  Incidence of alkalosis in hospitalized patients in 1980
7.)  A 1983 translational study to delineate the effect of acetazolamide on acid base parameters in 10 patients
8.)  A 1982 study of hemodynamic parameters after acetazolamide administration in 12 patients
9.)  A study of metabolic and acid base parameters in 14 patients with cystic fibrosis 
10.) A retrospective epidemiological descriptive study of serum bicarbonate in a large cohort of critically ill patients
11.)  A study of acetazolamide in anesthetized cats
12-14.)  Commentary and pharmacodynamic studies of acetazolamide by the authors of the current study

Saturday, October 11, 2014

Enrolling Bad Patients After Good: Sunk Cost Bias and the Meta-Analytic Futility Stopping Rule

Four (relatively) large critical care randomized controlled trials were published early online in the NEJM in the last week.  I was excited to blog on them, but then I realized that all four are old news, so there's nothing to blog about.  But alas, the fact that there is no news is the news.

In the last week, we "learned" that more transfusion is not helpful in septic shock, that EGDT (the ARISE trial) is not beneficial in sepsis, that simvastatin (HARP-2 trial) is not beneficial in ARDS, and that parenteral administration of nutrition is not superior to enteral administration in critical illness.  Any of that sound familiar?

I read the first two articles, then discovered the last two and said to myself "I'm not reading these."  At first I felt bad about this decision, but then I realized it was a rational one.  Here's why.

Sunday, April 6, 2014

Underperforming the Market: Why Researchers are Worse than Professional Stock Pickers and A Way Out

I was reading in the NYT yesterday a story about Warren Buffett and how the Oracle of Omaha has trailed the S&P 500 for four of the last five years.  It was based on an analysis done by a statistician who runs a blog called Statistical Ideas, which has a post on p-values that links to a Nature article from a couple of months back describing how we can be misled by P-values.  And all of this got me thinking.

We have a dual problem in medical research:  a.)  conceiving alternative hypotheses that cannot be confirmed in large trials free of bias;  and b.)  failing to replicate the findings of positive trials.  What are the reasons for this?

Wednesday, October 24, 2012

A Centrum a Day Keeps the Cancer at Bay?


Alerted as usual by the lay press to the provocative results of a non-provocative study, I read with interest the article in the October 17th JAMA by Gaziano and colleagues: Multivitamins in the Prevention of Cancer in Men. From the lay press descriptions (see: NYT summary and a less sanguine NYT article published a few days later), I knew only that it was a positive (statistically significant) study, that the reduction in cancer observed was 8%, that a multivitamin (Centrum Silver) was used, and that the study population included 14,000 male physicians.

Needless to say, in spite of a dormant hope that something so simple could prevent cancer, I was skeptical. Despite decades, perhaps eons of enthusiasm for the use of vitamins, minerals, and herbal remedies, there is, to my knowledge (please, dear reader, direct me to the data if this is an omission) no credible evidence of a durable health benefit from taking such supplements in the absence of deficiency. But supplements have a lure that can beguile even the geniuses among us (see: Linus Pauling). So before I read the abstract and methods to check for the level of statistical significance, the primary endpoint, the number of endpoints, and sources of bias, I asked myself: "What is the probability that taking a simple commercially available multivitamin can prevent cancer?" and "What kind of P-value or level of statistical significance would I require to believe the result?" Indeed, if you have not yet seen the study, you can ask yourself those same questions now.

Thursday, May 24, 2012

Fever, external cooling, biological precedent, and the epistemology of medical evidence

It is a rare occasion when one article allows me to review so many aspects of the epistemology of medical evidence, but alas Schortgen et al afforded me that opportunity in the May 15th issue of AJRCCM.

The issues raised by this article are so numerous that I shall make subsections for each one. The authors of this RCT sought to determine the effect of external cooling of febrile septic patients on vasopressor requirements and mortality. Their conclusion was that "fever control using external cooling was safe and decreased vasopressor requirements and early mortality in septic shock." Let's explore the article and the issues it raises and see if this conclusion seems justified and how this study fits into current ICU practice.

PRIOR PROBABILITY, BIOLOGICAL PLAUSIBILITY, and BIOLOGICAL PRECEDENTS

These are related but distinct issues that are best considered both before a study is planned, and before its report is read. A clinical trial is in essence a diagnostic test of a hypothesis, and like a diagnostic test, its influence on what we already know depends not only on the characteristics of the test (sensitivity and specificity in a diagnostic test; alpha and power in the case of a clinical trial) but also on the strength of our prior beliefs. To quote Sagan [again], "extraordinary claims require extraordinary evidence." I like analogies of extremes: no trial result is sufficient to convince the skeptical observer that orange juice reduces mortality in sepsis by 30%; and no evidence, however cogently presented, is sufficient to convince him that the sun will not rise tomorrow. So when we read the title of this or any other study, we should pause to ask: What is my prior belief that external cooling will reduce mortality in septic shock? That it will reduce vasopressor requirements?
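The diagnostic-test analogy above can be made concrete with Bayes' theorem: treat alpha as the trial's "false positive rate" and power as its "sensitivity." A minimal sketch, with hypothetical prior probabilities chosen for illustration:

```python
# Treating an RCT as a diagnostic test of its hypothesis: with
# significance level alpha and power, Bayes' theorem gives the
# probability the hypothesis is true given a "positive" (significant)
# trial. The priors below are illustrative, not from any real trial.

def posterior_given_significant(prior, alpha=0.05, power=0.8):
    """P(hypothesis true | p < alpha), via Bayes' theorem."""
    true_pos = power * prior          # significant AND hypothesis true
    false_pos = alpha * (1 - prior)   # significant AND hypothesis false
    return true_pos / (true_pos + false_pos)

# A plausible therapy vs. an extraordinary claim ("orange juice
# reduces mortality in sepsis by 30%"):
for prior in (0.5, 0.1, 0.01):
    post = posterior_given_significant(prior)
    print(f"prior {prior:4.0%} -> posterior {post:.0%}")
```

The point of the extremes in the paragraph above falls out of the arithmetic: at a 1% prior, even a statistically significant trial leaves the hypothesis more likely false than true.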

Saturday, September 5, 2009

Troponin I, Troponin T, Troponin is the Woe of Me

As a critical care physician, I have not infrequently been called to the emergency department to admit a patient on the basis of "abnormal laboratory tests" with no synthesis, no assimilation of the various results into any semblance of a unifying diagnosis. It is bad enough that patients' chests are no longer auscultated, respiratory rates and patterns not noted, neck veins not examined, etc. It is worse that the portable chest film (often incorrectly interpreted), the arterial blood gas (also often incorrectly interpreted), and the BNP level have supplanted any sort of logical and systematic approach to diagnosing a patient's problem. If we are going to replace physical examination with BNPs and d-dimers, we should at least insist that practitioners have one iota of familiarity with Bayes' Theorem and pre-test probabilities and the proper interpretation of test results.

Thus I raised at least one brow slightly on August 27th when the NEJM reported two studies of highly sensitive troponin assays for the "early diagnosis of myocardial infarction" (wasn't troponin sensitive enough already? see: http://content.nejm.org/cgi/content/abstract/361/9/858 and http://content.nejm.org/cgi/content/abstract/361/9/868 ). Without commenting on the studies' methodological quality specifically, I will emphasize some pitfalls and caveats related to the adoption of this "advance" in clinical practice, especially that outside of the setting of an appropriately aged person with risk factors who presents to an acute care setting with SYMPTOMS SUGGESTIVE OF MYOCARDIAL INFARCTION.

In such a patient, say a 59 year old male with hypertension, diabetes and a family history of coronary artery disease, who presents to the ED with chest pain, we (and our cardiology colleagues) are justified in having a high degree of confidence in the results of this test based on these and a decade or more of other data. But I suspect that only the MINORITY of cardiac troponin tests at my institution are ordered for that kind of indication. Rather, it is used as a screening test for just about any patient presenting to the ED who is ill enough to warrant admission. And that's where the problem has its roots. Our confidence in the diagnostic accuracy of this test in the APPROPRIATE SETTING (read appropriate clinical pre-test probability) should not extend to other scenarios, but all too often it does, and it makes a real conundrum when it is positive in those other scenarios. Here's why.

Suppose that we have a pregnancy test that is evaluated in women who have had a sexual encounter and who have missed two menstrual periods, and it is found to be 99.9% sensitive and 99.9% specific. (I will bracket for now the possibility that you could have a 100% sensitive and/or specific test.) Now suppose that you administer this test to 10,000 MEN. Does a positive test mean that a man is pregnant? Heavens No! He probably has testicular cancer or some other malady. This somewhat silly example is actually quite useful to reinforce the principle that no matter how good a test is, if it is not used appropriately or in the appropriate scenario, the results are likely to be misleading. Likewise, consider this test's use in a woman who has not missed a menstrual cycle - does a negative test mean that she is not pregnant? Perhaps not, since the sensitivity was determined in a population that had missed 2 cycles. If a woman were obviously 24 weeks pregnant and the test was negative, what would we think? It is important to bear in mind that these tests are NOT direct tests for the conditions we seek to diagnose, but are tests of ASSOCIATED biological phenomena, and inasmuch as our understanding of those phenomena is limited or there is variation in them, the tests are liable to be fallible. A negative test in a woman with a fetus in utero may mean that the sample was mishandled, that the testing reagents were expired, that there is an interfering antibody, etc. Tests are not perfect, and indeed are highly prone to be misleading if not used in the appropriate clinical scenario.
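The pregnancy-test thought experiment can be run as arithmetic. A sketch using the stated 99.9% sensitivity and specificity, with an illustrative near-zero prevalence standing in for the 10,000 men:

```python
# Positive predictive value of the hypothetical pregnancy test:
# even at 99.9% sensitivity and specificity, the predictive value of a
# positive result collapses as pre-test probability approaches zero.

def positive_predictive_value(prevalence, sens=0.999, spec=0.999):
    """P(condition present | test positive), via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# High pre-test probability (missed two periods) vs. a population where
# pregnancy is essentially impossible (prevalence here is illustrative):
print(positive_predictive_value(0.5))     # nearly 1.0
print(positive_predictive_value(0.0001))  # under 0.1: most positives are false
```

The same collapse is what makes a "mildly positive troponin" in an unselected ICU population so treacherous: the test hasn't changed, but the pre-test probability has.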

And thus we return to cardiac troponins. In the patients I'm called to admit to the ICU who have sepsis, PE, COPD, pneumonia, respiratory failure, renal failure, or metabolic acidosis, a mildly positive troponin, which is a COMMON occurrence, is almost ALWAYS an epiphenomenon of critical illness rather than an acute myocardial infarction. Moreover, the pursuit of diagnosis via cardiac catheterization or empiric treatment with antiplatelet agents and anticoagulants is almost always a therapeutic misadventure in these patients, who are at much greater risk of bleeding and renal failure from these interventions, which are expected to have a much reduced positive utility for them. More often than not, I would just rather not know the results of a troponin test outside the setting of isolated acute chest pain. Other practitioners should be acutely aware of the patient populations in which these tests are performed, and of the significant limitations of using these highly sensitive tests in other clinical scenarios.

Monday, March 10, 2008

The CORTICUS Trial: Power, Priors, Effect Size, and Regression to the Mean

The long-awaited results of another trial in critical care were published in a recent NEJM: (http://content.nejm.org/cgi/content/abstract/358/2/111). Similar to the VASST trial, the CORTICUS trial was "negative" and low dose hydrocortisone was not demonstrated to be of benefit in septic shock. However, unlike VASST, in this case the results are in conflict with an earlier trial (Annane et al, JAMA, 2002) that generated much fanfare and which, like the Van den Berghe trial of the Leuven Insulin Protocol, led to widespread [and premature?] adoption of a new therapy. The CORTICUS trial, like VASST, raises some interesting questions about the design and interpretation of trials in which short-term mortality is the primary endpoint.

Jean-Louis Vincent presented data at this year's SCCM conference with which he estimated that only about 10% of trials in critical care are "positive" in the traditional sense. (I was not present, so this is basically hearsay to me - if anyone has a reference, please e-mail me or post it as a comment.) Nonetheless, this estimate rings true. Few are the trials that show a statistically significant benefit in the primary outcome, and fewer still are the trials that confirm the results of those trials. This begs the question: are critical care trials chronically, consistently, and woefully underpowered? And if so, why? I will offer some speculative answers to these and other questions below.

The CORTICUS trial, like VASST, was powered to detect a 10% absolute reduction in mortality. Is this reasonable? At all? What is the precedent for a 10% ARR in mortality in a critical care trial? There are few, if any. No large, well-conducted trials in critical care that I am aware of have ever demonstrated (least of all consistently) a 10% or greater reduction in mortality of any therapy, at least not as a PRIMARY PROSPECTIVE OUTCOME. Low tidal volume ventilation? 9% ARR. Drotrecogin-alfa? 7% ARR in all-comers. So I therefore argue that all trials powered to detect an ARR in mortality of greater than 7-9% are ridiculously optimistic, and that the trials that spring from this unfortunate optimism are woefully underpowered. It is no wonder that, as JLV purportedly demonstrated, so few trials in critical care are "positive". The prior probability is exceedingly low that ANY therapy will deliver a 10% mortality reduction. The designers of these trials are, by force of pragmatic constraints, rolling the proverbial trial dice and hoping for a lucky throw.
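The cost of optimism about effect size can be made explicit with a standard two-proportion sample size calculation. A back-of-the-envelope sketch (normal approximation, two-sided alpha of 0.05, 80% power, and an illustrative 40% baseline mortality, which is in the range of the septic shock trials discussed):

```python
# Per-arm sample size to detect a given absolute risk reduction (ARR)
# in mortality, using the standard normal-approximation formula for
# comparing two proportions. Baseline mortality of 40% is illustrative.
from math import ceil, sqrt

def n_per_arm(p_control, arr, z_alpha=1.96, z_beta=0.84):
    """Patients per arm for two-sided alpha=0.05, power=0.80."""
    p_treat = p_control - arr
    p_bar = (p_control + p_treat) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p_control * (1 - p_control)
                           + p_treat * (1 - p_treat))) ** 2
    return ceil(num / arr ** 2)

print(n_per_arm(0.40, 0.10))  # a few hundred per arm for a 10% ARR
print(n_per_arm(0.40, 0.05))  # roughly 4x as many for a more realistic 5% ARR
```

Halving the hypothesized effect roughly quadruples the required enrollment, which is exactly why a trial powered for a 10% ARR is badly underpowered if the true effect is 5%.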

Then there is the issue of regression to the mean. Suppose that the alternative hypothesis (Ha) is indeed correct in the generic sense that hydrocortisone does beneficially influence mortality in septic shock. Suppose further that we interpret Annane's 2002 data as consistent with Ha. In that study, a subgroup of patients (non-responders) demonstrated a 10% ARR in mortality. We should be excused for getting excited about this result, because after all, we all want the best for our patients and eagerly await the next breakthrough, and the higher the ARR, the greater the clinical relevance, whatever the level of statistical significance. But shouldn't we regard that estimate with skepticism since no therapy in critical care has ever shown such a large reduction in mortality as a primary outcome? Since no such result has ever been consistently repeated? Even if we believe in Ha, shouldn't we also believe that the 10% Annane estimate will regress to the mean on repeated trials?
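This regression-to-the-mean worry can be demonstrated with a toy simulation (all numbers hypothetical): suppose the true ARR is 5%, but only trials whose observed ARR happens to reach 10% generate fanfare; their replications then drift back toward the true effect.

```python
# Toy "winner's curse" simulation: trials with a true ARR of 5% that
# happen to observe >= 10% get celebrated; replications of those same
# celebrated trials regress toward the true 5%. All numbers are
# illustrative, not drawn from any actual trial.
import random

random.seed(0)
N = 300          # patients per arm
P_CTRL = 0.40    # control-arm mortality
TRUE_ARR = 0.05  # true absolute risk reduction

def observed_arr():
    """One simulated trial's observed ARR (control minus treatment)."""
    deaths_ctrl = sum(random.random() < P_CTRL for _ in range(N))
    deaths_trt = sum(random.random() < P_CTRL - TRUE_ARR for _ in range(N))
    return (deaths_ctrl - deaths_trt) / N

celebrated = [arr for arr in (observed_arr() for _ in range(2000)) if arr >= 0.10]
replications = [observed_arr() for _ in celebrated]
print(f"mean ARR in 'positive' trials: {sum(celebrated)/len(celebrated):.3f}")
print(f"mean ARR on replication:       {sum(replications)/len(replications):.3f}")
```

The celebrated trials average well above 10% by construction; their replications cluster near 5%, which is the CORTICUS-after-Annane pattern in miniature.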

It may be true that therapies with robust data behind them become standard practice, equipoise dissipates, and the trials of the best therapies are not repeated - so they don't have a chance to be confirmed. But the knife cuts both ways - if you're repeating a trial, it stands to reason that the data in support of the therapy are not that robust, and you should become more circumspect in your estimates of effect size - taking prior probability and regression to the mean into account.

Perhaps we need to rethink how we're powering these trials. And funding agencies need to rethink the budgets they will allow for them. It makes little sense to spend so much time, money, and effort on underpowered trials, and to establish the track record that we have established, where the majority of our trials are "failures" in the traditional sense and all include a sentence in the discussion section about how the current results should influence the design of subsequent trials. Wouldn't it make more sense to conduct one trial that is so robust that nobody would dare repeat it in the future? One that would provide a definitive answer to the question that is posed? Is there something to be learned from the long arc of the steroid pendulum that has been swinging with frustrating periodicity for many a decade now?

This is not to denigrate in any way the quality of the trials that I have referred to. The Canadian group in particular as well as other groups (ARDSnet) are to be commended for producing work of the highest quality which is of great value to patients, medicine, and science. But in keeping with the advancement of knowledge, I propose that we take home another message from these trials - we may be chronically underpowering them.