
Thursday, December 26, 2024

No, CXR for Pediatric Pneumonia Does NOT have a 98% Negative Predictive Value


I was reading the current issue of the NEJM today and got to the article called Chest Radiography for Presumed Pneumonia in Children - it caught my attention as a medical decision making article. It's of the NEJM genre that poses a clinical quandary and then asks two discussants each to defend a different management course. (Another memorable one was on whether to treat subsegmental PE.) A couple of things struck me about the discussants' remarks about CXRs for kids with possible pneumonia. The first discussant said that "a normal chest radiograph reliably rules out a diagnosis of pneumonia." That is certainly not true in adults, where the CXR has on the order of 50-70% sensitivity for opacities or pneumonia. So I wondered if kids are different from adults. The second discussant then remarked that CXR has a 98% negative predictive value for pneumonia in kids. This number really should get your attention: either the test is very, very sensitive and specific, or the prior probability in the test sample was very low, something commonly done to inflate the reported number. (Or, worse, the number is wrong.) I teach trainees to always ignore PPV and NPV in reports and seek out the sensitivity and specificity, as the latter cannot be fudged by selecting a high or low prevalence population.

It then struck me that this question of whether or not to get a CXR for PNA in kids is a classic problem in medical decision making that traces its origins to Ledley and Lusted (Science, 1959) and Pauker and Kassirer's Threshold Approach to Medical Decision Making. Surprisingly, neither discussant mentioned or referenced that perfectly applicable framework (but they did self-cite their own work). Here is the Threshold Approach applied to the decision to get a CT scan for PE (Klein, 2004), which is perfectly analogous to the pediatric CXR question. I was going to write a letter to the editor pointing out that 44 years ago the NEJM published a landmark article establishing a rational framework for analyzing just this kind of question, but I decided to dig deeper and take a look at this 2018 paper in Pediatrics that both discussants referenced as the source for the NPV of 98% statistic.

In order to calculate the 98% NPV, we need to look at the n=683 kids in the study and see which cells they fall into in a classic epidemiological 2x2 table. The article's Figure 2 is the easiest way to get those numbers:



(Note that they exclude n=42 kids who were treated with antibiotics for other conditions despite not being diagnosed with pneumonia; I'm honestly unsure what else to do with those kids, so like the authors, I exclude them in the 2x2 tables below.) Here is a refresher on a 2x2 contingency table:
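In code form, the standard definitions look like this. This is just a minimal Python sketch of the textbook formulas; the cell labels a through d are the usual true positives, false positives, false negatives, and true negatives.

```python
def two_by_two_metrics(a, b, c, d):
    """Standard test characteristics from a 2x2 table:
    a = test+/disease+ (TP), b = test+/disease- (FP),
    c = test-/disease+ (FN), d = test-/disease- (TN)."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    ppv = a / (a + b)
    npv = d / (c + d)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return dict(sens=sensitivity, spec=specificity, ppv=ppv, npv=npv,
                lr_pos=lr_pos, lr_neg=lr_neg)
```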


Here is the 2x2 table we can construct using the numbers from Figure 2 in the paper, before the follow-up of the 5 kids that were diagnosed with pneumonia 2 weeks later:



And here is the 2x2 table that accounts for the 5 kids that were initially called "no pneumonia" but were diagnosed with pneumonia within the next two weeks. Five from cell "d" (bottom right) must be moved to cell "c" (bottom left) because they were CXR-/PNA- kids that were moved into the CXR-/PNA+ column after the belated diagnosis:



The NPV has fallen trivially from 90% to 89%, but why are both so far away from the authors' claim of 98%? Because the authors conveniently ignored the 44 kids with an initially negative CXR who were nonetheless called PNA by the physicians in cell "c". They surely should be counted because, despite a negative CXR, they were still diagnosed with PNA, just 2 weeks earlier than the 5 that the authors concede were false negatives; there is no reason to make a distinction between these two groups of kids, as they are all clinically diagnosed pneumonia with a "falsely negative" CXR (cell "c").
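To make the arithmetic explicit, here is a minimal Python sketch. The total number of CXR-negative kids is not restated in the text above, so the 457 below is an illustrative figure chosen only to reproduce the quoted percentages; the 44 and 5 are the counts discussed above.

```python
neg_cxr_total = 457   # illustrative total of CXR-negative kids (assumed for illustration,
                      # chosen so the arithmetic reproduces the percentages quoted in the text)
clinical_pna = 44     # negative CXR but clinically diagnosed and treated as PNA
late_pna = 5          # negative CXR, diagnosed with PNA within the next two weeks

# Authors' approach: drop the 44 clinically diagnosed kids from the denominator
npv_authors = (neg_cxr_total - clinical_pna - late_pna) / (neg_cxr_total - clinical_pna)
# Counting the 44 as false negatives, before the 2-week follow-up
npv_initial = (neg_cxr_total - clinical_pna) / neg_cxr_total
# Counting both the 44 and the 5 late diagnoses as false negatives
npv_final = (neg_cxr_total - clinical_pna - late_pna) / neg_cxr_total

print(npv_authors, npv_initial, npv_final)  # ~0.99 (the "98%"), ~0.90, ~0.89
```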

It is peculiar - rather, astonishing - that the NPV in this study, still being touted and referenced as a pivot for decision making, was miscalculated despite peer review. And while you may be tempted to say that 89% is pretty close to 98%, you would be making a mistake. Using the final sensitivity and specificity from this 2x2 table, we can calculate LR+ and LR- for CXR as a test for PNA: they are 10.8 and 0.26. We can also see from this table that the rate (some may say "prevalence") of PNA in this sample is 32%. What is the posterior probability of PNA based on the "correct" numbers if the pre-test probability (or the rate or prevalence of pneumonia) is 65% instead of 32%? The calculator in the Status Iatrogenicus sidebar can be used to easily calculate it: the NPV in that case is 68%, and of course 1-NPV (the output of the calculator, chosen to emphasize the residual probability of disease in the presence of a negative test) is 32%. Pneumonia in that circumstance is still far above the treatment threshold. By that I mean, if my child had a 32% probability of pneumonia, I would want them treated. (Because antibiotics are pretty benign, bro; resistance happens at the poultry farm.)
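The sidebar calculator simply applies the odds form of Bayes' rule. Here is a minimal Python sketch using the LR- of 0.26 quoted above; the function is my own illustration, not the calculator's code.

```python
def posterior_prob(pretest_prob, likelihood_ratio):
    """Odds form of Bayes' rule: posterior odds = prior odds x likelihood ratio."""
    prior_odds = pretest_prob / (1 - pretest_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

lr_neg = 0.26  # LR- for CXR derived from the corrected 2x2 table
print(posterior_prob(0.32, lr_neg))  # ~0.11, i.e., NPV ~ 89% at the study's own 32% prevalence
print(posterior_prob(0.65, lr_neg))  # ~0.33, i.e., NPV ~ 67-68% at a 65% pretest probability
```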

There are more fundamental problems. Like child abuse studies, there is a circular logic here: the kid has pneumonia because the doc says he has pneumonia, but the doc knows the CXR shows "pneumonia"; the diagnosis of PNA then leads to the CXR finding being classified as a true positive. How many of the pneumonia diagnoses were true/false positives/negatives? We can't know, because we have no gold standard for pneumonia, just as we have no gold standard for child abuse - we are guessing which cells the numbers go in. This points to another violation of basic Bayesian assumptions: the reference standard for the disease must be established independently of the test being evaluated. Here there is very clearly dependence, because the docs are making the pneumonia determination in part on the basis of the CXR (incorporation bias). The study design is fundamentally flawed, and so are all conclusions that ramify from it.

I'm always a little surprised when I go digging into the studies that people bandy about as "evidence" for this and that, as I frequently find that they're either misunderstood, misrepresented, or just plain wrong. I can readily imagine a pediatrician (resident or attending) telling me with high confidence that the CXR can "rule out" pneumonia in my kid, because her attendings told her that on the basis of the 2018 Lipsett study, and yet none of them ever looked any deeper into the actual study to find its obvious mistakes and shortcomings.

As they say, "Trust, but verify." Or perhaps more apropos here: "Extraordinary claims require extraordinary evidence." An NPV of 98% (for CXR!) is an extraordinary claim indeed. The evidence for it, however, is not extraordinary. As a trusted mentor once told me, "Scott, don't believe everything you read."

ETA: You can get a 98% NPV using the sensitivity and specificity from the Lipsett data (despite the erroneous assumptions that inhere in them) by using a prevalence of pneumonia of just 7%. To wit: if you want to get to a posterior probability of PNA of 2% (corresponding to the reported 98% NPV in the Lipsett study), you need to start with a population in which only 7 of 100 kids have pneumonia, and you need to do a CXR on all of them; the negative films take the residual probability in the test-negative kids down to about 2%. One hundred CXRs later, you have gone from 7 cases per 100 kids to roughly 2 undetected cases among the negatives. Is it worth it to do 100 CXRs to avoid 5 courses of antibiotics? We could make a formal Threshold analysis to answer this question, but apparently that was not the point of the "Clinical Decisions" section of this week's NEJM; rather, it was to highlight reference 1, which turns out to have conclusions based on a miscalculation.
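Working backwards with the same odds form of Bayes' rule makes the 7% figure explicit. A minimal Python sketch, again using the LR- of 0.26 from the corrected table:

```python
def pretest_needed(target_posttest, lr_neg):
    """Invert the odds form of Bayes' rule: what pretest probability (prevalence)
    yields this post-test probability after a negative test?"""
    post_odds = target_posttest / (1 - target_posttest)
    prior_odds = post_odds / lr_neg
    return prior_odds / (1 + prior_odds)

print(pretest_needed(0.02, 0.26))  # ~0.07: a 98% NPV requires a prevalence of only about 7%
```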

Thursday, April 25, 2019

The EOLIA ECMO Bayesian Reanalysis in JAMA

A Hantavirus patient on ECMO, circa 2000
Spoiler alert:  I'm a Bayesian decision maker (although maybe not a Bayesian trialist) and I "believe" in ECMO as documented here.

My letter to the editor of JAMA was published today (and yeah, I know, I write too many letters, but hey, I read a lot and regular peer review often doesn't cut it), and even when you come at them like a spider monkey, the authors of the original article still get the last word (and they deserve it - they have done far more work than the post-publication peer review hecklers with their quibbles and their niggling letters).

But to set a few things straight, I will need some more words to elucidate some points about the study's interpretation.  The authors' response to my letter has five points.
  1. I (not they) committed confirmation bias, because I postulated harm from ECMO.  First, I do not have a personal prior for harm from ECMO; I actually think it is probably beneficial in properly selected patients, as is well documented in the blog post from 2011 describing my history of experience with it in hantavirus, as well as in a book chapter I wrote in Cardiopulmonary Bypass Principles and Practice circa 2006.  There is irony here - I "believe in" ECMO, I just don't think their Bayesian reanalysis supports my (or anybody's) beliefs in a rational way!  The point is that it was a post hoc, unregistered Bayesian analysis after a pre-registered frequentist study which was "negative" (for all that's worth and not worth), and the authors clearly believe in the efficacy of ECMO, as do I.  In finding shortcomings in their analysis, I seek to disconfirm or at least challenge not only their beliefs but my own.  And I think that if the EOLIA trial had been positive, we would not be publishing Bayesian reanalyses showing how the frequentist trial may be a type I error.  We know from long experience that if EOLIA had been "positive", success would have been declared for ECMO as it has been with prone positioning for ARDS.  (I prone patients too.)  The trend is to confirm rather than to disconfirm, but good science relies more on the latter.
  2. That a RR of 1.0 for ECMO is a "strongly skeptical" prior.  It may seem strong from a true believer standpoint, but not from a true nonbeliever standpoint.  Those are the true skeptics (I know some, but I'll not mention names - I'm not one of them) who think that ECMO is really harmful on the net, like intensive insulin therapy (IIT) probably is.  Regardless of all the preceding trials, if you ask the NICE-SUGAR investigators, they are likely to maintain that IIT is harmful.  Importantly, the authors skirt the issue of the emphasis they place on the only longstanding and widely regarded as positive ARDS trial (of low tidal volume).  There are three decades of trials in ARDS patients, scores of them, enrolling tens of thousands of patients, that show no effect of the various therapies.  Why would we give primacy to the one trial which was positive, and equate ECMO to low tidal volume?  Why not equate it to high PEEP, or corticosteroids for ARDS?  A truly skeptical prior would have been centered on an aggregate point estimate and associated distribution of 30 years of all trials in ARDS of all therapies (the vast majority of them "negative").  The sheer magnitude of their numbers would narrow the width of the prior distribution with RR centered on 1.0 (the "severely skeptical" one), and it would pull the posterior more towards zero benefit, a null result.  Indeed, such a narrow prior distribution may have shown that low tidal volume is an outlier and likely to be a false positive (I won't go any farther down that perilous path).  The point is, even if you think a RR of 1.0 is severely skeptical, the width of the distribution counts for a lot too, and the uninitiated are likely to miss that important point (see the sketch after this list).
  3. Priors are not used to "boost" the effect of ECMO.  (My original letter called it a Bayesian boost, borrowing from Mayo, but the adjective was edited out.) Maybe not always, but that was the effect in this case, and the respondents did not cite any examples of a positive frequentist result that was reanalyzed with Bayesian methods to "dampen" the observed effect.  It seems to only go one way, and that's why I alluded to confirmation bias.  The "data-driven priors" they published were tilted towards a positive result, as described above.
  4. Evidence and beliefs.  But as Russell said, "The degree to which beliefs are based on evidence is very much less than believers suppose."  I support Russell's quip with the aforementioned.
  5. Judgment is subjective, etc.  I would welcome a poll, in the spirit of crowdsourcing, as we did here, to better understand what the community thinks about ECMO (my guess is that it's split rather evenly, with a trend, perhaps strong, toward the efficacy of ECMO).  The authors' analysis is laudable, but it is not based on information not already available to the crowd; rather, it transforms that information in ways that may not be transparent to the crowd, and it may magnify it in a biased fashion if people unfamiliar with Bayesian methods do not scrutinize the chosen prior distributions.
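Here is a minimal sketch of the point about prior width raised in item 2, using a conjugate normal approximation on the log relative-risk scale. The numbers are purely hypothetical illustrations; they are not the EOLIA estimates and not the priors used in the JAMA reanalysis.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate normal update (precision weighting) on the log relative-risk scale."""
    w_prior, w_data = 1 / prior_sd**2, 1 / data_sd**2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_sd = (w_prior + w_data) ** -0.5
    return post_mean, post_sd

# Hypothetical trial result pointing toward benefit: RR 0.80 with SE 0.15 on the log scale
data_mean, data_sd = math.log(0.80), 0.15

for label, prior_sd in [("weakly skeptical (wide) prior", 0.40),
                        ("many pooled null trials (narrow) prior", 0.10)]:
    post_mean, _ = normal_posterior(0.0, prior_sd, data_mean, data_sd)  # both centered on RR = 1.0
    print(f"{label}: posterior RR ~ {math.exp(post_mean):.2f}")
# The narrow prior pulls the posterior RR much closer to 1.0 (~0.93 vs ~0.82 here):
# the width of the prior distribution matters, not just where it is centered.
```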

Wednesday, December 23, 2015

Narrated and Abridged: There is (No) Evidence for That: Epistemic Problems in Critical Care Medicine

Below is the narrated video of my powerpoint presentation on Epistemic Problems in Critical Care Medicine, which provides a framework for understanding why we have both false positives and false negatives in clinical trials in critical care medicine and why we should be circumspect about our "evidence base" and our "knowledge".  This is not trivial stuff, and it is worth the 35 minutes required to watch the narration of the slideshow.  It is a provocative presentation that gives compelling reasons to challenge our "evidence base" in critical care and medicine in general, in ways that are not widely recognized but perhaps should be.  It also offers several suggestions about assumptions that need to be challenged and revised to make our models of reality more reliable.  Please contact me if you would like me to give an iteration of this presentation at your institution.