
Thursday, December 26, 2024

No, CXR for Pediatric Pneumonia Does NOT have a 98% Negative Predictive Value


I was reading the current issue of the NEJM today and got to the article called Chest Radiography for Presumed Pneumonia in Children - it caught my attention as a medical decision making article. It's of the NEJM genre that poses a clinical quandary and then asks two discussants each to defend a different management course. (Another memorable one was on whether to treat subsegmental PE.) A couple of things struck me about the discussants' remarks about CXRs for kids with possible pneumonia. The first discussant says that "a normal chest radiograph reliably rules out a diagnosis of pneumonia." That is certainly not true in adults, where the CXR has on the order of 50-70% sensitivity for opacities or pneumonia. So I wondered if kids are different from adults. The second discussant then remarked that CXR has a 98% negative predictive value for pneumonia in kids. This number really should get your attention: either the test is very, very sensitive and specific, or the prior probability in the test sample was very low, a selection commonly made to inflate the reported number. (Or, worse, the number is wrong.) I teach trainees to always ignore PPV and NPV in reports and seek out the sensitivity and specificity, as the latter cannot be fudged by selecting a high- or low-prevalence population. It then struck me that this question of whether or not to get a CXR for PNA in kids is a classic problem in medical decision making that traces its origins to Ledley and Lusted (Science, 1959) and Pauker and Kassirer's Threshold Approach to Medical Decision Making. Surprisingly, neither discussant made mention of or reference to that perfectly applicable framework (but they did self-cite their own work). Here is the Threshold Approach applied to the decision to get a CT scan for PE (Klein, 2004) that is perfectly analogous to the pediatric CXR question. I was going to write a letter to the editor pointing out that 44 years ago the NEJM published a landmark article establishing a rational framework for analyzing just this kind of question, but I decided to dig deeper and take a look at this 2018 paper in Pediatrics that both discussants referenced as the source for the NPV of 98% statistic.

In order to calculate the 98% NPV, we need to look at the n=683 kids in the study and see which cells they fall into in a classic epidemiological 2x2 table. The article's Figure 2 is the easiest way to get those numbers:



(Note that they exclude n=42 kids who were treated with antibiotics for other conditions despite not being diagnosed with pneumonia; I'm honestly unsure what else to do with those kids, so like the authors, I exclude them in the 2x2 tables below.) Here is a refresher on a 2x2 contingency table:
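For those who prefer code to tables, here is the same refresher in a few lines of Python (a minimal sketch using the conventional cell labels: a = test+/disease+, b = test+/disease-, c = test-/disease+, d = test-/disease-):

```python
# Standard 2x2 metrics. Note that sensitivity and specificity are
# properties of the test, while PPV and NPV also depend on prevalence.
def two_by_two(a, b, c, d):
    return {
        "sensitivity": a / (a + c),   # P(test+ | disease+)
        "specificity": d / (b + d),   # P(test- | disease-)
        "PPV": a / (a + b),           # P(disease+ | test+)
        "NPV": d / (c + d),           # P(disease- | test-)
        "prevalence": (a + c) / (a + b + c + d),
    }
```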


Here is the 2x2 table we can construct using the numbers from Figure 2 in the paper, before the follow-up of the 5 kids that were diagnosed with pneumonia 2 weeks later:



And here is the 2x2 table that accounts for the 5 kids that were initially called "no pneumonia" but were diagnosed with pneumonia within the next two weeks. Five from cell "d" (bottom right) must be moved to cell "c" (bottom left) because they were CXR-/PNA- kids that were moved into the CXR-/PNA+ column after the belated diagnosis:



The NPV has fallen trivially from 90% to 89%, but why are both so far away from the authors' claim of 98%? Because the authors conveniently ignored the 44 kids with an initially negative CXR who were nonetheless diagnosed with PNA by the physicians - cell "c". They surely should be counted: despite a negative CXR, they were still diagnosed with PNA, just 2 weeks earlier than the 5 that the authors concede were false negatives. There is no reason to make a distinction between these two groups of kids, as they are all clinically diagnosed pneumonia with a "falsely negative" CXR (cell "c").
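To make the arithmetic explicit, here is the calculation in code. The cell counts are my back-calculation from the percentages quoted above (the exact counts are in the paper's Figure 2, so treat these as approximate); the point is the denominators:

```python
# Approximate counts, back-calculated from the percentages above.
# Only the CXR-negative row matters for NPV: c = CXR-/PNA+, d = CXR-/PNA-.
c, d = 44, 410                 # before the 5 belated diagnoses
print(d / (c + d))             # NPV ~ 0.90

c, d = c + 5, d - 5            # move the 5 late diagnoses from d to c
print(d / (c + d))             # NPV ~ 0.89

# The authors' version: count ONLY the 5 late diagnoses as false negatives,
# ignoring the 44 CXR-negative kids diagnosed with PNA at the outset.
print(d / (5 + d))             # ~ 0.98-0.99 with these approximate counts
```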

It is peculiar - rather, astonishing - that the NPV in this study, still being touted and referenced as a pivot for decision making, was miscalculated despite peer review. And while you may be tempted to say that 89% is pretty close to 98%, you would be making a mistake. Using the final sensitivity and specificity from this 2x2 table, we can calculate LR+ and LR- for CXR as a test for PNA: they are 10.8 and 0.26. We can also see from this table that the rate (some may say "prevalence") of PNA in this sample is 32%. What is the posterior probability of PNA based on the "correct" numbers if the pre-test probability (or the rate or prevalence of pneumonia) is 65% instead of 32%? The calculator in the Status Iatrogenicus sidebar makes this easy: the NPV in that case is 68%, and of course 1-NPV (the output of the calculator, chosen to emphasize the residual probability of disease in the presence of a negative test) is 32%. Pneumonia in that circumstance is still far above the treatment threshold. By that I mean that if my child had a probability of pneumonia of 32%, I would want them treated. (Because antibiotics are pretty benign, bro; resistance happens at the poultry farm.)
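If you want to check that 32% without the sidebar calculator, the odds form of Bayes' theorem does it in a few lines (a sketch using the LR- of 0.26 derived above):

```python
def posterior(prior, lr):
    # Bayes' theorem in odds form: posterior odds = prior odds * LR
    odds = prior / (1 - prior) * lr
    return odds / (1 + odds)

# Probability of PNA after a negative CXR in a 65% pre-test population:
print(posterior(0.65, 0.26))   # ~0.32-0.33, i.e., NPV ~ 68%
```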

There are more fundamental problems. As in child abuse studies, there is a circular logic here: the kid has pneumonia because the doc says he has pneumonia, but the doc knows the CXR shows "pneumonia"; the diagnosis of PNA then leads to the CXR finding being classified as a true positive. How many of the pneumonia diagnoses were true/false positives/negatives? We can't know, because we have no gold standard for pneumonia, just as we have no gold standard for child abuse - we are guessing which cells the numbers go in. This points to another violation of basic Bayesian assumptions: the determination of disease status must be made independently of the result of the test being evaluated. Here there is very clearly dependence, because the docs are making the pneumonia determination on the basis of the CXR. The study design is fundamentally flawed, and so are all conclusions that ramify from it.

I'm always a little surprised when I go digging into the studies that people bandy about as "evidence" for this and that, as I frequently find that they're either misunderstood, misrepresented, or just plain wrong. I can readily imagine a pediatrician (resident or attending) telling me with high confidence that the CXR can "rule out" pneumonia in my kid, because her attendings told her that on the basis of the 2018 Lipsett study, and yet none of them ever looked any deeper into the actual study to find its obvious mistakes and shortcomings.

As they say, "Trust, but verify." Or perhaps more apropos here: "Extraordinary claims require extraordinary evidence." An NPV of 98% (for CXR!) is an extraordinary claim indeed. The evidence for it, however, is not extraordinary. As a trusted mentor once told me, "Scott, don't believe everything you read."

ETA: You can get a 98% NPV using the sensitivity and specificity from the Lipsett data (despite the erroneous assumptions that inhere in them) by using a prevalence of pneumonia of just 7%. To wit: if you want to get to a posterior probability of PNA of 2% (corresponding to the reported 98% NPV in the Lipsett study), you need to start with a population in which only 7 of 100 kids have pneumonia, and you need to do a CXR on all of them so that the cases lurking among the CXR-negatives are whittled down from 7 to 2. One hundred CXRs later, you have found 5 cases. Is it worth doing 100 CXRs to avoid 5 courses of antibiotics? We could do a formal Threshold analysis to answer this question, but apparently that was not the point of the "Clinical Decisions" section of this week's NEJM; rather, it was to highlight reference 1, which turns out to have conclusions based on a miscalculation.
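The back-calculation, for the skeptical (a sketch, simply inverting the odds form of Bayes' theorem used above):

```python
lr_neg = 0.26
posterior_odds = 0.02 / 0.98          # a 2% residual probability (98% NPV)
prior_odds = posterior_odds / lr_neg  # prior odds = posterior odds / LR-
print(prior_odds / (1 + prior_odds))  # ~0.07: only 7 of 100 kids with PNA
```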

Sunday, February 16, 2020

Misunderstanding and Misuse of Basic Clinical Decision Principles among Child Abuse Pediatricians

The previous post about Dr. Cox, ensnared in a CPT (Child Protection Team) witch hunt in Wisconsin, has led me to evaluate several more research reports on child abuse, including SBS (shaken baby syndrome), AHT (abusive head trauma), and sentinel injuries.  These reports are rife with critical assumptions, severe limitations, and gross errors that greatly weaken the conclusions of most of the studies I have reviewed.  However, one study that was pointed out to me today takes the cake.  I don't know how prevalent this degree of misunderstanding is, but CPTs and child abuse pediatricians need to make sure they have a proper understanding of sensitivity, specificity, positive and negative predictive value, base rates, and the like.  They should not be testifying about the probability of child abuse at all if they don't have this stuff down cold.  And I think this means that some proportion of them needs to go back to school or stop testifying.

The article at issue, along with its associated correspondence, is entitled The Positive Predictive Value of Rib Fractures as an Indicator of Nonaccidental Trauma in Children, published in 2004.  The authors looked at a series of rib fractures in children at a single Trauma Center in Colorado during a six-year period and identified all patients with a rib fracture.  They then restricted their analysis to children less than 3 years of age.  There were 316 rib fractures among just 62 children in the series, an average of ~5 rib fractures per child.  The proper unit of analysis for a study of positive predictive value is children - sorted into those with and without abuse, and with and without rib fracture(s) - as seen in the 2x2 tables below.
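To see why the unit of analysis matters, here is a sketch with purely hypothetical counts (not the study's data): if abused children have more fractures apiece than non-abused children, counting fractures instead of children mechanically inflates the apparent PPV.

```python
# Hypothetical illustration only: 62 children with rib fractures,
# say 40 judged abused and 22 not; suppose abused children average
# 6 fractures each and the others 2.
abused_kids, other_kids = 40, 22
print(abused_kids / (abused_kids + other_kids))   # per-child PPV ~ 0.65

abused_fx, other_fx = 40 * 6, 22 * 2              # 240 vs 44 fractures
print(abused_fx / (abused_fx + other_fx))         # per-fracture "PPV" ~ 0.85
```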

Thursday, February 4, 2016

Diamox Results in Urine: General and Specific Lessons from the DIABOLO Acetazolamide Trial

The trial of acetazolamide to reduce the duration of mechanical ventilation in COPD patients was published in JAMA this week.  I will use it to discuss some general principles about RCTs and to make some comments specific to this trial.

My arguable but strong prior belief, before I even read the trial, is that Diamox (acetazolamide) is ineffectual in acute and chronic respiratory failure, or that it is harmful.  Its use is predicated on a "normalization fallacy" which guides practitioners to attempt to achieve euboxia (normal numbers).  In chronic respiratory acidosis, the kidneys conserve bicarbonate to maintain normal pH.  There was a patient we saw at OSU in about 2008 who had severe COPD with a PaCO2 in the 70s and chronic renal failure with a bicarbonate under 20.  A well-intentioned but misguided resident checked an ABG, and the patient's pH was on the order of 7.1.  We (the pulmonary service) were called to evaluate the patient for MICU transfer and intubation, and when we arrived we found him sitting at the bedside comfortably eating breakfast.  So it would appear that patients can get along with acidosis when the kidneys can't conserve enough bicarbonate to maintain normal pH; but evolution has clearly created systems to maintain normal pH for a reason.  Why you would want to interfere with this highly conserved system in order to increase minute ventilation in a COPD patient you are trying to wean is beyond the reach of my imagination.  It just makes no sense.

This brings us to a major problem with a sizable proportion of RCTs that I read:  the background/introduction provides woefully insufficient justification for the hypothesis that the RCT seeks to test.  In the background of this paper, we are sent to references 4-14.  Here is a summary of each:

4.)  A review of metabolic alkalosis in a general population of critically ill patients
5.)  An RCT of acetazolamide for weaning COPD patients showing that it doesn't work
6.)  The incidence of alkalosis in hospitalized patients in 1980
7.)  A 1983 translational study delineating the effect of acetazolamide on acid-base parameters in 10 patients
8.)  A 1982 study of hemodynamic parameters after acetazolamide administration in 12 patients
9.)  A study of metabolic and acid-base parameters in 14 patients with cystic fibrosis
10.)  A retrospective, descriptive epidemiological study of serum bicarbonate in a large cohort of critically ill patients
11.)  A study of acetazolamide in anesthetized cats
12-14.)  Commentary and pharmacodynamic studies of acetazolamide by the authors of the current study

Wednesday, December 23, 2015

Narrated and Abridged: There is (No) Evidence for That: Epistemic Problems in Critical Care Medicine

Below is the narrated video of my PowerPoint presentation on Epistemic Problems in Critical Care Medicine.  It provides a framework for understanding why we have both false positives and false negatives in clinical trials in critical care medicine, and why we should be circumspect about our "evidence base" and our "knowledge".  This is not trivial stuff, and it is worth the 35 minutes required to watch the narrated slideshow.  It is a provocative presentation that gives compelling reasons to challenge our "evidence base" in critical care and medicine in general, in ways that are not widely recognized but perhaps should be, and it offers several suggestions about assumptions that need to be challenged and revised to make our models of reality more reliable.  Please contact me if you would like me to give an iteration of this presentation at your institution.


Wednesday, October 7, 2015

Early Mobility in the ICU: The Trial That Should Not Be

I learned via twitter yesterday that momentum is building to conduct a trial of early mobility in critically ill patients.  While I greatly respect many of the investigators headed down this path, forthwith I will tell you why this trial should not be done, based on principles of rational decision making.

A trial is a diagnostic test of a hypothesis, a complicated and costly test of a hypothesis, and one that entails risk.  Diagnostic tests should not be used indiscriminately.  That the RCT is a "Gold Standard" in the hierarchy of testing hypotheses does not mean that we should hold it sacrosanct, nor does it follow that we need a gold standard in all cases.  Just like in clinical medicine, we should be judicious in our ordering of diagnostic tests.

The first reason that we should not do a trial of early mobility (or any mobility) in the ICU is that, in the opinion of this author, experts in critical care, and many others, early mobility works.  We have a strong prior probability that this is a beneficial thing to be doing (which is why prominent centers have been doing it for years, sans RCT evidence).  When the prior probability is high enough, additional testing has decreasing yield and risks false negative results if people are not attuned to the prior.  Here's my analogy: a 35 year old woman with polycystic kidney disease who is taking birth control presents to the ED after collapsing with syncope.  She had shortness of breath and chest pain for 12 hours prior to syncope.  Her chest x-ray is clear and bedside ultrasound shows a dilated right ventricle.  The prior probability of pulmonary embolism is high enough that we don't really need further testing; we give anticoagulants right away.  Even if a V/Q scan (creatinine precludes CT) is "low probability" for pulmonary embolism, we still think she has it, because the prior probability is so high.  Indeed, the prior probability is so high that we're willing to make decisions without further testing, hence we gave heparin.  This process follows the very rational Threshold Approach to Medical Decision Making proposed by Pauker and Kassirer in the NEJM in 1980, which is basically a reformulation of von Neumann and Morgenstern's Expected Utility Theory to adapt it to medical decisions.  Distilled, it states in essence: "when you get to a threshold probability of disease where the benefits of treatment exceed the risks, you treat."  And so let it be with early mobility.  We already think the benefits exceed the risks, which is why we're doing it.  We don't need an RCT.  As I used to ask the housestaff over and over until I was cyanotic: "How will the results of that test influence what you're going to do?"
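For the curious, the threshold itself is a one-liner (a sketch of the Pauker-Kassirer treatment threshold; the utilities below are illustrative, not from their paper):

```python
def treatment_threshold(harm, benefit):
    # Treat when P(disease) > harm / (harm + benefit), where harm is the
    # net cost of treating the non-diseased and benefit is the net gain
    # of treating the diseased.
    return harm / (harm + benefit)

# Illustrative: if treating a patient without the disease is one-ninth as
# bad as failing to treat a patient with it, the threshold is 10%.
print(treatment_threshold(harm=1, benefit=9))   # 0.1
```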

Notice that this logical approach to clinical decision making shines a blinding light upon "evidence based medicine" and the entire enterprise of testing hypotheses with frequentist methods that are deaf to prior probabilities.  Can you imagine using V/Q scanning to test for PE without prior probabilities?  Can you imagine what a mess you would find yourself in with regard to false negatives and false positives?  You would be the neophyte medical student who thinks "test positive, disease present; test negative, disease absent."  So why do we continue ad nauseam in critical care medicine to dismiss prior probabilities and decision thresholds and blindly test hypotheses in a purist vacuum?

The next reasons this trial should not be conducted flow from the first.  The trial will not have a high enough likelihood ratio to sway the high prior below the decision threshold.  If the trial is "positive," we will have spent millions of dollars to "prove" something we already believed with a probability above our treatment threshold; if the trial is positive, some will squawk "It wasn't blinded" yada yada yada in an attempt to dismiss the results as false positives; if the trial is negative, some will, like the tyro medical student, declare that "there is no evidence for early mobility" and similar hoopla and poppycock; or, the worst case: the trial shows harm from early mobility, which will get the naysayers of early mobility very agitated.  But of course our prior probability that early mobility is harmful is hopelessly low, making such a result highly likely to be spurious.  When we clamor about "evidence," we are in essence clamoring about "testing hypotheses with RCTs" and eschewing our responsibility to use clinical judgment, recognize the limits of testing, and practice in the face of uncertainty using our "untested" prior probabilities.
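To put numbers on "not a high enough likelihood ratio," treat the trial itself as a diagnostic test of its hypothesis (a sketch; the conventional alpha of 0.05 and power of 0.80 are my assumptions, and the 0.8 prior is chosen purely for illustration):

```python
alpha, power = 0.05, 0.80
lr_pos = power / alpha               # 16: LR of a "positive" trial
lr_neg = (1 - power) / (1 - alpha)   # ~0.21: LR of a "negative" trial

prior = 0.80                         # strong prior that early mobility works
odds = prior / (1 - prior) * lr_neg
print(odds / (1 + odds))             # ~0.46: still well above a low treatment threshold
```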

Consider a trial of exercise on cardiovascular outcomes in community dwelling adults - what good can possibly come of such a trial?  Don't we already know that exercise is good for you?  If so, a positive trial reinforces what we already know (but does little to convince sedentary folks to exercise, as they too already know they should exercise), but a negative trial risks sending people the message that exercise is of no use, or that the number needed to treat is too large to be worth worrying about.

Or consider the recent trials of EGDT that "refuted" the Rivers trial from 14 years ago.  Now everybody is saying, "Well, we know it works - maybe not the catheters and the ScvO2 and all those minutiae - but in general, rapid early resuscitation works.  And the trials show that we've already incorporated what works into general practice!"

I don't know the solutions to these difficult quandaries that we repeatedly find ourselves in, trial after trial, in critical care medicine.  I'm confused too.  That's why I'm thinking very hard and very critically about the limits of our methods and our models and our routines.  But if we can anticipate not only the results of the trials but also the community reaction to them, then we have guidance about how to proceed in the future.  Because what value does a mega-trial have, if not to guide care after its completion?  And even if that is not its goal (maybe its goal is just to inform the science), can we turn a blind eye to the fact that it will guide practice after its completion, even if that guidance is premature?

It is my worry that, given the high prior probability that a trial in critical care medicine will be "negative", the most likely result is a negative trial which will embolden those who wish to dismiss the probable benefits of early mobility and give them an excuse to not do it.

Diagnostic tests have risks.  A false negative test is one such risk.

Saturday, October 11, 2014

Enrolling Bad Patients After Good: Sunk Cost Bias and the Meta-Analytic Futility Stopping Rule

Four (relatively) large critical care randomized controlled trials were published in the NEJM in the last week.  I was excited to blog on them, but then I realized that all four are old news, so there's nothing to blog about.  But alas, the fact that there is no news is the news.

In the last week, we "learned" that more transfusion is not helpful in septic shock, that EGDT (the ARISE trial) is not beneficial in sepsis, that simvastatin (the HARP-2 trial) is not beneficial in ARDS, and that parenteral administration of nutrition is not superior to enteral administration in critical illness.  Any of that sound familiar?

I read the first two articles, then discovered the last two, and said to myself, "I'm not reading these."  At first I felt bad about this decision, but then I realized it was a rational one.  Here's why.