
Tuesday, January 28, 2020

Bad Science + Zealotry = The Wisconsin Witch Hunts. The Case of John Cox, MD

John Cox, MD
I stumbled upon a very disturbing report on NBC News today of a physician couple in Wisconsin accused of abusing their adopted infant daughter.  It is surreal and horrifying and worth a read - not because these physicians abused their daughter, but because they almost assuredly did not.  One driving force behind the case appears to be a well-meaning and perfervid, if misguided and perfidious, pediatrician at the University of Wisconsin who, with her group, coined the term "sentinel injuries" (SI) to describe small injuries such as bruises and oral injuries that they posit portend future larger-scale abuse.  It was the finding of SI on the adopted infant in the story that in part led to charges of abuse against the father, Dr. Cox, placed his child in protective custody, got him arrested, and now threatens his career.  Interested readers can reference the link above for the sordid and sundry details of the case.

Before delving into the 2013 study in Pediatrics upon which many contentions about SI rest, we should start with the fundamentals.  First, is it plausible that the thesis is correct - that before serious abuse, minor abuse is detectable by small bruises or oral injuries?  Of course it is, and it makes for a good narrative.  But being a good, plausible narrative does not make it true, and it is likewise possible that bruises seen in kids who are and are not abused reflect nothing more than accidental injuries: rolling off a support, something falling or dropping on them, somebody dropping them, a sibling jabbing at them with a toy, and any number of other things.  To my knowledge, the authors offer no direct evidence that the SIs they or others report have been directly traced to abuse.  They are doing nothing more than inferring that facial bruising is a precursor to Abusive Head Trauma (AHT), and judging from their bibliography, they have gone out of their way to promote this notion.
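To see just how fragile that inference is, it helps to run the numbers.  Below is a minimal Bayesian sketch; the sensitivity, specificity, and prior probabilities are invented for illustration and are not taken from the Pediatrics study or any other source:

```python
# Hypothetical illustration only: these operating characteristics are invented.
def posterior(prior, sensitivity, specificity):
    """Bayes' theorem: P(abuse | bruise) given a prior and test characteristics."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# Even granting bruising generous characteristics as a "test" for ongoing abuse
# (80% sensitive, 90% specific), a low base rate dominates the result:
for prior in (0.01, 0.05, 0.20):
    print(f"prior {prior:>4.0%} -> posterior {posterior(prior, 0.80, 0.90):.0%}")
# prior   1% -> posterior 7%
# prior   5% -> posterior 30%
# prior  20% -> posterior 67%
```

Unless the pre-test probability of abuse is already substantial, a bruise cannot carry the inferential weight being placed upon it.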

Thursday, February 15, 2018

Ruling Out PE in the ED: Critical Analysis of the PROPER Trial

This post is going to be an in-depth "journal club" style analysis of the PROPER trial.

In this week's JAMA, Freund et al report the results of the PROPER randomized controlled trial of the PERC (pulmonary embolism rule-out criteria) rule for safely excluding pulmonary embolism (PE) in the emergency department (ED) among patients with a "low clinical gestalt" of having PE.  All things pulmonary and all things noninferiority being pet topics of mine, I had to delve deeper into this article, because frankly the abstract confused me.

This was a cluster randomized noninferiority trial, but for most purposes the cluster part can be ignored when looking at the data.  Each of 14 EDs in France was randomized such that during the "PERC period" PE was excluded in patients with a "low gestalt clinical probability" (not yet defined in the abstract) if all 8 items of the PERC rule were negative.  In the "control period" usual procedures for exclusion of PE were followed.  The primary end point was occurrence of a [venous] thromboembolic event (VTE) during 3 months of follow-up.  The delta (pre-specified margin of noninferiority) for the endpoint was 1.5%.  This is a pleasingly low number.  In our meta-research study of 163 noninferiority trials, including those in JAMA from 2010-2016, we found that the average delta for those using an absolute risk difference (n=137) was 8.7%, almost 6 times higher!  This is laudable, but it was aided by a low estimated event rate in the control group, which meant that a sample size of ~1900 was feasible given what I assume were the relatively low costs of the study.  Kudos to the authors, too, for concretely justifying delta in the methods section.
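As a back-of-the-envelope check on why that low delta was feasible, here is a minimal sketch of the standard sample size approximation for noninferiority comparisons of proportions; the event rates below are my own illustrative assumptions, not figures from the trial:

```python
from math import ceil
from statistics import NormalDist

def ni_sample_size(p, delta, alpha=0.05, power=0.80):
    """Approximate per-group n for a noninferiority comparison of proportions,
    assuming equal true event rates p in both arms and a one-sided alpha."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / delta ** 2)

# A rare outcome makes a strict delta affordable:
print(ni_sample_size(p=0.015, delta=0.015))  # ~810 per group, ~1600 total
# The same delta with a 10% event rate needs roughly 6x the patients:
print(ni_sample_size(p=0.10, delta=0.015))   # ~4950 per group
```

The strictness of a delta, in other words, is partly a function of the expected event rate: rare outcomes make small margins cheap.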

Wednesday, January 11, 2017

Don't Get Soaked: The Practical Utility of Predicting Fluid Responsiveness

In this article in the September 27th issue of JAMA, the authors discuss the rationale and evidence for predicting fluid responsiveness in hemodynamically unstable patients.  While this is a popular academic topic, its practical importance is not as clear.  Some things, such as predicting performance on a spontaneous breathing trial (SBT) with the Yang-Tobin f/Vt, don't make much sense - just do the SBT if that's the result you're really interested in.  The prediction of whether it will rain today is not very important if the difference in what I do is as small as tucking an umbrella into my bag or not.  Neither the inconvenience of getting a little wet walking from the parking garage nor that of carrying the umbrella is very great.  Similarly, a prediction of whether or not it will rain two months from now when I'm planning a trip to Cancun is not very valuable to me, because the confidence intervals about the estimate are too wide to rely upon.  Better to just stick with the base rates:  how much rainfall is there in March in the Caribbean in an average year?

Our letter to the editor was not published in JAMA, so I will post it here:

To the Editor:  A couple of issues relating to the article about predicting responsiveness to fluid bolus1 deserve attention.  First, the authors made a mathematical error that may cause confusion among readers attempting to duplicate the Bayesian calculations described in the article.  The negative predictive value (NPV) of a test is the proportion of patients with a negative test who do not have the condition – the true negative rate.2  In each of the instances in which NPV is mentioned in the article, the authors mistakenly report the proportion of patients with a negative test who do have the condition.  This value, 1-NPV, is the false negative rate - the posterior probability of the condition in those with a negative test.

Second, in the examples that discuss NPV, the authors use a prior probability of fluid responsiveness of 50%.  A clinician who appropriately uses a threshold approach to decision making3 must determine a probability threshold above which treatment is warranted, considering the net utility of all possible outcomes with and without treatment given that treatment’s risks and benefits.4  Because the risk of fluid administration in judicious quantities is low,5 the threshold for fluid administration is correspondingly low and fluid bolus may be warranted based on prior probability alone, thus obviating additional testing.  Even if additional testing is negative and suggests a posterior probability of fluid responsiveness of only 10% (with an upper 95% confidence limit of 18%), many clinicians would still judge a trial of fluids to be justified because fluids are considered to be largely benign and untreated hypovolemia is not.4  (The upper confidence limit will be higher still if the prior probability was underestimated.)  Finally, the posterior probabilities hinge critically on the estimates of prior probabilities, which are notoriously nebulous and subjective.  Clinicians are likely intuitively aware of these quandaries, which may explain why empiric fluid bolus is favored over passive leg raise testing outside of academic treatises.6


1. Bentzer P, Griesdale DE, Boyd J, MacLean K, Sirounis D, Ayas NT. Will this hemodynamically unstable patient respond to a bolus of intravenous fluids? JAMA. 2016;316(12):1298-1309.
2. Fischer JE, Bachmann LM, Jaeschke R. A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med. 2003;29(7):1043-1051.
3. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-1117.
4. Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B. Dual processing model for medical decision-making: an extension to diagnostic testing. PLoS One. 2015;10(8):e0134800.
5. The ProCESS Investigators. A randomized trial of protocol-based care for early septic shock. N Engl J Med. 2014;370(18):1683-1693.
6. Marik PE, Monnet X, Teboul J-L. Hemodynamic parameters to guide fluid therapy. Ann Intensive Care. 2011;1:1.


Scott K Aberegg, MD, MPH
Andrew M Hersh, MD
The University of Utah School of Medicine
Salt Lake City, Utah
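For readers who want to check the NPV arithmetic themselves, here is a minimal sketch; the 50% prior echoes the article's examples, while the sensitivity and specificity are invented for illustration:

```python
# Illustration of the NPV vs. 1-NPV distinction described in the letter.
def predictive_values(prior, sensitivity, specificity):
    tp = sensitivity * prior              # true positives
    fn = (1 - sensitivity) * prior        # false negatives
    tn = specificity * (1 - prior)        # true negatives
    fp = (1 - specificity) * (1 - prior)  # false positives
    ppv = tp / (tp + fp)  # P(condition | positive test)
    npv = tn / (tn + fn)  # P(no condition | negative test)
    return ppv, npv

ppv, npv = predictive_values(prior=0.50, sensitivity=0.85, specificity=0.90)
print(f"PPV   = {ppv:.0%}")      # 89%
print(f"NPV   = {npv:.0%}")      # 86% of test-negatives are NOT fluid responsive
print(f"1-NPV = {1 - npv:.0%}")  # 14%: the posterior probability of fluid
                                 # responsiveness despite a negative test - the
                                 # quantity the article mislabeled as "NPV"
```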


Wednesday, July 22, 2015

There is (No) Evidence For That: Epistemic Problems in Evidence Based Medicine

Below is a PowerPoint presentation that I have delivered several times recently, including one iteration at the SMACC conference in Chicago.  It addresses epistemic problems in our therapeutic knowledge, and calls into question all claims of "there is evidence for ABC" and "there is no evidence for ABC."  Such claims cannot be taken at face value and need deeper consideration and evaluation considering all possible states of reality - gone is the cookbook or algorithmic approach to evidence appraisal as promulgated by the Users' Guides.  Considered in the presentation are therapies for which we have no evidence but which undoubtedly work (Category 1 - Parachutes), and therapies for which we have evidence of efficacy or lack thereof (Category 2), but where that evidence is subject to false positives and false negatives for numerous reasons, including: the Ludic Fallacy; study bias (see: Why Most Published Research Findings Are False); type 1 and 2 errors; the "alpha bet" (the arbitrary and lax standard used for alpha, namely 0.05); Bayesian interpretations; stochastic dominance of the null hypothesis; and inadequate study power, both in general and due to delta inflation and subversion of double significance hypothesis testing.  These are all topics that have been addressed to some degree on this blog previously, but this presentation assembles them into a framework for understanding the epistemic problems that arise within our "evidence base."  It also provides insights into why we have a generation of trials in critical care whose results converge on the null, and why positive studies in this field cannot be replicated.
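One of the central points - that a "statistically significant" literature can still be riddled with false positives - can be demonstrated with a short simulation.  The share of true hypotheses, the power, and the alpha below are illustrative assumptions, not estimates for any particular field:

```python
import random

# Monte Carlo sketch of the "alpha bet": simulate a literature in which only
# 10% of tested hypotheses are true, trials have 80% power, and alpha = 0.05.
random.seed(1)
TRIALS, PRIOR_TRUE, POWER, ALPHA = 100_000, 0.10, 0.80, 0.05

true_pos = false_pos = 0
for _ in range(TRIALS):
    hypothesis_true = random.random() < PRIOR_TRUE
    if hypothesis_true:
        true_pos += random.random() < POWER   # detected with prob = power
    else:
        false_pos += random.random() < ALPHA  # type 1 error with prob = alpha

significant = true_pos + false_pos
print(f"{false_pos / significant:.0%} of 'significant' results are false positives")
# -> roughly 36%, despite every single trial being run at p < 0.05
```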

Sunday, April 6, 2014

Underperforming the Market: Why Researchers are Worse than Professional Stock Pickers and A Way Out

I was reading in the NYT yesterday a story about Warren Buffett and how the Oracle of Omaha has trailed the S&P 500 for four of the last five years.  It was based on an analysis done by a statistician who runs a blog called Statistical Ideas, which has a post on p-values that links to this Nature article from a couple of months back describing how we can be misled by P-values.  And all of this got me thinking.

We have a dual problem in medical research:  a.)  of conceiving alternative hypotheses which cannot be confirmed in large trials free of bias;  and b.) not being able to replicate the findings of positive trials.  What are the reasons for this?

Thursday, May 24, 2012

Fever, external cooling, biological precedent, and the epistemology of medical evidence

It is a rare occasion when one article allows me to review so many aspects of the epistemology of medical evidence, but alas, Schortgen et al afforded me that opportunity in the May 15th issue of AJRCCM.

The issues raised by this article are so numerous that I shall make subsections for each one. The authors of this RCT sought to determine the effect of external cooling of febrile septic patients on vasopressor requirements and mortality. Their conclusion was that "fever control using external cooling was safe and decreased vasopressor requirements and early mortality in septic shock." Let's explore the article and the issues it raises and see if this conclusion seems justified and how this study fits into current ICU practice.

PRIOR PROBABILITY, BIOLOGICAL PLAUSIBILITY, and BIOLOGICAL PRECEDENTS

These are related but distinct issues that are best considered both before a study is planned, and before its report is read. A clinical trial is in essence a diagnostic test of a hypothesis, and like a diagnostic test, its influence on what we already know depends not only on the characteristics of the test (sensitivity and specificity in a diagnostic test; alpha and power in the case of a clinical trial) but also on the strength of our prior beliefs. To quote Sagan [again], "extraordinary claims require extraordinary evidence." I like analogies of extremes: no trial result is sufficient to convince the skeptical observer that orange juice reduces mortality in sepsis by 30%; and no evidence, however cogently presented, is sufficient to convince him that the sun will not rise tomorrow. So when we read the title of this or any other study, we should pause to ask: What is my prior belief that external cooling will reduce mortality in septic shock? That it will reduce vasopressor requirements?
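The analogy can be made quantitative.  If an unbiased trial has 80% power at alpha = 0.05, then a significant result is a "positive test" with a likelihood ratio of 0.80/0.05 = 16, and Bayes' theorem does the rest.  A minimal sketch, with priors chosen purely for illustration:

```python
# Treat a significant trial result as a positive diagnostic test of its hypothesis.
def posterior_prob(prior, alpha=0.05, power=0.80):
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (power / alpha)  # positive likelihood ratio = 16
    return posterior_odds / (1 + posterior_odds)

# Orange juice reduces sepsis mortality by 30%: a skeptic's prior of 1 in 1000
print(f"{posterior_prob(0.001):.1%}")  # ~1.6% - one "positive" RCT should not convince us
# A physiologically sensible hypothesis with a prior of 1 in 3
print(f"{posterior_prob(0.33):.1%}")   # ~89%
```

The same p-value moves a skeptical prior barely at all, which is the whole point of pausing over priors before reading past the title.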

Saturday, September 5, 2009

Troponin I, Troponin T, Troponin is the Woe of Me

As a critical care physician, I have not infrequently been called to the emergency department to admit a patient on the basis of "abnormal laboratory tests" with no synthesis, no assimilation of the various results into any semblance of a unifying diagnosis. It is bad enough that patients' chests are no longer auscultated, respiratory rates and patterns not noted, neck veins not examined, etc. It is worse that the portable chest film (often incorrectly interpreted), the arterial blood gas (also often incorrectly interpreted), and the BNP level have supplanted any sort of logical and systematic approach to diagnosing a patient's problem. If we are going to replace physical examination with BNPs and d-dimers, we should at least insist that practitioners have one iota of familiarity with Bayes' Theorem, pre-test probabilities, and the proper interpretation of test results.

Thus I raised at least one brow slightly on August 27th when the NEJM reported two studies of highly sensitive troponin assays for the "early diagnosis of myocardial infarction" (wasn't troponin sensitive enough already? see: http://content.nejm.org/cgi/content/abstract/361/9/858 and http://content.nejm.org/cgi/content/abstract/361/9/868). Without commenting on the studies' methodological quality specifically, I will emphasize some pitfalls and caveats related to the adoption of this "advance" in clinical practice, especially its use outside the setting of an appropriately aged person with risk factors who presents to an acute care setting with SYMPTOMS SUGGESTIVE OF MYOCARDIAL INFARCTION.

In such a patient, say a 59 year old male with hypertension, diabetes and a family history of coronary artery disease, who presents to the ED with chest pain, we (and our cardiology colleagues) are justified in having a high degree of confidence in the results of this test based on these and a decade or more of other data. But I suspect that only the MINORITY of cardiac troponin tests at my institution are ordered for that kind of indication. Rather, it is used as a screening test for just about any patient presenting to the ED who is ill enough to warrant admission. And that's where the problem has its roots. Our confidence in the diagnostic accuracy of this test in the APPROPRIATE SETTING (read appropriate clinical pre-test probability) should not extend to other scenarios, but all too often it does, and it makes a real conundrum when it is positive in those other scenarios. Here's why.
Suppose that we have a pregnancy test that is evaluated in women who have had a sexual encounter and who have missed two menstrual periods, and it is found to be 99.9% sensitive and 99.9% specific. (I will bracket for now the possibility that you could have a 100% sensitive and/or specific test.) Now suppose that you administer this test to 10,000 MEN. Does a positive test mean that a man is pregnant? Heavens No! He probably has testicular cancer or some other malady. This somewhat silly example is actually quite useful for reinforcing the principle that no matter how good a test is, if it is not used in the appropriate scenario, the results are likely to be misleading.

Likewise, consider this test's use in a woman who has not missed a menstrual cycle - does a negative test mean that she is not pregnant? Perhaps not, since the sensitivity was determined in a population that had missed 2 cycles. If a woman were obviously 24 weeks pregnant and the test was negative, what would we think? It is important to bear in mind that these tests are NOT direct tests for the conditions we seek to diagnose, but are tests of ASSOCIATED biological phenomena, and insomuch as our understanding of those phenomena is limited, or there is variation in them, the tests are liable to be fallible. A negative test in a woman with a fetus in utero may mean that the sample was mishandled, that the testing reagents were expired, that there is an interfering antibody, etc. Tests are not perfect, and indeed are highly prone to be misleading if not used in the appropriate clinical scenario.
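The arithmetic behind the analogy is easy to verify. Here is a minimal sketch, with prevalence values invented for illustration:

```python
# Positive predictive value of a 99.9% sensitive, 99.9% specific test
# at different pre-test probabilities (prevalences).
def ppv(prevalence, sensitivity=0.999, specificity=0.999):
    tp = sensitivity * prevalence                # true positives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

print(f"{ppv(0.0):.1%}")    # 0.0%  - men: a positive test CANNOT mean pregnancy
print(f"{ppv(0.001):.1%}")  # 50.0% - indiscriminate screening: a coin flip
print(f"{ppv(0.50):.1%}")   # 99.9% - two missed periods: the validated setting
```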

And thus we return to cardiac troponins. In the patients I'm called to admit to the ICU who have sepsis, PE, COPD, pneumonia, respiratory failure, renal failure, or metabolic acidosis, a mildly positive troponin, which is a COMMON occurrence, is almost ALWAYS an epiphenomenon of critical illness rather than an acute myocardial infarction. Moreover, the pursuit of the diagnosis via cardiac catheterization, or empiric treatment with antiplatelet agents and anticoagulants, is almost always a therapeutic misadventure in these patients, who are at much greater risk of bleeding and renal failure from these interventions, which are expected to have a much reduced positive utility for them. More often than not, I would just rather not know the results of a troponin test outside the setting of isolated acute chest pain. Other practitioners should be acutely aware of the patient populations in which these tests are performed, and the significant limitations of using these highly sensitive tests in other clinical scenarios.

Saturday, March 14, 2009

"Statistical Slop": What billiards can teach us about multiple comparisons and the need to assign primary endpoints

Anyone who has played pool knows that you have to call your shots before you make them. This rule is intended to decrease the probability of "getting lucky" by just hitting the cue ball as hard as you can, expecting that the more it bounces around the table, the more likely it is that one of your many balls will fall through chance alone. Sinking a ball without first calling it is referred to colloquially as "slop" or a "slop shot".

The underlying logic is that you know best which shot you're MOST likely to successfully make, so not only does that increase the prior probability of a skilled versus a lucky shot (especially if it is a complex shot, such as one "off the rail"), but also it effectively reduces the number of chances the cue ball has to sink one of your balls without you losing your turn. It reduces those multiple chances to one single chance.

Likewise, a clinical trialist must focus on one "primary outcome" for two reasons: 1.) because preliminary data (if available), background knowledge, and logic will allow him to select the variable with the highest "pre-test probability" of causing the null hypothesis to be rejected, meaning that the post-test probability of the alternative hypothesis is enhanced; and 2.) because it reduces the probability of finding "significant" associations among multiple variables through chance alone. Today I came across a cute little experiment that drives this point home quite well. The abstract can be found here on PubMed: http://www.ncbi.nlm.nih.gov/pubmed/16895820?ordinalpos=4&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum


In it, the authors describe "dredging" a Canadian database and looking for correlations between astrological signs and various diagnoses. Significant associations were found between the Leo sign and gastrointestinal hemorrhage, and between the Sagittarius sign and humerus fracture. With this "analogy of extremes," as I like to call it, you can clearly see how the failure to define a prospective primary endpoint can lead to statistical slop. (Nobody would have been able to predict a priori that it would be THOSE two diagnoses associated with THOSE two signs!) Failure to PROSPECTIVELY identify ONE primary endpoint led to multiple chances for chance associations. Moreover, because there were no preliminary data upon which to base a primary hypothesis, the prior probability of any given alternative hypothesis is markedly reduced, and thus the posterior probability of the alternative hypothesis remains low IN SPITE OF the statistically significant result.
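The arithmetic of slop is simple. Assuming independent comparisons, each tested at alpha = 0.05 (an idealization), the chance of at least one spurious "hit" grows rapidly with the number of comparisons:

```python
# Familywise error rate: P(at least one false positive) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 12, 24, 100):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>3} comparisons -> {p_any:.0%} chance of at least one 'hit'")
#   1 comparisons -> 5% chance of at least one 'hit'
#  12 comparisons -> 46% chance of at least one 'hit'
#  24 comparisons -> 71% chance of at least one 'hit'
# 100 comparisons -> 99% chance of at least one 'hit'
```

Calling the shot - one primary endpoint, specified in advance - collapses those multiple chances back to a single 5% gamble.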

It is for this very reason that "positive" or significant associations among non-primary endpoint variables in clinical trials are considered "hypothesis generating" rather than hypothesis confirming. Requiring additional studies of these associations as primary endpoints is like telling your slop shot partner in the pool hall "that's great, but I need to see you do that double rail shot again to believe that it's skill rather than luck."

Reproducibility of results is indeed the hallmark of good science.