
Thursday, February 15, 2018

Ruling Out PE in the ED: Critical Analysis of the PROPER Trial

This post is going to be an in-depth "journal club" style analysis of the PROPER trial.

In this week's JAMA, Freund et al report the results of the PROPER randomized controlled trial of the PERC (pulmonary embolism rule-out criteria) rule for safely excluding pulmonary embolism (PE) in the emergency department (ED) among patients with a "low clinical gestalt" of having PE.  All things pulmonary and all things noninferiority being pet topics of mine, I had to delve deeper into this article because, frankly, the abstract confused me.

This was a cluster-randomized noninferiority trial, but for most purposes the cluster part can be ignored when looking at the data.  Each of 14 EDs in France was randomized such that during the "PERC period" PE was excluded in patients with a "low gestalt clinical probability" (not yet defined in the abstract) if all 8 items of the PERC rule were absent.  In the "control period," usual procedures for exclusion of PE were followed.  The primary end point was occurrence of a [venous] thromboembolic event (VTE) during 3 months of follow-up.  The delta (pre-specified margin of noninferiority) for the endpoint was 1.5%.  This is a pleasingly low number.  In our meta-research study of 163 noninferiority trials, including those in JAMA from 2010-2016, we found that the average delta for those using an absolute risk difference (n=137) was 8.7%, almost 6 times higher!  This low delta is laudable, but it was aided by a low estimated event rate in the control group, which meant that the sample size of ~1900 was feasible given what I assume were relatively low costs of the study.  Kudos to the authors, too, for concretely justifying delta in the methods section.
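
To see how delta drives sample size, here is a minimal sketch of the standard normal-approximation calculation for a noninferiority comparison of proportions.  The 1% control-group event rate is my illustrative assumption, not a figure taken from the paper.

```python
from scipy.stats import norm

def ni_sample_size(p_control, delta, alpha=0.025, power=0.80):
    """Per-group n for a noninferiority comparison of proportions
    (normal approximation), assuming equal true event rates."""
    z_alpha = norm.ppf(1 - alpha)   # one-sided test
    z_beta = norm.ppf(power)
    variance = 2 * p_control * (1 - p_control)
    return (z_alpha + z_beta) ** 2 * variance / delta ** 2

# Assumed 1% control event rate with PROPER's 1.5% margin (illustrative):
print(round(ni_sample_size(0.01, 0.015)))   # ~690 per group
# The 8.7% average delta from our meta-research study, same event rate:
print(round(ni_sample_size(0.01, 0.087)))   # ~21 per group
```

The point: a margin this tight is affordable only because the expected event rate is so low; roughly 690 patients per group lands in the same ballpark as the trial's ~1900 once the cluster design and losses to follow-up inflate the requirement.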

Sunday, April 6, 2014

Underperforming the Market: Why Researchers are Worse than Professional Stock Pickers and A Way Out

I was reading in the NYT yesterday a story about Warren Buffett and how the Oracle of Omaha has trailed the S&P 500 for four of the last five years.  It was based on an analysis done by a statistician who runs a blog called Statistical Ideas, which has a post on p-values linking to a Nature article from a couple of months back that describes how we can be misled by P-values.  And all of this got me thinking.

We have a dual problem in medical research:  a.) we conceive alternative hypotheses that cannot be confirmed in large trials free of bias;  and b.) we cannot replicate the findings of positive trials.  What are the reasons for this?

Saturday, September 5, 2009

Troponin I, Troponin T, Troponin is the Woe of Me

As a critical care physician, I have not infrequently been called to the emergency department to admit a patient on the basis of "abnormal laboratory tests" with no synthesis, no assimilation of the various results into any semblance of a unifying diagnosis. It is bad enough that patients' chests are no longer auscultated, respiratory rates and patterns not noted, neck veins not examined, etc. It is worse that the portable chest film (often incorrectly interpreted), the arterial blood gas (also often incorrectly interpreted), and the BNP level have supplanted any sort of logical and systematic approach to diagnosing a patient's problem. If we are going to replace physical examination with BNPs and d-dimers, we should at least insist that practitioners have one iota of familiarity with Bayes' Theorem, pre-test probabilities, and the proper interpretation of test results.

Thus I raised at least one brow slightly on August 27th when the NEJM reported two studies of highly sensitive troponin assays for the "early diagnosis of myocardial infarction" (wasn't troponin sensitive enough already? see: http://content.nejm.org/cgi/content/abstract/361/9/858 and http://content.nejm.org/cgi/content/abstract/361/9/868 ). Without commenting on the studies' methodological quality specifically, I will emphasize some pitfalls and caveats related to the adoption of this "advance" in clinical practice, especially outside the setting of an appropriately aged person with risk factors who presents to an acute care setting with SYMPTOMS SUGGESTIVE OF MYOCARDIAL INFARCTION.

In such a patient, say a 59-year-old male with hypertension, diabetes, and a family history of coronary artery disease who presents to the ED with chest pain, we (and our cardiology colleagues) are justified in having a high degree of confidence in the results of this test, based on these and a decade or more of other data. But I suspect that only the MINORITY of cardiac troponin tests at my institution are ordered for that kind of indication. Rather, the test is used as a screening test for just about any patient presenting to the ED who is ill enough to warrant admission. And that's where the problem has its roots. Our confidence in the diagnostic accuracy of this test in the APPROPRIATE SETTING (read: appropriate clinical pre-test probability) should not extend to other scenarios, but all too often it does, and it creates a real conundrum when the test is positive in those other scenarios. Here's why.
Suppose that we have a pregnancy test that is evaluated in women who have had a sexual encounter and who have missed two menstrual periods, and it is found to be 99.9% sensitive and 99.9% specific. (I will bracket for now the possibility that you could have a 100% sensitive and/or specific test.) Now suppose that you administer this test to 10,000 MEN. Does a positive test mean that a man is pregnant? Heavens, no! He probably has testicular cancer or some other malady. This somewhat silly example is actually quite useful for reinforcing the principle that no matter how good a test is, if it is not used in the appropriate scenario, the results are likely to be misleading. Likewise, consider this test's use in a woman who has not missed a menstrual cycle - does a negative test mean that she is not pregnant? Perhaps not, since the sensitivity was determined in a population that had missed 2 cycles. If a woman were obviously 24 weeks pregnant and the test was negative, what would we think? It is important to bear in mind that these tests are NOT direct tests for the conditions we seek to diagnose, but are tests of ASSOCIATED biological phenomena, and insomuch as our understanding of those phenomena is limited or there is variation in them, the tests are liable to be fallible. A negative test in a woman with a fetus in utero may mean that the sample was mishandled, that the testing reagents were expired, that there is an interfering antibody, etc. Tests are not perfect, and indeed are highly prone to be misleading if not used in the appropriate clinical scenario.
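
To put numbers on this, here is a minimal sketch of Bayes' theorem applied to that hypothetical 99.9% sensitive, 99.9% specific test; the pre-test probabilities below are illustrative assumptions, not measured values.

```python
def post_test_probability(pretest, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' theorem)."""
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

sens = spec = 0.999
# Hypothetical pre-test probabilities for three scenarios:
for label, pretest in [("missed two periods", 0.60),
                       ("no missed period", 0.05),
                       ("a man (prior ~ 0)", 0.0001)]:
    ppv = post_test_probability(pretest, sens, spec)
    print(f"{label}: pre-test {pretest:.2%} -> post-test {ppv:.1%}")
```

Even at 99.9% specificity, a positive result in a patient whose pre-test probability is near zero is still probably false (a post-test probability of roughly 9% in the last scenario), and the same arithmetic drives the troponin conundrum below.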

And thus we return to cardiac troponins. In the patients I'm called to admit to the ICU who have sepsis, PE, COPD, pneumonia, respiratory failure, renal failure, or metabolic acidosis, a mildly positive troponin (a COMMON occurrence) is almost ALWAYS an epiphenomenon of critical illness rather than an acute myocardial infarction. Moreover, pursuing the diagnosis via cardiac catheterization, or empirically treating with antiplatelet agents and anticoagulants, is almost always a therapeutic misadventure in these patients, who are at much greater risk of bleeding and renal failure from interventions that are expected to have a much reduced positive utility for them. More often than not, I would just rather not know the results of a troponin test outside the setting of isolated acute chest pain. Other practitioners should be acutely aware of the patient populations in which these tests were validated, and the significant limitations of using these highly sensitive tests in other clinical scenarios.

Saturday, March 14, 2009

"Statistical Slop": What billiards can teach us about multiple comparisons and the need to assign primary endpoints

Anyone who has played pool knows that you have to call your shots before you make them. This rule is intended to decrease the probability of "getting lucky" by just hitting the cue ball as hard as you can, expecting that the more it bounces around the table, the more likely it is that one of your many balls will fall through chance alone. Sinking a ball without first calling it is referred to colloquially as "slop" or a "slop shot".

The underlying logic is that you know best which shot you're MOST likely to make successfully, so not only does calling it increase the prior probability of a skilled versus a lucky shot (especially if it is a complex shot, such as one "off the rail"), but it also effectively reduces the number of chances the cue ball has to sink one of your balls without you losing your turn. It reduces those multiple chances to one single chance.

Likewise, a clinical trialist must focus on one "primary outcome" for two reasons: 1.) because preliminary data (if available), background knowledge, and logic will allow him to select the variable with the highest "pre-test probability" of causing the null hypothesis to be rejected, meaning that the post-test probability of the alternative hypothesis is enhanced; and 2.) because it reduces the probability of finding "significant" associations among multiple variables through chance alone. Today I came across a cute little experiment that drives this point home quite well. The abstract can be found here on PubMed: http://www.ncbi.nlm.nih.gov/pubmed/16895820?ordinalpos=4&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .
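
The arithmetic behind reason 2.) is worth making explicit: if each of k independent endpoints is tested at alpha = 0.05, the chance that at least one comes up "significant" through luck alone is 1 - 0.95^k. A minimal sketch:

```python
alpha = 0.05
for k in (1, 5, 10, 20):
    # Probability that at least one of k independent endpoints
    # crosses the significance threshold by chance alone:
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} endpoints -> {p_any:.0%} chance of a spurious 'hit'")
```

With 20 uncalled shots, you have better-than-even odds (about 64%) of sinking something by accident.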


In it, the authors describe "dredging" a Canadian database and looking for correlations between astrological signs and various diagnoses. Significant associations were found between the Leo sign and gastrointestinal hemorrhage, and between the Sagittarius sign and humerus fracture. With this "analogy of extremes," as I like to call it, you can clearly see how the failure to define a prospective primary endpoint can lead to statistical slop. (Nobody would have been able to predict a priori that it would be THOSE two diagnoses associated with THOSE two signs!) Failure to PROSPECTIVELY identify ONE primary endpoint led to multiple chances for chance associations. Moreover, because there were no preliminary data upon which to base a primary hypothesis, the prior probability of any given alternative hypothesis was markedly reduced, and thus the posterior probability of the alternative hypothesis remains low IN SPITE OF the statistically significant result.
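
You can generate this kind of slop on demand. The sketch below simulates a purely random registry, 12 signs and 24 diagnoses with no true associations, and runs a chi-square test per diagnosis; it is a hypothetical reconstruction of the dredging exercise, not the authors' actual Canadian data.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
n_patients, n_signs, n_diagnoses = 10_000, 12, 24

# Assign each patient a random astrological sign (pure noise).
signs = rng.integers(0, n_signs, size=n_patients)

spurious = 0
for _ in range(n_diagnoses):
    # Each diagnosis occurs at a 3% base rate, independent of sign.
    has_dx = rng.random(n_patients) < 0.03
    table = np.zeros((2, n_signs), dtype=int)
    np.add.at(table, (has_dx.astype(int), signs), 1)
    chi2, p, dof, _ = chi2_contingency(table)
    spurious += p < 0.05

print(f"{spurious} of {n_diagnoses} random diagnoses reached p < 0.05")
```

At the 5% level you expect roughly one spurious association per 20 tests even when the data contain no signal at all, which is exactly what the astrology exercise demonstrated.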

It is for this very reason that "positive" or significant associations among non-primary endpoint variables in clinical trials are considered "hypothesis generating" rather than hypothesis confirming. Requiring additional studies of these associations as primary endpoints is like telling your slop-shot partner in the pool hall, "That's great, but I need to see you do that double rail shot again to believe that it's skill rather than luck."

Reproducibility of results is indeed the hallmark of good science.