
Sunday, July 21, 2019

Move Over Feckless Extubation, Make Room For Reckless Extubation

Following the theme of some recent posts on Status Iatrogenicus (here and here) about testing and treatment thresholds, one of our stellar fellows, Meghan Cirulis MD, and I wrote a letter to the editor of JAMA about the recent article by Subira et al comparing shorter-duration Pressure Support Ventilation to longer-duration T-piece trials. Despite adhering to my well-hewn formula for letters to the editor, it was not accepted, so, as is my custom, I will publish it here.

Spoiler alert: when the patients you enroll in your weaning trial have a base rate of extubation success of 93%, you should not be doing an SBT at all - you should be extubating every one of them, then figuring out why your enrollment criteria are so stringent, and how many extubatable patients those criteria miss because of their low sensitivity (and high specificity).
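
To make the threshold point concrete, here is a minimal sketch of the Bayesian arithmetic in Python, assuming purely hypothetical operating characteristics for the SBT (90% sensitivity and 50% specificity for predicting extubation success - numbers invented for illustration, not taken from Subira et al):

    # Post-test probability of extubation success after an SBT, given a
    # high pre-test probability (base rate). Sensitivity and specificity
    # here are hypothetical, for illustration only.

    def post_test_probs(pretest, sens, spec):
        """Return P(success | SBT passed) and P(success | SBT failed)."""
        p_pass = sens * pretest + (1 - spec) * (1 - pretest)
        p_fail = (1 - sens) * pretest + spec * (1 - pretest)
        return sens * pretest / p_pass, (1 - sens) * pretest / p_fail

    given_pass, given_fail = post_test_probs(pretest=0.93, sens=0.90, spec=0.50)
    print(f"P(success | passed SBT): {given_pass:.2f}")  # ~0.96 - barely moved
    print(f"P(success | failed SBT): {given_fail:.2f}")  # ~0.73 - still favors extubation

Under these assumptions even a failed SBT leaves roughly a 73% chance of successful extubation, which is the point: at a 93% base rate, the test result can hardly move you across any sensible treatment threshold.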

Friday, March 5, 2010

Levo your Dopa at the Door - how study design influences our interpretation of reality

Another excellent critical care article was published this week in NEJM, the SOAP II study: http://content.nejm.org/cgi/content/short/362/9/779 . In this RCT of norepinephrine (norepi, levophed, or "levo" for short) versus dopamine ("dopa" for short) for the treatment of shock, the authors tried to resolve the longstanding uncertainty and debate surrounding the treatment of patients in various shock states. Proponents of any agent in this debate have often hung their hats on extrapolations of physiological and pharmacological principles to intact humans, leading to colloquialisms such as "leave-em-dead" for levophed and "renal-dose dopamine". This blog has previously emphasized the frailty of pathophysiological reasoning, the same reasoning which has irresistibly drawn cardiologists and nephrologists to dopamine because of its presumed beneficial effects on cardiac and urine output, and, by association, outcomes.

Hopefully all docs with a horse in this race will take note of the outcome of this study. In its simplest, most straightforward, and technically correct interpretation, levo was not superior to dopa in terms of an effect on mortality, but it was indeed superior in terms of side effects, particularly cardiac arrhythmias (a secondary endpoint). The direction of the mortality trend favored levo, consistent with observational data (the SOAP I study, by many of the same authors) showing reduced mortality with levo compared with dopa in the treatment of shock. As followers of this blog also know, the interpretation of "negative" studies (that is, MOST studies in critical care medicine - more on that in a future post) can be more challenging than the interpretation of positive studies, because "absence of evidence is not evidence of absence".

We could go to the statistical analysis section, and I could harp on the choice of delta, the decision to base it on a relative risk reduction, the failure to specify a baseline mortality, etc. (I will note that at least the authors defended their delta based on prior data, something that is a rarity - again, a future post will focus on this.) But let's just be practical and examine the 95% CI of the mortality difference (the primary endpoint) and try to determine whether it contains or excludes any clinically meaningful values that may allow us to compare these two treatments. First, we have to go to the raw data and find the 95% CI of the ARR, because, as you know, the odds ratio can inflate small differences. That is, if the baseline is 1%, then a statistically significant odds ratio of 1.4 is not meaningful, because it represents only a 0.4% increase in the outcome - minuscule. With Stata, we find that the ARR is 4.0%, with a 95% CI of -0.76% (favors dopamine) to +8.8% (favors levo). Wowza! Suppose we say that a 3% difference in mortality in either direction is our threshold for CLINICAL significance. This 95% CI includes a whole swath of values between 3% and 8.8% that are of interest to us, and they are all in favor of levo. (Recall that perhaps the most lauded trial in critical care medicine, the ARDSnet ARMA study, reduced mortality by just 9%.) On the other side of the spectrum, the range of values in favor of dopa is quite narrow indeed - from 0% to -0.76%, all well below our threshold for clinical significance (that is, the minimal clinically important difference, or MCID) of 3%. So indeed, this study surely seems to suggest that if we ever have to choose between these two widely available and commonly used agents, the cake goes to levo, hands down. I hardly need a statistically significant result with a 95% CI like this one!
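
(For readers without Stata, here is a quick back-of-the-envelope check in Python. The event counts - 450/858 deaths with dopamine and 398/821 with norepinephrine - are my back-calculation from the published 52.5% vs. 48.5% mortality figures, so treat them as an assumption.)

    from math import sqrt

    # 28-day deaths / patients per arm, back-calculated from the published
    # 52.5% vs 48.5% mortality in SOAP II - treat these counts as assumptions
    d1, n1 = 450, 858  # dopamine
    d2, n2 = 398, 821  # norepinephrine

    p1, p2 = d1 / n1, d2 / n2
    arr = p1 - p2  # absolute risk reduction favoring levo

    # Wald 95% CI for the difference of two proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    print(f"ARR = {arr:.1%}, 95% CI {arr - 1.96 * se:.1%} to {arr + 1.96 * se:.1%}")
    # -> ARR = 4.0%, 95% CI -0.8% to 8.8%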

So, then, why was the study deemed "negative"? There are a few reasons. Firstly, the trial is probably guilty of "delta inflation" whereby investigators seek a pre-specified delta that is larger than is realistic. While they used, ostensibly, 7%, the value found in the observational SOAP I study, they did not account for regression to the mean, or allow any buffer for the finding of a smaller difference. However, one can hardly blame them. Had they looked instead for 6%, and had the 4% trend continued for additional enrollees, 300 additional patients in each group (or about 1150 in each arm) would have been required and the final P-value would have still fallen short at 0.06. Only if they had sought a 5% delta, which would have DOUBLED the sample size to 1600 per arm, would they have achieved a statistically significant result with 4% ARR, with P=0.024. Such is the magnitude of the necessary increase in sample size as you seek smaller and smaller deltas.
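
To see the arithmetic behind those numbers, here is a sketch of the power calculation at several deltas. It assumes a two-sided alpha of 0.05, 80% power, and a control-arm mortality near 52.5%; these inputs are my assumptions, not figures from the published protocol:

    from math import sqrt, erf

    Z_A, Z_B = 1.96, 0.8416  # two-sided alpha = 0.05; power = 80%
    P_CTRL = 0.525           # assumed control-arm (dopamine) mortality

    def n_per_arm(delta):
        """Patients per arm needed to detect an absolute difference delta."""
        p1, p2 = P_CTRL, P_CTRL - delta
        return (Z_A + Z_B) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / delta ** 2

    def p_two_sided(delta_obs, n):
        """Two-sided P for an observed difference delta_obs with n per arm."""
        p1, p2 = P_CTRL, P_CTRL - delta_obs
        z = delta_obs / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
        return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))

    for delta in (0.07, 0.06, 0.05):
        n = n_per_arm(delta)
        print(f"delta {delta:.0%}: ~{n:.0f} per arm; "
              f"P for an observed 4% ARR = {p_two_sided(0.04, n):.3f}")

With these assumptions the calculation lands close to the figures above - roughly 800, 1100, and 1600 patients per arm, with an observed 4% ARR reaching P of about 0.06 and 0.02 at the two larger sample sizes; the small discrepancies come down to rounding and the exact baseline assumed.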

Which brings me to the second issue. If delta inflation leads to negative studies, and logistical and financial constraints prohibit the enrollment of massive numbers of patients, what is an investigator to do? Sadly, the poor investigator wishing to publish in the NEJM, or indeed any peer-reviewed journal, is hamstrung by conventions that few these days really understand: namely, the mandatory use of 0.05 for alpha and "doubly significant" power calculations for hypothesis testing. I will not comment more on the latter, other than to say that interested readers can google this and find some interesting, if arcane, material. As regards the former, a few comments.

The choice of 0.05 for the type 1 error rate - that is, the probability that we will reject the null hypothesis based on the data and falsely conclude that one therapy is superior to the other - and of 10-20% for the type 2 error rate (power 80-90%) - that is, the probability that, even when the alternative hypothesis is really true, we will fail to reject the null based on the data - derive from the traditional assumption, which is itself an omission bias, that it is better in the name of safety to keep new agents out of practice by making the requirement for accepting efficacy more stringent than the requirement for rejecting it. This asymmetry in the design of trials is of dubious rationality from the outset (because it is an omission bias), but it is especially nettlesome when the trial compares two agents already in widespread use. In the trial of a new drug against placebo, we want to set the hurdle for declaring efficacy high, especially when the drug might have side effects; with levo versus dopa, the real risk is that we will continue to consider them equivalent choices when there is strong reason, based on previous or current data, to favor one over the other. This is NOT a trial of treatment versus no treatment of shock; this trial assumes that you're going to treat the shock with SOMETHING. In a trial such as this one, one could make a strong argument that a P-value of 0.10 should be the threshold for statistical significance. In my mind it should have been.

But as long as the perspicacious consumer of the literature and reader of this blog takes P-values with a grain of salt and pays careful attention to the confidence intervals and the MCID (whatever that may be for the individual), s/he will not be misled by the deeply entrenched convention of alpha at 0.05, power at 90%, and delta wildly inflated to keep the editors and funding agencies mollified.

Sunday, August 5, 2007

AVANDIA and Omission Bias

Amid all the recent hype about Avandia, a few relatively clear-cut observations are apparent, most of which are described better than I could hope to in the July 5 issue of NEJM (Drazen et al, Dean, and Psaty each wrote wonderful editorials, available at www.nejm.org).

1.) Avandia appears to have NO benefits besides the surrogate endpoint of improved glycemic control (and engorging the coffers of GSK, the manufacturer).

2.) Avandia may well increase the risk of CHF and MI, raise LDL cholesterol, cause weight gain, and increase the risk of fractures (the last in women).

3.) Numerous alternative agents exist, some of which improve primary outcomes (think UKPDS and metformin), and most of which appear to be safer.

So, what physician in his right mind would start a patient on Avandia (especially in light of #3)? And if you would not START a patient on Avandia, then you should STOP Avandia in patients who are already taking it.


To not do so would be to commit OMISSION BIAS - the tendency (in medicine and in life) to view the risks and/or consequences of doing nothing as preferable to the risks and/or consequences of acting, even when the converse is true (i.e., even when the risks and consequences of acting are the more favorable ones). (For a reference, indulge me: Aberegg et al, http://www.chestjournal.org/cgi/content/abstract/128/3/1497.)

This situation is reminiscent of the recommendations on the overall (read "net") health benefits of ethanol consumption: physicians are told not to discourage moderate alcohol consumption in patients who already consume, but also not to encourage it in those who currently abstain. Well, alcohol is either good for you, or it is not. And since it appears to be good for you, the recommendation on its consumption should not hinge one iota on an arbitrarily established status quo (whether a person currently drinks, often for reasons completely unrelated to health).
(For a reference, see Malinski et al: http://archinte.ama-assn.org/cgi/content/abstract/164/6/623; the last paragraph of the discussion could serve as an exposé on omission bias.)

So, let me go out on a limb here: Nobody should be taking Avandia, and use of this medication should not resume until some study demonstrates a substantive benefit in a meaningful outcome that outweighs any risks associated with the drug. Until then, we are the victims of OMISSION BIAS (+/- status quo bias) and the profiteering conspiracy of GSK, which is beautifully alluded to - along with a poignant description of the probably intentional shortcomings in the design and conduct of the RECORD trial - by Psaty and Furberg here: http://content.nejm.org/cgi/content/extract/356/24/2522.

Thursday, July 19, 2007

The WAVE trial: The Canadians set the standard once again

Today's NEJM contains the report of an exemplary trial (the WAVE trial) comparing aspirin to aspirin and warfarin combined in the prevention of cardiovascular events in patients with peripheral vascular disease (http://content.nejm.org/cgi/reprint/357/3/217.pdf). Though this was a "negative" trial in that there was no statistically significant difference in the outcomes between the two treatment groups, I am struck by several features of its design that are worth mentioning.

Although the trial was the beneficiary of pharmaceutical funding, the authors state:

"None of the corporate sponsors had any role in the design or conduct of the trial, analysis of the data, or preparation of the manuscript".

Ideally, this would be true of all clinical trials, but right now it's a precocious idea.



One way to remove any potential or perceived conflicts of interest might be to mandate that no phase 3 study be designed, conducted, or analyzed by its sponsor. Rather, phase 3 trials could be funded by a sponsor, but are mandated to be designed, conducted, analyzed, and reported by an independent agency consisting of clinical trials experts, biostatisticians, etc. Such an agency might also receive infrastructural support from governmental agencies. It would have to be large enough to handle the volume of clinical trials, and large enough that a sponsor would not be able to know to what ad hoc design committee the trial would be assigned, thereby preventing unscrupulous sponsors from "stacking the deck" in favor of the agent in which they have an interest.

The authors of the current article also clearly define and describe the inclusion and exclusion criteria for the trial, and these are not overly restrictive, increasing the generalizability of the results. Moreover, the rationale for the parsimonious inclusion and exclusion criteria is intuitively obvious, unlike in some trials where the reader is left to guess why the authors excluded a particular subgroup. Was it because the agent was thought not to work in that group? Because increased risk was expected in that group? Because study was too difficult (ethically or logistically) in that group (e.g., pregnancy)? Inadequate justification of inclusion and exclusion criteria makes it difficult for practitioners to determine how to incorporate the findings into clinical practice. For example, were pregnant patients excluded from trials of therapeutic hypothermia after cardiac arrest (http://content.nejm.org/cgi/reprint/346/8/549.pdf) for ethical reasons, because of an increased risk to the mother or fetus, because small numbers of pregnant patients were expected, because the IRB frowns upon their inclusion, or for some other reason? Without knowing this, it is difficult to know what to do with a pregnant woman who is comatose following cardiac arrest. Obviously, their lack of inclusion in the trial does not mean that this therapy is not efficacious for them (absence of evidence is not evidence of absence). If I knew that they were excluded because of a biologically plausible concern for harm to the fetus (and I can think of at least one) rather than because of IRB concerns, I would be better prepared to make a decision about this therapy when faced with a pregnant patient after cardiac arrest. Improving the reporting and justification of inclusion and exclusion criteria should be part of efforts to improve the quality of reporting of clinical trials.

Interestingly, the authors also present an analysis of the composite endpoints (coprimary endpoints 1 and 2) that excludes fatal bleeding and hemorrhagic stroke. When these side effects are excluded from the composite endpoints, there is a trend favoring combination therapy (P-values 0.11 and 0.09, respectively). Composite endpoints are useful because they allow a trial of a given size to have greater statistical power, and it is rational to include side effects in them, since side effects reduce the net value of the therapy. However, an economist or a person versed in expected utility theory (EUT) would say that it is not fair to combine these endpoints without first weighting them by their relative (positive or negative) values. Not weighting them implies that an episode of severe bleeding in this trial is as bad (in negative value or utility) as a death - a contention that I, for one, would not support. I would much rather bleed than die, or have a heart attack for that matter. Bleeding can usually be readily and effectively treated.

In the future, it may be worthwhile to think more about composite endpoints if we are really interested in the net value/utility of a therapy. While it is often difficult to assign a relative value to different outcomes, methods (such as standard gambles) exist and such assignment may be useful in determining the true net value (to society or to a patient) of a new therapy.
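
As a toy illustration of what such weighting might look like, here is a sketch that scores composite events by (entirely hypothetical) relative disutilities - death = 1.0, MI = 0.5, severe bleed = 0.2 - instead of counting them equally. The weights and event counts are invented for illustration; they are not the WAVE data:

    # Utility-weighted composite endpoint: weight each component event by its
    # relative disutility instead of counting all components equally.
    # Weights and event counts are hypothetical, for illustration only.

    WEIGHTS = {"death": 1.0, "mi": 0.5, "severe_bleed": 0.2}

    def weighted_rate(events, n):
        """Mean per-patient disutility for one trial arm."""
        return sum(WEIGHTS[k] * c for k, c in events.items()) / n

    # Invented counts for two arms of 1000 patients each
    arms = {
        "aspirin alone":      {"death": 60, "mi": 90, "severe_bleed": 10},
        "aspirin + warfarin": {"death": 55, "mi": 75, "severe_bleed": 40},
    }

    for name, events in arms.items():
        raw = sum(events.values()) / 1000
        print(f"{name}: raw composite {raw:.3f}, weighted {weighted_rate(events, 1000):.3f}")

    # The unweighted composite favors aspirin alone because it counts a severe
    # bleed as equal to a death; the weighted composite flips the comparison
    # in favor of combination therapy.

The numbers are made up, but the structural point stands: whenever the components of a composite differ greatly in (dis)utility, the unweighted count and the weighted score can point in opposite directions.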