Showing posts with label delta inflation. Show all posts

Thursday, January 5, 2017

RCT Autopsy: The Differential Diagnosis of a Negative Trial

At many institutions, Journal Clubs meet to dissect a trial after its results are published to look for flaws, biases, shortcomings, limitations.  Beyond the dissemination of the informational content of the articles that are reviewed, Journal Clubs serve as a reiteration and extension of the limitations part of the article discussion.  Unless they result in a letter to the editor, or a new peer-reviewed article about the limitations of the trial that was discussed, the debates of Journal Club begin a headlong recession into obscurity soon after the meeting adjourns.

The proliferation and popularity of online media has led to what amounts to a real-time, longitudinally documented Journal Club.  Named “post-publication peer review” (PPPR), it consists of blog posts, podcasts and videocasts, comments on research journal websites, remarks on online media outlets, and websites dedicated specifically to PPPR.  Like a traditional Journal Club, PPPR seeks to redress any deficiencies in the traditional peer review process that lead to shortcomings or errors in the reporting or interpretation of a research study.

PPPR following publication of a “positive” trial, that is, one where the authors conclude that their a priori criteria for rejecting the null hypothesis were met, is oftentimes directed at the identification of a host of biases in the design, conduct, and analysis of the trial that may have led to a “false positive” trial.  False positive trials are those in which either a type I error has occurred (the null hypothesis was rejected even though it is true and no difference between groups exists), or the structure of the experiment was biased in such a way that the experiment and its statistics cannot be informative.  The biases that cause structural problems in a trial are manifold, and I may attempt to delineate them at some point in the future.  Because it is a simpler task, I will here attempt to list a differential diagnosis that people may use in PPPRs of “negative” trials.

Thursday, September 8, 2016

Hiding the Evidence in Plain Sight: One-sided Confidence Intervals and Noninferiority Trials

In the last post, I linked a video podcast in which I explain non-inferiority trials and their inherent biases.  In this videocast, I revisit noninferiority trials and the use of one-sided confidence intervals.  I review the Salminen et al noninferiority trial of antibiotics versus appendectomy for the treatment of acute appendicitis in adults.  This trial uses a very large delta of 24%.  The criteria for non-inferiority were not met even with this promiscuous delta.  But the use of a one-sided 95% confidence interval concealed a more damning revelation in the data.  Watch the 13-minute videocast to learn what was hidden in plain sight!
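
For the quantitatively inclined, here is a rough sketch of the arithmetic at play (plain Python, standard library only).  The counts below are illustrative assumptions chosen to approximate the proportions discussed in the videocast - they are not a transcription of the published tables - and the point is simply to show how reporting only the single one-sided bound that gets compared with delta can leave the other, more damning bound unstated.

```python
import math

def risk_difference(successes_a, n_a, successes_b, n_b):
    """Difference in success proportions (arm A minus arm B) and its standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, se

# Illustrative counts only (assumed for this sketch, NOT the published tables):
# antibiotic arm ~72.7% treatment success, appendectomy arm ~99.6% success.
diff, se = risk_difference(186, 256, 272, 273)   # antibiotics minus appendectomy

z_one = 1.645   # 95% one-sided
z_two = 1.96    # 95% two-sided

print(f"point estimate of the difference: {diff:.3f}")   # about -0.27
print(f"one-sided 95% lower bound (compared against delta = -0.24): {diff - z_one * se:.3f}")
print(f"two-sided 95% CI: ({diff - z_two * se:.3f}, {diff + z_two * se:.3f})")
# The single reported bound answers only the non-inferiority question (and here
# it crosses the -0.24 margin, so non-inferiority fails).  The other bound sits
# far below an absolute risk difference of zero - that is the part of the story
# a one-sided interval leaves unstated.
```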

Erratum:  at 1:36 I say "excludes an absolute risk difference of 1" and I meant to say "excludes an absolute risk difference of ZERO."  Similarly, at 1:42 I say "you can declare non-inferiority".  Well, that's true, you can declare noninferiority if your entire 95% confidence interval falls to the left of an ARD of 0 or a HR of 1, but what I meant to say is that if that is the case "you can declare superiority."

Also, at 7:29, I struggle to remember the numbers (woe is my memory!) and I place the point estimate of the difference, 0.27, to the right of the delta dashed line at .24.  This was a mistake which I correct a few minutes later at 10:44 in the video.  Do not let it confuse you, the 0.27 point estimates were just drawn slightly to the right of delta and they should have been marked slightly to the left of it.  I would re-record the video (labor intensive) or edit it, but I'm a novice with this technological stuff, so please do forgive me.

Finally, at 13:25 I say "within which you can hide evidence of non-inferiority" and I meant "within which you can hide evidence of inferiority."

Again, I apologize for these gaffes.  My struggle (and I think about this stuff a lot) in speaking about and accurately describing these confidence intervals and the conclusions that derive from them results from the arbitrariness of the CONSORT "rules" about interpretation and the arbitrariness of the valences (some articles use a negative valence for differences favoring "new"; some journals use a positive valence to favor "new").  If I struggle with it, many other readers, I'm sure, also struggle to keep things straight.  This is fodder for the argument that these "rules" ought to be changed and made more uniform, for consistency and ease of understanding and interpretation of non-inferiority trials.

It made me feel better to see this diagram in Annals of Internal Medicine (Perkins et al, July 3, 2012, online ACLS training), where they incorrectly place the point estimate at slightly less than -6% (to the left of the dashed delta line in Figure 2), when it should have been placed slightly greater than -6% (to the right of the dashed delta line).

Wednesday, December 23, 2015

Narrated and Abridged: There is (No) Evidence for That: Epistemic Problems in Critical Care Medicine

Below is the narrated video of my PowerPoint presentation on Epistemic Problems in Critical Care Medicine, which provides a framework for understanding why we have both false positives and false negatives in clinical trials in critical care medicine, and why we should be circumspect about our "evidence base" and our "knowledge".  This is not trivial stuff, and it is worth the 35 minutes required to watch the narration of the slideshow.  It is a provocative presentation that gives compelling reasons to challenge our "evidence base" in critical care and in medicine in general, in ways that are not widely recognized but perhaps should be, along with several suggestions about assumptions that need to be challenged and revised to make our models of reality more reliable.  Please contact me if you would like me to give an iteration of this presentation at your institution.


Wednesday, July 22, 2015

There is (No) Evidence For That: Epistemic Problems in Evidence Based Medicine

Below is a PowerPoint presentation that I have delivered several times recently, including one iteration at the SMACC conference in Chicago.  It addresses epistemic problems in our therapeutic knowledge, and calls into question all claims of "there is evidence for ABC" and "there is no evidence for ABC."  Such claims cannot be taken at face value and need deeper consideration and evaluation considering all possible states of reality - gone is the cookbook or algorithmic approach to evidence appraisal as promulgated by the Users' Guides.  Considered in the presentation are therapies for which we have no evidence but which undoubtedly work (Category 1 - Parachutes), and therapies for which we have evidence of efficacy or lack thereof (Category 2), but where that evidence is subject to false positives and false negatives for numerous reasons, including: the Ludic Fallacy, study bias (See: Why Most Published Research Findings Are False), type 1 and 2 errors, the "alpha bet" (the arbitrary and lax standard used for alpha, namely 0.05), Bayesian interpretations, stochastic dominance of the null hypothesis, and inadequate study power in general and that due to delta inflation and subversion of double significance hypothesis testing.  These are all topics that have been previously addressed to some degree on this blog, but this presentation brings them together as a framework for understanding the epistemic problems that arise within our "evidence base."  It also provides insights into why we have a generation of trials in critical care whose results converge on the null, and why positive studies in this field cannot be replicated.

Friday, May 31, 2013

Over Easy? Trials of Prone Positioning in ARDS

Published May 20 in the NEJM to coincide with the ATS meeting is the (latest) Guerin et al study of Prone Positioning in ARDS.  The editorialist was impressed.  He thinks that we should start proning patients similar to those in the study.  Indeed, the study results are impressive:  a 16.8% absolute reduction in mortality between the study groups, with a corresponding P-value of less than 0.001.  But before we switch our tastes from sunny side up to over easy (or in some cases, over hard - referred to as the "turn of death" in ICU vernacular), we should consider some general principles as well as about a decade of other studies of prone positioning in ARDS.

First, a general principle:  regression to the mean.  Few, if any, therapies in critical care (or in medicine in general) confer a mortality benefit this large.  I refer the reader (again) to our study of delta inflation, which tabulated over 30 critical care trials in the top 5 medical journals over 10 years and showed that few critical care trials show mortality deltas (absolute mortality differences) greater than 10%.   Almost all those that do are later refuted.  Indeed, it was our conclusion that searching for deltas greater than or equal to 10% is akin to a fool's errand, so low is the probability of finding such a difference.  Jimmy T. Sylvester, my attending at JHH in late 2001, had already recognized this.  When the now infamous sentinel trial of intensive insulin therapy (IIT) was published, we discussed it at our ICU pre-rounds lecture and he said something like "Either these data are faked, or this is revolutionary."  We now know that there was no revolution (although many ICUs continue to practice as if there had been one).  He could have just as easily said that this is an anomaly that will regress to the mean, that there is inherent bias in this study, or that "trials stopped early for benefit...."

Monday, May 20, 2013

It All Hinges on the Premises: Prophylactic Platelet Transfusion in Hematologic Malignancy


A quick update before I proceed with the current post:  The Institute of Medicine has met and they agree with me that sodium restriction is for the birds.  (Click here for a New York Times summary article.)  In other news, the oh-so-natural Omega-3 fatty acid panacea did not improve cardiovascular outcomes as reported in the NEJM on May 9th, 2013.

An article by the TOPPS investigators in the May 9th NEJM is very useful to remind us not to believe everything we read, to always check our premises, and that some data are so dependent on the perspective from which they're interpreted or the method or stipulations of analysis that they can be used to support just about any viewpoint.

The authors sought to determine if a strategy of withholding prophylactic platelet transfusions for platelet counts below 10,000 in patients with hematologic malignancy was non-inferior to giving prophylactic platelet transfusions.  I like this idea, because I like "less is more" and I think the body is basically antifragile.  But non-inferior how?  And what do we mean by non-inferior in this trial?

Friday, March 5, 2010

Levo your Dopa at the Door - how study design influences our interpretation of reality

Another excellent critical care article was published this week in NEJM, the SOAP II study: http://content.nejm.org/cgi/content/short/362/9/779 . In this RCT of norepinephrine (norepi, levophed, or "levo" for short) versus dopamine ("dopa" for short) for the treatment of shock, the authors tried to resolve the longstanding uncertainty and debate surrounding the treatment of patients in various shock states. Proponents of any agent in this debate have often hung their hats on extrapolations of physiological and pharmacological principles to intact humans, leading to colloquialisms such as "leave-em-dead" for levophed and "renal-dose dopamine". This blog has previously emphasized the frailty of pathophysiological reasoning, the same reasoning which has irresistibly drawn cardiologists and nephrologists to dopamine because of its presumed beneficial effects on cardiac and urine output, and, by association, outcomes.

Hopefully all docs with a horse in this race will take note of the outcome of this study. In its simplest and most straightforward and technically correct interpretation, levo was not superior to dopa in terms of an effect on mortality, but was indeed superior in terms of side effects, particularly cardiac arrhythmias (a secondary endpoint). The direction of the mortality trend was in favor of levo, consistent with observational data (the SOAP I study by many of the same authors) showing reduced mortality with levo compared with dopa in the treatment of shock. As followers of this blog also know, the interpretation of "negative" studies (that is, MOST studies in critical care medicine - more on that in a future post) can be more challenging than the interpretation of positive studies, because "absence of evidence is not evidence of absence".

We could go to the statistical analysis section, and I could harp on the choice of delta, the decision to base it on a relative risk reduction, the failure to predict a baseline mortality, etc. (I will note that at least the authors defended their delta based on prior data, something that is a rarity - again, a future post will focus on this.) But let's just be practical and examine the 95% CI of the mortality difference (the primary endpoint) and try to determine whether it contains or excludes any clinically meaningful values that may allow us to compare these two treatments. First, we have to go to the raw data and find the 95% CI of the ARR, because the odds ratio can inflate small differences, as you know. That is, if the baseline is 1%, then a statistically significant odds ratio of 1.4 is not meaningful because it represents only a 0.4% increase in the outcome - minuscule. With Stata, we find that the ARR is 4.0%, with a 95% CI of -0.76% (favors dopamine) to +8.8% (favors levo). Wowza! Suppose we say that a 3% difference in mortality in either direction is our threshold for CLINICAL significance. This 95% CI includes a whole swath of values between 3% and 8.8% that are of interest to us, and they are all in favor of levo. (Recall that perhaps the most lauded trial in critical care medicine, the ARDSnet ARMA study, reduced mortality by just 9%.) On the other end of the spectrum, the range of values in favor of dopa is quite narrow indeed - from 0% to -0.76%, all well below our threshold for clinical significance (that is, the minimal clinically important difference or MCID) of 3%. So indeed, this study surely seems to suggest that if we ever choose between these two widely available and commonly used agents, the cake goes to levo, hands down. I hardly need a statistically significant result with a 95% CI like this one!
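
(If you would like to check this arithmetic at home without Stata, a minimal Python sketch does the job.  The event counts below are assumptions I have chosen so that the ARR comes out near the 4% quoted above - they are not the published counts - so the output lands close to, but not exactly on, the reported interval.)

```python
import math

def arr_with_ci(deaths_a, n_a, deaths_b, n_b, z=1.96):
    """Absolute risk reduction (arm A minus arm B) with a Wald 95% CI and a
    two-sided p-value from the normal approximation."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    arr = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(arr) / se / math.sqrt(2))))
    return arr, (arr - z * se, arr + z * se), p_value

# Assumed counts, chosen only so that the ARR works out to roughly the 4%
# quoted above; these are NOT the published numbers.
arr, (lo, hi), p = arr_with_ci(deaths_a=450, n_a=858, deaths_b=398, n_b=821)
print(f"ARR = {arr:.1%}, 95% CI {lo:.2%} to {hi:.2%}, two-sided p ~ {p:.2f}")
# -> roughly ARR 4%, CI -0.8% to +8.8%, p ~ 0.10: the swath above a 3% MCID
#    all favors levo, while the sliver favoring dopa never gets near it.
```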

So, then, why was the study deemed "negative"? There are a few reasons. Firstly, the trial is probably guilty of "delta inflation" whereby investigators seek a pre-specified delta that is larger than is realistic. While they used, ostensibly, 7%, the value found in the observational SOAP I study, they did not account for regression to the mean, or allow any buffer for the finding of a smaller difference. However, one can hardly blame them. Had they looked instead for 6%, and had the 4% trend continued for additional enrollees, 300 additional patients in each group (or about 1150 in each arm) would have been required and the final P-value would have still fallen short at 0.06. Only if they had sought a 5% delta, which would have DOUBLED the sample size to 1600 per arm, would they have achieved a statistically significant result with 4% ARR, with P=0.024. Such is the magnitude of the necessary increase in sample size as you seek smaller and smaller deltas.
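
(To see how steep that trade-off is, here is a back-of-the-envelope sketch using the standard normal-approximation formula for comparing two proportions.  The baseline mortality of 50%, two-sided alpha of 0.05, and 80% power are my own assumptions for illustration - the trial's actual planning inputs may have differed - but the flavor of the numbers is the same.)

```python
import math

def n_per_arm(p_control, delta, z_alpha=1.96, z_beta=0.8416):
    """Normal-approximation sample size per arm for a two-proportion comparison
    (defaults: two-sided alpha = 0.05, power = 80%)."""
    p_treat = p_control - delta
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Assumed baseline mortality of 50%, purely for illustration:
for delta in (0.07, 0.06, 0.05):
    print(f"delta = {delta:.0%}: about {n_per_arm(0.50, delta)} patients per arm")
# Required n scales roughly with 1/delta^2, so halving the sought-after delta
# roughly quadruples the trial - which is exactly why inflated deltas are so
# tempting to investigators and funders alike.
```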

Which brings me to the second issue. If delta inflation leads to negative studies, and logistical and financial constraints prohibit the enrollment of massive numbers of patients, what is an investigator to do? Sadly, the poor investigator wishing to publish in the NEJM or indeed any peer reviewed journal is hamstrung by conventions that few these days even really understand anymore: namely, the mandatory use of 0.05 for alpha and "doubly significant" power calculations for hypothesis testing. I will not comment more on the latter other than to say that interested readers can google this and find some interesting, if arcane, material. As regards the former, a few comments.

The choice of 0.05 for the type 1 error rate, that is, the probability that we will reject the null hypothesis based on the data and falsely conclude that one therapy is superior to the other, and the choice of 10-20% for the type 2 error rate (power 80-90%), that is, the probability that the alternative hypothesis is really true but we will fail to detect it based on the data, derive from the traditional assumption, which is itself an omission bias, that it is better in the name of safety to keep new agents out of practice by having a more stringent requirement for accepting efficacy than for rejecting it. This asymmetry in the design of trials is of dubious rationality from the outset (because it is an omission bias), but it is especially nettlesome when the trial is comparing two agents already in widespread use. As opposed to the trial of a new drug compared to placebo, where we want to set the hurdle high for declaring efficacy, especially when the drug might have side effects - with levo versus dopa, the real risk is that we'll continue to consider them to be equivalent choices when there is strong reason to favor one over the other based either on previous or current data. This is NOT a trial of treatment versus no treatment of shock; this trial assumes that you're going to treat the shock with SOMETHING. In a trial such as this one, one could make a strong argument that a P-value of 0.10 should be the threshold for statistical significance. In my mind it should have been.

But as long as the perspicacious consumer of the literature and reader of this blog takes P-values with a grain of salt and pays careful attention to the confidence intervals and the MCID (whatever that may be for the individual), s/he will not be misled by the deeply entrenched convention of alpha at 0.05, power at 90%, and delta wildly inflated to keep the editors and funding agencies mollified.

Monday, September 21, 2009

The unreliable asymmetric design of the RE-LY trial of Dabigatran: Heads I win, tails you lose



I'm growing weary of this. I hope it stops. We can adapt the diagram of non-inferiority shenanigans from the Gefitinib trial (see http://medicalevidence.blogspot.com/2009/09/theres-no-such-thing-as-free-lunch.html ) to last week's trial of dabigatran, which came on the scene of the NEJM with another ridiculously designed non-inferiority trial (see http://content.nejm.org/cgi/content/short/361/12/1139 ). Here we go again.

These jokers, lulled by the corporate siren song of Boehringer Ingelheim, had the utter unmitigated gall to declare a delta of 1.46 (relative risk) as the margin of non-inferiority! Unbelievable! To say that a 46% difference in the rate of stroke or arterial clot is clinically non-significant! Seriously!?

They justified this felonious choice on the basis of trials comparing warfarin to PLACEBO as analyzed in a 10-year-old meta-analysis. It is obvious (or should be to the sentient) that an ex-post difference between a therapy and placebo in superiority trials does not apply to non-inferiority trials of two active agents. Any ex-post finding could be simply fortuitously large and may have nothing to do with the MCID (minimal clinically important difference) that is SUPPOSED to guide the choice of delta in a non-inferiority trial (NIT). That warfarin SMOKED placebo in terms of stroke prevention does NOT mean that something that does not SMOKE warfarin is non-inferior to warfarin. This kind of duplicitous justification is surely not what the CONSORT authors had in mind when they recommended a referenced justification for delta.

That aside, on to the study and the figure. First, we're testing two doses, so there are multiple comparisons, but we'll let that slide for our purposes. Look at the point estimate and 95% CI for the 110 mg dose in the figure (let's bracket the fact that they used one-sided 97.5% CIs - it's immaterial to this discussion). There is a non-statistically significant difference between dabigatran and warfarin for this dose, with a P-value of 0.34. But note that in Table 2 of the article, they declare that the P-value for "non-inferiority" is <0.001 [I've never even seen this done before, and I will have to look to see if we can find a precedent for reporting a P-value for "non-inferiority"]. Well, apparently this just means that the RR point estimate for 110 mg versus warfarin is statistically significantly different from a RR of 1.46. It does NOT mean, although it misleadingly suggests, that the comparison between the two drugs on stroke and arterial clot is highly clinically significant. This "P-value for non-inferiority" is just an artificial comparison: had we set the margin of non-inferiority at an even more ridiculously generous value, we could have made the "P-value for non-inferiority" as small as we like, simply by inflating the margin of non-inferiority! So this is a useless number, unless your goal is to create an artificial and exaggerated impression of the difference between these two agents.
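
(To make that concrete, here is a small Python sketch of how such a "P-value for non-inferiority" behaves as the margin grows.  The relative risk and confidence interval I plug in are illustrative placeholders of roughly the right order for the 110 mg comparison - not the published figures - and the shape of the result is the point.)

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_for_noninferiority(rr_hat, ci_lo, ci_hi, margin):
    """One-sided p-value against H0: true RR >= margin, computed on the log
    scale with the SE back-calculated from a two-sided 95% CI."""
    se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)
    z = (math.log(rr_hat) - math.log(margin)) / se
    return phi(z)

# Illustrative placeholder values (assumed), roughly the order of the 110 mg
# comparison: an observed RR near 0.9 that is nowhere near conventionally
# significant against RR = 1.
rr_hat, ci_lo, ci_hi = 0.91, 0.74, 1.11
for margin in (1.00, 1.10, 1.20, 1.46, 2.00):
    p = p_for_noninferiority(rr_hat, ci_lo, ci_hi, margin)
    print(f"margin {margin:.2f}: 'P-value for non-inferiority' = {p:.1e}")
# Against a margin of 1.00 (ordinary one-sided superiority) the p-value is
# unremarkable, but as the margin is inflated the 'P-value for non-inferiority'
# shrinks without limit - it measures the generosity of the margin as much as
# anything about the drug.
```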

Now let's look at the 150 mg dose. Indeed, it is statistically significantly different from warfarin (I shall resist using the term "superior" here), and thus the authors claim superiority. But here again, the 95% CI is narrower than the margin of non-inferiority, and had the results gone the other direction, as in Scenarios 3 and 4 (in favor of warfarin), we would have still claimed non-inferiority, even though warfarin would have been statistically significantly "better than" dabigatran! So it is unfair to claim superiority on the basis of a statistically significant result favoring dabigatran, but that's what they do. This is the problem that is likely to crop up when you make your margin of non-inferiority excessively wide, which you are wont to do if you wish to stack the deck in favor of your therapy.

But here's the real rub. Imagine if the world were the mirror image of what it is now and dabigatran were the existing agent for prevention of stroke in A-fib, and warfarin were the new kid on the block. If the makers of warfarin had designed this trial AND GOTTEN THE EXACT SAME DATA, they would have said (look at the left of the figure and the dashed red line there) that warfarin is non-inferior to the 110 mg dose of dabigatran, but that it was not non-inferior to the 150 mg dose of dabigatran. They would NOT have claimed that dabigatran was superior to warfarin, nor that warfarin was inferior to dabigatran, because the 95% CI of the difference between warfarin and dabigatran 150 mg crosses the pre-specified margin of non-inferiority. And to claim superiority of dabigatran, the 95% CI of the difference would have to fall all the way to the left of the dashed red line on the left. (See Piaggio, JAMA, 2006.)
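
(The mirror-image exercise is easy to play out in code.  Below is a minimal sketch of the conventional asymmetric reading described above; the interval I use for the 150 mg comparison is a placeholder of roughly the right magnitude, not the published one.)

```python
def nit_claim(ci_lo, ci_hi, margin):
    """Conventional (asymmetric) reading of a non-inferiority trial, where the
    ratio is new-agent : standard and values below 1 favor the new agent."""
    if ci_lo > margin:
        return "new agent inferior"
    if ci_hi >= margin:
        return "non-inferiority not shown"
    if ci_hi < 1.0:
        return "non-inferior AND (by the usual asymmetric reading) 'superior'"
    return "non-inferior"

# Placeholder interval for the 150 mg comparison (assumed, roughly the right
# magnitude; not the published figures), with the trial's margin of 1.46:
lo, hi, margin = 0.53, 0.82, 1.46
print("dabigatran as the new agent:", nit_claim(lo, hi, margin))
print("warfarin as the new agent:  ", nit_claim(1 / hi, 1 / lo, margin))
# Same data, opposite sponsor: 'superior' read one way, merely 'non-inferiority
# not shown' read the other.  A symmetric rule would demand the whole CI sit
# beyond 1/margin (about 0.68 here) before anyone claims superiority.
```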

The claims that result from a given dataset should not depend on who designs the trial, and which way the asymmetry of interpretation goes. But as long as we allow asymmetry in the interpretation of data, they shall. Heads they win, tails we lose.

Sunday, September 6, 2009

There's no such thing as a free lunch - unless you're running a non-inferiority trial. Gefitinib for pulmonary adenocarcinoma


A 20% difference in some outcome is either clinically relevant, or it is not. If A is worse than B by 19% and that's NOT clinically relevant and significant, then A being better than B by 19% must also NOT be clinically relevant and significant. But that is not how the authors of trials such as this one see it: http://content.nejm.org/cgi/content/short/361/10/947 . According to Mok and co-conspirators, if gefitinib is no worse in regard to progression-free survival than Carboplatin-Paclitaxel based on a 95% confidence interval that does not include 20% (that is, it may be up to 19.9% worse, but no worse than that), then they call the battle a draw and say that the two competitors are equally efficacious. However, if the trend is in the other direction, that is, in favor of gefitinib BY ANY AMOUNT HOWEVER SMALL (as long as it's statistically significant), they declare gefitinib the victor and call it a day. It is only because of widespread lack of familiarity with non-inferiority methods that they can get away with a free lunch like this. A 19% difference is either significant, or it is not. I have commented on this before, and it should come as no surprise that these trials are usually used to test proprietary agents (http://content.nejm.org/cgi/content/extract/357/13/1347 ). Note also that in trials of adult critical illness, the most commonly sought mortality benefit is about 10% (more data on this forthcoming in an article soon to be submitted and hopefully published). So it's a difficult argument to sustain - to say that something is "non-inferior" if it is less than 20% worse than something else. Designers of critical care trials will tell you that a 10% difference, often much less, is clinically significant.

I have created a figure to demonstrate the important nuances of non-inferiority trials using the gefitinib trial as an example. (I have adapted this from the Piaggio 2006 JAMA article of the CONSORT statement for the reporting of non-inferiority trials - a statement that has been largely ignored: http://jama.ama-assn.org/cgi/content/abstract/295/10/1152?lookupType=volpage&vol=295&fp=1152&view=short .) The authors specified delta, or the margin of non-inferiority, to be 20%. I have already made it clear that I don't buy this, but we needn't challenge this value to make war with their conclusions, although challenging it is certainly worthwhile, even if it is not my current focus. This 20% delta corresponds to a hazard ratio of 1.2, as seen in the figure demarcated by a dashed red line on the right. If the hazard ratio (for progression or death) demonstrated by the data in the trial were 1.2, that would mean that gefitinib is 20% worse than comparator. The purpose of a non-inferiority trial is to EXCLUDE a difference as large as delta, the pre-specified margin of non-inferiority. So, to demonstrate non-inferiority, the authors must show that the 95% confidence interval for the hazard ratio falls all the way to the left of that dashed red line at HR of 1.2 on the right. They certainly achieved this goal. Their data, represented by the lowermost point estimate and 95% CI, falls entirely to the left of the pre-specified margin of non-inferiority (the right red dashed line). I have no arguments with this. Accepting ANY margin of non-inferiority (delta), gefitinib is non-inferior to the comparator. What I take exception to is the conclusion that gefitinib is SUPERIOR to comparator, a conclusion that is predicated in part on the chosen delta, to which we are beholden as we make such conclusions.

First, let's look at [hypothetical] Scenario 1. Because the chosen delta was 20% wide (and that's pretty wide - coincidentally, that's the exact width of the confidence interval of the observed data), it is entirely possible that the point estimate could have fallen as pictured for Scenario 1 with the entire CI between an HR of 1 and 1.2, the pre-specified margin of non-inferiority. This creates the highly uncomfortable situation in which the criterion for non-inferiority is fulfilled, AND the comparator is statistically significantly better than gefitinib!!! This could have happened! And it's more likely to happen the larger you make delta. The lesson here is that the wider you make delta, the more dubious your conclusions are. Deltas of 20% in a non-inferiority trial are ludicrous.

Now let's look at Scenarios 2 and 3. In these hypothetical scenarios, comparator is again statistically significantly better than gefitinib, but now we cannot claim non-inferiority because the upper CI falls to the right of delta (red dashed line on the right). But because our 95% confidence interval includes values of HR less than 1.2 and our delta of 20% implies (or rather states) that we consider differences of less than 20% to be clinically irrelevant, we cannot technically claim superiority of comparator over gefitinib either. The result is dubious. While there is a statistically significant difference in the point estimate, the 95% CI contains clinically irrelevant values and we are left in limbo, groping for a situation like Scenario 4, in which comparator is clearly superior to gefitinib, and the 95% CI lies all the way to the right of the HR of 1.2.

Pretend you're in Organic Chemistry again, and visualize the mirror image (enantiomer) of Scenario 4. That is what is required to show superiority of gefitinib over comparator - a point estimate for the HR whose entire 95% CI falls beyond negative delta (-20%), that is, entirely below an HR of 0.8. The actual results come close to Scenario 5, but not quite, and therefore the authors are NOT justified in claiming superiority. To do so is to try to have a free lunch, to have their cake and eat it too.
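
(For readers who like to see the rules written down explicitly, here is a small Python sketch of the interpretation logic on the hazard-ratio scale, with the margin at 1.2 and its mirror image at 0.8.  The confidence intervals in the demo are hypothetical, chosen only to mimic the scenarios in the figure and the published result as described above.)

```python
DELTA = 0.2  # the trial's margin: HR 1.2 to the right of unity, HR 0.8 to the left

def verdict(ci_lo, ci_hi, delta=DELTA):
    """Classify a hazard-ratio CI (gefitinib : comparator; values below 1 favor
    gefitinib) according to the scenario logic described in the post."""
    upper, lower = 1 + delta, 1 - delta
    if ci_lo > upper:
        return "comparator clearly superior (Scenario 4)"
    if ci_lo > 1.0:
        if ci_hi < upper:
            return "non-inferiority met, yet comparator statistically better (Scenario 1 paradox)"
        return "comparator statistically better, but the verdict is in limbo (Scenarios 2-3)"
    if ci_hi < lower:
        return "gefitinib superior by the symmetric criterion (mirror image of Scenario 4)"
    if ci_hi < upper:
        return "non-inferior - and nothing more"
    return "inconclusive"

# Hypothetical intervals chosen only to mimic the figure; the last one mimics
# the published result as described above (entirely left of 1.2, crossing 0.8):
for ci in [(1.02, 1.18), (1.05, 1.35), (1.25, 1.50), (0.62, 0.78), (0.66, 0.86)]:
    print(ci, "->", verdict(*ci))
```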

You see, the larger you make delta, the easier it is to achieve non-inferiority. But the more likely it is also that you might find a statistically significant difference favoring comparator rather than the preferred drug which creates a serious conundrum and paradox for you. At the very least, if you're going to make delta large, you should be bound by your honor and your allegiance to logic and science to make damned sure that to claim superiority, your 95% confidence interval must not include negative delta. If not, shame on you. Eat your free lunch if you will, but know that the ireful brow of logic and reason is bent unfavorably upon you.


Monday, February 9, 2009

More Data on Dexmedetomidine - moving in the direction of a new standard

A follow-up study of dexmedetomidine (see previous blog: http://medicalevidence.blogspot.com/2007/12/dexmedetomidine-new-standard-in_16.html ) was published in last week's JAMA (http://jama.ama-assn.org/cgi/content/abstract/301/5/489 ) and hopefully serves as a prelude to future studies of this agent and indeed all studies in critical care. The recent study addresses one of my biggest concerns about the previous one, namely that routine interruptions of sedatives were not employed.

Ironically, it may be this difference between the studies that led to the failure to show a difference in the primary endpoint in the current study. The primary endpoint, namely the percentage of time within the target RASS, was presumably chosen not only on the basis of its pragmatic utility, but also because it was one of the most statistically significant differences found among secondary analyses in the previous study (percent of patients with a RASS [Richmond Agitation and Sedation Scale] score within one point of the physician goal; 67% versus 55%, p=0.008). It is possible, and I would reason likely, that daily interruptions in the current study obliterated the difference that was found in the previous study.


But that failure does not undermine the usefulness of the current study which showed that sedation comparable to routinely used benzos can be achieved with dexmed, probably with less delirium, and perhaps with shorter time on the ventilator and fewer infections. What I would like to see now, and what is probably in the works, is a study of dexmed which shows shorter time on the ventilator and/or reductions in nosocomial infections as primary study endpoints.

But to show endpoints such as these, we are going to need to carefully standardize our ascertainment of infections (difficult to say the least) and also to standardize our approach to discontinuation of mechanical ventilation. In regard to the latter, I propose that we challenge some of our current assumptions about liberation from mechanical ventilation - namely, that a patient must be fully awake and following commands prior to extubation. I think that a status quo bias is at work here. We have many a patient with delirium in the ICU who is not already intubated, and we do not intubate them for delirium alone. Why, then, should we fail to extubate a patient in whom all indicators show resolution of critical illness, but who remains delirious? Is it possible that this is the main player in the causal pathway between sedation and extubation and perhaps even nosocomial infections and mortality? (The protocols or lack thereof for assessing extubation readiness were not described in the current study, unless I missed them.) It would certainly be interesting and perhaps mandatory to know the extubation practices in the centers involved in this study, especially if we are going to take great stock in this secondary outcome.

Another thing I am interested in knowing is what PATIENT experiences are like in each group - whether there is greater recall or other differences in psychological outcomes between patients who receive different sedatives during their ICU experience.

I hope this study and others like it serve as a wake-up call to the critical care research community, which has heretofore been brainwashed into thinking that a therapy is only worthwhile if it improves mortality - a feat that is difficult to achieve not only because it is often unrealistic and because absurd power calculations and delta inflation run rampant in trial design, but also because of limitations in funding and logistical difficulties. This group has shown us repeatedly that useful therapies in critical care need not be predicated upon a mortality reduction. It's past time to start buying some stock in shorter times on the blower and in the ICU.