"When in doubt, cut it out" is a simplified heuristic (rule of thumb) of surgery. But extending, via inductive thinking, the observation that removing a necrotic gallbladder or correcting some other anatomic aberration improves patient outcomes to other situations has misled us before. It is simply not always that simple. While it makes sense that arthroscopic removal of scar tissue in an osteoarthritic knee would improve patients' symptoms, alas, some investigators had the courage to challenge that assumption, and reported in 2002 that, compared to sham surgery, knee arthroscopy did not benefit patients. (See http://content.nejm.org/cgi/content/abstract/347/2/81.)
In a beautiful extension of that line of critical thinking, two groups of investigators in last week's NEJM challenged the widely and ardently held assumption that vertebroplasty improves patients' pain and symptom scores. (See http://content.nejm.org/cgi/content/abstract/361/6/557 and http://content.nejm.org/cgi/content/abstract/361/6/569.) These two similar studies compared vertebroplasty to a sham procedure (the control group) in order to control for the powerful placebo effect that accounts for part of the benefit of many medical and surgical interventions, and which is almost assuredly responsible for the reported and observed benefits of such "alternative and complementary medicines" as acupuncture.
There was no difference. In these adequately powered trials (80% power to detect a 2.5-point and a 1.5-point difference on the pain scales, respectively), the 95% confidence intervals for delta (the difference between the groups in pain scores) were -0.7 to +1.8 at 3 months in the first study and -0.3 to +1.7 at 1 month in the second. Given that the minimal clinically important difference (MCID) in the pain score is considered to be 1.5 points, these two studies all but rule out a clinically significant difference between the procedure and sham. They also show that there is no statistically significant difference between the two, but the former is more important to us as clinicians given that the study is negative. And this is exactly how we should approach a negative study: by asking "does the 95% confidence interval for the observed delta include a clinically important difference?" If it does not, we can be reasonably assured that the study was adequately powered to answer the question that we as practitioners are most interested in. If it does include such a value, we must conclude that, given our judgment of clinical value, the study is unhelpful and essentially underpowered. Note also that by looking at delta this way, we can gauge the statistical precision (power) of the study: powerful studies yield narrower confidence intervals, and underpowered studies yield wider ones.
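This interpretive rule can be sketched in a few lines of code. The confidence intervals and the 1.5-point MCID are the numbers quoted above; the function name is mine:

```python
# Judging a "negative" trial: does the 95% CI for delta (the between-group
# difference in pain scores) include the minimal clinically important
# difference (MCID)?  Numbers are those quoted in the post.

def ci_includes_mcid(ci_low, ci_high, mcid):
    """True if a difference as large as the MCID is still compatible
    with the data (i.e., lies inside the confidence interval)."""
    return ci_low <= mcid <= ci_high

MCID = 1.5  # points on the pain scale

trials = {
    "trial 1, 3 months": (-0.7, 1.8),
    "trial 2, 1 month": (-0.3, 1.7),
}

for name, (lo, hi) in trials.items():
    verdict = "includes" if ci_includes_mcid(lo, hi, MCID) else "excludes"
    print(f"{name}: 95% CI ({lo}, {hi}) {verdict} the MCID of {MCID}")

# Both upper bounds (1.8 and 1.7) only just exceed 1.5, which is why the
# trials "all but rule out" a clinically important benefit; a CI falling
# entirely below 1.5 would have ruled it out outright.
```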
These results reinforce the importance of the placebo effect in medical care, and the limitations of inductive thinking in determining the efficacy of a therapy. We must be careful - things that "make sense" do not always work.
But there is a twist of irony in this saga, and something a bit concerning about this whole approach to determining the truth using studies such as these with impeccable internal validity: they lead beguilingly to the message that because the therapy is not beneficial compared to sham, it is of no use. But, very unfortunately and very importantly, that is not a clinically relevant question, because we will not now adopt sham procedures as an alternative to vertebroplasty! These data will either be ignored by the true believers in vertebroplasty, or touted by devotees of evidence-based medicine as confirmation that "vertebroplasty doesn't work." If we fall in the latter camp, we will give patients medical therapy that, I wager, will not have as strong a placebo effect as surgery. And thus an immaculately conceived study such as this becomes its own bugaboo: in achieving unassailable internal validity, it estranges itself from clinical practice insomuch as the placebo effect is powerful, useful, and desirable. What a shame, and what a quandary from which there is no obvious escape.
If I were a patient with such a fracture (and ironically I have indeed suffered 2 vertebral fractures [oh, the pain!]), I would try to talk my surgeon into performing a sham procedure (to avoid the costs and potential side effects of the cement)... but then I would know, and would the "placebo" really work?
This is a discussion forum for physicians, researchers, and other healthcare professionals interested in the epistemology of medical knowledge, the limitations of evidence, how clinical trial evidence is generated, disseminated, and incorporated into clinical practice, how that evidence should optimally be incorporated into practice, and what the value of the evidence is to science, individual patients, and society.
Showing posts with label sham.
Tuesday, August 11, 2009
Thursday, July 9, 2009
No Sham Needed in Sham Trials: Polymyxin B Hemoperfusion in Abdominal Septic Shock (Alternative Title: How Meddling Ethicists Ruin Everything)
This is a superlative article to jab at to demonstrate some interesting points about randomized controlled trials that have more basis in hope than reason and whose very design threatens to invalidate their findings: http://jama.ama-assn.org/cgi/content/abstract/301/23/2445?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=polymyxin&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT . Because endotoxin has an important role in the pathogenesis of gram-negative sepsis, there has been interest in interfering with it or removing it in the hope of abating the untoward effects of the sepsis inflammatory cascade. Having learned from previous experiences/studies (e.g., http://content.nejm.org/cgi/content/abstract/324/7/429) that taking a poorly defined and heterogeneous illness (namely sepsis) and applying a therapy expected to work in only a subset of patients with that illness (those with a gram-negative source) dilutes any treatment effect, the authors chose to study abdominal sepsis, because they expected that the majority of such patients would have gram-negative organisms as a causative or contributory source of infection. They randomized these patients to receive standard care (not well defined) or insertion of a dialysis catheter with subsequent hemoperfusion over a polymyxin B-impregnated surface, because this agent is known to adsorb endotoxin. The basic biological hypothesis is that removing endotoxin in this fashion will ameliorate the untoward effects of the sepsis inflammatory cascade in such a way as to improve blood pressure and other physiological parameters, and, hopefully, mortality as well. There is reason to begin one's reading of this report with robust skepticism. The history of modern molecular medicine, for well over 25 years, has been polluted with the vast detritus of innumerable failed sepsis trials founded on hypotheses related to modulation of the sepsis cascade.
During this period, only one agent has been shown to be efficacious, and even its efficacy remains highly doubtful to perhaps the majority of intensivists (myself excluded; see: http://content.nejm.org/cgi/content/abstract/344/10/699 ).
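The enrichment logic behind choosing abdominal sepsis can be made concrete with a back-of-the-envelope sketch. All numbers here are illustrative, not taken from the trial: if a therapy works only in the gram-negative subset, the effect observed in the whole enrolled population is scaled down by the responder fraction, and the required sample size grows roughly with the inverse square of the observed effect.

```python
# Back-of-the-envelope sketch of effect dilution (illustrative numbers,
# not from the trial).  If a therapy produces a true benefit only in a
# responder subset (e.g., gram-negative sepsis), the population-level
# effect is diluted by the responder fraction.

def diluted_effect(true_effect, responder_fraction):
    """Observed population-level effect when only a fraction responds."""
    return true_effect * responder_fraction

def relative_sample_size(responder_fraction):
    """Required N scales roughly as 1/effect^2, so halving the responder
    fraction roughly quadruples the trial size needed."""
    return 1.0 / responder_fraction ** 2

true_arr = 0.10  # hypothetical 10% absolute mortality reduction in responders
for frac in (1.0, 0.7, 0.3):
    print(f"responder fraction {frac:.0%}: observed effect "
          f"{diluted_effect(true_arr, frac):.1%}, "
          f"~{relative_sample_size(frac):.1f}x the sample size needed")
```

This is the arithmetic that doomed earlier all-comers sepsis trials of gram-negative-specific agents, and the reason the investigators enriched for gram-negative infection by design.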
Mortality was not the primary endpoint in this trial, but rather was used for the early stopping rule. Even though I am currently writing an article suggesting that mortality may not be a good endpoint for trials of critical illness, this trial reminds me why the critical care community has selected this endpoint as the bona fide gold standard. Who cares if this invasive therapy increases your MAP from the already acceptable level of ~77mmHg to the supertarget level of 86? Who cares if it reduces your pressor requirements? Why would a patient, upon awakening from critical illness, thank his doctors for inserting a large dialysis catheter in him to keep his BP a little higher than it otherwise would have been? Why would he rather have a giant hole in his neck (or worse - GROIN!) than a little more levophed? If it doesn't save your life or make your life better when you recover, why do you care? We desperately need to begin to study concepts such as "return to full functionality at three (or six) months" or "recovery without persistent organ failures at x,y,z months". (This latter term I would define as not needing ongoing therapy for the support of any lingering organ failure after critical illness [that did not exist in the premorbid state], such as oxygen therapy, tracheostomy, dialysis, etc.). Should I be counted as a "save" if my existence after the interventions of the "saviors" is constituted by residence in a nursing home dependent on others for my care with waxing and waning lucidity? What does society think about these questions? We should begin to ask.
And we segue to the stopping issue, which I find especially intriguing. Basing the stopping rule on a mortality difference seems to validate my points above, namely that the primary endpoint (MAP) is basically a worthless one; if it were not, or if it were not trumped by mortality, why would we not base stopping of the trial on MAP? (And if this is a Phase II or pilot trial, it should be named accordingly, methinks.) This small trial was stopped on the basis of a mortality difference significant at P=0.026, with the stopping boundary at P<0.029. I will point out again on this blog, for those not familiar with it, this pivotal article warning of the hazards of early stopping rules: http://jama.ama-assn.org/cgi/content/abstract/294/17/2203 . But here's the real rub. When they got these results at the first and only planned interim analysis, (deep breath), they consulted an ethicist. The ethicist said that it would be unethical to continue the trial, because to do so would be to deny this presumably effective therapy to the control group. But does ANYONE in his or her right state of mind agree that this therapy is effective on the basis of these data? And if these data are not conclusive, does that not condemn future participants in a future trial to the same unfair treatment, namely randomization to placebo? Does not stopping the trial early just shift the burden to other people? It does worse. It invalidates to a large degree the altruistic motives of the participants (or their surrogates) in the current trial, for two reasons. First, stopping early undermined the trial scientifically (per the article referenced above). Second, stopping early necessitates yet another, larger trial in which participants will again be randomized to placebo, a trial which, it is fair to suspect, will demonstrate this therapy to be useless; that is tantamount to net harm, given the risks of the catheters and the resources wasted in performing yet another trial.
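The statistical hazard of interim looks can be demonstrated with a quick Monte Carlo sketch. All parameters here are made up for illustration; this is not a model of the polymyxin trial. Under the null hypothesis of no mortality difference, repeatedly testing the accumulating data against a nominal p < 0.05 threshold produces false positives well in excess of 5%:

```python
import math
import random

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (normal approx.)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def null_trial_stops(n_per_arm, look_fractions, alpha, rng, p_event=0.5):
    """Simulate one null trial (identical event rates in both arms) and
    test the accumulating data at each interim look; return True if any
    look crosses the nominal alpha threshold (a false positive)."""
    a = [rng.random() < p_event for _ in range(n_per_arm)]
    b = [rng.random() < p_event for _ in range(n_per_arm)]
    for frac in look_fractions:
        k = max(2, int(n_per_arm * frac))
        if two_prop_p(sum(a[:k]), k, sum(b[:k]), k) < alpha:
            return True
    return False

rng = random.Random(0)
sims = 2000
naive = sum(null_trial_stops(100, [1.0], 0.05, rng) for _ in range(sims)) / sims
peeking = sum(null_trial_stops(100, [0.25, 0.5, 0.75, 1.0], 0.05, rng)
              for _ in range(sims)) / sims
print(f"false-positive rate, single final analysis: {naive:.3f}")
print(f"false-positive rate, four looks at nominal p<0.05: {peeking:.3f}")
```

This inflation of the type I error rate is why formal stopping boundaries (such as the P<0.029 boundary used here) must be stricter than the final-analysis threshold, and why the JAMA article referenced above warns that trials stopped early tend to overestimate treatment effects.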
Likewise, if we assume that this therapy IS beneficial, stopping the trial has reduced NET utility to current participants, because now NOBODY is receiving the therapy. So, from a consequentialist or utilitarian standpoint, overall utility is reduced and net harm has resulted from stopping the trial. What if the investigators of this trial had made it more scientifically valid from the outset by using a sham hemoperfusion device (an approach that itself would have caused an ethical maelstrom)? And what if the sham group had proved superior in terms of mortality: would the ethicists have argued for stopping the trial because continuing it would mean depriving patients of sham therapy? Would there have been a call for providing sham therapy to all patients with surgically managed abdominal sepsis? I write this with my tongue in my cheek, but the ludicrousness of it does seem to drive home the point that the premature stopping of this trial was neither ethically clear-cut nor obligatory, and that from a utilitarian standpoint, net negative utility (for society and for participants, for everyone!) has resulted from this move. And that segues me to the issue of sham procedures. It is abundantly obvious that patients with a dialysis catheter inserted for this trial (probably placed by an investigator, though this is not stated in the manuscript) were likely to receive more vigilant care. This is the whole reason that protocols were developed in critical care research, as a result of the early ECMO trials (Morris et al. 1994), in which it was recognized that the inability to blind treating physicians in such a study produces all sorts of confounding. While it is not feasible to blind an ECMO study, the investigators of this study do little to convince us that blinding was not possible and feasible, and they make light of the differences in care that may have resulted from the lack of blinding.
Moreover, they do not report on the use of protocols for patient care that might have minimized the impact of the lack of blinding, and in a GLARING omission, they do not describe fluid balance in these patients, a highly discretionary aspect of care that clearly could have influenced the primary outcome and that could have differed between groups because of the lack of blinding and sham procedures. Unbelievable! (As an afterthought, even the mere increased stimulation [tactile, auditory, or visual] of patients in the intervention group, from more nursing or physician presence in the room, may have led to increases in blood pressure.) There are also some smaller points, such as the fact that, by my count, 10 patients (not accounting for multiple organisms) in the intervention group had gram-positive or fungal infections, making it difficult to imagine how the therapy could have influenced these patients. What if patients without gram-negative organisms isolated are excluded from the analysis? Does the effect persist? What is the p-value for mortality then? And that segues me to a final point: if our biologically plausible hypothesis is that reducing endotoxin levels with this therapy leads to improvements in the parameters of interest, why, for the love of God, did we not measure and report endotoxin levels, perform secondary analyses of the effect of the therapy as a function of endotoxin levels, and report data on whether these levels were actually reduced by the therapy, thus supporting the most fundamental assumption of the biological hypothesis upon which the entire study is predicated?
Monday, August 20, 2007
Prophylactic Cranial Irradiation: a matter of blinding, ascertainment, side effects, and preferences
Slotman et al (August 16 issue of NEJM: http://content.nejm.org/cgi/content/short/357/7/664) report a multicenter RCT of prophylactic cranial irradiation for extensive small cell carcinoma of the lung and conclude that it not only reduces symptomatic brain metastases, but also prolongs progression-free and overall survival. This is a well designed and conducted non-industry-sponsored RCT, but several aspects of the trial warrant scrutiny and temper my enthusiasm for this therapy. Among them:
The trial is not blinded (masked is a more sensitive term) from a patient perspective and no effort was made to create a sham irradiation procedure. While unintentional unmasking due to side effects may have limited the effectiveness of a sham procedure, it may not have rendered it entirely ineffective. This issue is of importance because meeting the primary endpoint was contingent on patient symptoms, and a placebo effect may have impacted participants’ reporting of symptoms. Some investigators have gone to great lengths to tease out placebo effects using sham procedures, and the results have been surprising (e.g., knee arthroscopy; see: https://content.nejm.org/cgi/content/abstract/347/2/81?ck=nck).
We are not told if investigators, the patient’s other physicians, radiologists, and statisticians were masked to the treatment assignment. Lack of masking may have led to other differences in patient management, or to differences in the threshold for ordering CT/MRI scans. We are not told about the number of CT/MRI scans in each group. In a nutshell: possible ascertainment bias (see http://www.consort-statement.org/?o=1123).
There are several apparently strong trends in QOL assessments, but we are not told what direction they are in. Significant differences in these scores were unlikely to be found as the deck was stacked when the trial was designed: p<0.01 was required for significance of QOL assessments. While this is justified because of multiple comparisons, it seems unfair to make the significance level for side effects more conservative than that for the primary outcome of interest (think Vioxx here). The significance level required for secondary endpoints (progression-free and overall survival) was not lowered to account for multiple comparisons. Moreover, more than half of QOL assessments were missing by 9 months, so this study is underpowered to detect differences in QOL. It is therefore all the more important to know the direction of the trends that are reported.
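The multiple-comparisons arithmetic behind the stricter QOL threshold is simple, and it cuts both ways. The sketch below is a generic illustration, not the trial's actual correction procedure:

```python
# Family-wise error rate (FWER): the probability of at least one false
# positive among m independent tests, each run at level alpha.  This is
# the rationale for demanding p < 0.01 for the many QOL comparisons; by
# the same logic, the uncorrected secondary survival endpoints deserve
# equal skepticism.

def familywise_error(alpha, m):
    """P(at least one false positive) for m independent tests at alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni(alpha, m):
    """Per-test threshold that keeps the FWER at or below alpha."""
    return alpha / m

for m in (1, 5, 10, 20):
    print(f"{m:2d} tests at alpha=0.05 -> FWER = {familywise_error(0.05, m):.2f}")
print(f"Bonferroni per-test threshold for 10 tests: {bonferroni(0.05, 10):.3f}")
```

With ten tests at an uncorrected 0.05, the chance of at least one spurious "significant" result exceeds 40%, which is why a correction is defensible for the QOL battery; the asymmetry lies in not applying the same discipline to the secondary efficacy endpoints.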
The authors appear to “gloss over” the significant side effects associated with this therapy. It made some subjects ill.
If we are willing to accept that overall survival is improved by this therapy (I'm personally circumspect about this for the above reasons), the bottom line for patients will be whether they would prefer, on average, 5 additional weeks of life with nausea, vomiting, weight loss, fatigue, anorexia, and leg weakness to 5 fewer weeks of life without these symptoms. I think I know what choice many will make, and our projection bias may lead us to make inaccurate predictions of their choices (see Loewenstein, Medical Decision Making, Jan/Feb 2005: http://mdm.sagepub.com/cgi/content/citation/25/1/96).
The authors state in the concluding paragraph:
“Prophylactic cranial irradiation should be part of standard care for all patients with small-cell lung cancer who have a response to initial chemotherapy, and it should be part of the standard treatment in future studies involving these patients.”
I think the decision to use this therapy is one that only patients are justified in making. At least now we have reasonably good data to help inform their choice.