Showing posts with label delta. Show all posts

Thursday, December 5, 2019

Noninferiority Trials of Reduced Intensity Therapies: SCORAD trial of Radiotherapy for Spinal Metastases


No mets here just my PTX
A trial in JAMA this week by Hoskin et al (the SCORAD trial) compared two different intensities of radiotherapy for spinal metastases.  This is a special kind of noninferiority trial, which we wrote about last year in BMJ Open.  When you compare the same therapy at two intensities using a noninferiority trial, you are in perilous territory.  This is because if the therapy operates on a dose-response curve, it is almost certain, a priori, that the lower dose is actually inferior - if you consider "inferior" to mean any statistically significant difference disfavoring a therapy.  (We discuss this position, which goes against the CONSORT grain, here.)  You only need a big enough sample size to demonstrate it.  This may be OK, so long as you have what we call in the BMJ Open paper "a suitably conservative margin of noninferiority."  Most margins of noninferiority (delta) fall far short of this standard.
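To see why "you only need a big enough sample size," here is a minimal sketch with hypothetical numbers (not SCORAD data): suppose the lower-intensity therapy truly fails 3 percentage points more often, and we test it against an 11% noninferiority margin like SCORAD's. The helper function and failure rates below are my own illustrative assumptions.

```python
import math

def upper_ci_bound_risk_diff(p_low, p_high, n, z=1.645):
    """Upper bound of a 1-sided 95% CI for the risk difference
    (failure rate on low intensity minus failure rate on high intensity),
    with n patients per arm. Illustrative normal-approximation sketch."""
    diff = p_low - p_high
    se = math.sqrt(p_low * (1 - p_low) / n + p_high * (1 - p_high) / n)
    return diff + z * se

# Hypothetical: low dose truly 3 points worse; delta = 11% as in SCORAD
p_high, p_low, delta = 0.30, 0.33, 0.11
for n in (100, 400, 1600):
    ub = upper_ci_bound_risk_diff(p_low, p_high, n)
    verdict = "noninferior" if ub < delta else "inconclusive"
    print(f"n={n:5d}  upper bound={ub:.3f}  -> {verdict}")
```

With n=100 per arm the trial is inconclusive, but by n=400 the truly inferior lower dose is declared "noninferior" - the generous margin does all the work.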

The results of the SCORAD trial were consistent with our analysis of 30+ other noninferiority trials of reduced intensity therapies, and the point estimate favored - you guessed it - the more intensive radiotherapy.  This is fine.  It is also fine that the 1-sided 95% confidence interval crossed the 11% prespecified margin of noninferiority (P=0.06).  That just means you can't declare noninferiority.  What is not fine, in my opinion, is that the authors suggest that we look at how little overlap there was, basically an insinuation that we should consider it noninferior anyway.  I crafted a succinct missive to point this out to the editors, but alas I'm too busy to submit it and don't feel like bothering, so I'll post it here for those who like to think about these issues.

To the editor:  Hoskin et al report results of a noninferiority trial comparing two intensities of radiotherapy (single fraction versus multi-fraction) for spinal cord compression from metastatic cancer (the SCORAD trial)1.  In the most common type of noninferiority trial, investigators endeavor to show that a novel agent is not worse than an established one by more than a prespecified margin.  To maximize the chances of this, they generally choose the highest tolerable dose of the novel agent.  Similarly, guidelines admonish against underdosing the active control comparator, as this will increase the chances of a false declaration of noninferiority of the novel agent2,3.  In the SCORAD trial, the goal was to determine if a lower dose of radiotherapy was noninferior to a higher dose. Assuming radiotherapy is efficacious and operates on a dose response curve, the true difference between the two trial arms is likely to favor the higher intensity multi-fraction regimen.  Consequently, there is an increased risk of falsely declaring noninferiority of single fraction radiotherapy4.  Therefore, we agree with the authors’ concluding statement that “the extent to which the lower bound of the CI overlapped with the noninferiority margin should be considered when interpreting the clinical importance of this finding.”  The lower bound of a two-sided 95% confidence interval (the trial used a 1-sided 95% confidence interval) extends to 13.1% in favor of multi-fraction radiotherapy.  Because the outcome of the trial was ambulatory status, and there were no differences in serious adverse events, our interpretation is that single fraction radiotherapy should not be considered noninferior to a multi-fraction regimen without qualification.

1.            Hoskin PJ, Hopkins K, Misra V, et al. Effect of Single-Fraction vs Multifraction Radiotherapy on Ambulatory Status Among Patients With Spinal Canal Compression From Metastatic Cancer: The SCORAD Randomized Clinical Trial. JAMA. 2019;322(21):2084-2094.
2.            Piaggio G, Elbourne DR, Pocock SJ, Evans SW, Altman DG, for the CONSORT Group. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594-2604.
3.            Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313(7048):36-39.
4.            Aberegg SK, Hersh AM, Samore MH. Do non-inferiority trials of reduced intensity therapies show reduced effects? A descriptive analysis. BMJ Open. 2018;8(3):e019494.
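The letter's point about the one-sided versus two-sided bound is easy to reproduce numerically. The point estimate and standard error below are assumed illustrative values (chosen only so the bounds land near those discussed; they are not taken from the SCORAD paper): the two-sided bound always reaches further than the one-sided bound for the same data.

```python
# Assumed illustrative values, NOT the SCORAD data
z_one_sided, z_two_sided = 1.645, 1.960  # 95% critical values
est, se = 0.035, 0.049                   # hypothetical risk difference and SE

# The same estimate, two bounds: the two-sided bound extends further
print(f"1-sided 95% upper bound: {est + z_one_sided * se:.3f}")
print(f"2-sided 95% upper bound: {est + z_two_sided * se:.3f}")
```

A one-sided bound that just grazes an 11% margin conceals a two-sided bound well beyond it, which is the letter's complaint in miniature.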

Thursday, September 8, 2016

Hiding the Evidence in Plain Sight: One-sided Confidence Intervals and Noninferiority Trials

In the last post, I linked a video podcast of me explaining non-inferiority trials and their inherent biases.  In this videocast, I revisit noninferiority trials and the use of one-sided confidence intervals.  I review the Salminen et al noninferiority trial of antibiotics versus appendectomy for the treatment of acute appendicitis in adults.  This trial uses a very large delta of 24%.  The criteria for non-inferiority were not met even with this promiscuous delta.  But the use of a 1-sided 95% confidence interval concealed a more damning revelation in the data.  Watch the 13 minute videocast to learn what was hidden in plain sight!

Erratum:  at 1:36 I say "excludes an absolute risk difference of 1" and I meant to say "excludes an absolute risk difference of ZERO."  Similarly, at 1:42 I say "you can declare non-inferiority".  Well, that's true, you can declare noninferiority if your entire 95% confidence interval falls to the left of an ARD of 0 or a HR of 1, but what I meant to say is that if that is the case "you can declare superiority."

Also, at 7:29, I struggle to remember the numbers (woe is my memory!) and I place the point estimate of the difference, 0.27, to the right of the delta dashed line at 0.24.  This was a mistake, which I correct a few minutes later at 10:44 in the video.  Do not let it confuse you: the 0.27 point estimates were drawn slightly to the right of delta when they should have been marked slightly to the left of it.  I would re-record the video (labor intensive) or edit it, but I'm a novice with this technological stuff, so please do forgive me.

Finally, at 13:25 I say "within which you can hide evidence of non-inferiority" and I meant "within which you can hide evidence of inferiority."

Again, I apologize for these gaffes.  My struggle (and I think about this stuff a lot) in speaking about and accurately describing these confidence intervals and the conclusions that derive from them results from the arbitrariness of the CONSORT "rules" about interpretation and the arbitrariness of the valences (some articles use a negative valence for differences favoring the "new" therapy; some journals use positive values to favor "new").  If I struggle with it, many other readers, I'm sure, also struggle in keeping things straight.  This is fodder for the argument that these "rules" ought to be changed and made more uniform, for equity and ease of understanding and interpretation of non-inferiority trials.

It made me feel better to see this diagram in Annals of Internal Medicine (Perkins et al July 3, 2012, online ACLS training) where they incorrectly place the point estimate at slightly less than -6% (to the left of the dashed delta line in the Figure 2), when it should have been placed slightly greater than -6% (to the right of the dashed delta line).  Clicking on the image will enlarge it.

Saturday, June 11, 2016

Non-inferiority Trials Are Inherently Biased: Here's Why

Debut VideoCast for the Medical Evidence Blog, explaining non-inferiority trial design and exposing its inherent biases:

In this related blog post, you can find links to the CONSORT statement in the Dec 26, 2012 issue of JAMA and a link to my letter to the editor.

Addendum:  I should have included this in the video.  See the picture below.  In the first example, top left, the entire 95% CI favoring the "new" therapy lies in the "zone of indifference," that is, the pre-specified margin of superiority - a mirror image of the pre-specified margin of noninferiority, in this case delta = +/- 0.15.  Next down, the majority of the 95% CI of the point estimate favoring "new" therapy lies in the "margin of superiority" - so even though the lower end of the 95% CI crosses "mirror delta," the best guess is that the effect of therapy falls in the zone of indifference.  In the lowest example, labeled "Truly Superior," the entire 95% confidence interval falls to the left of "mirror delta," thus reasonably excluding all point estimates in the "zone of indifference" (i.e., +/- delta) and all point estimates favoring the "old" therapy.  This would, in my mind, represent "true superiority" in a logical, rational, and symmetrical way that would be very difficult to mount arguments against.
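The symmetric scheme described above can be sketched as a simple classification rule. This is my own simplified construction (negative differences favor the "new" therapy; three clean categories rather than the picture's intermediate case), not a published standard.

```python
def classify(ci_low, ci_high, delta=0.15):
    """Classify a 95% CI for the difference (negative favors 'new')
    against a symmetric margin [-delta, +delta]. A simplified sketch
    of the symmetric scheme, not the CONSORT rules."""
    if ci_high < -delta:
        return "truly superior"        # entire CI beyond mirror delta
    if -delta <= ci_low and ci_high <= delta:
        return "zone of indifference"  # entire CI within +/- delta
    if ci_high < delta:
        return "noninferior"           # upper bound excludes +delta
    return "inconclusive"

print(classify(-0.30, -0.20))  # prints "truly superior"
print(classify(-0.10, 0.10))   # prints "zone of indifference"
print(classify(-0.20, 0.10))   # prints "noninferior"
```

The key symmetry: "truly superior" demands the same distance beyond the mirror margin that "noninferior" demands within the margin.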


Added 9/20/16:  For those who question my assertion that the designation of "New" versus "Old" or "comparator" therapy is arbitrary, here is the proof:  In this trial, the "New" therapy is DMARDs and the comparator is anti-tumour necrosis factor agents for the treatment of rheumatoid arthritis.  The rationale for this trial is that the chronologically newer anti-TNF agents are very costly, and the authors wanted to see if similar improvements in quality of life could be obtained with chronologically older DMARDs.  So what is "new" is certainly in the eye of the beholder.  Imagine colistin 50 years ago, being tested against, say, a newer spectrum penicillin.  The penicillin would have been found to be non-inferior, but with a superior side effect profile.  Fast forward 50 years and now colistin could be the "new" resurrected agent and be tested against what 10 years ago was the standard penicillin but is now "old" because of development of resistance.  Clearly, "new" and "old" are arbitrary and flexible designations.

Monday, May 20, 2013

It All Hinges on the Premises: Prophylactic Platelet Transfusion in Hematologic Malignancy


A quick update before I proceed with the current post:  The Institute of Medicine has met and they agree with me that sodium restriction is for the birds.  (Click here for a New York Times summary article.)  In other news, the oh-so-natural Omega-3 fatty acid panacea did not improve cardiovascular outcomes as reported in the NEJM on May 9th, 2013.

An article by the TOPPS investigators in the May 9th NEJM is very useful to remind us not to believe everything we read, to always check our premises, and that some data are so dependent on the perspective from which they're interpreted or the method or stipulations of analysis that they can be used to support just about any viewpoint.

The authors sought to determine if a strategy of withholding prophylactic platelet transfusions for platelet counts below 10,000 in patients with hematologic malignancy was non-inferior to giving prophylactic platelet transfusions.  I like this idea, because I like "less is more" and I think the body is basically antifragile.  But non-inferior how?  And what do we mean by non-inferior in this trial?

Wednesday, December 16, 2009

Dabigatran and Dabigscam of non-inferiority trials, pre-specified margins of non-inferiority, and relative risks

Anyone who thought, based on the evidence outlined in the last post on this blog, that dabigatran was going to be a "superior" replacement for warfarin was chagrined last week with the publication in the NEJM of the RE-COVER study of dabigatran versus warfarin in the treatment of venous thromboembolism (VTE): http://content.nejm.org/cgi/content/abstract/361/24/2342 . Dabigatran for this indication is NOT superior to warfarin, but may be non-inferior to warfarin, if we are willing to accept the assumptions of the non-inferiority hypothesis in the RE-COVER trial.

Before we go on, I ask you to engage in a mental exercise of sorts that I'm trying to make a habit. (If you have already read the article and recall the design and the results, you will be biased, but go ahead anyway this time.) First, ask yourself what increase in an absolute risk of recurrent DVT/PE/death is so small that you consider it negligible for the practical purposes of clinical management. That is, what difference between two drugs is so small as to be pragmatically irrelevant? Next, ask yourself what RELATIVE increase in risk is negligible? (I'm purposefully not suggesting percentages and relative risks as examples here in order to avoid the pitfalls of "anchoring and adjustment": http://en.wikipedia.org/wiki/Anchoring .) Finally, assume that the baseline risk of VTE at 6 months is ~2% - with this "baseline" risk, ask yourself what absolute and relative increases above this risk are, for practical purposes, negligible. Do these latter numbers jibe with your answers to the first two questions which were answered when you had no particular baseline in mind?

Note how it is difficult to reconcile your "intuitive" instincts about what is a negligible relative and absolute risk with how these numbers might vary depending upon what the baseline risk is. Personally, I think about a 3% absolute increase in the risk of DVT at 6 months to be on the precipice of what is clinically significant. But if the baseline risk is 2%, a 3% absolute increase (to 5%) represents a 2.5x increase in risk! That's a 150% increase, folks! Imagine telling a patient that the use of drug ABC instead of XYZ "only doubles your risk of another clot or death". You can visualize the bewildered faces and incredulous, furrowed brows. But if you say, "the difference between ABC and XYZ is only 3%, and drug ABC costs pennies but XYZ is quite expensive, " that creates quite a different subjective impression of the same numbers. Of course, if the baseline risk were 10%, a 3% increase is only a 30% or 1.3x increase in risk. Conversely, with a baseline risk of 10%, a 2.5x increase in risk (RR=2.5) means a 15% absolute increase in the risk of DVT/PE/Death, and hardly ANYONE would argue that THAT is negligible. We know that doctors and laypeople respond better to, or are more impressed by, results that are described as RRR than ARR, ostensibly because the former inflates the risk because the number appears bigger (e-mail me if you want a reference for this). The bottom line is that what matters is the absolute risk. We're banking health dollars. We want the most dollars at the end of the day, not the largest increase over some [arbitrary] baseline. So I'm not sure why we're still designing studies with power calculations that utilize relative risks.
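The arithmetic in the paragraph above, spelled out: the same 3-point absolute increase looks wildly different on the relative scale depending on the baseline risk. A minimal sketch (the helper function is my own, for illustration):

```python
def relative_risk(baseline, absolute_increase):
    """Relative risk implied by an absolute increase over a baseline risk."""
    return (baseline + absolute_increase) / baseline

# 2% baseline, 3-point absolute increase -> 2.5x the risk (a 150% increase!)
print(round(relative_risk(0.02, 0.03), 2))
# 10% baseline, same 3-point increase -> only 1.3x
print(round(relative_risk(0.10, 0.03), 2))
# And in reverse: RR of 2.5 on a 10% baseline is a 15-point absolute jump
print(round(0.10 * 2.5 - 0.10, 2))
```

Same absolute difference, wildly different relative framing - which is exactly why power calculations pinned to relative risks can mislead.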

With this in mind, let's check the assumptions of the design of this non-inferiority trial (NIT). It was designed with 90% power to exclude a hazard ratio (HR; similar to a relative risk for our purposes) of 2.75. That HR of 2.75 sure SOUNDS like a lot. But with a predicted baseline risk of 2% (which prediction panned out in the trial - the baseline risk with warfarin was 2.1%), that amounts to only 5.78%, or an absolute increase of 3.68%, which I will admit is close to my own a priori negligibility level of 3%. The authors justify this assignment based on 4 referenced studies, all prior to 1996. I find this curious. Because they are so dated and in a rather obscure journal, I have access only to the 1995 NEJM study (http://content.nejm.org/cgi/reprint/332/25/1661.pdf ). In this 1995 study, the statistical design is basically not even described, and there were 3 primary endpoints (ahhh, the 1990s). This is not exactly the kind of study that I want to model a modern trial after. In the table below, I have abstracted data from the 1995 trial and three more modern ones (all comparing two treatment regimens for DVT/PE) to determine both the absolute risk and relative risks that were observed in these trials.

Table 1. Risk reductions in several RCTs comparing treatment regimens for DVT/PE. Outcomes are the combination of recurrent DVT/PE/Death unless otherwise specified. *recurrent DVT/PE only; raw numbers used for simplicity in lieu of time to event analysis used by the authors

From this table we can see that in SUCCESSFUL trials of therapies for DVT/PE treatment, absolute risk reductions in the range of 5-10% have been demonstrated, with associated relative risk increases of ~1.75-2.75 (for placebo versus comparator - I purposefully made the ratio in this direction to make it more applicable to the dabigatran trial's null hypothesis [NH] that the 95% CI for dabigatran includes 2.75 HR - note that the NH in an NIT is the enantiomer of the NH in a superiority trial). Now, from here we must make two assumptions, one which I think is justified and the other which I think is not. The first is that the demonstrated risk differences in this table are clinically significant. I am inclined to say "yes, they are" not only because a 5-10% absolute difference just intuitively strikes me as clinically relevant compared to other therapies that I use regularly, but also because, in the cases of the 2003 studies, these trials were generally counted as successes for the comparator therapies. The second assumption we must make, if we are to take the dabigatran authors seriously, is that differences smaller than 5-10% (say 4% or less) are clinically negligible. I would not be so quick to make this latter assumption, particularly in the case of an outcome that includes death. Note also that the study referenced by the authors (reference 13 - the Schulman 1995 trial) was considered a success with a relative risk of 1.73, and that the 95% CI for the main outcome of the RE-COVER study ranged from 0.65-1.84 - it overlaps the Schulman point estimate of RR of 1.73, and the Lee point estimate of 1.83! Based on an analysis using relative numbers, I am not willing to accept the pre-specified margin of non-inferiority upon which this study was based/designed.

But, as I said earlier, relative differences are not nearly as important to us as absolute differences. If we take the upper bound of the HR in the RE-COVER trial (1.84) and multiply it by the baseline risk (2.1%), we get an upper 95% confidence bound for the risk of the outcome of 3.86%, which corresponds to an absolute risk difference of 1.76 percentage points. This is quite low, and personally it satisfies my requirement for very small differences between two therapies if I am to call them non-inferior to one another.
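The back-of-envelope conversions above can be reproduced in a few lines, treating the HR as a risk ratio applied to the 2.1% baseline (an approximation, since an HR is not exactly a risk ratio):

```python
baseline = 2.1  # % risk of the outcome with warfarin in RE-COVER

# Convert the design margin and the observed upper CI bound to absolute risks
for label, hr in [("design margin", 2.75), ("observed upper 95% bound", 1.84)]:
    risk = baseline * hr  # approximate absolute risk implied by this HR
    print(f"{label}: HR {hr} -> {risk:.2f}% risk, "
          f"absolute increase {risk - baseline:.2f} points")
```

The design margin tolerates a ~3.7-point absolute increase; the observed data excluded anything above ~1.8 points, which is why the result, though built on a shaky margin, ends up persuasive.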


So, we have yet again a NIT which was designed upon precarious and perhaps untenable assumptions, but which, through luck or fate was nonetheless a success. I am beginning to think that this dabigatran drug has some merit, and I wager that it will be approved. But this does not change the fact that this and previous trials were designed in such a way as to allow a defeat of warfarin to be declared based on much more tenuous numbers.

I think a summary of sorts for good NIT design is in order:

• The pre-specified margin of non-inferiority should be smaller than the MCID (minimal clinically important difference), if there is an accepted MCID for the condition under study


• The pre-specified margin of non-inferiority should be MUCH smaller than statistically significant differences found in "successful" superiority trials, and ideally, the 95% CI in the NIT should NOT overlap with point estimates of significant differences in superiority trials


• NITs should disallow "asymmetry" of conclusions - see the last post on dabigatran. If the pre-specified margin of non-inferiority is a relative risk of 2.0 and the observed 95% CI must not include that value to claim non-inferiority, then superiority cannot be declared unless the entire 95% confidence interval falls below 0.5, the mirror image of 2.0 on the ratio scale. What did you say? That's impossible, it would require a HUGE risk difference and a narrow CI for that to ever happen? Well, that's why you can't make your delta unrealistically large - you'll NEVER claim superiority, if you're being fair about things. If you make delta very large it's easier to claim non-inferiority, but you should also suffer the consequences by basically never being able to claim superiority either.


• We should concern ourselves with Absolute rather than Relative risk reductions
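The symmetric-conclusions principle in these bullets can be sketched as a rule on the ratio scale. Note that the mirror image of an RR margin of 2.0 on that scale is 1/2.0 = 0.5. The function below is my own illustration of the principle, not any guideline's rule:

```python
def symmetric_verdict(ci_low, ci_high, delta_rr=2.0):
    """Symmetric declaration rule on the relative-risk scale (new vs old):
    non-inferiority requires the whole CI below delta; superiority requires
    the whole CI below the mirror of delta (1/delta). My own sketch."""
    mirror = 1.0 / delta_rr
    if ci_high < mirror:
        return "superior"
    if ci_high < delta_rr:
        return "noninferior"
    if ci_low > delta_rr:
        return "inferior"
    return "inconclusive"

print(symmetric_verdict(0.30, 0.45))  # prints "superior"
print(symmetric_verdict(0.65, 1.84))  # RE-COVER-like CI: prints "noninferior"
```

With a promiscuous delta of 2.0, superiority demands the entire CI below 0.5 - nearly unattainable, which is exactly the "consequence" a large delta should carry.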