Sunday, April 21, 2019

A Finding of Noninferiority Does Not Show Efficacy - It Shows Noninferiority (of short course rifampin for MDR-TB)

An image of two separated curves from Mayo's book SIST
Published in the March 28th, 2019 issue of the NEJM is the STREAM trial of a shorter regimen for rifampin-resistant TB.  I was interested in this trial because it fits the pattern of a "reduced intensity therapy", a cohort of which we analyzed and published last year.  The basic idea is this:  if you want to show efficacy of a therapy, you choose the highest dose of the active drug to compare to placebo, to improve the chances that you will get "separation" of the two populations and statistically significant results.  Sometimes, the choice of the "dose" of something, say tidal volume in ARDS, is so high that you are accused of harming one group rather than helping the other.  The point is that if you want positive results, you use the highest dose so the response curves will separate further, assuming efficacy.

Conversely, in a noninferiority trial, your null hypothesis is not that there is no difference between the groups, as it is in a superiority trial, but rather that there is a difference bigger than delta (the pre-specified margin of noninferiority).  Rejection of the null hypothesis leads you to conclude that there is no difference bigger than delta, and you then conclude noninferiority.  If you are comparing a new antibiotic to vancomycin, and you want to be able to conclude noninferiority, you may intentionally or subconsciously dose vancomycin at the lower end of the therapeutic range, or shorten the course of therapy.  Doing this increases the chances that you will reject the null hypothesis and conclude that there is no difference greater than delta in favor of vancomycin and that your new drug is noninferior.  However, this increases your type 1 error rate - the rate at which you falsely conclude noninferiority.
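To make the logic of that null hypothesis concrete, here is a minimal sketch in Python of a one-sided noninferiority test on a difference in proportions.  It assumes a simple Wald (normal approximation) interval, which is only one of several ways such a trial might be analyzed.  The function name, the patient counts, and the 10% margin in the example calls are hypothetical illustrations, not data from any trial.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_test(x_new, n_new, x_ctl, n_ctl, delta, alpha=0.025):
    """One-sided Wald test of H0: p_ctl - p_new >= delta.

    x_* are counts of favorable outcomes, n_* are group sizes, and
    delta is the pre-specified noninferiority margin (a proportion).
    Returns (observed difference, upper confidence bound, noninferior?).
    """
    p_new = x_new / n_new
    p_ctl = x_ctl / n_ctl
    diff = p_ctl - p_new  # positive values favor the active control
    se = sqrt(p_new * (1 - p_new) / n_new + p_ctl * (1 - p_ctl) / n_ctl)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    upper = diff + z_crit * se
    # Reject H0 (conclude noninferiority) only if the whole interval
    # lies below the margin.
    return diff, upper, upper < delta


# Hypothetical counts: a fully dosed control (81% favorable) versus the new drug (80%).
print(noninferiority_test(160, 200, 162, 200, delta=0.10))

# Same new drug, but the control is under-dosed and its response falls to 75%.
# The observed difference now favors the new drug, so H0 is rejected more easily.
print(noninferiority_test(160, 200, 150, 200, delta=0.10))
```

The second call is the worry in miniature: nothing about the new drug changed, but weakening the comparator moved the whole interval away from the margin.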



Trials that use reduced intensity therapies are asking for false conclusions of noninferiority, which they often want, because they want their new drug to be added to the therapeutic armamentarium, usually because of lucre.  Many of the noninferiority trials of reduced intensity therapies (RIT) that we analyzed were not doing so for the profit motive, but rather to see if dose or duration reductions could minimize toxicity of, say, chemotherapy without losing an important margin of benefit.  The STREAM trial seems to be in that category.  Nonetheless, we showed that these trials have a higher rate of failure to show noninferiority, and we can make a moderately strong inference that they have increased type 1 error rates too.

Finally, note that if your active control comparator is not efficacious (as demonstrated by prior superiority trials showing it beats placebo by a significant margin), a finding of noninferiority after you compare your new drug to it does not say anything about the "efficacy" of your new drug - it shows that it is noninferior to the old drug, which is not efficacious compared to placebo - your new drug is noninferior to placebo.  (An optimist would say "at least it's not inferior!")

The impetus for this post is the editorial for the trial by Churchyard.  My beefs with it may seem nitpicking, but it is nitpicking about fundamental assumptions of statistical inference - which inferences are warranted and which are not. 

In the second paragraph of the editorial, Churchyard says that the "STREAM study....was conducted to properly evaluate the efficacy of the shorter regimen."  We're off to a bad start, and I needn't reiterate what I've stated above, but I do need to check whether the basic assumption holds:  that the active comparator in the STREAM trial has proven efficacy compared to placebo.  In the index article, the authors state in the introduction,
"data from phase 3 randomized trials of combination drug regimens for multidrug-resistant tuberculosis are lacking. Recommendations from the World Health Organization (WHO) for the treatment of multidrug-resistant tuberculosis (published in 2011) are based on evidence that was classified as very low quality and were described as conditional (i.e., “the desirable effects of adherence to a recommendation probably outweigh the undesirable effects”)."
Since we don't know the efficacy of the regimens, we cannot make statements about the efficacy of the new regimen.  The only thing we're going to be able to do is to claim that the new regimen is noninferior.

Churchyard continues to talk about the "efficacy outcome" and the "efficacy analysis" in this paragraph, and in the next states,
 "the results of the STREAM study provide evidence for the efficacy of a shorter regimen"
This is an unjustified statement for reasons that should now be clear.  He can, based on the 10% prespecified margin of noninferiority and the results of the study, state that the short regimen is noninferior, but he cannot make any statements about efficacy, which implies a comparison to placebo, which is not available.

This is one of the problems with noninferiority trials - their inferences build on inferences made before them, and they run the risk of biocreep, the loss of efficacy compared to placebo over time (if an active comparator ever showed efficacy in the first place).  If coumadin appears to be "superior" to placebo for venous thromboembolism based on a statistically significant result, inferences are made:  viz., that the difference observed between the populations is not due to chance alone (based on NHST and a p-value < alpha), and that the difference between the populations is due to the drug, not to other differences between the populations at baseline or arising during conduct.  If these inferences are correct, we have built a case for the efficacy of coumadin.  Whenever we test a new drug compared to coumadin as active control in a noninferiority trial, we have a whole new set of inferences, and if we wish to declare efficacy, we must go back and check the strength of the inferences that we made in historical trials of coumadin, rather than focusing exclusively on a result from the new noninferiority trial.
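The arithmetic behind biocreep can be sketched in a few lines of Python.  This is a worst-case illustration under loudly stated assumptions:  the original control's benefit over placebo (25 percentage points) and the margin (10 percentage points) are hypothetical numbers, and each successive drug is assumed to sit right at the edge of the margin relative to the drug it was compared with.

```python
def worst_case_effects(control_effect_pts, delta_pts, generations):
    """Worst-case benefit over placebo, in percentage points, after each
    successive noninferiority comparison (hypothetical illustration).

    Each new drug is assumed to be as much as delta_pts worse than its
    comparator while still being declared noninferior.
    """
    effects = [control_effect_pts]
    for _ in range(generations):
        effects.append(effects[-1] - delta_pts)
    return effects


# Hypothetical: the original drug beat placebo by 25 points and the margin is 10 points.
print(worst_case_effects(25, 10, 3))  # [25, 15, 5, -5]
```

Under these assumptions, the third "noninferior" successor could be worse than placebo, and if the original 25 points were never established, the chain has no anchor at all.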

Because we cannot compare the long regimen to placebo for ethical reasons, these errors can be forgiven.  If we're going to use the long course because we think that's the best we've got, I understand that too.  Since the absolute difference between the two regimens was small at 1%, with confidence intervals extending towards 9% in either direction, it seems reasonable to conclude that the short course is noninferior and will spare patients with MDR-TB many months of inconvenience that may not be worth it, even if it doesn't lead to fewer side effects.  Convenience counts for a lot.
