
Sunday, September 1, 2019

Pediatrics and Scare Tactics: From Rock-n-Play to Car Safety Seats

Is sleeping in a car seat dangerous?
Earlier this year, the Fisher-Price company relented to pressure from the AAP (American Academy of Pediatrics) and recalled 4.7 million Rock 'n Play (RnP) baby rockers, which now presumably occupy landfills.  This recommendation stemmed from an "investigation" by Consumer Reports showing that since 2011, 32 babies died while sleeping in the RnP.  These deaths are tragic, but what do they mean?  In order to make sense of this "statistic" we need to determine a rate based on the exposure period, something like "the rate of infant death in the RnP is 1 per 10 million RnP-occupied hours."  Then we would compare it to the rate of infant death sleeping in bed.  If it was higher, we would have a starting point for considering whether, ceteris paribus, it is the RnP that is causing the infant deaths.  We would want to know the ratio of observed deaths in the RnP to expected deaths sleeping in some other arrangement for the same amount of time.  Of course, even if we found the observed rate was higher than the expected rate, other possibilities exist, i.e., it's an association, a marker for some other factor, rather than a cause of the deaths.  A more sophisticated study would, through a variety of methods, try to control for those other factors, say, socioeconomic status, infant birth weight, and so on.  The striking thing to me and other evidence-minded people was that this recall did not even use the observed versus expected rate, or any rate at all!  Just a numerator!  We could do some back-of-the-envelope calculations with some assumptions about rate ratios, but I won't bother here.  Suffice it to say that we had an infant son at that time and we kept using the RnP until he outgrew it, and then we gave it away.
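To make the missing-denominator objection concrete, here is a minimal sketch of the comparison a recall decision should rest on.  Python is used for illustration; only the 32 reported deaths comes from the Consumer Reports figure, and every other input is a hypothetical placeholder.

```python
# Hypothetical sketch: the comparison a recall decision should rest on.
# Only the 32 reported deaths comes from the Consumer Reports figure;
# every other number below is an invented placeholder.

rnp_deaths = 32                      # reported deaths in the RnP since 2011
rnp_exposure_hours = 10_000_000      # hypothetical: total RnP-occupied sleep hours
crib_deaths = 3_500                  # hypothetical: deaths during crib sleep, same period
crib_exposure_hours = 1_200_000_000  # hypothetical: total crib-occupied sleep hours

rnp_rate = rnp_deaths / rnp_exposure_hours      # deaths per occupied hour
crib_rate = crib_deaths / crib_exposure_hours   # deaths per occupied hour
rate_ratio = rnp_rate / crib_rate               # observed-vs-expected style comparison

# Expected deaths if RnP sleep carried the same per-hour risk as crib sleep:
expected_rnp_deaths = crib_rate * rnp_exposure_hours

print(f"RnP rate:  {rnp_rate:.2e} deaths/hour")
print(f"Crib rate: {crib_rate:.2e} deaths/hour")
print(f"Rate ratio (RnP vs crib): {rate_ratio:.2f}")
print(f"Observed {rnp_deaths} vs expected {expected_rnp_deaths:.1f} deaths")
```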

Last week, the AAP was at it again, playing loose with the data but tight with the recommendations based upon them.  This time, it's car seats.  In an article in the August 2019 issue of the journal Pediatrics, Liaw et al present data showing that, in a cohort of 11,779 infant deaths, 3% occurred in "sitting devices," and in 63% of that 3%, the sitting device was a car safety seat (CSS).  Of the deaths in CSSs, 51.6% occurred in the child's home rather than in a car.  What was the rate of infant death per hour in the CSS?  We don't know.  What is the expected rate of death for the same amount of time spent sleeping, you know, in the recommended arrangement?  We don't know!  We're at it again - we have a numerator without a denominator, so no rate and no comparison rate.  It could be that 3% of the infant deaths occurred in car seats simply because infants are sleeping in car seats 3% of the time!
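The same objection in one calculation: if infants spend some fraction of their sleep time in sitting devices, that fraction alone predicts their share of sleep deaths even with no excess risk.  The exposure fraction below is a hypothetical placeholder, not a measured quantity.

```python
# Hypothetical sketch: expected share of deaths in sitting devices under the
# assumption of no excess risk.  The exposure fraction is invented.
total_deaths = 11_779             # cohort size reported by Liaw et al.
sitting_device_time_share = 0.03  # hypothetical fraction of sleep time spent in sitting devices

expected_deaths = total_deaths * sitting_device_time_share
print(f"Expected deaths in sitting devices with no excess risk: {expected_deaths:.0f}")
print(f"That is {sitting_device_time_share:.0%} of all deaths, matching the observed 3%.")
```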

Thursday, May 17, 2018

Increasing Disparities in Infant Mortality? How a Narrative Can Hinge on the Choice of Absolute and Relative Change

An April 11, 2018 article in the NYT entitled "Why America's Black Mothers and Babies are in a Life-or-Death Crisis" makes the following alarming summary statement about racial disparities in infant mortality in America:
Black infants in America are now more than twice as likely to die as white infants — 11.3 per 1,000 black babies, compared with 4.9 per 1,000 white babies, according to the most recent government data — a racial disparity that is actually wider than in 1850, 15 years before the end of slavery, when most black women were considered chattel.
Racial disparities in infant mortality have increased since 15 years before the end of the Civil War?  That would be alarming indeed.  But a few paragraphs before, we are given these statistics:

In 1850, when the death of a baby was simply a fact of life, and babies died so often that parents avoided naming their children before their first birthdays, the United States began keeping records of infant mortality by race. That year, the reported black infant-mortality rate was 340 per 1,000; the white rate was 217 per 1,000.
The white infant mortality rate has fallen by 217 - 4.9 = 212.1 deaths per 1000 births.  The black infant mortality rate has fallen by 340 - 11.3 = 328.7 deaths per 1000 births.  So in absolute terms, the terms that concern babies (how many of us are alive?), the black infant mortality rate has fallen much more than the white infant mortality rate.  In fact, in absolute terms, the disparity is almost gone:  in 1850, the absolute difference was 340 - 217 = 123 more black infants per 1000 births dying, and now it is 11.3 - 4.9 = 6.4 more black infants per 1000 births dying.

Analyzed a slightly different way, the proportion of white infants dying has been reduced by (217 - 4.9)/217 = 97.7%, and the proportion of black infants dying has been reduced by (340 - 11.3)/340 = 96.7%.  So, to within 1%, black and white babies have shared almost equally in the improvements in infant mortality seen since 15 years before the end of the Civil War.  Or, we could do a simple reference frame change and look at infant survival rather than mortality.  If we did that, the current infant survival rate is 98.87% for black babies and 99.51% for white babies.  The rate ratio for black:white survival is 0.994 - almost parity, depending on your sensitivity to variances from unity.
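Laying the arithmetic out in a few lines makes it easy to see how the absolute and relative framings diverge; the rates are the ones quoted in the NYT article.

```python
# Infant mortality per 1,000 live births, as quoted in the NYT article.
black_1850, black_now = 340.0, 11.3
white_1850, white_now = 217.0, 4.9

# Absolute framing: deaths averted per 1,000 births, and the absolute gap.
print("Absolute decline, black:", black_1850 - black_now)   # 328.7
print("Absolute decline, white:", white_1850 - white_now)   # 212.1
print("Absolute gap, 1850:", black_1850 - white_1850)       # 123.0
print("Absolute gap, now: ", black_now - white_now)         # 6.4

# Relative framing: proportional reduction in mortality, and the rate ratio.
print("Relative reduction, black:", (black_1850 - black_now) / black_1850)  # ~0.967
print("Relative reduction, white:", (white_1850 - white_now) / white_1850)  # ~0.977
print("Rate ratio (black:white), 1850:", black_1850 / white_1850)           # ~1.57
print("Rate ratio (black:white), now: ", black_now / white_now)             # ~2.31
```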

It's easy to see how the author of the article arrived at a different conclusion by looking only at the rate ratios in 1850 and today.  But doing the math that way makes it seem as if a black baby is worse off today than in 1850!  Nothing could be further from the truth.

You might say that this is just "fuzzy math," as our erstwhile president did in the debates of 2000.  But there could be important policy implications as well.  Suppose that I have an intervention that I could apply across the US population, and I estimate that it will save an additional 5 black babies per 1000 and an additional 3 white babies per 1000.  We implement this policy and it works as projected.  The black infant mortality rate is reduced to 6.3/1000 and the white infant mortality rate to 1.9/1000.  We have saved far more black babies than white babies across the population.  But the rate ratio for black:white mortality has increased from 2.3 to 3.3!  Black babies are now 3 (three!) times as likely to die as white babies!  The policy has increased disparities even though black babies are far better off after the policy change than before it.
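The hypothetical works out as follows, using only the numbers given above:

```python
# The policy hypothetical from the text: an intervention that saves an
# additional 5 black and 3 white babies per 1,000 births.
black_before, white_before = 11.3, 4.9          # current rates per 1,000
black_after, white_after = black_before - 5, white_before - 3   # 6.3 and 1.9

print(f"Rate ratio before: {black_before / white_before:.1f}")  # ~2.3
print(f"Rate ratio after:  {black_after / white_after:.1f}")    # ~3.3
# Absolute mortality falls more for black infants, yet the relative disparity widens.
```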

It reminds me of the bias whereby people would rather take a smaller income if it increased their standing relative to their neighbors.  Surprisingly, when presented with two choices:
  1. You make $50,000 per year and your peers make $25,000 per year
  2. You make $100,000 per year and your peers make $250,000 per year
many people choose 1, as if relative social standing is worth $50,000 per year in income.  (Note that relative social standing is just that, relative, and could change if you arbitrarily change the reference class.)

So, relative social standing has value and perhaps a lot of it.  But as regards the hypothetical policy change above, I'm not sure we should be focusing on relative changes in infant mortality.  We just want as few babies dying as possible. And it is disingenuous to present the statistics in a one-sided, tendentious way.

Saturday, June 11, 2016

Non-inferiority Trials Are Inherently Biased: Here's Why

Debut VideoCast for the Medical Evidence Blog, explaining non-inferiority trial design and exposing its inherent biases:

In this related blog post, you can find links to the CONSORT statement in the Dec 26, 2012 issue of JAMA and a link to my letter to the editor.

Addendum:  I should have included this in the video.  See the picture below.  In the first example, top left, the entire 95% CI favoring the "new" therapy lies in the "zone of indifference," that is, within the pre-specified margin of superiority, a mirror image of the pre-specified margin of noninferiority; in this case delta = +/- 0.15.  In the next example down, the majority of the 95% CI around the point estimate favoring the "new" therapy lies in the "margin of superiority" - so even though the lower end of the 95% CI crosses "mirror delta," the best guess is that the effect of therapy falls in the zone of indifference.  In the lowest example, labeled "Truly Superior," the entire 95% confidence interval falls to the left of "mirror delta," thus reasonably excluding all point estimates in the "zone of indifference" (i.e., +/- delta) and all point estimates favoring the "old" therapy.  This would, in my mind, represent "true superiority" in a logical, rational, and symmetrical way that would be very difficult to mount arguments against.
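The symmetric reading described in the addendum can be expressed as a small classification rule.  This is a sketch of the interpretation argued for here, not the CONSORT standard; the treatment difference is oriented so that negative values favor the "new" therapy, delta is the pre-specified margin (0.15 in the figure), and the example intervals are invented to mimic the three cases in the picture.

```python
def classify_symmetric(ci_lower, ci_upper, delta):
    """Symmetric reading of a 95% CI for (new - old) on a harm outcome, where
    negative differences favor the new therapy and +/- delta bounds the zone
    of indifference.  A sketch, not the CONSORT interpretation."""
    if ci_upper < -delta:
        return "truly superior: entire CI beyond the mirror margin"
    if ci_lower > delta:
        return "truly inferior: entire CI beyond the noninferiority margin"
    if -delta <= ci_lower and ci_upper <= delta:
        return "indifferent: entire CI inside the zone of indifference"
    return "inconclusive under symmetric standards"

# Invented intervals mimicking the three cases in the figure, with delta = 0.15:
for lo, hi in [(-0.10, 0.05), (-0.20, 0.10), (-0.40, -0.20)]:
    print(f"CI ({lo:+.2f}, {hi:+.2f}) -> {classify_symmetric(lo, hi, 0.15)}")
```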


Added 9/20/16:  For those who question my assertion that the designation of "new" versus "old" or "comparator" therapy is arbitrary, here is the proof:  in this trial, the "new" therapy is DMARDs and the comparator is anti-tumour necrosis factor agents for the treatment of rheumatoid arthritis.  The rationale for this trial is that the chronologically newer anti-TNF agents are very costly, and the authors wanted to see whether similar improvements in quality of life could be obtained with the chronologically older DMARDs.  So what is "new" is certainly in the eye of the beholder.  Imagine colistin 50 years ago, being tested against, say, a newer broad-spectrum penicillin.  The penicillin would have been found to be non-inferior, but with a superior side effect profile.  Fast forward 50 years, and now colistin could be the "new," resurrected agent, tested against what 10 years ago was the standard penicillin but is now "old" because of the development of resistance.  Clearly, "new" and "old" are arbitrary and flexible designations.

Monday, May 2, 2016

Hope: The Mother of Bias in Research

I realized the other day that underlying every slanted report or overly-optimistic interpretation of a trial's results, every contorted post hoc analysis, every Big Pharma obfuscation, is hope.  And while hope is generally a good, positive emotion, it engenders great bias in the interpretation of medical research.  Consider this NYT article from last month:  "Dashing Hopes, Study Shows Cholesterol Drug Had No Effect on Heart Health."  The title itself reinforces my point, as do several quotes in the article.
“All of us would have put money on it,” said Dr. Peter Libby, a Harvard cardiologist. The drug, he said, “was the great hope.”
Again, hope is wonderful, but it blinds people to the truth in everyday life, and I'm afraid researchers are no more immune to its effects than the laity.  In my estimation, three main categories of hope creep into the evaluation of research and foment bias:

  1. Hope for a cure, prevention, or treatment for a disease (on the part of patients, investigators, or both)
  2. Hope for career advancement, funding, notoriety, being right (on the part of investigators) and related sunk cost bias
  3. Hope for financial gain (usually on the part of Big Pharma and related industrial interests)
Consider prone positioning for ARDS.  For over 20 years, investigators have hoped that prone positioning improves not only oxygenation but also outcomes (mostly mortality).  So is it any wonder that after the most recent trial, in spite of the 4 or 5 previous failed trials, the community enthusiastically declared "success!" and "Prone positioning works!"?  Of course it is no wonder - this has been the hope for decades.

But consider what the most recent trial represents through the lens of replicability:  a failure to replicate previous results showing that prone positioning does not improve mortality.  The recent trial is the outlier.  It is the "false positive" rather than the previous trials being the "false negatives."

This way of interpreting the trials of prone positioning in the aggregate should be an obvious one, and it astonishes me that it took me so long to see the results this way - as a single failure to replicate previously replicable negative results.  But that points to the underlying bias - we view results through the magnifying glass of hope, and it distorts our appraisal of the evidence.

Indeed, I have been accused of being a nihilist because of my views on this blog, which some see as derogating the work of others or an attempt to dash their hopes.  But these critics engage, or wish me to engage, in a form of outcome bias - the value of the research lies in the integrity of its design, conduct, analysis, and reporting, not in its results.  One can do superlative research and get negative results, or shoddy research and get positive results.  My goal here is and always has been to judge the research on its merits, regardless of the results or the hopes that impel it.

(Aside:  Cholesterol researchers have a faith or hope in the cholesterol hypothesis - that cholesterol is a causal factor in pathways to cardiovascular outcomes.  Statin data corroborate this, and preliminary PCSK9 inhibitor data do, too.  But how quickly we engage in hopeful confirmation bias!  If cholesterol is a causal factor, it should not matter how you manipulate it - lower the cholesterol, lower cardiovascular events.  The fact that it does appear to matter how you lower it suggests either that there is a multiplicity of agent effects (untoward and unknown effects of some agents negate some of their beneficial effects in the cholesterol causal pathway) or that cholesterol levels are epiphenomena - markers of the effects of statins and PCSK9 inhibitors on the real, but as yet undelineated, causal pathways.  Maybe the fact that we can easily measure cholesterol, and that it is associated with outcomes in untreated individuals, is a convenient accident of history that led us to trial statins, which work in ways that we do not yet understand.)

Wednesday, February 10, 2016

A Focus on Fees: Why I Practice Evidence Based Medicine Like I Invest for Retirement

"He is the best physician who knows the worthlessness of the most medicines."  - Ben Franklin

This blog has been highly critical of evidence, taking every opportunity to strike at any vulnerability of a trial or research program.  That is because this is serious business.  Lives and limbs hang in the balance, pharmaceutical companies stand to gain billions from "successful" trials, investigators' careers and funding are on the line if chance findings don't pan out in subsequent investigations, sometimes well-meaning convictions blind investigators and others to the truth; in short, the landscape is fertile for bias, manipulation, and even fraud.  To top it off, many of the questions about how to practice or deal with a particular problem have scant or no evidence to bear upon them, and practitioners are left to guesswork, convention, or pathophysiological reasoning - and I'm not sure which among these is most threatening.  So I am often asked, how do you deal with the uncertainty that arises from fallible evidence or paucity of evidence when you practice?

I have ruminated about this question and how to summarize the logic of my minimalist practice style for some time but yesterday the answer dawned on me:  I practice medicine like I invest in stocks, with a strategy that comports with the data, and with precepts of rational decision making.

Investors make numerous well-described, wealth-destroying mistakes when they invest in stocks.  Experts such as John Bogle, Burton Malkiel, David Swensen, and others have written influential books on the topic, utilizing data from studies in economics (financial and behavioral).  Key among the mistakes that investors make are trying to select high performers (such as mutual funds or hedge fund managers), chasing performance, and timing the market.  The data suggest that professional stock pickers fare little better than chance over the long run, that you cannot discern in advance who will beat the average over the long run, and that the excess fees you are charged by high performers will negate any benefit they might otherwise have conferred to you.  The experts generally recommend that you stick with strategies that are proven beyond a reasonable doubt: a heavy concentration in stocks with their long track record of superior returns, diversification, and strict minimization of fees.  Fees are the only thing you can guarantee about your portfolio's returns.
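To make the fee point concrete, here is a minimal compounding sketch under hypothetical assumptions (a 7% gross annual return over 30 years); the only thing that differs between the two scenarios is the expense ratio.

```python
# Hypothetical sketch of fee drag: same gross return, different expense ratios.
gross_return = 0.07     # hypothetical annual return before fees
years = 30
principal = 100_000     # hypothetical starting balance

def final_balance(expense_ratio):
    """Balance after compounding the net (after-fee) return for `years` years."""
    return principal * (1 + gross_return - expense_ratio) ** years

low_fee = final_balance(0.001)   # 0.1% index-fund-like fee
high_fee = final_balance(0.01)   # 1.0% active-management-like fee
print(f"Low-fee balance:  ${low_fee:,.0f}")
print(f"High-fee balance: ${high_fee:,.0f}")
print(f"Fee drag: {1 - high_fee / low_fee:.1%} of the low-fee balance")
```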

Thursday, February 4, 2016

Diamox Results in Urine: General and Specific Lessons from the DIABOLO Acetazolamide Trial

The trial of acetazolamide to reduce duration of mechanical ventilation in COPD patients was published in JAMA this week.  I will use this trial to discuss some general principles about RCTs and make some comments specific to this trial.

My arguable but strong prior belief, before I even read the trial, is that Diamox (acetazolamide) is ineffectual in acute and chronic respiratory failure, or that it is harmful.  Its use is predicated on a "normalization fallacy," which guides practitioners to attempt to achieve euboxia (normal numbers).  In chronic respiratory acidosis, the kidneys conserve bicarbonate to maintain normal pH.  There was a patient we saw at OSU in about 2008 who had severe COPD with a PaCO2 in the 70s and chronic renal failure with a bicarbonate under 20.  A well-intentioned but misguided resident checked an ABG, and the patient's pH was on the order of 7.1.  We (the pulmonary service) were called to evaluate the patient for MICU transfer and intubation, and when we arrived we found him sitting at the bedside comfortably eating breakfast.  So it would appear that even if the kidneys can't conserve enough bicarbonate to maintain normal pH, patients can get along with acidosis - but obviously evolution has created systems to maintain normal pH.  Why you would think that interfering with this highly conserved system, in order to increase minute ventilation in a COPD patient you are trying to wean, is a good idea is beyond the reach of my imagination.  It just makes no sense.

This brings us to a major problem with a sizable proportion of RCTs that I read:  the background/introduction provides woefully insufficient justification for the hypothesis that the RCT seeks to test.  In the background of this paper, we are sent to references 4-14.  Here is a summary of each:

4.)  A review of metabolic alkalosis in a general population of critically ill patients
5.)  An RCT of acetazolamide for weaning COPD patients showing that it doesn't work
6.)  The incidence of alkalosis in hospitalized patients in 1980
7.)  A 1983 translational study delineating the effect of acetazolamide on acid-base parameters in 10 patients
8.)  A 1982 study of hemodynamic parameters after acetazolamide administration in 12 patients
9.)  A study of metabolic and acid-base parameters in 14 patients with cystic fibrosis
10.)  A retrospective, descriptive epidemiological study of serum bicarbonate in a large cohort of critically ill patients
11.)  A study of acetazolamide in anesthetized cats
12-14.)  Commentary and pharmacodynamic studies of acetazolamide by the authors of the current study

Wednesday, July 22, 2015

There is (No) Evidence For That: Epistemic Problems in Evidence Based Medicine

Below is a PowerPoint presentation that I have delivered several times recently, including one iteration at the SMACC conference in Chicago.  It addresses epistemic problems in our therapeutic knowledge, and calls into question all claims of "there is evidence for ABC" and "there is no evidence for ABC."  Such claims cannot be taken at face value and need deeper consideration and evaluation considering all possible states of reality - gone is the cookbook or algorithmic approach to evidence appraisal as promulgated by the Users' Guides.  Considered in the presentation are therapies for which we have no evidence but which undoubtedly work (Category 1 - Parachutes), and therapies for which we have evidence of efficacy or lack thereof (Category 2), but that evidence is subject to false positives and false negatives for numerous reasons, including: the Ludic Fallacy, study bias (see: Why Most Published Research Findings Are False), type 1 and type 2 errors, the "alpha bet" (the arbitrary and lax standard used for alpha, namely 0.05), Bayesian interpretations, stochastic dominance of the null hypothesis, and inadequate study power in general and that due to delta inflation and subversion of double significance hypothesis testing.  These are all topics that have been previously addressed to some degree on this blog, but this presentation brings them together as a framework for understanding the epistemic problems that arise within our "evidence base."  It also provides insights into why we have a generation of trials in critical care whose results converge on the null, and why positive studies in this field cannot be replicated.
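One of the items above, delta inflation, lends itself to a quick numerical sketch: a trial powered for an optimistic absolute risk reduction can be badly underpowered for a realistic one.  The control event rate, sample size, and candidate deltas below are hypothetical, and the power calculation uses a simple normal approximation.

```python
# Hypothetical sketch of delta inflation: power of a fixed-size trial for
# progressively smaller (more realistic) absolute risk reductions (ARR).
from scipy.stats import norm

def power_two_proportions(p_control, arr, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_control - arr
    se = (p_control * (1 - p_control) / n_per_group
          + p_treat * (1 - p_treat) / n_per_group) ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(arr) / se - z_crit)

# Hypothetical trial: 40% control-group mortality, 300 patients per arm.
for arr in (0.10, 0.05, 0.03):
    print(f"ARR {arr:.0%}: power ~ {power_two_proportions(0.40, arr, 300):.2f}")
```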

Thursday, March 20, 2014

Sepsis Bungles: The Lessons of Early Goal Directed Therapy

On March 18th, the NEJM published early online three original trials of therapies for the critically ill that will serve as fodder for several posts.  Here, I focus on the ProCESS trial of protocol-guided therapy for early septic shock.  This trial is in essence a multicenter version of the landmark 2001 trial of Early Goal Directed Therapy (EGDT) for severe sepsis by Rivers et al.  That trial showed a stunning 16% absolute reduction in mortality in sepsis, attributed to the use of a protocol based on physiological goals for hemodynamic management.  That absolute reduction in mortality is perhaps the largest for any therapy in critical care medicine.  If such a reduction were confirmed, it would make EGDT the single most important therapy in the field.  If it cannot be confirmed, there are several reasons why the Rivers results may have been misleading.

There were other concerns about the Rivers study and how it was later incorporated into practice, but I won't belabor them here.  The ProCESS trial randomized about 1,350 patients among three groups: one simulating the original Rivers protocol, one following a modified Rivers protocol, and one receiving "standard care," that is, care directed by the treating physician without a protocol.  The study had 80% power to demonstrate a mortality reduction of 6-7%.  Before you read further, please wager: will the trial show any statistically significant differences in outcome that favor EGDT or protocolized care?

Monday, November 18, 2013

Dead in the Water: Colloids versus Crystalloids for Fluid Resuscitation in the ICU

It is a valid question:  at what point has a concept been tested ad infinitum such that further testing is not worthwhile?  There are at least three reasons why additional study of a concept may not be justified:

  1. Because the prior probability of success is so low (based on extant trials) that a subsequent trial is unlikely to influence the posterior probability that any success represents the truth.  (This is a Bayesian or meta-analytic worldview; see the sketch after this list.)
  2. Because the low probability of success does not justify the expense of additional trials
  3. Because the low probability of success violates bioethical precepts mandating that trials must have added value for patients and society
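A minimal sketch of the Bayesian point in item 1, using Bayes' rule with hypothetical inputs: the conventional power and alpha of a standard trial, and a range of prior probabilities that the therapy truly works.

```python
# Hypothetical sketch: posterior probability that an effect is real, given a
# "positive" trial, for various prior probabilities.  Power and alpha are the
# conventional 0.80 and 0.05; the priors are invented for illustration.
def posterior_after_positive_trial(prior, power=0.80, alpha=0.05):
    """P(effect is real | significant result) by Bayes' rule."""
    p_positive = power * prior + alpha * (1 - prior)
    return power * prior / p_positive

for prior in (0.50, 0.10, 0.02):   # e.g., dwindling priors after repeated failed trials
    print(f"prior {prior:.2f} -> posterior {posterior_after_positive_trial(prior):.2f}")
```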
And so we have, in the November 6th edition of JAMA, the CRISTAL trial of colloids versus crystalloids for resuscitation in the ICU.  As is customary, I will leave it to interested readers to peruse the manuscript for details.  My task here is to provide some background and nuance.

Thursday, June 20, 2013

More is Not Less, It Just Costs More: Early Tracheostomy, Early Parenteral Nutrition, and Rapid Blood Pressure Lowering in ICH

The past 2 weeks have provided me with some interesting reading of new data that deserve to be integrated with several other studies and themes discussed in this blog.  The three trials below share the goal of intervening early and aggressively so I thought it may be interesting to briefly consider them together.

Firstly, Young et al (May 22/29, 2013 issue of JAMA) report the results of the TracMan multicenter trial of early tracheostomy in ICUs in the UK.  These data seal the deal on an already evolving shift in my views on early tracheostomy, views that were based on anecdotal experience and earlier data from Rumbak and Terragni.  Briefly, the authors enrolled 899 patients expected to receive at least 7 more days of mechanical ventilation (that prediction was no more reliable in the current trial than it had been in previous trials) and randomized them to receive a tracheostomy on day 4 (early) versus day 10 (late).  The early patients did end up receiving fewer sedatives and had a trend toward shorter duration of respiratory support.  But their KM curves are basically superimposable, and the mortality rates are virtually identical at 30 days.  These data, combined with other available studies, leave little room for subjective interpretation.  Early tracheostomy, it is very likely, does not favorably affect outcomes enough to justify its costs and risks.

Friday, April 19, 2013

David versus Goliath on the Battlefield of Non-inferiority: Strangeness is in the Eye of the Beholder

In this week's JAMA is my letter to the editor about the CONSORT statement revision for the reporting of non-inferiority trials, and the authors' responses.  I'll leave it to interested readers to view for themselves the revised CONSORT statement, and the letter and response.

In sum, my main argument is that Figure 1 in the article is asymmetric, such that inferiority is stochastically less likely than superiority and an advantage is therefore conferred to the "new" [preferred; proprietary; profitable; promulgated] treatment in a non-inferiority trial.  Thus the standards for interpretation of non-inferiority trials are inherently biased.  There is no way around this, save for revising the standards.

The authors of CONSORT say that my proposed solution is "strange" because it would require revision of the standards of interpretation for superiority trials as well.  For me, it is "strange" that we would endorse asymmetric and biased standards of interpretation in any trial.  The compromise solution, as I suggested in my letter, is that we impose different standards for superiority only in the context of a non-inferiority trial.  Thus, superiority trial interpretation standards remain untouched.  It is only if you start with a non-inferiority trial that you face a higher hurdle for claiming superiority, one that is contingent on evidence of non-inferiority in the trial that you designed.  This would disincentivize the conduct of non-inferiority trials for a treatment that you hope/think/want to be superior.  In the current interpretation scheme, it's a no-brainer - conduct a non-inferiority trial and pass the low hurdle for non-inferiority, and then if you happen to be superior too, BONUS!

In my proposed scheme, there is no bonus superiority that comes with a lower hurdle than inferiority.  As I said in the last sentence, "investigators seeking to demonstrate superiority should design a superiority trial."  Then, there is no minimal clinically important difference (MCID) hurdle that must be cleared, and a statistical difference favoring new therapy by any margin lets you declare superiority.  But if you fail to clear that low(er) hurdle, you can't go back and declare non-inferiority.  

Which leads me to something that the word limit of the letter did not allow me to express:  we don't let unsuccessful superiority trials test for non-inferiority contingently, so why do we let successful non-inferiority trials test for superiority contingently?

Symmetry is beautiful;  Strangeness is in the eye of the beholder.

(See also:  Dabigatran and Gefitinib especially the figures, analogs of Figure 1 of Piaggio et al, on this blog.)

Monday, January 28, 2013

Coffee Drinking, Mortality, and Prespecified Falsification Endpoints

A few months back, the NEJM published this letter in response to an article by Freedman et al in the May 17, 2012 NEJM reporting an association between coffee drinking and reduced mortality found in a large observational dataset.  In a nutshell, the letter said that there was no biological plausibility for mortality reductions resulting from coffee drinking so the results were probably due to residual confounding, and that reductions in mortality in almost all categories (see Figure 1 of the index article) including accidents and injuries made the results dubious at best.  The positive result in the accidents and injuries category was in essence a failed negative control in the observational study.

Last week, in the January 16th issue of JAMA, Prasad and Jena operationally formalized this idea of negative controls for observational studies, especially in light of Ioannidis' call for a registry of observational studies.  They recommend that investigators mining databases establish a priori hypotheses that ought to turn out negative because they are biologically implausible.  These hypotheses can therefore serve as negative controls for the observational associations of interest, the ones that the authors want to be positive.  In essence, they recommend that the approach to observational data become more scientific.  At the most rudimentary end of the dataset analysis spectrum, investigators just mine the data to see what interesting associations they can find.  In the middle of the spectrum, investigators have a specific question that they wish to answer (usually in the affirmative), and they leverage a database to try to answer that question.  Prasad and Jena are suggesting going a step further, toward the ideal end of the spectrum:  specifying both positive and negative associations that should be expected, in a more holistic assessment of the ability of the dataset to answer the question of interest.  (If an investigator were looking to rule out an association rather than to find one, s/he could use a positive control rather than a negative one [a falsification end point] to establish the database's ability to confirm expected differences.)
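Here is a minimal sketch of what acting on a prespecified falsification endpoint might look like.  The endpoint names, effect estimates, and confidence intervals are invented placeholders modeled loosely on the coffee example; the logic is simply that a "positive" result on an implausible endpoint should make us distrust the plausible one.

```python
# Hypothetical sketch: checking prespecified falsification (negative-control)
# endpoints in an observational analysis.  All numbers are invented.
results = {
    # endpoint: (hazard ratio, 95% CI lower, 95% CI upper, prespecified role)
    "all-cause mortality":   (0.90, 0.85, 0.95, "primary"),
    "deaths from accidents": (0.88, 0.80, 0.97, "falsification"),  # should be null
    "deaths from injuries":  (0.92, 0.83, 1.02, "falsification"),  # should be null
}

failed_controls = [name for name, (hr, lo, hi, role) in results.items()
                   if role == "falsification" and not (lo <= 1.0 <= hi)]

if failed_controls:
    print("Falsification endpoints came out 'positive':", ", ".join(failed_controls))
    print("-> suspect residual confounding in the primary association.")
else:
    print("Negative controls behaved as expected; the primary association is more credible.")
```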

I think that they are correct in noting that the burgeoning availability of large databases (of almost anything) and the ease with which they can be analyzed pose problems for the interpretation of results.  Registering observational studies and assigning prespecified falsification end points should go a long way toward reducing incorrect causal inferences and false associations.

I wish I had thought of that.

Added 3/3/2013 - I just realized that another recent study of dubious veracity had some inadvertent unspecified falsification endpoints, which nonetheless cast doubt on the results.  I blogged about it here:  Multivitamins caused epistaxis and reduced hematuria in male physicians.

Wednesday, April 8, 2009

The PSA Screening Quagmire - If Ignorance is Bliss then 'Tis Folly to be Wise?

The March 26th NEJM was a veritable treasure trove of interesting evidence, so I can't stop after praising NICE-SUGAR and railing on intensive insulin therapy.  If 6,000 patients (40,000 screened) seemed like a commendable and daunting study to conduct, consider that the PLCO Project Team randomized over 76,000 US men to screening versus control (http://content.nejm.org/cgi/reprint/360/13/1310.pdf) and the ERSPC investigators randomized over 162,000 European men in a "real-time meta-analysis" of sorts (wherein multiple simultaneous studies were conducted with similar but different enrollment requirements and combined; see: http://content.nejm.org/cgi/reprint/360/13/1320.pdf).  This is, as the editorialist points out, a "Herculean effort," and that is fitting and poignant - because ongoing PSA screening efforts in current clinical practice represent a Herculean effort to reduce the morbidity and mortality of this disease, which reinforces the importance of the research question:  are we wasting our time?  Are we doing more harm than good?

The lay press was quick to start trumpeting the downfall of PSA screening with headlines such as "Prostate Test Found to Save Few Lives."  But for all their might, both of these studies give me, a longtime critic of cancer screening efforts, a good bit of pause.  (Pulmonologists may be prone to "sour grapes" as a result of the failures of screening for lung cancer.)

Before I briefly summarize the studies and point out some interesting aspects of each, allow me to indulge in a few asides.  First, I direct you to this interesting article in Medical Decision Making, "Cure Me Even if it Kills Me."  This wonderful study in judgment and decision making shows how difficult it is for patients to live with the knowledge that there is a cancer, however small, growing in them.  They want it out.  And they want it out even if they are demonstrably worse off with it cut out or x-rayed out or whatever.  It turns out that patients have a value for "getting rid of it" that probably arises from the emotional costs of living with the knowledge that there's a cancer in you.  I highly recommend that anyone interested in cancer screening or treatment read this article.

This article calls to mind an unforgettable patient from my residency whom we screened in compliance with VA mandates at the time.  Sure enough, this patient with heart disease had a mildly elevated PSA, and sure enough he had a cancer on biopsy.  And we discussed treatments in concert with our Urology colleagues.  While he had many options, this patient agonized and brooded and could not live with the thought of a cancer in him.  He proceeded with radical prostatectomy, the most drastic of his options.  And I will never forget that look of crestfallen resignation every time I saw him after that surgery, because he thereafter came to clinic in diapers, having been rendered incontinent and impotent by the operation.  He was more full of self-flagellating regret than any other patient I have seen in my career.  This poor man and his experience certainly jaded me at a young age and made me highly attuned to the pitfalls of PSA screening.

Against this backdrop where cancer is the most feared diagnosis in medicine, we feel an urge towards action to screen and prevent, even when there is a marginal net benefit of cancer screening, and even when other greater opportunities for improving health exist. I need not go into the literature about [ir]rational risk appraisal other than to say that our overly-exuberant fear of cancer (relative to other concerns) almost certainly leads to unrealistic hopes for screening and prevention. Hence the great interest in and attention to these two studies.

In summary, the PLCO study showed no reduction in prostate-cancer-related mortality from DRE (digital rectal examination) and PSA screening.  Absence of evidence is not evidence of absence, however, and a few points about this study deserve to be made:

~Because of high (and increasing) screening rates in the control group, this was essentially a study of the "dose" of screening.  The dose in the control group was ~45% and that in the screening group was ~85%.  So the question the study asked was not really "does screening work?" but rather "does doubling the dose of screening work?"  Had there been a favorable trend in this study, I would have been tempted to double the effect size of the screening to infer the true effect, reasoning that if increasing screening from 40% to 80% reduces prostate cancer mortality by x%, then increasing screening from 0% to 80% would reduce it by 2x%.  Alas, this was not the case in this study, which was underpowered.
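The extrapolation described above amounts to a simple linear correction for contamination; the sketch below uses a hypothetical observed reduction, since the actual trial showed no favorable trend.

```python
# Hypothetical sketch: scaling an observed effect for control-group
# contamination by treating screening as a "dose".  The observed relative
# reduction is an invented placeholder.
screened_control = 0.45        # screening uptake in the control arm (~45%)
screened_intervention = 0.85   # screening uptake in the intervention arm (~85%)
observed_reduction = 0.05      # hypothetical observed 5% relative mortality reduction

dose_difference = screened_intervention - screened_control   # the trial's actual contrast
full_contrast = screened_intervention - 0.0                  # screening vs. none at all
implied_reduction = observed_reduction * full_contrast / dose_difference

print(f"Implied reduction for 0% -> {screened_intervention:.0%} screening: {implied_reduction:.1%}")
```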

~I am very wary of studies that have cause-specific mortality as an endpoint.  There's just too much room for adjudication bias, as the editorialist points out.  Moreover, if you reduce prostate cancer mortality but overall mortality is unchanged, what do I, as a potential patient, care?  Great, you saved me from prostate cancer and I died at about the same time I would have anyway, but from an MI or a CVA instead?  We have to be careful about whether our goals are good ones - the goal should not be to "fight cancer" but rather to "improve overall health."  The latter, I admit, is a much less enticing and invigorating banner.  We like to feel like we're fighting.  (Admittedly, overall mortality appears not to differ in this study, but I'm at a loss as to what's really being reported in Table 4.)  The DSMB for the ERSPC trial argues here that cancer-specific mortality is most appropriate for screening trials because of dilution by other causes of mortality, and because screening for a specific cancer can only be expected to reduce mortality from that cancer.  From an efficacy standpoint, I agree, but from an effectiveness standpoint, this position causes me to squint and tilt my head askance.

~It is so very interesting that this study was stopped not for futility, nor for harm, nor for efficacy, but because it was deemed necessary to release the data because of the [potential] impact on public health.  And what has been the impact of those data?  Utter confusion.  That increasing screening from 40% to 80% does not improve prostate-specific mortality does not tell me that we should reduce screening to 0%.  In fact, I don't know what to do, nor what to make of these data.  Especially in the context of the next study.

In the ERSPC trial, investigators found a 20% reduction in prostate cancer deaths with screening with PSA alone in Europe. The same caveats regarding adjudication of this outcome notwithstanding, there are some very curious aspects of this trial that merit attention:

~This trial was, as I stated above, a "real-time meta-analysis" with many slightly different studies combined for analysis.  I don't know what this does to internal or external validity, because this is such an unfamiliar approach to me, but I'll be pondering it for a while, I'm sure.

~I am concerned that I don't fully understand the way that interim analyses were performed in this trial, what the early stopping rules were, and whether a one-sided or two-sided alpha was used.  Reference 6 states that it was one-sided, but the index article says two-sided.  Someone will have to help me out with the O'Brien-Fleming alpha spending function and let me know whether 1% spending at each analysis is par for the course.

~As noted by the editorialist, we are not told what the "contamination rate" of screening in the control group is. If it is high, we might use my method described above to infer the actual impact of screening.

~Look at the survival curves that diverge and then appear to converge again at a low hazard rate. Is it any wonder that there is no impact on overall mortality?


So where does this all leave us?  We have a population of physicians and patients who yearn for effective screening and believe in it, so much so that it is hard to conduct an uncontaminated study of screening.  We have a US study that was stopped prematurely in order to inform public health, but which is inadequate to inform it.  We have a European study which shows a benefit near the a priori expected benefit, but which has a bizarre design and is missing important data that we would like to consider before accepting the results.  We have no hint of a benefit on overall mortality.  We have lukewarm conclusions from both groups, and we want desperately to know what the associated morbidities in each group are.  We are spending vast amounts of resources and incurring an enormous emotional toll on men who live in fear after a positive PSA test, many of whom pay dearly ("a pound of flesh") to exorcise that fear.  And we have a public over-reaction to the results of these studies which merely increases our quandary.

If ignorance is bliss, then truly 'tis folly to be wise. Perhaps this saying applies equally to individual patients, and the investigation of PSA screening in these large-scale trials. For my own part, this is one aspect of my health that I shall leave to fate and destiny, while I focus on more directly remediable aspects of preventive health, ones where the prevention is pleasurable (running and enjoying a Mediterranean diet) rather than painful (prostatectomy).