Showing posts with label data dredging. Show all posts
Showing posts with label data dredging. Show all posts

Saturday, March 14, 2009

"Statistical Slop": What billiards can teach us about multiple comparisons and the need to assign primary endpoints

Anyone who has played pool knows that you have to call your shots before you make them. This rule is intended to decrease probability of "getting lucky" from just hitting the cue ball as hard as you can, expecting that the more it bounces around the table, the more likely it is that one of your many balls will fall through chance alone. Sinking a ball without first calling it is referred to coloquially as "slop" or a "slop shot".

The underlying logic is that you know best which shot you're MOST likely to successfully make, so not only does that increase the prior probability of a skilled versus a lucky shot (especially if it is a complex shot, such as one "off the rail"), but also it effectively reduces the number of chances the cue ball has to sink one of your balls without you losing your turn. It reduces those multiple chances to one single chance.

Likewise, a clinical trialist must focus on one "primary outcome" for two reasons: 1.) because preliminary data, if available, background knowledge, and logic will allow him to select the variable with the highest "pre-test probability" of causing the null hypothesis to be rejected, meaning that the post-test probability of the alternative hypothesis is enhanced; and 2.) because it reduces the probaility to find "significant" associations among multiple variables through chance alone. Today I came across a cute little experiment that drives this point home quite well. The abstract can be found here on pubmed: http://www.ncbi.nlm.nih.gov/pubmed/16895820?ordinalpos=4&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .


In it, the authors describe "dredging" a Canadian database and looking for correlations between astrological signs and various diagnoses. Significant associations were found between the Leo sign and gastrointestinal hemorrhage, and the Saggitarius sign and humerous fracture. With this "analogy of extremes" as I like to call them, you can clearly see how the failure to define a prospective primary endpoint can lead to statistical slop. (Nobody would have been able to predict a priori that it would be THOSE two diagnoses associated with THOSE two signs!) Failure to PROSPECTIVELY identify ONE primary endpoint led to multiple chances for chance associations. Moreover, because there were no preliminary data upon which to base a primary hypothesis, the prior probability of any given alternative hypothesis is markedly reduced, and thus the posterior probability of the alternative hypothesis remains low IN SPITE OF the statistically significant result.

It is for this very reason that "positive" or significant associations among non-primary endpoint variables in clinical trials are considered "hypothesis generating" rather than hypothesis confirming. Requiring additional studies of these associations as primary endpoints is like telling your slop shot partner in the pool hall "that's great, but I need to see you do that double rail shot again to believe that it's skill rather than luck."

Reproducibility of results is indeed the hallmark of good science.