The previous post about Dr. Cox, ensnared in a CPT (Child Protection Team) witch hunt in Wisconsin, has led me to evaluate several more research reports on child abuse, including SBS (shaken baby syndrome), AHT (abusive head trauma), and sentinel injuries. These reports are rife with critical assumptions, severe limitations, and gross errors which greatly limit the resulting conclusions in most studies I have reviewed. However, one study that was pointed out to me today takes the cake. I don't know what the prevalence of this degree of misunderstanding is, but CPTs and child abuse pediatricians need to make sure they have a proper understanding of sensitivity, specificity, positive and negative predictive value, base rates, etc. And they should not be testifying about the probability of child abuse at all if they don't have this stuff down cold. And I think this means that some proportion of them need to go back to school or stop testifying.
The article at issue, along with its associated correspondence, is entitled "The Positive Predictive Value of Rib Fractures as an Indicator of Nonaccidental Trauma in Children" and was published in 2004. The authors looked at a series of rib fractures in children at a single Trauma Center in Colorado during a six-year period and identified all patients with a rib fracture. They then restricted their analysis to children less than 3 years of age. There were 316 rib fractures among just 62 children in the series; the average number of rib fractures per child is ~5. The proper unit of analysis for a study looking at positive predictive value is children, sorted into those with and without abuse, and with and without rib fracture(s) as seen in the 2x2 tables below.
Among the 62 children, 51 (82.3%) were identified as victims of child abuse. How? As is typical in this field, we are not told any particular methodology, only that it's child abuse because the CPT, or in this case the CAP (Child Advocacy and Protection Team), says it's abuse. Imagine if I tried to publish a study of risk factors for pneumonia, and my methodology was that it was pneumonia if a pulmonologist said it was pneumonia. That would never get past reviewers in adult medicine. It is too vague and subjective. Pneumonia is an apt analogy here because the gold standard for pneumonia is rarely met except when, say, the blood cultures are positive for a known causative organism. These are minority cases, analogous to a parental admission or video evidence of abuse. In the remainder of cases of pneumonia, significant diagnostic uncertainty is present, spanning the spectrum from a little to a lot.
The authors could have taken this number, 82.3%, and said that it represented the "positive predictive value" of rib fractures as a diagnostic test for child abuse. (PPV is the proportion of all positive tests that represent true positive tests.) If they had done so, we would have still been plagued by all of the biases that may have influenced the classification of abuse, and they are significant. But they don't do this. They count rib fractures in patients with and without abuse and fill cells "a" and "b" of a 2x2 table with those numbers. This is an unequivocally incorrect methodology and in this case it inflates what they're calling the "positive predictive value" of rib fractures to 95%, because multiple rib fractures were more common in the children with abuse. However, it is not a positive predictive value at all because in many cases they're counting a test as positive multiple times in the same child. I don't even know what that should be called, but it is nothing that an epidemiologist or clinician recognizes as anything useful at all. Each child can be represented in the 2x2 table by only one count. The authors then engage in some more shenanigans and somehow inflate the positive predictive value to 100% which just magnifies their previous errors.
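To make the unit-of-analysis problem concrete, here is a minimal Python sketch. The first figure uses the per-child counts reported in the study (51 of 62 children). The per-fracture tallies in the toy cohort are invented purely for illustration and are not study data, and the function name is my own.

```python
def ppv(true_pos, false_pos):
    """Proportion of positive tests that are true positives: a / (a + b)."""
    return true_pos / (true_pos + false_pos)

# Per-child counts reported in the study: 51 abused and 11 not abused,
# all of them with at least one rib fracture.
child_level = ppv(51, 62 - 51)

# HYPOTHETICAL toy cohort (not study data): number of fractures per child.
toy_abused = [6, 8, 5, 7]   # four abused children, several fractures each
toy_not_abused = [1]        # one non-abused child with a single fracture

# Correct: each child contributes one count to the 2x2 table.
per_child = ppv(len(toy_abused), len(toy_not_abused))
# Incorrect: each fracture contributes a count, so children with many
# fractures are counted many times.
per_fracture = ppv(sum(toy_abused), sum(toy_not_abused))

print(f"study, per child:  {child_level:.3f}")
print(f"toy, per child:    {per_child:.3f}")
print(f"toy, per fracture: {per_fracture:.3f}")
```

In the toy cohort nothing about the children changes between the two calculations; the higher per-fracture figure comes entirely from counting the same child more than once.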
Now, anybody with more than rudimentary knowledge of diagnostic testing knows that this "positive predictive value" is dead in the water and cannot be utilized at any other facility unless we assume that the prevalence or base rate of child abuse at that facility is very similar to the Colorado Level 1 Trauma Center where the study was conducted, and that x-rays were done for the same reasons, with the same thresholds and with the same rates of alternative diagnoses. This is because the proportion of patients who have the disease, in this case 82.3%, is likely to vary depending on what population is tested and who gets x-rays and so on. These are huge and unfounded assumptions, and the authors appear not to understand them - or at least they fail to mention them as limitations.
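The dependence on base rate can be sketched in a few lines of Python. The sensitivity, specificity, and list of prevalences below are assumed values chosen only for illustration, not figures from the study, and the function name is hypothetical.

```python
def ppv_from_rates(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# ASSUMED test characteristics, held fixed across populations.
sens, spec = 0.50, 0.99

# HYPOTHETICAL base rates in different tested populations.
prevalences = [0.50, 0.20, 0.05, 0.01]
ppvs = [ppv_from_rates(p, sens, spec) for p in prevalences]

for p, value in zip(prevalences, ppvs):
    print(f"prevalence {p:.2f} -> PPV {value:.3f}")
```

With the test characteristics held fixed, the predictive value falls steadily as the base rate falls, which is why a PPV measured in one referral population cannot simply be carried over to another.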
Fortunately, a perspicacious correspondent, himself a Child Abuse Physician, recognized these problems, and wrote a letter to the editor pointing them out. Interested readers can see his letter, which is very well done. I take exception to only a few things he says. First, no test has 100% predictive value, or at least must be assumed not to unless rigorously proven otherwise. He also makes some assumptions about likelihood ratios, which, while reasonable and serving to make his point about prevalence, distract the unwary reader from the fact that the likelihood ratios for a test cannot be known without knowing the values for cells "c" and "d" in the 2x2 table (see the first pic above). This is why this study is dead where it stands: without knowing a likelihood ratio, we cannot apply any of the data to other populations without significant assumptions about unknown variables. The authors can only legitimately say that, "given x-rays ordered for trauma at our center showing rib fractures, the probability of an ultimate determination of child abuse (however fallible it may or may not be) was 82.3%."
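A short Python sketch of why cells "c" and "d" matter. It uses the usual 2x2 convention (a = true positives, b = false positives, c = false negatives, d = true negatives). The values of a and b are the per-child counts from the study; every value of c and d is a placeholder I made up, precisely because the study provides none.

```python
def likelihood_ratio_positive(a, b, c, d):
    """LR+ = sensitivity / (1 - specificity) = [a / (a + c)] / [b / (b + d)]."""
    sensitivity = a / (a + c)
    false_positive_rate = b / (b + d)
    return sensitivity / false_positive_rate

# Per-child counts from the study: 51 abused and 11 not abused, all with rib fractures.
a, b = 51, 11

# HYPOTHETICAL values for the unobserved cells c and d
# (children without rib fractures, abused and not abused).
scenarios = [(50, 1000), (100, 1000), (50, 5000)]

for c, d in scenarios:
    lr = likelihood_ratio_positive(a, b, c, d)
    print(f"c={c:>4} d={d:>5} -> LR+ = {lr:.1f}")
```

The same a and b are compatible with very different likelihood ratios depending on what is assumed for c and d, so a series made up only of children with rib fractures cannot pin the ratio down.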
But here's the real rub. In their reply, the authors of the study betray their vast ignorance of basic clinical research and decision principles. Their misunderstanding is so egregious that some of their statements are either patently false or are completely inscrutable - I don't even know what they're trying to say. For example:
"The more interesting calculation suggested by Dr. Ricci is the use of Bayes' Theorem of posterior (prior) probability to strengthen the positive predictive value of a rib fracture to predict non-accidental trauma."
I cannot even tell you what that means or what they're trying to convey. Maybe this will help?
"Bayes Theorem generally states that a low prevalence of a particular disease (non-accidental trauma, NAT) strengthens the positive predictive value of a positive test (rib fracture) to define the disease state (victim of NAT)."
Nope. Ignoring the awkward language, this is comically and completely wrong, the exact opposite of the true state of affairs. For a test with given sensitivity and specificity, the positive predictive value is lower when the prevalence of the disease of interest is lower.
"The prevalence of NAT in our facility is 1.6%. We chose to use this number in the following posterior probability calculations as all NAT victims are admitted to the hospital and a true prevalence of this sample can then be calculated from the total number of children who are admitted to the hospital for any reason."
This meaningless gobbledygook then further degenerates into complete nonsense:
One thing we can do is take the prevalence of 1.6% they mention (note, this uses hospitalized children as the denominator, not all children) and do as Ricci did: make some generous assumptions about what the sensitivity and specificity might be and plug those estimates into the Bayes calculator at Status Iatrogenicus to see what the posterior probabilities would be. Let's say that rib fractures are 50% sensitive for child abuse (meaning that half of all cases of child abuse have rib fractures). (This would be a/(a+c) in the 2x2 table.) Is this reasonable? Well, if you don't like a sensitivity of 50%, recall how many abused children don't have rib fractures (the true rate of rib fractures in child abuse is about 35%). This number is actually quite generous. Let's also give a very high number for specificity: 99% (that is, d/(d+b) in the 2x2 table). Is this reasonable? Well, it would mean that of 100 children who are not abused we would find only 1 patient with rib fractures. Here is the resulting 2x2 table, taken from Status Iatrogenicus:
Under assumptions of 1.6% prevalence, 50% sensitivity, and 99% specificity, the posterior probability of child abuse given rib fractures is a meager 45%. Nothing to dismiss, but hardly "confirmatory" of abuse.
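For readers who want to check the arithmetic without the calculator, here is a minimal Python sketch that rebuilds the 2x2 table from the three assumptions above. The cohort size of 100,000 hospitalized children is my own arbitrary choice, made only so that the cells come out as whole numbers, and the function name is hypothetical.

```python
def two_by_two(cohort, prevalence, sensitivity, specificity):
    """Return cells (a, b, c, d) of a 2x2 table for a notional cohort."""
    abused = round(cohort * prevalence)
    not_abused = cohort - abused
    a = round(abused * sensitivity)      # abused, rib fracture (true positives)
    c = abused - a                       # abused, no rib fracture (false negatives)
    d = round(not_abused * specificity)  # not abused, no rib fracture (true negatives)
    b = not_abused - d                   # not abused, rib fracture (false positives)
    return a, b, c, d

# Assumptions from the text: 1.6% prevalence, 50% sensitivity, 99% specificity.
# ASSUMED cohort size: 100,000 (arbitrary, for round numbers only).
a, b, c, d = two_by_two(100_000, 0.016, 0.50, 0.99)
posterior = a / (a + b)

print(f"a={a} b={b} c={c} d={d}")
print(f"P(abuse | rib fracture) = {posterior:.3f}")
```

Under these assumptions there are 800 true positives against 984 false positives, so the posterior probability works out to 800/1784, or about 45%.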
We could argue ad nauseam about reasonable assumptions and resulting estimates, but that is not the point here. The point is that the authors of this study of rib fractures appear to have, at best, a rudimentary understanding of fundamental principles of clinical investigation and decision making, especially the influence of disease prevalence on the predictive value of diagnostic testing, and even the determination of positive predictive value. These are the people who are making decisions that affect the lives and welfare of families, and influence others who make such decisions. I am no apologist or sympathizer for child abusers, far from it. But these authors just aren't facile with these simple formulas and are thus prone to egregious errors in their estimations of child abuse as well as the associated uncertainty (which they don't even bother to report, in the form of a confidence interval or other measure). It is unconscionable that CPT members and CAPs are taking the stand and testifying on the basis of studies like this one, or the one on sentinel injuries by Lynn Sheets and perhaps even some of the shaken baby studies. We risk a serious miscarriage of justice because of their ignorance and false confidence. In addition, behavior such as this, which is naive and irresponsible at best and nefarious and pernicious at worst, risks undermining public trust and confidence in CPTs and child abuse physicians. If this happens, it is going to be harder to prosecute the real abusers. The child abuse community is thereby doing itself and the welfare of children a great disservice by failing to conduct credible clinical research about child abuse, and by committing serious overreach and overstatement about the validity and uncertainty of evidence (and experience) in their field. It is a shame, and net harm is likely to result from it.