Interpreting observational studies: Why empirical calibration is needed to correct p-values

Schuemie, Martijn; Ryan, Patrick; DuMouchel, William; Suchard, Marc; Madigan, David

doi:10.1002/sim.5925

M.J. Schuemie (Martijn), P.B. Ryan (Patrick), W. DuMouchel (William), M.A. Suchard (Marc) and D. Madigan (David)

2014-01-30

Statistics in Medicine, Volume 33, Issue 2, p. 209-218

Often the literature makes assertions of medical product effects on the basis of 'p<0.05'. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from the literature, representing a case-control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p<0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we computed calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to the literature suggests that at least 54% of findings with p<0.05 are not actually statistically significant and should be reevaluated.
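The calibration idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the empirical null is a normal distribution on the log effect scale and fits it with a simple method-of-moments shortcut, whereas the abstract says only that distributions were fitted to the negative-control estimates. The function names and the negative-control numbers below are hypothetical and chosen for illustration only.

```python
import math
from statistics import mean, pvariance


def fit_empirical_null(log_rr, se):
    """Fit an assumed normal null N(mu, sigma^2) to negative-control estimates.

    Method-of-moments simplification (an assumption of this sketch): the
    observed spread of the estimates is treated as systematic error variance
    plus the average sampling variance.
    """
    mu = mean(log_rr)
    sigma2 = max(0.0, pvariance(log_rr) - mean(s * s for s in se))
    return mu, math.sqrt(sigma2)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def traditional_p(log_rr, se):
    """Conventional p-value: random error only, null centered on zero."""
    return two_sided_p(log_rr / se)


def calibrated_p(log_rr, se, mu, sigma):
    """p-value against the empirical null: random plus systematic error."""
    return two_sided_p((log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2))


if __name__ == "__main__":
    # Hypothetical negative controls: log relative risks and standard errors.
    nc_log_rr = [0.1, 0.5, 0.3, 0.7, -0.1, 0.4, 0.6, 0.2]
    nc_se = [0.1] * len(nc_log_rr)
    mu, sigma = fit_empirical_null(nc_log_rr, nc_se)

    # Hypothetical estimate for a drug of interest: RR = 1.5, SE = 0.15.
    est, est_se = math.log(1.5), 0.15
    print("traditional p:", traditional_p(est, est_se))
    print("calibrated p: ", calibrated_p(est, est_se, mu, sigma))
```

In this made-up example the negative controls are shifted away from a relative risk of 1, so an estimate that looks significant under the conventional test is no longer significant once the shift and spread of the empirical null are taken into account.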

Additional Metadata
Keywords	Calibration, Hypothesis testing, Negative controls, Observational studies
Persistent URL	doi.org/10.1002/sim.5925, hdl.handle.net/1765/67635
Journal	Statistics in Medicine
Organisation	Department of Medical Informatics
Citation	Schuemie, M., Ryan, P., DuMouchel, W., Suchard, M., & Madigan, D. (2014). Interpreting observational studies: Why empirical calibration is needed to correct p-values. Statistics in Medicine, 33(2), 209–218. doi:10.1002/sim.5925

Free Full Text (Manuscript at PubMed Central)
