There is a strong and continuously growing interest in using large electronic healthcare databases to study health outcomes and the effects of pharmaceutical products. However, concerns regarding disease misclassification (i.e., errors in classifying disease status) and its impact on study results are legitimate. Validation is therefore increasingly recognized as an essential component of database research. In this work, we elucidate the interrelations between the true prevalence of a disease in a database population (i.e., the prevalence assuming no disease misclassification), the observed prevalence subject to disease misclassification, and the most common validity indices: sensitivity, specificity, positive predictive value, and negative predictive value. Based on these interrelations, we obtained analytical expressions to derive all the validity indices and the true prevalence from the observed prevalence and any combination of two other parameters. The analytical expressions can be used for various purposes. Most notably, they can be used to obtain an estimate of the observed prevalence adjusted for outcome misclassification from any combination of two validity indices, and to derive validity indices that would otherwise be difficult to obtain from those that are available. To allow researchers to easily use the analytical expressions, we additionally developed a user-friendly and freely available web application.
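The kind of derivation described above can be illustrated with a minimal Python sketch. It relies on the standard textbook identities linking observed prevalence, true prevalence, and the four validity indices; it is not necessarily the exact form of the expressions in the paper, and the function names `derive_from_se_sp` and `derive_from_ppv_npv` are hypothetical, chosen here for illustration only. Only two of the possible parameter combinations are shown.

```python
def derive_from_se_sp(p_obs, se, sp):
    """Given observed prevalence, sensitivity and specificity,
    return (true prevalence, PPV, NPV).

    Assumes 0 < p_obs < 1 and se + sp != 1 (illustrative sketch only).
    """
    denom = se + sp - 1.0
    if denom == 0.0:
        raise ValueError("se + sp must differ from 1")
    # Observed prevalence: p_obs = se * prev + (1 - sp) * (1 - prev),
    # solved here for the true prevalence.
    prev = (p_obs + sp - 1.0) / denom
    ppv = se * prev / p_obs                   # true positives / observed positives
    npv = sp * (1.0 - prev) / (1.0 - p_obs)   # true negatives / observed negatives
    return prev, ppv, npv


def derive_from_ppv_npv(p_obs, ppv, npv):
    """Given observed prevalence, PPV and NPV,
    return (true prevalence, sensitivity, specificity).

    Assumes the resulting true prevalence lies strictly between 0 and 1.
    """
    # True cases = correctly flagged positives + missed cases among negatives.
    prev = p_obs * ppv + (1.0 - p_obs) * (1.0 - npv)
    se = p_obs * ppv / prev
    sp = (1.0 - p_obs) * npv / (1.0 - prev)
    return prev, se, sp


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the paper.
    prev, ppv, npv = derive_from_se_sp(p_obs=0.125, se=0.80, sp=0.95)
    print(f"true prevalence={prev:.3f}, PPV={ppv:.3f}, NPV={npv:.3f}")
```

With real data the inputs are estimates, so a derived prevalence can fall outside the interval from 0 to 1; a practical implementation would need to check for this and propagate uncertainty, which this sketch does not attempt.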
Erasmus MC: University Medical Center Rotterdam

Bollaerts, K. (Kaatje), Rekkas, A. (Alexandros), De Smedt, T. (Tom), Dodd, C.N., Andrews, N.J., & Gini, R. (2020). Disease misclassification in electronic healthcare database studies: Deriving validity indices — A contribution from the ADVANCE project. PLoS ONE, 15(4). doi:10.1371/journal.pone.0231333