A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data

Wynants, L.; Bouwmeester, Walter; Moons, Karel; Moerbeek, Mirjam; Timmerman, D.; Van Huffel, Sabine; Van Calster, B.; Vergouwe, Yvonne

doi:10.1016/j.jclinepi.2015.02.002

L. Wynants, W. Bouwmeester (Walter), K.G.M. Moons (Karel), M. Moerbeek (Mirjam), D. Timmerman, S. Van Huffel (Sabine), B. Van Calster and Y. Vergouwe (Yvonne)

2015-12-01

Journal of Clinical Epidemiology, Volume 68, Issue 12, p. 1406-1414

Objectives: This study aims to investigate the influence of the amount of clustering [intraclass correlation (ICC) = 0%, 5%, or 20%], the number of events per variable (EPV) or candidate predictor (EPV = 5, 10, 20, or 50), and backward variable selection on the performance of prediction models.

Study Design and Setting: Researchers frequently combine data from several centers to develop clinical prediction models. In our simulation study, we developed models from clustered training data using multilevel logistic regression and validated them in external data.

Results: The amount of clustering was not meaningfully associated with the models' predictive performance. The median calibration slope of models built in samples with EPV = 5 and strong clustering (ICC = 20%) was 0.71. With EPV = 5 and ICC = 0%, it was 0.72. A higher EPV related to an increased performance: the calibration slope was 0.85 at EPV = 10 and ICC = 20% and 0.96 at EPV = 50 and ICC = 20%. Variable selection sometimes led to a substantial relative bias in the estimated predictor effects (up to 118% at EPV = 5), but this had little influence on the model's performance in our simulations.

Conclusion: We recommend at least 10 EPV to fit prediction models in clustered data using logistic regression. Up to 50 EPV may be needed when variable selection is performed.
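The two quantities the abstract reports, EPV and the calibration slope, can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the function names are hypothetical, the data are simulated only for illustration, and the slope is fitted with a plain single-level logistic regression (Newton-Raphson) rather than the multilevel models used in the study. A calibration slope below 1 indicates predictions that are too extreme, which is the pattern the abstract describes at low EPV.

```python
import math
import random


def events_per_variable(n_events, n_candidate_predictors):
    """EPV: number of events divided by the number of candidate predictors."""
    return n_events / n_candidate_predictors


def calibration_slope(outcomes, linear_predictors, n_iter=25):
    """Slope b in logit P(y = 1) = a + b * lp, fitted by Newton-Raphson.

    Assumption: ordinary (single-level) logistic regression of the observed
    outcome on the model's linear predictor in validation data.
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the score vector (g_a, g_b) and the information matrix.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for y, lp in zip(outcomes, linear_predictors):
            p = 1.0 / (1.0 + math.exp(-(a + b * lp)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * lp
            h_aa += w
            h_ab += w * lp
            h_bb += w * lp * lp
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b


# Illustrative validation data: outcomes generated from a "true" linear predictor.
random.seed(1)
lp = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in lp]

# A hypothetical overfitted model whose predictions are twice as extreme.
overfitted_lp = [2.0 * x for x in lp]

slope_ok = calibration_slope(y, lp)
slope_overfit = calibration_slope(y, overfitted_lp)

print("EPV with 50 events and 10 candidate predictors:", events_per_variable(50, 10))
print("Calibration slope, well-calibrated predictions:", round(slope_ok, 2))
print("Calibration slope, too-extreme predictions:", round(slope_overfit, 2))
```

In this sketch, doubling the linear predictor halves the fitted slope, so the overfitted predictions show a slope well below the well-calibrated ones.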

Additional Metadata
Keywords	Clustered data, Events per variable, Logistic model, Multicenter study, Prediction model, Simulation study
Persistent URL	doi.org/10.1016/j.jclinepi.2015.02.002, hdl.handle.net/1765/91572
Journal	Journal of Clinical Epidemiology
Organisation	Department of Public Health
Citation	Wynants, L., Bouwmeester, W., Moons, K., Moerbeek, M., Timmerman, D., Van Huffel, S., … Vergouwe, Y. (2015). A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data. Journal of Clinical Epidemiology, 68(12), 1406–1414. doi:10.1016/j.jclinepi.2015.02.002
