A tutorial on variable selection for clinical prediction models: Feature selection methods in data mining could improve the results

Bagherzadeh-Khiabani, Farideh; Ramezankhani, Azra; Azizi, Joshan; Hadaegh, Farzad; Steyerberg, Ewout; Khalili, Davood

doi:10.1016/j.jclinepi.2015.10.002


2016


Journal of Clinical Epidemiology, Volume 71, pp. 76–85

Objectives: Identifying an appropriate set of predictors for the outcome of interest is a major challenge in clinical prediction research. The aim of this study was to demonstrate how several variable selection methods commonly used in data mining can be applied in an epidemiological study. We introduce a systematic approach for this purpose.
Study Design and Setting: The P-value-based method commonly used in epidemiological studies, along with several filter and wrapper methods, was implemented to select predictors of diabetes from 55 variables in 803 prediabetic women aged ≥20 years who were followed for 10–12 years. To develop a logistic model, variables were selected on a training data set and evaluated on a test data set. The Akaike information criterion (AIC) and the area under the curve (AUC) were used as performance criteria. We also fitted a full model containing all 55 variables.
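The two performance criteria can be illustrated with a minimal sketch, assuming the logistic model's predicted probabilities are already available. The function names `auc` and `aic` are hypothetical and this is not the authors' R program. The abstract does not say whether AIC was computed from the training or the test likelihood, so the sketch simply evaluates whichever outcomes and probabilities are passed in.

```python
import math
from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the concordance (c-statistic):
    the share of (case, non-case) pairs in which the case (y=1) receives the
    higher predicted risk. Tied predictions count as one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))


def aic(y: Sequence[int], p: Sequence[float], n_params: int) -> float:
    """AIC = 2k - 2 ln L for a binary outcome, where L is the Bernoulli
    likelihood of the predicted probabilities p and k (n_params) counts the
    model's coefficients, including the intercept."""
    eps = 1e-12  # keep probabilities away from 0 and 1 so log() is defined
    loglik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        loglik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return 2 * n_params - 2 * loglik
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four case/non-case pairs are ranked correctly. A higher AUC means better discrimination, whereas a lower AIC means a better trade-off between fit and the number of predictors kept.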
Results: The full model performed worst, and the models based on wrapper methods performed best. Among the filter methods, symmetrical uncertainty gave both the best AUC and the best AIC.
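The symmetrical-uncertainty filter can be sketched as follows. Each candidate predictor is scored against the outcome by SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), and the predictors are then ranked by that score. This sketch assumes discrete (or already discretized) variables. The abstract does not describe how continuous predictors were discretized or how many top-ranked variables were retained, so both choices are left to the user. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a discrete variable."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so neither informs the other
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mutual_info / (hx + hy)


def rank_features(
    features: Dict[str, Sequence[Hashable]], outcome: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Filter step: score every candidate predictor against the outcome and
    sort from most to least informative. Choosing a cutoff is a separate step."""
    scores = {name: symmetrical_uncertainty(col, outcome) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Dividing the mutual information by the sum of the two entropies counteracts the tendency of raw information gain to favour variables with many categories. Like any filter, however, SU scores each predictor independently of the final model. Wrapper methods instead search over subsets by refitting the model itself.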
Conclusion: Our experiment showed that variable selection methods used in data mining could improve the performance of clinical prediction models. An R program was developed to make these methods easier to apply and to visualize the results.

Additional Metadata
Keywords	Data mining, Feature selection, Methods, Prediction, Statistical model, Variable selection
Persistent URL	doi.org/10.1016/j.jclinepi.2015.10.002, hdl.handle.net/1765/87755
Journal	Journal of Clinical Epidemiology
Organisation	Department of Public Health
Citation	Bagherzadeh-Khiabani, F., Ramezankhani, A., Azizi, J., Hadaegh, F., Steyerberg, E., & Khalili, D. (2016). A tutorial on variable selection for clinical prediction models: Feature selection methods in data mining could improve the results. Journal of Clinical Epidemiology, 71, 76–85. doi:10.1016/j.jclinepi.2015.10.002
