An empirical analysis of dealing with patients who are lost to follow-up when developing prognostic models using a cohort design

Reps, Jenna M.; Rijnbeek, Peter; Cuthbert, Alana; Ryan, Patrick; Pratt, Nicole; Schuemie, Martijn

doi:10.1186/s12911-021-01408-x

Reps, J.M. (Jenna M.), Rijnbeek, P.R. (Peter), Cuthbert, A. (Alana), Ryan, P.B. (Patrick), Pratt, N. (Nicole) and Schuemie, M.J. (Martijn)

2021-02-07

BMC Medical Informatics and Decision Making, Volume 21, Issue 1

Background: Researchers developing prediction models face numerous design choices that may affect model performance. One key decision is how to handle patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation of the impact of this decision and aim to provide guidelines for dealing with loss to follow-up.
Methods: We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up either at random or based on comorbidity. In addition to the synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models when using a cohort design that encounters loss to follow-up. Three strategies employ a binary classifier with data that: (1) include all patients (including those lost to follow-up), (2) exclude all patients lost to follow-up or (3) only exclude patients lost to follow-up who do not have the outcome before being lost to follow-up. The fourth strategy uses a survival model with data that include all patients. We empirically evaluate discrimination and calibration performance.
Results: The partially synthetic data study shows that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of strategy for addressing it had negligible impact on model discrimination performance. Our empirical real-world data results showed that the four design choices investigated resulted in comparable performance when the time-at-risk was 1 year but demonstrated differential bias when the time-at-risk was 3 years. Removing patients who are lost to follow-up before experiencing the outcome while keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided.
Conclusion: Based on this study we recommend (1) developing models using data that include patients who are lost to follow-up and (2) evaluating the discrimination and calibration of models twice: on a test set including patients lost to follow-up and on a test set excluding patients lost to follow-up.
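
The four strategies compared in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Patient` record, the strategy names, the helper functions and the toy cohort are all hypothetical, and it assumes that "lost to follow-up" means observation ends before the end of the time-at-risk window.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout; days are counted from the index date.
    patient_id: int
    outcome_day: Optional[int]  # day of first outcome, None if never observed
    last_observed_day: int      # day on which observation ends


def is_lost(p: Patient, tar_days: int) -> bool:
    """Assumed definition: observation ends before the time-at-risk ends."""
    return p.last_observed_day < tar_days


def has_outcome(p: Patient, tar_days: int) -> bool:
    """Outcome recorded within the time-at-risk and while still observed."""
    if p.outcome_day is None:
        return False
    return p.outcome_day <= min(tar_days, p.last_observed_day)


def select_training_set(patients: List[Patient], strategy: str,
                        tar_days: int) -> List[Patient]:
    """Apply one of the three binary-classifier data strategies."""
    if strategy == "keep_all":                    # strategy (1)
        return list(patients)
    if strategy == "drop_lost":                   # strategy (2)
        return [p for p in patients if not is_lost(p, tar_days)]
    if strategy == "drop_lost_without_outcome":   # strategy (3)
        return [p for p in patients
                if not is_lost(p, tar_days) or has_outcome(p, tar_days)]
    raise ValueError(f"unknown strategy: {strategy}")


def survival_targets(patients: List[Patient],
                     tar_days: int) -> List[Tuple[int, bool]]:
    """Strategy (4): (time, event) pairs for a survival model, all patients kept.

    Patients without an observed outcome are censored at the earlier of
    loss to follow-up and the end of the time-at-risk.
    """
    targets = []
    for p in patients:
        event = has_outcome(p, tar_days)
        time = p.outcome_day if event else min(p.last_observed_day, tar_days)
        targets.append((time, event))
    return targets


# Toy cohort with a 1-year time-at-risk (values invented for illustration).
cohort = [
    Patient(1, None, 365),  # complete follow-up, no outcome
    Patient(2, 100, 365),   # complete follow-up, outcome
    Patient(3, None, 200),  # lost before any outcome
    Patient(4, 50, 120),    # outcome, then lost
]

for strategy in ("keep_all", "drop_lost", "drop_lost_without_outcome"):
    kept = select_training_set(cohort, strategy, tar_days=365)
    print(strategy, [p.patient_id for p in kept])
print("survival", survival_targets(cohort, tar_days=365))
```

In this toy cohort, patient 4 (outcome followed by loss to follow-up) is dropped under strategy (2) but kept under strategy (3), while patient 3 is dropped under both. Strategy (3) therefore retains lost patients only when they are cases, which is the selective exclusion the Results warn against.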

Additional Metadata
Keywords	Best practices, Censoring, Loss to follow-up, Model development, PatientLevelPrediction, Prognostic model
Persistent URL	doi.org/10.1186/s12911-021-01408-x, hdl.handle.net/1765/134847
Journal	BMC Medical Informatics and Decision Making
Organisation	Department of Medical Informatics
Citation	Reps, J.M. (Jenna M.), Rijnbeek, P., Cuthbert, A. (Alana), Ryan, P., Pratt, N. (Nicole), & Schuemie, M. (2021). An empirical analysis of dealing with patients who are lost to follow-up when developing prognostic models using a cohort design. BMC Medical Informatics and Decision Making, 21(1). doi:10.1186/s12911-021-01408-x
