Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

Afzal, Zubair; Schuemie, Martijn; van Blijderveen, Nico; Sen, Elif; Sturkenboom, Miriam; Kors, Jan

doi:10.1186/1472-6947-13-30

M.Z. Afzal (Zubair), M.J. Schuemie (Martijn), J.C. van Blijderveen (Nico), E.F. Sen (Elif), M.C.J.M. Sturkenboom (Miriam) and J.A. Kors (Jan)

2013-03-05

BMC Medical Informatics and Decision Making, Volume 13, Issue 1

Background: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to developing an automatic case identification system with high sensitivity to assist manual annotators.

Methods: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs.

Results: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, compared with 0.91 for a baseline classifier) with a specificity of 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with a specificity of 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with an imbalance ratio below 10.

Conclusions: We achieved high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a highly sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
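The abstract names random under- and over-sampling as the way the imbalance ratio of the training data was varied. The sketch below illustrates what these two operations do to a training set, assuming the imbalance ratio is defined as the number of negative records per positive record. The function names (`imbalance_ratio`, `undersample`, `oversample`) and the toy record lists are hypothetical illustrations, not the authors' code or data.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def imbalance_ratio(positives: Sequence[T], negatives: Sequence[T]) -> float:
    # Assumed definition: number of negative (non-case) records per positive (case) record.
    return len(negatives) / len(positives)


def undersample(positives: Sequence[T], negatives: Sequence[T],
                target_ratio: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random under-sampling: keep every positive and draw a random subset of
    negatives (without replacement) so the imbalance ratio is at most target_ratio."""
    rng = random.Random(seed)
    n_keep = min(len(negatives), math.floor(target_ratio * len(positives)))
    return list(positives), rng.sample(list(negatives), n_keep)


def oversample(positives: Sequence[T], negatives: Sequence[T],
               target_ratio: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random over-sampling: keep every negative and duplicate randomly chosen
    positives (with replacement) until the imbalance ratio is at most target_ratio."""
    rng = random.Random(seed)
    n_pos_needed = math.ceil(len(negatives) / target_ratio)
    extra = max(0, n_pos_needed - len(positives))
    pos = list(positives) + [rng.choice(positives) for _ in range(extra)]
    return pos, list(negatives)


if __name__ == "__main__":
    # Toy data: 10 hypothetical cases and 200 non-cases (imbalance ratio 20).
    pos = [f"case_{i}" for i in range(10)]
    neg = [f"noncase_{i}" for i in range(200)]
    print(imbalance_ratio(pos, neg))            # 20.0
    p, n = undersample(pos, neg, target_ratio=5)
    print(len(p), len(n), imbalance_ratio(p, n))  # 10 50 5.0
    p, n = oversample(pos, neg, target_ratio=5)
    print(len(p), len(n), imbalance_ratio(p, n))  # 40 200 5.0
```

Under-sampling discards information from the majority class, while over-sampling keeps it but repeats minority records, which can encourage overfitting to those duplicates; that trade-off is one reason to compare both when tuning the imbalance ratio.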
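Cost-sensitive learning can be implemented in several ways, and the abstract does not say which variant was used. One standard variant, shown here purely as an assumption, is threshold moving: given a classifier's estimated probability p that a record is a case, predict "case" whenever the expected cost of calling it a non-case, p · C_FN, exceeds the expected cost of calling it a case, (1 − p) · C_FP, that is, when p > C_FP / (C_FP + C_FN). Raising the cost of a missed case (C_FN) lowers the threshold, trading specificity for sensitivity. The scores, labels, and function names below are hypothetical toy values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting 'case' has the lower expected cost:
    p * cost_fn > (1 - p) * cost_fp  <=>  p > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


def classify(scores: Sequence[float], cost_fp: float, cost_fn: float) -> List[bool]:
    # True = predicted case; scores are assumed to be calibrated probabilities.
    t = cost_threshold(cost_fp, cost_fn)
    return [s > t for s in scores]


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy scores: first four records are cases, last four are non-cases.
    scores = [0.9, 0.6, 0.3, 0.15, 0.8, 0.4, 0.1, 0.05]
    y_true = [True] * 4 + [False] * 4
    for cost_fn in (1.0, 4.0):
        pred = classify(scores, cost_fp=1.0, cost_fn=cost_fn)
        sens, spec = sensitivity_specificity(y_true, pred)
        print(f"C_FN={cost_fn}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With equal costs the threshold is 0.5 and the toy classifier reaches sensitivity 0.50 and specificity 0.75; making a missed case four times as costly lowers the threshold to 0.2 and yields sensitivity 0.75 and specificity 0.50, the same direction of trade-off the abstract reports.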

Additional Metadata
Keywords	Class imbalance, Cost sensitive learning, Electronic health records, Improving sensitivity, Random sampling
Persistent URL	doi.org/10.1186/1472-6947-13-30, hdl.handle.net/1765/62123
Journal	BMC Medical Informatics and Decision Making
Organisation	Erasmus MC: University Medical Center Rotterdam
Citation (APA)	Afzal, Z., Schuemie, M., van Blijderveen, N., Sen, E., Sturkenboom, M., & Kors, J. (2013). Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records. BMC Medical Informatics and Decision Making, 13(1). https://doi.org/10.1186/1472-6947-13-30
