We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS) supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger, together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and in the Medline titles, respectively. For entity normalization, F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average score of the systems that participated in the challenge.
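
As a quick consistency check on the ICD-10 coding result, the reported F-score can be recomputed from the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for ICD-10 coding on the test set.
icd10_f = f_score(0.886, 0.813)
print(round(icd10_f, 3))
```

Under that assumption the recomputed value rounds to the reported 0.848.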

2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016
Erasmus MC: University Medical Center Rotterdam

Van Mulligen, E. M., Afzal, Z., Akhondi, S., Vo, D., & Kors, J. (2016). Erasmus MC at CLEF eHealth 2016: Concept recognition and coding in French texts. In CEUR Workshop Proceedings (pp. 171–178). Retrieved from http://hdl.handle.net/1765/100036