Concept recognition in French biomedical text using automatic translation

Afzal, Zubair; Akhondi, Saber; van Haagen, Herman; Van Mulligen, Erik M.; Kors, Jan

doi:10.1007/978-3-319-44564-9_13

M.Z. Afzal (Zubair), S.A. Akhondi (Saber), H.H.H.B.M. van Haagen (Herman), E.M. Van Mulligen (Erik M.) and J.A. Kors (Jan)

2016-08-23


We describe the development of a concept recognition system for French documents and its application in task 1b of the 2015 CLEF eHealth challenge. This community challenge comprised three subtasks: recognition of entities in a French medical corpus, normalization of the recognized entities, and normalization of entity mentions that had been manually annotated. Normalization had to be based on the Unified Medical Language System (UMLS). We addressed all three subtasks with a dictionary-based approach using Peregrine, our open-source indexing engine. To increase the coverage of our initial French terminology, we explored the use of two automatic translators, Google Translate and Microsoft Translator, to translate English UMLS terms into French. The corpus consisted of 1665 titles of French Medline abstracts and 6 French drug labels of the European Medicines Agency (EMEA). The corpus was manually annotated with concepts from the UMLS and split into equally sized training and test sets. The best performance on the training set was obtained with a terminology that contained the intersection of the translated terms, in combination with several post-processing steps to reduce the number of false-positive detections. When evaluated on the test set, our system achieved F-scores of 0.756 and 0.665 for entity recognition on the EMEA documents and Medline titles, respectively. For subsequent entity normalization, the F-scores were 0.711 and 0.587. Entity normalization given the manually annotated entity mentions resulted in F-scores of 0.872 and 0.671. Our system obtained the highest F-scores among the systems that participated in the challenge.
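
The idea of keeping only the intersection of the terms produced by two translators, and of scoring results with an F-score, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use Peregrine or any translator API: the function names, the concept identifiers (`CUI_A`, `CUI_B`), the example terms, and the simple lowercase/whitespace normalization are all assumptions made for illustration.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace (an assumed, deliberately simple normalization)."""
    return " ".join(term.lower().split())


def intersect_translations(translations_a, translations_b):
    """Per concept ID, keep only the terms that both translators produced."""
    merged = {}
    for concept_id in translations_a.keys() & translations_b.keys():
        terms_a = {normalize(t) for t in translations_a[concept_id]}
        terms_b = {normalize(t) for t in translations_b[concept_id]}
        shared = terms_a & terms_b
        if shared:  # drop concepts on which the translators never agree
            merged[concept_id] = shared
    return merged


def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 0.0


# Hypothetical translator outputs, keyed by placeholder concept IDs.
translator_a = {
    "CUI_A": ["Infarctus du myocarde"],
    "CUI_B": ["mal de tête", "céphalée"],
}
translator_b = {
    "CUI_A": ["infarctus du  myocarde", "crise cardiaque"],
    "CUI_B": ["maux de tête"],
}

terminology = intersect_translations(translator_a, translator_b)
print(terminology)
print(f_score(true_positives=3, false_positives=1, false_negatives=1))
```

In this toy input the two translators agree only on the term for `CUI_A`, so `CUI_B` is dropped from the resulting terminology. Requiring agreement trades coverage for precision, which is consistent with the abstract's emphasis on reducing false-positive detections.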

Additional Metadata
Keywords	Concept identification, Entity recognition, French terminology, Term translation
Persistent URL	doi.org/10.1007/978-3-319-44564-9_13, hdl.handle.net/1765/96295
Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Rights	No subscription
Organisation	Department of Medical Informatics
Citation	Afzal, Z., Akhondi, S., van Haagen, H., Van Mulligen, E. M., & Kors, J. (2016). Concept recognition in French biomedical text using automatic translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). doi:10.1007/978-3-319-44564-9_13
