Multilingual semantic resources and parallel corpora in the biomedical domain: The CLEF-ER challenge
Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This is the main objective of the CLEF-ER challenge, which involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the biomedical domain: titles from scientific articles, drug labels from the European Medicines Agency, and patent texts from the European Patent Office. The parallel units have been identified, marked up, and formatted for future use. The three corpora are of comparable size. In preparation for the CLEF-ER challenge, the documents have been annotated with terminologies in English and non-English languages (de, fr, es, and nl), and the pre-existing terminological resource has been optimized for the entity recognition task in CLEF-ER. Finally, a silver standard corpus for entity annotations and their identifiers has been produced on the English documents for the evaluation of challenge contributions.
Conference: 2013 Cross Language Evaluation Forum Conference, CLEF 2013
Rebholz-Schuhmann, D, Clematide, S, Rinaldi, F, Kafkas, S, Van Mulligen, E.M, Bui, C, … Kors, J.A. (2013). Multilingual semantic resources and parallel corpora in the biomedical domain: The CLEF-ER challenge. Presented at the 2013 Cross Language Evaluation Forum Conference, CLEF 2013. Retrieved from http://hdl.handle.net/1765/90842