Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This is the main objective of the CLEF-ER challenge, which involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the biomedical domain: titles from scientific articles, drug labels from the European Medicines Agency, and patent texts from the European Patent Office. The parallel units have been identified, marked up, and formatted for future use. The three corpora are of comparable size. In preparation for the CLEF-ER challenge, the documents have been annotated with terminologies in English and in the non-English languages (de, fr, es, and nl), and the pre-existing terminological resource has been optimized for the entity recognition task in CLEF-ER. Finally, a silver standard corpus of entity annotations and their identifiers has been produced on the English documents for the evaluation of challenge contributions.

Additional Metadata
Conference: 2013 Cross Language Evaluation Forum Conference, CLEF 2013
Rebholz-Schuhmann, D, Clematide, S, Rinaldi, F, Kafkas, S, Van Mulligen, E.M, Bui, C, … Kors, J.A. (2013). Multilingual semantic resources and parallel corpora in the biomedical domain: The CLEF-ER challenge. Presented at the 2013 Cross Language Evaluation Forum Conference, CLEF 2013. Retrieved from