Corpora in which specific entities and relationships are annotated are essential for training and evaluating text-mining systems developed to extract structured information from a large corpus. In this paper we describe an approach in which a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures achieved show that the inter-annotator agreement is much better than the agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts have annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software to capture these relationships in texts.

Adverse drug reactions, Corpus development, Machine learning, Text mining
Journal of Biomedical Informatics
This work was funded by the European Commission 7th Framework Programme; grant id fp7/215847 - Exploring and understanding adverse drug reactions by integrative mining of clinical records and biomedical knowledge (EU-ADR)
Erasmus MC: University Medical Center Rotterdam

van Mulligen, E.M, Fourrier-Reglat, A, Gurwitz, D, Molokhia, M, Nieto, A, Trifirò, G, … Furlong, L.I. (2012). The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships. Journal of Biomedical Informatics, 45(5), 879–884. doi:10.1016/j.jbi.2012.04.004