Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems that extract structured information from a large corpus. In this paper, we describe an approach in which a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures show that inter-annotator agreement is much higher than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures such relationships in text.
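
The comparison described above (agreement among annotators versus agreement with the system's pre-annotation) can be illustrated with a minimal sketch. The abstract does not state which agreement measure was used, so the exact-match pairwise F1 below is an assumption, as are the annotation tuple format, the function names, and the toy data, which is invented for illustration and is not taken from the corpus.

```python
from itertools import combinations

# Hypothetical annotation format: (abstract_id, start, end, entity_type).
# The data below is invented toy data, not drawn from the EU-ADR corpus.
system = {("a1", 0, 7, "drug"), ("a1", 20, 28, "disorder"), ("a1", 40, 44, "gene")}
annotator_1 = {("a1", 0, 7, "drug"), ("a1", 20, 28, "disorder")}
annotator_2 = {("a1", 0, 7, "drug"), ("a1", 20, 28, "disorder")}
annotator_3 = {("a1", 0, 7, "drug"), ("a1", 20, 28, "disorder"), ("a1", 50, 55, "gene")}
annotators = [annotator_1, annotator_2, annotator_3]


def pairwise_f1(a, b):
    """Exact-match F1 between two annotation sets (assumed measure)."""
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def mean_inter_annotator_agreement(annotation_sets):
    """Average F1 over all unordered pairs of annotators."""
    pairs = list(combinations(annotation_sets, 2))
    return sum(pairwise_f1(a, b) for a, b in pairs) / len(pairs)


def mean_agreement_with_system(system_set, annotation_sets):
    """Average F1 between the system output and each annotator."""
    scores = [pairwise_f1(system_set, a) for a in annotation_sets]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print("inter-annotator:", mean_inter_annotator_agreement(annotators))
    print("with system:    ", mean_agreement_with_system(system, annotators))
```

In this toy setup the annotators agree with each other more than with the system because they all removed the same spurious system annotation, which is the kind of pattern the agreement figures in the paper are reported to show.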

Additional Metadata
Keywords Adverse drug reactions, Corpus development, Machine learning, Text mining
Journal Journal of Biomedical Informatics
Grant This work was funded by the European Commission 7th Framework Programme; grant id fp7/215847 - Exploring and understanding adverse drug reactions by integrative mining of clinical records and biomedical knowledge (EU-ADR)
van Mulligen, E.M., Fourrier-Reglat, A., Gurwitz, D., Molokhia, M., Nieto, A., Trifiro, G., … Furlong, L.I. (2012). The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships. Journal of Biomedical Informatics, 45(5), 879–884. doi:10.1016/j.jbi.2012.04.004