The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships
Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems that extract structured information from a large corpus. In this paper we describe an approach in which a named-entity recognition system produces a first annotation, and annotators then revise this annotation using a web-based interface. The agreement figures show that inter-annotator agreement is much higher than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures these relationships in text.
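The comparison between inter-annotator agreement and agreement with the system output can be illustrated with a small sketch. The abstract does not state which agreement measure was used, so the Dice-style set overlap below is an assumption, and the annotator names, abstract identifiers, and relation tuples are hypothetical toy data rather than corpus content.

```python
from itertools import combinations


def pairwise_agreement(a, b):
    """Dice overlap between two sets of annotations (assumed measure)."""
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def mean_agreement(annotations):
    """Average pairwise agreement over every pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(pairwise_agreement(x, y) for x, y in pairs) / len(pairs)


# Hypothetical toy data: (abstract id, drug, disorder) relation tuples.
experts = {
    "expert_1": {("a1", "aspirin", "bleeding"), ("a2", "statin", "myopathy")},
    "expert_2": {("a1", "aspirin", "bleeding"), ("a2", "statin", "myopathy")},
    "expert_3": {("a1", "aspirin", "bleeding")},
}
# Hypothetical output of the pre-annotation system.
system = {("a1", "aspirin", "bleeding"), ("a3", "drug_x", "disorder_y")}

# Agreement among the experts themselves.
inter_annotator = mean_agreement(experts)

# Average agreement of each expert with the system annotations.
with_system = sum(
    pairwise_agreement(e, system) for e in experts.values()
) / len(experts)

print(f"inter-annotator: {inter_annotator:.3f}")
print(f"with system:     {with_system:.3f}")
```

On this toy data the experts agree with one another more than with the system, which mirrors the direction of the finding reported in the abstract but says nothing about its actual magnitude.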
|Keywords|Adverse drug reactions, Corpus development, Machine learning, Text mining|
|Persistent URL|dx.doi.org/10.1016/j.jbi.2012.04.004, hdl.handle.net/1765/37388|
van Mulligen, E.M., Fourrier-Reglat, A., Gurwitz, D., Molokhia, M., Nieto, A., Trifiro, G., … Furlong, L.I. (2012). The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships. Journal of Biomedical Informatics, 45(5), 879–884. doi:10.1016/j.jbi.2012.04.004