Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection
Genes are discovered almost daily, and new names have to be found for them. Although there are guidelines for gene nomenclature, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol often refers to more than one gene. Using an existing abbreviation expansion algorithm, we explore MEDLINE for the use of human gene symbols derived from LocusLink. It turns out that just over 40% of these symbols occur in MEDLINE; however, many of these occurrences are not related to genes. In the process of making this inventory, a disambiguation test collection is constructed automatically.
Keywords: *Databases, Genetic; *Genes; *MEDLINE; *Terminology; *Vocabulary, Controlled; Algorithms; Humans; Unified Medical Language System
Weeber, M., Schijvenaars, R.J.A., van Mulligen, E.M., Mons, B., Jelier, R., van der Eijk, C.C., & Kors, J.A. (2003). Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection. Proceedings: a conference of the American Medical Informatics Association. AMIA Annual Fall Symposium, 704–708. Retrieved from http://hdl.handle.net/1765/10293