A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature

Pandi, Maria-Theodora; van der Spek, Peter; Koromina, Maria; Patrinos, George

doi:10.3389/fphar.2020.602030

Pandi, M.-T. (Maria-Theodora), P.J. van der Spek (Peter), Koromina, M. (Maria) and G.P. Patrinos (George)

2020-11-10

Text mining of the biomedical literature is an emerging field that has already found applications in many research areas, including genetics, personalized medicine, and pharmacogenomics. In this study, we describe a novel text-mining approach for extracting pharmacogenomics associations. The code was implemented in the R programming language, using custom scripts where needed and functions from existing libraries otherwise. Articles (abstracts or full texts) matching a specified query were retrieved from PubMed, while concept annotations were obtained from PubTator Central. Terms denoting a Mutation or a Gene, as well as Chemical terms corresponding to drug compounds, were normalized, and the sentences containing these terms were filtered and preprocessed to create appropriate training sets. Finally, after training and hyperparameter tuning, four text classifiers (FastText, linear-kernel SVMs, XGBoost, and Lasso and Elastic-Net Regularized Generalized Linear Models) were created and evaluated for their performance in identifying pharmacogenomics associations. Although further improvements are essential before this text-mining approach can be properly implemented in clinical practice, our study stands as a comprehensive, simplified, and up-to-date approach for identifying and assessing research articles enriched in clinically relevant pharmacogenomics relationships. Furthermore, this work highlights a series of challenges concerning the effective application of text mining to the biomedical literature, whose resolution could substantially contribute to the further development of this field.
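
The sentence-filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: their pipeline was written in R, and the abstract does not state the exact filtering rule. The sketch below is in Python and assumes that a sentence is kept when it contains at least one Gene or Mutation mention together with at least one Chemical mention. The `Annotation` class, the helper names, and the naive sentence splitter are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for one concept annotation (character offsets)."""
    start: int  # offset where the mention begins
    end: int    # offset just past the mention
    kind: str   # e.g. "Gene", "Mutation", "Chemical"


def mention(text, word, kind):
    """Build an Annotation for the first occurrence of `word` in `text`."""
    start = text.index(word)
    return Annotation(start, start + len(word), kind)


def split_sentences(text):
    """Naive splitter: returns (start, end) spans, breaking after . ! or ?"""
    spans, start = [], 0
    for match in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def candidate_sentences(text, annotations):
    """Keep sentences with a Gene/Mutation mention AND a Chemical mention.

    Assumption: this co-occurrence rule is illustrative, not the paper's rule.
    """
    kept = []
    for s, e in split_sentences(text):
        kinds = {a.kind for a in annotations if a.start >= s and a.end <= e}
        if kinds & {"Gene", "Mutation"} and "Chemical" in kinds:
            kept.append(text[s:e])
    return kept


text = ("CYP2C19 variants alter clopidogrel response. "
        "The cohort included 120 patients. "
        "Warfarin dosing was recorded.")
annotations = [
    mention(text, "CYP2C19", "Gene"),
    mention(text, "clopidogrel", "Chemical"),
    mention(text, "Warfarin", "Chemical"),
]
print(candidate_sentences(text, annotations))
```

Only the first sentence of the toy input passes the filter, because it is the only one with both a gene and a drug mention. A real pipeline would need a sentence splitter that handles abbreviations such as "e.g." and would take its annotations from an annotation service rather than from string matching.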

Additional Metadata
Keywords	FastText, biomedical text classification, supervised learning, natural language processing, pharmacogenomics associations, PubMed, PubTator, text mining
Persistent URL	doi.org/10.3389/fphar.2020.602030, hdl.handle.net/1765/131929
Journal	Frontiers in Pharmacology
Organisation	Department of Pathology
Citation	Pandi, M.-T. (Maria-Theodora), van der Spek, P., Koromina, M. (Maria), & Patrinos, G. (2020). A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature. Frontiers in Pharmacology, 11. doi:10.3389/fphar.2020.602030
