Text Mining for Chemical Compounds
(Dutch title: Tekstmining naar chemische stoffen)
Exploring the chemical and biological space covered by patents and journal publications is crucial in early-stage medicinal chemistry. Such an analysis supports the assessment of compound prior art, novelty checking, the validation of biological assays, and the identification of new starting points for chemical exploration. Manual extraction of chemical and biological entities from patents and journals by expert curators takes a substantial amount of time and resources. Text-mining methods can help to ease this process.

In this book, we addressed four gaps: the lack of quality measures for assessing the correctness of structural representations within and across chemical databases; the lack of resources for building text-mining systems; the lack of high-performance systems for extracting chemical compounds from journals and patents; and the lack of automated systems for identifying relevant compounds in patents.

In Chapter 2 and Chapter 3, we analyzed the consistency and ambiguity of chemical identifiers within and between small-molecule databases. In Chapter 4 and Chapter 7, we developed resources that enable the construction of chemical text-mining systems. In Chapter 5 and Chapter 6, we used community challenges (BioCreative V and BioCreative VI) and their corresponding resources to identify mentions of chemical compounds in journal abstracts and patents. In Chapter 7, we used the findings of the previous chapters to extract chemical named entities from the full text of patents and to classify the relevance of chemical compounds.
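The summary does not spell out how identifier ambiguity was measured. As a rough illustration of the idea only, the minimal sketch below flags identifiers (for example, compound names) that are linked to more than one structure, either across databases or within a single database. The record layout, the placeholder structure keys (standing in for a canonical representation such as an InChIKey), the toy data, and the functions `ambiguous_identifiers` and `ambiguity_rate` are all hypothetical assumptions, not the method used in this book. Normalization is deliberately limited to case-folding and whitespace stripping.

```python
from collections import defaultdict

# Hypothetical toy records: (database, identifier, structure_key).
# The keys are placeholders for a canonical structure representation
# (e.g. an InChIKey); they are illustrative, not real data.
RECORDS = [
    ("db_a", "compound x", "KEY1"),
    ("db_b", "compound x", "KEY1"),
    ("db_a", "compound y", "KEY2"),
    ("db_b", "compound y", "KEY3"),
    ("db_a", "compound z", "KEY4"),
    ("db_a", "compound z", "KEY5"),
]


def ambiguous_identifiers(records, within_database=False):
    """Return identifiers that are linked to more than one structure key.

    With within_database=False, an identifier is grouped across all
    databases; with within_database=True, it is grouped per
    (database, identifier) pair. Identifiers are compared after
    stripping whitespace and lower-casing.
    """
    structures = defaultdict(set)
    for database, identifier, key in records:
        name = identifier.strip().lower()
        group = (database, name) if within_database else name
        structures[group].add(key)
    # Keep only groups whose identifier resolves to several structures.
    return {
        group: sorted(keys)
        for group, keys in structures.items()
        if len(keys) > 1
    }


def ambiguity_rate(records):
    """Fraction of distinct (normalized) identifiers that are ambiguous across databases."""
    names = {identifier.strip().lower() for _db, identifier, _key in records}
    if not names:
        return 0.0
    return len(ambiguous_identifiers(records)) / len(names)


if __name__ == "__main__":
    print(ambiguous_identifiers(RECORDS))
    print(ambiguous_identifiers(RECORDS, within_database=True))
    print(ambiguity_rate(RECORDS))
```

In this toy data, "compound y" is only ambiguous between databases, whereas "compound z" is already ambiguous within `db_a`; separating the two cases mirrors the within-database and between-database distinction made above. A real analysis would need a stronger structure normalization than exact key matching.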