Ambiguity of non-systematic chemical identifiers within and between small-molecule databases

Akhondi, Saber; Muresan, Cornelia; Williams, Antony; Kors, Jan

doi:10.1186/s13321-015-0102-6

S.A. Akhondi (Saber), C. Muresan (Cornelia), A.J. Williams (Antony) and J.A. Kors (Jan)

2015-12-01

Journal of Cheminformatics, Volume 7, Issue 1

Background: A wide range of chemical compound databases is currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to a compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but their ambiguity (i.e., whether an identifier matches more than one structure) can be checked automatically. In this study we quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers.

Results: The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7-60.2 %, median 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points).

Conclusions: Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance.
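The automatic ambiguity check described in the Background (an identifier counts as ambiguous when it matches more than one structure) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ambiguity_percentage`, the placeholder structure keys, and the case-insensitive matching of identifiers are all assumptions made for illustration, and the toy records are invented rather than taken from any of the eight databases.

```python
from collections import defaultdict


def ambiguity_percentage(records):
    """Percentage of identifiers that map to more than one distinct structure.

    `records` is an iterable of (identifier, structure_key) pairs. The
    structure key is any canonical representation of a structure; here it is
    a hypothetical placeholder string.
    """
    structures_by_identifier = defaultdict(set)
    for identifier, structure_key in records:
        # Assumption: identifiers are compared case-insensitively and
        # ignoring surrounding whitespace.
        normalized = identifier.strip().lower()
        structures_by_identifier[normalized].add(structure_key)

    if not structures_by_identifier:
        return 0.0

    # An identifier is ambiguous if it is linked to two or more structures.
    ambiguous = sum(
        1 for keys in structures_by_identifier.values() if len(keys) > 1
    )
    return 100.0 * ambiguous / len(structures_by_identifier)


# Invented toy data: "vitamin e" points at two different structure keys.
toy_records = [
    ("aspirin", "STRUCT-A"),
    ("Aspirin", "STRUCT-A"),
    ("vitamin e", "STRUCT-B"),
    ("vitamin e", "STRUCT-C"),
    ("caffeine", "STRUCT-D"),
    ("ibuprofen", "STRUCT-E"),
]

print(ambiguity_percentage(toy_records))  # 1 of 4 identifiers is ambiguous: 25.0
```

Under this sketch, the effect of structure standardization could be explored by replacing each structure key with a standardized key (for example, one computed after discarding stereochemistry) before calling the function; structures that collapse onto the same key would then no longer count as distinct.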

Additional Metadata
Keywords	Chemical databases, Chemical name ambiguity, Molecular structure, Non-systematic chemical identifiers, Quality control
Persistent URL	doi.org/10.1186/s13321-015-0102-6, hdl.handle.net/1765/79177
Journal	Journal of Cheminformatics
Organisation	Department of Medical Informatics
Citation	Akhondi, S., Muresan, C., Williams, A., & Kors, J. (2015). Ambiguity of non-systematic chemical identifiers within and between small-molecule databases. Journal of Cheminformatics, 7(1). https://doi.org/10.1186/s13321-015-0102-6

