Analysis of the domain coverage of the Sapounidou scheme was performed by screening an “extended inventory” of more than 75,000 compounds, the origins of which are described in Table S4. To provide as broad a coverage of chemical space as possible, and so more effectively identify areas not yet covered by the current rules, substances were drawn from nine publicly available data sets, several of which were specific in terms of use class and origin. Termed “defined-use inventories”, these included pesticides, pharmaceuticals, botanical natural products, and cosmetic constituents, alongside the European Chemicals Agency (ECHA) Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) preregistration list. Chemicals present within each set were subject to preprocessing, in which available SMILES were canonicalized (Open Babel v.2.4.0; http://openbabel.org/wiki/Main_Page),35 salt components were stripped, and stereochemical information was deleted. Duplicate entries were removed, alongside inorganics and entries lacking defined structures, such as mixtures and polymers.
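The preprocessing steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it works at the text level on SMILES strings using only the Python standard library, and every function name in it (`strip_salts`, `remove_stereo`, `has_carbon`, `preprocess`) is hypothetical. It assumes that salt stripping means keeping the fragment with the most atoms, that "inorganic" means containing no carbon atom, and that entries lacking a defined structure appear as empty SMILES. Canonicalization is left as a pluggable `canonicalize` callable that defaults to the identity function; in practice a cheminformatics toolkit (the source used Open Babel v.2.4.0) would be supplied there.

```python
import re

# One atom token: a bracket atom such as [Na+] or [C@H], or an
# organic-subset atom such as C, Cl, or aromatic c.
_ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br|[BCNOPSFI]|[bcnops])"
)


def atom_symbols(smiles):
    """Return the element symbols found in a SMILES string, in order."""
    return [m.group(1) or m.group(2) for m in _ATOM.finditer(smiles)]


def strip_salts(smiles):
    """Assumption: the parent structure is the fragment with the most atoms."""
    return max(smiles.split("."), key=lambda frag: len(atom_symbols(frag)))


def remove_stereo(smiles):
    """Delete tetrahedral (@, @@) and double-bond (/ and \\) stereo markers."""
    return re.sub(r"@+", "", smiles).replace("/", "").replace("\\", "")


def has_carbon(smiles):
    """Assumption: a structure with no carbon atom is treated as inorganic."""
    return any(symbol in ("C", "c") for symbol in atom_symbols(smiles))


def preprocess(smiles_list, canonicalize=lambda s: s):
    """Strip salts and stereo, drop inorganics and empties, then deduplicate.

    `canonicalize` is a placeholder (identity by default); a toolkit-backed
    canonicalizer is needed for duplicates written in different atom orders.
    """
    seen = set()
    kept = []
    for raw in smiles_list:
        if not raw or not raw.strip():
            continue  # no defined structure available
        parent = remove_stereo(strip_salts(raw.strip()))
        if not has_carbon(parent):
            continue  # treated as inorganic
        key = canonicalize(parent)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


if __name__ == "__main__":
    inventory = [
        "CC(=O)[O-].[Na+]",   # sodium acetate: counter-ion is stripped
        "C[C@H](N)C(=O)O",    # one alanine enantiomer
        "C[C@@H](N)C(=O)O",   # other enantiomer: duplicate once stereo is gone
        "[Na+].[Cl-]",        # sodium chloride: no carbon, removed
        "",                   # entry without a structure
    ]
    print(preprocess(inventory))
```

With the identity canonicalizer, duplicates are detected only when the processed strings match exactly, which is why a genuine canonicalization step precedes deduplication in a real workflow. Text-level removal of stereo markers is likewise a simplification of what a toolkit does on the parsed molecule.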