Reaction SMILES postprocessing

Alain C. Vaucher; Philippe Schwaller; Joppe Geluykens; Vishnu H. Nair; Anna Iuliano; Teodoro Laino

Concise Method

This method is extracted from research article: Nat Commun, May 2021

Inferring experimental procedures from text-based representations of chemical reactions

DOI: 10.1038/s41467-021-22951-1

The Pistachio database provides reaction SMILES strings parsed into reactant, reagent, and product molecules. We merged the reactant and reagent molecules into a single list of precursor molecules and canonicalized all SMILES strings with RDKit³³. In both the precursor list and the product list, we removed duplicates and sorted the entries alphabetically. Concatenating the SMILES strings produced the reaction SMILES used for training. Following the reaction SMILES notation, we separated molecules within the same class with dots (“.”) and separated the precursor list from the product list with “>>”. For fragment bonds, we adopted the convention of using the tilde symbol (“~”) instead of a dot, so that the fragments of one compound are not mistaken for separate molecules.
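The steps above can be sketched in Python as follows. This is a minimal illustration and not the authors' code. The function names (`rdkit_canonicalize`, `to_fragment_notation`, `prepare_side`, `build_reaction_smiles`) are hypothetical. The sketch assumes that each input molecule is one SMILES string in which multi-fragment compounds use dots internally, that "alphabetically" means Python's default string ordering, and that the tilde replacement happens before sorting. None of these details is stated in the text.

```python
from typing import Callable, Iterable, List


def rdkit_canonicalize(smiles: str) -> str:
    """Canonicalize one molecule with RDKit (imported lazily, so the
    rest of the sketch can be used with another canonicalizer)."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse: {smiles!r}")
    return Chem.MolToSmiles(mol)


def to_fragment_notation(smiles: str) -> str:
    """Join the fragments of one compound with '~' instead of '.'."""
    return smiles.replace(".", "~")


def prepare_side(
    molecules: Iterable[str], canonicalize: Callable[[str], str]
) -> List[str]:
    """Canonicalize, remove duplicates, and sort one side of the reaction."""
    unique = {to_fragment_notation(canonicalize(s)) for s in molecules}
    # Assumption: "alphabetically" is taken as default string ordering.
    return sorted(unique)


def build_reaction_smiles(
    reactants: Iterable[str],
    reagents: Iterable[str],
    products: Iterable[str],
    canonicalize: Callable[[str], str] = rdkit_canonicalize,
) -> str:
    """Merge reactants and reagents into precursors and assemble
    'precursors>>products', with molecules separated by dots."""
    precursors = prepare_side(list(reactants) + list(reagents), canonicalize)
    product_list = prepare_side(products, canonicalize)
    return ".".join(precursors) + ">>" + ".".join(product_list)
```

With RDKit installed, `build_reaction_smiles(reactants, reagents, products)` uses the RDKit canonicalizer by default. Passing another callable as `canonicalize` swaps in a different canonicalization without changing the rest of the pipeline.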

For use in language-based models, the reaction SMILES is tokenized by inserting spaces between the SMILES tokens.
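The text does not specify the tokenization rule. One possible approach, shown here as a sketch under that assumption, is a regular expression of the kind commonly used for SMILES language models. The pattern and the name `tokenize_reaction_smiles` are illustrative and not taken from the source. Bracket atoms, two-letter halogens, ring-closure labels, and the ">>" separator are each kept as a single token.

```python
import re

# Assumed token pattern: bracket atoms, Br/Cl, single-letter atoms,
# bonds, branches, ring closures, '.', '~', and the '>>' separator.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_reaction_smiles(reaction_smiles: str) -> str:
    """Insert spaces between the SMILES tokens of a reaction SMILES."""
    tokens = SMILES_TOKEN_PATTERN.findall(reaction_smiles)
    # Refuse input containing characters the pattern does not cover,
    # so that no part of the string is silently dropped.
    if "".join(tokens) != reaction_smiles:
        raise ValueError(f"Could not tokenize: {reaction_smiles!r}")
    return " ".join(tokens)
```

Checking that the joined tokens reproduce the input makes the tokenization reversible: removing the spaces recovers the original reaction SMILES.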

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
