2. InChI and SMILES identifiers for chemical structures

2.1. Main Objectives
The main purpose of this recipe is to take an SDF file, validate its content for chemical inconsistencies, and generate InChIs, InChIKeys, and SMILES strings for each entry in the file.
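Each entry in an SDF file is a molfile connection table (whose first line is the record title), optionally followed by data items written as `> <NAME>` headers with values, and terminated by a `$$$$` line. In the recipe, reading the file is left to the CDK. Purely to illustrate that record structure, the minimal Python sketch below (standard library only; `iter_sdf_records` and `_parse_record` are hypothetical helpers, not part of the recipe or of the CDK) splits SDF text into records and returns each record's title and data items. It does not interpret atoms or bonds.

```python
import re
from typing import Dict, Iterator, List, Tuple

# A data-item header looks like "> <NAME>" (optionally with extra text around it).
DATA_HEADER = re.compile(r"^>.*<([^>]+)>")


def _parse_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Return the title line and the data items of one SDF record."""
    title = lines[0].strip() if lines else ""
    # Data items follow the connection table, which ends with "M  END".
    start = next((k + 1 for k, line in enumerate(lines) if line.startswith("M  END")), 0)
    items: Dict[str, str] = {}
    i = start
    while i < len(lines):
        match = DATA_HEADER.match(lines[i])
        i += 1
        if match:
            values = []
            # A data value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            items[match.group(1)] = "\n".join(values)
    return title, items


def iter_sdf_records(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (title, data items) for every record in SDF text."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record terminator
            yield _parse_record(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # last record lacks a terminator
        yield _parse_record(record)
```

Streaming records one at a time keeps memory use flat for large files, which is also why per-entry processing is the natural unit for validation and identifier generation.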
Skill dependency:
Bash experience
Technical requirements:
Groovy
2.2. Creating InChI and SMILES identifiers for chemical structures
To run the scripts below, you need a Groovy installation. The Groovy scripts use version 2.7.1 of the Chemistry Development Kit (CDK; see reference 2). This library and its use in Groovy are explained further in the book Groovy Cheminformatics with the Chemistry Development Kit. Check this git repository for more detailed usage instructions and for where to find the tools: https://github.com/FAIRplus/fairplus-sdf
2.2.1. Record validation
When generating InChIs, the InChI library (see reference 1) reports a status for each structure. Besides success, this status can be WARNING or ERROR, reflecting issues with the compound record in the SDF file. This first script reports such issues:
The output may look like this:
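The recipe's script relies on the CDK and the InChI library for the actual checks. The reporting step itself is simple and can be sketched in Python as below, assuming (hypothetically) that the status and message for each record have already been collected as `(record number, status, message)` tuples, with a status such as `SUCCESS`, `WARNING`, or `ERROR`. This illustrates the idea only; `validation_report` is a hypothetical helper, and its tab-separated lines are not the output format of the recipe's script.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input format: (record number, status, message) for each SDF entry.
Result = Tuple[int, str, str]

# States worth reporting; any other status is treated as unproblematic.
REPORTED = ("WARNING", "ERROR")


def validation_report(results: Iterable[Result]) -> Tuple[List[str], Counter]:
    """Return tab-separated report lines for problematic records and a tally per status."""
    lines: List[str] = []
    tally: Counter = Counter()
    for record, status, message in results:
        tally[status] += 1
        if status in REPORTED:
            lines.append(f"{record}\t{status}\t{message}")
    return lines, tally
```

Keeping the tally alongside the report lines gives a quick overview of how many records need attention before moving on to identifier generation.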
2.2.2. Calculate InChIs
Similarly, InChIKeys can be generated:
When the status is ERROR, nothing is output.
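An InChIKey has a fixed 27-character layout: a 14-letter block hashing the main (connectivity) layer, a hyphen, a 10-letter block (an 8-letter hash of the remaining layers, then `S` for standard or `N` for non-standard, then `A` for InChI version 1), another hyphen, and a single protonation-flag letter. The Python sketch below uses that layout as a quick format check on generated keys, and mirrors the behaviour described above by writing nothing for a record whose status is ERROR. The function name `inchikey_line` and the tab-separated output are illustrative assumptions, not the recipe's script.

```python
import re
from typing import Optional

# 14 letters, hyphen, 8 letters + S/N (standard/non-standard) + A (version 1),
# hyphen, 1 protonation-flag letter: 27 characters in total.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{8}[SN]A-[A-Z]")


def inchikey_line(record: int, status: str, inchikey: Optional[str]) -> Optional[str]:
    """Return an output line for one record, or None when nothing should be written."""
    if status == "ERROR" or inchikey is None:
        return None  # as in the recipe: no output for records with status ERROR
    if INCHIKEY_PATTERN.fullmatch(inchikey) is None:
        raise ValueError(f"record {record}: malformed InChIKey {inchikey!r}")
    return f"{record}\t{inchikey}"
```

For example, `inchikey_line(1, "SUCCESS", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N")` (the InChIKey of ethanol) returns a line, while `inchikey_line(2, "ERROR", None)` returns `None`. A format check only catches corrupted or truncated keys; it cannot tell whether a key belongs to the intended structure.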
2.2.3. Calculate SMILES strings
The last script calculates a SMILES string for each entry in the SDF file:
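Whatever tool generates the SMILES, a cheap post-hoc check on the output can catch truncated or corrupted strings. The Python sketch below is such a purely syntactic check, an assumption of this illustration rather than part of the recipe: `smiles_syntax_ok` (a hypothetical helper) verifies that branch parentheses balance, that bracket atoms such as `[NH4+]` are closed and not nested, and that every ring-closure label (a digit, or `%` followed by two digits) is both opened and closed. It is not a chemistry-aware parser and does not replace the parsing and validation done by the CDK.

```python
DIGITS = set("0123456789")


def smiles_syntax_ok(smiles: str) -> bool:
    """Rough syntactic check: balanced branches, closed bracket atoms, paired ring closures."""
    depth = 0            # number of currently open '(' branches
    in_bracket = False   # True while inside a bracket atom such as [13CH4]
    open_rings = set()   # ring-closure numbers opened but not yet closed
    i = 0
    while i < len(smiles):
        c = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts or charges, not ring closures.
            if c == "]":
                in_bracket = False
            elif c == "[":
                return False  # bracket atoms cannot be nested
        elif c == "[":
            in_bracket = True
        elif c == "]":
            return False  # closing bracket without an opening one
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
        elif c == "%":
            label = smiles[i + 1 : i + 3]
            if len(label) != 2 or not set(label) <= DIGITS:
                return False  # '%' must be followed by exactly two digits
            open_rings ^= {int(label)}  # toggle: first use opens, second use closes
            i += 2
        elif c in DIGITS:
            open_rings ^= {int(c)}
        elif c.isspace():
            return False  # a SMILES string itself never contains whitespace
        i += 1
    return depth == 0 and not in_bracket and not open_rings
```

For instance, `smiles_syntax_ok("c1ccccc1")` and `smiles_syntax_ok("CC(=O)O")` return `True`, while a truncated string such as `"CC(=O"` returns `False`.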
2.3. Conclusion
This recipe explained how to validate the chemical structures in an SDF file and convert them to SMILES, InChI, and InChIKey identifiers. The InChIKeys can then be used with BridgeDb and its metabolite identifier mapping databases to obtain additional identifiers.
2.4. References
[1] Jonathan M. Goodman, Igor Pletnev, Paul Thiessen, Evan Bolton, and Stephen R. Heller. InChI version 1.06: now more than 99.99. Journal of Cheminformatics, 24 May 2021.
[2] Egon Willighagen, John W. Mayfield, Jonathan Alvarsson, Arvid Berg, Lars Carlsson, Nina Jeliazkova, Stefan Kuhn, Tomáš Pluskal, Miquel Rojas-Chertó, Ola Spjuth, Gilleain Torrance, Chris T. Evelo, Rajarshi Guha, and Christoph Steinbeck. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. Journal of Cheminformatics, 6 June 2017.