This section describes the steps used for the structural annotation of peptides in the SATPdb database. The steps are as follows:
First, every peptide is searched against the PDB for an exact match in order to identify identical peptides. If a match is found, the tertiary structure of the peptide is taken as is from the PDB.
Next, the tertiary structures of the remaining peptides, i.e. those that have no match in the PDB and a length of <= 30 residues, are predicted using PEPstrMOD. PEPstrMOD implements the PEPstr algorithm to predict the structure of peptides composed of natural amino acids. For peptides containing modified residues (such as PTMs, non-natural residues, or terminal modifications), PEPstrMOD uses special force field libraries that contain the parameters needed to handle these residues.
The structure obtained in the above step is then subjected to energy minimization and molecular dynamics simulation (using the AMBER or GROMACS software packages) to yield the final predicted tertiary structure of the peptide.
The structures of peptides longer than 30 residues are predicted using a homology modeling approach. HHsearch and HHblits are used to search for templates for the query peptide sequence; templates with a probability >= 70% are passed to the MODELLER software to build the tertiary structure. If no template with a probability >= 70% is found, the I-TASSER suite is used to predict the structure instead.
The structures of peptides with complex chemical modifications are not predicted.
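The routing described in the steps above can be sketched as a small decision function. This is only an illustration, not the SATPdb pipeline itself: the `Peptide` class, its field names, and the function `choose_structure_source` are hypothetical, and the order of the checks (PDB match first, then the complex-modification exclusion) is an assumption, since the steps do not state it explicitly. The function only names the tool that would be used; it does not run any of them.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the steps above.
MAX_PEPSTRMOD_LENGTH = 30        # residues
MIN_TEMPLATE_PROBABILITY = 70.0  # percent, as reported by HHsearch/HHblits


@dataclass
class Peptide:
    """Hypothetical record holding what the routing decision needs."""
    sequence: str
    has_exact_pdb_match: bool = False
    has_complex_modification: bool = False
    # Best template probability in percent; None if no template was found.
    best_template_probability: Optional[float] = None


def choose_structure_source(peptide: Peptide) -> str:
    """Return the name of the source or tool for the tertiary structure."""
    # Assumption: the PDB lookup is tried before any other check.
    if peptide.has_exact_pdb_match:
        return "PDB"
    if peptide.has_complex_modification:
        return "not predicted"
    if len(peptide.sequence) <= MAX_PEPSTRMOD_LENGTH:
        # Followed by energy minimization and molecular dynamics.
        return "PEPstrMOD"
    probability = peptide.best_template_probability
    if probability is not None and probability >= MIN_TEMPLATE_PROBABILITY:
        return "MODELLER"
    return "I-TASSER"
```

For example, a 45-residue peptide with no PDB match and a best template probability of 82% would be routed to MODELLER, while the same peptide with a best template probability of 55% would be routed to I-TASSER.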
Assignment of secondary structural states of peptides
After the tertiary structure of a peptide has been predicted, we use the DSSP software to assign secondary structure states to each peptide structure. DSSP is a state-of-the-art program for assigning secondary structure states. It assigns 8 states: H: alpha-helix; B: isolated beta-bridge; E: extended strand; G: 3-10 helix; I: pi-helix; T: turn; S: bend; and blank/C: loop.
To simplify the secondary structural states, we reduce the 8 DSSP states to the three classes given by the following conversion, which is common in the literature: T/S/C to Coil (C); G/H/I to Helix (H); and B/E to Strand (E).
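The conversion above can be written as a simple lookup table. The sketch below is a minimal illustration; the names `DSSP_TO_REDUCED` and `reduce_dssp` are hypothetical. Treating `-` as the blank/loop state is an assumption, made because some tools write the blank DSSP state that way.

```python
# Mapping from DSSP states to the reduced classes described above.
DSSP_TO_REDUCED = {
    "H": "H", "G": "H", "I": "H",   # helices -> Helix (H)
    "B": "E", "E": "E",             # bridge and strand -> Strand (E)
    "T": "C", "S": "C", "C": "C",   # turn, bend, loop -> Coil (C)
    " ": "C",                       # blank DSSP state -> Coil (C)
    "-": "C",                       # assumption: '-' used for the blank state
}


def reduce_dssp(states: str) -> str:
    """Convert a string of per-residue DSSP states to reduced classes."""
    reduced = []
    for state in states:
        if state not in DSSP_TO_REDUCED:
            raise ValueError(f"Unknown DSSP state: {state!r}")
        reduced.append(DSSP_TO_REDUCED[state])
    return "".join(reduced)


print(reduce_dssp("HHGGIITSBE C"))  # prints HHHHHHCCEECC
```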
Generation of SMILES notation
Open Babel is used to generate the SMILES notation of all the peptide structures predicted in the above steps. Open Babel takes the tertiary structure of a peptide in PDB format as input and converts it into SMILES notation.
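As an illustration of this conversion, the sketch below calls the Open Babel command-line program `obabel` from Python. It assumes that `obabel` is installed and available on the PATH; the function names and file names are hypothetical, and the exact options used to build SATPdb are not stated here.

```python
import subprocess
from pathlib import Path
from typing import List


def obabel_smiles_command(pdb_path: str, smi_path: str) -> List[str]:
    """Build the obabel command that converts a PDB file to SMILES."""
    # -ipdb / -osmi select the input and output formats; -O names the output file.
    return ["obabel", "-ipdb", str(pdb_path), "-osmi", "-O", str(smi_path)]


def pdb_to_smiles(pdb_path: str, smi_path: str) -> str:
    """Run obabel and return the SMILES string written to smi_path."""
    subprocess.run(obabel_smiles_command(pdb_path, smi_path), check=True)
    # Each line of a .smi file holds the SMILES string followed by a title.
    fields = Path(smi_path).read_text().split()
    if not fields:
        raise ValueError(f"No SMILES written for {pdb_path}")
    return fields[0]
```

A call such as `pdb_to_smiles("peptide.pdb", "peptide.smi")` would then return the SMILES string for the structure stored in the hypothetical file `peptide.pdb`.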