METHOD
The Bhairpred server is based on the support vector machine (SVM) learning technique and uses single-sequence information, evolutionary profiles, predicted and observed secondary structure (obtained from PSIPRED and DSSP, respectively), and predicted and observed accessibility values (obtained from NetASA and DSSP, respectively). The methods were trained and tested on a dataset of 2880 proteins, and their performance was evaluated on a dataset of 534 proteins used by Thornton (PNAS, 2002). The best prediction results were obtained with a hybrid approach that combined the prediction results from the evolutionary profile, predicted secondary structure and accessibility.
Beta-Hairpin Dataset
2880 protein chains were selected from the PDB, and the secondary structure of each amino acid was assigned using DSSP. Stretches of amino acids that form sheet-coil-sheet (ECE) regions were extracted. Amino acids forming β-hairpins in these proteins were identified using PROMOTIF. ECE patterns that PROMOTIF assigned as hairpins were taken as positive examples and the remaining ECE patterns as negative examples. Analysis showed that hairpin length varies between 5 and 22 residues and that the majority of hairpins have a length of 17 amino acids. Hence we fixed the pattern length at 17 residues, with a maximum coil region of 10 residues and a minimum sheet length of 3 residues. Where a pattern had fewer than 17 residues, flanking residues were taken to complete the required length. The final dataset has 5102 hairpins and 5131 non-hairpins.
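The extraction of fixed-length ECE patterns can be illustrated with a minimal Python sketch. It is not the code used by the server. It assumes the DSSP assignment has already been reduced to a three-state string (E for sheet, C for coil, H for helix), and it assumes that each 17-residue window is centred on the coil region, which is one possible way to add flanking residues; the text does not specify how the window was positioned. The function names `runs` and `ece_windows` are hypothetical. Labelling each window as hairpin or non-hairpin would still require the PROMOTIF assignment, which is not reproduced here.

```python
from itertools import groupby

WINDOW = 17     # fixed pattern length (from the text)
MIN_SHEET = 3   # minimum sheet (strand) length (from the text)
MAX_COIL = 10   # maximum coil length (from the text)


def runs(ss):
    """Split a secondary-structure string into (state, start, end) runs.

    The end index is exclusive.
    """
    out, pos = [], 0
    for state, group in groupby(ss):
        length = len(list(group))
        out.append((state, pos, pos + length))
        pos += length
    return out


def ece_windows(seq, ss, window=WINDOW, min_sheet=MIN_SHEET, max_coil=MAX_COIL):
    """Return (start, sequence window, structure window) for each ECE pattern.

    Assumption: the window is centred on the coil and shifted inwards
    when it would run past either end of the chain.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure differ in length")
    if len(seq) < window:
        return []
    r = runs(ss)
    found = []
    # Look at every three consecutive runs and keep sheet-coil-sheet triples.
    for (s1, a1, b1), (s2, a2, b2), (s3, a3, b3) in zip(r, r[1:], r[2:]):
        if (s1, s2, s3) != ("E", "C", "E"):
            continue
        if b1 - a1 < min_sheet or b3 - a3 < min_sheet or b2 - a2 > max_coil:
            continue
        # Centre the window on the coil, then clamp it to the chain ends.
        start = a2 - (window - (b2 - a2)) // 2
        start = max(0, min(start, len(seq) - window))
        found.append((start, seq[start:start + window], ss[start:start + window]))
    return found


# Toy example (made-up sequence, not taken from the dataset).
toy_seq = "ACDEFGHIKLMNPQRSTVWYAC"
toy_ss = "CCCCCCEEEECCEEEECCCCCC"
print(ece_windows(toy_seq, toy_ss))
```

Because the coil is at most 10 residues and the window is 17, centring on the coil always leaves at least 3 residues on each side, so both strands are represented in every window under this assumption.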
SVM Models
Different input features were used to develop the SVM models. Models were constructed using single-sequence information, the PSI-BLAST evolutionary profile, secondary structure and accessibility.
Secondary structure was either predicted or observed, obtained from PSIPRED [http://bioinf.cs.ucl.ac.uk/psipred/] and DSSP [ftp://ftp.embl-heidelberg.de/pub/databases/dssp], respectively. Accessibility was either observed or predicted, obtained from DSSP and the NetASA server (http://www.netasa.org), respectively.
Feature Representation