BIC, IMTECH
Algorithm
Support vector machine
The SVM was implemented using the freely downloadable software package SVM_light (Joachims, 1999). The input features were the amino acid composition (a vector of 20 values) and the dipeptide composition (a vector of 400 values) of each protein sequence.
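As an illustration of how such a feature vector could be handed to SVM_light, the sketch below writes one example in SVM_light's sparse text format (a target label followed by `index:value` pairs, with indices starting at 1). The function name `format_svmlight_line`, the choice to skip zero-valued features, and the number formatting are assumptions made for this example; they are not taken from the original study.

```python
def format_svmlight_line(label, features):
    """Format one example as an SVM_light input line.

    label    -- +1 or -1 (class of the example)
    features -- sequence of floats (e.g. 20 or 400 composition values)

    Assumption: zero-valued features are omitted, which the sparse
    SVM_light format permits.
    """
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:.6g}")
    return " ".join(parts)


if __name__ == "__main__":
    # A toy three-feature vector; real vectors would have 20 or 400 entries.
    print(format_svmlight_line(1, [0.5, 0.0, 0.25]))
```

Each protein then contributes one line to the training or test file passed to SVM_light.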
Amino acid composition. Amino acid composition is the fraction of each amino acid in a protein. The fractions of all 20 natural amino acids were calculated and used as the input vector.
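A minimal sketch of this calculation is given below. The residue ordering in `AMINO_ACIDS` and the decision to ignore non-standard residues (such as X or B) when counting are assumptions of this example, not details stated for the original implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 natural amino acids.

    The result is a list of 20 floats in AMINO_ACIDS order.
    Assumption: residues outside the 20 standard letters are ignored.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("MKTAYIAKQR")
    print(len(vector), round(sum(vector), 6))
```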
Dipeptide composition. Dipeptide composition was used to encapsulate the global information about each protein sequence, which gives a fixed pattern length of 400 (20 × 20). This representation encompasses the information about amino acid composition along with the local order of amino acids.
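The 400-value representation can be sketched as follows, counting every overlapping pair of adjacent residues and dividing by the number of pairs counted. The names `DIPEPTIDES` and `dipeptide_composition`, the ordering of the 400 pairs, and the skipping of pairs that contain a non-standard residue are assumptions of this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues
# All 20 x 20 = 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 dipeptides in a sequence.

    Overlapping adjacent pairs are counted, so a sequence of length n
    contributes up to n - 1 dipeptides.
    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("MKTAYIAKQR")
    print(len(vector), round(sum(vector), 6))
```

Because the vector always has 400 entries regardless of sequence length, proteins of different sizes can be compared directly.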
Hidden Markov model
HMM profiles of four types of ion channels (sodium, potassium, calcium and chloride) were constructed (Eddy, 1998). The protein sequences of each class were aligned in a multiple sequence alignment using CLUSTAL-W. A profile HMM was built for each class with the hmmbuild program, and each profile was then calibrated with the hmmcalibrate program. We created our own HMM database by concatenating the individual HMM profile files. The hmmpfam program was used to search a query sequence against this profile HMM database. We set an E-value threshold of 0.01 when assessing prediction quality by five-fold cross-validation.
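The workflow above could be scripted roughly as shown below. This is only a sketch: it assumes the HMMER 2 command-line programs (`hmmbuild`, `hmmcalibrate`, `hmmpfam`) are the ones used, it assumes the threshold is passed through hmmpfam's `-E` option so that hits with an E-value at or below 0.01 are reported, and all file names and helper names are hypothetical. The commands are only assembled, not executed.

```python
import shutil


def concatenate_profiles(profile_paths, database_path):
    """Join single-profile HMM files into one database file."""
    with open(database_path, "wb") as out:
        for path in profile_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return database_path


def hmmer2_commands(alignments, database_path, query_fasta, e_value=0.01):
    """Build the command lines for one class-wise profile HMM workflow.

    alignments -- dict mapping a class name (e.g. "sodium") to the path
                  of its multiple sequence alignment (hypothetical paths)
    Returns a list of argument lists suitable for subprocess.run().
    """
    commands = []
    for name, alignment in alignments.items():
        hmm_file = f"{name}.hmm"
        commands.append(["hmmbuild", hmm_file, alignment])   # build profile
        commands.append(["hmmcalibrate", hmm_file])          # calibrate it
    # Search the query against the concatenated profile database.
    commands.append(["hmmpfam", "-E", str(e_value), database_path, query_fasta])
    return commands


if __name__ == "__main__":
    classes = {"sodium": "sodium.aln", "potassium": "potassium.aln"}
    for cmd in hmmer2_commands(classes, "ion_channels.hmm", "query.fasta"):
        print(" ".join(cmd))
```

In practice `concatenate_profiles` would be called after the build and calibration steps and before the hmmpfam search.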
Download HMM files
PSI-BLAST
A PSI-BLAST module (Altschul et al., 1997) was designed in which query sequences in the test dataset were searched against proteins in the training dataset. Three iterations of PSI-BLAST were carried out at a cut-off E-value of 0.01. The module could predict ion-channels and the types of ion-channels (sodium, potassium, calcium, chloride) depending on the similarity of the query protein to the proteins in the dataset.
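One plausible way to turn search hits into a prediction is sketched below: the query is assigned the class of its most significant training-set hit, provided that hit meets the E-value cut-off. The text only states that prediction depends on similarity, so this top-hit rule, the function name `predict_from_hits`, and the input layout are assumptions; parsing of the actual PSI-BLAST output is left out.

```python
def predict_from_hits(hits, training_classes, e_value_cutoff=0.01):
    """Assign a class to a query from its similarity-search hits.

    hits             -- iterable of (subject_id, e_value) pairs
    training_classes -- dict mapping training protein IDs to a class
                        label, e.g. "sodium" or "non-ion-channel"
    Returns the class of the best-scoring hit, or None when no hit
    passes the cut-off (the query is then left unpredicted).
    Assumption: the hit with the lowest E-value decides the class.
    """
    significant = [
        (e_value, subject_id)
        for subject_id, e_value in hits
        if e_value <= e_value_cutoff and subject_id in training_classes
    ]
    if not significant:
        return None
    _, best_id = min(significant)
    return training_classes[best_id]


if __name__ == "__main__":
    classes = {"P1": "sodium", "P2": "potassium", "P3": "chloride"}
    hits = [("P1", 1e-5), ("P2", 1e-20), ("P3", 5.0)]
    print(predict_from_hits(hits, classes))
```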
Performance measures
Five-fold cross-validation
The performance of the modules constructed in this study for discriminating ion-channels and types of ion-channels was evaluated using a 5-fold cross-validation technique. In 5-fold cross-validation, the relevant dataset was randomly divided into five sets. Training and testing were carried out five times, each time using one distinct set for testing and the remaining four sets for training. Five threshold-dependent parameters, namely sensitivity, specificity, accuracy, positive predictive value (PPV) and Matthews correlation coefficient (MCC) (Baldi et al., 2000), were used to assess the prediction of ion-channels and the classification of ion-channel types.
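The evaluation procedure can be sketched as follows, using the standard definitions of the five measures in terms of true/false positives and negatives (TP, TN, FP, FN). The function names, the fixed random seed, and the convention of returning 0.0 when a denominator is zero are assumptions of this example.

```python
import math
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for each of the folds.

    Items are shuffled once, dealt into n_folds sets, and each set is
    used exactly once for testing while the others form the training set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def performance_measures(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy, PPV and MCC."""
    def ratio(numerator, denominator):
        # Assumption: report 0.0 when the measure is undefined.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    for train, test in five_fold_splits(23):
        print(len(train), len(test))
    print(performance_measures(tp=40, tn=45, fp=5, fn=10))
```

The counts from the five test sets can be either pooled before calling `performance_measures` or evaluated per fold and averaged; the text does not say which was done.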