Sequence Name: The user can enter a name for the sequence being submitted for prediction.
Simple composition based prediction: In this method, the percentage composition of all 20 amino acids was calculated and used to derive a weight corresponding to each amino acid. The weights were obtained by subtracting the composition data. To classify an unknown protein, its amino acid composition is calculated and each value is multiplied by the corresponding weight. The 20 resulting values are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as non-GST; otherwise it is classified as GST.
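The scoring steps described above can be sketched in Python as follows. This is only an illustration, not the server's actual code. The function names are hypothetical, and the direction of the subtraction (mean GST composition minus mean non-GST composition) is an assumption inferred from the rule that a negative score means non-GST; the text does not state it explicitly.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def percent_composition(sequence):
    """Return the percentage of each of the 20 amino acids in a sequence."""
    # Non-standard symbols (X, B, gaps, ...) are ignored in this sketch.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def derive_weights(mean_gst, mean_non_gst):
    """Derive one weight per amino acid from two mean composition tables.

    ASSUMPTION: weight = GST composition - non-GST composition.
    """
    return {aa: mean_gst[aa] - mean_non_gst[aa] for aa in AMINO_ACIDS}


def cumulative_score(sequence, weights):
    """Multiply each composition value by its weight and sum the 20 products."""
    composition = percent_composition(sequence)
    return sum(composition[aa] * weights[aa] for aa in AMINO_ACIDS)


def classify(sequence, weights):
    """A score below 0 is classified as non-GST, anything else as GST."""
    return "non-GST" if cumulative_score(sequence, weights) < 0 else "GST"
```

To use it, one would build the two mean composition tables from known GST and non-GST proteins, pass them to derive_weights, and then call classify on the query sequence.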
Simple dipeptide composition based prediction: In this method, the percentage composition of all 400 dipeptides was calculated and used to derive a weight corresponding to each dipeptide. The weights were obtained by subtracting the composition data. To classify an unknown protein, its dipeptide composition is calculated and each value is multiplied by the corresponding weight. The 400 resulting values are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as non-GST; otherwise it is classified as GST.
Simple tripeptide composition based prediction: In this method, the percentage composition of all 8000 tripeptides was calculated and used to derive a weight corresponding to each tripeptide. The weights were obtained by subtracting the composition data. To classify an unknown protein, its tripeptide composition is calculated and each value is multiplied by the corresponding weight. The 8000 resulting values are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as non-GST; otherwise it is classified as GST.
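The dipeptide and tripeptide features differ from the single amino acid case only in the length of the peptide counted. A minimal sketch of one general routine is given below. The function names are hypothetical, and counting overlapping windows is an assumption, since the text does not say how peptides are counted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kpeptide_composition(sequence, k):
    """Percentage composition of all 20**k peptides of length k.

    k=1 gives 20 values, k=2 gives 400, k=3 gives 8000.
    ASSUMPTION: peptides are counted in overlapping windows.
    """
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with non-standard symbols
            counts[window] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence is too short or has no standard peptides")
    return {peptide: 100.0 * n / total for peptide, n in counts.items()}


def weighted_score(composition, weights):
    """Sum of composition * weight over all peptides.

    Peptides missing from the weight table contribute 0 in this sketch.
    """
    return sum(value * weights.get(peptide, 0.0)
               for peptide, value in composition.items())
```

The cumulative score is then compared with 0 exactly as in the single amino acid method.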
SVM: This method is based on a Support Vector Machine (SVM). Depending on the threshold value the user chooses, the SVM classifies the unknown protein as GST or non-GST. The default threshold is 0. For higher specificity at the cost of sensitivity, specify a higher threshold; for higher sensitivity at the cost of specificity, choose a lower threshold. The outcome therefore depends on the trade-off between sensitivity and specificity.
In general, a threshold of -0.2 for monopeptide (amino acid) composition, 0.2 for dipeptide composition, and -0.4 for tripeptide composition gives better results.
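The effect of the threshold on sensitivity and specificity can be illustrated with a short sketch. It assumes the SVM returns one real-valued decision value per protein and that values strictly above the threshold are called GST. Both the function names and the strict comparison are assumptions, not details confirmed by the server. Only the three suggested thresholds come from the text above.

```python
# Suggested thresholds quoted in the text above.
SUGGESTED_THRESHOLDS = {"monopeptide": -0.2, "dipeptide": 0.2, "tripeptide": -0.4}


def classify_by_threshold(decision_value, threshold=0.0):
    """Call a protein GST when its SVM decision value exceeds the threshold.

    ASSUMPTION: the comparison is strict (value > threshold).
    """
    return "GST" if decision_value > threshold else "non-GST"


def sensitivity_specificity(decision_values, is_gst, threshold=0.0):
    """Sensitivity and specificity of the thresholded calls on labelled data."""
    tp = fn = tn = fp = 0
    for value, positive in zip(decision_values, is_gst):
        called_gst = classify_by_threshold(value, threshold) == "GST"
        if positive and called_gst:
            tp += 1
        elif positive:
            fn += 1
        elif called_gst:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Evaluating sensitivity_specificity on a labelled set at several thresholds shows the trade-off directly: raising the threshold can only keep or reduce the number of proteins called GST, so specificity does not fall and sensitivity does not rise.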
If you still have any questions or suggestions, kindly contact us.
Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India