TumorHPD Algorithm
Data set used
We took a data set of 651 peptides from the TumorHoPe database, and an equal number of peptides from the Swiss-Prot database as the negative data set. This main data set was used for calculating amino acid composition and dipeptide composition.
For the N5 terminal, the data set contains the first five residues of each peptide in the main data set; for the C5 terminal, it contains the last five residues.
For the N10 terminal, the data set contains the first ten residues of each peptide in the main data set; for the C10 terminal, it contains the last ten residues.
The data set for short peptides (length between 4 and 10 residues) consists of 469 peptides from the main data set.
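The terminal and length-restricted data sets described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequences are hypothetical, and skipping peptides shorter than the requested terminal length is an assumption of this sketch, since the page does not say how such peptides were handled.

```python
def n_terminal(peptide, k):
    """First k residues of the peptide (k must be positive)."""
    return peptide[:k]


def c_terminal(peptide, k):
    """Last k residues of the peptide (k must be positive)."""
    return peptide[-k:]


def terminal_data_set(peptides, k, end):
    """Build an N- or C-terminal data set of k-residue segments.

    Assumption of this sketch: peptides shorter than k are skipped.
    """
    cut = n_terminal if end == "N" else c_terminal
    return [cut(p, k) for p in peptides if len(p) >= k]


def short_peptides(peptides, min_len=4, max_len=10):
    """Keep peptides whose length lies between min_len and max_len."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Artificial sequences, for illustration only.
peptides = ["ACDEFGHIKLMN", "WYVTSRQPN", "GAV"]

n5 = terminal_data_set(peptides, 5, "N")    # ["ACDEF", "WYVTS"]
c5 = terminal_data_set(peptides, 5, "C")    # ["IKLMN", "SRQPN"]
n10 = terminal_data_set(peptides, 10, "N")  # ["ACDEFGHIKL"]
c10 = terminal_data_set(peptides, 10, "C")  # ["DEFGHIKLMN"]
short = short_peptides(peptides)            # ["WYVTSRQPN"]
```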
Prediction approach:
We used the following approaches to develop the SVM models:
Amino acid composition:
The composition profile of a pattern is the percentage frequency of each amino acid in a fixed-length sequence pattern. The fractions of all 20 natural amino acids in the pattern are used as the input vector for the SVM.
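As a minimal sketch of this feature, the function below turns a sequence into a 20-element vector of percentage frequencies. The function name, the alphabetical ordering of residues, and the choice to ignore non-standard residues are assumptions made for illustration, not details taken from the web server.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 natural amino acids, one-letter codes


def amino_acid_composition(sequence):
    """Return the percentage frequency of each of the 20 amino acids.

    Assumption of this sketch: residues outside the 20-letter alphabet
    are ignored when counting.
    """
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * count / total for count in counts]


# Artificial example: A occurs twice, C and D once each.
vector = amino_acid_composition("AACD")
# vector[0] (Ala) is 50.0, vector[1] (Cys) and vector[2] (Asp) are 25.0
```

The resulting vector always has 20 elements regardless of peptide length, which is what allows peptides of different lengths to share one SVM input format.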
Dipeptide composition:
In this approach, each sequence is represented by a fixed-length input vector of 400 (20 x 20) dipeptide frequencies. It encapsulates the global as well as local information of the sequence.
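A minimal sketch of the 400-element dipeptide vector follows. It assumes the commonly used definition (count of each adjacent residue pair divided by the total number of pairs, expressed as a percentage); the function name and the handling of non-standard residues are illustrative assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return the percentage frequency of each of the 400 dipeptides.

    Assumption of this sketch: overlapping adjacent pairs are counted,
    and pairs containing a non-standard residue are ignored.
    """
    sequence = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(sequence) - 1):
        index = DIPEPTIDE_INDEX.get(sequence[i:i + 2])
        if index is not None:
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * count / total for count in counts]


# Artificial example: "ACAC" contains the pairs AC, CA, AC.
dpc = dipeptide_composition("ACAC")
ac = dpc[DIPEPTIDE_INDEX["AC"]]  # two of three pairs
ca = dpc[DIPEPTIDE_INDEX["CA"]]  # one of three pairs
```

Because each feature counts an ordered pair of neighbouring residues, the vector reflects local residue order in addition to overall composition.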
Binary profile:
In this approach, fixed-length (21-window) sequence patterns were converted into binary form. Each residue of a pattern was represented by a vector of dimension 21 (e.g. Ala by 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; Cys by 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), with one position for each of the 20 amino acids and one for the dummy amino acid "X".
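The encoding can be sketched as follows. The positions of Ala and Cys match the example above; the ordering of the remaining residues (alphabetical by one-letter code, with "X" last) and the mapping of any unrecognised character to "X" are assumptions of this sketch.

```python
# 20 amino acids plus the dummy residue "X" in the last position.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def binary_profile(pattern):
    """Encode a sequence pattern as a flat list of 21 values per residue.

    Assumption of this sketch: characters outside the alphabet are
    encoded as the dummy residue "X".
    """
    vector = []
    for residue in pattern.upper():
        index = ALPHABET.find(residue)
        if index == -1:
            index = ALPHABET.index("X")
        one_hot = [0] * len(ALPHABET)
        one_hot[index] = 1
        vector.extend(one_hot)
    return vector


ala_cys = binary_profile("AC")       # two residues, 42 values
five_mer = binary_profile("ACDEF")   # five residues, 105 values
```

A pattern of n residues therefore yields an input vector of 21 x n values, so the pattern length must be fixed for all peptides in a given model.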
Models Used on Webserver:
We have developed SVM-based prediction models using different input features. The following results were obtained for the amino acid composition and binary profile features that we used to generate the models.
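| Method | Threshold | TP | FP | TN | FN | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | ROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Amino Acid Composition | 0 | 531 | 108 | 545 | 120 | 81.57 | 83.46 | 82.52 | 0.65 | 0.90 |
| Binary (NTCT5) | 0.1 | 486 | 83 | 568 | 163 | 74.88 | 87.25 | 81.08 | 0.63 | 0.88 |
| Binary (NTCT10) | -0.1 | 204 | 31 | 222 | 49 | 80.63 | 87.75 | 84.19 | 0.69 | 0.91 |
| Binary (NTCT5, up to 10 residues long) | 0.1 | 343 | 44 | 425 | 126 | 73.13 | 90.62 | 81.88 | 0.65 | 0.88 |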
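The sensitivity, specificity, accuracy, and MCC values reported above follow from the TP, FP, TN, and FN counts through the standard definitions. The sketch below (the function name is illustrative) recomputes them for the amino acid composition model. The ROC value cannot be derived from a single set of counts, since it depends on the scores across all thresholds.

```python
import math


def performance(tp, fp, tn, fn):
    """Standard threshold-dependent measures from confusion-matrix counts."""
    sensitivity = 100.0 * tp / (tp + fn)           # % of positives recovered
    specificity = 100.0 * tn / (tn + fp)           # % of negatives rejected
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Counts reported for the amino acid composition model at threshold 0.
aac = performance(tp=531, fp=108, tn=545, fn=120)
rounded = {name: round(value, 2) for name, value in aac.items()}
# rounded: sensitivity 81.57, specificity 83.46, accuracy 82.52, mcc 0.65
```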