We have developed an SVM-based method for predicting the quantitative binding affinity of peptides for TAP. The prediction is based on complex patterns extracted from the peptide sequence and from 33 other properties of amino acids, such as volume, charge and aromaticity. The affinity of a peptide for TAP is expressed on a scale of 0-10. The correlation coefficient between the SVM prediction and the measured affinity was 0.889.
Datasets for the development of prediction method:
The peptide dataset used in this study was kindly provided by Peter van Endert (INSERM U580, Institut Necker, Paris, France). The TAP binding affinity of each peptide was expressed as an IC50 value and had been measured experimentally in a TAP binding assay. The peptides cover a wide range of binding affinities, from very high (<0.03 nM) to negligible or no binding (2600 nM). All duplicate peptides were removed from the dataset, as were peptides containing unnatural amino acids. The final dataset has 431 peptides with experimentally verified binding affinity. Of these 431 peptides, 179 are known to bind to various MHC alleles, and 113 of these MHC binders are present in the SWISS-PROT database.
The prediction is based on the support vector machine (SVM). Support vector machines are a relatively new type of supervised machine learning method that has proven particularly attractive for biological analysis because of its ability to handle noise and large input spaces. SVMs have been shown to perform well in several areas of biological analysis, including MHC binder prediction, analysis of microarray expression data and multiclass fold recognition. The SVM was implemented using the SVM_light package. This package allows the user to set a number of parameters and to choose among inbuilt kernel functions, including polynomial, RBF, linear and sigmoid. In this study, the regression mode of SVM was used to model the TAP binding affinity of peptides.
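As a rough illustration of how such a regression run could be set up, the sketch below assembles a command line for SVM_light's svm_learn program. The flags shown (-z r for regression mode, -t for kernel type, -c, -d and -g) are svm_learn options; the helper name svm_learn_args and the file names train.dat and model.dat are hypothetical, and the exact command used in this study is not stated.

```python
# Kernel names mapped to the integer codes accepted by svm_learn's -t option.
KERNEL_CODES = {"linear": 0, "polynomial": 1, "rbf": 2, "sigmoid": 3}


def svm_learn_args(train_file, model_file, kernel, c, degree=None, gamma=None):
    """Build an argument list for svm_learn in regression mode (hypothetical helper)."""
    args = ["svm_learn", "-z", "r", "-t", str(KERNEL_CODES[kernel]), "-c", str(c)]
    if degree is not None:
        args += ["-d", str(degree)]  # degree of the polynomial kernel
    if gamma is not None:
        args += ["-g", str(gamma)]   # gamma of the RBF kernel
    return args + [train_file, model_file]


# File names below are placeholders chosen for illustration.
print(" ".join(svm_learn_args("train.dat", "model.dat", "polynomial", 5, degree=1)))
```

Returning a list rather than a single string means the result could be passed directly to subprocess.run without shell quoting concerns.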
Algorithm for Simple SVM:-
The simple SVM was generated on the basis of binary encoding of the sequence. Each amino acid was encoded as a 20-bit string with one unique position set to 1 and all other positions set to 0. Each 9-residue peptide was therefore represented by 180 inputs and a target value during model generation. The target value is a real number in the range 0-10. Models were generated using different kernel types (polynomial, RBF and linear). The best model was obtained by varying the kernel parameters and the regularization parameter C. The performance of each standard kernel function was evaluated by jackknife testing and was measured as the correlation coefficient between predicted and experimentally measured values. An overview of the final model is shown in the figure below.
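The binary encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the residue ordering in AMINO_ACIDS, the function names and the example peptide and target value are assumptions. The output line follows the sparse "target index:value" layout that SVM_light reads, with 1-based feature indices and zero-valued features omitted.

```python
# Ordering of the 20 natural residues; this particular order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_binary(peptide):
    """Return the 180-element binary vector (9 residues x 20 bits) for a peptide."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue peptide")
    vector = []
    for residue in peptide:
        bits = [0] * 20
        # str.index raises ValueError for residues outside the 20 natural ones.
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


def to_svmlight_line(target, vector):
    """Format one example as '<target> <index>:<value> ...' using 1-based indices."""
    features = " ".join(f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0)
    return f"{target} {features}"


# Arbitrary 9-mer and target score, chosen only for illustration.
example = encode_binary("ACDEFGHIK")
print(len(example), sum(example))
print(to_svmlight_line(7.5, example))
```

Each peptide yields exactly nine non-zero inputs, one per position, so the sparse format keeps the training file compact.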
The correlation between the predicted and measured binding affinity reached 0.81 with the simple polynomial kernel, as evaluated by jackknife testing. The polynomial kernel was more accurate than the other kernels tested and was therefore selected as the best. Its parameters are listed below.
Kernel::Polynomial
Regularization parameter (C)::5.00
Degree of kernel::1.00
Correlation coefficient::0.81
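The jackknife evaluation reported above amounts to leave-one-out testing: each peptide is held out in turn, a model is trained on the remaining peptides, and the held-out prediction is compared with the measured value. A minimal sketch follows. The train callable is a hypothetical interface, and nearest_neighbour_train is a toy stand-in used only to make the example runnable; it is not an SVM.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def jackknife_correlation(inputs, targets, train):
    """Hold out each example once, train on the rest, correlate predictions with targets.

    `train(X, y)` is a hypothetical callable that returns a predict(x) function.
    """
    predictions = []
    for i in range(len(inputs)):
        rest_x = inputs[:i] + inputs[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        predict = train(rest_x, rest_y)
        predictions.append(predict(inputs[i]))
    return pearson(predictions, targets)


def nearest_neighbour_train(X, y):
    """Toy stand-in learner (not an SVM): return the target of the closest example."""
    def predict(x):
        distances = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[distances.index(min(distances))]
    return predict


# Made-up data, for illustration only.
toy_inputs = [[0.0], [1.0], [2.0], [3.0]]
toy_targets = [0.0, 1.0, 2.0, 3.0]
print(jackknife_correlation(toy_inputs, toy_targets, nearest_neighbour_train))
```

With 431 peptides this procedure trains 431 models per parameter setting, which is why it is usually reserved for small datasets.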
Algorithm for Cascade SVM:-
In the cascade SVM, prediction is based on both the sequence and the features of amino acids. At the first level, 33 models were generated, each combining sequence information with one of 33 amino acid features. At the second level, the final model was generated using the outputs of the first level as its inputs.
First Level:-
Models were generated on the basis of the sequence and the features of amino acids. The input vector for each amino acid is 21-dimensional. The first twenty units of the vector each stand for one type of amino acid. To specify a particular feature of the residue, such as charge or volume, a 21st unit is added for each residue. Combining a single amino acid feature with sequence information in this manner resulted in 33 feature-specific models. An overview is shown in the figure below.
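The 21-dimensional per-residue encoding can be sketched as below, giving 9 x 21 = 189 inputs per peptide. The residue ordering, the function name and the CHARGE table are assumptions made for illustration; the charge values are simple placeholders and are not the feature values used by the authors.

```python
# Ordering of the 20 natural residues; this particular order is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder side-chain charges for illustration only, not the authors' table.
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def encode_with_feature(peptide, feature):
    """Return 189 inputs: 20 identity units plus one feature unit per residue."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-residue peptide")
    vector = []
    for residue in peptide:
        bits = [0.0] * 20
        bits[AMINO_ACIDS.index(residue)] = 1.0
        # The 21st unit carries the value of the chosen feature for this residue.
        vector.extend(bits + [feature[residue]])
    return vector


# Arbitrary peptide, chosen only to show the layout.
encoded = encode_with_feature("KADAAAAAA", CHARGE)
print(len(encoded), encoded[20], encoded[62])
```

Repeating this with a different feature table each time would give one input set per feature, matching the 33 first-level models described above.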
Second Level:-
The second model takes the outputs of the 33 models generated at the first level and yields the final output on the basis of these outputs. Each peptide of 9 amino acids is encoded by 34 real-valued units: one unit codes for the target value, and the remaining 33 inputs are the outputs for that peptide from the 33 first-level models. The best model was chosen after experimenting with various types of kernels and varying their parameters. The model was fine-tuned by changing the value of the regularization parameter C.
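The flow of data between the two levels can be sketched as follows. Each model is represented here by a plain callable, which is an assumption for illustration; in the study both levels are SVM regression models. The stand-in models in the usage lines return fixed numbers and carry no biological meaning.

```python
def second_level_inputs(peptide, first_level_models):
    """Collect one prediction per first-level model into the second-level input vector."""
    return [predict(peptide) for predict in first_level_models]


def cascade_predict(peptide, first_level_models, second_level_model):
    """Feed the first-level outputs for a peptide into the second-level model."""
    return second_level_model(second_level_inputs(peptide, first_level_models))


# Hypothetical stand-ins: 33 constant first-level models and an averaging second level.
first_level = [lambda peptide, k=k: float(k) for k in range(33)]
second_level = lambda outputs: sum(outputs) / len(outputs)

inputs = second_level_inputs("AAAAAAAAA", first_level)
print(len(inputs))
print(cascade_predict("AAAAAAAAA", first_level, second_level))
```

During training, the 33 values for each peptide would be paired with its measured affinity to form the 34-unit record described above.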
A total of 33 models were generated at the first level, one for each amino acid feature. The results show that none of the features, when combined with sequence information, significantly improved the correlation between predicted and measured binding affinity. A second SVM model was therefore used to combine the results of the first-level models: it was fed with the outputs of all 33 first-level models. The best model was the one giving the maximum correlation between predicted and measured binding affinity under jackknife validation. With the second model, the correlation coefficient between predicted and measured binding affinity reached 0.88, which is significantly higher than that of sequence-only prediction.
The best results obtained at the first and second levels, along with the kernels and parameters used, are listed below.
First Level::-
Kernel::Polynomial
Regularization parameter (C)::5.00
Degree of kernel::1.00
Correlation coefficient::0.80
Second Level::-
Kernel::RBF
Regularization parameter (C)::30.0
Kernel parameter (g)::2.00
Correlation coefficient::0.889
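For reference, the two kernel families named above can be written out in a few lines. This is a sketch of the standard polynomial and RBF kernel forms, with degree and gamma set to the values listed; the scale and offset of the polynomial kernel are not given in the text, so the values of 1.0 used here are assumptions.

```python
from math import exp


def polynomial_kernel(x, y, degree=1, scale=1.0, offset=1.0):
    """Polynomial kernel (scale * <x, y> + offset) ** degree; scale and offset are assumed."""
    dot = sum(a * b for a, b in zip(x, y))
    return (scale * dot + offset) ** degree


def rbf_kernel(x, y, gamma=2.0):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


# Made-up vectors, for illustration only.
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))
print(rbf_kernel([0.5, 0.5], [0.5, 0.5]))
```

With degree 1 the polynomial kernel is a shifted dot product, so the best first-level model behaves essentially like a linear model.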
Results and Conclusion:-
The results obtained are summarized in the table below. They show that the SVM outperforms the ANN on the TAP binding peptide data. The results obtained with the sequence-based simple SVM model are better than those of the ANN-based method: the previously published ANN-based method achieved a correlation coefficient of 0.732 between measured and predicted values. To further improve the reliability of prediction, we incorporated feature information of amino acids along with the sequence information, and tried a number of ways of doing so. An SVM model was generated by incorporating features of amino acids, comprising 33 physicochemical properties, along with sequence information. This did not improve the performance of the prediction method significantly. The lack of significant improvement may be a result of the complexity of the input patterns. An SVM model generated on the basis of amino acid features alone did not perform comparably to the sequence-only model. The poorer performance of the feature-based method may be due to overlapping features of amino acids.
Finally, we adopted a cascade SVM strategy for more reliable prediction. The cascade SVM uses two SVM models, which together predict the affinity of peptides for the TAP transporter more accurately than the sequence-based model. A correlation coefficient of 0.889 was achieved between the predicted and measured values.
However, for more reliable prediction of the TAP affinities of individual peptides, the predictive performance could be increased by retraining the SVM with additional data. In conclusion, human TAP may skew the HLA class I-associated system of antigen processing and presentation towards its main task, the display of abundant non-self proteins derived from viral or bacterial sources.
Analysis of Peptides Interacting with TAP:-
All peptides interacting with TAP were analyzed in terms of the features (physical and chemical properties) of their different positions (P1-P9). The following features were considered: volume, charge, aromaticity, hydrophobicity, hydrophilicity, average accessibility, flexibility, hydropathy and % buried. The analysis was based on the assumption that overrepresentation of a particular property at a particular position has a positive effect on affinity, whereas underrepresentation of a property at a position has a detrimental effect on binding. The binding affinities (IC50 values) of the peptides used in the analysis were expressed on a scale of 0 to 10, representing a 5-log range of normalized IC50 values from >1000 (score 0) to <0.003 (score 10), with a score increment of 1 corresponding to a three-fold smaller IC50 value. The values of each feature were normalized between 0 and 1. The effect of each feature at each position of the peptide was obtained by measuring the correlation between the feature and the measured binding affinity values. The variation in each feature along the peptide (P1-P9) can be analyzed by plotting the correlation coefficient against peptide position. The results of the analysis are shown in the graphs below.
The graphs in Figure 4 show that the three N-terminal positions and the COOH terminus favor residues with particular features. Position 1 (P1) favors charged and hydrophilic residues, whereas aromatic, higher-volume and hydrophobic residues are not favored at P1. Higher-volume, charged, hydrophilic, accessible and flexible residues are favored at the 2nd position. The 3rd position mostly favors higher-volume, aromatic, hydrophobic and accessible residues. The COOH terminus prefers higher-volume, charged, aromatic, hydrophobic and accessible residues.
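The per-position analysis described in this section (normalize a feature to the range 0-1, then correlate its value at each position with the affinity score) can be sketched as follows. The function names and the toy feature table, peptides and scores are all made up for illustration. Returning 0.0 when a position shows no variation is a choice made here to avoid division by zero, not something stated in the text.

```python
from math import sqrt


def min_max_normalise(feature):
    """Rescale a residue -> value table to the interval [0, 1]."""
    low, high = min(feature.values()), max(feature.values())
    return {aa: (value - low) / (high - low) for aa, value in feature.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant (an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def positional_correlations(peptides, scores, feature):
    """Correlation between a normalised feature and the affinity score at P1..P9."""
    scaled = min_max_normalise(feature)
    result = []
    for position in range(9):
        values = [scaled[peptide[position]] for peptide in peptides]
        result.append(pearson(values, scores))
    return result


# Toy data: three made-up peptides, scores and feature values.
toy_feature = {"A": 0.0, "K": 2.0, "D": -2.0}
toy_peptides = ["KAAAAAAAA", "AAAAAAAAA", "DAAAAAAAA"]
toy_scores = [8.0, 5.0, 2.0]
print(positional_correlations(toy_peptides, toy_scores, toy_feature))
```

Plotting the nine returned values against position would give one curve per feature, in the style of the graphs discussed above.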