SVM-based quantitative method for predicting peptide TAP binding affinity


TAPPred is an online service for predicting the binding affinity of peptides toward the TAP transporter. Predicting TAP-binding peptides is crucial for identifying MHC class I restricted T cell epitopes. The prediction is based on a cascade SVM that uses the sequence and the properties of the amino acids. A correlation coefficient of 0.88 was obtained in jackknife validation.

General information about TAP Transporter

Detailed Algorithm for TAPPred

Dataset for the Development of Method

Algorithm for simple SVM

Algorithm for Cascade SVM

Analysis of peptides Interacting with TAP

Conclusion & Results of Prediction

References

General information about TAP Transporter:-

TAP is the transporter associated with MHC class I restricted antigen processing. TAP is a heterodimeric transporter belonging to the ABC transporter family that uses the energy provided by ATP to translocate peptides across the membrane. The transporter is composed of two proteins, TAP-1 and TAP-2. A subset of the transported peptides binds MHC class I molecules and stabilizes them. These MHC-peptide complexes are translocated to the surface of antigen-presenting cells (APCs), where they serve as ligands for T cell receptors (TCRs). These complexes elicit the immune response that clears various intracellular infections.

The figure below shows how the TAP complex works.

Detailed mechanism of transport by the TAP transporter

Peptide transport by TAP is a multi-step process. In a fast bimolecular association step, the peptide binds to TAP in an ATP-independent manner, followed by a slow isomerization of the TAP complex. It is suggested that this structural reorganization of the molecule triggers ATP hydrolysis and peptide translocation across the membrane. These binding steps primarily determine the selectivity of TAP. The translocation strictly requires hydrolysis of ATP, because non-hydrolyzable ATP analogs do not promote peptide transport. ATP and ADP have similar affinities for TAP; therefore, peptide translocation can be inhibited by ADP.

Peptide binding to TAP transporter

Because the TAP transporter is extensively polymorphic, distinct sets of peptides are translocated into the ER. The nature of these peptides is reflected in the nature of the MHC-binding peptides. Selective transport by TAP may modulate or limit the supply of peptides to HLA class I molecules. A molecular understanding of the selectivity and specificity of TAP may therefore contribute substantially to the prediction of MHC class I restricted T cell epitopes.

The TAP transporter efficiently binds and transports peptides of 8-12 amino acids. TAP appears to bind peptides that are of optimal length for, or slightly longer than, those presented by MHC class I molecules. Apart from this length preference, the nature of a peptide also influences selectivity. TAP from the rat strain RT1a, like human TAP, translocates peptides with broad specificity (hydrophobic or basic amino acids at the COOH terminus), whereas TAP from the rat strain RT1u and mouse TAP prefer peptides with hydrophobic COOH termini. According to another observation, TAP favours strongly hydrophobic residues in position 3 (P3) and hydrophobic and charged residues in P2, whereas aromatic and acidic residues in P1 are disfavoured. Van Endert and coworkers also observed that proline in positions 1 and 2 has a very deleterious effect on binding.

The figure below shows the peptide specificity of the TAP transporter as determined by combinatorial peptide libraries.

The figure has been obtained from Lankat-Buttgereit et al., 2002. Top panel: substrate specificity of TAP. The first three NH2-terminal amino acids and the last COOH-terminal amino acid contribute significantly to the stabilization of peptide binding to TAP. Middle panel: amino acids at the individual positions; residues with negative Delta Delta G values (favored) are shown in blue, and residues with positive Delta Delta G values (disfavored) are shown in red. For example, at the first position the amino acids K, N, and R are favored, and D, E, and F are disfavored. Bottom panel: a model of the substrate-binding pocket of TAP.

Why is a computational method required for predicting the TAP binding affinity of peptides?

Wet-lab testing of peptides derived from proteins is experimentally laborious and expensive. Prediction methods based on the specificity of the TAP transporter complement wet-lab experiments and speed up discovery. On this basis, two computational algorithms have been developed in the past, both based on a machine learning technique, the artificial neural network (ANN).

Detailed algorithm of TAPPred:-

We have developed an SVM-based method for the quantitative prediction of the binding affinity of peptides toward TAP. The prediction is based on complex patterns extracted from the sequence and from 33 other properties of amino acids, such as volume, charge and aromaticity. The affinity of a peptide for TAP is expressed on a scale of 0-10. The correlation coefficient between the SVM prediction and the measured affinity was 0.889.

Datasets for the development of prediction method:
The peptide dataset used in this study was kindly provided by Peter van Endert (INSERM U580, Institut Necker, Paris, France). The TAP binding affinity of the peptides is expressed as an IC50 value. The binding affinity of every peptide was measured experimentally in a TAP binding assay. The affinities range from very high (<0.03 nM) to negligible or no binding (2600 nM). All duplicate peptides were removed from the dataset, as were peptides containing unnatural amino acids. The final dataset has 431 peptides with experimentally verified binding affinity. Of these 431 peptides, 179 are known to bind to various MHC alleles, and 113 of these MHC binders are present in the SWISS-PROT database.

The prediction is based on the support vector machine (SVM). Support vector machines are a relatively new type of supervised machine learning method that has proven particularly attractive for biological analysis because of its ability to handle noise and large input spaces. SVMs have been shown to perform well in several areas of biological analysis, including MHC binder prediction, analysis of microarray expression data and multiclass fold recognition. The SVM was implemented with the SVM_light package. This package lets the user set a number of parameters and choose among built-in kernel functions, including polynomial, RBF, linear and sigmoid. In this study the regression mode of SVM was used to model the TAP binding affinity of peptides.
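
To make the training setup more concrete, the sketch below writes one example in the sparse text format that SVM_light reads (a target value followed by `index:value` pairs with 1-based, increasing indices). The function name `to_svmlight_line` and the example values are hypothetical, and this is only an illustration of the file format, not the TAPPred pipeline.

```python
def to_svmlight_line(target, features):
    """Format one training example as an SVM_light input line.

    Layout: <target> <index>:<value> ... with 1-based feature indices in
    increasing order; zero-valued features are simply left out.
    """
    parts = [f"{target:g}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Hypothetical example: an affinity score of 7.5 and a five-feature vector.
line = to_svmlight_line(7.5, [0, 1, 0, 0.25, 0])
print(line)  # 7.5 2:1 4:0.25
```

With a file of such lines, regression mode is selected in SVM_light with `svm_learn -z r`; the kernel type, polynomial degree, RBF gamma and C are set with `-t`, `-d`, `-g` and `-c`. The file names passed to `svm_learn` would be the user's own.
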

Algorithm for Simple SVM:-
The simple SVM was built on a binary encoding of the sequence. Each amino acid was encoded as a 20-bit string with a unique position set to 1 and all other positions set to 0. Each 9-residue peptide was therefore represented by 180 inputs and a target value when generating the model. The target value is a real number between 0 and 10. Models were generated with different kernel types (polynomial, RBF and linear). The best model was found by varying the kernel parameters and the regularization parameter C. The performance of each standard kernel function was evaluated by jackknife testing and measured as the correlation coefficient between the predicted and experimentally measured values. An overview of the final model is shown in the figure below.
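
The binary encoding described above can be sketched as follows. This is a minimal illustration: the ordering of the 20 amino acids in `AMINO_ACIDS`, the function name `encode_binary` and the example peptide are assumptions, since the text does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def encode_binary(peptide):
    """Encode a peptide as a flat list of 20 bits per residue."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        # index() raises ValueError for non-standard residue letters
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


encoded = encode_binary("RRYQKSTEL")  # example 9-residue peptide
print(len(encoded), sum(encoded))  # 180 9
```

Each residue contributes exactly one set bit, so a 9-residue peptide yields 180 inputs of which 9 are 1.
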

The correlation between the predicted and measured binding affinity reached 0.81 with the simple polynomial kernel, as evaluated by jackknife testing. The polynomial kernel was the most accurate of the kernels tested and was therefore selected. Its parameters are listed below.

Kernel::Polynomial
Regularization parameter (C)::5.00
Degree of kernel::1.00
Correlation coefficient::0.81
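
Jackknife (leave-one-out) testing and the correlation coefficient used for evaluation can be sketched as below. This is an illustration only, not the TAPPred code: the regressor is a stand-in nearest-neighbour rule rather than an SVM, and all function names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def jackknife_predictions(samples, targets, train, predict):
    """Leave-one-out: predict each sample with a model trained on the rest."""
    predictions = []
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        model = train(rest_x, rest_y)
        predictions.append(predict(model, samples[i]))
    return predictions


# Stand-in regressor (NOT an SVM): returns the target of the nearest training point.
def train_nearest(xs, ys):
    return list(zip(xs, ys))


def predict_nearest(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Toy one-dimensional data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
predicted = jackknife_predictions(xs, ys, train_nearest, predict_nearest)
print(round(pearson(predicted, ys), 3))
```

Because every peptide is predicted by a model that never saw it, the resulting correlation is an estimate of performance on unseen data.
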

Algorithm for Cascade SVM:-
In the cascade SVM, prediction is based on both the sequence and the features of amino acids. At the first level, 33 models were generated by combining each of 33 amino acid features, one at a time, with the sequence information. At the second level, the final model was generated by using the outputs of the first level as input.

First Level:-
Models were generated from the sequence and the features of amino acids. The input vector for each amino acid is 21-dimensional. The first twenty units of the vector identify the amino acid type, one unit per amino acid. The 21st unit carries the value of a particular feature of the residue, such as charge or volume. Combining a single amino acid feature with the sequence information in this manner resulted in 33 feature-specific models. An overview is shown in the figure below.
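
The 21-unit residue encoding can be sketched as below, which under this reading gives 9 x 21 = 189 inputs for a 9-residue peptide. The amino acid ordering, the function name `encode_with_feature` and the simple charge table are assumptions made for illustration; the 33 property scales actually used are not listed in this text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues

# Illustrative charge feature only (assumption): +1 for K/R, -1 for D/E, else 0.
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def encode_with_feature(peptide, feature):
    """Encode each residue as 20 one-hot units plus one property value."""
    vector = []
    for residue in peptide:
        units = [0.0] * len(AMINO_ACIDS)
        units[AMINO_ACIDS.index(residue)] = 1.0
        units.append(feature[residue])  # the 21st unit
        vector.extend(units)
    return vector


encoded = encode_with_feature("RRYQKSTEL", CHARGE)
print(len(encoded))  # 189
print(encoded[20])   # 1.0, the charge unit of the first residue (R)
```

Repeating the encoding with a different property table would give the input for another of the feature-specific models.
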
Second Level:-
The second model takes the outputs of the 33 models generated at the first level and produces the final output from them. Each 9-residue peptide is encoded by 34 real-valued units: one unit holds the target value and the remaining 33 are the outputs of the 33 first-level models for that peptide. The best model was chosen after experimenting with various kernel types and parameters, and was fine-tuned by varying the regularization parameter C.
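
The data flow into the second level can be sketched as follows. The first-level models here are dummy callables standing in for the 33 trained SVM regression models, and the names `second_level_input` and `make_dummy_model` are hypothetical; the sketch only shows how one prediction per first-level model is collected into a 33-value input vector.

```python
def second_level_input(peptide, first_level_models):
    """Collect one prediction per first-level model into a single vector."""
    return [model(peptide) for model in first_level_models]


# Stand-in first-level models (assumption): each is a callable returning a score.
# In the described method these would be 33 trained SVM regression models.
def make_dummy_model(offset):
    return lambda peptide: float((len(peptide) + offset) % 11)


first_level_models = [make_dummy_model(k) for k in range(33)]
inputs = second_level_input("RRYQKSTEL", first_level_models)
print(len(inputs))  # 33
```

During training, the measured affinity score of the peptide would be attached to this vector as the target value.
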

In total, 33 models were generated, one for each of the 33 amino acid features. The results show that no single amino acid feature, combined with sequence information, significantly improves the correlation between the predicted and measured binding affinity. A second SVM model was therefore used to combine the results of the first level: it was fed with the outputs of the 33 first-level models. The best model was taken to be the one giving the maximum correlation between predicted and measured binding affinity in jackknife validation. With the second model, the correlation coefficient between predicted and measured binding affinity reached 0.88, which is significantly higher than with sequence-based prediction alone. The best results obtained at the first and second levels, along with their kernels and parameters, are listed below.
First Level:-

Kernel::Polynomial
Regularization parameter (C)::5.00
Degree of kernel::1.00
Correlation coefficient::0.80

Second Level:-

Kernel::RBF
Regularization parameter (C)::30.0
Kernel parameter (g)::2.00
Correlation coefficient::0.889
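
For orientation, the two kernel functions named in the parameter lists can be written out as below. The forms follow the usual SVM_light definitions; the scale `s` and offset `c` of the polynomial kernel are assumed to be at their defaults of 1, which this text does not state, and the vectors are invented.

```python
import math


def polynomial_kernel(a, b, degree=1, s=1.0, c=1.0):
    """Polynomial kernel: (s * <a, b> + c) ** degree."""
    dot = sum(x * y for x, y in zip(a, b))
    return (s * dot + c) ** degree


def rbf_kernel(a, b, gamma=2.0):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


u = [1.0, 0.0, 0.5]
v = [1.0, 1.0, 0.5]
print(polynomial_kernel(u, v))  # 2.25
print(rbf_kernel(u, u))         # 1.0
```

Note that a polynomial kernel of degree 1 is an affine function of the dot product, so the first-level models behave essentially like linear models.
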

Results and Conclusion:-

An outline of the results is shown in the table below. The results clearly demonstrate that SVM outperforms ANN in modelling the TAP-binding peptide data. Even the simple sequence-based SVM model gives better results than the ANN-based method: the previously published ANN-based method obtained a correlation coefficient of 0.732 between measured and predicted values.

To further improve the reliability of prediction, we incorporated amino acid feature information along with the sequence information, and tried a number of ways of doing so. First, an SVM model was generated by incorporating the features of amino acids, 33 physicochemical properties, along with the sequence information. This gave no significant improvement in performance, which may be the result of the complexity of the input patterns. Second, an SVM model generated from the amino acid features alone did not perform comparably to the sequence-only model; its poorer performance may be due to overlapping features of amino acids. Finally, we adopted a cascade SVM strategy for more reliable prediction. The cascade SVM uses two SVM models, which together predict the affinity of peptides toward the TAP transporter more accurately than the sequence-based model. A correlation coefficient of 0.889 was achieved between the predicted and measured values.

However, the predictive performance for the TAP affinities of individual peptides could be increased further by retraining the SVM with additional data. In conclusion, human TAP may skew the HLA class I associated system of antigen processing and presentation toward its main task, the display of abundant non-self proteins derived from viral or bacterial sources.

Analysis of Peptides Interacting with TAP:-

All peptides interacting with TAP were analyzed in terms of the features (physical and chemical properties) of the residues at each position (P1-P9). The following features were considered: volume, charge, aromaticity, hydrophobicity, hydrophilicity, average accessibility, flexibility, hydropathy and % buried. The analysis assumes that over-representation of a particular property at a particular position has a positive effect on affinity, whereas under-representation has a detrimental effect on binding. The binding affinities (IC50 values) of the peptides were expressed on a scale of 0 to 10, representing a 5-log range of normalized IC50 values from >1000 (score 0) to <0.003 (score 10), with a score increment of 1 corresponding to a threefold smaller IC50 value. The values of each feature were normalized between 0 and 1. The effect of each feature at each position of the peptide was obtained by measuring the correlation between the feature and the measured binding affinity. The variation of each feature along the peptide (P1-P9) can be examined by plotting the correlation coefficient against peptide position. The results of the analysis are shown in the graphs below.
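
The scoring rule described above can be sketched in code. This is a minimal illustration, assuming the scale is anchored at a normalized IC50 of 1000 (score 0) and that each point corresponds to exactly a threefold decrease; the function name `ic50_to_score` is hypothetical and the exact normalization is not given in this text. Under these assumptions the score saturates at 10 for a normalized IC50 of about 0.017, so the sketch does not reproduce the stated lower endpoint exactly.

```python
import math


def ic50_to_score(ic50):
    """Map a normalized IC50 to a 0-10 score, one point per threefold decrease."""
    score = math.log(1000.0 / ic50, 3)
    # Clip to the 0-10 range used for the binding scores.
    return max(0.0, min(10.0, score))


for ic50 in (2600.0, 1000.0, 1000.0 / 3, 1000.0 / 27, 0.003):
    print(ic50, round(ic50_to_score(ic50), 2))
```

Working on a logarithmic scale keeps the very wide range of measured IC50 values within a compact target range for regression.
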

The graphs in figure 4 clearly demonstrate that the three N-terminal positions and the COOH terminus favor residues with particular features. Position 1 (P1) favors charged and hydrophilic residues, whereas aromatic, higher-volume and hydrophobic residues are not favored there. Higher-volume, charged, hydrophilic, accessible and flexible residues are favored at position 2. Position 3 mostly favors higher-volume, aromatic, hydrophobic and accessible residues. The COOH terminus prefers higher-volume, charged, aromatic, hydrophobic and accessible residues.

Related References:-

Lankat-Buttgereit B, Tampe R (1999) The transporter associated with antigen processing TAP: structure and function. FEBS Lett. 464: 108-12.
Abele R, Tampe R (1999) Function of the transport complex TAP in cellular immune recognition. Biochim Biophys Acta. 1461(2): 405-19.
Lankat-Buttgereit B, Tampe R (2002) The transporter associated with antigen processing: function and implications in human diseases. Physiol Rev. 82(1): 187-204.
van Endert PM, Saveanu L, Hewitt EW, Lehner P (2002) Powering the peptide pump: TAP crosstalk with energetic nucleotides. Trends Biochem Sci. 27(9): 454-61.
Uebel S, Tampe R (1999) Specificity of the proteasome and the TAP transporter. Curr Opin Immunol. 11(2): 203-8.
Neefjes J, Gottfried E, Roelse J, Gromme M, Obst R, Hammerling GJ, Momburg F (1995) Analysis of the fine specificity of rat, mouse and human TAP peptide transporters. Eur J Immunol. 25(4): 1133-6.
Schumacher TN, Kantesaria DV, Heemels MT, Ashton-Rickardt PG, Shepherd JC, Fruh K, Yang Y, Peterson PA, Tonegawa S, Ploegh HL (1994) Peptide length and sequence specificity of the mouse TAP1/TAP2 translocator. J Exp Med. 179(2): 533-40.
van Endert PM, Riganelli D, Greco G, Fleischhauer K, Sidney J, Sette A, Bach JF (1995) The peptide-binding motif for the human transporter associated with antigen processing. J Exp Med. 182(6): 1883-95.
Daniel S, Brusic V, Caillat-Zucman S, Petrovsky N, Harrison L, Riganelli D, Sinigaglia F, Gallazzi F, Hammer J, van Endert PM (1998) Relationship between peptide selectivities of human transporters associated with antigen processing and HLA class I molecules. J Immunol. 161(2): 617-24.
Townsend A, Elliot T, Cerundolo V, Foster L, Barber B, Tse A (1990) Assembly of MHC class-I molecules analyzed in vitro. Cell. 62: 285.
Heemels MT, Ploegh HL (1994) Substrate specificity of allelic variants of the TAP peptide transporter. Immunity. 1: 775.
Heemels MT, Schumacher TNM, Wonigeit K, Ploegh HL (1993) Peptide translocation by variants of the transporter associated with antigen processing. Science. 262: 2059.
Androlewicz MJ, Cresswell P (1994) Human transporters associated with antigen processing possess a promiscuous peptide-binding site. Immunity. 1: 7.
van Endert PM, Tampé R, Meyer TH, Tisch R, Bach JF, McDevitt HO (1994) A sequential model for peptide binding and transport by the transporters associated with antigen processing. Immunity. 1: 491.
Uebel S, Kraas W, Kienle S, Wiesmuller KH, Jung G, Tampe R (1997) Recognition principle of the TAP transporter disclosed by combinatorial peptide libraries. Proc Natl Acad Sci U S A. 94: 8976.
Brusic V, van Endert P, Zeleznikow J, Daniel S, Hammer J, Petrovsky N (1999) A neural network model approach to the study of human TAP transporter. In Silico Biol. 1(2): 109-21.
Hill A, Ploegh H (1995) Getting the inside out: the transporter associated with antigen processing (TAP) and the presentation of viral antigen. Proc Natl Acad Sci U S A. 92(2): 341-3.
Bhasin M, Singh H, Raghava GPS (2003) MHCBN: A comprehensive database of MHC binding and non-binding peptides. Bioinformatics. 19: 666-667.
Joachims T (1999) Making large-scale SVM learning practical. In: Scholkopf B, Burges C, Smola A (eds) Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, Massachusetts; London, England.