IgPred - Prediction of Antibody-specific B-cell epitopes

The prediction server for antibody class-specific B-cell epitopes has been designed to be user-friendly. This page describes the algorithms and procedures used in the different modules.

One vs. rest approach

This approach is used for multi-class classification and relies on comparing one class with all other classes taken together. Because we want machine learning classifiers that can distinguish the B-cell epitopes of each class from those of the other classes as well as from non-B-cell epitopes, three datasets corresponding to IgG-, IgE- and IgA-specific B-cell epitopes were developed using the one-vs-rest approach. While developing the model for one particular class, the B-cell epitopes specific to that class are considered positive examples and sequences from the rest of the classes are considered negative examples, as illustrated in the accompanying figure.
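The labelling step of the one-vs-rest approach can be sketched as follows. This is a minimal illustration, not the server's own code: the function name `one_vs_rest_labels`, the +1/-1 label convention and the toy sequences are assumptions made for the example.

```python
def one_vs_rest_labels(epitopes_by_class, target):
    """Return (sequence, label) pairs for one target class.

    Sequences of the target class get label +1 (positive examples);
    sequences of every other class get label -1 (negative examples).
    """
    if target not in epitopes_by_class:
        raise KeyError(f"unknown class: {target}")
    labelled = []
    for cls, sequences in epitopes_by_class.items():
        label = 1 if cls == target else -1
        labelled.extend((seq, label) for seq in sequences)
    return labelled


# Toy placeholder sequences (hypothetical, not real epitopes).
toy = {
    "IgG": ["AAAA", "CCCC"],
    "IgE": ["DDDD"],
    "IgA": ["EEEE"],
    "non-epitope": ["GGGG"],
}

# Dataset for the IgG model: IgG epitopes vs. everything else.
igg_dataset = one_vs_rest_labels(toy, "IgG")
```

Calling the function once per class (IgG, IgE, IgA) yields the three datasets, each with a different class in the positive role.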

Multiple Em for Motif Elicitation/Motif Alignment and Search Tool (MEME/MAST) studies

MEME/MAST comprises two programs: MEME discovers motifs shared by closely related sequences, and MAST searches a database for sequences containing these motifs. The occurrence of a motif in related sequences is unlikely to be random; such sequences are likely to share some biological function. In this study, MEME version 3.0.14 (meme-3.0.14) was used. We ran MEME with default parameters for each class of epitopes (IgG, IgE and IgA) to discover the motifs.

Support Vector Machine based methods

In the present study, the SVM classifier from the freely available SVM_light package was used. This package is powerful as well as user-friendly, and allows the parameters and kernel functions (linear, polynomial, RBF and sigmoid) to be adjusted.
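SVM_light reads training examples as text lines of the form `<target> <index>:<value> ...`, with feature indices starting at 1 and listed in increasing order. The sketch below shows one way a feature vector could be written in that format; the helper name `to_svmlight_line` and the choice to omit zero-valued features are assumptions for illustration, not the server's actual preprocessing.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVM_light input line.

    label    : +1 or -1 (positive or negative example)
    features : sequence of numeric feature values
    Zero-valued features are omitted (sparse representation).
    """
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:.6g}")
    return " ".join(parts)


# A positive example with three features, the second being zero.
line = to_svmlight_line(1, [0.5, 0, 0.25])
```

In SVM_light the kernel is chosen when training, through the `-t` option of `svm_learn` (0 linear, 1 polynomial, 2 RBF, 3 sigmoid); the parameter values used for this server are not stated here.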

Weka-3.6.0 based methods

Weka is a collection of machine-learning algorithms for solving real-world data mining problems (Witten and Frank 1999). We used nine algorithms of the WEKA package, including SMO, IBk and RandomForest.

Evaluation of Performance

The five-fold cross-validation technique has been used. The dataset is divided into five sets; four sets are used for training and the remaining one is used for testing, and the process is repeated five times so that each set is used once for testing. The performance of the different SVM modules has been evaluated by calculating accuracy and the Matthews correlation coefficient (MCC).
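The evaluation procedure can be sketched as below. The fold assignment (round-robin by index) is an assumption made for the example, since the page does not say how the five sets were formed; accuracy and MCC use their standard definitions in terms of true/false positives and negatives.

```python
import math


def five_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices 0..n-1 are assigned to folds round-robin (an assumption);
    each fold serves as the test set exactly once.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


splits = list(five_fold_indices(10))
```

MCC ranges from -1 to +1 and, unlike accuracy, stays informative when the positive and negative sets differ in size, which is common in one-vs-rest datasets.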

Input features for SVM

In this study we have used various features as SVM input for the prediction of antibody class-specific B-cell epitopes.

1. Amino Acid Composition: Amino acid composition is the fraction of each amino acid present in a peptide. This gives a vector of 20 values, one for each amino acid, which is used as SVM input.

2. Dipeptide Composition: Dipeptide composition is the fraction of each dipeptide (AA, AC, AD and so on) present in a peptide. It captures the composition as well as the local order of the residues in the peptide. It gives a vector of 20 x 20 (400) values.

3. Amino Acid Propensity: Amino acid propensity can be defined as the dipeptide composition multiplied by its frequency of occurrence in the Bcipep and Swiss-Prot databases. Here the vector size remains 400.

4. Composition-transition-distribution: Each peptide sequence is mapped into a string defined by three symbols. These symbols result from grouping all amino acids into three groups on the basis of a given physicochemical property. For every physicochemical property we thus have a string of the symbols 1, 2 and 3, described by three composition features, three transition features (the percent frequency of i followed by j or j followed by i) and five distribution features per symbol, representing the fractions of the entire sequence within which the first, 25, 50, 75 and 100% of the occurrences of that symbol are contained. The final vector size becomes 108.

5. Physico-chemical Properties: Physico-chemical properties of each amino acid, such as hydrophobicity, hydrophilicity, charge and pI, have been used as input features for the prediction. We obtained the physico-chemical property values of each amino acid from the AAindex web server and used them to calculate the physico-chemical properties of each peptide with Perl programs.

6. Binary Profile: In the binary profile of patterns, each amino acid is represented by a vector of dimension 20 in which the position corresponding to that amino acid is 1 and all other positions are 0. Since the length of the epitopes was 20, a pattern of window length 20 is represented by a vector of dimension 20 x 20 (400).
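Three of the features listed above (amino acid composition, dipeptide composition and binary profile) can be sketched as follows. This is an illustrative reimplementation in Python, not the Perl code used for the server; the function names, the alphabetical residue order and the handling of non-standard residues (ignored, or encoded as all zeros) are assumptions.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(peptide):
    """Fraction of each of the 20 amino acids in the peptide (20 values)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Fraction of each overlapping dipeptide in the peptide (400 values)."""
    peptide = peptide.upper()
    total = len(peptide) - 1  # number of overlapping dipeptides
    if total < 1:
        raise ValueError("peptide must have at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    return [counts[pair] / total for pair in DIPEPTIDES]


def binary_profile(peptide, length=20):
    """One-hot encode a fixed-length peptide (length x 20 values)."""
    peptide = peptide.upper()
    if len(peptide) != length:
        raise ValueError(f"peptide must have exactly {length} residues")
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # non-standard residues stay all-zero
            one_hot[index] = 1
        vector.extend(one_hot)
    return vector


aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AAC")
bp = binary_profile("A" * 20)
```

The composition features give fixed-size vectors for peptides of any length, whereas the binary profile preserves the position of each residue and therefore requires peptides of a fixed length.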