PREDICTION METHOD

The server provides two approaches for predicting transmembrane beta-barrel regions: a machine learning approach, the artificial neural network (ANN), and a statistical learning technique, the support vector machine (SVM). It also allows a combined prediction based on both approaches. The SVM and ANN methods also discriminate between globular and beta-barrel proteins, with accuracies of 92.3% and 88.89% respectively. The Qok index, an important and stringent measure for transmembrane prediction methods, is 0.475 for ANN and 0.625 for SVM.

Neural network based (SNNS) Approach :-

The SNNS approach is the same as that outlined by Cassidio et al. A feed-forward neural network trained with the backpropagation algorithm was implemented, using a single hidden layer with 5 nodes and a single output node. Our dataset, however, included 17 proteins, which helped us achieve an accuracy of 80.5% with leave-one-out cross-validation (LOCV), as against the 78% reported by Cassidio et al. For details, please refer to the article by Cassidio et al., 2001.

Statistical learning - SVM Approach :- SVM (C.J.C. Burges and V.Vapnik 1998)

Support Vector Machines and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression and novelty detection. Their remarkably robust performance on sparse and noisy data is making them the system of choice in a number of applications, from text categorization to protein function prediction and many other aspects of computational biology. They can be used to generate many possible learning machine architectures (e.g. RBF networks, feed-forward networks).
When used for classification, they separate a given set of binary-labelled training data with a hyperplane that is maximally distant from the data points (known as the maximal margin hyperplane). For cases in which no linear separation is possible, they can work in combination with the technique of kernels, which automatically realizes a non-linear mapping to a feature space. The hyperplane found by the SVM in feature space corresponds to a non-linear decision boundary in the input space.
The method eliminates many of the problems experienced with other inference methodologies such as neural networks and decision trees:-
1. There are no problems with local minima. One can easily construct highly non-linear classifiers without worrying about getting stuck in local minima.
2. There are few model parameters to pick.
3. Final results are stable, reproducible and largely independent of the specific algorithm used to optimise the SVM model, unlike neural networks, where the results depend on the particular algorithm and starting point.
Methodology :- The complete SVM method can be summarised as follows: we begin by choosing a kernel, starting from the simplest linear kernel and moving on to a sigmoid or user-defined kernel. We then tune the parameter C, which represents the trade-off between minimizing the training set error and maximising the margin.
Two different SVM models were developed and tested: one based on amino acid information alone and the other based on 36 physico-chemical properties. These 36 physico-chemical properties belong to the classes of solvent accessibility, hydrophobicity, hydrophilicity, flexibility, charge, volume, polarity, concentration of neighbouring aromatic residues, and the propensities of all 20 amino acids for alpha-helix, sheet and turns.

Combined Prediction :-

Finally, the two methods were combined to achieve better accuracy levels. In the case of SVM, the prediction values are in the range of -1.5 to +1.5, as against 0 to 1 for ANN. We normalized the SVM score to the range 0 to 1 by adding 1.5 to the SVM score and dividing by three. The final per-residue score was calculated by taking the average of the two scores (ANN score and normalized SVM score). Prediction was evaluated by calculating different scoring parameters: sensitivity, specificity, NPV, MCC and Qok.
Qok is considered to be a very important measure for predictions of transmembrane regions in proteins. It is calculated as the number of proteins for which the barrel regions are predicted in the correct number and at the correct location. For location, a slight deviation is tolerated as long as the predicted and actual regions overlap.
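The normalization and averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's code: the function names are hypothetical, and clipping SVM scores that fall outside -1.5 to +1.5 is an assumption added here, since the text does not say how such values are handled.

```python
def normalize_svm(score, low=-1.5, high=1.5):
    """Map a raw SVM score from [low, high] onto [0, 1].

    With the default range this is (score + 1.5) / 3, as in the text.
    Clipping out-of-range scores is an assumption, not from the source.
    """
    clipped = max(low, min(high, score))
    return (clipped - low) / (high - low)


def combined_scores(ann_scores, svm_scores):
    """Average the ANN score and the normalized SVM score per residue."""
    if len(ann_scores) != len(svm_scores):
        raise ValueError("ANN and SVM score lists must have the same length")
    return [(a + normalize_svm(s)) / 2.0
            for a, s in zip(ann_scores, svm_scores)]


# Hypothetical per-residue scores for a four-residue stretch.
ann = [0.9, 0.8, 0.2, 0.1]
svm = [1.2, 0.6, -0.9, -1.5]
print(combined_scores(ann, svm))
```

Because both inputs end up on the same 0 to 1 scale, a single threshold can be applied to the averaged score; the threshold itself is not given in the text.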

Actual     Predicted
10 - 20    10 - 22
25 - 35    22 - 32
           40 - 50
Thus, the above protein is not a Qok protein, as the prediction contains an extra barrel region (40 - 50).
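The Qok check illustrated by this example could be written roughly as follows. This is a sketch under assumptions: the text does not say how much overlap counts as a correct location, so the `min_overlap` parameter (and its default of 5 residues) is hypothetical, as are the function names. Reporting Qok as a fraction of proteins is also an assumption, chosen to match the values quoted earlier on this page (0.475 and 0.625).

```python
def overlap_length(a, b):
    """Number of residues shared by two inclusive (start, end) regions."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def is_qok(actual, predicted, min_overlap=5):
    """True if the regions match in number and each pair overlaps enough.

    min_overlap is a hypothetical tolerance; the source only says that
    a slight deviation in location is allowed.
    """
    if len(actual) != len(predicted):
        return False  # wrong number of barrel regions
    pairs = zip(sorted(actual), sorted(predicted))
    return all(overlap_length(a, p) >= min_overlap for a, p in pairs)


def qok_fraction(dataset, min_overlap=5):
    """Fraction of (actual, predicted) proteins that pass the Qok check."""
    if not dataset:
        raise ValueError("dataset must contain at least one protein")
    hits = sum(is_qok(a, p, min_overlap) for a, p in dataset)
    return hits / len(dataset)


# The example from the table above: one extra predicted region.
actual = [(10, 20), (25, 35)]
predicted = [(10, 22), (22, 32), (40, 50)]
print(is_qok(actual, predicted))  # False
```

Dropping the extra 40 - 50 region from the prediction would make the same protein pass, since both remaining regions overlap their actual counterparts by at least 5 residues.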
The discrimination power of the two methods was also checked using a dataset of 16 randomly selected globular proteins with less than 25% sequence identity. Final rules were devised according to which a protein with more than 8 predicted beta-barrel regions of minimum length 7 is considered a beta-barrel protein; otherwise it is considered a non-beta-barrel protein.
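A minimal sketch of this discrimination rule is given below. It assumes the prediction is available as a per-residue list of 0/1 labels (1 meaning a predicted barrel residue); that input format and the function names are assumptions made for illustration, not part of the server.

```python
def regions_from_labels(labels):
    """Return inclusive (start, end) index pairs for each run of 1s."""
    regions = []
    start = None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i                        # a new region begins
        elif not label and start is not None:
            regions.append((start, i - 1))   # the region just ended
            start = None
    if start is not None:                    # region runs to the last residue
        regions.append((start, len(labels) - 1))
    return regions


def is_beta_barrel(labels, min_length=7, min_regions=8):
    """Apply the rule: more than 8 regions, each at least 7 residues long."""
    long_regions = [r for r in regions_from_labels(labels)
                    if r[1] - r[0] + 1 >= min_length]
    return len(long_regions) > min_regions


# Hypothetical prediction: nine 7-residue strands separated by 3-residue loops.
labels = ([1] * 7 + [0] * 3) * 9
print(is_beta_barrel(labels))  # True
```

Regions shorter than 7 residues are ignored before counting, so a protein with many short predicted stretches is still classified as non-beta-barrel.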

Indraprastha Institute of Information Technology, IMTECH, New Delhi, India