PSEApred: Prediction of Plasmodium Secretory and Infected Erythrocyte-Associated Proteins


GENERAL INFORMATION

Sequence Submission
Input Sequence:- The server offers two ways to submit a query sequence: paste the sequence into the input box, or upload a sequence file.
Sequence Format:-
The server accepts both formatted and unformatted (raw) protein sequences, and uses the ReadSeq routine to parse the input. Before running a prediction, the user must specify whether the pasted or uploaded sequence is plain or formatted; if the wrong format is chosen, the prediction results will be wrong.
Please submit only one sequence at a time.
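The server relies on ReadSeq, which handles many sequence formats. As a much simpler illustration of why the plain/formatted choice matters, the following Python sketch accepts either a single FASTA record or a raw sequence. The function `parse_single_sequence` is hypothetical and is not part of PSEApred or ReadSeq.

```python
def parse_single_sequence(text: str, formatted: bool) -> str:
    """Return the residue string of one submitted sequence (hypothetical helper).

    formatted=True  -> text is treated as a single FASTA record ('>' header line)
    formatted=False -> text is treated as a raw sequence
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if formatted:
        if not lines or not lines[0].startswith(">"):
            raise ValueError("expected a FASTA header line starting with '>'")
        # Only one record is allowed, mirroring "one sequence at a time".
        if sum(1 for line in lines if line.startswith(">")) > 1:
            raise ValueError("please submit one sequence at a time")
        body = lines[1:]
    else:
        # Raw input: every line is assumed to be sequence.
        body = lines
    # Join lines, keep only letters (drops spaces, digits, '>'), upper-case.
    return "".join(ch for ch in "".join(body) if ch.isalpha()).upper()
```

Choosing the wrong option silently changes the result: `parse_single_sequence(">abc\nMK", formatted=False)` treats the header as sequence and returns `"ABCMK"`, whereas `formatted=True` returns `"MK"`.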

Dataset Information:-
The dataset used in this study consists of 252 secretory proteins and 252 non-secretory proteins. This dataset was used to train and test the method.
Support Vector Machine:-
Support vector machine (SVM) is a machine learning method based on the statistical learning theory developed by V.N. Vapnik. It has been applied successfully to numerous classification and pattern recognition problems, such as text categorization, image recognition and bioinformatics. Training an SVM yields a globally optimal solution, whereas neural networks trained with gradient-based algorithms can become trapped in local minima. SVM_light, a freely available package written by Thorsten Joachims, can be downloaded from http://ais.gmd.de/~thorsten/svm_light/. SVM_light was used to predict secretory proteins, and the SVM modules were developed using amino acid composition.
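SVM_light reads training examples from a plain-text file in which each line holds a target value (+1 or -1 for classification) followed by `index:value` pairs with increasing, 1-based feature indices; features whose value is zero may be left out. A minimal Python sketch of writing one such line from a feature vector follows. The helper name `to_svmlight_line` is hypothetical, and PSEApred's actual data files are not described here.

```python
from typing import Sequence


def to_svmlight_line(label: int, features: Sequence[float]) -> str:
    """Format one example as an SVM_light input line (hypothetical helper).

    label: +1 for a secretory protein, -1 for a non-secretory protein.
    features: dense feature vector; indices are written 1-based and
    zero-valued features are omitted, since SVM_light input is sparse.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    parts = ["+1" if label == 1 else "-1"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:.6g}")
    return " ".join(parts)
```

A file of such lines can be passed to SVM_light's `svm_learn` program; for example, `svm_learn -t 2 train.dat model.dat` selects the RBF kernel. The file names are placeholders, and the kernel parameters PSEApred used are not given here.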
Amino acid composition:-
The amino acid composition represents a protein as a 20-dimensional vector, in which each element is the fraction of one of the 20 amino acids in the protein. The amino acid composition of surface-exposed and non-surface-exposed proteins was observed to differ somewhat. An SVM-based classifier was therefore developed using the amino acid composition as a 20-dimensional input vector. Different kernels and SVM parameters were tested; the best performance, an accuracy of 83.2% with an MCC of 0.7, was obtained with the RBF kernel. At a threshold of 0.0, the method correctly predicted 80.2% of secretory proteins (sensitivity) with a specificity of 86.3%.
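The composition vector described above can be computed as shown in this minimal Python sketch. It assumes that non-standard residue letters (such as X, B or Z) are simply ignored, a choice the description above does not specify.

```python
from typing import List

# The 20 standard amino acids, in alphabetical order of one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the 20-dimensional amino acid composition of a protein.

    Each entry is (count of that residue) / (number of standard residues),
    so the entries sum to 1. Non-standard letters are skipped (assumption).
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

For the toy sequence `MKKA`, the entries for A and M are 0.25, the entry for K is 0.5, and all other entries are 0. A vector of this kind is the 20-dimensional input the SVM receives.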
Evaluation of Performance:-
Prediction quality was assessed using leave-one-out cross-validation (LOOCV), a technique in which the classifier is trained on n-1 samples and tested on the remaining one; this is repeated n times so that each sample is used exactly once for testing. Prediction results are commonly summarized by the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). From these, the overall prediction accuracy, Matthews correlation coefficient (MCC), sensitivity and specificity were calculated using the following equations.
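The LOOCV procedure can be sketched generically in Python. Here `train` and `predict` are placeholders for the actual SVM_light training and classification steps; the toy threshold classifier (`train_midpoint`, `predict_midpoint`) and its one-dimensional data are purely hypothetical and serve only to make the example runnable.

```python
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")  # type of one sample (e.g. a composition vector)
M = TypeVar("M")  # type of a trained model


def leave_one_out_predictions(
    samples: Sequence[X],
    labels: Sequence[int],
    train: Callable[[List[X], List[int]], M],
    predict: Callable[[M, X], int],
) -> List[int]:
    """For each sample, train on the other n-1 samples and predict the held-out one."""
    n = len(samples)
    predictions: List[int] = []
    for i in range(n):
        train_x = [samples[j] for j in range(n) if j != i]
        train_y = [labels[j] for j in range(n) if j != i]
        model = train(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


# Hypothetical toy classifier standing in for the SVM: the threshold is the
# midpoint between the mean score of the +1 class and that of the -1 class.
def train_midpoint(xs: List[float], ys: List[int]) -> float:
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold: float, x: float) -> int:
    return 1 if x > threshold else -1
```

Comparing the returned predictions with the true labels gives the TP, TN, FP and FN counts used in the equations.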

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}$$
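The four equations translate directly into code. In this Python sketch the counts in the usage example are hypothetical, and returning an MCC of 0 when the denominator is zero is a common convention assumed here, not something stated above.

```python
import math
from typing import Dict


def evaluate(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of positives predicted as positive
    specificity = tn / (tn + fp)  # fraction of negatives predicted as negative
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    # Assumed convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator > 0 else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, `evaluate(tp=8, tn=9, fp=1, fn=2)` gives a sensitivity of 0.8, a specificity of 0.9, an accuracy of 0.85 and an MCC of about 0.70.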