Hslpred: An SVM-based method for the subcellular localization of human proteins

HELP

Follow these steps when using Hslpred:

Name of the protein: This is an optional field.

Input sequence: There are two ways to submit a protein sequence. Users can paste the sequence directly into the input field provided or upload a file using the "BROWSE" option. Sequences must be entered in the one-letter amino acid code. Non-standard characters in the sequence are ignored.
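The filtering of non-standard characters described above can be sketched in Python. This is only an illustration, not the server's own code: the function name clean_sequence is hypothetical, and the sketch assumes that "standard" means the 20 common one-letter amino acid codes and that lowercase input is converted to uppercase.

```python
# Illustrative sketch only; not the server's actual implementation.
# Assumption: the standard characters are the 20 common amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Return the sequence with every non-standard character dropped.

    Whitespace, digits, gap symbols and ambiguous codes such as X or B
    are removed. Uppercasing the input first is an assumption.
    """
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    print(clean_sequence("mkt xb*LL\n"))
```

Cleaning the sequence in this way before submission makes it easy to see exactly which residues a prediction would be based on.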

Sequence format: The server accepts both formatted and unformatted protein sequences and uses the ReadSeq routine to parse the input. Check the format of the input sequence before submitting it for prediction. The prediction results will be wrong if the wrong format is chosen.

Prediction Approaches: The server provides four approaches for predicting the subcellular localization of proteins. Users can choose any one of them. A brief account of each approach is given below:

Amino acid composition: The SVM module based on the fraction of each amino acid in a protein predicts cytoplasmic, mitochondrial, nuclear and plasma membrane proteins with 63.5%, 46%, 76.2% and 90.3% accuracy, respectively. The amino acid composition yields a 20-dimensional input vector, which was used to train four SVM models, one for each of the four subcellular locations. The composition-based SVM module achieved an overall accuracy of 76.6%.
Dipeptide composition: This SVM module captures amino acid composition along with the local order of amino acids. It uses a fixed-length input vector of 400 dimensions. The module achieved an overall accuracy of 77.8%, about 1 percentage point better than the amino acid composition based SVM module.
PSI-BLAST: Because the homology of a protein to related sequences provides a broad range of evolutionary information, we also developed a PSI-BLAST module to predict the subcellular localization of human proteins. The SVM module based on this approach predicted the subcellular localization of a protein with an overall accuracy of 73.3%.
Hybrid approach: To further enhance prediction accuracy, we devised a method that captures more comprehensive information about a protein. An SVM-based module, called the hybrid module, was constructed from amino acid composition, dipeptide composition and PSI-BLAST results. This module uses an input vector of 425 dimensions. The hybrid module achieved an accuracy of 84.9%, which is considerably better than the other modules developed in this study. These results suggest that predicting the subcellular localization of proteins requires a wide range of information about a protein.
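
The 20-dimensional and 400-dimensional input vectors mentioned above can be illustrated with a short Python sketch. It is not the server's code: the function names, the alphabetical ordering of residues, and the choice to normalise dipeptide counts by the number of overlapping residue pairs are all assumptions made for illustration. The input is assumed to contain only the 20 standard one-letter codes.

```python
# Illustrative sketch only; names, feature ordering and normalisation
# are assumptions, not details taken from the server.
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each of the 20 amino acids (20-dimensional vector)."""
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair (400-dimensional vector)."""
    # Overlapping pairs: a sequence of length n yields n - 1 dipeptides.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence for demonstration
    print(len(aa_composition(example)), len(dipeptide_composition(example)))
```

Vectors of this kind have the same length for every protein regardless of sequence length, which is what allows them to be used as SVM input.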

Output: The output shows the input data as submitted by the user along with the prediction results. It gives the name (if provided), the input sequence, the length of the sequence and the prediction approach selected by the user. In addition, the scores generated for each of the four locations are given. For the hybrid approach, details such as the RI value and expected accuracy are also displayed.