General information & Help
GENERAL INFORMATION
The recently sequenced rice genome poses a new challenge for the plant research community: identifying the function and regulation of each encoded protein. Understanding the biological functions of novel genes is a more ambitious goal than obtaining their sequences alone. To narrow the gap between the enormous amount of raw rice genome sequence data and the experimental characterization of the corresponding proteins, researchers need computational ways to analyze these data efficiently. The subcellular location of a protein is one of its key functional characteristics, because a protein must be localized correctly at the subcellular level to function normally. We have therefore developed RSLpred, an SVM-based method for predicting the localization of rice proteins to four major subcellular locations (chloroplast, cytoplasm, mitochondria and nucleus).
STEPWISE HELP
Name of Protein: This field is optional. The name may contain letters and numbers, along with "-" or "_"; all other characters are not permitted. If no name is entered, the default name "Protein" is assigned. The name is used only for your own reference.
Paste a Protein Sequence: The server accepts sequences in any of the standard formats. A plain sequence can be pasted into the box provided. Amino acid sequences must be entered in the one-letter code. All non-standard characters in the sequence are ignored.
Upload Sequence File: A local sequence file can also be uploaded using the "BROWSE" option.
Select Sequence Format: RSLpred accepts both formatted and unformatted protein sequences and uses the ReadSeq routine to parse the input. Check the format of the input sequence before submitting it for prediction; the prediction results will be incorrect if the wrong format is chosen.
Choose a Prediction Approach: By default, RSLpred uses the PSSM-based module. In the present investigation, however, we developed 13 different SVM modules for predicting the subcellular localization of rice proteins (see text for details). The best-performing modules have been implemented as a dynamic web server on which users can choose any of the available prediction approaches. A brief account of these approaches is given below:
1. Amino acid composition: This SVM module, based on the fraction of each amino acid in a protein, predicts chloroplast, cytoplasmic, mitochondrial and nuclear proteins with 42.86%, 58.25%, 78.54% and 94.75% accuracy, respectively. The amino acid composition gives a 20-dimensional input vector, which was used to train four SVM models, one for each of the four subcellular locations. The composition-based SVM module achieves an overall accuracy of 81.43%.
2. Dipeptide Composition: Dipeptide composition captures the amino acid composition together with the local order of the amino acids, giving a fixed-length SVM input vector of 400 (20 x 20) dimensions. This SVM module achieves an overall accuracy of 80.88%, nearly on par with the amino acid composition-based SVM module.
3. Hybrid approach-I: To further improve the prediction accuracy, we devised methods that capture more comprehensive information about a protein. As a first step, we developed a hybrid module that combines amino acid composition and dipeptide composition. Here the SVM input vector has 420 dimensions (20 for amino acid composition and 400 for dipeptide composition). This approach predicts the subcellular localization of a protein with an overall accuracy of 82.53%, about 1 percentage point higher than the amino acid composition-based SVM module.
4. Splitted Amino Acid Composition (SAAC): This method divides each protein sequence into three parts: the N-terminal part (25 residues), the centre portion, and the C-terminal part (25 residues). The amino acid composition is calculated for each part separately, giving an SVM input vector of 60 (20 x 3) dimensions. This SVM module predicts the subcellular localization with an overall accuracy of 79.78%. Among the terminal-based SVM modules, this one has the highest accuracy.
5. PSSM based SVM: This module uses the evolutionary information stored in a position-specific scoring matrix (PSSM). The information is expressed as a position-specific scoring table (profile) created from a group of sequences previously aligned by PSI-BLAST, a method for detecting distantly related proteins by sequence comparison. The PSSM-based SVM module is the best of all the methods we attempted, with an overall accuracy of 87.10%.
Submit Sequence: Click here to submit the protein sequence you have entered or pasted.
OUTPUT: The output shows the input data as submitted by the user along with the prediction results. It displays the name (if provided), the input sequence, the length of the sequence and the prediction approach selected by the user. In addition, the scores generated for all four types of locations are displayed. For the PSSM approach, details such as the R.I. value and expected accuracy are also displayed. Because the PSSM-based approach runs PSI-BLAST searches and generates a profile in the form of a PSSM table, it may take some time to return a result. For the other approaches, the prediction results are displayed within a few seconds.
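The input vectors described for approaches 1 to 4 above can be illustrated with a minimal Python sketch. This is not the RSLpred implementation. The function names, the simple filtering used to drop non-standard characters, and the treatment of sequences shorter than 50 residues are all assumptions made for illustration. The input is assumed to be a plain sequence without a header line. SVM training and the PSSM-based module are not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def clean_sequence(raw):
    """Uppercase the input and drop every character that is not a
    standard one-letter amino acid code (assumed cleaning rule)."""
    return "".join(ch for ch in raw.upper() if ch in AMINO_ACIDS)


def aa_composition(seq):
    """Fraction of each amino acid: a 20-dimensional vector."""
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair: a 400-dimensional vector."""
    total = len(seq) - 1
    if total < 1:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def hybrid_composition(seq):
    """Amino acid plus dipeptide composition: 20 + 400 = 420 values."""
    return aa_composition(seq) + dipeptide_composition(seq)


def split_composition(seq, n_term=25, c_term=25):
    """Composition of the N-terminal, centre and C-terminal parts:
    20 x 3 = 60 values. For sequences of 50 residues or fewer the
    centre part is empty and contributes zeros (an assumption)."""
    n_part = seq[:n_term]
    centre = seq[n_term:len(seq) - c_term]
    c_part = seq[max(len(seq) - c_term, 0):]
    return (aa_composition(n_part)
            + aa_composition(centre)
            + aa_composition(c_part))


if __name__ == "__main__":
    seq = clean_sequence("mkt-ayiakq rqisfvkshf*")
    print(len(aa_composition(seq)), len(dipeptide_composition(seq)),
          len(hybrid_composition(seq)), len(split_composition(seq)))
```

Each function returns a plain list of floats whose length matches the vector size quoted above (20, 400, 420 and 60), so the lists could be passed to any SVM library as input features.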