|
INTRODUCTION
Rice (Oryza sativa L.) is the single most important agricultural resource, feeding more than half of the world's population. The recently sequenced rice genome (Nature 436: 793-800, August 2005) has opened new challenges for the plant research community in identifying the function and regulation of each encoded protein. Understanding the biological functions of novel genes is a more ambitious goal than obtaining their sequences alone; however, the wealth of nucleotide sequence information generated through the International Rice Genome Sequencing Project (IRGSP) far outweighs what is currently available on the amino acid sequences of known proteins. To narrow this gap between the enormous amount of raw rice genome sequence data and the experimental characterization of the corresponding proteins, researchers have to find computational ways to analyze these data efficiently. The subcellular location of a protein is one of its key functional characteristics, because a protein must be localized correctly at the subcellular level to function normally. Compared with experimental methods, computational prediction methods provide fast, automatic and accurate assignment of subcellular location to a protein. A fully automatic and reliable prediction system for the subcellular localization of rice proteins would therefore be very useful.
RSLpred is an attempt in this direction: an SVM-based prediction method for four major subcellular locations (chloroplast, cytoplasm, mitochondria and nucleus). The SVM modules are based on various features of the protein: amino acid composition, dipeptide composition, pseudo amino acid (PseAA) composition and evolutionary information from PSI-BLAST. The overall prediction accuracy of these SVM modules is 81.43%, 80.88%, 82.97% and 68.35%, respectively.
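As an illustration of the two simplest feature types named above, the following minimal sketch computes amino acid composition (a 20-value vector) and dipeptide composition (a 400-value vector) for a protein sequence. It is not RSLpred's own code: the function names are hypothetical, and the choice to express each value as a percentage and to ignore non-standard residues is an assumption made for this example.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 percentages, one per standard residue (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 percentages, one per ordered residue pair (hypothetical helper)."""
    seq = seq.upper()
    # Overlapping pairs; pairs containing a non-standard residue are skipped
    # (an assumption of this sketch).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(valid)
    return [100.0 * counts[a + b] / len(valid)
            for a, b in product(AMINO_ACIDS, repeat=2)]
```

Vectors of this kind have a fixed length regardless of protein length, which is what lets sequences of different sizes be passed to the same classifier.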
To enhance the accuracy, an SVM method based on the evolutionary information stored in the Position Specific Scoring Matrix (PSSM) was also developed; at 87.10%, it gave the highest accuracy of all the methods developed. To capture more comprehensive information, hybrid SVM modules combining all of the above-mentioned protein features were also attempted. The best hybrid module achieved an overall accuracy of 84.84%. In addition, SVM modules based on N-terminal and C-terminal amino acid composition were developed. The best accuracy among these, about 80%, was achieved with the N-Centre-C (3-part) SVM module. Finally, the best modules were selected and made available on this server for real-time subcellular localization prediction for the rice proteome.
CITATION
If you are using this tool, please cite: Kaundal, R. and Raghava, G.P.S. (2009) RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information. PROTEOMICS, 9(9): 2324-2342. |
|