CSIR
Department of Computational Biology
IMTECH, New Delhi, INDIA
 
 
Multiple Alignment

Prediction from a multiple alignment of protein sequences, rather than from a single sequence, has long been recognized as a way to improve prediction accuracy (Cuff and Barton, 1999). During evolution, residues that are important to the fold or function of a protein tend to be conserved, or to be replaced only by residues with similar physico-chemical properties. The availability of large families of homologous sequences revolutionised secondary structure prediction: traditional methods, when applied to a family of proteins rather than to a single sequence, proved much more accurate at identifying core secondary structure elements.
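To make "conserved" concrete, one common way to quantify it is the Shannon entropy of each alignment column: a column in which every homologue has the same residue has entropy 0, and more variable columns score higher. The sketch below is only an illustration and is not part of the prediction method described on this page; the function name `column_entropy` and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Return the Shannon entropy (in bits) of each column of an alignment.

    All sequences must have the same aligned length. Gap characters ('-')
    are ignored when counting residues. Low entropy = conserved column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        total = len(residues)
        counts = Counter(residues)
        # H = sum over residues of p * log2(1/p); an all-gap column gets 0.
        h = sum((n / total) * math.log2(total / n) for n in counts.values()) if total else 0.0
        entropies.append(h)
    return entropies


# Hypothetical alignment of four homologues of a short segment.
family = ["NPDG", "NPDG", "NPEG", "SPDG"]
print(column_entropy(family))
```

Columns 2 and 4 (P and G) are fully conserved and get entropy 0; columns 1 and 3 each have one substitution and get a small positive entropy. A single sequence carries none of this information.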

The same approach is used here to predict beta turns, combining a neural network with multiple alignment information: the network is trained on the position specific scoring matrices generated by PSI-BLAST (run as part of PSIPRED).
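As a rough illustration of how profile information can feed a neural network, the sketch below runs a forward pass of a small feed-forward network over a window of profile rows and returns a turn/non-turn score for the central residue. It assumes 21 inputs per residue (matching the 21 x M matrix this page describes), a window of 9 residues and 10 hidden units; the window and hidden-layer sizes, the weights and all names are hypothetical, and the weights are random placeholders rather than trained values. Training (for example by backpropagation) is omitted.

```python
import numpy as np

WINDOW = 9     # hypothetical window length in residues
FEATURES = 21  # inputs per residue, as in the 21 x M profile matrix
HIDDEN = 10    # hypothetical hidden-layer size

rng = np.random.default_rng(0)
# Untrained, randomly initialised weights: placeholders for learned values.
W1 = rng.normal(scale=0.1, size=(WINDOW * FEATURES, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=HIDDEN)
b2 = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_turn(window_profile):
    """Return a score in (0, 1) for the central residue of one window.

    window_profile: array of shape (WINDOW, FEATURES), the profile rows of
    the residues in the window (hypothetical input layout).
    """
    x = np.asarray(window_profile, dtype=float)
    if x.shape != (WINDOW, FEATURES):
        raise ValueError(f"expected shape {(WINDOW, FEATURES)}, got {x.shape}")
    hidden = sigmoid(x.reshape(-1) @ W1 + b1)  # 189 inputs -> 10 hidden units
    return float(sigmoid(hidden @ W2 + b2))    # single output unit


example_window = rng.normal(size=(WINDOW, FEATURES))  # stand-in for real profile rows
print(predict_turn(example_window))
```

With trained weights, a score above a chosen cutoff (for example 0.5) would be read as "turn"; with these random weights the output is meaningless and only shows the data flow.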

PSI-BLAST

In PSI-BLAST (Position-Specific Iterated BLAST) (Altschul et al., 1997), the sequences retrieved by a BLAST search are aligned and a statistical profile is derived from the multiple alignment. The profile is then used as the query for the next search, and this loop is repeated for a number of iterations controlled by the user.
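The search-profile-search loop can be sketched in a heavily simplified form. Real PSI-BLAST uses gapped local alignments, E-value thresholds, sequence weighting and composition-based pseudocounts; the toy below instead assumes equal-length, ungapped sequences, a uniform background frequency of 1/20, a flat pseudocount and a fixed score threshold. All function names and sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds profile (one dict per column) from equal-length aligned sequences.

    Simplification: uniform background (1/20) and a flat pseudocount.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            row[aa] = math.log2(freq / background)  # log-odds score in bits
        pssm.append(row)
    return pssm


def score(pssm, seq):
    """Ungapped score of a sequence against the profile."""
    return sum(row[aa] for row, aa in zip(pssm, seq))


def iterative_search(query, database, threshold, iterations=3):
    """Toy PSI-BLAST-like loop: search, rebuild profile from hits, search again."""
    hits = [query]
    for _ in range(iterations):
        pssm = build_pssm(hits)  # profile derived from the current "alignment"
        new_hits = [query] + [s for s in database
                              if s != query and score(pssm, s) >= threshold]
        if new_hits == hits:  # converged: no new sequences recruited
            break
        hits = new_hits
    return hits


database = ["NPEG", "SPDG", "SPEG", "WWWW"]
print(iterative_search("NPDG", database, threshold=2.0))
```

In the first round only close relatives of the query pass the threshold; "SPEG" is recruited only in the second round, once the profile has learned from the first hits that S and E are acceptable at those positions. This is the mechanism by which iterated profiles detect more distant homologues.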

PSIPRED is a secondary structure prediction method. It uses PSI-BLAST to detect distant homologues of a query sequence and generates a position specific scoring matrix as part of the prediction process (Jones, 1999). Here, these intermediate PSI-BLAST matrices are used directly as input for training the neural network. The matrix has 21 x M elements, where M is the length of the target sequence, and each element represents the likelihood of a particular residue substitution at that position in the template. It is a sensitive scoring system because it captures the probability of each amino acid occurring at each position of the sequence.
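Because the network predicts one residue at a time, the 21 x M matrix has to be cut into fixed-size windows centred on each residue. The sketch below stores the matrix as M rows of 21 values and turns it into M flattened input vectors. The window length of 9, the zero-padding at the sequence ends and the name `window_inputs` are hypothetical choices for illustration, not details taken from this method.

```python
import numpy as np


def window_inputs(pssm, window=9):
    """Turn an (M, features) profile matrix into M flattened input vectors.

    Row i of the result holds the profile rows of residues
    i - window//2 .. i + window//2; positions beyond either end of the
    sequence are filled with zeros (hypothetical padding choice).
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2:
        raise ValueError("expected a 2-D (M, features) matrix")
    if window % 2 == 0:
        raise ValueError("window length must be odd so the target residue is central")
    m, features = pssm.shape
    half = window // 2
    padded = np.vstack([np.zeros((half, features)), pssm, np.zeros((half, features))])
    # padded[i + half] is residue i, so padded[i : i + window] is centred on it.
    return np.stack([padded[i:i + window].reshape(-1) for i in range(m)])


profile = np.arange(5 * 21, dtype=float).reshape(5, 21)  # stand-in for a 5-residue profile
inputs = window_inputs(profile, window=9)
print(inputs.shape)  # (5, 189)
```

Each of the 5 residues yields 9 x 21 = 189 inputs, so one sequence of length M produces M training (or prediction) examples for the network.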