Tight turns

Definition

The architecture of protein three-dimensional structure is characterized by repetitive elements such as alpha-helices and beta-sheets, and non-repetitive elements such as tight turns, bulges and random coil structures. Tight turns are the third "classical" secondary structure; they serve to reverse the direction of the polypeptide chain and are largely responsible for the globular shape of proteins. They are located primarily on the protein surface and accordingly contain mainly polar and charged residues. Antibody recognition, phosphorylation, glycosylation, hydroxylation and intron/exon splicing sites are found frequently at or adjacent to turns.

Types of Tight Turns

There are different types of tight turns (Chou, 2000), depending upon the number of residues forming the turn. These are as follows:

  • Delta-turn - The smallest tight turn, involving only two amino acid residues; the intraturn hydrogen bond is formed between the backbone NH(i) and the backbone CO(i+1).
  • Gamma-turn - Involves three amino acid residues; the intraturn hydrogen bond is formed between the backbone CO(i) and the backbone NH(i+2).
  • Beta-turn - Involves four amino acid residues and may or may not be stabilized by an intraturn hydrogen bond between the backbone CO(i) and the backbone NH(i+3).
  • Alpha-turn - Involves five amino acid residues, with the distance between Calpha(i) and Calpha(i+4) less than 7 Å and the pentapeptide chain not in a helical conformation (a check of this distance criterion is sketched after this list).
  • Pi-turn - The largest tight turn, involving six amino acid residues.
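
As a minimal illustration of the geometric part of these definitions, the sketch below checks the alpha-turn distance criterion Calpha(i)-Calpha(i+4) < 7 Å. It assumes Calpha coordinates are already available as (x, y, z) tuples (for example, parsed from a PDB file); the coordinates and function names are hypothetical.

    import math

    def satisfies_alpha_turn_distance(ca_coords, i, cutoff=7.0):
        # True if the Calpha(i)..Calpha(i+4) distance (in Angstroms) is
        # below the alpha-turn cutoff; the "not helical" condition of the
        # full definition is not checked here.
        return math.dist(ca_coords[i], ca_coords[i + 4]) < cutoff

    # Hypothetical coordinates for five consecutive Calpha atoms:
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.2, 0.0),
          (3.0, 5.9, 1.0), (0.5, 4.8, 2.5)]
    print(satisfies_alpha_turn_distance(ca, 0))  # True (distance ~ 5.4 A)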

Gamma Turns

Of the tight turns, beta-turns and gamma-turns have received the most attention and have been studied in detail. The second most characterized turn class, after the beta-turn, is the gamma-turn, which resembles a beta-turn but has only three turn residues.

Definition

A gamma-turn is defined by the existence of a hydrogen bond between the CO group of the ith residue and the NH group of the (i+2)th residue.
Earlier studies have shown that the gamma-turn structure is fairly common in proteins. The first example of a gamma-turn in a protein was described by Matthews (1972) at the end of a beta-hairpin in thermolysin. Most classic gamma-turns occur at the ends of beta-hairpins, but very few inverse gamma-turns do. Gamma-turns also occur at ligand-binding sites or active sites, for example the loop where the catalytically important aspartate is located in serine proteases. It has, however, been postulated that inverse gamma-turns may function as intermediates in folding, stabilizing beta-strands before they assemble into beta-sheets. More recently, gamma-turns have attracted attention through studies describing the incorporation of peptide secondary-structure mimetics into small bioactive peptides for the development of stable, effective and selective receptor ligands.

Types of Gamma Turns

Gamma-turns are divided into two classes, called inverse and classic, whose main-chain atoms are related by mirror symmetry (just as type I and I', or type II and II', beta-turns are). Of the two, classic gamma-turns are far less common, and those that do exist are frequently found at the loop ends of beta-hairpins. Inverse gamma-turns, on the other hand, tend not to give rise to polypeptide chain reversal. The following figure shows the two types of gamma-turns:

[Figure: the classic and inverse gamma-turn conformations]

The dihedral angle intervals for classic and inverse gamma-turns are centred on the following ideal (phi, psi) values of the middle residue (i+1):

  Turn type    phi(i+1)    psi(i+1)
  Classic       75.0        -64.0
  Inverse      -79.0         69.0
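
Given backbone dihedrals for the middle residue of a three-residue candidate, a turn can be assigned to one class or the other by proximity to these ideal values. The sketch below is a minimal illustration; the ±40° tolerance and the function name are assumptions for this example, not the exact criteria used by GammaPred.

    # Ideal (phi, psi) of the middle residue (i+1) for the two classes.
    IDEAL = {"classic": (75.0, -64.0), "inverse": (-79.0, 69.0)}

    def classify_gamma_turn(phi, psi, tol=40.0):
        # Return "classic", "inverse", or None for the middle residue of a
        # candidate gamma-turn; tol (degrees) is an illustrative tolerance.
        for name, (phi0, psi0) in IDEAL.items():
            if abs(phi - phi0) <= tol and abs(psi - psi0) <= tol:
                return name
        return None

    print(classify_gamma_turn(-75.0, 65.0))  # inverse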

Method Used

Neural Networks

Also referred to as connectionist architectures, parallel distributed processing, and neuromorphic systems, an artificial neural network (ANN) is an information-processing paradigm inspired by the way the densely interconnected, parallel structure of the mammalian brain processes information. Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of the ANN paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements that are analogous to neurons and are tied together with weighted connections that are analogous to synapses.

The GammaPred server uses two feed-forward back-propagation networks, each with a single hidden layer. Both networks use a window five residues wide and have 25 units in the hidden layer. The target output is a single binary number, 1 or 0 (turn or non-turn).
For the neural network implementation, that is, to generate the network architecture and carry out the learning process, the publicly available free simulation package SNNSv4.2 from Stuttgart University is used. It allows the resulting networks to be incorporated into an ANSI C function for use in stand-alone code. A linear activation function is used.
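
As a rough sketch of the shape of each network (not the actual SNNS-trained model), the forward pass below uses a five-residue window with 21 features per residue, 25 hidden units and a single output. The random weights and the tanh hidden activation are illustrative assumptions; the linear output unit follows the text.

    import numpy as np

    WINDOW = 5     # residues per input window
    FEATURES = 21  # PSSM values per residue
    HIDDEN = 25    # hidden units, as described above

    rng = np.random.default_rng(0)
    # Randomly initialised weights stand in for the trained SNNS weights.
    W1 = rng.normal(scale=0.1, size=(HIDDEN, WINDOW * FEATURES))
    b1 = np.zeros(HIDDEN)
    W2 = rng.normal(scale=0.1, size=(1, HIDDEN))
    b2 = np.zeros(1)

    def forward(x):
        # One residue window in, one score out; the score is compared with
        # the 1/0 (turn / non-turn) target during training.
        h = np.tanh(W1 @ x + b1)  # hidden activation (assumed tanh)
        return (W2 @ h + b2)[0]   # linear output unit, as in the text

    x = rng.normal(size=WINDOW * FEATURES)  # stand-in PSSM window
    print("turn" if forward(x) >= 0.5 else "non-turn")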


Backpropagation

In 1986, Rumelhart et al. redeveloped and popularized a supervised learning algorithm called backpropagation, or the generalized delta rule. It finally provided the multi-layer perceptron with an efficient learning rule. Backpropagation is based on the minimization of a suitable error or cost function, in this case the average error between the actual output signal and the corresponding desired output signal.
The global error E at the output layer is the sum of squared differences between the desired outputs dj and the actually calculated outputs oj of each output neuron j, and can be expressed as

    E = (1/2) * sum_j (dj - oj)^2

(the conventional factor of 1/2 simplifies the derivative). This error is a function of the connection weights, which are the parameters that have to be optimized so that the error E becomes a minimum. On a multi-dimensional error surface, the goal is to approach, or ideally reach, the global minimum, which is achieved by the gradient-descent method.
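
As a concrete instance of this update rule, here is a minimal sketch of gradient descent on the squared error for a single linear output neuron; the learning rate and the data are made up for illustration.

    import numpy as np

    def delta_rule_step(w, x, d, lr=0.1):
        # One gradient-descent step for a linear neuron o = w . x that
        # minimises E = 0.5 * (d - o)**2; dE/dw = -(d - o) * x.
        o = w @ x
        return w + lr * (d - o) * x

    w = np.zeros(3)
    x = np.array([1.0, 0.5, -0.2])
    for _ in range(100):
        w = delta_rule_step(w, x, d=1.0)
    print(w @ x)  # converges towards the desired output 1.0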

Network architecture

In the GammaPred server, the following two networks have been used:
Neural Network SNNS: First level - sequence-to-structure network
The input to the first network is the position-specific scoring matrix obtained from PSI-BLAST. PSIPRED uses PSI-BLAST to detect distant homologues of a query sequence and generates a position-specific matrix as part of the prediction process; here these intermediate PSI-BLAST-generated position-specific scoring matrices are used as a direct input to the first network. The matrix has 21 × M real elements, where M is the length of the target sequence.
Neural Network SNNS: Second level - structure-to-structure network
An important feature of the predictions generated by the first network is that they are uncorrelated: the network makes a prediction for each residue in isolation, without reference to neighbouring predictions. The correlation can be taken into account by a second structure-to-structure network. The input to this second, filtering network is the prediction obtained from the first network together with the secondary structure predicted by PSIPRED, as sketched below.
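
A minimal sketch of how the two levels chain together, assuming a five-residue window at both levels, zero-padding at the chain termini and a one-hot helix/strand/coil encoding of the PSIPRED prediction; these details, and the stand-in "networks", are assumptions for illustration.

    import numpy as np

    W = 5  # window width used by both levels

    def windows(per_residue, width=W):
        # Slide a window over per-residue feature rows, zero-padding the
        # termini so that every residue receives a prediction.
        m, f = per_residue.shape
        pad = np.zeros((width // 2, f))
        padded = np.vstack([pad, per_residue, pad])
        return np.array([padded[i:i + width].ravel() for i in range(m)])

    def two_level(pssm, psipred_ss, net1, net2):
        # Level 1: PSSM window -> raw turn score for each residue.
        raw = np.array([net1(x) for x in windows(pssm)])
        # Level 2: window over (level-1 score + PSIPRED state) -> filtered score.
        second_in = np.hstack([raw[:, None], psipred_ss])
        return np.array([net2(x) for x in windows(second_in)])

    # Stand-in usage: random PSSM, random PSIPRED states, dummy "networks".
    M = 30
    rng = np.random.default_rng(1)
    pssm = rng.random((M, 21))
    ss = np.eye(3)[rng.integers(0, 3, M)]  # one-hot H/E/C
    net = lambda x: float(x.mean())
    print(two_level(pssm, ss, net, net).shape)  # (30,)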

The following figure shows the network architecture used in GammaPred:

[Figure: two-level neural network architecture of GammaPred]

Multiple Alignment

Prediction from a multiple alignment of protein sequences rather than a single sequence has long been recognized as a way to improve prediction accuracy (Cuff and Barton, 1999). During evolution, residues with similar physico-chemical properties are conserved if they are important to the fold or function of the protein. The availability of large families of homologous sequences revolutionised secondary structure prediction. Traditional methods, when applied to a family of proteins rather than a single sequence, proved much more accurate at identifying core secondary structure elements.

The same approach is used here for the prediction of gamma-turns: a combination of neural networks and multiple-alignment information. The network is trained on the position-specific scoring matrices generated by PSI-BLAST (as run within PSIPRED).

PSI-BLAST

In PSI-BLAST (Position-Specific Iterative BLAST) (Altschul et al., 1997), the sequences extracted from a BLAST search are aligned and a statistical profile is derived from the multiple alignment. The profile is then used as a query for the next search, and this loop is iterated a number of times that is controlled by the user.

The PSIPRED method has been used for secondary structure prediction. It uses PSI-BLAST to detect distant homologues of a query sequence and generates a position-specific scoring matrix as part of the prediction process (Jones, 1999), and training is done on these intermediate PSI-BLAST-generated position-specific scoring matrices as a direct input to the neural network. The matrix has 21 × M elements, where M is the length of the target sequence, and each element represents the likelihood of that particular residue substitution at that position in the template. It is a sensitive scoring system which involves the probabilities with which amino acids occur at various positions.
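
Since PSSM log-odds scores are unbounded, they are typically squashed to a fixed range before being fed to a network. The logistic scaling below is a commonly used choice, shown here as an assumption rather than the exact scaling used by GammaPred.

    import numpy as np

    def scale_pssm(pssm):
        # Map raw PSI-BLAST log-odds scores (e.g. a 21 x M matrix) onto
        # the interval (0, 1) with the logistic function 1 / (1 + e^-x).
        return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))

    print(scale_pssm([[2, -1, 0], [5, -3, 1]]))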

Performance Measures

Here, four different parameters are used to measure the performance of GammaPred, as described by Shepherd et al. (1999).

The predictive performance of a method is expressed by the following four parameters:

1. Qtotal, the percentage of correctly classified residues, is defined as

    Qtotal = ((p + n) / t) × 100

where p is the number of correctly classified gamma-turn residues, n is the number of correctly classified non-gamma-turn residues, and t is the total number of residues in a protein. Qtotal, also known as the prediction accuracy, is simply the total percentage of correct predictions. One difficulty with this measure is that it does not take into account the disparity between the numbers of gamma-turn and non-turn residues: because non-turn residues dominate, a deceptively high Qtotal score can be obtained by the trivial strategy of predicting all residues to be non-turn residues, and the information about the turns risks being lost. The Matthews correlation coefficient remedies this problem.

2. MCC, the Matthews correlation coefficient, is defined as

    MCC = (pn - ou) / sqrt((p + o)(p + u)(n + o)(n + u))

where, p is the number of correctly classified gamma-turn residues, n is the number of correctly classified non-gamma-turn residues, o is the number of non-gamma-turn residues incorrectly classified as gamma-turn residues and u is the number of gamma-turn residues incorrectly classified as non-gamma-turn residues. It is a measure that accounts for both over- and under-predictions.

3. Qpredicted, defined as

    Qpredicted = (p / (p + o)) × 100

is the percentage of gamma-turn predictions that are correct, that is, the proportion of residues predicted as gamma-turns that are actually observed to be gamma-turns (the probability of correct prediction).

4. Qobserved, defined as

    Qobserved = (p / (p + u)) × 100

is the percentage of observed gamma-turns that are correctly predicted. Also known as sensitivity, it is the proportion of true positives, that is, the proportion of gamma-turn residues that have been correctly predicted as gamma-turns.

Thus, the prediction accuracy is measured at the residue level, that is, in terms of the percentage of individual amino acids predicted correctly.
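
The four measures follow directly from the counts p, n, o and u defined above; a minimal sketch (the function name is illustrative):

    import math

    def turn_measures(p, n, o, u):
        # p: correctly predicted gamma-turn residues (true positives)
        # n: correctly predicted non-turn residues   (true negatives)
        # o: non-turns wrongly predicted as turns    (over-predictions)
        # u: turns wrongly predicted as non-turns    (under-predictions)
        t = p + n + o + u  # total residues
        q_total = 100.0 * (p + n) / t
        denom = math.sqrt((p + o) * (p + u) * (n + o) * (n + u))
        mcc = (p * n - o * u) / denom if denom else 0.0
        q_predicted = 100.0 * p / (p + o) if (p + o) else 0.0
        q_observed = 100.0 * p / (p + u) if (p + u) else 0.0
        return q_total, mcc, q_predicted, q_observed

    print(turn_measures(p=50, n=800, o=100, u=50))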