Tight turns

Definition

The architecture of protein three-dimensional structure is characterized by repetitive elements such as alpha-helices and beta-sheets, and non-repetitive elements such as tight turns, bulges and random coil structures. Tight turns are the third "classical" secondary structure; they serve to reverse the direction of the polypeptide chain and are largely responsible for the globular shape of proteins. They are located primarily on the protein surface and accordingly contain mainly polar and charged residues. Antibody recognition, phosphorylation, glycosylation, hydroxylation and intron/exon splicing sites are found frequently at or adjacent to turns.

Types of Tight Turns

There are different types of tight turns (Chou, 2000), depending upon the number of residues forming the turn. These are as follows:

  • Delta-turn - The smallest tight turn, involving only two amino acid residues; the intraturn hydrogen bond is formed between the backbone NH(i) and the backbone CO(i+1).
  • Gamma-turn - Involves three amino acid residues; the intraturn hydrogen bond is formed between the backbone CO(i) and the backbone NH(i+2).
  • Beta-turn - Involves four amino acid residues and may or may not be stabilized by an intraturn hydrogen bond between the backbone CO(i) and the backbone NH(i+3).
  • Alpha-turn - Involves five amino acid residues, with the distance between Calpha(i) and Calpha(i+4) less than 7 Å and the pentapeptide chain not in a helical conformation (a check of this distance criterion is sketched after this list).
  • Pi-turn - The largest tight turn, involving six amino acid residues.
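
As a minimal illustration of the geometric part of these definitions, the sketch below checks the alpha-turn distance criterion Calpha(i)-Calpha(i+4) < 7 Å. It assumes Calpha coordinates are already available as (x, y, z) tuples (for example, parsed from a PDB file); the coordinates and function names are hypothetical.

    import math

    def satisfies_alpha_turn_distance(ca_coords, i, cutoff=7.0):
        # True if the Calpha(i)..Calpha(i+4) distance (in Angstroms) is
        # below the alpha-turn cutoff; the "not helical" condition of the
        # full definition is not checked here.
        return math.dist(ca_coords[i], ca_coords[i + 4]) < cutoff

    # Hypothetical coordinates for five consecutive Calpha atoms:
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.1, 3.2, 0.0),
          (3.0, 5.9, 1.0), (0.5, 4.8, 2.5)]
    print(satisfies_alpha_turn_distance(ca, 0))  # True (distance ~ 5.4 A)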

Gamma Turns

Of the tight turns, beta-turns and gamma-turns have received the most attention and have been studied in detail. The second most characterized turn class, after the beta-turn, is the gamma-turn, which resembles a beta-turn but has only three turn residues.

Definition

A gamma-turn is defined by the existence of a hydrogen bond between the CO group of the ith residue and the NH group of the (i+2)th residue.
Earlier studies have shown that the gamma-turn structure is fairly common in proteins. The first example of a gamma-turn in a protein was described by Matthews (1972) at the end of a beta-hairpin in thermolysin. Most classic gamma-turns occur at the ends of beta-hairpins, but very few inverse gamma-turns do. Gamma-turns also occur at ligand-binding sites or active sites, for example the loop where the catalytically important aspartate is located in serine proteases. It has, however, been postulated that inverse gamma-turns may function as intermediates in folding, stabilizing beta-strands before they assemble into beta-sheets. More recently, gamma-turns have attracted attention through studies describing the incorporation of peptide secondary-structure mimetics into small bioactive peptides for the development of stable, effective and selective receptor ligands.

Types of Gamma Turns

Gamma-turns are divided into two classes, called inverse and classic, whose main-chain atoms are related by mirror symmetry (just as type I and I', or type II and II', beta-turns are). Of the two, classic gamma-turns are far less common, and those that do exist are frequently found at the loop ends of beta-hairpins. Inverse gamma-turns, on the other hand, tend not to give rise to polypeptide chain reversal. The following figure shows the two types of gamma-turns:

[Figure: the classic and inverse gamma-turn conformations]

The dihedral angle intervals for classic and inverse gamma-turns are centred on the following ideal (phi, psi) values of the middle residue (i+1):

  Turn type    phi(i+1)    psi(i+1)
  Classic       75.0        -64.0
  Inverse      -79.0         69.0
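
Given backbone dihedrals for the middle residue of a three-residue candidate, a turn can be assigned to one class or the other by proximity to these ideal values. The sketch below is a minimal illustration; the ±40° tolerance and the function name are assumptions for this example, not the exact criteria used by GammaPred.

    # Ideal (phi, psi) of the middle residue (i+1) for the two classes.
    IDEAL = {"classic": (75.0, -64.0), "inverse": (-79.0, 69.0)}

    def classify_gamma_turn(phi, psi, tol=40.0):
        # Return "classic", "inverse", or None for the middle residue of a
        # candidate gamma-turn; tol (degrees) is an illustrative tolerance.
        for name, (phi0, psi0) in IDEAL.items():
            if abs(phi - phi0) <= tol and abs(psi - psi0) <= tol:
                return name
        return None

    print(classify_gamma_turn(-75.0, 65.0))  # inverse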

Method Used

Neural Networks

Also referred to as connectionist architectures, parallel distributed processing, and neuromorphic systems, an artificial neural network (ANN) is an information-processing paradigm inspired by the way the densely interconnected, parallel structure of the mammalian brain processes information. Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of the ANN paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements that are analogous to neurons and are tied together with weighted connections that are analogous to synapses.

The GammaPred server uses two feed-forward back-propagation networks, each with a single hidden layer. Both networks use a window five residues wide and have 25 units in the hidden layer. The target output is a single binary number, 1 or 0 (turn or non-turn).
For the neural network implementation, that is, to generate the network architecture and carry out the learning process, the publicly available free simulation package SNNSv4.2 from Stuttgart University is used. It allows the resulting networks to be incorporated into an ANSI C function for use in stand-alone code. A linear activation function is used.
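
As a rough sketch of the shape of each network (not the actual SNNS-trained model), the forward pass below uses a five-residue window with 21 features per residue, 25 hidden units and a single output. The random weights and the tanh hidden activation are illustrative assumptions; the linear output unit follows the text.

    import numpy as np

    WINDOW = 5     # residues per input window
    FEATURES = 21  # PSSM values per residue
    HIDDEN = 25    # hidden units, as described above

    rng = np.random.default_rng(0)
    # Randomly initialised weights stand in for the trained SNNS weights.
    W1 = rng.normal(scale=0.1, size=(HIDDEN, WINDOW * FEATURES))
    b1 = np.zeros(HIDDEN)
    W2 = rng.normal(scale=0.1, size=(1, HIDDEN))
    b2 = np.zeros(1)

    def forward(x):
        # One residue window in, one score out; the score is compared with
        # the 1/0 (turn / non-turn) target during training.
        h = np.tanh(W1 @ x + b1)  # hidden activation (assumed tanh)
        return (W2 @ h + b2)[0]   # linear output unit, as in the text

    x = rng.normal(size=WINDOW * FEATURES)  # stand-in PSSM window
    print("turn" if forward(x) >= 0.5 else "non-turn")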


Backpropagation

In 1986, Rumelhart et al. redeveloped and popularized a supervised learning algorithm called backpropagation, or the generalized delta rule. It finally provided the multi-layer perceptron with an efficient learning rule. Backpropagation is based on the minimization of a suitable error or cost function, in this case the average error between the actual output signal and the corresponding desired output signal.
The global error E at the output layer is the sum of squared differences between the desired outputs dj and the actually calculated outputs oj of each output neuron j, and can be expressed as

    E = (1/2) * sum_j (dj - oj)^2

(the conventional factor of 1/2 simplifies the derivative). This error is a function of the connection weights, which are the parameters that have to be optimized so that the error E becomes a minimum. On a multi-dimensional error surface, the goal is to approach, or ideally reach, the global minimum, which is achieved by the gradient-descent method.
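
As a concrete instance of this update rule, here is a minimal sketch of gradient descent on the squared error for a single linear output neuron; the learning rate and the data are made up for illustration.

    import numpy as np

    def delta_rule_step(w, x, d, lr=0.1):
        # One gradient-descent step for a linear neuron o = w . x that
        # minimises E = 0.5 * (d - o)**2; dE/dw = -(d - o) * x.
        o = w @ x
        return w + lr * (d - o) * x

    w = np.zeros(3)
    x = np.array([1.0, 0.5, -0.2])
    for _ in range(100):
        w = delta_rule_step(w, x, d=1.0)
    print(w @ x)  # converges towards the desired output 1.0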

Network architecture

In the GammaPred server, the following two networks have been used:
Neural Network SNNS: First level - sequence-to-structure network
The input to the first network is the position-specific scoring matrix obtained from PSI-BLAST. PSIPRED uses PSI-BLAST to detect distant homologues of a query sequence and generates a position-specific matrix as part of the prediction process; here these intermediate PSI-BLAST-generated position-specific scoring matrices are used as a direct input to the first network. The matrix has 21 × M real elements, where M is the length of the target sequence.
Neural Network SNNS: Second level - structure-to-structure network
An important feature of the predictions generated by the first network is that they are uncorrelated: the network makes a prediction for each residue in isolation, without reference to neighbouring predictions. The correlation can be taken into account by a second structure-to-structure network. The input to this second, filtering network is the prediction obtained from the first network together with the secondary structure predicted by PSIPRED, as sketched below.
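
A minimal sketch of how the two levels chain together, assuming a five-residue window at both levels, zero-padding at the chain termini and a one-hot helix/strand/coil encoding of the PSIPRED prediction; these details, and the stand-in "networks", are assumptions for illustration.

    import numpy as np

    W = 5  # window width used by both levels

    def windows(per_residue, width=W):
        # Slide a window over per-residue feature rows, zero-padding the
        # termini so that every residue receives a prediction.
        m, f = per_residue.shape
        pad = np.zeros((width // 2, f))
        padded = np.vstack([pad, per_residue, pad])
        return np.array([padded[i:i + width].ravel() for i in range(m)])

    def two_level(pssm, psipred_ss, net1, net2):
        # Level 1: PSSM window -> raw turn score for each residue.
        raw = np.array([net1(x) for x in windows(pssm)])
        # Level 2: window over (level-1 score + PSIPRED state) -> filtered score.
        second_in = np.hstack([raw[:, None], psipred_ss])
        return np.array([net2(x) for x in windows(second_in)])

    # Stand-in usage: random PSSM, random PSIPRED states, dummy "networks".
    M = 30
    rng = np.random.default_rng(1)
    pssm = rng.random((M, 21))
    ss = np.eye(3)[rng.integers(0, 3, M)]  # one-hot H/E/C
    net = lambda x: float(x.mean())
    print(two_level(pssm, ss, net, net).shape)  # (30,)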

The following figure shows the network architecture used in GammaPred:

[Figure: two-level neural network architecture of GammaPred]

Multiple Alignment

Prediction from a multiple alignment of protein sequences rather than a single sequence has long been recognized as a way to improve prediction accuracy (Cuff and Barton, 1999). During evolution, residues with similar physico-chemical properties are conserved if they are important to the fold or function of the protein. The availability of large families of homologous sequences revolutionised secondary structure prediction. Traditional methods, when applied to a family of proteins rather than a single sequence, proved much more accurate at identifying core secondary structure elements.

The same approach is used here for the prediction of gamma-turns: a combination of neural networks and multiple-alignment information. The network is trained on the position-specific scoring matrices generated by PSI-BLAST (as run within PSIPRED).

PSI-BLAST

In PSI-BLAST (Position-Specific Iterative BLAST) (Altschul et al., 1997), the sequences extracted from a BLAST search are aligned and a statistical profile is derived from the multiple alignment. The profile is then used as a query for the next search, and this loop is iterated a number of times that is controlled by the user.

The PSIPRED method has been used for secondary structure prediction. It uses PSI-BLAST to detect distant homologues of a query sequence and generates a position-specific scoring matrix as part of the prediction process (Jones, 1999), and training is done on these intermediate PSI-BLAST-generated position-specific scoring matrices as a direct input to the neural network. The matrix has 21 × M elements, where M is the length of the target sequence, and each element represents the likelihood of that particular residue substitution at that position in the template. It is a sensitive scoring system which involves the probabilities with which amino acids occur at various positions.
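
Since PSSM log-odds scores are unbounded, they are typically squashed to a fixed range before being fed to a network. The logistic scaling below is a commonly used choice, shown here as an assumption rather than the exact scaling used by GammaPred.

    import numpy as np

    def scale_pssm(pssm):
        # Map raw PSI-BLAST log-odds scores (e.g. a 21 x M matrix) onto
        # the interval (0, 1) with the logistic function 1 / (1 + e^-x).
        return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))

    print(scale_pssm([[2, -1, 0], [5, -3, 1]]))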

Performance Measures

Here, four different parameters are used to measure the performance of GammaPred, as described by Shepherd et al. (1999).

The predictive performance of a method is expressed by the following four parameters:

1. Qtotal, the percentage of correctly classified residues, is defined as

    Qtotal = ((p + n) / t) × 100

where p is the number of correctly classified gamma-turn residues, n is the number of correctly classified non-gamma-turn residues, and t is the total number of residues in a protein. Qtotal, also known as the prediction accuracy, is simply the total percentage of correct predictions. One difficulty with this measure is that it does not take into account the disparity between the numbers of gamma-turn and non-turn residues: because non-turn residues dominate, a deceptively high Qtotal score can be obtained by the trivial strategy of predicting all residues to be non-turn residues, and the information about the turns risks being lost. The Matthews correlation coefficient remedies this problem.

2. MCC, the Matthews correlation coefficient, is defined as

    MCC = (pn - ou) / sqrt((p + o)(p + u)(n + o)(n + u))

where, p is the number of correctly classified gamma-turn residues, n is the number of correctly classified non-gamma-turn residues, o is the number of non-gamma-turn residues incorrectly classified as gamma-turn residues and u is the number of gamma-turn residues incorrectly classified as non-gamma-turn residues. It is a measure that accounts for both over- and under-predictions.

3. Qpredicted, defined as

    Qpredicted = (p / (p + o)) × 100

is the percentage of gamma-turn predictions that are correct, that is, the proportion of residues predicted as gamma-turns that are actually observed to be gamma-turns (the probability of correct prediction).

4. Qobserved, defined as

    Qobserved = (p / (p + u)) × 100

is the percentage of observed gamma-turns that are correctly predicted. Also known as sensitivity, it is the proportion of true positives, that is, the proportion of gamma-turn residues that have been correctly predicted as gamma-turns.

Thus, the prediction accuracy is measured at the residue level, that is, in terms of the percentage of individual amino acids predicted correctly.
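
The four measures follow directly from the counts p, n, o and u defined above; a minimal sketch (the function name is illustrative):

    import math

    def turn_measures(p, n, o, u):
        # p: correctly predicted gamma-turn residues (true positives)
        # n: correctly predicted non-turn residues   (true negatives)
        # o: non-turns wrongly predicted as turns    (over-predictions)
        # u: turns wrongly predicted as non-turns    (under-predictions)
        t = p + n + o + u  # total residues
        q_total = 100.0 * (p + n) / t
        denom = math.sqrt((p + o) * (p + u) * (n + o) * (n + u))
        mcc = (p * n - o * u) / denom if denom else 0.0
        q_predicted = 100.0 * p / (p + o) if (p + o) else 0.0
        q_observed = 100.0 * p / (p + u) if (p + u) else 0.0
        return q_total, mcc, q_predicted, q_observed

    print(turn_measures(p=50, n=800, o=100, u=50))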