Ar_NHPred: Prediction of aromatic-backbone NH interactions


Hydrogen bonds (Baker and Hubbard, 1984; Jeffry and Saenger, 1994; Creighton, 1993), salt bridges (Horovitz et al., 1990; Pace et al., 1996) and the hydrophobic effect (Dill, 1990; Lins and Brasseur, 1995) all play a role in protein folding and stability. The structural aspects of "conventional" hydrogen bonds, which involve electronegative atoms such as N and O, are well established and have been thoroughly investigated in proteins. However, hydrogen bonding is a broad phenomenon that is not restricted to N and O and may involve less electronegative atoms. In fact, a great variety of weak hydrogen bonds such as C-H...pi, N-H...pi and C-H...O are known, and these "non-conventional" hydrogen bonds have recently been shown to be of great importance in proteins (Desiraju and Steiner, 1999; Weiss, 2001).

Aromatic-backbone NH interactions (Ar-NH interactions)


The aromatic residues Phe, Tyr and Trp have a pi ring system that can form a hydrogen bond to an NH moiety, thereby offering additional stability (Toth et al., 2001; Levitt and Perutz, 1988). Depending on whether the NH moiety belongs to the main chain or to a side chain, the interaction is classified as an Ar-NH(backbone) or an Ar-NH(side-chain) interaction, respectively. The Ar-NH interactions in the dataset were identified using the web server NCI (http://www.mrc-lmb.com.ac.uk/genome/nci/) (Babu, 2003), which is based purely on geometric criteria. The default parameters were used (N...pi_m less than or equal to 4.3 Å; H...pi_m less than or equal to 3.5 Å; N-H...pi_m and N...pi_m...pi_n less than or equal to 30°), where pi_m is the mid-point of the pi ring and pi_n is the vector normal to the plane of the ring. Further, only those Ar-NH interactions were selected in which the donor and acceptor are separated by at most three residues in the sequence.
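The selection rule described above can be sketched as a simple threshold check. This is only an illustration, not the NCI server's code: the `Contact` record and the function name are hypothetical, and the distances and angles are assumed to have been computed beforehand from the structure, since the text does not spell out how NCI measures them.

```python
from dataclasses import dataclass

# Default thresholds quoted in the text.
MAX_N_PI_DIST = 4.3      # N...pi_m distance, in angstroms
MAX_H_PI_DIST = 3.5      # H...pi_m distance, in angstroms
MAX_ANGLE_DEG = 30.0     # limit applied to both angles, in degrees
MAX_SEQ_SEPARATION = 3   # donor/acceptor sequence separation, in residues


@dataclass
class Contact:
    """Precomputed geometry of one candidate contact (hypothetical record)."""
    donor_index: int          # sequence position of the NH donor residue
    acceptor_index: int       # sequence position of the aromatic residue
    n_pi_dist: float          # N...pi_m
    h_pi_dist: float          # H...pi_m
    nh_pi_angle: float        # N-H...pi_m
    n_pi_normal_angle: float  # N...pi_m...pi_n


def is_ar_nh_interaction(c: Contact) -> bool:
    """Return True if the contact passes every default criterion."""
    return (
        c.n_pi_dist <= MAX_N_PI_DIST
        and c.h_pi_dist <= MAX_H_PI_DIST
        and c.nh_pi_angle <= MAX_ANGLE_DEG
        and c.n_pi_normal_angle <= MAX_ANGLE_DEG
        and abs(c.donor_index - c.acceptor_index) <= MAX_SEQ_SEPARATION
    )
```

For example, `is_ar_nh_interaction(Contact(10, 12, 4.0, 3.2, 20.0, 15.0))` passes all five tests, whereas the same contact with a donor five residues away from the aromatic residue would be rejected.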

Method Used


Two networks are used: a first, sequence-to-structure network and a second, structure-to-structure network. The input to the first network is the position-specific scoring matrix generated by PSI-BLAST (Altschul et al., 1997). The output of the first network is then correlated by the second, structure-to-structure network. The input to this second, filtering network is the prediction obtained from the first network together with the secondary structure predicted by PSIPRED (Jones, 1999). Four units encode each residue: one unit carries the interacting/non-interacting prediction, given as the actual prediction score of the first network, and the remaining three units carry the PSIPRED reliability indices for the three secondary structure states (helix, extended and coil). At this first stage, the neural network predicts only whether a given fragment has an Ar-NH interaction; it does not predict the position of the donor residue. The position of the donor residue in the "potential interacting fragments" (as predicted at stage 1) is therefore predicted by a separate neural network, trained on single sequences (with amino acids as input) on a dataset containing the different types of Ar-NH interactions, with donor residues present at different positions.
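The four-unit encoding for the second network can be illustrated with a minimal sketch. The function name and argument layout are hypothetical, and the seven-residue window is taken from the architecture description; how the scores and reliability indices are scaled before being fed to the network is not stated in the text and is left untouched here.

```python
WINDOW = 7  # window width used by both networks


def encode_filter_input(scores, ss_reliability, center):
    """Build the 7 x 4 = 28 input units for the window centred on `center`.

    scores          -- first-network prediction score for each residue
    ss_reliability  -- per-residue (helix, extended, coil) reliability indices
    center          -- index of the central residue of the window
    """
    half = WINDOW // 2
    if center - half < 0 or center + half >= len(scores):
        raise ValueError("window does not fit inside the sequence")
    units = []
    for i in range(center - half, center + half + 1):
        helix, extended, coil = ss_reliability[i]
        # One unit for the first-network score, three for secondary structure.
        units.extend([scores[i], helix, extended, coil])
    return units
```

Each residue contributes its first-network score followed by the three reliability values, so a window of seven residues yields 28 inputs.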

Neural Network Architecture


The server uses two feed-forward neural networks with back-propagation (Rumelhart et al., 1986) as the learning algorithm. Both networks have an input window seven residues wide and 10 units in a single hidden layer. The target output is a single binary number: 1 (Ar-NH interaction present) or 0 (no Ar-NH interaction). The actual position of the donor residue in a positively predicted fragment is then predicted by a separate network. This network has a window size of 7 and 7 output units, each representing one of the possible Ar-NH interactions: Ar(i)-NH(i-3), Ar(i)-NH(i-2), Ar(i)-NH(i-1), Ar(i)-NH(i), Ar(i)-NH(i+1), Ar(i)-NH(i+2) and Ar(i)-NH(i+3). For a given input and set of weights, the output of this network is 7 numbers between 0 and 1, and the interaction type is taken from the output unit with the highest activity level. In this work, the SNNS v4.2 neural network simulation package from Stuttgart University was used (Zell and Mamier, 1997; publicly available at http://www.informatik.uni-stuttgart.de/). The architecture of the whole network is shown in the following figure: figure a shows the network architecture for the prediction of Ar-NH interactions, and figure b shows the network used to predict the actual location of the donor residue within the positively predicted fragment.
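Reading the interaction type off the seven output units amounts to picking the unit with the highest value. The sketch below assumes that the output units are ordered as in the list above, from NH(i-3) to NH(i+3); the function name is hypothetical.

```python
# Donor offsets relative to the aromatic residue i, in the assumed unit order.
OFFSETS = (-3, -2, -1, 0, 1, 2, 3)


def interaction_type(outputs):
    """Return the label of the output unit with the highest activity."""
    if len(outputs) != len(OFFSETS):
        raise ValueError("expected exactly 7 output values")
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    offset = OFFSETS[best]
    if offset == 0:
        return "Ar(i)-NH(i)"
    # The "+d" format keeps the sign, giving e.g. "i-3" or "i+2".
    return f"Ar(i)-NH(i{offset:+d})"
```

With outputs `[0.1, 0.2, 0.05, 0.1, 0.9, 0.3, 0.2]` the fifth unit is the largest, so the fragment would be labelled Ar(i)-NH(i+1).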

Multiple alignment or position specific scoring matrices

PSIPRED uses PSI-BLAST to detect distant homologues of a query sequence and generates a position-specific scoring matrix (PSSM) as part of the prediction process. Here, these intermediate PSI-BLAST-generated PSSMs are used as direct input to the first-level network. PSI-BLAST is run on the standard NR (non-redundant) database. The PSSM has 21 × M elements, where M is the length of the target sequence. Each element represents the likelihood of that particular residue substitution at that position.
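As a rough illustration of how a PSSM might be turned into network input, the sketch below flattens a seven-residue window into 7 × 21 = 147 values. It assumes the matrix is stored as M rows of 21 values; the function name is hypothetical, and any scaling applied to the raw PSSM values before they reach the network is not described in the text and is omitted.

```python
def pssm_window(pssm, center, window=7):
    """Flatten the PSSM rows of the window centred on `center`.

    pssm   -- list of M rows, each holding 21 values for one residue position
    center -- index of the central residue of the window
    """
    half = window // 2
    if center - half < 0 or center + half >= len(pssm):
        raise ValueError("window does not fit inside the sequence")
    flat = []
    for row in pssm[center - half:center + half + 1]:
        flat.extend(row)
    return flat
```

For a ten-residue sequence, `pssm_window(pssm, 3)` covers positions 0 to 6 and returns 147 values, one block of 21 per residue.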

How does the server work?


The Ar_NHPred server predicts the aromatic-backbone NH interactions in a given amino acid sequence in the following steps:

Step I: Get PSI-BLAST position-specific scoring matrices
The secondary structure of the target sequence is predicted by PSIPRED. PSIPRED uses PSI-BLAST to detect distant homologues of the query sequence and generates a position-specific scoring matrix as part of the prediction process. These intermediate PSI-BLAST-generated matrices are used as direct input to the first-level sequence-to-structure network.

Step II: Generate patterns of window size 7 for the first sequence-to-structure network
Patterns of the target sequence with window size 7 are generated for prediction with the first-level SNNS sequence-to-structure network.

Step III: Select the fragments with an aromatic residue at the central position
From the seven-residue fragments generated at Step II, select those that have an aromatic residue at the central position, flanked symmetrically by three residues on each side.

Step IV: Run SNNS and analyze output
The neural network is trained using error back-propagation with a sum-of-squares error function (SSE) (Rumelhart et al., 1986). During testing, a cutoff value is set for each network and the output produced by the network is compared with it. If the output is greater than the cutoff value, the fragment is predicted to have an Ar-NH interaction; if it is lower, the fragment is considered to have no Ar-NH interaction.

Step V: Filtering by the second structure-to-structure network
The input to the second network is the prediction obtained from the first network and the secondary structure predicted by PSIPRED. Four units encode each residue: one unit carries the output of the first network and the remaining three units carry the reliability indices of the three secondary structure states (helix, strand and coil).

Step VI: Generate patterns of window size 7 for the second structure-to-structure network
Patterns with window size 7 are generated for prediction with the second-level SNNS structure-to-structure network, and the fragments with an aromatic residue at the central position are retained.

Step VII: Run SNNS and analyze output
The SNNS standard back-propagation algorithm is executed and the resulting output is analyzed against a threshold value.

Step VIII: Predict the donor residue in the predicted fragment
A sequence-to-structure network trained on single sequences (with amino acids as input) is used to predict the position of the donor residue in the "potential predicted" fragment, and the fragment is correspondingly assigned an Ar(i)-NH(i-3), Ar(i)-NH(i-2), Ar(i)-NH(i-1), Ar(i)-NH(i), Ar(i)-NH(i+1), Ar(i)-NH(i+2) or Ar(i)-NH(i+3) interaction.
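The fragment-handling part of the steps above (window generation, selection of aromatic-centred fragments, and comparison with a cutoff) can be sketched as follows. This is an illustration only: the function names are hypothetical, `score_fragment` stands in for a trained SNNS network, and the cutoff is passed in by the caller because the text does not give its value. A score exactly equal to the cutoff is treated as non-interacting here, which is an assumption.

```python
from typing import Callable, List, Tuple

AROMATIC = frozenset("FYW")  # Phe, Tyr, Trp in one-letter code
WINDOW = 7


def aromatic_centred_fragments(sequence: str) -> List[Tuple[int, str]]:
    """Return (centre index, fragment) for each 7-residue window whose
    central residue is aromatic and has three residues on each side."""
    half = WINDOW // 2
    seq = sequence.upper()
    return [
        (i, seq[i - half:i + half + 1])
        for i in range(half, len(seq) - half)
        if seq[i] in AROMATIC
    ]


def predict_fragments(
    sequence: str,
    score_fragment: Callable[[str], float],  # stand-in for the trained network
    cutoff: float,
) -> List[Tuple[int, str, bool]]:
    """Score every aromatic-centred fragment and compare it with the cutoff."""
    results = []
    for pos, frag in aromatic_centred_fragments(sequence):
        score = score_fragment(frag)
        results.append((pos, frag, score > cutoff))
    return results
```

For the sequence `ACDFGHIKLW`, only the Phe at index 3 yields a fragment (`ACDFGHI`); the terminal Trp is skipped because it lacks three residues on its C-terminal side.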

Performance Measures


The following four parameters are used to measure the performance of Ar_NHPred:

1. Qtotal, the percentage of correctly classified fragments, is defined as

$$Q_{total} = \frac{p + n}{t} \times 100$$

where p is the number of correctly classified interacting fragments, n is the number of correctly classified non-interacting fragments and t is the total number of fragments in the dataset. Qtotal, also known as "prediction accuracy", is simply the total percentage of correct predictions. One difficulty with this measure is that it does not take into account the disparity between the numbers of interacting and non-interacting fragments (Ar-NH interactions are comparatively rare). Hence, it is possible to get a high Qtotal score by the trivial strategy of predicting all fragments to be non-interacting, and information is lost because of the dominance of non-interacting fragments. The Matthews Correlation Coefficient remedies this problem.

2. MCC, the Matthews Correlation Coefficient, is defined as

$$MCC = \frac{pn - ou}{\sqrt{(p + o)(p + u)(n + o)(n + u)}}$$

where p is the number of correctly classified interacting fragments, n is the number of correctly classified non-interacting fragments, o is the number of non-interacting fragments incorrectly classified as interacting and u is the number of interacting fragments incorrectly classified as non-interacting. It is a measure that accounts for both over- and under-predictions.

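3. Qpredicted is defined as

$$Q_{predicted} = \frac{p}{p + o} \times 100$$

Qpredicted is the percentage of interacting-fragment predictions that are correct, with p and o as defined for MCC. It is referred to here as specificity; note that, as defined, it measures the proportion of predicted interacting fragments that are truly interacting (the positive predictive value), not the proportion of non-interacting fragments correctly predicted as non-interacting.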

4. Qobserved is defined as

$$Q_{observed} = \frac{p}{p + u} \times 100$$

Qobserved is the percentage of observed interacting fragments that are correctly predicted, with p and u as defined for MCC. It is also known as sensitivity: the proportion of interacting fragments that have been correctly predicted as interacting.
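A minimal sketch of the four measures, computed from the counts p, n, o and u described above. It assumes the conventional forms of these measures (Qtotal = 100(p + n)/t, the standard Matthews formula, Qpredicted = 100p/(p + o), Qobserved = 100p/(p + u)) with t = p + n + o + u; the function name and the choice to return 0 when a denominator vanishes are illustrative assumptions.

```python
import math


def performance(p, n, o, u):
    """Compute Qtotal, MCC, Qpredicted and Qobserved.

    p -- interacting fragments predicted as interacting
    n -- non-interacting fragments predicted as non-interacting
    o -- non-interacting fragments predicted as interacting (over-prediction)
    u -- interacting fragments predicted as non-interacting (under-prediction)
    """
    t = p + n + o + u
    denom = math.sqrt((p + o) * (p + u) * (n + o) * (n + u))
    return {
        "Qtotal": 100.0 * (p + n) / t,
        # Assumed convention: report 0 when a ratio is undefined.
        "MCC": (p * n - o * u) / denom if denom else 0.0,
        "Qpredicted": 100.0 * p / (p + o) if (p + o) else 0.0,
        "Qobserved": 100.0 * p / (p + u) if (p + u) else 0.0,
    }
```

The weakness of Qtotal on unbalanced data shows up directly: `performance(0, 940, 0, 60)`, which labels every fragment as non-interacting, gives the same Qtotal of 94 as `performance(40, 900, 20, 40)`, but its MCC is 0 rather than about 0.55.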

References


Baker, E. N. and Hubbard, R. E. (1984) Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol., 44, 97-179.

Jeffry, G. A. and Saenger, W. (1994) Hydrogen bonding in biological systems. Springer Verlag.

Creighton, T. (1993) Proteins: Structure and Molecular Properties, 2nd edn. W.H. Freeman and Co., New York

Horovitz, A., Serrano, L., Avron, B., Bycroft, M. and Fersht, A. (1990) Strength and co-operativity of contributions of surface salt bridges to protein stability. J. Mol. Biol., 216, 1031-1044.

Pace, C. N., Shirley, B. A., McNutt, M. and Gajiwala, K. (1996) Forces contributing to the conformational stability of proteins. FASEB J., 10, 75-83.

Dill, K. A. (1990) Dominant forces in protein folding. Biochemistry, 29, 7133-7155.

Lins, L. and Brasseur, R. (1995) The hydrophobic effect in protein folding. FASEB J., 9, 535-540.

Desiraju, G. R. and Steiner, T. (1999) The Weak Hydrogen Bond in Structural Chemistry and Biology. Oxford University Press, Oxford.

Weiss, M. S. (2001) More hydrogen bonds for the (structural) biologist. Trends Biochem Sci., 26, 521-523.

Toth, G., Watts, C. R., Murphy, R. F. and Lovas, S. (2001) Significance of aromatic-backbone amide interactions in protein structure. Proteins, 43, 373-381.

Levitt, M. and Perutz, M. F. (1988) Aromatic rings as hydrogen bond acceptors. J. Mol. Biol., 201, 751-754.

Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.

Jones, D. T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.

Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986) Learning representations by back-propagating errors. Nature, 323, 533-536.

Zell, A. and Mamier, G. (1997) Stuttgart Neural Network Simulator version 4.2. University of Stuttgart.
