Algorithms used in AHTpin

Datasets

The peptides were derived mainly from the AHTPDB, BIOPEP and ACEpepDB databases. We developed models on separate datasets, grouped according to peptide length, as shown in the figure below.

Descriptors Used

[1] Amino Acid Composition : It is the fraction of each amino acid in a peptide/protein (multiply by 100 to express it as a percentage) and is represented by a vector of 20 values, one for each of the 20 natural amino acids, calculated by the formula

Composition of amino acid (i) = Frequency of amino acid (i) / Length of the protein

where i can be any natural amino acid
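As an illustration, the formula above can be sketched in Python as follows. This is a minimal sketch and not the AHTpin implementation: the function name `amino_acid_composition`, the ordering of the 20 values, and the choice to divide by the full sequence length (including any non-standard letters) are assumptions made here.

```python
from collections import Counter

# The 20 natural amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the composition as a list of 20 fractions, ordered as AMINO_ACIDS."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)  # missing residues count as 0
    length = len(sequence)
    # Composition of amino acid i = frequency of i / length of the peptide
    return [counts[aa] / length for aa in AMINO_ACIDS]


# Example: a tripeptide with one isoleucine and two prolines.
composition = amino_acid_composition("IPP")
```

Multiplying each value by 100 gives the composition as a percentage.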

[2] Atomic Composition : Similar to the Amino Acid Composition, the Atomic Composition is the fraction of each type of atom that constitutes the natural amino acids, namely Carbon (C), Hydrogen (H), Nitrogen (N), Oxygen (O) and Sulphur (S). For instance, the Carbon composition is calculated as the ratio of the number of Carbon atoms in a peptide to the total number of atoms in the peptide.
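The atomic composition can be sketched in the same way. The example below is only a sketch under a stated assumption: atoms are counted from the molecular formula of each free amino acid and summed over the sequence, without subtracting the water lost on peptide-bond formation. Whether AHTpin counts atoms this way is not stated in this page, and the names `ATOM_COUNTS` and `atomic_composition` are hypothetical.

```python
# Atom counts (C, H, N, O, S) from the molecular formula of each free amino acid.
ATOM_COUNTS = {
    "A": (3, 7, 1, 2, 0),   "R": (6, 14, 4, 2, 0),
    "N": (4, 8, 2, 3, 0),   "D": (4, 7, 1, 4, 0),
    "C": (3, 7, 1, 2, 1),   "E": (5, 9, 1, 4, 0),
    "Q": (5, 10, 2, 3, 0),  "G": (2, 5, 1, 2, 0),
    "H": (6, 9, 3, 2, 0),   "I": (6, 13, 1, 2, 0),
    "L": (6, 13, 1, 2, 0),  "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1),  "F": (9, 11, 1, 2, 0),
    "P": (5, 9, 1, 2, 0),   "S": (3, 7, 1, 3, 0),
    "T": (4, 9, 1, 3, 0),   "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0),  "V": (5, 11, 1, 2, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")


def atomic_composition(sequence):
    """Return the fraction of C, H, N, O and S atoms in a peptide as a dict."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    totals = [0, 0, 0, 0, 0]
    for residue in sequence.upper():
        # Assumption: free amino acid formulas are summed as-is
        # (no water is removed for peptide bonds).
        for k, count in enumerate(ATOM_COUNTS[residue]):
            totals[k] += count
    total_atoms = sum(totals)
    return {element: t / total_atoms for element, t in zip(ELEMENTS, totals)}


glycine = atomic_composition("G")  # glycine alone: C2H5NO2, 10 atoms in total
```

Here the Carbon composition of glycine is 2 carbon atoms out of 10 atoms.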

[3] Chemical Descriptors : A total of 15,537 descriptors, including 1D, 2D and 3D descriptors and 10 different types of binary fingerprints, were calculated using the PaDEL software. Only a minimal subset of the most important descriptors was used for model building.

Machine Learning Technique

Support vector machines (SVMs) were trained on the selected sequence-based and atomic features. SVMs allow a choice of kernels and associated parameters. The SVMlight software package (available at http://svmlight.joachims.org/) was used to construct the SVM classification and regression models. In this study, we used the radial basis function (RBF) kernel:

k(x, y) = exp(-γ||x - y||²)

where x and y are two data vectors, and γ is a training parameter.
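For reference, the RBF kernel defined above can be evaluated directly, as in the short sketch below. It only illustrates the formula and is not how SVMlight computes the kernel internally; the function name `rbf_kernel` and the use of plain Python lists for the data vectors are assumptions.

```python
import math


def rbf_kernel(x, y, gamma):
    """Evaluate k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Squared Euclidean distance ||x - y||^2
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors have zero distance, so the kernel value is exp(0) = 1.
same = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=0.5)
```

A larger γ makes the kernel value fall off faster as the two vectors move apart.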

Evaluation of Regression models

To evaluate the performance of our regression models, we used Pearson’s correlation coefficient (R). All models were evaluated using five-fold cross-validation.

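R = (n Σ E_i^pred E_i^act - Σ E_i^pred Σ E_i^act) / ( √[n Σ (E_i^pred)² - (Σ E_i^pred)²] × √[n Σ (E_i^act)² - (Σ E_i^act)²] )

where each sum Σ runs over i = 1, ..., n, n is the size of the test set, and E_i^pred and E_i^act are the predicted and actual IC50 values, respectively.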
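Pearson's correlation coefficient between predicted and actual values can be computed as in the following sketch, which uses the standard computational form of R. The function name `pearson_r` and the example values are hypothetical and are not taken from the AHTpin datasets.

```python
import math


def pearson_r(predicted, actual):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    sum_p = sum(predicted)
    sum_a = sum(actual)
    sum_pa = sum(p * a for p, a in zip(predicted, actual))
    sum_p2 = sum(p * p for p in predicted)
    sum_a2 = sum(a * a for a in actual)
    numerator = n * sum_pa - sum_p * sum_a
    denominator = (math.sqrt(n * sum_p2 - sum_p ** 2)
                   * math.sqrt(n * sum_a2 - sum_a ** 2))
    if denominator == 0:
        raise ValueError("R is undefined when either list is constant")
    return numerator / denominator


# Made-up values: predictions that are an exact linear function of the targets.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

R ranges from -1 (perfect inverse linear relationship) to 1 (perfect linear relationship), with 0 indicating no linear relationship.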

Evaluation of Classification models

To evaluate the performance of our classification models, we used the Matthews correlation coefficient (MCC). All models were evaluated using five-fold cross-validation.

MCC = (TP × TN - FP × FN) / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]

where TP, FP, TN and FN stand for true positives, false positives, true negatives and false negatives, respectively.
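The Matthews correlation coefficient can be computed from the four confusion-matrix counts as in the sketch below. The function name `mcc` is hypothetical, and returning 0 when the denominator is zero is a common convention adopted here as an assumption, not something stated by the source.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: treat the undefined case (an empty row or column
        # in the confusion matrix) as 0.
        return 0.0
    return numerator / denominator


# Made-up counts: every positive and every negative is classified correctly.
perfect = mcc(tp=5, fp=0, tn=5, fn=0)
```

MCC is 1 for a perfect classifier, -1 for a classifier that is wrong on every example, and close to 0 for predictions no better than chance.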

AHTpin

Antihypertensive Peptide Inhibitors
