Help page of AntiCP
About Datasets

The following datasets were created to develop the various SVM models:




Main dataset

We extracted 225 experimentally validated anticancer peptides (ACPs) from the literature and from databases such as the Antimicrobial Peptide Database (APD), the Collection of Anti-Microbial Peptides (CAMP) and the Database of Anuran Defense Peptides (DADP). The majority of these peptides are antimicrobial peptides (AMPs) with anticancer activity. All of these peptides are unique and were used as positive examples. Since very few peptides have been experimentally proven to be non-ACPs, we extracted 2250 random peptides from SwissProt proteins and used them as negative examples (non-ACPs). Thus, the main dataset contains 225 ACPs and 2250 non-ACPs.


Alternate dataset

In this dataset, the 225 experimentally validated ACPs are again the positive examples. As negative examples (non-ACPs), we extracted 1372 AMPs from the APD database for which no anticancer activity has been reported. Thus, this dataset contains 225 ACPs and 1372 non-ACPs. It was used to develop SVM models that discriminate ACPs from AMPs.


Balanced datasets

To reduce the bias that unequal class sizes introduce in machine learning, we created balanced datasets from the realistic datasets described above.


Balanced dataset-1

We randomly selected 225 peptides from the 2250 SwissProt peptides of the main dataset and used them as negative examples alongside the 225 ACPs to create a balanced dataset.


Balanced dataset-2

It contains the 225 experimentally validated ACPs as positive examples, as stated above, and an equal number of AMPs selected from the 1372 AMPs extracted from the APD database as negative examples.
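
The balancing step described above amounts to random undersampling of the negative set. The following Python snippet is a minimal sketch of that idea, not the script actually used to build the datasets; the function name `make_balanced`, the fixed seed and the placeholder peptide identifiers are assumptions made for illustration.

```python
import random

def make_balanced(positives, negatives, seed=0):
    """Return (positives, sampled_negatives) with equal sizes.

    Negatives are drawn without replacement. The seed is an assumption
    added only to make this sketch reproducible.
    """
    if len(negatives) < len(positives):
        raise ValueError("not enough negatives to balance the dataset")
    rng = random.Random(seed)
    sampled = rng.sample(negatives, len(positives))
    return list(positives), sampled

# Placeholder identifiers standing in for real peptide sequences.
acps = [f"ACP_{i}" for i in range(225)]
non_acps = [f"NON_{i}" for i in range(2250)]
pos, neg = make_balanced(acps, non_acps)
```

Applied to the 1372 AMPs instead of the 2250 SwissProt peptides, the same call would give a dataset shaped like Balanced dataset-2.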

Support Vector Machine

In the present study, the SVM classifier from the freely available SVM_light package was used. This package is powerful as well as user-friendly, and it allows the user to adjust the parameters and to choose among kernel functions such as linear, polynomial, RBF and sigmoid.
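
SVM_light reads its examples from a text file with one example per line: the target value followed by `feature:value` pairs whose feature numbers start at 1 and increase along the line. The Python sketch below formats a feature vector in that way; the helper name `to_svmlight_line` and the convention of +1 for ACPs and -1 for non-ACPs are assumptions made for this illustration.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVM_light input line.

    label    : +1 for ACP, -1 for non-ACP (labelling convention assumed here)
    features : sequence of numbers; feature numbers start at 1
    """
    # Zero-valued features can be left out of the sparse format.
    pairs = [f"{i}:{v:.6g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + pairs)

line = to_svmlight_line(1, [0.25, 0.0, 0.5])
print(line)
```

In SVM_light the kernel is chosen with the `-t` option of `svm_learn` (0 linear, 1 polynomial, 2 RBF, 3 sigmoid); the parameter values used for the AntiCP models are not given in this section.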

Evaluation of Performance

Both ten-fold and five-fold cross-validation were used. In ten-fold cross-validation, the dataset is divided into ten sets; nine are used for training and the remaining one for testing, and the process is repeated ten times so that each set is used once for testing. Five-fold cross-validation works the same way with five sets, four for training and one for testing. The performance of the different SVM modules was evaluated by calculating accuracy and the Matthews correlation coefficient (MCC).
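
The evaluation procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the split is a plain shuffled split (not stratified by class), and returning 0.0 when the MCC denominator is zero is a common convention chosen here.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Five-fold split of a balanced dataset of 450 peptides (225 + 225).
folds = k_fold_indices(450, 5)
for test_fold in folds:
    train = [i for fold in folds if fold is not test_fold for i in fold]
    # Train on `train`, predict on `test_fold`, then count tp, tn, fp, fn
    # and pass them to accuracy() and mcc(). Model training is omitted here.
```

Calling `k_fold_indices(n, 10)` gives the ten-fold variant with the same loop.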

Input features for SVM

In this study, we used various features as SVM input for the prediction of AntiCPs.

1. Amino Acid Composition: Amino acid composition is the fraction of each amino acid present in a peptide. It yields a vector of 20 values, one per amino acid, which is used as SVM input.
2. Dipeptide Composition: Dipeptide composition is the fraction of each dipeptide (AA, AC, AD and so on) present in a peptide. It captures the composition of the peptide as well as the local order of its residues. It yields a vector of 20 x 20 (400) values.
3. Binary Profile Pattern: In a binary profile pattern, each amino acid is represented by a binary vector of 20 values. For a peptide of length n, this yields n x 20 binary values, which are used as SVM input.
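
The three feature types listed above can be sketched in Python as follows. This is a minimal illustration, not the code used by AntiCP: the function names, the alphabetical ordering of the 20 amino acids and the toy peptide are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an assumption
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aa_composition(seq):
    """20 values: fraction of each amino acid in the peptide."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400 values: fraction of each overlapping dipeptide."""
    total = len(seq) - 1
    if total < 1:
        raise ValueError("need at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]

def binary_profile(seq):
    """n x 20 values: one-hot encoding of each residue, concatenated."""
    profile = []
    for residue in seq:
        profile.extend(1 if residue == a else 0 for a in AMINO_ACIDS)
    return profile

peptide = "ACDA"  # toy sequence, not taken from the datasets
aac = aa_composition(peptide)
dpc = dipeptide_composition(peptide)
bpp = binary_profile(peptide)
```

Because an SVM expects input vectors of equal length, binary profiles are only comparable across peptides of the same length n; how n is fixed is not covered by this sketch.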

Hybrid Method

We observed that a number of motifs are present in the AntiCPs, so we used this motif information for the prediction of AntiCPs. AntiCP motifs were identified with the MERCI software, and query sequences are then searched against the resulting AntiCP motif list, again using MERCI.
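
The motif search on a query sequence can be pictured with the simplified Python sketch below. It is not the MERCI algorithm: it assumes that motifs are plain contiguous strings, and the motif list and query sequence are hypothetical examples. How motif hits are combined with the SVM score in the hybrid method is not specified in this section and is not modelled here.

```python
def motif_hits(sequence, motifs):
    """Return the motifs that occur as substrings of the sequence."""
    return [m for m in motifs if m in sequence]

# Hypothetical motifs and query sequence, for illustration only.
example_motifs = ["KKL", "LAK", "GWW"]
query = "FLAKKLAKKL"
hits = motif_hits(query, example_motifs)
print(hits)
```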