We took 843 experimentally validated CPPs from the CPPsite database (Gautam et al., 2012) and generated three different datasets from these peptides. The three datasets are as follows:
CPPsite-1
This dataset includes 708 CPPs with either high or low cell-penetrating efficiency, taken from the CPPsite database. We removed all peptides containing non-natural or D-amino acids. Since we did not find experimentally proven non-CPPs in the literature, we generated an equal number of peptides randomly from SwissProt proteins and treated them as negatives.
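The filtering and negative-sampling steps can be illustrated with a minimal sketch. It is not the procedure used in this work: the function names are hypothetical, and the sketch assumes that D-amino acids and non-natural residues appear as anything other than the 20 uppercase one-letter codes. It also assumes that each negative is a random fragment of a SwissProt protein with the same length as one positive peptide, which the text above does not state.

```python
import random

# The 20 natural L-amino acids in one-letter code.
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")


def is_natural(peptide):
    """True if the peptide is non-empty and uses only the 20 natural residues.

    Assumption: D-amino acids (e.g. lowercase letters) and non-natural
    residues are encoded as any other character.
    """
    return len(peptide) > 0 and all(res in NATURAL for res in peptide)


def sample_negatives(proteins, lengths, seed=0, max_tries=1000):
    """Draw one random protein fragment per requested length.

    Assumption: negatives are length-matched to the positives; fragments
    containing ambiguous residues (e.g. X) are redrawn.
    """
    rng = random.Random(seed)
    negatives = []
    for length in lengths:
        candidates = [p for p in proteins if len(p) >= length]
        if not candidates:
            raise ValueError(f"no protein is long enough for length {length}")
        for _ in range(max_tries):
            protein = rng.choice(candidates)
            start = rng.randrange(len(protein) - length + 1)
            fragment = protein[start:start + length]
            if is_natural(fragment):
                negatives.append(fragment)
                break
        else:
            raise RuntimeError("could not draw a fragment of natural residues")
    return negatives


# Toy input: two natural peptides, one lowercase (D-form) peptide, and one
# peptide with non-standard symbols.
raw = ["RQIKIWFQNRRMKWKK", "rrrrrrrr", "GRKKRRQRRRPPQ", "AXB"]
positives = [p for p in raw if is_natural(p)]

# Toy stand-in for SwissProt sequences.
proteins = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"]
negatives = sample_negatives(proteins, [len(p) for p in positives])
```

Fixing the seed makes the negative set reproducible, so the same random peptides can be reused across experiments.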
CPPsite-2
The CPPsite database contains peptides whose penetrating efficiency ranges from low to high. For this dataset, we took 187 highly efficient CPPs as positives. Negative peptides were generated randomly from SwissProt proteins.
CPPsite-3
Here, we included the same highly efficient CPPs as in CPPsite-2 as positives and took CPPs with low efficiency as negatives. We developed this dataset because it allows us to discriminate between CPPs with high and low penetration efficiency.
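The composition of the three datasets can be summarised in a short sketch. The record layout (a sequence paired with a "high" or "low" efficiency label), the function name `build_datasets`, and the `random_negatives` callable are hypothetical and serve only as an illustration; `random_negatives(n)` stands in for any routine that returns `n` random SwissProt-derived peptides.

```python
def build_datasets(records, random_negatives):
    """Assemble the positive/negative sets of CPPsite-1, -2 and -3.

    records: list of (sequence, efficiency) pairs, with efficiency being
             "high" or "low" (hypothetical layout).
    random_negatives: callable taking n and returning n random peptides
                      (hypothetical helper).
    """
    all_cpps = [seq for seq, _ in records]
    high = [seq for seq, eff in records if eff == "high"]
    low = [seq for seq, eff in records if eff == "low"]
    return {
        # All CPPs versus an equal number of random peptides.
        "CPPsite-1": {"positive": all_cpps,
                      "negative": random_negatives(len(all_cpps))},
        # Highly efficient CPPs versus random peptides.
        "CPPsite-2": {"positive": high,
                      "negative": random_negatives(len(high))},
        # Highly efficient CPPs versus low-efficiency CPPs.
        "CPPsite-3": {"positive": high, "negative": low},
    }


# Toy records with illustrative labels, not real annotations.
toy_records = [("AAAA", "high"), ("CCCC", "low"), ("DDDD", "high")]
toy_sets = build_datasets(toy_records, lambda n: ["GGGG"] * n)
```

Only CPPsite-3 uses real peptides on both sides, so it is the only dataset that separates efficiency levels rather than CPPs from non-CPPs.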