Download Datasets



DBpred provides a standard dataset of DNA-interacting proteins obtained from HybridNAP and ProNA2020. The dataset was generated using standard techniques and contains 646 non-redundant (30%, CD-HIT) DNA-interacting protein sequences for training and 46 non-redundant DNA-interacting protein sequences for validation. The training dataset consists of 15636 DNA-interacting and 298503 non-interacting patterns; the validation dataset consists of 965 DNA-interacting and 9911 non-interacting residues. The pattern size is 17. To help users work with the dataset effectively, we provide the DNA-interacting protein sequences. Users can download the dataset by clicking on the provided link.



Dataset | Description | Files
Training | Contains 646 DNA-interacting protein sequences. Interacting residues are marked with a '+' sign and non-interacting residues with a '-' sign. |
Validation | Contains 46 DNA-interacting protein sequences. Interacting residues are marked with a '+' sign and non-interacting residues with a '-' sign. |
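
The following is a minimal sketch of how residue-level patterns of size 17 could be derived from a sequence and its '+'/'-' annotation. It is not the DBpred implementation. It assumes one pattern per residue, centred on that residue, with a hypothetical padding symbol `X` at the termini; the function name `make_patterns` and the in-memory input format (two equal-length strings) are also illustrative assumptions, since the exact layout of the downloadable files is not described here.

```python
from typing import List, Tuple

WINDOW = 17  # pattern size stated on this page
PAD = "X"    # hypothetical padding symbol; the actual choice is not documented here


def make_patterns(sequence: str, labels: str,
                  window: int = WINDOW) -> List[Tuple[str, int]]:
    """Return one (pattern, label) pair per residue.

    sequence: amino-acid sequence, one letter per residue.
    labels:   string of the same length, '+' for a DNA-interacting
              residue and '-' for a non-interacting residue.
    The pattern is centred on the residue; the label is 1 for '+', 0 for '-'.
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")

    half = window // 2
    # Pad both termini so that the first and last residues also get a full window.
    padded = PAD * half + sequence + PAD * half

    patterns = []
    for i, mark in enumerate(labels):
        if mark not in ("+", "-"):
            raise ValueError(f"unexpected label {mark!r} at position {i}")
        patterns.append((padded[i:i + window], 1 if mark == "+" else 0))
    return patterns


if __name__ == "__main__":
    # Toy example with a made-up sequence and annotation.
    for pattern, label in make_patterns("MKRATGHL", "--++--+-"):
        print(pattern, label)
```

Under these assumptions each protein of length n yields n patterns, so the number of patterns equals the number of annotated residues.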