Datasets compiled from literature (Collection of existing datasets)

We collected and compiled the following datasets from the literature. These datasets have previously been used to develop prediction methods. The list is not complete and collection is still in progress; please help us by submitting or suggesting new datasets for this collection.

Table: List of datasets compiled from published literature
| Types of Datasets | Sub Types | PDB chains in dataset | References (PubMed ID) | Original site | Download Data |
| --- | --- | --- | --- | --- | --- |
| Regular Secondary structure | helix, strand, coil | 513, 406; 1155; 24761; 60, 21; 987, 988; 83 | 25883141; 29501620; 25380779; 28968641; 23298369; 22095872 | Jpred4; PATSIM; POLYPROLINE; MemBrain; SSCon; K2D3 | CB513, CB406; PATSIM; PolyprOnline; MemBrain; SSCon; K2D3 (Table S1) |
| | Strand | 1452 | 24064422 | bcov | BetaSheet1452 |
| Irregular Secondary structure | Beta-Turns | 1296; 547; 426; 426, 6376, 20142 | 20673368; 15822097; 12592033; 25728793 | DEBT; COUDES; BTEVAL; BetaTPred3.0 | PDB1296; FA547; GR426; BT426, BT6376, BT20142 |
| | Gamma-Turns | 490; 749; 320 | 17936305; 12662358; 10878855 | sciencedirect; wiley.com; JBIOSCI | PA490; KG749; KG320 |
| | Alpha-Turns | 490; 193 | 16894602; 8652792 | Alpha Turn V1.0; wiley.com | JX490; VP193 |
| | B-Hairpin | 534 | 12177429 | BhairPred | BhairPred |
| | Beta barrel | 1881712 | 22843985 | TMBB-DB | TMBB-DB |
| PDB derived Datasets with non-redundant sequence and structural quality criteria | PDB_SELECT | 5130 | 19783827 | PDB_SELECT | PDB_SELECT_25 |
| | PDBFINDER | 168357 | 9021272 | PDBFINDER | PDBFINDER |
| | PISCES | 3023 | 12912846 | PISCES | PISCES |
| | AbDb | 1476 complete non-redundant, 57 non-redundant light chains, 177 non-redundant heavy chains | 29718130 | AbDb | AbDb |
| DNA/RNA interacting residues | DNA | 206; 500; 62; 25; 782; 488, 82 | 21069866; 19767616; 16845003; 19594868; 28381244; 28132027 | Proteins; 3dfootprint; BindN; BindN; DisBind; DRNAPred | DBP206; 3D-footprint; PDNA-62; PRINR25; DisBind; DRNAPred |
| | RNA | 147; 205; 109; 782; 488, 82 | 17483510; 20483814; 16790841; 28381244; 28132027 | PRIDB; PRNA; RNABindR; DisBind; DRNAPred | RB147; PRNA; MT109; DisBind; DRNAPred |
| DNA/RNA interacting proteins | DNA | 146; 1153; 92; 146 | 18042272 | DNAbinder | Main Dataset; Alternate Dataset; Realistic Dataset; Independent Dataset |
| | RNA | 377; 766, 326; 26306, 24228, 1678, 2662; 2241, 369; 1678 | 20677174; 26607710; 29495575; 22192482; 22192482 | RNApred; PRIdictor; RPiRLS; RPISeq; RPIntDB | RNA Binding Data, non-RNA binding data; PRIdictor data; RPiRLS data; RPISeq data; RPIntDB data |
| Nucleotide interacting residues | ATP | 168; 429; 168, 227; 227, 17, 1372 | 20021687; 29361215; 23288787; 22130595 | ATPint; ATPbind; TargetATPsite; NsitePred | ATPint168; ATPbind; TargetATPsite; NsitePred_ATP |
| | ADP | 321, 25, 1372 | 22130595 | NsitePred | NsitePred_ADP |
| | AMP | 140, 18, 1372 | 22130595 | NsitePred | NsitePred_AMP |
| | GTP | 44; 56, 6, 1372 | 20525281; 22130595 | GTPBinder; NsitePred | GTPbinder44; NsitePred_GTP |
| | GDP | 105, 9, 1372 | 22130595 | NsitePred | NsitePred_GDP |
| | NAD | 195 | 20353553 | NADbinder | NADbinder195 |
| | FAD | 198 | 20122222 | FADPred | FADPred198 |
| Metals and Ions Interacting Residues | IonCom dataset | 142 (Zn), 110 (Cu), 227 (Fe2+), 103 (Fe3+), 379 (Mn), 179 (Ca), 103 (Mg), 53 (K), 78 (Na), 62 (CO3), 22 (NO2), 303 (SO4), 339 (PO4) | 27378301 | IonCom | IonCom_dataset |
| Bacterial protein interaction | Functional interaction | 1941; 229; 79 | 19798435; 22102573; 22053087 | Bacteriome; DBETH; MimoDB | Functional interactions dataset; DBETH; MimoDB |
| | TAP interaction | 918 | 19798435 | Bacteriome | Hu et al. TAP interaction dataset |
| | Functional & TAP interaction | 2283 | 19798435 | Bacteriome | Combined interactions dataset |
| | Experimental interaction | 2291 | 19798435 | Bacteriome | Extended interaction dataset |
| Protein crystallization | Propensity Dataset 1 | 3958 | 18285371 | SSPF Crystallisation Propensity Predictors (Main server non-functional) | ParCrys Datasets |
| | Propensity Dataset 2 | 144, 500 | 19755114 | MetaPPCP (Main server non-functional) | MetaPPCP Datasets |
| | Protein crystallization, purification and production propensity dataset 1 | 3587, 3585 | 21685077 | PPCpred | PPCpred Dataset |
| | Protein crystallization, purification and production propensity dataset 2 | 5383, 23348 | 25148528 | PredPPCrys (Main server non-functional) | PredPPCrys Dataset |
| | Protein crystallization, purification and production propensity dataset 3 | 5383, 23348, 11946 | 26906024 | Crysalis | Crysalis Dataset |
| | Protein crystallization and propensity dataset | 1197, 2378 | 24019868 | SCMCRYS (Main server non-functional) | SCMCRYS Dataset |
| Helix packing | Helix packing | 610 | wiki.c2b2 | Helix packing patterns | Helix packing dataset |
| Membrane proteins | Homologous Membrane Proteins | 36 | 16648166 | HOMEP | homep datasets |
| | Transmembrane Proteins | 247 | 15111065 | Phobius | Phobius |
| | Membrane Proteins | 3249 trainset, 4333 testset, 7695 non-membrane proteins | 22386149 | ProClusEnsem | ProClusEnsem |
| Dihedral angles | Dihedral angles dataset | 513, 80, 175, 179, 212, 1989, 1988 | 20025785 | DISSPred | disspred datasets |
| | Protein backbone dihedral angles | 1267, 1267, 85, 40, 5046 | 29745828 | RaptorX-Angle | RaptorX-Angle datasets |
| | Dihedral angles from chemical shifts and/or homology | 141, 31, 15 | 16845087 | PREDITOR | PREDITOR datasets |
| | Protein Backbone Torsion Angle | 500, 460, 1029 | 18923703 | ANGLOR | ANGLOR datasets |
| | Protein Torsion Angle | 1552, 11 | 28923002 | DNTor | DNTor datasets |
| Surface accessibility | Surface accessibility dataset | 215 | 11170200 | Protein surface accessibility | Manesh-215 |
| | AcconPred | 5729, 945 | 26339631 | Solvent Accessibility and Contact Number Simultaneously | AcconPred |
| Rotamer Libraries | Dunbrack Rotamer Libraries | 850 | 12163064 | Dunbrack Rotamer Libraries | Dunbrack Rotamer Libraries |
| | Tuffery et al.'s rotamer libraries | 2926 | 12557186 | Backbone independent; Backbone Dependent | Tuffery's Backbone independent; Tuffery's Backbone dependent |
| | Penultimate Rotamer Library | 240 | 10861930 | Penultimate Rotamer Library | Penultimate Rotamer Library |
| | Kirys et al.'s Rotamer Library | 233 | 22544766 | Kirys Rotamer Library | Kirys Rotamer Library (Only Table S2 & S3) |