We provide both the positive and negative datasets free of charge; users can download them. The sequences were downloaded from Swiss-Prot and checked manually. We removed all GST protein sequences annotated as putative, fragment, or by similarity.
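The annotation-based filtering described above could be sketched as follows. This is only an illustration: the source does not say how the putative, fragment, and by-similarity entries were detected, so the keyword list and the function names `is_excluded` and `filter_records` are assumptions, and the sketch treats each record as a plain (description, sequence) pair rather than parsing a Swiss-Prot file.

```python
# Assumed keywords: the source says only that putative, fragment, and
# by-similarity entries were removed, not how they were identified.
EXCLUDED_TERMS = ("putative", "fragment", "by similarity")


def is_excluded(description: str) -> bool:
    """Return True if the description contains any excluded annotation term."""
    text = description.lower()
    return any(term in text for term in EXCLUDED_TERMS)


def filter_records(records):
    """Keep only (description, sequence) pairs whose description is not excluded."""
    return [(desc, seq) for desc, seq in records if not is_excluded(desc)]
```

A keyword match like this is a coarse first pass; since the sequences were also checked manually, it would complement rather than replace that review.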
These datasets contain only sequences that share less than 90% sequence similarity. Any sequence with more than 90% sequence similarity to another was removed from the datasets using the CD-HIT software.
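A redundancy-reduction step of this kind could be run roughly as shown below. This is a minimal sketch, not the exact command used to build these datasets: the file names are hypothetical, it assumes the `cd-hit` executable is installed and on the PATH, and it uses CD-HIT's `-c` option for the identity threshold with word length `-n 5`, the value the CD-HIT documentation suggests for thresholds between 0.7 and 1.0.

```python
import subprocess


def build_cd_hit_command(input_fasta, output_fasta, identity=0.9):
    """Build the argument list for a CD-HIT run at the given identity threshold."""
    return [
        "cd-hit",
        "-i", input_fasta,    # input FASTA file
        "-o", output_fasta,   # output FASTA file of retained sequences
        "-c", str(identity),  # sequence identity threshold (0.9 = 90%)
        "-n", "5",            # word length suited to thresholds of 0.7 to 1.0
    ]


def run_cd_hit(input_fasta, output_fasta, identity=0.9):
    """Run CD-HIT; raises CalledProcessError if the program exits with an error."""
    subprocess.run(build_cd_hit_command(input_fasta, output_fasta, identity), check=True)
```

For example, `run_cd_hit("gst_positive.fasta", "gst_positive_nr90.fasta")` would write the reduced set to the second file, where both file names are placeholders.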