Follow the steps below when using the DNAbinder server:
Input sequence: A sequence can be submitted in two ways: paste it directly into the text box provided, or upload a file using the "BROWSE" option. Sequences must be in FASTA format and use the single-letter amino acid code. All non-standard characters (anything other than ACDEFGHIKLMNPQRSTVWY) are removed from the sequence. Batch submission is allowed for amino acid composition based prediction. For PSSM profile based prediction, only one sequence can be predicted at a time; if more than one sequence is submitted, only the first sequence is considered.
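The input rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the server's own code: the function names (`parse_fasta`, `clean_sequence`, `prepare_input`) and the `mode` values are hypothetical, and converting lower-case letters to upper case before filtering is an assumption.

```python
# Illustrative sketch only; names and the upper-casing step are assumptions.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_sequence(seq):
    """Drop every character that is not one of the 20 standard residues."""
    return "".join(c for c in seq.upper() if c in STANDARD)


def prepare_input(text, mode):
    """Clean all records; keep only the first one when mode is 'pssm'."""
    records = [(h, clean_sequence(s)) for h, s in parse_fasta(text)]
    if mode == "pssm":
        return records[:1]  # PSSM mode handles one sequence at a time
    return records          # composition mode allows batch submission
```

Running a sequence through `clean_sequence` locally before submission shows exactly which residues the prediction will be based on.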
Prediction approach: Two prediction approaches are available, distinguished by the input vector used to train the SVM:
- Amino acid composition
- PSSM profile
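As a rough illustration of the first input type, amino acid composition is commonly computed as the percentage of each of the 20 standard residues in the sequence, giving a 20-dimensional vector. The sketch below assumes that definition; the function name `amino_acid_composition` and the alphabetical residue order are choices made here for illustration and may differ from what the server uses internally. A PSSM profile is not shown because it requires running PSI-BLAST against a sequence database.

```python
# Illustrative sketch; the percentage definition and residue order are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element list of residue percentages for a protein sequence."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One percentage per residue type, in the order given by AMINO_ACIDS.
    return [100.0 * residues.count(aa) / n for aa in AMINO_ACIDS]
```

Because the vector depends only on residue frequencies, sequences of any length map to the same fixed-size input, which is what allows batch submission in this mode.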
We used three different datasets and developed SVM models with amino acid composition and PSSM profiles as input:
- Main dataset: This module was developed using DNA-binding and non-binding protein chains. It is therefore best suited when a domain sequence is submitted for prediction.
- Realistic dataset: This module was developed to reflect the real-life situation in which the ratio of DNA-binding to non-binding proteins is 1:10. It should be the ideal choice when high specificity is desired during prediction.
- Alternate dataset: If the input sequence is a full-length protein, this module should be the ideal choice, because its SVM models were developed using full-length protein sequences.
SVM threshold: The prediction threshold is the most important parameter of the prediction. The DNAbinder server accepts thresholds in the range -1 to +1 (default = 0). If the prediction score of the query sequence is greater than the specified threshold, the sequence is predicted as DNA-binding; otherwise it is predicted as a non-DNA-binding protein. To obtain fewer false positives, choose a higher threshold; to obtain fewer false negatives, choose a lower one. In summary, a high threshold gives high specificity, while a low threshold gives high sensitivity.
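The decision rule described above can be written as a small function. This is a minimal sketch: the name `classify`, the label strings, and the example scores are hypothetical and are not output from the server.

```python
# Illustrative sketch of the threshold rule; scores below are made-up examples.
def classify(score, threshold=0.0):
    """Label a sequence DNA-binding when its SVM score exceeds the threshold."""
    if not -1.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between -1 and +1")
    return "DNA-binding" if score > threshold else "non-DNA-binding"


# Hypothetical scores for three query sequences.
scores = {"seq1": 0.8, "seq2": 0.2, "seq3": -0.4}

for threshold in (-0.5, 0.0, 0.5):
    positives = [name for name, s in scores.items()
                 if classify(s, threshold) == "DNA-binding"]
    print(threshold, positives)
```

Raising the threshold shrinks the list of sequences called DNA-binding (fewer false positives, higher specificity), while lowering it enlarges the list (fewer false negatives, higher sensitivity).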
Output: The output shows the SVM score along with the prediction result.