We designed the icaars web server to be user-friendly, so that any user can submit protein sequences for the prediction and classification of aaRSs. This page describes how to use icaars. The job submission page includes the following options and sections, which you can use according to your needs.
Job Title [optional]: The user can give the prediction job a name.
E-mail [optional]: Some jobs need more computation power and time. It is advisable to provide an e-mail address so that the job can keep running, instead of waiting a long time for the result to be displayed on the page itself. The user can then stop waiting and submit a new job. All job results will be mailed to the user.
Sequence Input: The user can paste/upload protein sequences in FASTA format.
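As an illustration of the expected input, the following minimal Python sketch reads FASTA-formatted text into (header, sequence) pairs. The function name parse_fasta is hypothetical and is not part of icaars; the server's own parser may behave differently, for example in how it treats blank lines or lowercase residues.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined and upper-cased (an assumption).
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 example protein\nMKVLA\nGHTR\n>seq2\nacdef\n"
for name, seq in parse_fasta(example):
    print(name, len(seq))
```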
Monopeptide composition based prediction: In this method, the percentage composition of all 20 amino acids was calculated and then used to derive a weight corresponding to each amino acid. The weights were obtained by subtracting the composition data. To classify an unknown protein, its amino acid composition is calculated and each value is multiplied by the corresponding weight. The 20 values obtained in this way are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as a non-aminoacyl-tRNA synthetase; otherwise, it is classified as an aminoacyl-tRNA synthetase.
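The scoring scheme described above can be sketched in Python as follows. This is only an illustration: the weights shown are made-up placeholders, not the weights used by icaars, and the function names are hypothetical. The sketch also assumes that non-standard residues are ignored when computing percentages.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def monopeptide_composition(sequence):
    """Return the percentage composition of the 20 standard amino acids."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * residues.count(aa) / total for aa in AMINO_ACIDS}


def cumulative_score(composition, weights):
    """Multiply each composition value by its weight and sum the products."""
    return sum(value * weights.get(aa, 0.0) for aa, value in composition.items())


def classify(score):
    """A score below 0 means non-aaRS; anything else is called aaRS."""
    return "non-aaRS" if score < 0 else "aaRS"


# Placeholder weights for illustration only (unlisted residues weigh 0).
toy_weights = {"A": 1.0, "C": -3.0}
comp = monopeptide_composition("AACD")
score = cumulative_score(comp, toy_weights)
print(score, classify(score))
```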
Dipeptide composition based prediction: In this method, the percentage composition of all 400 (20x20) dipeptides was calculated and then used to derive a weight corresponding to each dipeptide. The weights were obtained by subtracting the composition data. To classify an unknown protein, its dipeptide composition is calculated and each value is multiplied by the corresponding weight. The 400 values obtained in this way are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as a non-aminoacyl-tRNA synthetase; otherwise, it is classified as an aminoacyl-tRNA synthetase. For the 30% non-redundant dataset, we used WEKA to select 18 dipeptides for discriminating between aaRSs and non-aaRSs and 14 dipeptides for discriminating between class-1 and class-2 aaRSs, and calculated the composition of only those dipeptides.
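A minimal Python sketch of the dipeptide composition is given below. It assumes overlapping dipeptides (positions 1-2, 2-3, and so on) and skips any pair that contains a non-standard residue; both choices are assumptions, as is the function naming. The list passed to selected_composition is a placeholder, since the 18 and 14 selected dipeptides are not listed here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 (20x20) possible dipeptides, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the percentage composition of all 400 dipeptides."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: 100.0 * n / total for dp, n in counts.items()}


def selected_composition(sequence, selected):
    """Keep only the composition values of a chosen subset of dipeptides."""
    full = dipeptide_composition(sequence)
    return [full[dp] for dp in selected]


# Placeholder subset for illustration; not the dipeptides selected by WEKA.
print(selected_composition("ACAC", ["AC", "CA", "GG"]))
```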
Hybrid approach1 [Monopeptide composition and PROSITE] based prediction: In this method, the percentage composition of all 20 monopeptides was calculated and combined with four selected PROSITE motifs (PS50862, PS00178, PS50860 and PS50861) to create a hybrid approach. An SVM-based model was prepared using a total of 24 (20+4) input vectors for each sequence, which in turn were used to derive the weight corresponding to each monopeptide and PROSITE motif. The weights were obtained by subtracting the composition data and motif data. To classify an unknown protein, its monopeptide composition and PROSITE motif features are calculated and each value is multiplied by the corresponding weight. The 24 values obtained in this way are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as a non-aminoacyl-tRNA synthetase; otherwise, it is classified as an aminoacyl-tRNA synthetase. For the prediction between class-1 and class-2 aaRSs, we added one additional signature profile, PS50889, giving 25 (20+5) input vectors.
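The way the composition and motif features are combined can be sketched in Python as below. This sketch assumes that each PROSITE motif contributes a single 0/1 value indicating whether the motif was found in the sequence, and that the motif hits come from a separate PROSITE scan supplied as a set of accession IDs. The text does not state how motifs are encoded, so this is an assumption, and the function name hybrid_vector is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Motifs used for aaRS vs. non-aaRS prediction (20 + 4 = 24 values).
AARS_MOTIFS = ["PS50862", "PS00178", "PS50860", "PS50861"]
# PS50889 is added for class-1 vs. class-2 prediction (20 + 5 = 25 values).
CLASS_MOTIFS = AARS_MOTIFS + ["PS50889"]


def hybrid_vector(sequence, motif_hits, motifs=AARS_MOTIFS):
    """Concatenate amino acid composition with binary motif flags.

    motif_hits is a set of PROSITE accession IDs found in the sequence,
    obtained from an external scan (assumed, not computed here).
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    composition = [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]
    flags = [1.0 if motif in motif_hits else 0.0 for motif in motifs]
    return composition + flags


vector = hybrid_vector("AACD", {"PS00178"})
print(len(vector), vector[20:])
```

The same pattern would extend to Hybrid approach2 by replacing the 20 composition values with the 400 dipeptide values.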
Hybrid approach2 [Dipeptide composition and PROSITE] based prediction: In this method, the percentage composition of all 400 (20x20) dipeptides was calculated and combined with four selected PROSITE motifs (PS50862, PS00178, PS50860 and PS50861) to create a hybrid approach. An SVM-based model was prepared using a total of 404 (400+4) input vectors for each sequence, which in turn were used to derive the weight corresponding to each dipeptide and PROSITE motif. The weights were obtained by subtracting the composition data and motif data. To classify an unknown protein, its dipeptide composition and PROSITE motif features are calculated and each value is multiplied by the corresponding weight. The 404 values obtained in this way are summed to give the cumulative score. If the cumulative score is less than 0, the protein is classified as a non-aminoacyl-tRNA synthetase; otherwise, it is classified as an aminoacyl-tRNA synthetase. For the prediction between class-1 and class-2 aaRSs, we added one additional signature profile, PS50889, giving 405 (400+5) input vectors. For the 30% non-redundant dataset, we combined the domain-based features with the 18 selected dipeptides for discriminating between aaRSs and non-aaRSs and with the 14 selected dipeptides for discriminating between class-1 and class-2 aaRSs.
Sequence Redundancy Options: We have developed separate prediction methods for different levels of sequence redundancy (100%, 90%, 80%, 70%, 60%, 50%, 40% and 30%). The user can select any one of these methods for the prediction of aaRSs. The default prediction method is based on 30% redundancy.
Realistic dataset based prediction: For this prediction, we created a realistic dataset containing 117 aaRSs and 1200 non-aaRSs (a ratio of nearly 1:10 between aaRS/positive and non-aaRS/negative instances). We developed models using Hybrid approach2 (selected dipeptide composition and domain features) for discriminating aaRSs from non-aaRSs on the above 30% non-redundant dataset.
SVM Threshold Options: The method is based on a Support Vector Machine (SVM). Depending on the threshold value the user chooses, the SVM classifies the unknown protein as an aaRS or a non-aaRS protein. The default threshold is 0.0. If the user wants higher specificity at the cost of lower sensitivity, a higher threshold should be specified; if the opposite is desired, a lower threshold should be chosen. The outcome therefore depends on the trade-off between sensitivity and specificity.
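The effect of the threshold on sensitivity and specificity can be illustrated with a short Python sketch. The scores and labels below are made-up toy values, not icaars output, and the function names are hypothetical. The sketch assumes that a score at or above the threshold is predicted as aaRS.

```python
def predict_aars(score, threshold=0.0):
    """Predict aaRS when the SVM score reaches the threshold."""
    return score >= threshold


def sensitivity_specificity(scores, labels, threshold):
    """Compute (sensitivity, specificity) for known labels at a threshold."""
    tp = fn = tn = fp = 0
    for score, is_aars in zip(scores, labels):
        predicted = predict_aars(score, threshold)
        if is_aars and predicted:
            tp += 1
        elif is_aars:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data for illustration only: True marks a known aaRS.
toy_scores = [1.2, 0.4, -0.1, 0.3, -0.8, -1.5]
toy_labels = [True, True, True, False, False, False]
for threshold in (-0.5, 0.0, 0.5):
    print(threshold, sensitivity_specificity(toy_scores, toy_labels, threshold))
```

On this toy data, raising the threshold from -0.5 to 0.5 lowers sensitivity and raises specificity, which is the trade-off described above.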