Prediction of neurotoxins based on their function and source
Running Title: Neurotoxins prediction
Address correspondence to:
Dr. G. P. S. Raghava, Professor, Department of Computational Biology Institute of Microbial
Technology Okhla Phase 3,
New Delhi, INDIA, Phone: +91-11-26907444; Fax: +91-172-26907444 E-mail: raghava@iiitd.ac.in
A feed-forward neural network (FNN) and a recurrent neural network (RNN) were used to classify neurotoxins and non-toxins. Each network had a single hidden layer, and different numbers of hidden nodes were tested. The performance of the best model at different thresholds is shown below.
Table S1: The performance of the FNN with 35 hidden nodes on classifying neurotoxin sequences and non-toxin sequences
Threshold | Sensitivity | Specificity | Accuracy | PPV | MCC
1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
0.8000 | 0.0018 | 0.2000 | 0.1013 | 0.2000 | 0.0133
0.7000 | 0.0246 | 0.7983 | 0.4131 | 0.7600 | 0.0859
0.6000 | 0.5193 | 0.8000 | 0.6603 | 0.8882 | 0.4059
0.5500 | 0.7035 | 0.7948 | 0.7493 | 0.8886 | 0.5346
0.5000 | 0.8965 | 0.7878 | 0.8419 | 0.8839 | 0.6893
0.4500 | 0.9614 | 0.6783 | 0.8192 | 0.7907 | 0.6450
0.4000 | 0.9912 | 0.1896 | 0.5886 | 0.5500 | 0.2730
0.3000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
0.2000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
0.1000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
Table S2: The performance of the RNN with 60 hidden nodes on classifying neurotoxin sequences and non-toxin sequences
Threshold | Sensitivity | Specificity | Accuracy | PPV | MCC
1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
0.9000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
0.8000 | 0.0035 | 0.4000 | 0.2026 | 0.4000 | 0.0266
0.7000 | 0.1581 | 0.9913 | 0.5769 | 0.9450 | 0.2690
0.6000 | 0.5544 | 0.9913 | 0.7738 | 0.9844 | 0.6074
0.5500 | 0.7281 | 0.9913 | 0.8603 | 0.9881 | 0.7464
0.5000 | 0.8228 | 0.9809 | 0.9022 | 0.9771 | 0.8144
0.4500 | 0.8912 | 0.9635 | 0.9275 | 0.9603 | 0.8572
0.4000 | 0.9877 | 0.7548 | 0.8707 | 0.8004 | 0.7633
0.3000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
0.2000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
0.1000 | 0.9912 | 0.0087 | 0.4978 | 0.4978 | -0.0004
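The measures reported in Tables S1 and S2 can be computed from a confusion matrix obtained by thresholding the network output. The sketch below is a minimal illustration, assuming that a sequence is called a neurotoxin when its score is at or above the threshold. The function names and the toy scores are hypothetical and are not taken from the study.

```python
import math


def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FN, TN, FP when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, PPV and MCC as a dict."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 when a measure is undefined (zero denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "ppv": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy data (hypothetical): 1 = neurotoxin, 0 = non-toxin.
scores = [0.9, 0.7, 0.55, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(*confusion_at_threshold(scores, labels, 0.5))
print(metrics)
```

Sweeping the threshold over a grid such as 0.1 to 1.0 and recording these five values at each step would give a table of the same shape as Tables S1 and S2.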
The performance of PSI-BLAST for the prediction of neurotoxins at different E-values is shown below.
Table S3: Results of PSI-BLAST at various E-values on the neurotoxins dataset using five-fold cross-validation.
E-value | Hits (570) | False positives (570)
1.0 | 390 (68.42%) | 42
10^-1 | 391 (68.60%) | 32
10^-2 | 391 (68.60%) | 28
10^-3 | 387 (67.89%) | 23
10^-6 | 343 (60.18%) | 19
10^-9 | 281 (49.30%) | 13
MAST was evaluated at different E-values using MEME matrices built from the neurotoxin sequences under five-fold cross-validation. Five MEME matrices were created, one for each learning set (each learning set comprising four of the five subsets). Each matrix was then used as the input file for searching motifs in the remaining subset (the testing set) using the program MAST.
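The five-fold scheme described above can be sketched as follows. This is only an illustration of the partitioning: the function name, the random shuffling and the seed are assumptions, and the MEME and MAST runs themselves are indicated only as comments.

```python
import random


def five_fold_splits(items, n_folds=5, seed=0):
    """Yield (learning_set, testing_set) pairs, one pair per fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into n_folds subsets of near-equal size.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for test_index in range(n_folds):
        testing_set = folds[test_index]
        learning_set = [
            item
            for index, fold in enumerate(folds)
            if index != test_index
            for item in fold
        ]
        yield learning_set, testing_set


# Hypothetical identifiers standing in for the 570 neurotoxin sequences.
sequence_ids = [f"seq{i}" for i in range(570)]
for learning_set, testing_set in five_fold_splits(sequence_ids):
    # A motif matrix would be built from learning_set (four subsets)
    # and then searched against testing_set (the remaining subset).
    print(len(learning_set), len(testing_set))
```

With 570 sequences each learning set holds 456 sequences and each testing set 114, and every sequence is tested exactly once.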
Table S4: MEME/MAST results for the classification of neurotoxins and non-toxins at different E-values
E-value | Neurotoxin (570) | Non-toxin (570)
0.0001 | 165 (28.95%) | 4 (0.7%)
0.001 | 174 (29.82%) | 4 (0.7%)
0.01 | 184 (32.28%) | 4 (0.7%)
0.1 | 205 (35.96%) | 5 (0.87%)
1 | 246 (43.16%) | 13 (2.28%)
10 | 293 (51.40%) | 64 (11.22%)
20 | 311 (54.56%) | 99 (17.36%)
50 | 353 (61.92%) | 196 (34.38%)
100 | 395 (69.29%) | 311 (54.56%)
Approach | Sensitivity | Specificity | PPV | Accuracy | MCC
Composition (C) | 96.32% | 97.22% | 97.72% | 97.72% | 0.9416
Dipeptide | 93.68% | 98.42% | 98.41% | 96.05% | 0.9247
MEME/MAST (E-value 0.1) | 35.96% | 0.87% | - | - | -
MEME/MAST (E-value 0.01) | 32.28% | 0.7% | - | - | -
MEME/MAST (E-value 0.001) | 29.82% | 0.7% | - | - | -
MEME/MAST (E-value 0.0001) | 28.94% | 0.7% | - | - | -
FNN (0.5 threshold) | 89.65% | 78.78% | 88.39% | 84.19% | 0.6890
RNN (0.45 threshold) | 89.12% | 96.35% | 96.03% | 92.75% | 0.8572
(C) + MEME/MAST (E-value 0.1) | 96.84% | 96.84% | 96.84% | 96.84% | 0.9368
(C) + MEME/MAST (E-value 0.01) | 96.67% | 97.02% | 97.00% | 96.84% | 0.9368
(C) + MEME/MAST (E-value 0.001) | 96.49% | 97.02% | 97.00% | 96.75% | 0.9351
(C) + MEME/MAST (E-value 0.0001) | 96.49% | 97.02% | 97.00% | 96.75% | 0.9351
Composition + length | 97.54% | 97.19% | 97.25% | 97.37% | 0.9485
Dipeptide + length | 96.67% | 95.09% | 95.28% | 95.88% | 0.9195
Table S6: MEME/MAST results for the classification of neurotoxins from different sources at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the remaining columns the value in brackets is the number of false positives.
Source | Ev 0.0001 | Ev 0.001 | Ev 0.01 | Ev 0.1 | Ev 1 | Ev 10 | Ev 50
Arthropoda (310) | 131(1) | 141(1) | 148(6) | 166(15) | 190(33) | 245(80) | 273(159)
Bacteria (10) | - | - | - | - | - | - | -
Chordata (135) | 104(0) | 105(0) | 107(0) | 109(1) | 112(9) | 113(34) | 124(215)
Cnidaria (20) | 12(0) | 12(0) | 12(1) | 12(1) | 12(7) | 13(52) | 14(136)
Mollusca (95) | 42(0) | 42(0) | 43(0) | 47(3) | 52(22) | 58(65) | 68(177)
Total (570) | 289(1) | 300(1) | 310(7) | 334(20) | 366(71) | 429(231) | 479(528)
Table S7: MEME/MAST results for the classification of neurotoxins based on target of action at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the remaining columns the value in brackets is the number of false positives.
Function | Ev 0.0001 | Ev 0.001 | Ev 0.01 | Ev 0.1 | Ev 1 | Ev 10 | Ev 50
BIC (330) | 66(0) | 70(0) | 78(1) | 89(1) | 105(13) | 145(25) | 205(25)
BAR (85) | 60(2) | 61(2) | 64(3) | 67(4) | 67(9) | 67(35) | 73(109)
IAR1 (5) | 5(0) | 5(0) | 5(2) | 5(7) | 5(26) | 5(95) | 5(282)
IAR2 (20) | 17(1) | 18(1) | 19(1) | 19(3) | 19(14) | 20(46) | 20(156)
FAR (10) | 6(2) | 6(8) | 6(22) | 6(44) | 6(79) | 6(196) | 7(321)
Total (450) | 154(5) | 160(11) | 172(29) | 186(59) | 202(131) | 243(397) | 310(893)
Table S8: MEME/MAST results for the classification of ion channel blockers at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the remaining columns the value in brackets is the number of false positives.
Blockers of ion channels | Ev 0.0001 | Ev 0.001 | Ev 0.01 | Ev 0.1 | Ev 1 | Ev 10 | Ev 50
Calcium (80) | 19(4) | 21(4) | 24(4) | 28(5) | 32(9) | 46(36) | 65(109)
Chloride (5) | 5(0) | 5(1) | 5(7) | 5(13) | 5(41) | 5(67) | 5(189)
Potassium (90) | 38(1) | 42(1) | 47(4) | 53(11) | 62(28) | 71(85) | 82(168)
Sodium (150) | 78(4) | 81(5) | 83(7) | 84(11) | 94(19) | 100(51) | 113(115)
Total (325) | 140(9) | 149(11) | 159(22) | 170(40) | 193(97) | 222(239) | 265
The performance of the various approaches used for classifying neurotoxins by source and by function, and for sub-classifying ion channel blockers, is shown below, including the hybrid approaches at different E-values.
Table S9. The performance of various approaches used in the classification of neurotoxins based on source.
Approach | Eubacteria ACC | Eubacteria MCC | Cnidaria ACC | Cnidaria MCC | Mollusca ACC | Mollusca MCC | Arthropoda ACC | Arthropoda MCC | Chordata ACC | Chordata MCC | Overall ACC
Composition (A) | 100 | .9134 | 30.00 | .4441 | 63.16 | .5804 | 86.45 | .6426 | 78.52 | .7495 | 78.94
Dipeptide (B) | 100 | .8671 | 50.00 | .6776 | 76.84 | .7426 | 92.58 | .8069 | 90.37 | .8911 | 88.07
A + length (C) | 90 | .8656 | 50.00 | .5893 | 70.53 | .7433 | 95.16 | .7385 | 76.30 | .7854 | 84.91
B + length (D) | 90 | .8656 | 55.00 | .6169 | 83.16 | .7998 | 94.19 | .8075 | 80.74 | .8079 | 87.72
PSI-BLAST (E) | 90 | - | 70.00 | - | 54.74 | - | 67.74 | - | 90.37 | - | 71.40
Meme/Mast (F1) | - | - | 60.00 | - | 49.47 | - | 53.54 | - | 80.74 | - | 58.59
Meme/Mast (F2) | - | - | 60.00 | - | 45.26 | - | 47.74 | - | 79.26 | - | 54.38
Meme/Mast (F3) | - | - | 60.00 | - | 44.21 | - | 45.48 | - | 77.78 | - | 52.63
Meme/Mast (F4) | - | - | 60.00 | - | 44.21 | - | 42.26 | - | 77.04 | - | 50.70
Hybrid1 (E+A) | 100 | .8421 | 70.00 | .7757 | 78.95 | .7527 | 92.58 | .8409 | 97.04 | .9518 | 90.70
Hybrid2 (E+B) | 100 | .8128 | 70.00 | .8321 | 81.05 | .7726 | 93.87 | .8692 | 96.30 | .9421 | 91.58
Hybrid3 (E+C) | 90 | .8182 | 70.00 | .8025 | 80.00 | .8361 | 97.74 | .8611 | 91.11 | .9314 | 92.10
Hybrid4 (E+D) | 90 | .8182 | 70.00 | .8026 | 85.26 | .8176 | 95.48 | .8762 | 90.37 | .8970 | 91.58
Hybrid5 (F1+A) | 100 | .8421 | 65.00 | .7430 | 77.89 | .7145 | 90.97 | .8128 | 95.56 | .9418 | 89.12
Hybrid6 (F1+B) | 100 | .7863 | 65.00 | .8011 | 82.11 | .7799 | 93.23 | .8515 | 93.33 | .9126 | 90.53
Hybrid7 (F1+C) | 90 | .8182 | 60.00 | .7373 | 77.89 | .7968 | 96.13 | .8053 | 84.44 | .8713 | 88.95
Hybrid8 (F1+D) | 90 | .8182 | 60.00 | .7373 | 85.26 | .8068 | 94.19 | .8269 | 83.70 | .8365 | 88.94
Hybrid9 (F2+A) | 100 | .8421 | 65.00 | .7430 | 74.74 | .6918 | 90.65 | .7985 | 95.56 | .9371 | 88.43
Hybrid10 (F2+B) | 100 | .7863 | 65.00 | .8011 | 80.00 | .7600 | 92.90 | .8408 | 93.33 | .9126 | 88.99
Hybrid11 (F2+C) | 90 | .8182 | 60.00 | .7373 | 77.89 | .7968 | 96.13 | .8053 | 84.44 | .8713 | 88.95
Hybrid12 (F2+D) | 90 | .8182 | 60.00 | .7373 | 84.21 | .7996 | 94.19 | .8234 | 83.70 | .8365 | 88.77
Hybrid13 (F3+A) | 100 | .8421 | 65.00 | .7430 | 73.68 | .6842 | 90.65 | .7877 | 94.07 | .9270 | 87.89
Hybrid14 (F3+B) | 100 | .7863 | 65.00 | .8011 | 78.95 | .7474 | 92.58 | .8338 | 93.33 | .9126 | 89.65
Hybrid15 (F3+C) | 90 | .8182 | 60.00 | .7373 | 76.84 | .7897 | 96.13 | .7953 | 82.96 | .8612 | 88.42
Hybrid16 (F3+D) | 90 | .8182 | 60.00 | .7373 | 83.16 | .7925 | 94.19 | .8165 | 82.96 | .8313 | 88.42
Hybrid17 (F4+A) | 100 | .8421 | 65.00 | .7430 | 73.68 | .6842 | 90.32 | .7842 | 94.07 | .9223 | 87.72
Hybrid18 (F4+B) | 100 | .7863 | 65.00 | .8011 | 78.95 | .7474 | 92.58 | .8338 | 93.33 | .9126 | 89.65
Hybrid19 (F4+C) | 90 | .8182 | 60.00 | .7373 | 76.84 | .7897 | 95.81 | .7914 | 82.96 | .8561 | 88.25
Hybrid20 (F4+D) | 90 | .8182 | 60.00 | .7373 | 83.16 | .7925 | 94.19 | .8165 | 82.96 | .8313 | 88.42
ACC: Accuracy; MCC: Matthews correlation coefficient. The MEME/MAST accuracies for F1 to F4 match the hit counts in Table S6 at E-values 0.1, 0.01, 0.001 and 0.0001, respectively.
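The Overall ACC column appears to be the size-weighted mean of the per-class accuracies, that is, the total number of correct predictions divided by the total number of sequences. The short check below works this through for the Composition (A) row. It assumes the class sizes listed in Table S6 and that the Eubacteria column corresponds to the Bacteria class there; the function name is hypothetical.

```python
# Class sizes as listed in Table S6 (assumed to apply to Table S9 as well).
class_sizes = {
    "Eubacteria": 10,
    "Cnidaria": 20,
    "Mollusca": 95,
    "Arthropoda": 310,
    "Chordata": 135,
}

# Per-class accuracy (%) for Composition (A); the Cnidaria value is read as 30.00.
class_accuracy = {
    "Eubacteria": 100.0,
    "Cnidaria": 30.00,
    "Mollusca": 63.16,
    "Arthropoda": 86.45,
    "Chordata": 78.52,
}


def overall_accuracy(sizes, accuracies):
    """Size-weighted mean accuracy in percent."""
    correct = sum(sizes[name] * accuracies[name] / 100.0 for name in sizes)
    return 100.0 * correct / sum(sizes.values())


overall = overall_accuracy(class_sizes, class_accuracy)
print(f"{overall:.2f}")
```

The result is about 78.9%, in line with the tabulated overall accuracy of 78.94 for this row.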
Table S10. The performance of various approaches used in the classification of neurotoxins based on function.
Approach | BIC ACC | BIC MCC | BAR ACC | BAR MCC | IAR1 ACC | IAR1 MCC | IAR2 ACC | IAR2 MCC | FAR ACC | FAR MCC | Overall ACC
Composition (A) | 87.58 | .6255 | 75.29 | .6902 | 100.00 | 1.000 | 65.00 | .5860 | 30.00 | .2209 | 83.11
Dipeptide (B) | 94.24 | .7664 | 85.88 | .8318 | 100.00 | .9406 | 90.00 | .9199 | 30.00 | .3406 | 91.10
A + length (C) | 94.24 | .6857 | 69.41 | .6737 | 100.00 | 1.000 | 85.00 | .8752 | 50.00 | .6219 | 88.22
B + length (D) | 95.45 | .8805 | 97.65 | .9660 | 100.00 | 1.000 | 90.00 | .8035 | 60.00 | .7625 | 94.88
PSI-BLAST (E) | 52.73 | - | 84.71 | - | 100.00 | - | 100.00 | - | 70.00 | - | 61.78
Meme/Mast (F1) | 26.97 | - | 78.82 | - | 100.00 | - | 95.00 | - | 60.00 | - | 41.33
Meme/Mast (F2) | 23.64 | - | 75.29 | - | 100.00 | - | 95.00 | - | 60.00 | - | 38.22
Meme/Mast (F3) | 21.21 | - | 71.64 | - | 100.00 | - | 90.00 | - | 60.00 | - | 35.53
Meme/Mast (F4) | 20.00 | - | 70.59 | - | 100.00 | - | 85.00 | - | 60.00 | - | 34.22
Hybrid1 (E+A) | 90.61 | .7843 | 85.88 | .7715 | 100.00 | 1.00 | 100.00 | .8536 | 70.00 | .5598 | 89.83
Hybrid2 (E+B) | 93.33 | .8130 | 88.24 | .8232 | 100.00 | .9118 | 100.00 | .9512 | 70.00 | .6300 | 92.22
Hybrid3 (E+C) | 95.45 | .8418 | 85.88 | .8259 | 100.00 | 1.00 | 100.00 | .9292 | 80.00 | .8399 | 93.55
Hybrid4 (E+D) | 94.85 | .8869 | 97.65 | .9501 | 100.00 | 1.00 | 100.00 | .7666 | 70.00 | .8338 | 95.11
Hybrid5 (F1+A) | 90.91 | .7631 | 82.35 | .7640 | 100.00 | 1.00 | 95.00 | .8247 | 80.00 | .6027 | 89.33
Hybrid6 (F1+B) | 94.24 | .8277 | 88.24 | .8420 | 100.00 | .9118 | 100.00 | .9748 | 80.00 | .6940 | 93.11
Hybrid7 (F1+C) | 95.15 | .8058 | 81.18 | .7870 | 100.00 | 1.00 | 100.00 | .9512 | 80.00 | .8399 | 92.44
Hybrid8 (F1+D) | 95.45 | .9029 | 98.82 | .9714 | 100.00 | 1.00 | 100.00 | .7795 | 80.00 | .8924 | 96.00
Hybrid9 (F2+A) | 90.30 | .7477 | 81.18 | .7497 | 100.00 | 1.00 | 95.00 | .8082 | 80.00 | .6027 | 88.67
Hybrid10 (F2+B) | 94.24 | .8215 | 87.06 | .8340 | 100.00 | .9118 | 100.00 | .9748 | 80.00 | .6940 | 92.89
Hybrid11 (F2+C) | 94.85 | .7944 | 80.00 | .7723 | 100.00 | 1.00 | 100.00 | .9512 | 80.00 | .8399 | 92.00
Hybrid12 (F2+D) | 95.45 | .9029 | 98.82 | .9714 | 100.00 | 1.00 | 100.00 | .7795 | 80.00 | .8924 | 96.00
Hybrid13 (F3+A) | 90.00 | .7303 | 80.00 | .7354 | 100.00 | 1.00 | 90.00 | .7785 | 80.00 | .6027 | 88.00
Hybrid14 (F3+B) | 94.24 | .8215 | 87.06 | .8340 | 100.00 | .9118 | 100.00 | .9748 | 80.00 | .6940 | 92.89
Hybrid15 (F3+C) | 94.24 | .7716 | 78.82 | .7513 | 100.00 | 1.00 | 95.00 | .9236 | 80.00 | .8399 | 91.12
Hybrid16 (F3+D) | 95.45 | .8968 | 98.82 | .9714 | 100.00 | 1.00 | 95.00 | .7503 | 80.00 | .8924 | 95.77
Hybrid17 (F4+A) | 90.00 | .7172 | 78.82 | .7270 | 100.00 | 1.00 | 85.00 | .7480 | 80.00 | .6027 | 87.55
Hybrid18 (F4+B) | 94.24 | .8153 | 87.06 | .8340 | 100.00 | .9118 | 95.00 | .9477 | 80.00 | .6940 | 92.67
Hybrid19 (F4+C) | 94.24 | .7590 | 77.65 | .7429 | 100.00 | 1.00 | 90.00 | .8953 | 80.00 | .8399 | 90.67
Hybrid20 (F4+D) | 95.45 | .8907 | 98.82 | .9714 | 100.00 | 1.00 | 90.00 | .7205 | 80.00 | .8924 | 95.55
BIC: Blocks ion channels; BAR: Blocks acetylcholine receptors; IAR1: Inhibits acetylcholine release by metalloproteolytic activity; IAR2: Inhibits acetylcholine release by phospholipase A2 activity; FAR: Facilitates acetylcholine release; ACC: Accuracy; MCC: Matthews correlation coefficient.
Table S11. The performance of various modules, including SVM modules, in the classification of ion channel inhibitors.
Approach | Sodium ACC | Sodium MCC | Potassium ACC | Potassium MCC | Calcium ACC | Calcium MCC | Chloride ACC | Chloride MCC | Overall ACC
Composition (A) | 72.00 | .5369 | 68.89 | .5219 | 35.00 | .1605 | 100.00 | .8379 | 62.46
Dipeptide (B) | 75.33 | .3635 | 70.00 | .6908 | 50.00 | .3502 | 100.00 | 1.000 | 68.00
A + length (C) | 63.33 | .4782 | 73.33 | .5481 | 48.75 | .3211 | 60.00 | .5403 | 62.46
B + length (D) | 78.00 | .5508 | 73.33 | .7082 | 55.00 | .4234 | 60.00 | .6000 | 70.77
PSI-BLAST (E) | 62.67 | - | 62.22 | - | 55.00 | - | 100.00 | - | 61.23
Meme/Mast (F1) | 56.00 | - | 58.88 | - | 35.00 | - | 100.00 | - | 52.31
Meme/Mast (F2) | 55.33 | - | 52.22 | - | 30.00 | - | 100.00 | - | 48.92
Meme/Mast (F3) | 54.00 | - | 46.67 | - | 26.25 | - | 100.00 | - | 45.85
Meme/Mast (F4) | 52.00 | - | 42.22 | - | 23.75 | - | 100.00 | - | 43.08
Hybrid1 (E+A) | 74.00 | .6228 | 73.33 | .6248 | 65.00 | .4600 | 100.00 | .7868 | 72.00
Hybrid2 (E+B) | 73.33 | .6038 | 74.44 | .7386 | 71.25 | .4564 | 100.00 | 1.000 | 73.54
Hybrid3 (E+C) | 66.67 | .5637 | 75.56 | .5890 | 63.75 | .4285 | 100.00 | .9111 | 68.92
Hybrid4 (E+D) | 75.33 | .6284 | 78.89 | .7498 | 68.75 | .4702 | 100.00 | 1.000 | 75.08
Hybrid5 (F1+A) | 74.00 | .5719 | 81.11 | .6515 | 48.75 | .3721 | 100.00 | .7407 | 70.03
Hybrid6 (F1+B) | 75.33 | .5479 | 76.67 | .7478 | 57.50 | .3754 | 100.00 | 1.00 | 71.69
Hybrid7 (F1+C) | 66.67 | .5502 | 80.00 | .6200 | 60.00 | .4179 | 100.00 | .91114 | 69.23
Hybrid8 (F1+D) | 78.00 | .5795 | 77.78 | .7637 | 65.00 | .4877 | 100.00 | 1.00 | 75.08
Hybrid9 (F2+A) | 74.00 | .5595 | 76.67 | .6096 | 45.00 | .3144 | 100.00 | .7407 | 68.00
Hybrid10 (F2+B) | 75.33 | .5241 | 74.44 | .7310 | 53.75 | .3379 | 100.00 | 1.00 | 70.15
Hybrid11 (F2+C) | 66.67 | .5367 | 78.89 | .5999 | 56.25 | .3859 | 100.00 | .9114 | 68.00
Hybrid12 (F2+D) | 78.00 | .5618 | 76.67 | .7554 | 62.50 | .4668 | 100.00 | 1.00 | 74.15
Hybrid13 (F3+A) | 74.00 | .5473 | 75.56 | .5948 | 41.25 | .2740 | 100.00 | .7407 | 66.77
Hybrid14 (F3+B) | 75.33 | .5065 | 73.33 | .7225 | 51.25 | .3160 | 100.00 | 1.00 | 69.23
Hybrid15 (F3+C) | 66.67 | .5170 | 77.78 | .5907 | 53.75 | .3642 | 100.00 | .9114 | 67.08
Hybrid16 (F3+D) | 78.00 | .5501 | 76.67 | .7554 | 60.00 | .4458 | 100.00 | 1.00 | 73.54
Hybrid17 (F4+A) | 74.00 | .5473 | 74.44 | .5856 | 41.25 | .2683 | 100.00 | .7407 | 66.46
Hybrid18 (F4+B) | 75.33 | .5065 | 72.22 | .7063 | 50.00 | .2998 | 100.00 | 1.00 | 68.61
Hybrid19 (F4+C) | 65.33 | .4982 | 75.56 | .5668 | 52.50 | .3373 | 100.00 | .9114 | 65.54
Hybrid20 (F4+D) | 78.00 | .5443 | 74.44 | .7310 | 58.75 | .4295 | 100.00 | 1.00 | 72.61
ACC: Accuracy; MCC: Matthews correlation coefficient.
The SVM modules were developed with the RBF kernel. The g, c and j parameter values used for the modules based on the various features are listed below.
SVM module | g | c | j
Composition | 10 | 10 | 1
Dipeptide | 1 | 100 | 1
Comp+Length | 1 | 100 | 1
Dipep+Length | 5 | 10 | 1
SVM module | g | c | j
Composition | 25 | 1 | 13
Dipeptide | 5 | 10 | 1
Comp+Length | 25 | 1 | 3
Dipep+Length | 5 | 10 | 1
SVM module | g | c | j
Composition | 20 | 100 | 1
Dipeptide | 10 | 100 | 1
Comp+Length | 25 | 10 | 1
Dipep+Length | 1 | 50 | 3
SVM module | g | c | j
Composition | 1 | 1000 | 1
Dipeptide | 5 | 10 | 5
Comp+Length | 25 | 10 | 5
Dipep+Length | 0.1 | 100 | 5
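To make the role of the g parameter concrete, the sketch below evaluates an RBF kernel on amino acid composition vectors. It assumes that g is the kernel width parameter gamma in K(x, y) = exp(-g * ||x - y||^2), as in the -g option of SVM-light; under the same reading, c would be the trade-off between training error and margin and j the cost factor for errors on positive examples. The supplement does not name the SVM package, so this reading, the use of fractional composition, the function names and the example peptides are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard residues (assumed feature scaling)."""
    sequence = sequence.upper()
    length = len(sequence)
    return [sequence.count(residue) / length for residue in AMINO_ACIDS]


def rbf_kernel(x, y, g):
    """RBF kernel exp(-g * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


# Hypothetical peptides used only to exercise the functions.
first = amino_acid_composition("GCCSDPRCAWRC")
second = amino_acid_composition("CKGKGAKCSRLMYDCCTGSC")
print(rbf_kernel(first, second, g=10))
print(rbf_kernel(first, first, g=10))
```

A larger g makes the kernel fall off faster with distance, so each training sequence influences a narrower neighbourhood of the feature space.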