Prediction of neurotoxins based on their function and source

Sudipto Saha and G. P. S. Raghava*

Institute of Microbial Technology, Okhla Phase 3, New Delhi, India

Running Title: Neurotoxin prediction

Address correspondence to: Dr. G. P. S. Raghava, Professor, Department of Computational Biology, Institute of Microbial Technology, Okhla Phase 3, New Delhi, India. Phone: +91-11-26907444; Fax: +91-172-26907444; E-mail: raghava@iiitd.ac.in

 

 

Supplemental data

 

The performance of ANN for prediction of neurotoxins at different thresholds

A feed-forward neural network (FNN) and a recurrent neural network (RNN) were used for the classification of neurotoxins and non-toxins. Different numbers of hidden nodes were tried with a single hidden layer. The performance of the best module at different thresholds is shown below.
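The measures reported in the tables of this section (sensitivity, specificity, accuracy, PPV and MCC) follow from the four confusion-matrix counts at a given threshold. The sketch below uses the standard textbook definitions. The function names, the convention that a score at or above the threshold is called a neurotoxin, and the choice to report 0.0 when a denominator is empty are assumptions made for illustration, not details taken from the original study.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count TP, FN, TN, FP, calling a sequence positive when score >= threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, PPV and MCC from the counts."""
    def ratio(numerator, denominator):
        # Assumption of this sketch: an empty denominator is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "ppv": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Toy counts for illustration only; they are not taken from the tables.
    for name, value in classification_metrics(tp=90, fn=10, tn=80, fp=20).items():
        print(f"{name}: {value:.4f}")
```

Sweeping the threshold over a list of network output scores with `confusion_counts` and passing the counts to `classification_metrics` gives one row per threshold, in the same layout as the tables below.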

 

Table S1: The performance of FNN in classifying neurotoxin and non-toxin sequences with 35 hidden nodes

 

Threshold   Sensitivity   Specificity   Accuracy   PPV       MCC
1.0000      0.0000        0.0000        0.0000     0.0000     0.0000
0.9000      0.0000        0.0000        0.0000     0.0000     0.0000
0.8000      0.0018        0.2000        0.1013     0.2000     0.0133
0.7000      0.0246        0.7983        0.4131     0.7600     0.0859
0.6000      0.5193        0.8000        0.6603     0.8882     0.4059
0.5500      0.7035        0.7948        0.7493     0.8886     0.5346
0.5000      0.8965        0.7878        0.8419     0.8839     0.6893
0.4500      0.9614        0.6783        0.8192     0.7907     0.6450
0.4000      0.9912        0.1896        0.5886     0.5500     0.2730
0.3000      0.9912        0.0087        0.4978     0.4978    -0.0004
0.2000      0.9912        0.0087        0.4978     0.4978    -0.0004
0.1000      0.9912        0.0087        0.4978     0.4978    -0.0004

 

 

 

Table S2: The performance of RNN in classifying neurotoxin and non-toxin sequences with 60 hidden nodes

 

Threshold   Sensitivity   Specificity   Accuracy   PPV       MCC
1.0000      0.0000        0.0000        0.0000     0.0000     0.0000
0.9000      0.0000        0.0000        0.0000     0.0000     0.0000
0.8000      0.0035        0.4000        0.2026     0.4000     0.0266
0.7000      0.1581        0.9913        0.5769     0.9450     0.2690
0.6000      0.5544        0.9913        0.7738     0.9844     0.6074
0.5500      0.7281        0.9913        0.8603     0.9881     0.7464
0.5000      0.8228        0.9809        0.9022     0.9771     0.8144
0.4500      0.8912        0.9635        0.9275     0.9603     0.8572
0.4000      0.9877        0.7548        0.8707     0.8004     0.7633
0.3000      0.9912        0.0087        0.4978     0.4978    -0.0004
0.2000      0.9912        0.0087        0.4978     0.4978    -0.0004
0.1000      0.9912        0.0087        0.4978     0.4978    -0.0004

 

 

The performance of PSI-BLAST for prediction of neurotoxins at different E-values

 

Table S3: Results of PSI-BLAST at various E-values on the neurotoxin dataset using five-fold cross-validation.

 

 

E-value   Hits (570)      False positives (570)
1.0       390 (68.42%)    42
10^-1     391 (68.60%)    32
10^-2     391 (68.60%)    28
10^-3     387 (67.89%)    23
10^-6     343 (60.18%)    19
10^-9     281 (49.30%)    13
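The percentages in Table S3 are the hit counts divided by the 570 neurotoxin sequences. A short worked check of that arithmetic is sketched below, with the counts copied from the table; the variable and function names are chosen here for illustration.

```python
TOTAL_SEQUENCES = 570  # taken from the "Hits(570)" column header of Table S3

# Hit counts copied from Table S3, keyed by E-value cutoff.
HITS_BY_EVALUE = {
    1.0: 390,
    1e-1: 391,
    1e-2: 391,
    1e-3: 387,
    1e-6: 343,
    1e-9: 281,
}


def hit_percentage(hits, total=TOTAL_SEQUENCES):
    """Percentage of sequences with a hit, rounded to two decimals."""
    return round(100.0 * hits / total, 2)


if __name__ == "__main__":
    for evalue, hits in HITS_BY_EVALUE.items():
        print(f"E-value {evalue:g}: {hits} hits = {hit_percentage(hits):.2f}%")
```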

 

 

 

 

The performance of MEME/MAST for prediction of neurotoxins, classification of neurotoxins based on source and function, and sub-classification of ion channel blockers

MAST was evaluated at different E-values using MEME matrices built from neurotoxin sequences under five-fold cross-validation. Five MEME matrices were created, one for each learning set (four of the five sets). Each matrix was then used as the input file for searching motifs in the remaining set (the testing set) using the program MAST.
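The five-fold procedure described above can be sketched as follows: the data are divided into five sets, and each set in turn is held out as the testing set while the other four form the learning set. How the sequences were actually assigned to the five sets is not stated here, so the round-robin assignment and the function name are assumptions of this sketch. The MEME and MAST runs themselves are left out.

```python
def five_fold_splits(sequences, k=5):
    """Yield (learning_set, testing_set) pairs; each fold is the testing set once."""
    # Assumption: round-robin assignment of sequences to the k folds.
    folds = [sequences[i::k] for i in range(k)]
    for test_index in range(k):
        testing = folds[test_index]
        learning = [
            seq
            for fold_index, fold in enumerate(folds)
            if fold_index != test_index
            for seq in fold
        ]
        yield learning, testing


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 570 neurotoxin sequences.
    sequence_ids = [f"seq{i}" for i in range(570)]
    for round_number, (learning, testing) in enumerate(five_fold_splits(sequence_ids), 1):
        # In the study, a MEME matrix would be built from `learning`
        # and searched against `testing` with MAST.
        print(f"round {round_number}: {len(learning)} learning, {len(testing)} testing")
```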

 

Table S4: MEME/MAST results for classification of neurotoxins and non-toxins at different E-values

 

E-value   Neurotoxin (570)   Non-toxin (570)
0.0001    165 (28.95%)       4 (0.7%)
0.001     174 (29.82%)       4 (0.7%)
0.01      184 (32.28%)       4 (0.7%)
0.1       205 (35.96%)       5 (0.87%)
1         246 (43.16%)       13 (2.28%)
10        293 (51.40%)       64 (11.22%)
20        311 (54.56%)       99 (17.36%)
50        353 (61.92%)       196 (34.38%)
100       395 (69.29%)       311 (54.56%)

 

 

Table S5: The performance of various approaches used in prediction of neurotoxins and non-toxins

 

 

Approach                          Sensitivity   Specificity   PPV      Accuracy   MCC
Composition (C)                   96.32%        97.22%        97.72%   97.72%     0.9416
Dipeptide                         93.68%        98.42%        98.41%   96.05%     0.9247
MEME/MAST (E-value 0.1)           35.96%        0.87%         -        -          -
MEME/MAST (E-value 0.01)          32.28%        0.7%          -        -          -
MEME/MAST (E-value 0.001)         29.82%        0.7%          -        -          -
MEME/MAST (E-value 0.0001)        28.94%        0.7%          -        -          -
FNN (0.5 threshold)               89.65%        78.78%        88.39%   84.19%     0.6890
RNN (0.45 threshold)              89.12%        96.35%        96.03%   92.75%     0.8572
(C)+MEME/MAST (E-value 0.1)       96.84%        96.84%        96.84%   96.84%     0.9368
(C)+MEME/MAST (E-value 0.01)      96.67%        97.02%        97.00%   96.84%     0.9368
(C)+MEME/MAST (E-value 0.001)     96.49%        97.02%        97.00%   96.75%     0.9351
(C)+MEME/MAST (E-value 0.0001)    96.49%        97.02%        97.00%   96.75%     0.9351
Comp.+length                      97.54%        97.19%        97.25%   97.37%     0.9485
Dipep+length                      96.67%        95.09%        95.28%   95.88%     0.9195

 

 

Table S6: MEME/MAST results for classification of neurotoxins from different sources at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the E-value columns, the value in brackets is the number of false positives.

 

 

Source             Ev .0001   Ev .001   Ev .01    Ev .1     Ev 1      Ev 10      Ev 50
Arthropoda (310)   131(1)     141(1)    148(6)    166(15)   190(33)   245(80)    273(159)
Bacteria (10)      -          -         -         -         -         -          -
Chordata (135)     104(0)     105(0)    107(0)    109(1)    112(9)    113(34)    124(215)
Cnidaria (20)      12(0)      12(0)     12(1)     12(1)     12(7)     13(52)     14(136)
Mollusca (95)      42(0)      42(0)     43(0)     47(3)     52(22)    58(65)     68(177)
Total (570)        289(1)     300(1)    310(7)    334(20)   366(71)   429(231)   479(528)

 

 

Table S7: MEME/MAST results for classification of neurotoxins based on target of action at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the E-value columns, the value in brackets is the number of false positives.

 

Function      Ev .0001   Ev .001   Ev .01    Ev .1     Ev 1       Ev 10      Ev 50
BIC (330)     66(0)      70(0)     78(1)     89(1)     105(13)    145(25)    205(25)
BAR (85)      60(2)      61(2)     64(3)     67(4)     67(9)      67(35)     73(109)
IAR1 (5)      5(0)       5(0)      5(2)      5(7)      5(26)      5(95)      5(282)
IAR2 (20)     17(1)      18(1)     19(1)     19(3)     19(14)     20(46)     20(156)
FAR (10)      6(2)       6(8)      6(22)     6(44)     6(79)      6(196)     7(321)
Total (450)   154(5)     160(11)   172(29)   186(59)   202(131)   243(397)   310(893)

 

 

 

 

 

 

 

 

 

 

 

 

 

Table S8: MEME/MAST results for classification of ion channel blockers at different E-values. In column 1, the value in brackets is the total number of sequences, whereas in the E-value columns, the value in brackets is the number of false positives.

 

Blockers of ion channels   Ev .0001   Ev .001   Ev .01    Ev .1     Ev 1      Ev 10      Ev 50
Calcium (80)               19(4)      21(4)     24(4)     28(5)     32(9)     46(36)     65(109)
Chloride (5)               5(0)       5(1)      5(7)      5(13)     5(41)     5(67)      5(189)
Potassium (90)             38(1)      42(1)     47(4)     53(11)    62(28)    71(85)     82(168)
Sodium (150)               78(4)      81(5)     83(7)     84(11)    94(19)    100(51)    113(115)
Total (325)                140(9)     149(11)   159(22)   170(40)   193(97)   222(239)   265

 

 

The performance of various approaches used for classification of neurotoxins based on source and function, and for sub-classification of ion channel blockers. The hybrid approaches are shown at different E-values.
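The rule used to combine a similarity search (PSI-BLAST or MEME/MAST) with an SVM module in the hybrid approaches is not spelled out in this supplement. One common scheme, assumed here purely for illustration, is to accept the class assigned by the similarity search when it returns a hit at the chosen E-value and to fall back on the SVM prediction otherwise. The function name and the use of None for "no hit" are hypothetical.

```python
def hybrid_predict(similarity_class, svm_class):
    """Combine two predictions under the assumed fallback rule.

    similarity_class: class assigned by the similarity search at the chosen
        E-value, or None when the search returned no hit (assumed convention).
    svm_class: class predicted by the SVM module for the same sequence.
    """
    if similarity_class is not None:
        return similarity_class
    return svm_class


if __name__ == "__main__":
    # A sequence with a similarity hit keeps the class of that hit.
    print(hybrid_predict("Chordata", "Arthropoda"))
    # A sequence without a hit takes the SVM prediction.
    print(hybrid_predict(None, "Mollusca"))
```

Under this assumed rule, a stricter E-value leaves more sequences without a hit, so more of the final predictions come from the SVM module.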

 

Table S9. The performance of various approaches used in classification of neurotoxins based on source.

________________________________________________________________________

Approach                 Eubacteria                  Cnidaria                      Mollusca            Arthropoda                   Chordata                 Overall 

                              ACC       MCC        ACC        MCC          ACC       MCC       ACC        MCC         ACC      MCC                ACC  

 

Composition (A)  100       .9134           30.00        .4441         63.16       .5804        86.45     .6426         78.52        .7495            78.94      

Dipeptide (B)       100       .8671           50.00        .6776         76.84       .7426        92.58     .8069         90.37        .8911            88.07      

A + length ( C )     90        .8656           50.00        .5893         70.53       .7433       95.16      .7385         76.30        .7854            84.91

B + length (D)       90        .8656           55.00        .6169         83.16       .7998       94.19      .8075         80.74        .8079            87.72  

PSI-BLAST (E)     90                            70.00                          54.74                       67.74                       90.37                             71.40

 

Meme/Mast (F1)    -                               60.00                         49.47                       53.54                       80.74                              58.59 

Meme/Mast(F2)                                     60.00                         45.26                       47.74                       79.26                             54.38

Meme/Mast(F3)                                     60.00                         44.21                       45.48                       77.78                             52.63

Meme/Mast(F4)                                     60.00                         44.21                       42.26                        77.04                             50.70    

 

Hybrid1 (E+A)    100        .8421           70.00        .7757         78.95       .7527       92.58      .8409         97.04        .9518            90.70

Hybrid2 (E+B)    100        .8128           70.00        .8321         81.05       .7726        93.87     .8692         96.30        .9421            91.58

Hybrid3 (E+C)      90        .8182           70.00        .8025         80.00       .8361        97.74     .8611         91.11        .9314            92.10

Hybrid4 (E+D)      90        .8182           70.00        .8026         85.26       .8176        95.48     .8762         90.37        .8970            91.58 

 

Hybrid5 (F1+A)   100        .8421          65.00         .7430         77.89       .7145       90.97      .8128         95.56         .9418           89.12

Hybrid6 (F1+B)   100        .7863          65.00         .8011         82.11       .7799       93.23      .8515         93.33        .9126            90.53

Hybrid7 (F1+C)     90        .8182          60.00         .7373         77.89       .7968       96.13      .8053         84.44         .8713           88.95

Hybrid8 (F1+D)    90         .8182          60.00         .7373         85.26       .8068       94.19      .8269         83.70        .8365            88.94

 

Hybrid9 (F2+A)   100       .8421            65.00        .7430        74.74        .6918       90.65       .7985       95.56         .9371           88.43 

Hybrid10 (F2+B) 100       .7863            65.00        .8011        80.00        .7600       92.90       .8408       93.33         .9126           88.99

Hybrid11 (F2+C)   90       .8182            60.00        .7373        77.89        .7968       96.13       .8053       84.44         .8713           88.95  

Hybrid12 (F2+D)   90       .8182            60.00        .7373        84.21        .7996       94.19       .8234       83.70         .8365           88.77  

 

Hybrid13 (F3+A) 100       .8421            65.00        .7430        73.68        .6842       90.65        .7877       94.07         .9270           87.89

Hybrid14 (F3+B)  100      .7863            65.00         .8011        78.95        .7474      92.58         .8338      93.33          .9126          89.65 

Hybrid15 (F3+C)    90      .8182            60.00         .7373        76.84        .7897      96.13         .7953      82.96          .8612          88.42      

Hybrid16 (F3+D)    90      .8182            60.00         .7373        83.16        .7925      94.19         .8165      82.96          .8313          88.42 

 

Hybrid17 (F4+A)  100      .8421            65.00         .7430        73.68        .6842      90.32         .7842       94.07         .9223          87.72   

Hybrid18 (F4+B)  100      .7863            65.00         .8011         78.95       .7474      92.58         .8338        93.33         .9126         89.65

Hybrid19 (F4+C)   90       .8182            60.00          .7373         76.84      .7897       95.81        .7914        82.96         .8561         88.25         

Hybrid20 (F4+D)   90       .8182            60.00          .7373         83.16      .7925       94.19        .8165        82.96         .8313         88.42

 

 

 

 

 

ACC: Accuracy; MCC: Matthews correlation coefficient.

F1 = E value 0.1; F2 = E value 0.01; F3 = E value 0.001; F4 = E value 0.0001

 

 

Table S10. The performance of various approaches used in classification of neurotoxins based on function.

________________________________________________________________________

Approach                         BIC                         BAR                          IAR1                     IAR2                          FAR                   Overall 

                                 ACC       MCC        ACC        MCC          ACC       MCC        ACC        MCC         ACC      MCC             ACC  

 

Composition(A)     87.58      .6255         75.29         .6902           100.00       1.000     65.00       .5860         30.00      .2209        83.11 

Dipeptide (B)         94.24      .7664         85.88         .8318           100.00       .9406     90.00       .9199         30.00      .3406        91.10

A+length ( C )        94.24      .6857         69.41         .6737          100.00       1.000      85.00       .8752         50.00      .6219        88.22

B+length(D)           95.45      .8805         97.65         .9660          100.00       1.000     90.00        .8035         60.00      .7625        94.88   

PSI-BLAST (E)      52.73                        84.71                            100.00                     100.00                        70.00                       61.78

 

Meme/Mast (F1)    26.97                        78.82                             100.00                     95.00                          60.00                       41.33

Meme/Mast(F2)     23.64                        75.29                             100.00                     95.00                          60.00                      38.22

Meme/Mast(F3)     21.21                        71.64                             100.00                     90.00                          60.00                       35.53

Meme/Mast(F4)     20.00                        70.59                             100.00                     85.00                          60.00                      34.22

 

 

Hybrid1 (E+A)       90.61      .7843         85.88         .7715          100.00        1.00      100.00      .8536         70.00      .5598        89.83

Hybrid2 (E+B)       93.33      .8130         88.24         .8232          100.00        .9118    100.00      .9512         70.00      .6300        92.22 

Hybrid3 (E+C)       95.45      .8418         85.88         .8259          100.00        1.00      100.00      .9292         80.00      .8399        93.55

Hybrid4 (E+D)       94.85      .8869         97.65         .9501          100.00        1.00      100.00      .7666         70.00      .8338        95.11

 

Hybrid5 (F1+A)      90.91      .7631         82.35         .7640          100.00        1.00        95.00       .8247         80.00      .6027       89.33

Hybrid6 (F1+B)      94.24       .8277        88.24          .8420         100.00         .9118    100.00      .9748          80.00     .6940        93.11

Hybrid7 (F1+C)      95.15      .8058         81.18         .7870          100.00        1.00       100.00      .9512          80.00     .8399        92.44

Hybrid8 (F1+D)      95.45      .9029         98.82         .9714          100.00        1.00       100.00      .7795          80.00     .8924        96.00   

 

Hybrid9 (F2+A)     90.30       .7477         81.18         .7497          100.00         1.00       95.00       .8082          80.00     .6027        88.67   

Hybrid10 (F2+B)   94.24       .8215         87.06         .8340          100.00         .9118    100.00       .9748          80.00     .6940       92.89  

Hybrid11 (F2+C)   94.85       .7944         80.00         .7723          100.00         1.00       100.00      .9512          80.00      .8399      92.00  

Hybrid12 (F2+D)   95.45       .9029         98.82         .9714          100.00         1.00       100.00      .7795          80.00      .8924      96.00 

 

Hybrid13 (F3+A)   90.00       .7303         80.00         .7354           100.00        1.00        90.00        .7785          80.00      .6027      88.00 

Hybrid14 (F3+B)   94.24       .8215         87.06         .8340           100.00         .9118     100.00       .9748         80.00      .6940      92.89 

Hybrid15 (F3+C)   94.24       .7716         78.82         .7513           100.00         1.00         95.00       .9236         80.00      .8399      91.12

Hybrid16 (F3+D)   95.45       .8968         98.82         .9714           100.00         1.00         95.00       .7503         80.00      .8924      95.77 

 

Hybrid17 (F4+A)   90.00       .7172        78.82          .7270           100.00         1.00        85.00        .7480         80.00       .6027      87.55

Hybrid18 (F4+B)   94.24       .8153        87.06          .8340           100.00         .9118      95.00       .9477          80.00       .6940      92.67 

Hybrid19 (F4+C)   94.24       .7590        77.65          .7429           100.00          1.00       90.00        .8953          80.00      .8399      90.67

Hybrid20 (F4+D)   95.45      .8907         98.82          .9714           100.00          1.00       90.00        .7205          80.00      .8924      95.55 

 

                  

 

BIC = blocks ion channels; BAR = blocks acetylcholine receptors; IAR1 = inhibits ACh release by metalloproteolytic activity; IAR2 = inhibits ACh release by phospholipase A2 activity; FAR = facilitates acetylcholine release; ACC: Accuracy; MCC: Matthews correlation coefficient.

F1 = E value 0.1; F2 = E value 0.01; F3 = E value 0.001; F4 = E value 0.0001

 

 

Table S11. The performance of various modules, including SVM modules, in classification of ion channel inhibitors.

________________________________________________________________________

Approach                             Sodium                     Potassium                       Calcium                          Chloride                              Overall 

                                       ACC       MCC           ACC        MCC              ACC       MCC               ACC        MCC                       ACC  

Composition(A)            72.00       .5369           68.89         .5219            35.00       .1605              100.00      .8379                      62.46

Dipeptide(B)                 75.33       .3635           70.00         .6908            50.00       .3502              100.00     1.000                       68.00

A+length ( C )               63.33       .4782           73.33         .5481            48.75       .3211               60.00       .5403                      62.46

B+length(D)                  78.00       .5508           73.33         .7082            55.00       .4234               60.00       .6000                      70.77

PSI-BLAST (E)             62.67                           62.22                              55.00                              100.00                                     61.23

 

Meme/mast (F1)            56.00                           58.88                              35.00                              100.00                                     52.31

Meme/Mast(F2)             55.33                           52.22                              30.00                              100.00                                    48.92

Meme/Mast(F3)             54.00                           46.67                              26.25                              100.00                                    45.85

Meme/Mast(F4)             52.00                           42.22                              23.75                               100.00                                   43.08

 

Hybrid1 (E+A)              74.00       .6228           73.33         .6248            65.00       .4600              100.00      .7868                      72.00

Hybrid2 (E+B)              73.33       .6038           74.44         .7386            71.25       .4564              100.00      1.000                      73.54

Hybrid3 (E+C)              66.67       .5637           75.56         .5890            63.75       .4285               100.00     .9111                      68.92

Hybrid4 (E+D)              75.33       .6284           78.89         .7498            68.75       .4702              100.00      1.000                      75.08

 

Hybrid5 (F1+A)             74.00       .5719           81.11         .6515            48.75       .3721              100.00      .7407                      70.03

Hybrid6 (F1+B)              75.33       .5479          76.67          .7478           57.50        .3754              100.00     1.00                        71.69   

Hybrid7 (F1+C)              66.67       .5502           80.00         .6200            60.00       .4179              100.00      .91114                   69.23

Hybrid8 (F1+D)              78.00       .5795           77.78         .7637            65.00       .4877              100.00      1.00                       75.08 

 

Hybrid9 (F2+A)             74.00       .5595            76.67        .6096            45.00        .3144              100.00       .7407                    68.00

Hybrid10 (F2+B)           75.33       .5241            74.44        .7310            53.75        .3379              100.00       1.00                      70.15

Hybrid11 (F2+C)           66.67       .5367            78.89        .5999            56.25        .3859              100.00        .9114                   68.00

Hybrid12 (F2+D)           78.00       .5618            76.67        .7554            62.50        .4668              100.00       1.00                      74.15

 

Hybrid13 (F3+A)           74.00       .5473            75.56        .5948            41.25        .2740              100.00       .7407                   66.77 

Hybrid14 (F3+B)           75.33       .5065            73.33        .7225            51.25        .3160              100.00       1.00                     69.23

Hybrid15 (F3+C)           66.67       .5170            77.78        .5907            53.75        .3642              100.00         .9114                 67.08

Hybrid16 (F3+D)           78.00       .5501            76.67        .7554            60.00        .4458              100.00        1.00                    73.54

 

Hybrid17 (F4+A)           74.00       .5473            74.44        .5856             41.25        .2683             100.00         .7407                 66.46

Hybrid18 (F4+B)            75.33      .5065            72.22        .7063             50.00        .2998              100.00        1.00                    68.61

Hybrid19 (F4+C)            65.33      .4982            75.56        .5668             52.50        .3373              100.00         .9114                 65.54  

Hybrid20 (F4+D)            78.00      .5443            74.44        .7310             58.75        .4295              100.00         1.00                   72.61

 

 

 

ACC: Accuracy; MCC: Matthews correlation coefficient.

F1 = E value 0.1; F2 = E value 0.01; F3 = E value 0.001; F4 = E value 0.0001

 

 

 

The different parameters used for developing SVM modules

 

An RBF kernel was used for all SVM modules. The g, c and j parameter values used for each feature set are listed in the tables that follow.
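The SVM software is not named in this section. The parameter letters g, c and j match the command-line options of SVM_light's svm_learn program, where -t 2 selects the RBF kernel, -g sets the kernel gamma, -c the error/margin trade-off and -j the cost factor for errors on positive examples. Assuming that package was used, a training command for one module could be assembled as sketched below. The helper name and the file names train.dat and model are hypothetical.

```python
def svm_learn_command(g, c, j, train_file="train.dat", model_file="model"):
    """Build an svm_learn argument list for an RBF-kernel module.

    Assumes SVM_light's svm_learn; the file names are placeholders.
    """
    return [
        "svm_learn",
        "-t", "2",      # kernel type 2: radial basis function
        "-g", str(g),   # gamma of the RBF kernel
        "-c", str(c),   # trade-off between training error and margin
        "-j", str(j),   # cost factor for errors on positive examples
        train_file,
        model_file,
    ]


if __name__ == "__main__":
    # Values listed for the composition module (neurotoxin vs. non-toxin).
    print(" ".join(svm_learn_command(g=10, c=10, j=1)))
```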

 

SVM modules used for discrimination of neurotoxins and non-toxins

SVM module     g     C      J
Composition    10    10     1
Dipeptide      1     100    1
Comp+Length    1     100    1
Dipep+Length   5     10     1

 

 

 

SVM modules used for classification of neurotoxins based on source

SVM module     g     C      J
Composition    25    1      13
Dipeptide      5     10     1
Comp+Length    25    1      3
Dipep+Length   5     10     1

 

 

SVM modules used for classification of neurotoxins based on function

SVM module     g     C      J
Composition    20    100    1
Dipeptide      10    100    1
Comp+Length    25    10     1
Dipep+Length   1     50     3

 

 

SVM modules used for classification of ion channel blockers

SVM module     g     C      J
Composition    1     1000   1
Dipeptide      5     10     5
Comp+Length    25    10     5
Dipep+Length   0.1   100    5