Algorithm of MDRIpred
Datasets: We obtained 2294 experimentally validated molecules from NCBI PubChem BioAssay that had been tested against M. tuberculosis (M.tb). After careful examination, all compounds containing salts or mixtures were removed, leaving 2135 inhibitors/non-inhibitors against different phases of M.tb. The molecules were downloaded from NCBI in SDF format. We created two types of datasets from these molecules.
1) Dataset-1: This dataset is composed of 1206 inhibitors and 929 non-inhibitors against the growing phase of M.tb.
2) Dataset-2: This dataset is composed of 1355 inhibitors and 780 non-inhibitors against the latent phase of M.tb.
We calculated descriptors for these molecules using PaDEL, open-source software that can calculate ~10 different types of binary fingerprints along with 1D, 2D and 3D descriptors. In this work, we used 4 classes of fingerprints (FP).
Feature Selection: Selecting a suitable set of molecular descriptors is an important step in efficient model building. Initially, descriptors with a pairwise correlation of 0.6 or more were removed using RapidMiner, so that no two retained descriptors are correlated above this cutoff. In addition to removing redundant descriptors, we used MCC-based and frequency-based algorithms for feature selection.
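The correlation filter described above can be illustrated with a minimal sketch in plain Python. It is not the RapidMiner implementation: the greedy keep-first order, the use of the absolute Pearson correlation, and the names pearson, filter_correlated and the toy fingerprint columns are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return sxy / sqrt(sxx * syy)

def filter_correlated(columns, cutoff=0.6):
    """Greedily keep a descriptor only if its |r| with every
    already-kept descriptor is below the cutoff."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Toy fingerprint bits for six hypothetical molecules.
columns = {
    "fp_a": [1, 0, 1, 0, 1, 0],
    "fp_b": [1, 0, 1, 0, 1, 1],  # strongly correlated with fp_a
    "fp_c": [1, 1, 0, 0, 1, 0],
}
selected = filter_correlated(columns)
print(selected)
```

In this toy input, fp_b is dropped because its correlation with fp_a is about 0.71, while fp_c (about 0.33) is retained.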
We used support vector machines (SVM) to build models for predicting inhibitors against different phases of drug-tolerant M.tb. SVM is based on statistical and optimization theory and can handle complex structural features. The SVMlight software package was used to develop SVM-based classification models. The performance of the models was optimized by systematically varying SVM parameters and kernels.
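As a rough sketch of what a systematic variation of parameters and kernels might look like, the snippet below only builds a grid of SVMlight svm_learn command lines and does not run them. The options -z, -t, -c, -j and -g are standard svm_learn options, but the specific values, the choice of which parameters to vary, and the file names are assumptions; the text does not state the grid that was actually used.

```python
from itertools import product

def svm_learn_commands(train_file,
                       kernels=(0, 1, 2),          # 0 linear, 1 polynomial, 2 RBF
                       c_values=(0.1, 1, 10),      # hypothetical trade-off values
                       j_values=(1, 2),            # hypothetical cost factors
                       gammas=(0.001, 0.01, 0.1)): # hypothetical RBF gammas
    """Return one svm_learn argument list per parameter combination."""
    commands = []
    for t, c, j in product(kernels, c_values, j_values):
        # Gamma only applies to the RBF kernel (-t 2).
        for g in (gammas if t == 2 else (None,)):
            cmd = ["svm_learn", "-z", "c", "-t", str(t), "-c", str(c), "-j", str(j)]
            model = f"model_t{t}_c{c}_j{j}"
            if g is not None:
                cmd += ["-g", str(g)]
                model += f"_g{g}"
            cmd += [train_file, model + ".dat"]
            commands.append(cmd)
    return commands

commands = svm_learn_commands("train.dat")
print(len(commands), "candidate runs")
print(" ".join(commands[0]))
```

Each resulting model would then be scored on held-out data, and the parameter set with the best performance retained.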
Performance Measures: Once a classification model was constructed, its performance was assessed using standard statistical parameters: sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC).
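For reference, these four measures can be computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) using their standard definitions. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from this work.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # inhibitors correctly predicted
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # non-inhibitors correctly predicted
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, tn=70, fp=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC ranges from -1 to 1 and remains informative when the two classes are unequal in size, as in both datasets here.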