MetaPred Dataset
Main Dataset: All substrates that are metabolized by any of the isoforms CYP 3A4, 2D6, 1A2, 2C9 and 2C19 were obtained from DrugBank 2.5 [11,12]. This yielded a total of 372 drug molecules, each metabolized by at least one of the five isoforms. To create an exclusive dataset, we removed all molecules that are metabolized by more than one isoform. The final dataset comprises 216 drug molecules, of which 111, 47, 29, 20 and 19 are metabolized by CYP 3A4, 2D6, 1A2, 2C9 and 2C19, respectively.
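The exclusivity filter described above can be illustrated with a minimal Python sketch. It assumes the substrate annotations are already available as a mapping from molecule name to the set of isoforms that metabolize it. The function names, the data layout and the toy records are hypothetical and are not taken from DrugBank or from the original pipeline.

```python
from collections import Counter

# The five isoforms considered in this study.
ISOFORMS = ("3A4", "2D6", "1A2", "2C9", "2C19")


def build_exclusive_dataset(substrates):
    """Keep only molecules metabolized by exactly one of the five isoforms.

    `substrates` maps a molecule name to the set of CYP isoforms reported
    to metabolize it (an assumed input format).
    Returns a dict mapping molecule name -> its single isoform.
    """
    exclusive = {}
    for name, isoforms in substrates.items():
        relevant = set(isoforms) & set(ISOFORMS)
        if len(relevant) == 1:  # drop molecules with several isoforms
            exclusive[name] = next(iter(relevant))
    return exclusive


def isoform_counts(exclusive):
    """Count how many molecules are assigned to each isoform."""
    return Counter(exclusive.values())


# Hypothetical toy records, for illustration only.
toy = {
    "drug_a": {"3A4"},
    "drug_b": {"3A4", "2D6"},   # removed: two isoforms
    "drug_c": {"2D6"},
    "drug_d": {"1A2", "2C19"},  # removed: two isoforms
    "drug_e": {"2C9"},
}

exclusive = build_exclusive_dataset(toy)
counts = isoform_counts(exclusive)
```

Applied to the full set of 372 substrates, a filter of this kind would leave only the single-isoform molecules, and `isoform_counts` would give the per-isoform class sizes.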
Independent dataset
We created an independent dataset to evaluate performance without bias. We downloaded 146 molecules from DrugBank, each reported to be metabolized by one or more of the isoforms used in this study. In this dataset, 92, 74, 41, 47 and 49 molecules have metabolic specificity for CYP 3A4, 2D6, 1A2, 2C9 and 2C19, respectively; because a molecule may be a substrate of more than one isoform, these counts sum to more than 146. The names of the molecules used in the main and independent datasets are given in the Supplementary datasheet (Tables S1-S6).