Algorithm of ntEGFR server
Three QSAR models were developed for this server. The model described here was developed using 128 experimentally validated anti-EGFR quinazoline-derivative inhibitors. To provide an unbiased evaluation of our models, we randomly divided the dataset into training (80% of inhibitors) and validation (20% of inhibitors) sets. In summary, we created three datasets, wild_whole, wild_train and wild_valid, which contain 128, 103 and 25 inhibitors, respectively.
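The random 80/20 split described above can be sketched as follows. This is only a minimal illustration: the compound IDs, the function name `split_dataset`, the seed and the rounding rule (flooring the validation count) are assumptions, chosen so that 128 inhibitors yield the 103/25 split reported above. It is not the server's actual procedure.

```python
import random

def split_dataset(compounds, valid_fraction=0.2, seed=0):
    """Randomly split compounds into (train, valid) lists."""
    rng = random.Random(seed)          # fixed seed (assumed) for reproducibility
    shuffled = list(compounds)
    rng.shuffle(shuffled)
    # Assumed rounding: floor the validation count (128 * 0.2 = 25.6 gives 25).
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

# Hypothetical placeholder IDs standing in for the 128 inhibitors.
wild_whole = [f"cmpd_{i:03d}" for i in range(1, 129)]
wild_train, wild_valid = split_dataset(wild_whole)
print(len(wild_whole), len(wild_train), len(wild_valid))
```

Fixing the seed keeps the validation set identical across runs, so that models are always compared on the same held-out inhibitors.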
Datasets: The dataset was compiled from the literature. It comprises three types of datasets:
1. Wild EGFR inhibitors
2. Mutant EGFR inhibitors
3. Hybrid inhibitors

Figure 1: Flow chart showing the training and validation datasets used in developing the prediction models.

Descriptor Calculation: For the QSAR models, we calculated descriptors using several software packages, including Dragon, V-life, Web-Cdk, PowerMv and PaDEL, as well as docking-based energy descriptors. These descriptors fall into different categories, such as topological, molecular and constitutional descriptors.

Feature Selection: Feature selection is an important step in QSAR modeling. Some descriptors generally contribute negatively to a model, so it is necessary to identify them and remove them from the model. For this purpose we used the CfsSubsetEval feature selection method in Weka, which retains the most important descriptors. We then used an F-stepping approach to further reduce the number of descriptors without any significant change in model performance.

Machine learning techniques: For model building we used both SVM light and the Weka-based SMOreg and SVMreg statistical approaches. Our findings suggest that the SMOreg method performs better than SVM light. We therefore developed the final QSAR model using SMOreg.

Performance Evaluation:
The performance of the constructed models was evaluated using five-fold and leave-one-out cross-validation (LOOCV). In LOOCV, each molecule is used once for testing while the remaining (n-1) molecules are used for training. The performance of the methods was computed using the following formulas:
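As a rough illustration of the cross-validation procedure itself (not of the performance formulas), the sketch below generates five-fold and leave-one-out splits and collects one held-out prediction per molecule. Everything in it is an assumption made for illustration: the toy data are made up, and a one-descriptor least-squares line stands in for the SMOreg model, which is not reimplemented here.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs; k == n gives leave-one-out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k disjoint folds covering all samples
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        yield train, test

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for the real regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def cross_validated_predictions(xs, ys, k):
    """Predict every sample from a model trained without that sample."""
    preds = [None] * len(xs)
    for train, test in k_fold_splits(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a * xs[i] + b
    return preds

# Toy data (made up): one descriptor value and one activity per molecule.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 9.9]

loocv_preds = cross_validated_predictions(xs, ys, k=len(xs))  # n folds of size 1
five_fold_preds = cross_validated_predictions(xs, ys, k=5)    # 5 folds of size 2
```

Treating LOOCV as k-fold with k equal to the number of molecules lets one function serve both schemes. The held-out predictions can then be compared with the observed activities using whichever performance measures are chosen.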