ProPrInt: Help Page

ProPrInt: Algorithm

Dataset

Separate protein-protein interaction datasets are used for each of three species: Escherichia coli, Saccharomyces cerevisiae and Helicobacter pylori. In the Escherichia coli dataset, the negative (non-interacting) set is built from combinations of periplasmic and cytoplasmic proteins. In the Saccharomyces cerevisiae dataset, non-interacting pairs are selected randomly from 4233 yeast proteins. In the Helicobacter pylori dataset, pairs that are not explicitly specified as interacting are considered non-interacting. The datasets are summarized below.

Sl. No.  Species                   Source                   Interacting pairs  Non-interacting pairs
1        Escherichia coli          Yellaboina et al., 2007  1082               13840
2        Saccharomyces cerevisiae  Ben-Hur and Noble, 2005  10517              10517
3        Helicobacter pylori       Martin et al., 2005      1458               1458
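The random selection of non-interacting pairs described for the yeast dataset could be sketched as follows. This is only an illustration: the function name, the exclusion of known interacting pairs, the treatment of pairs as unordered, and the fixed random seed are assumptions, since the page states only that pairs are selected randomly.

```python
import random


def sample_negative_pairs(proteins, positive_pairs, n_pairs, seed=0):
    """Draw n_pairs distinct random protein pairs that are not known positives.

    Hypothetical helper: pairs are unordered, and known interacting pairs
    are excluded (an assumption, not stated on the page).
    """
    rng = random.Random(seed)
    proteins = sorted(set(proteins))
    known = {frozenset(pair) for pair in positive_pairs}
    negatives = set()
    attempts = 0
    max_attempts = 100 * n_pairs + 1000  # guard against an endless loop
    while len(negatives) < n_pairs:
        if attempts >= max_attempts:
            raise RuntimeError("could not find enough non-interacting pairs")
        attempts += 1
        pair = frozenset(rng.sample(proteins, 2))  # two distinct proteins
        if pair not in known:
            negatives.add(pair)
    # Return a sorted list so the result is reproducible for a given seed.
    return sorted(tuple(sorted(pair)) for pair in negatives)
```

Sampling as many negative pairs as there are positive pairs would give a balanced set, as in the yeast and Helicobacter pylori rows of the table.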

Approach

Several methods are used to extract fixed-length feature vectors from sequences of different lengths: amino acid composition, dipeptide composition, biochemical class tripeptide composition, pseudo amino acid composition and position-specific scoring matrix (PSSM). Amino acid composition gives 20 features per protein sequence, and therefore 40 features per protein pair. In the same manner, 800, 232 and 42 features per pair are extracted for dipeptide composition, biochemical class tripeptide composition and pseudo amino acid composition, respectively.
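As a rough illustration of how the 40-feature and 800-feature pair vectors arise, the sketch below computes amino acid and dipeptide composition for each protein and concatenates the two vectors. The function names, the use of fractions rather than percentages, the skipping of non-standard residues, and the order of concatenation are assumptions, not details given on this page.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return 20 values: the fraction of each standard residue in seq."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each overlapping residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def pair_features(seq_a, seq_b, encoder):
    """Concatenate the per-protein vectors of both partners (assumed order)."""
    return encoder(seq_a) + encoder(seq_b)
```

For example, `pair_features(seq_a, seq_b, amino_acid_composition)` yields a list of 40 values, and the same call with `dipeptide_composition` yields 800 values, whatever the lengths of the two sequences.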

Support Vector Machine

A Support Vector Machine (SVM) is a supervised machine learning method, used here to classify a protein-protein pair as interacting or non-interacting. SVMlight is an implementation of Support Vector Machines and is freely downloadable from http://svmlight.joachims.org/. Here, a radial basis function (RBF) is used as the kernel function, and the results are optimized by varying the input parameters. A separate SVM model was created for each species and each feature extraction method. The training dataset contains both positive and negative examples, labeled 1 and -1 respectively. The model is built from a training dataset of known class using svm_learn, and is then used to classify the test dataset using svm_classify.
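A minimal sketch of preparing input for svm_learn is given below. It writes examples in the sparse SVMlight text format, one example per line as a target followed by index:value entries. The helper names, file names and parameter values are hypothetical; the values actually used for ProPrInt are not given on this page.

```python
def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...'.

    Feature indices start at 1 and appear in increasing order;
    zero-valued features are omitted (sparse format).
    """
    if label not in (1, -1):
        raise ValueError("label must be 1 (interacting) or -1 (non-interacting)")
    target = "+1" if label == 1 else "-1"
    entries = " ".join(
        f"{index}:{value:.6g}"
        for index, value in enumerate(features, start=1)
        if value != 0
    )
    return f"{target} {entries}".rstrip()


def write_svmlight_file(path, examples):
    """Write (label, feature_list) tuples to a file readable by svm_learn."""
    with open(path, "w") as handle:
        for label, features in examples:
            handle.write(to_svmlight_line(label, features) + "\n")


# Hypothetical command lines (file names and parameter values are placeholders):
#   svm_learn -t 2 -g 0.1 -c 1 train.dat model      # -t 2 selects the RBF kernel
#   svm_classify test.dat model predictions
```

In SVMlight, `-g` sets the RBF gamma and `-c` the trade-off between training error and margin; these are the kinds of input parameters one would vary when optimizing results.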

Performance measure

Several measures are used to assess the performance of a method: sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), positive predictive value (PPV) and negative predictive value (NPV). To evaluate these measures, we calculated the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
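The page does not list the formulas, so the sketch below uses the standard textbook definitions of these measures in terms of TP, TN, FP and FN. The function name and the choice to return NaN when a denominator is zero are assumptions.

```python
import math


def performance_measures(tp, tn, fp, fn):
    """Compute standard classification measures from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a measure is undefined.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": ratio(tp, tp + fn),   # fraction of interacting pairs found
        "specificity": ratio(tn, tn + fp),   # fraction of non-interacting pairs found
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

For instance, with TP = 40, TN = 45, FP = 5 and FN = 10, sensitivity is 40/50 = 0.8, specificity is 45/50 = 0.9 and accuracy is 85/100 = 0.85.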