SVM-based classification of amine-type GPCRs

Help

Sequence Submission

Sequence Name:-This is an optional field. The sequence name may contain letters and numbers, along with "-" or "_". All other characters are not permitted. If the field is left empty, it is assigned the default name "GPCR". The sequence name is used only for your own reference. Characters such as #, $, @ or a space within the name are not allowed for security reasons.
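
The naming rule above can be sketched as a small check. This is only an illustration, not the server's own code: the function name normalize_sequence_name and the exact regular expression are assumptions made here.

```python
import re

# Assumed pattern: letters, digits, "-" and "_" only, as the rule above describes.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
DEFAULT_NAME = "GPCR"  # default name stated in this help page


def normalize_sequence_name(name):
    """Return the name if it is valid, the default if it is empty.

    Raises ValueError for names containing any other character
    (for example "#", "$", "@" or a space).
    """
    if name is None or name == "":
        return DEFAULT_NAME
    if _NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"invalid sequence name: {name!r}")
    return name
```

For example, normalize_sequence_name("my_seq-1") returns the name unchanged, while "my seq" is rejected because of the space.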

Input Sequence:-The server provides two options for submitting the query sequence. The user can paste a plain sequence into the box provided, or upload a local sequence file. Amino acid sequences must be entered in the one-letter code. All non-standard characters are ignored. A sample submission form with labeled fields is shown below.
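
As a rough sketch of what "ignoring non-standard characters" can mean, the function below keeps only the 20 standard one-letter codes. The name clean_sequence is hypothetical, and converting the input to upper case first is an assumption; this page does not say how the server treats lower-case letters.

```python
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def clean_sequence(raw):
    """Keep only standard amino acid letters from the pasted or uploaded text.

    Assumption: input is upper-cased before filtering, so lower-case
    residues are accepted rather than dropped.
    """
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)
```

Whitespace, digits, line breaks and ambiguity codes such as X or B are all dropped by this filter.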

Sequence Format:-The server accepts both formatted and unformatted (raw) antigenic sequences. The server uses the ReadSeq routine to parse the input. Before running the prediction, the user should choose whether the uploaded or pasted sequence is plain or formatted. The prediction results will be wrong if the wrong format is chosen.

Prediction Approach

The method allows prediction on the basis of two different approaches.

Composition of amino acids:-

An SVM was developed on the basis of the amino acid composition of the protein. The SVM was provided with a 20-dimensional input vector. The amino acid composition is the fraction of each amino acid in a protein. The overall accuracy of the composition-based method for four subcellular locations (Nuclear, Cytoplasm, Mitochondria and Extracellular) is 89.8%. The performance of the method was evaluated using five-fold cross-validation.
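
The 20-dimensional vector described above can be computed as in the following minimal sketch. The function name and the alphabetical ordering of the amino acids are assumptions made for illustration; the ordering used by the server is not stated here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 features


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid as a list of 20 floats."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:  # non-standard characters are skipped
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Because each entry is a fraction of the sequence length, the 20 values sum to 1 regardless of how long the protein is.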

Dipeptide composition:-

An SVM was developed on the basis of the dipeptide composition of the protein sequence. This gives a fixed pattern length of 400. This representation captures information about the amino acid composition along with the local order of the amino acids. The overall accuracy of the dipeptide composition-based method is 96.4%.
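
The fixed length of 400 comes from the 20 x 20 possible ordered pairs of amino acids. A minimal sketch follows, assuming overlapping adjacent pairs and normalization by the number of counted pairs; the function name and feature ordering are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, "AA", "AC", ..., "YY" (assumed feature order).
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the fraction of each dipeptide as a list of 400 floats."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # overlapping window of two residues
        if pair in counts:   # pairs with non-standard characters are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[dp] / total for dp in DIPEPTIDES]
```

Unlike the 20-dimensional composition, this vector distinguishes "AC" from "CA", which is how it retains local order information.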

Prediction Results

The prediction results are presented in a user-friendly format. The results consist of two main parts.

Summary of query sequence:-
This part provides information about the submitted sequence, such as the sequence itself, its length and the date of scanning. It also reports the chosen prediction approach.
Prediction Result:-
This part provides the final predicted type of amine receptor. It also reports the reliability of the prediction (in the form of a reliability index) and the expected accuracy. A sample prediction result is shown below.

Reliability Index (RI):-

Calculating the reliability index is important for knowing how reliable a prediction is. In this study we followed the simple strategy of Hua, S. and Sun, Z. (2001) for assigning the reliability index (RI). The RI is assigned on the basis of the difference between the highest and second-highest SVM output scores for the different types of amine receptors. The curves below show the prediction accuracy for sequences whose reliability index equals a particular value.
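
The idea of deriving an RI from the gap between the two best SVM scores can be sketched as follows. The score margin is as described above, but the binning is not given on this page: the parameters scale and max_ri, and the rule used to turn the margin into an integer, are hypothetical placeholders and not the published formula.

```python
def score_margin(scores):
    """Difference between the highest and second-highest SVM output scores."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two classes")
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def reliability_index(scores, scale=1.0, max_ri=5):
    """Map the score margin to an integer RI.

    Assumption: `scale` and `max_ri` are illustrative values only; the
    actual binning used by the server is not stated in this help page.
    """
    margin = score_margin(scores)
    return min(int(margin * scale) + 1, max_ri)
```

A larger margin means the best-scoring class is well separated from the runner-up, so the prediction receives a higher RI.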

Reliability index curve on the basis of amino acid composition:-

Reliability index curve on the basis of dipeptide composition:-

The dipeptide composition-based method predicted 67.6% of sequences with an accuracy of 100% for RI greater than or equal to 5.
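
A statement of this kind combines two quantities: the share of sequences at or above an RI threshold, and the accuracy within that share. The sketch below shows how both could be computed from per-sequence results; the function name and its inputs are hypothetical, and no data from the study is reproduced here.

```python
def coverage_and_accuracy(ri_values, is_correct, threshold):
    """Return (coverage, accuracy) for predictions with RI >= threshold.

    ri_values  : RI assigned to each test sequence
    is_correct : True/False per sequence, whether the prediction was right
    Accuracy is None when no sequence reaches the threshold.
    """
    if not ri_values or len(ri_values) != len(is_correct):
        raise ValueError("inputs must be non-empty and of equal length")
    selected = [ok for ri, ok in zip(ri_values, is_correct) if ri >= threshold]
    coverage = len(selected) / len(ri_values)
    accuracy = sum(selected) / len(selected) if selected else None
    return coverage, accuracy
```

With toy inputs ri_values=[5, 2, 5, 1] and is_correct=[True, False, True, True], a threshold of 5 selects half of the sequences, all of them correctly predicted.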