Composition based Protein Identification (COPid), CSIR-IMTECH
HELP
  1. Name: This is optional. It provides a name to the job submitted to COPid web-server.

  2. E-mail Address: All jobs submitted to the server are placed in a queue, so computing the output can sometimes take time. For this reason, the user can provide a valid e-mail address to which a notification is sent when the job completes.

  3. FASTA Format: FASTA format is the most commonly used format to display both nucleotide and amino acid sequences. It contains two parts: (a) Header: begins with '>' and contains the name, accession number and origin of the sequence, and (b) Body: contains the sequence in standard one-letter symbols.

    >Sequence 1 ID Field
    MPPSVSRA.......
    LPGFLADE.......
    KGDTHTly.......
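
As an illustration of the two-part structure described above, here is a minimal Python sketch of a FASTA reader. It is not the parser used by the COPid server; the function name `parse_fasta` and the choice to upper-case residues are assumptions made for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Assumption for this sketch: residues are upper-cased and blank
    lines are ignored. The COPid server may treat input differently.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>Sequence 1 ID Field
MPPSVSRA
LPGFLADE
KGDTHTly
"""
records = parse_fasta(example)
```

Each record pairs the header text (without the leading '>') with the body lines joined into one sequence string.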
    

Download user manual for COPid


Help for searching proteins of similar composition

Search Criterion: Whole Protein Composition

It searches for proteins of similar composition in standard databases (Swissprot, PDB). Searching can be done in either 'batch mode' or 'mean mode'. In batch mode, if more than one sequence is given, the server takes one sequence at a time and searches for the specified number of compositionally similar proteins. In mean mode, the mean composition is first calculated by averaging over the compositions of all submitted sequences, and this mean is then used as the query. Note that the number of hits must not exceed the total number of sequences in the database; otherwise the results may not be computed. The number of hits must also be an integer; if a float is given, its absolute value will be taken. Users can also upload their own database for whole protein composition searches.
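
The difference between batch mode and mean mode can be sketched in Python as follows. This is only an illustration: the help text does not state which similarity measure the search uses, so Euclidean distance between percent compositions is an assumption here, and the names `composition`, `top_hits` and `search` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percent composition of the 20 standard amino acids."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_hits(query_comp, database, n_hits):
    """Names of the n_hits database entries closest to query_comp."""
    ranked = sorted(
        database.items(),
        key=lambda item: euclidean(query_comp, composition(item[1])),
    )
    return [name for name, _ in ranked[:n_hits]]


def search(queries, database, n_hits, mode="batch"):
    # The number of hits must not exceed the database size.
    if n_hits > len(database):
        raise ValueError("number of hits exceeds database size")
    comps = [composition(q) for q in queries]
    if mode == "mean":
        # Average the compositions column by column, then search once.
        mean = [sum(col) / len(comps) for col in zip(*comps)]
        return [top_hits(mean, database, n_hits)]
    # Batch mode: one independent search per query sequence.
    return [top_hits(c, database, n_hits) for c in comps]


database = {"a": "AAAA", "c": "CCCC", "ac": "AACC"}
batch_result = search(["AAAA", "CCCC"], database, 1, mode="batch")
mean_result = search(["AAAA", "CCCC"], database, 1, mode="mean")
```

Batch mode returns one hit list per query, while mean mode returns a single hit list for the averaged composition.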

Search Criterion: N-terminal Composition & C-terminal Composition

All options are the same as for whole protein composition, except that searching is based on the composition of a specified number of N- or C-terminal amino acids. N-terminal and C-terminal results can also be computed against a database uploaded by the user.



Help for Composition


Given one or more amino acid sequences, this program calculates the composition of amino acids, dipeptides, or particular chemical properties of amino acids.
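
A minimal Python sketch of amino acid and dipeptide composition is given below. It assumes composition is reported as a percentage and that characters outside the 20 standard residues are ignored; both are assumptions of this example, not documented server behaviour.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Percent of each of the 20 standard amino acids in seq."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    n = len(seq)
    return {a: (100.0 * seq.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percent of each of the 400 possible dipeptides (overlapping pairs)."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 0)
    return {p: (100.0 * c / total if total else 0.0) for p, c in counts.items()}


aa = amino_acid_composition("AACD")
dp = dipeptide_composition("AACD")
```

For the sequence AACD, the amino acid vector has 20 entries and the dipeptide vector has 400, with the three overlapping pairs AA, AC and CD sharing the total equally.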

Calculate composition of Whole protein

The whole length of the protein is taken into consideration during calculation.

Calculate composition of C terminal or N terminal of protein

Only the number of amino acids specified in the 'Terminal Length' option is taken into consideration during calculation.
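
The effect of the 'Terminal Length' option can be sketched in Python as below. The function names are hypothetical, and the handling of a terminal length longer than the sequence (the whole sequence is used) is an assumption of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def terminal_region(seq, terminal_length, end="N"):
    """Return the first (N) or last (C) terminal_length residues of seq."""
    if terminal_length <= 0:
        raise ValueError("terminal length must be a positive integer")
    if end == "N":
        return seq[:terminal_length]
    if end == "C":
        # Slicing past the start simply returns the whole sequence.
        return seq[-terminal_length:]
    raise ValueError("end must be 'N' or 'C'")


def composition(seq):
    """Percent composition of the 20 standard amino acids."""
    seq = seq.upper()
    n = len(seq)
    return {a: (100.0 * seq.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


protein = "MPPSVSRALP"
n_term = terminal_region(protein, 3, end="N")
c_term = terminal_region(protein, 3, end="C")
n_term_comp = composition(n_term)
```

Only the selected residues contribute to the composition, so the same protein can give quite different N-terminal and C-terminal vectors.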


Under the heading Physico-chemical property composition, the following can be calculated:

  1. Molecular weight of protein $
  2. Number of amino acids in the protein sequence $
  3. % Composition of Charged residues (DEKHR)
  4. % Composition of Aliphatic residues (ILV)
  5. % Composition of Aromatic residues (FHWY)
  6. % Composition of Polar residues (DERKQN)
  7. % Composition of Neutral residues (AGHPSTY)
  8. % Composition of Hydrophobic residues (CVLIMFW)
  9. % Composition of Positively charged residues (HKR)
  10. % Composition of Negatively charged residues (DE)
  11. % Composition of Tiny residues (ACDGST)*
  12. % Composition of Small residues (EHILKMNPQV)* and
  13. % Composition of Large residues (FRWY)*.
$ Only for whole protein
* Chothia, Nature 254, 304 (1975)
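
The percentage items in the list above can be sketched in Python using the residue groups exactly as listed. Molecular weight is left out because the help text does not give the residue masses used. The function name and the choice to divide by the full sequence length are assumptions of this sketch.

```python
# Residue groups copied from the list above (items 3 to 13).
PROPERTY_GROUPS = {
    "charged": "DEKHR",
    "aliphatic": "ILV",
    "aromatic": "FHWY",
    "polar": "DERKQN",
    "neutral": "AGHPSTY",
    "hydrophobic": "CVLIMFW",
    "positive": "HKR",
    "negative": "DE",
    "tiny": "ACDGST",
    "small": "EHILKMNPQV",
    "large": "FRWY",
}


def property_composition(seq):
    """Percent of residues in seq that fall in each property group."""
    seq = seq.upper()
    n = len(seq)
    result = {}
    for name, residues in PROPERTY_GROUPS.items():
        count = sum(1 for c in seq if c in residues)
        result[name] = 100.0 * count / n if n else 0.0
    return result


props = property_composition("DEKA")
```

Because a residue can belong to several groups (for example, D is charged, polar, negative and tiny), the percentages are not expected to sum to 100.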


Help for Analysis

Phylogenetic Tree

The input for this form is the amino acid sequences of proteins. A distance matrix is calculated from the Euclidean distance between the compositions of each pair of sequences. Two types of matrices can be calculated: (a) for OC and (b) for Phylip. These matrices can be used directly in 'Web-phylip' and OC to derive a tree on the basis of amino acid or dipeptide composition.
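
A minimal Python sketch of the distance matrix step follows, using amino acid composition only. The output writer assumes the common PHYLIP square-matrix layout (taxon count on the first line, then names padded to 10 characters followed by distances); the exact layout produced by COPid, and the OC layout, are not described here and are not reproduced.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percent composition of the 20 standard amino acids."""
    seq = seq.upper()
    n = len(seq)
    return [100.0 * seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def distance_matrix(seqs):
    """Pairwise Euclidean distances between composition vectors."""
    comps = [composition(s) for s in seqs]
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in comps]
        for u in comps
    ]


def to_phylip(names, matrix):
    """Format a square distance matrix in an assumed PHYLIP-style layout."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        padded = name[:10].ljust(10)
        lines.append(padded + " " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


names = ["protA", "protB", "protC"]
matrix = distance_matrix(["AAAA", "CCCC", "AACC"])
phylip_text = to_phylip(names, matrix)
```

The matrix is symmetric with zeros on the diagonal, which is what distance-based tree programs expect as input.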

Compare two groups of sequences

The amino acid or dipeptide composition of two groups of sequences can be compared in this section of the COPid server. If comparison mode is selected for amino acid composition, the average composition is also displayed as a bar plot. For dipeptide composition, it is displayed in a simple tabular form.
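
The comparison of average amino acid composition between two groups can be sketched in Python as below. The function names and the extra difference column are assumptions of this example; the server's own table and bar plot are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percent composition of the 20 standard amino acids."""
    seq = seq.upper()
    n = len(seq)
    return {a: (100.0 * seq.count(a) / n if n else 0.0) for a in AMINO_ACIDS}


def mean_composition(seqs):
    """Average the per-sequence compositions of one group."""
    comps = [composition(s) for s in seqs]
    return {a: sum(c[a] for c in comps) / len(comps) for a in AMINO_ACIDS}


def compare_groups(group1, group2):
    """Per residue: (mean in group 1, mean in group 2, difference)."""
    m1, m2 = mean_composition(group1), mean_composition(group2)
    return {a: (m1[a], m2[a], m1[a] - m2[a]) for a in AMINO_ACIDS}


table = compare_groups(["AAAA", "AACC"], ["CCCC"])
for residue in "AC":
    g1, g2, diff = table[residue]
    print(f"{residue}  {g1:6.2f}  {g2:6.2f}  {diff:+7.2f}")
```

The same approach extends to dipeptides by replacing the 20-entry composition with the 400-entry dipeptide composition.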

Creation of patterns for different software

In this section of COPid, the user can generate pattern file(s) of amino acid or dipeptide composition. This option requires two groups of sequences: one group is treated as positive examples and the other as negative examples. For 'n'-fold cross-validation with SVM, the user gets 'n' training and 'n' testing files. For SNNS, validation patterns are also created in addition to the training and testing files. For Timbl, only two files are created, as Timbl can itself generate the pattern files for training and testing.
