Search Algorithm

The search is based on the difference in composition between two protein sequences. The composition may be either amino acid composition or dipeptide composition. During a search, the composition of the query protein(s) is calculated first and then compared against the corresponding composition of each database protein. Relatedness is inferred from the Euclidean Distance (ED) between the compositions of the two sequences: the smaller the ED, the more closely related the two sequences are taken to be, and vice versa.
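The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's actual code: the function names (aa_composition, dipeptide_composition, euclidean_distance) are hypothetical, compositions are expressed as fractions (the tool may well use percentages), dipeptides are counted as overlapping pairs, and residues outside the 20 standard amino acids are simply ignored.

```python
from itertools import product
from math import sqrt

# Assumption: only the 20 standard residues contribute to a composition.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
DIPEPTIDE_INDEX = {pair: i for i, pair in enumerate(DIPEPTIDES)}


def aa_composition(seq):
    """Return a 20-element vector: fraction of each amino acid in seq."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def dipeptide_composition(seq):
    """Return a 400-element vector: fraction of each overlapping residue pair."""
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        j = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if j is not None:  # skip pairs containing non-standard residues
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(DIPEPTIDES)


def euclidean_distance(u, v):
    """Euclidean Distance (ED) between two composition vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Example: compare a query with one database sequence (made-up sequences).
query = "MKTAYIAKQR"
target = "MKTAYLAKQR"
ed = euclidean_distance(aa_composition(query), aa_composition(target))
```

Because composition discards residue order, two sequences that are permutations of each other have an amino acid ED of zero. Dipeptide composition keeps some local order information at the cost of a 400-element vector.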

Query proteins are processed in one of two modes.

  • Batch Mode: Each protein is treated as an independent query, and a separate search is run for each one.
  • Mean Mode: All submitted proteins are treated as closely related. A representative composition is first derived by averaging the individual compositions, and this average composition is then used for the search.

The search result is a list of proteins in ascending order of their ED.
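The difference between the two modes, and the ED-sorted result list, can be illustrated with a minimal Python sketch. All names here (composition, rank, batch_search, mean_search, top_n) are hypothetical, the sequences are toy data, and amino acid composition expressed as fractions is assumed. How the actual server stores or scans its database is not described in this document.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only


def composition(seq):
    """Amino acid composition of seq as a 20-element vector of fractions."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def distance(u, v):
    """Euclidean Distance (ED) between two composition vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank(query_comp, database, top_n=None):
    """Return (ED, name) pairs for database proteins in ascending order of ED."""
    hits = sorted(
        (distance(query_comp, composition(seq)), name)
        for name, seq in database.items()
    )
    return hits[:top_n]  # top_n=None keeps the full list


def batch_search(queries, database, top_n=None):
    """Batch Mode: one independent search per query protein."""
    return {
        name: rank(composition(seq), database, top_n)
        for name, seq in queries.items()
    }


def mean_search(queries, database, top_n=None):
    """Mean Mode: average the query compositions, then run a single search."""
    if not queries:
        raise ValueError("at least one query protein is required")
    comps = [composition(seq) for seq in queries.values()]
    mean_comp = [sum(column) / len(comps) for column in zip(*comps)]
    return rank(mean_comp, database, top_n)


# Toy data, for illustration only.
database = {"db1": "AAAA", "db2": "CCCC", "db3": "AACC"}
queries = {"q1": "AAAAAC", "q2": "ACCCCC"}

per_query_hits = batch_search(queries, database, top_n=2)  # one list per query
combined_hits = mean_search(queries, database, top_n=2)    # one list overall
```

With this toy data, Batch Mode reports db1 as the closest hit for q1 and db2 for q2, while Mean Mode reports db3 first, because the averaged composition is half A and half C.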

In the search result shown above, four proteins were submitted as queries in Batch Mode. For each query, the top two hits, i.e. the two database proteins with the smallest ED, are displayed.