GMAP (VER. 2.0)

     Computer program GMAP has been developed to assist biologists
working on synthetic gene design and on the modular redesign of
natural (wild-type) genes, with a view to easing the design of
useful ``cassettes'' for future manipulation of the genes.  GMAP
uses the `e-generic' algorithm, which is based on set theory, to
search for restriction sites in DNA sequences.  The main function
of GMAP is to search for potential restriction sites in
protein-coding DNA sequences, and to search non-ambiguous DNA
sequences for restriction sites that can be introduced by one, two
or three mutations without altering the amino acid sequence (i.e.
silent mutations).  Moreover, it has an additional option whereby
translationally non-silent restriction enzyme (R.E.) sites that can
be generated by limited mismatching of bases can also be mapped.
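GMAP's e-generic algorithm itself is not described in this document. As a rough illustration of how set theory applies to site searching, the sketch below treats each NC-IUB symbol as the set of bases it stands for: a recognition site can occur at a position of an ambiguous sequence when, at every aligned position, the two sets intersect. The function names are hypothetical, only the given strand is scanned, and this is not GMAP's implementation.

```python
# NC-IUB nucleotide symbols, each mapped to the set of bases it represents.
IUB = {sym: set(bases) for sym, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

def can_match(seq_symbol, site_symbol):
    """True if at least one base is allowed by both symbols (non-empty intersection)."""
    return bool(IUB[seq_symbol] & IUB[site_symbol])

def find_sites(seq, site):
    """Return 0-based start positions where `site` can occur in `seq`.

    Both strings may contain IUB symbols; a start is reported when every
    aligned pair of symbols has a non-empty intersection.
    """
    seq, site = seq.upper(), site.upper()
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start:start + len(site)]
        if all(can_match(a, b) for a, b in zip(window, site)):
            hits.append(start)
    return hits

# EcoRI site (GAATTC); the 'R' at index 4 may be A or G, so a site can start at 2.
print(find_sites("TTGARTTCAA", "GAATTC"))  # [2]
```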




Files contained in this package:

FILE        DESCRIPTION
GMAP.EXE    EXECUTABLE VERSION OF THE GMAP PROGRAM
CODON.DOC   DOCUMENT FILE FOR CODON USAGE
RNASE.AMN   AMINO ACID SEQUENCE FILE OF RIBONUCLEASE A
SK.DNA      DNA SEQUENCE OF STREPTOKINASE
REST.RES    RESTRICTION ENZYMES AND THEIR RECOGNITION SEQUENCES
RESTC.COM   COMMERCIALLY AVAILABLE RESTRICTION ENZYMES AND THEIR
            RECOGNITION SEQUENCES
ECOLI.COD   CODON USAGE TABLE CREATED FROM E. COLI USING THE MOST
            FREQUENT CODONS
ECOLI1.COD  CODON USAGE TABLE CREATED FROM E. COLI USING PARTIALLY
            AMBIGUOUS CODONS
gmap.tar    All files in the GMAP package
gmap.tar    uuencoded gmap.tar file


GMAP: An Introduction

GMAP is a multi-purpose computer program that aids in the de novo design of synthetic genes, as well as in the cassette mutagenesis of natural genes, by predicting potential restriction enzyme (R.E.) sites in target DNA sequences. Specifically, it carries out the following tasks:

i) mapping the potential restriction endonuclease (R.E.) sites in non-ambiguous DNA sequences, such as those of natural genes, that can be introduced into the DNA sequence with or without altering the amino acid sequence, i.e. through non-silent or silent mutations;

ii) predicting the number and type of mutations required to introduce unique R.E. sites into non-ambiguous DNA sequences through a limited number (1, 2 or 3 bp per R.E. site) of translationally silent or non-silent mutations;

iii) searching for all R.E. sites in ambiguous DNA sequences obtained by reverse translation of a given amino acid sequence; and

iv) searching for R.E. sites in DNA sequences obtained by reverse translation of an amino acid sequence using a user-defined codon usage table.
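For task ii), the number of substitutions needed to create a site can be counted window by window: each position whose base is not allowed by the site's IUB symbol costs one substitution. The minimal sketch below assumes plain A/C/G/T input, ignores whether the changes are translationally silent, and does not check that the resulting site would be unique. `mutations_needed` and `candidate_sites` are hypothetical names, not part of GMAP.

```python
# NC-IUB site symbols and the bases each one accepts.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mutations_needed(window, site):
    """Count substitutions that turn the A/C/G/T `window` into a match for `site`.

    A position costs one substitution when its base is not among the bases
    allowed by the corresponding site symbol.
    """
    return sum(base not in IUB[sym] for base, sym in zip(window, site))

def candidate_sites(seq, site, max_mut=3):
    """List (start, mutations) for every window needing at most `max_mut` changes."""
    seq, site = seq.upper(), site.upper()
    results = []
    for start in range(len(seq) - len(site) + 1):
        n = mutations_needed(seq[start:start + len(site)], site)
        if n <= max_mut:
            results.append((start, n))
    return results

# GAATTG at index 2 is one substitution (G -> C) away from EcoRI (GAATTC).
print(candidate_sites("CCGAATTGAA", "GAATTC", max_mut=1))  # [(2, 1)]
```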

Finding translationally silent R.E. sites in DNA sequences has become particularly important for biologists, especially those investigating protein structure/function relationships. The ability to predict potential R.E. sites in an ambiguous DNA sequence, such as one obtained by reverse translation of a protein's amino acid sequence, allows one to construct synthetic genes with appropriately placed sites for cutting and joining DNA segments. Similarly, the ability to introduce R.E. sites by limited mutagenesis, either in a translationally silent manner into a non-ambiguous DNA sequence (e.g., the open reading frames of natural genes) or in a translationally non-silent manner elsewhere in genes (such as promoters, splice junctions and other control elements that are not normally expressed as protein), permits the modular redesign of genes for `cassette' mutagenesis. A pertinent example of the latter type of application is enhancing the expression of a whole gene by cassette mutagenesis, where one wants to cut just outside the coding region in order to fuse it to a stronger promoter.

The program GMAP is fully menu driven. The option `Input amino acid sequence' allows the user to enter an amino acid sequence (in one- or three-letter code) from the keyboard or from a text file, and also to create and update amino acid sequence files. Sequence data obtained from PIR or NBRF can be used directly to create the input amino acid sequence file. The option `Input DNA sequence file' allows one to create and update DNA sequence files. Nucleotides may be entered only as NC-IUB designated symbols (cf. Eur. J. Biochem. 1985, 150:1-5); other symbols are rejected. Data can be entered from the keyboard or from a text (ASCII) file, so sequence data extracted from GenBank or EMBL can be used directly to create a DNA sequence file. This option also allows one to convert an amino acid sequence into a DNA sequence using a user-defined codon preference table. The option `Input restriction enzyme sequence' allows the user to create and update the restriction enzyme data file. The prototype recognition sequences of type II restriction endonucleases are already stored in the file REST.RES (cf. R.J. Roberts and D. Macelis, Nucleic Acids Res. 1993, 21:3125). The option `Input Codon Usage Table' allows one to create and update the codon preference table. A file containing the codons preferred by E. coli is included with the program (cf. Wada et al., Nucleic Acids Res. 1992, 20:2111).
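The rule that only NC-IUB symbols are accepted, while GenBank or EMBL text can still be read directly, can be sketched as a small cleaning step. This is an assumption about how such input might be handled, not GMAP's code: the hypothetical `parse_sequence_text` drops whitespace and position numbers, upper-cases the rest, and rejects anything outside the fifteen symbols A, C, G, T, R, Y, M, K, S, W, H, B, V, D and N. It expects only the numbered sequence lines, not the record header or the `//` terminator.

```python
import re

# One-letter nucleotide symbols accepted by this sketch (NC-IUB codes, T form).
VALID = set("ACGTRYMKSWHBVDN")

def parse_sequence_text(text):
    """Extract a nucleotide sequence from numbered text such as a GenBank ORIGIN block.

    Whitespace and digits are discarded; any remaining character that is not
    a valid symbol raises ValueError, so non-IUB symbols are rejected.
    """
    cleaned = re.sub(r"[\s\d]", "", text).upper()
    bad = sorted(set(cleaned) - VALID)
    if bad:
        raise ValueError("invalid nucleotide symbols: " + ", ".join(bad))
    return cleaned

origin = """
        1 gaattcnnrg ytac
"""
print(parse_sequence_text(origin))  # GAATTCNNRGYTAC
```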

The `Search R.E. sites in amino acid sequence' option allows the user to: i) search for all R.E. sites in the fully ambiguous DNA sequence obtained by reverse translation of an amino acid sequence; ii) search for the sites of a specific restriction enzyme in the reverse-translated ambiguous DNA sequence; iii) reverse translate a given amino acid sequence into a fully or partially ambiguous DNA sequence, or into a completely non-ambiguous DNA sequence using a user-defined codon preference table; iv) search for all R.E. sites in the partially ambiguous (or non-ambiguous) DNA sequence obtained by reverse translation of an amino acid sequence with a user-defined codon preference table; and v) search for the sites of a user-specified enzyme in the partially ambiguous or completely non-ambiguous DNA sequence obtained by reverse translation of an amino acid sequence with user-defined codon usage.
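Reverse translation into an ambiguous DNA sequence can be sketched by taking, for each codon position, the set of bases used by all synonymous codons and writing that set as an NC-IUB symbol; a user-defined codon preference table, here a hypothetical dictionary from amino acid to codon, overrides the ambiguous codon. This per-position union is a simplification assumed for illustration: for leucine, serine and arginine it yields codons such as YTN that also cover other amino acids (YTN includes the phenylalanine codons TTT and TTC). How GMAP itself represents these cases is not described here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Map each set of bases back to its NC-IUB one-letter symbol.
SET_TO_IUB = {frozenset(bases): sym for sym, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

def ambiguous_codon(aa):
    """Return an IUB codon whose positions cover all synonymous codons of `aa`."""
    codons = [codon for codon, a in CODE.items() if a == aa]
    if not codons:
        raise ValueError("unknown amino acid code: " + aa)
    return "".join(
        SET_TO_IUB[frozenset(codon[i] for codon in codons)] for i in range(3)
    )

def reverse_translate(protein, preferred=None):
    """Reverse translate a one-letter protein sequence.

    `preferred` is a hypothetical codon preference table {amino acid: codon};
    amino acids missing from it get the fully ambiguous IUB codon.
    """
    preferred = preferred or {}
    return "".join(
        preferred[aa] if aa in preferred else ambiguous_codon(aa)
        for aa in protein.upper()
    )

print(reverse_translate("MKF"))                # ATGAARTTY
print(reverse_translate("MKF", {"K": "AAA"}))  # ATGAAATTY
```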

The `Search R.E. sites in DNA sequences' option allows the user to: i) search for all potential R.E. sites that can be introduced into a DNA sequence by limited site-directed silent/non-silent mutagenesis, together with the number of mutations required to introduce each site; ii) search for the potential sites of a specific restriction enzyme that can be introduced into a DNA sequence by site-directed silent/non-silent mutagenesis, together with the number of mutations required to introduce each site; iii) translate the DNA sequence into an amino acid sequence; iv) search for the pre-existing sites of all R.E.s in the DNA sequence; and v) search for the existing sites of a specific R.E. in the DNA sequence.
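Finding sites that can be introduced by silent mutagenesis amounts to counting substitutions and then keeping only the variants whose translation is unchanged. The brute-force sketch below assumes a non-ambiguous A/C/G/T coding sequence read in frame from its first base and the standard genetic code; for every window it tries each concrete sequence an ambiguous site stands for and reports the cheapest silent one. `silent_sites` and `translate` are hypothetical names, and this is not GMAP's algorithm.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# NC-IUB site symbols and the bases each one stands for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate(dna):
    """Translate in frame 0, ignoring any incomplete trailing codon."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def silent_sites(cds, site, max_mut=3):
    """Return (start, mutations, new_window) for each window where `site` can be
    created with at most `max_mut` substitutions without changing the protein."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    results = []
    for start in range(len(cds) - len(site) + 1):
        window = cds[start:start + len(site)]
        best = None  # (cost, variant) of the cheapest silent variant so far
        # Try every concrete sequence the (possibly ambiguous) site stands for.
        for variant in product(*(IUB[s] for s in site)):
            cost = sum(a != b for a, b in zip(window, variant))
            if cost > max_mut or (best is not None and cost >= best[0]):
                continue
            mutated = cds[:start] + "".join(variant) + cds[start + len(site):]
            if translate(mutated) == protein:
                best = (cost, "".join(variant))
        if best is not None:
            results.append((start, best[0], best[1]))
    return results

print(silent_sites("ATGGAGTTTTAA", "GAATTC", max_mut=2))  # [(3, 2, 'GAATTC')]
```

In the example, ATG GAG TTT TAA encodes Met-Glu-Phe-stop; changing GAG to GAA and TTT to TTC creates an EcoRI site (GAATTC) with two substitutions while the protein stays the same. Brute force is used for clarity; its cost grows with the number of ambiguous positions in the site.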



The `Output DNA/Amino acid/R.E./Codon usage table' option allows the user to display, print or save to a file the amino acid sequence, DNA sequence, restriction enzyme data and codon usage table. Besides the main options and sub-options, further options allow the user to output the results in the desired format.



LICENCE:

This program remains the copyright property of the Institute of Microbial Technology, Chandigarh, INDIA, an institution of the CSIR, Govt. of India. This program may be used freely by anybody, subject to the following conditions:

1. Neither the authors nor the Institute of Microbial Technology, Chandigarh, assume any responsibility for any losses or damage that may be caused by the use or misuse of the accompanying software.

2. Neither the authors nor the Institute of Microbial Technology, Chandigarh, give any warranty with regard to the software's ability to function on any computer.

3. The accompanying software may not be copied or distributed in modified form, and this document file MUST be included with all copies.

4. No fee may be charged for the copying and/or distribution of the accompanying software.

5. Users must agree to accept any risk as a condition of the free use of the accompanying software.

Suggestions and bug reports will be greatly appreciated. Please send them to:



                 G P S Raghava, Scientist
                 Computer Center
                 Institute of Microbial Technology,
                 Sector 39A, Chandigarh 160 014,
                 India.
                 Email address: raghava@imtech.ernet.in


                          or


                 Girish Sahni, Scientist
                 Section of Molecular Biology
                 Institute of Microbial Technology,
                 Sector 39A, Chandigarh 160 014,
                 India.
                 Email address: girish@imtech.ernet.in