# Modification and Annotation in Proteins (MAP) Script

## Introduction

This Python script, MAP (Modification and Annotation in Proteins), developed by Prof. G. P. S. Raghava's group, facilitates the annotation of protein sequences with various modifications and features in a standardized MAP format. This format allows for the inclusion of both protein-level and residue-level annotations.

**Please cite:** MAP; available at https://webs.iiitd.edu.in/raghava/maprepo/

## Requirements

- Python 3.x

## Usage

1.  **Clone or copy the script:** Save the provided Python code as a `.py` file (e.g., `seq_to_map_cli.py`, the name used in the command below).

2.  **Run the script:** Open a terminal or command prompt, navigate to the directory where you saved the file, and execute the script using:
    ```bash
    python seq_to_map_cli.py
    ```

3.  **Enter protein sequence:** The script will prompt you to paste your protein sequence (without any existing header). You can either paste your sequence or press Enter to use the default sequence "ACDEFGHIK". The script will automatically remove any whitespace or numbers and convert the sequence to uppercase.

4.  **Protein-level annotations:**
    -   You will be asked to enter the organism of the protein. Leave it empty to skip.
    -   You will be asked to enter the function of the protein. Leave it empty to skip.
    -   You will be presented with a list of additional annotation types (Subcellular location, Source database, etc.). You can select a number corresponding to the annotation type you want to add and then enter its value. Enter `0` to skip adding additional protein-level annotations.

5.  **Residue-level annotations (Modifications):**
    -   The script will display your protein sequence with residue numbers.
    -   You will be presented with a menu of modification categories (Post-Translational Modifications, Non-Natural Modifications, etc.).
    -   Select the number corresponding to the desired modification category.
    -   Depending on the category, you might be presented with further options (e.g., specific PTM types like Phosphorylation, Glycosylation). Select the corresponding number.
    -   You will be asked to select the position(s) where the modification occurs. You can enter the residue number (e.g., `3`), or specify 'N-term' for the N-terminus or 'C-term' for the C-terminus.
    -   For certain modifications like mutations, insertions, deletions, and cyclization, you will be prompted for additional details (e.g., the new residue for a mutation, the sequence to insert, the residues to delete, or the positions for disulfide bonds).
    -   You can add multiple residue-level annotations. Select `0` when you are done adding modifications.

6.  **MAP Format Output:** The script will display the protein sequence in the MAP format, including the header with protein-level annotations and the sequence with residue-level modification tags.

7.  **Save to file:** You will be asked if you want to save the output to a file. If you choose 'y', you will be prompted for a filename (default is `sequence.map`).
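
The input handling in step 3 and the numbered display in step 5 can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `clean_sequence` and `numbered_view` are hypothetical, the `position:residue` layout is an assumption, and falling back to the default whenever nothing remains after cleaning is also an assumption.

```python
import re

DEFAULT_SEQUENCE = "ACDEFGHIK"  # default sequence named in step 3


def clean_sequence(raw):
    """Remove whitespace and digits, uppercase the rest, and fall back to
    the default sequence if nothing is left (assumed behaviour)."""
    cleaned = re.sub(r"[\s\d]+", "", raw).upper()
    return cleaned or DEFAULT_SEQUENCE


def numbered_view(sequence):
    """Return 1-based 'position:residue' pairs (hypothetical layout)."""
    return " ".join(f"{i}:{aa}" for i, aa in enumerate(sequence, start=1))


if __name__ == "__main__":
    seq = clean_sequence(" 1 acde fghik 10\n")
    print(seq)
    print(numbered_view(seq))
```

Residue numbers are 1-based here so that the positions shown to the user match the values entered when selecting a modification site.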

## MAP Format Explanation

The MAP format consists of a header line starting with `>` followed by the annotated protein sequence.

**Header Annotations:** Protein-level annotations are enclosed in curly braces `{}` with a key-value pair separated by a colon `:`. For example:

-   `{org:Homo sapiens}`: Indicates the organism is *Homo sapiens*.
-   `{func:Signal peptide}`: Indicates the protein has a signal peptide function.
-   `{loc:Cytoplasm}`: Indicates the subcellular location is *Cytoplasm*.
-   `{src:UniProt}`: Indicates the source database is *UniProt*.
-   `{len:100}`: Indicates the sequence length is 100 residues.
-   `{exp:Predicted}`: Indicates the experimental status is *Predicted*.
-   `{bind:LigandX}`: Indicates a binding partner *LigandX*.
-   `{target:ReceptorY}`: Indicates the protein is a target of *ReceptorY*.
-   `{note:Further details}`: Includes additional notes.

Multiple protein-level annotations are separated by spaces within the header line.
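
As a rough sketch of the header layout described above, the snippet below builds a header from key-value pairs and reads it back. The helper names `build_header` and `parse_header` are hypothetical, and the absence of a space between `>` and the first tag is an assumption; the script's own output may differ.

```python
import re


def build_header(annotations):
    """Join {key:value} tags with single spaces after the leading '>'.
    Empty values are skipped, mirroring the 'leave it empty to skip' prompts."""
    tags = [f"{{{key}:{value}}}" for key, value in annotations.items() if value]
    return ">" + " ".join(tags)


def parse_header(header):
    """Recover the key-value pairs from a header line."""
    return dict(re.findall(r"\{([^:{}]+):([^{}]*)\}", header))


if __name__ == "__main__":
    header = build_header({"org": "Homo sapiens", "func": "", "loc": "Cytoplasm"})
    print(header)
    print(parse_header(header))
```

Because each annotation is delimited by its braces, values that contain spaces (such as `Homo sapiens`) remain unambiguous even though the tags themselves are space-separated.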

**Residue-Level Annotations (Modifications):** Residue-level modifications are also enclosed in curly braces `{}` and are inserted directly into the sequence. The tag typically consists of a modification type code and, optionally, specific details separated by a colon. For example:

-   `A{Phos}`: Indicates Phosphorylation on Alanine. The tag is placed immediately after the residue it modifies, so the position is implied by where the tag sits in the sequence. For N-terminal and C-terminal modifications, the tag appears at the beginning or end of the sequence, respectively.
-   `C{Glyc}`: Indicates Glycosylation on Cysteine.
-   `D{mut:E}`: Indicates a mutation where Aspartic acid (D) is replaced by Glutamic acid (E).
-   `F{ins:XY}`: Indicates an insertion of the sequence "XY" after Phenylalanine (F).
-   `G{del}`: Indicates a deletion of Glycine (G).
-   `K{Ac}`: Indicates Acetylation on Lysine.
-   `R{Me}`: Indicates Methylation on Arginine.
-   `S{Ub}`: Indicates Ubiquitination on Serine.
-   `T{Sumo}`: Indicates Sumoylation on Threonine.
-   `Y{OH}`: Indicates Hydroxylation on Tyrosine.
-   `C{palm}`: Indicates Palmitoylation on Cysteine.
-   `N{PEG}`: Indicates PEGylation on Asparagine.
-   `W{Fluoro}`: Indicates Fluorination on Tryptophan.
-   `P{PMe}`: Indicates Phos-Methyl modification on Proline.
-   `H{Biotin}`: Indicates Biotinylation on Histidine.
-   `Nle`: Non-natural residue Norleucine.
-   `Q{13C}`: Carbon-13 labeling on Glutamine.
-   `M{Fluorescein}`: Fluorescein dye labeling on Methionine.
-   `I{DNA}`: DNA interaction involving Isoleucine.
-   `N-term{Amid}`: N-terminal Amidation.
-   `C-term{Acet}`: C-terminal Acetylation.
-   `{cyc:N-C}`: Head-to-tail cyclization.
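-   `C{cyc:3-7}` and `V{cyc:3-7}`: A single cyclization link between position 3 (Cysteine) and position 7 (Valine), with the same tag attached to both residues. A true disulfide bond requires Cysteine at both positions.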
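-   `C{cyc:2-5,6-10}`: Multiple disulfide bonds (here between positions 2 and 5, and between positions 6 and 10); the same tag is attached to each participating Cysteine.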
-   `L{Lipid}`: Lipid conjugation on Leucine.

The script handles the correct placement and formatting of these tags based on your input.
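
To illustrate the tag placement described in this section, here is a minimal sketch under stated assumptions: the function name `apply_tags` is hypothetical, each residue carries at most one tag, and terminal modifications are written literally as `N-term{...}` before the sequence and `C-term{...}` after it, following the examples above. The script's actual implementation may differ.

```python
def apply_tags(sequence, residue_tags, n_term=None, c_term=None):
    """Insert '{tag}' right after each tagged residue (1-based positions)
    and add optional terminal tags at the start and end of the sequence."""
    for position in residue_tags:
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")

    parts = []
    for position, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if position in residue_tags:
            parts.append("{" + residue_tags[position] + "}")

    prefix = f"N-term{{{n_term}}}" if n_term else ""
    suffix = f"C-term{{{c_term}}}" if c_term else ""
    return prefix + "".join(parts) + suffix


if __name__ == "__main__":
    # Mutation D->E at position 3 and acetylation of the lysine at position 9.
    print(apply_tags("ACDEFGHIK", {3: "mut:E", 9: "Ac"}))
```

Building the output from a list of parts, rather than inserting into the string in place, keeps the user-supplied positions valid regardless of how many tags have already been added.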