# Modification and Annotation in Proteins (MAP) Script

## Description

This Python script, `MAP_converter.py`, converts protein sequences into the MAP format, which supports detailed annotation of protein modifications and other features. It provides a Streamlit web interface where you enter a protein sequence and specify modifications, then generates MAP-formatted output that can be downloaded.

The MAP format is useful for representing protein sequences with rich annotations, including:

* Post-translational modifications (PTMs)
* Non-natural modifications
* Non-natural residues
* Isotopic and fluorescent labeling
* Interaction residues
* D-amino acids
* Cyclization
* N-terminal modifications
* C-terminal modifications
* Mutations, insertions, and deletions
* Conjugation of macromolecules

This script was developed by Prof. G. P. S. Raghava's group. Please cite the MAP resource appropriately (see citation in the script's output).

## Requirements

* **Python 3.6 or higher**
* **Streamlit** (`pip install streamlit`)
* **`re` (regular expressions)**: part of the Python standard library, so no separate installation is needed

## Installation

1.  **Clone the repository (if applicable) or download the script:**
    * If you have downloaded the script, ensure it is saved as `MAP_converter.py`.
2.  **Install Streamlit (if you haven't already):**
    ```bash
    pip install streamlit
    ```

## Usage

1.  **Run the script:**
    ```bash
    streamlit run MAP_converter.py
    ```
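2.  **Access the web interface:** Streamlit opens the application in your default web browser. If no browser window appears, open the local URL that Streamlit prints in the terminal (by default `http://localhost:8501`).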
3.  **Input your protein sequence:** Paste your protein sequence into the text area.  The sequence should be in single-letter amino acid code and should not include any header information.
4.  **Annotate the sequence:**
    * The script will display the cleaned sequence and the amino acid positions.
    * Use the dropdown menus and text boxes to specify modifications and annotations.
    * Click "Add Modification" to add each modification to the sequence.  The interface will update to show the applied modifications.
    * You can add protein-level annotations such as organism, function, and other details.
5.  **View the MAP output:** The script will display the MAP-formatted sequence, including the header and annotated sequence.
6.  **Download the MAP file:** Click the "Download MAP File" button to save the output as a `.map` file.

## Script Details

The script performs the following key functions:

* **`clean_sequence(sequence)`:** Removes whitespace and numbers from the input sequence and converts it to uppercase.
* **`get_modification_details(mod_type, seq_num)`:** Provides a Streamlit interface for selecting specific modification details based on the chosen modification type.  Handles special cases for disulfide bonds in cyclization.
* **`create_map_header(seq_num)`:** Creates a MAP format header with protein-level annotations, allowing users to input organism, function, and other details.
* **`annotate_residues(sequence, seq_num)`:** The core function for handling residue-level annotations.  It allows users to add modifications, and it applies these modifications to the sequence string. It also displays current modifications and provides a button to clear all modifications.
* **`main()`:** The main function that sets up the Streamlit interface, takes user input, calls the annotation functions, and displays the MAP output.
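
As an illustration of the first function in the list above, here is a minimal sketch of what `clean_sequence` could look like, based only on the description given here (remove whitespace and numbers, convert to uppercase). The regular expression and the sample input are assumptions; the actual implementation in `MAP_converter.py` may differ.

```python
import re


def clean_sequence(sequence: str) -> str:
    """Remove whitespace and digits, then uppercase the result.

    Sketch only: mirrors the behaviour described in this README,
    not necessarily the code in MAP_converter.py.
    """
    # \s matches spaces, tabs, and newlines; \d matches digits 0-9
    return re.sub(r"[\s\d]+", "", sequence).upper()


# Hypothetical input pasted with position numbers and line breaks
raw = "  1 mktay iakqr\n 11 qisfv"
print(clean_sequence(raw))  # MKTAYIAKQRQISFV
```

Note that this sketch does not reject characters outside the single-letter amino acid alphabet; any such validation is left to the real script.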

## MAP Format Overview

The MAP format uses curly braces `{}` to enclose modification and annotation tags within the protein sequence.

* **Example:** `AC{Phos}DEFGHIK`  indicates a phosphorylation modification at the third amino acid (D).

The header line in the MAP format starts with ">seq" followed by the sequence number and protein-level annotations within curly braces.

* **Example:** `>seq1 {org:Homo sapiens} {func:Signal peptide} ACDEFGHIK`
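
For readers who want to process MAP text downstream, the brace convention can be handled with a short regular expression. The sketch below is not part of `MAP_converter.py`; the helper names `extract_tags` and `strip_tags` are hypothetical, and it assumes tags are never nested and that header tags follow the `key:value` pattern seen in the example above.

```python
import re

# Matches one {...} tag; assumes tags do not contain nested braces
TAG = re.compile(r"\{([^{}]*)\}")


def extract_tags(text: str) -> list:
    """Return the contents of every {...} tag, in order of appearance."""
    return TAG.findall(text)


def strip_tags(text: str) -> str:
    """Return the text with every {...} tag removed."""
    return TAG.sub("", text)


annotated = "AC{Phos}DEFGHIK"
print(extract_tags(annotated))  # ['Phos']
print(strip_tags(annotated))    # ACDEFGHIK

# Header tags from the example, split into key/value pairs (assumed layout)
header = ">seq1 {org:Homo sapiens} {func:Signal peptide}"
annotations = dict(tag.split(":", 1) for tag in extract_tags(header))
print(annotations)  # {'org': 'Homo sapiens', 'func': 'Signal peptide'}
```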

## Important Notes

* The script uses Streamlit's session state to store modification information, allowing modifications to persist between interface updates.
* Error handling is included for invalid input formats, such as incorrect disulfide bond positions.
* The script provides a user-friendly way to define a wide range of protein modifications and annotations.
* The output MAP file can be used with other tools or databases that support the MAP format.
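
The session-state pattern mentioned in the first note can be sketched as follows. This is an assumption-laden illustration rather than the script's actual code: the key name `"modifications"` and the helper functions are hypothetical, and a plain `dict` stands in for `st.session_state`, which supports the same `in`, item-get, and item-set operations.

```python
def add_modification(state, position, tag):
    """Record a (position, tag) pair in a dict-like state object.

    In a Streamlit app, `state` would be `st.session_state`, so the
    list survives the script reruns triggered by each widget interaction.
    """
    if "modifications" not in state:  # hypothetical key name
        state["modifications"] = []
    state["modifications"].append((position, tag))


def clear_modifications(state):
    """Reset the stored list, as a "clear all" button might do."""
    state["modifications"] = []


# A plain dict stands in for st.session_state in this sketch
state = {}
add_modification(state, 3, "Phos")
add_modification(state, 7, "Ac")
print(state["modifications"])  # [(3, 'Phos'), (7, 'Ac')]

clear_modifications(state)
print(state["modifications"])  # []
```

Passing the state object in as a parameter keeps the helpers testable without launching Streamlit.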
