# 🧬 MAP: Modification and Annotation in Proteins

MAP is a command-line tool developed to identify and annotate epitope residues in antigenic protein chains. It does so by analyzing the difference in solvent accessibility between the bound and unbound forms of a protein using DSSP (Define Secondary Structure of Proteins). The tool annotates epitope residues using user-defined markers and outputs the results in CSV or FASTA format.
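As a rough illustration of the idea, the sketch below flags residues whose relative solvent accessibility (RSA) drops on binding. It is not MAP's own code. The function name `epitope_residues`, the input dictionaries (residue number to RSA, e.g. from DSSP run on the unbound chain and on the complex) and the 0.05 cutoff are all hypothetical.

```python
from typing import Dict, List


def epitope_residues(
    rsa_unbound: Dict[int, float],
    rsa_bound: Dict[int, float],
    threshold: float = 0.05,  # hypothetical cutoff; MAP's actual value is not documented here
) -> List[int]:
    """Return residue numbers whose RSA drops by more than `threshold` on binding.

    Both dicts map residue number -> relative solvent accessibility (0..1),
    e.g. as computed by DSSP for the isolated chain and for the complex.
    """
    epitope = []
    for res_id, unbound in sorted(rsa_unbound.items()):
        bound = rsa_bound.get(res_id)
        if bound is None:
            continue  # residue missing from the complex; skip it
        # Burial upon binding shows up as a decrease in accessibility.
        if unbound - bound > threshold:
            epitope.append(res_id)
    return epitope


if __name__ == "__main__":
    unbound = {1: 0.60, 2: 0.45, 3: 0.10, 4: 0.80}
    bound = {1: 0.58, 2: 0.05, 3: 0.10, 4: 0.30}
    print(epitope_residues(unbound, bound))  # [2, 4]
```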

---

## 🌐 Webserver

Developed by **Prof. G. P. S. Raghava's group**
🔗 [MAP Web Repository](https://webs.iiitd.edu.in/raghava/maprepo/)

---

## 📌 Features

-   Downloads and parses PDB structures in `.cif` format automatically.
-   Computes Relative Solvent Accessibility (RSA) using DSSP.
-   Identifies potential epitope residues from the difference in RSA between the bound and unbound forms.
-   Annotates sequences with user-defined markers.
-   Supports both CSV and FASTA output formats.
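A minimal sketch of marker-based annotation, assuming the marker is inserted right after each epitope residue and that positions are 0-based. Both the placement and the helper name `annotate_sequence` are illustrative assumptions; MAP's actual output layout may differ.

```python
from typing import Iterable


def annotate_sequence(sequence: str, epitope_positions: Iterable[int], marker: str = "{Ab:Int}") -> str:
    """Insert `marker` after every epitope residue.

    `epitope_positions` are 0-based indices into `sequence`. Placing the marker
    after the residue is an assumption made for illustration only.
    """
    positions = set(epitope_positions)
    out = []
    for i, residue in enumerate(sequence):
        out.append(residue)
        if i in positions:
            out.append(marker)  # tag this residue as part of the epitope
    return "".join(out)


if __name__ == "__main__":
    print(annotate_sequence("MVLKT", [1, 3], marker="[E]"))  # MV[E]LK[E]T
```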

---

## 🛠️ Requirements

Make sure the following dependencies are installed:

```bash
pip install biopython pandas requests tqdm gemmi
```

You also need DSSP (`mkdssp`) installed:

```bash
sudo apt-get install dssp    # Debian/Ubuntu
```

Ensure `mkdssp` is accessible from the terminal (i.e., it is on your `PATH`).
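To check this from Python (for example, before a batch run), `shutil.which` reports whether `mkdssp` is on your `PATH`. The helper below is a hypothetical convenience, not part of MAP:

```python
import shutil
from typing import Optional


def find_mkdssp() -> Optional[str]:
    """Return the full path of `mkdssp` if it is on PATH, else None."""
    return shutil.which("mkdssp")


if __name__ == "__main__":
    path = find_mkdssp()
    if path is None:
        print("mkdssp not found on PATH; install DSSP before running MAP.")
    else:
        print(f"Using mkdssp at {path}")
```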
---

## 📂 Input Format

The input must be a CSV file **without a header** containing:

- Column 1: PDB ID
- Column 2: Antigen chain ID

Example (`input.csv`):

```
6VXX,A
4G80,B
```
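One way to load such a file with pandas is sketched below. It is for illustration only: the helper name `read_input` and the whitespace and case normalization are assumptions, not MAP's documented behavior. PDB IDs are upper-cased because they are case-insensitive, while chain IDs are left untouched because they are case-sensitive.

```python
import io

import pandas as pd


def read_input(path_or_buffer) -> pd.DataFrame:
    """Read the headerless two-column input file into named columns."""
    df = pd.read_csv(
        path_or_buffer,
        header=None,                     # the file has no header row
        names=["pdb", "antigen_chain"],  # names chosen to match the CSV output columns
        dtype=str,                       # keep IDs as strings
        skipinitialspace=True,           # tolerate "6VXX, A"
    )
    df["pdb"] = df["pdb"].str.strip().str.upper()
    df["antigen_chain"] = df["antigen_chain"].str.strip()
    return df


if __name__ == "__main__":
    sample = io.StringIO("6VXX,A\n4G80,B\n")
    print(read_input(sample).to_dict("records"))
```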


---

## ⚙️ Usage

Basic usage from the command line:

```bash
python pdb_to_fasta_cli.py -i input.csv -o output_name -f c
```


### Arguments

| Flag | Description |
|------|-------------|
| `-i`, `--input` | Input CSV file with PDB ID and chain ID |
| `-o`, `--output` | Output file base name (default: `Output`) |
| `-f`, `--format` | Output format: `c` for CSV, `f` for FASTA (default: `c`) |
| `-m`, `--marker` | Marker for epitope residues (default: `{Ab:Int}`) |
| `-org` | Organism name for FASTA headers |
| `-d`, `--description` | Function/description for FASTA headers |
| `-n`, `--name` | Protein sample name for FASTA headers |
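For readers extending the tool, the table maps naturally onto `argparse`. This is a hypothetical reconstruction from the documented flags and defaults, not the actual `pdb_to_fasta_cli.py` source; in particular, making `-i` required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser mirroring the documented MAP flags (not MAP's actual source)."""
    p = argparse.ArgumentParser(description="Annotate epitope residues in antigen chains.")
    p.add_argument("-i", "--input", required=True, help="CSV file with PDB ID and chain ID")
    p.add_argument("-o", "--output", default="Output", help="output file base name")
    p.add_argument("-f", "--format", choices=["c", "f"], default="c", help="c = CSV, f = FASTA")
    p.add_argument("-m", "--marker", default="{Ab:Int}", help="marker for epitope residues")
    p.add_argument("-org", dest="org", help="organism name for FASTA headers")
    p.add_argument("-d", "--description", help="function/description for FASTA headers")
    p.add_argument("-n", "--name", help="protein sample name for FASTA headers")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

`argparse` accepts a single-dash multi-letter option such as `-org` alongside `-o`, because an exact match on the option string is checked first.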

---

## 🧪 Example

```bash
python pdb_to_fasta_cli.py -i example_input.csv -o epitope_annotated -f f \
  -m "[E]" -org "Homo sapiens" -d "Envelope glycoprotein" -n "HIV_gp120"
```


---

## 📤 Output

**If format = `c` (CSV)**, the output contains these columns:

- `pdb`: PDB ID
- `antigen_chain`: Chain ID
- `Residues`: Sequence
- `Epitopic`: Epitope residues marked
- `Secondary Structure`: From DSSP
- `Annotated Sequence`: Final annotated sequence
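A sketch of assembling rows into this column layout with pandas, assuming one dict per antigen chain. The helper `build_output_frame` is hypothetical, and the placeholder values in the usage block are for illustration only.

```python
import pandas as pd

# Column order as listed in the documented CSV output.
OUTPUT_COLUMNS = [
    "pdb", "antigen_chain", "Residues", "Epitopic",
    "Secondary Structure", "Annotated Sequence",
]


def build_output_frame(records) -> pd.DataFrame:
    """Build a DataFrame with the documented column order.

    `records` is a list of dicts; any column a record lacks is filled with NaN.
    """
    return pd.DataFrame(records, columns=OUTPUT_COLUMNS)


if __name__ == "__main__":
    row = {col: "" for col in OUTPUT_COLUMNS}  # placeholder values for illustration
    row.update({"pdb": "6VXX", "antigen_chain": "A"})
    print(build_output_frame([row]).to_csv(index=False))
```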
**If format = `f` (FASTA):**

```
>HIV_gp120_1 {org:Homo sapiens} {func:Envelope glycoprotein}
MVL
```
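The header format can be reproduced with a small helper. This is a sketch inferred from the single example above: the helper name `fasta_record`, reading the `_1` suffix as a per-record counter, and dropping a tag when its value is not given are assumptions. Which sequence (plain or annotated) MAP writes to the FASTA body is not specified here.

```python
from typing import Optional


def fasta_record(name: str, index: int, sequence: str,
                 org: Optional[str] = None, func: Optional[str] = None) -> str:
    """Format one FASTA record with a header shaped like the documented example."""
    header = f">{name}_{index}"
    if org:
        header += f" {{org:{org}}}"    # doubled braces emit literal { and }
    if func:
        header += f" {{func:{func}}}"
    return f"{header}\n{sequence}"


if __name__ == "__main__":
    print(fasta_record("HIV_gp120", 1, "MVL",
                       org="Homo sapiens", func="Envelope glycoprotein"))
```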


