CoReMicrob: Computational Resources for Microbiome

Major tools for gene and protein annotation in metagenomic data

Major tools used for Gene and Protein detection

Tool Name Description PubMed ID
MEGAN An interactive tool for taxonomic and functional analysis of metagenomic data, integrating results from alignment tools to visualize microbial community composition and functional potential. 37258860
HUMAnN3 A bioinformatics tool for high-resolution functional profiling of microbial communities, quantifying gene families and metabolic pathways from metagenomic or metatranscriptomic data. 37333206
PathoFact A pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data. 33597026
Prodigal Protein-coding gene prediction for prokaryotic genomes and metagenomes. 20211023
DIAMOND A fast sequence aligner for protein and translated DNA searches, used together with MEGAN for taxonomic and functional analysis of short and long microbiome sequences. 33656283
InterPro A database that integrates predictive information about protein function from several partner resources, giving an overview of the families a protein belongs to and the domains and sites it contains. 39565202
EggNOG-mapper A fast, orthology-based tool for functional annotation of genes or proteins, providing diverse functional insights such as GO terms, KEGG pathways, and COG categories. 34597405
BlastKOALA A KEGG-based functional annotation tool that assigns KO identifiers to protein sequences and reconstructs metabolic and biological pathways for genomic and metagenomic analyses. 26585406
GhostKOALA A GHOSTX-based KEGG tool for functional annotation of protein sequences, linking them to KEGG Orthology (KO) identifiers and reconstructing biological pathways. 26585406
MG-RAST A web-based platform for automated taxonomic and functional annotation of metagenomic data, offering tools for comparative analysis and ecological insights. 26656948
MetaErg A standalone tool for genome-centric functional annotation and metabolic pathway reconstruction, tailored for metagenome-assembled and microbial genomes. 31681429
DeepFRI A deep learning-based tool for predicting protein functions by integrating structural and sequence data through graph convolutional neural networks. 37010293
SmashCommunity A modular and scalable tool for taxonomic profiling and functional annotation of metagenomic datasets, with a focus on reconstructing microbial community metabolism. 20959381
Prokka A widely used bioinformatics tool for the rapid annotation of prokaryotic genomes. 33996285
STRING A widely used bioinformatics database and web tool for exploring protein-protein interactions (PPIs). 39558183
MicroScope A comprehensive web-based platform designed for microbial genome annotation and comparative genomics. 23193269
DFAST A web-based tool for the domain-based functional annotation of prokaryotic genomes, providing insights into gene functions, pathways, and protein families. 29106469
PanPhlAn A strain-level metagenomic profiling tool for identifying the gene composition of individual strains in metagenomic samples. 33944776
MetAML A tool for strain-level identification and multi-locus sequence typing (MLST) of microbial communities from metagenomic data. 27651451
MetaRef A machine learning-based tool for taxonomic and functional analysis of metagenomic data, providing predictive modeling for microbial functions and diversity. 24203705
LDA Effect Size (LEfSe) Identifies features (such as microbial taxa or functions) that differ significantly between experimental groups, making it useful for finding biomarkers and, indirectly, for functional profiling of microbial communities or other omics data. 31953253
ShortBRED A bioinformatics tool that detects and quantifies protein families in metagenomic datasets using unique peptide markers, enabling high-specificity functional profiling of microbial communities. 26682918
DRAM A bioinformatics tool designed to annotate and functionally profile bacterial, archaeal, and viral genomes, including those derived from metagenomic assemblies. 32766782
PICRUSt2 A bioinformatics tool used for predicting the functional potential of microbial communities based on their taxonomic composition, primarily inferred from 16S rRNA gene sequencing data. 36829693
FAPROTAX A bioinformatics tool that predicts microbial functional potential from 16S rRNA taxonomic profiles using a curated database of known functions. 36829693
Tax4Fun2 A functional profiling tool that predicts the functional capabilities of microbial communities from 16S rRNA gene sequencing data, using the KEGG database for functional annotations. 33902725
HMMER A bioinformatics tool used for protein sequence analysis based on Hidden Markov Models (HMMs). 25937944
dbCAN3 The automated carbohydrate-active enzyme and substrate annotation. 37125649
BiG-FAM The Biosynthetic Gene Cluster Family (GCF) database is an online repository for "homologous" groups of biosynthetic gene clusters (BGCs) putatively encoding the production of similar specialized metabolites. 33010170
BiGG Models A knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. 31696234
KEGG Mapper A collection of tools for KEGG mapping, including popular KEGG pathway mapping, JOIN BRITE operations, and MODULE completeness checks. 34423492
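
Each entry in these tables ends with a PubMed ID, so a row can be turned into a link to its reference. The following is a minimal sketch, assuming one row per line with the PubMed ID as the final 7 or 8 digit token; the helper names `split_row` and `pubmed_url` are hypothetical and are not part of CoReMicrob.

```python
import re

# Public PubMed record URL prefix; the PMID is appended to it.
PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def split_row(row):
    """Split a catalog row into (name_and_description, pmid).

    Assumes the row ends with a 7 or 8 digit PubMed ID. Tool names can
    span several words, so the name is not separated from the description.
    """
    match = re.fullmatch(r"(.+?)\s+(\d{7,8})", row.strip())
    if match is None:
        raise ValueError(f"no trailing PubMed ID in row: {row!r}")
    return match.group(1), match.group(2)


def pubmed_url(pmid):
    """Build the PubMed record URL for a PMID string."""
    return f"{PUBMED_BASE}{pmid}/"


row = ("Prodigal Protein-coding gene prediction for prokaryotic "
       "genomes and metagenomes. 20211023")
text, pmid = split_row(row)
print(text)
print(pubmed_url(pmid))
```

Only the trailing ID is split off because names such as "BiGG Models" or "KEGG Mapper" contain spaces and cannot be separated from the description by whitespace alone.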

Major tools used for Antimicrobial Resistance Identification

Tool Name Description PubMed ID
PARGT An open-source software package designed to predict antimicrobial resistance genes in bacteria. 32694690
MGS2AMR A pipeline that detects antibiotic resistance genes (ARGs) and their possible hosts in metagenomic sequencing data. 37833777
DeepARG A deep learning approach for predicting antibiotic resistance genes in metagenomic data. 29391044
ARGprofiler A pipeline for large-scale analysis of antimicrobial resistance profiles in metagenomic data. 38377397
CARD-RGI Predicts resistomes from protein or nucleotide data based on homology and SNP models. 36263822
AMRPlusPlus An improved software pipeline for classification of antimicrobial resistance determinants using high-throughput sequencing data. 36382407
Meta-MARC Hierarchical Hidden Markov models enable accurate and diverse detection of antimicrobial resistance sequences. 31396574
PLM-ARG Antibiotic resistance gene identification using a pretrained protein language model. 37995287
HMD-ARG Hierarchical multi-task deep learning for annotating antibiotic resistance genes. 33557954
ARG-SHINE Improves antibiotic resistance class prediction by integrating sequence homology, functional information, and deep convolutional neural networks. 34377977
PPR-META A tool for identifying phages and plasmids from metagenomic fragments using deep learning. 31220250
PlasTrans Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures. 33074084
SurHMM A shortest unique representative approach to detect protein toxins, virulence factors, and antibiotic resistance genes. 33785071
SourceFinder Random Forest tool for identifying antibiotic resistance gene sources. 36377945
ARG-BERT Prediction of antibiotic resistance mechanisms using a protein language model. 39254573
ResFinder Identification of antimicrobial resistance genes in next-generation sequencing data and prediction of phenotypes from genotypes. 35072601

Major Tools for identifying Pathogenic Strains from Metagenomic data

Tool Name Description PubMed ID
PathoScope Rapid and accurate pathogen identification from metagenomic data using Bayesian modeling. 23843222
StrainPhlAn High-resolution strain-level microbial identification from metagenomic data. 28167665
MIDAS Metagenomic Intra-Species Diversity Analysis System for profiling species and strain-level variation, including pathogenic strains, from metagenomic data. 27803195
SRST2 Short-read sequence typing tool for identifying antimicrobial-resistant pathogenic strains. 25422674
ConStrains Identifies strain-level differences in metagenomic samples using single-copy genes. 26344404
Mykrobe Detects pathogenic bacteria and antibiotic resistance directly from metagenomic data. 32055708
PathSeq Detects pathogenic sequences in metagenomic data using host-subtraction and taxonomic classification. 29982281
KmerResistance Predicts antimicrobial resistance genes in pathogenic strains using k-mer profiling. 27365186
PathoFact A modular pipeline for identifying virulence factors, bacterial toxins, and antimicrobial resistance genes in metagenomic data. 32012943
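
Several entries above rely on k-mer profiling. As a rough illustration of the general idea only (this is not the algorithm of KmerResistance or any other listed tool), the toy sketch below scores a read by the fraction of its k-mers found in each reference gene. The function names, gene names, and sequences are all made up for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hit_fraction(read, reference, k=4):
    """Fraction of the read's distinct k-mers that occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k
    return len(read_kmers & kmers(reference, k)) / len(read_kmers)


# Hypothetical reference genes (toy sequences, not real resistance genes).
reference_genes = {
    "geneA": "ATGGCTAGCTAGGCTTACG",
    "geneB": "ATGTTTCCCGGGAAATTTC",
}

read = "GCTAGCTAGG"  # toy read taken from inside geneA
scores = {name: kmer_hit_fraction(read, seq)
          for name, seq in reference_genes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Real tools use much longer k-mers, indexed reference databases, and statistical handling of sequencing errors and shared k-mers; the sketch only shows why exact k-mer matches can point a read to a reference gene without a full alignment.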

Major tools used for Virulent Protein Identification from metagenomic data

Tool Name Description PubMed ID
BV-BRC Bacterial and Viral Bioinformatics Resource Center for virulence factor identification. 36350631
VFDB Database of virulence factors for bacterial pathogens with computational tools for detection. 39278950
VirulentPred Machine learning-based tool for predicting virulent proteins from sequence data. 37872744
IslandViewer Identifies genomic islands, including virulence-associated elements in microbial genomes. 28472413
VirulenceFinder Identifies virulence genes in metagenomic sequences using a curated database. 32669379
DeepVF Deep learning-based tool for predicting virulence factors from genomic data. 32599617
PathoFact Identifies pathogenicity and virulence factors in metagenomic and genomic datasets. 33597026
PHI-base Database of experimentally verified pathogenicity, virulence, and effector genes. 39588765

Major tools for Toxin and Allergen identification from metagenomic data

Tool Name Description PubMed ID
T3DB A database of toxic compounds and their effects, including microbial toxins. 25378312
ToxinPred3 Predicts toxic peptides and proteins based on sequence properties. 39038391
AllerTOP Machine learning-based tool for predicting allergenic proteins. 24878803
AlgPred Integrates multiple approaches to predict allergenic proteins from sequence data. 33201237
ProTox-II Predicts toxicity of small molecules and proteins based on multiple computational models. 38647086
AllerCatPro Predicts protein allergenicity using structure and sequence-based models. 35640594
Toxin-Antitoxin Database (TADB) Identifies bacterial toxin-antitoxin systems, including those found in metagenomes. 29106666