Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
High-quality regional and functional annotation of bacterial genomes plays a crucial role in modern microbiology, as it is the basis for many downstream analyses. Here, we present a new method and a command-line and web software tool for the comprehensive but rapid annotation of bacterial genomes, plasmids and metagenome-assembled genomes.
Background
Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes.
Results
Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow that includes the detection of small proteins and takes replicon metadata into account. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that, in addition, facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis.
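To illustrate the general idea behind alignment-free sequence identification, the following is a minimal Python sketch, not Bakta's actual implementation. It assumes that known protein sequences are indexed by a hash digest of the full-length amino acid sequence (MD5 is used here as an assumption), so that an exact match can be found with a single dictionary lookup instead of a sequence alignment. The function names, the in-memory dictionary and the example sequence, product and cross-reference are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def normalize(seq: str) -> str:
    """Strip whitespace, upper-case, and drop a trailing stop symbol."""
    return seq.strip().upper().rstrip("*")


def digest(seq: str) -> str:
    """Return the MD5 hex digest of the normalized protein sequence."""
    return hashlib.md5(normalize(seq).encode("ascii")).hexdigest()


def build_index(
    known: Iterable[Tuple[str, dict]]
) -> Dict[str, Tuple[int, dict]]:
    """Map digest -> (sequence length, annotation) for known proteins.

    A real tool would use an on-disk database; a dict is a stand-in here.
    """
    index = {}
    for seq, annotation in known:
        index[digest(seq)] = (len(normalize(seq)), annotation)
    return index


def identify(seq: str, index: Dict[str, Tuple[int, dict]]) -> Optional[dict]:
    """Look up a predicted protein by digest; no alignment is computed.

    The stored length is compared as a cheap guard against digest collisions.
    Returns the stored annotation, or None if the sequence is unknown.
    """
    hit = index.get(digest(seq))
    if hit is None:
        return None
    length, annotation = hit
    return annotation if length == len(normalize(seq)) else None


if __name__ == "__main__":
    # Made-up sequence, product name and cross-reference, for illustration only.
    index = build_index([
        ("MKTAYIAKQRQISFVKSHFSRQ",
         {"product": "hypothetical example protein", "xref": "EXAMPLE:0001"}),
    ])
    print(identify("mktayiakqrqisfvkshfsrq*", index))  # exact match is found
    print(identify("MKTAYIAKQRQISFVKSHFSRA", index))   # one residue differs
```

Because a digest only matches identical sequences, a lookup of this kind either returns the exact database record, including any cross-references stored with it, or nothing at all. Sequences without an exact match would still need a conventional homology search as a fallback.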
We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes.
Availability
CLI: https://github.com/oschwengers/bakta
Web: https://bakta.computational.bio
Citation
Schwengers O., Jelonek L., Dieckmann M. A., Beyvers S., Blom J., Goesmann A. (2021)
Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification.
Microbial Genomics