ASA³P
Bacterial whole genome sequencing has become a daily routine. New technologies and falling costs have led to a tremendous increase in available sequence data. However, comprehensive analysis of such data remains an arduous and time-consuming task. To keep pace and transform this data into valuable information and new insights, we developed ASA³P, an automatic and scalable assembly, annotation and analysis pipeline for closely related bacterial genomes.
TL;DR
To provide a first glimpse into the results of the pipeline, we offer a public login to a static web server for demonstration purposes only at:
https://www.computational.bio.uni-giessen.de/asap
- login: "asap-test"
- pwd: "asap-test"
Introduction
ASA³P is a command line tool that produces files in standard bioinformatics formats as well as sophisticated HTML5 documents. Its main purpose is the automatic processing of large-scale NGS data from multiple closely related bacterial isolates, transforming raw reads into assembled and annotated genomes and extracting as much information as possible on every single genome.
Features
The pipeline conducts all necessary data processing steps, i.e. quality clipping and assembly of sequencing reads as well as scaffolding of resulting contigs and subsequent annotation of genome sequences. Furthermore, ASA³P performs comprehensive genome characterizations and analyses, e.g. detection of antibiotic resistance genes and virulence factors as well as taxonomic classification and MLST subtyping. Per-isolate analyses are finally complemented by first comparative evaluations. To this end, the pipeline incorporates many best-in-class open source bioinformatics tools and thus minimizes the burden of ever-repeating tasks.
Envisaged as an upfront tool, ASA³P provides a general overview and comparison of analyzed genomes as well as comprehensive insights, along with the result files needed for subsequent deeper analyses. All results are presented via a modern HTML5-based user interface providing interactive visualizations and access to intermediate results as well as aggregated information.
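The processing order described above can be pictured with a small Python sketch. This is only an illustration of the described flow, not ASA³P's implementation: the stage names and the `plan` function are hypothetical, and the order of the characterization steps relative to each other is an assumption.

```python
# Hypothetical stage names derived from the feature description above;
# they are NOT the actual module or command names used by ASA³P.
PROCESSING_STAGES = [
    "quality_clipping",
    "assembly",
    "scaffolding",
    "annotation",
]

# Assumption: the relative order of these characterization steps is arbitrary.
CHARACTERIZATION_STAGES = [
    "antibiotic_resistance_detection",
    "virulence_factor_detection",
    "taxonomic_classification",
    "mlst_subtyping",
]

COMPARATIVE_STAGE = "comparative_evaluation"


def plan(isolates):
    """Return an ordered list of (stage, target) steps for the given isolates."""
    steps = []
    # Per-isolate work: data processing first, then characterization.
    for isolate in isolates:
        for stage in PROCESSING_STAGES + CHARACTERIZATION_STAGES:
            steps.append((stage, isolate))
    # The comparative evaluation runs once, over all isolates together,
    # and only makes sense when there is more than one isolate.
    if len(isolates) > 1:
        steps.append((COMPARATIVE_STAGE, tuple(isolates)))
    return steps


if __name__ == "__main__":
    for stage, target in plan(["isolate_a", "isolate_b"]):
        print(stage, target)
```

The point of the sketch is the shape of the workflow: every isolate passes through the same per-isolate stages, and the comparative step consumes the results of all isolates at the end.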
Availability & Versions
ASA³P is available in two versions: a Docker container and an OpenStack compatible cloud version. The lightweight Docker container is well suited for small to medium projects, i.e. sets of bacterial isolates. The setup process is easy and user friendly; no special computing expertise is required.
The OpenStack compatible cloud version targets medium to (very) large projects. To this end, it can automatically create, configure and manage its own grid engine based compute cluster. All necessary software and hardware setup and configuration is automated as far as possible. Thus, analysis of thousands of bacterial genomes becomes feasible within a single day.
License
ASA³P itself is published and distributed under the GPL3 license. In contrast, some of its dependencies bundled within the ASA³P directory (asap.tar.gz file) are published under different licenses, e.g. GPL2, BSD, MIT, LGPL, etc. A file (README.md) within the ASA³P directory contains a list of all dependencies and licenses.
Please note that some bundled dependencies are published under a free-for-academic or free-for-non-commercial usage license model. To the best of our knowledge, this is true for at least the following databases:
- CARD: free for academic usage
- PubMLST: proprietary but free to use