Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism
The introduction of next generation sequencing has caused a steady increase in the amounts of data that have to be processed in modern life science. Sequence alignment plays a key role in the analysis of sequencing data e.g. within whole genome sequencing or metagenome projects. BLAST is a commonly used alignment tool that was the standard approach for more than two decades, but in the last years faster alternatives have been proposed including RapSearch, GHOSTX, and DIAMOND.
Here we introduce HAMOND, an application that uses Apache Hadoop to parallelize DIAMOND computation in order to scale-out the calculation of alignments. HAMOND is fault tolerant and scalable by utilizing large cloud computing infrastructures like Amazon Web Services. HAMOND has been tested in comparative genomics analyses and showed promising results both in efficiency and accuracy.
Yu J, Blom J, Sczyrba A, Goesmann A (2017)
Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism.
Journal of biotechnology
DOI | PubMed | EuropePMC