PhD project of Patrick Blumenkamp
Citations of DESeq2, edgeR, and limma grow every year (an increase of 535 % from 2015 to 2018), which shows that differential gene expression (DGE) analysis is still gaining ground. The vast amount of data generated by current sequencing instruments underlines the need for automated and reproducible analysis pipelines.
We are therefore developing a two-component software package for analyzing and visualizing RNA-Seq data, with a focus on DGE analyses. The first component, Curare, is a modular Snakemake pipeline generator consisting of quality control, preprocessing, mapping, and in-depth analysis modules. The generated pipelines are built for high-throughput analyses and can be executed on local machines as well as on high-performance compute clusters. Each pipeline is entirely reproducible, and the existing collection of customizable and extendable modules makes pipeline generation flexible. The second component, the Gene Expression Visualizer (GenExVis), is a tool for visualizing DGE results: results can be explored interactively, and numerous charts can be created. All charts can be saved in common image file formats for use in presentations and publications. Together, the two components create an environment that supports the full process of data analysis, from the initial handling of raw RNA-Seq data to the final DGE analyses and result visualization.
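The idea of assembling a pipeline from interchangeable modules, as described above, can be pictured with a minimal Python sketch. It is not Curare's actual implementation or interface. The class `Module`, the function `build_pipeline`, the module names, and the rule that a mapping module is mandatory are all hypothetical and chosen only for illustration. Only the four stage names come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Stage order taken from the stages named in the text; the identifiers are assumptions.
CATEGORIES = ("quality_control", "preprocessing", "mapping", "analysis")


@dataclass(frozen=True)
class Module:
    """One interchangeable pipeline step (hypothetical, for illustration only)."""
    name: str
    category: str


def build_pipeline(selection: dict[str, Module]) -> list[Module]:
    """Return the selected modules sorted into the fixed stage order.

    Raises ValueError for an unknown stage, for a module filed under the
    wrong stage, or when no mapping module is selected (the last rule is
    an assumption of this sketch, not a documented requirement).
    """
    for category, module in selection.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if module.category != category:
            raise ValueError(f"{module.name} is not a {category} module")
    if "mapping" not in selection:
        raise ValueError("a mapping module is required")
    return [selection[c] for c in CATEGORIES if c in selection]


# The selection order does not matter; the stage order is enforced by the builder.
pipeline = build_pipeline({
    "analysis": Module("example_dge", "analysis"),
    "mapping": Module("example_mapper", "mapping"),
    "preprocessing": Module("example_trimmer", "preprocessing"),
})
print(" -> ".join(m.name for m in pipeline))
# prints: example_trimmer -> example_mapper -> example_dge
```

In a real generator, each selected module would contribute its own Snakemake rules and configuration. The sketch covers only the selection and ordering step.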