Open thesis topics

Our group offers a variety of topics in applied bioinformatics, high-throughput data analysis, genome and metagenome research, postgenomics, and systems biology. Below you can find a list of suggested open topics for B.Sc. and M.Sc. theses and student projects. For further details on a topic or for alternative projects, please contact us.

Exploring the Role of Nasal Microbiota in Neurological Diseases (M.Sc.)

Background
Microbial communities, including those in the human nasal cavity, help maintain the
stability and functionality of their environment. Recent research suggests a potential link
between the nasal microbiota and neurological diseases such as Parkinson’s disease (PD),
Alzheimer’s disease (AD), and multiple sclerosis (MS) (1). However, the nature of this
relationship remains unclear because only a limited number of studies exist.
While much focus has been on the gut-brain axis, the influence of the nose-brain axis on
the immune system and respiratory homeostasis requires further investigation (2). Some
studies have indicated that altering the nasal microbiota could potentially prevent or treat
neurological diseases, highlighting the need to understand the complex interactions
between the nasal microbiota and the brain. Evidence suggests that the nasal microbiome
may travel through the olfactory pathway to the brain (2, 3). The diversity of bacteria in the
nasal cavity is highly dynamic and can vary depending on age, physiology, and lifestyle.
This project will investigate how nasal microbiota stability impacts the blood-brain barrier
(BBB) and its potential role in the development and progression of neurological diseases.
Our goal is to gain a comprehensive understanding of the nasal microbial community, the
conditions under which it remains stable, and how disruptions in nasal homeostasis might
contribute to neurodegeneration.

Objective
The primary objective of this project is to explore the conditions under which nasal
microbiota stability or instability is associated with neurological diseases, focusing on
potential diagnostic and therapeutic applications.

Methodology
1. Literature Review: Conduct a thorough review of existing studies on the nasal
microbiota and its potential impact on neurological diseases.
2. Data Comparison and Analysis: Compare data gathered from literature on the nasal
microbiota, analyzing differences in composition and diversity, and identifying potential
patterns.
3. Mechanistic Studies: Explore how alterations in the nasal microbiota might influence
the BBB and contribute to the pathology of neurological diseases.
4. Model Creation and Analysis: Develop a model based on literature data to analyze
the stability of the nasal microbiota and its potential role in modulating the risk of
neurological diseases.
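
Step 2 compares composition and diversity across studies. As a minimal sketch of what such a comparison could look like, assuming that genus-level count tables can be extracted from the literature, the Shannon index summarizes the diversity within one sample and the Bray-Curtis dissimilarity compares two samples. The sample names and counts below are made up for illustration and are not taken from any study.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles (0 = identical, 1 = disjoint)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0

# Hypothetical genus-level counts for two nasal samples (illustrative values only).
control = {"Corynebacterium": 50, "Staphylococcus": 30, "Dolosigranulum": 20}
patient = {"Corynebacterium": 10, "Staphylococcus": 70, "Moraxella": 20}

h_control = shannon_index(control)
h_patient = shannon_index(patient)
distance = bray_curtis(control, patient)
```

Both measures depend on how the underlying studies sampled and sequenced, so values from different publications are only comparable after careful harmonization.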

Expected Outcome
This project aims to shed light on the role of the nasal microbiota in neurological diseases,
potentially leading to novel diagnostic and therapeutic strategies. By understanding the
dynamics of nasal microbiota stability, we hope to uncover new insights into preventing
and treating neurodegenerative conditions.

References
1. García-Jiménez, Beatriz, et al., Computational and Structural Biotechnology Journal 19 (2021): 226-246.
2. Xie, Jin, et al., Pharmacological Research 179 (2022): 106189.
3. Thangaleela, Subramanian, et al., Microorganisms 10.7 (2022): 1405.

Contact: Dr. Reihaneh Mostolizadeh

The Landscape of the Oral Microbiome and Its Relationship with Other Body Site Microbiomes in Humans

Background
This project aims to investigate the complex microbial ecosystem of the human
oral cavity and its interactions with microbiomes at other body sites (6). The oral cavity is a
key interface between the human body and the external environment. As the second
largest microbial community in the human body, the oral microbiota plays a crucial role in
maintaining host health locally and systemically. Recent research suggests significant
interactions between the oral microbiome and microbial communities at other body sites
(1, 2, 3, 4, 5), with evidence of microbial migration contributing to infections and disease.
Under certain circumstances, microbes migrate from the oral cavity to other body sites, but
the link to infections is still unclear. Understanding these dynamics is critical for advancing
human health research and developing targeted therapies.

Objective
The main objective of this project is to explore how the composition and diversity of the
oral microbiome vary between healthy and diseased states. Additionally, the potential
interactions and correlations between the oral microbiome and microbiomes at other body
sites will be investigated, which will be important for understanding microbial migration and
its role in disease development.

Methods
1. Literature Review: Conduct a comprehensive review of existing research on the oral
microbiome and its interactions with other body site microbiomes in health and disease
contexts.
2. Data Analysis: Analyze data to identify significant correlations between the oral
microbiome and microbiomes at other body sites, focusing on how these relationships
impact health.
3. Categorization: Group findings by intrinsic correlations and microbial migration
patterns to identify links between microbiome shifts and disease.
4. Visualization: Create visual maps using appropriate tools to present key results and
findings in a standardized format.

Expected Outcome
This project aims to enhance understanding of the oral microbiome’s role in disease by
identifying potential oral microbial biomarkers. By characterizing the variations in the oral
microbiome under different health conditions, these biomarkers could be valuable for
disease risk assessment and the development of targeted therapies.

References
1. Jameie, Melika, et al. "The hidden link: how oral and respiratory microbiomes affect multiple
sclerosis." Multiple Sclerosis and Related Disorders (2024): 105742.
2. Liao, Ying, et al. "Microbes translocation from oral cavity to nasopharyngeal carcinoma in
patients." Nature Communications 15.1 (2024): 1645.
3. Xu, Tiansong, et al. "The relationship of oral and other body sites microbiome in human
diseases." Frontiers in Cellular and Infection Microbiology 13 (2023): 1276473.
4. Thangaleela, Subramanian, et al. "Nasal microbiota, olfactory health, neurological disorders and
aging—a review." Microorganisms 10.7 (2022): 1405.
5. Peng, Xian, et al. "Oral microbiota in human systematic diseases." International journal of oral
science 14.1 (2022): 14.
6. Lamont, Richard J., Hyun Koo, and George Hajishengallis. "The oral microbiota: dynamic
communities and host interactions." Nature reviews microbiology (2018).

Contact: Dr. Reihaneh Mostolizadeh

Automated Reconstruction of High-Quality Genome-Scale Models Using Machine Learning (B.Sc. or M.Sc.)

Background

Genome-scale metabolic models (GEMs) are essential in biological research and
biotechnological development, as they enable the comprehensive analysis of metabolic
networks and fluxes. Reconstructing a high-quality GEM involves a detailed workflow of
96 steps (6).
Despite the standard protocols and operating procedures available for GEM construction,
the process remains time-consuming. This has led to recent efforts aimed at automating
the reconstruction steps. Researchers have developed various protocols that combine
automated steps to streamline the reconstruction and refinement of GEMs.
In recent years, machine learning (ML) has played a significant role in the reconstruction
and analysis of GEMs, enhancing their quality and accuracy (1, 4).

Objective:
This project aims to develop an automated protocol for reconstructing high-quality
genome-scale models using available ML approaches. We have compiled all available
literature focusing on the application of ML in the reconstruction of GEMs. By integrating
these ML-based methods into a cohesive automated procedure, we intend to facilitate the
reconstruction and refinement of GEMs.

Methodology:
1. Literature Review and Compilation: Gather and analyze literature on ML approaches
used in GEM reconstruction.
2. Automation Protocol Development: Combine the identified ML-based steps into an
automated workflow.
3. Comparison and Selection: As a first step, for organisms with multiple annotated
genomes, compare the available annotations and select the most comprehensive
one.
4. GEM Reconstruction: Apply the automated protocol to reconstruct the GEM.
5. Refinement Using ML: To refine the reconstructed GEM, employ ML-based methods for
gap filling, metabolic pathway prediction (2), gene essentiality prediction (5), and EC
number prediction (3).

Expected Outcome:
This project will result in an automated, ML-based protocol for GEM reconstruction. It will
allow for comparing different ML approaches and improve the efficiency and quality of
GEMs.

References:
1. Kim, Yeji, Gi Bae Kim, and Sang Yup Lee. "Machine learning applications in genome-scale metabolic modeling." Current Opinion in Systems Biology 25 (2021): 42-49.
2. Dale, Joseph M., Liviu Popescu, and Peter D. Karp. "Machine learning methods for
metabolic pathway prediction." BMC bioinformatics 11 (2010): 1-14.
3. Ryu, Jae Yong, Hyun Uk Kim, and Sang Yup Lee. "Deep learning enables high-quality
and high-throughput prediction of enzyme commission numbers." Proceedings of the
National Academy of Sciences 116.28 (2019): 13996-14001.
4. Zampieri, Guido, et al. "Machine and deep learning meet genome-scale metabolic
modeling." PLoS computational biology 15.7 (2019): e1007084.
5. Hasibi, Ramin, Tom Michoel, and Diego A. Oyarzún. "Integration of graph neural
networks and genome-scale metabolic models for predicting gene essentiality." npj
Systems Biology and Applications 10.1 (2024): 24.
6. Thiele, Ines, and Bernhard Ø. Palsson. "A protocol for generating a high-quality
genome-scale metabolic reconstruction." Nature protocols 5.1 (2010): 93-121.

Contact: Dr. Reihaneh Mostolizadeh

Comparative genome analysis of Streptococcus agalactiae (GBS) from elephants (M.Sc.)

Background

Group B Streptococci are fairly common. In livestock, they are the causative agent of udder inflammation (mastitis), most often seen in dairy cows.

In elephants, S. agalactiae is associated with paronychia.
Under human care, elephants are known to reach a high age. This comes with an age-related decline of their immune system, which can cause usually harmless skin or foot diseases to become chronic. Better knowledge of these bacterial infections is a vital foundation for optimized treatments and therapeutic approaches.

In a recent study by the "Hessische Landeslabor" (Hesse State Laboratory, LHL), several S. agalactiae isolates were compared using microbiological methods, and extensive biochemical profiles were created.
Notably, the serotype could not be determined for a large number of isolates. For this reason, some isolates were sequenced so that a full comparative genome analysis can be performed using the latest bioinformatics methods.

Thesis aims

Implementation of typical bioinformatic analyses (assembly, mapping, annotation, ...)
Comparative analysis of GBS isolates (ABR, pan- and core genome, virulence factors, ...)
Closer inspection of genes for serotyping
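
The pan- and core-genome comparison named above can be illustrated with plain set operations. This is only a sketch: it assumes that genes have already been grouped into gene families by some orthology or clustering tool, and the isolate names and family identifiers are hypothetical.

```python
def pan_and_core(gene_families_by_isolate):
    """Return (pan_genome, core_genome) as sets of gene-family identifiers."""
    family_sets = [set(f) for f in gene_families_by_isolate.values()]
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)        # families present in at least one isolate
    core = set.intersection(*family_sets)  # families present in every isolate
    return pan, core

# Hypothetical gene-family content of three isolates.
isolates = {
    "isolate_A": {"fam1", "fam2", "fam3"},
    "isolate_B": {"fam1", "fam2", "fam4"},
    "isolate_C": {"fam1", "fam2", "fam3", "fam5"},
}
pan, core = pan_and_core(isolates)
accessory = pan - core  # families missing from at least one isolate
```

Families in the accessory set are natural candidates when looking for differences in resistance genes, virulence factors or serotype-determining loci between isolates.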

Prerequisites

Interest in solving biological/veterinary questions using bioinformatics
Extensive knowledge of the Linux command line
Ability to work independently and methodically

Contact: Linda Fenske

Workflow Design (Nextflow) (M.Sc.)

Background

Analysing (bacterial) sequence data for biological/medical questions often means repeating certain standard processes (QC, assembly, annotation, etc.).

To improve reproducibility and simplify these processes, flexible pipelines with a wide palette of tools are used. Often Nextflow (or a similar workflow tool) is used to support a variety of environments or to simplify the installation.

With DSL2, Nextflow recently introduced a significant development of the Nextflow language, which promises better scalability and modularization of pipelines, along with a better design of workflows.

Thesis aims

Revision and updating of an existing workflow for analysing bacterial data
Migration of the workflow from Nextflow DSL1 to DSL2
Visualising the results (creating a GUI)

Prerequisites

Extensive knowledge of the Linux command line
Knowledge of Nextflow or motivation to become acquainted with Nextflow
Programming knowledge in Python, Groovy (Nextflow) or similar
Knowledge and interest in visualisation and processing of data

Contact: Linda Fenske

Platon Bioinformatics Tool Enhancement for Faster Plasmid Identification (M.Sc.) - taken

Background

Modern high-throughput sequencing devices enable the rapid determination of sequence data from interacting microbial communities without a prior cultivation step, giving easy access to genetic information from otherwise unculturable microbiota. Computational interpretation of such data either assigns raw sequencing reads to their source organisms in order to infer taxonomic origin or gene-coding content, or assembles the metagenome datasets, thereby recovering longer contiguous DNA stretches of the underlying microbial genomes.

Assembled metagenomic contigs are typically clustered (most often based on coverage or nucleotide composition), yielding individual draft or complete genomes of novel bacterial species. In this process, however, contigs of non-chromosomal origin such as plasmids are often overlooked.
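
Nucleotide composition, one of the two clustering signals mentioned above, is commonly summarized as a k-mer frequency vector per contig. The following is a minimal sketch of such a profile; the choice of k = 4, the function names and the toy sequence are illustrative assumptions, and real binning tools typically add refinements such as merging reverse-complement k-mers.

```python
from collections import Counter
from itertools import product

def kmer_profile(sequence, k=4):
    """Relative frequency of every DNA k-mer in `sequence`, in a fixed sorted order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)  # k-mers with ambiguous bases are ignored
    return [counts[km] / total if total else 0.0 for km in kmers]

def gc_content(sequence):
    """Fraction of G and C bases in `sequence`."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

# Toy contig; real contigs are thousands of bases long.
profile = kmer_profile("ACGTACGTACGT")
```

Contigs from the same genome tend to have similar profiles, which is why plasmids, whose composition can differ from that of their host chromosome, may end up outside the resulting clusters.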

Still, the analysis of plasmids is of utmost importance, since they constitute a key mechanism of horizontal gene transfer between microbial hosts. They are known to harbor genes that are beneficial or essential for microbial fitness or survival under certain environmental conditions (e.g. in the presence of certain antimicrobial agents) or that enable metabolic processes the host could not otherwise perform (e.g. degradation of novel substrates).

Several bioinformatics applications have been developed for the computational identification of plasmid-borne contigs, most typically focusing on the extraction of plasmid contigs from the assemblies of individual draft genomes. Among these tools are Platon (Schwengers et al., 2020), PlasClass (Pellow et al., 2020) and PlasFlow (Krawczyk et al., 2018), of which Platon exhibits excellent performance, but its runtime characteristics currently impede its application to potentially large metagenome assemblies.

Thesis aims

Overhaul of the Platon code base, switching from a contig-centered approach to one based on bulk data processing in order to significantly decrease overall runtime.
Inlining of certain sub-analysis steps, such as circularity testing, into the Python codebase instead of relying on the invocation of external tools (e.g. using the Python bindings Pyrodigal, pyHMMER, PyTrimal)
Conditional tool execution: Do not invoke additional tools if preceding steps already exclude a sequence from being a plasmid
Runtime and performance assessment with regard to the original implementation
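
The conditional execution aim can be sketched as a chain of checks, ordered from cheap to expensive, that stops at the first exclusion. The sketch below is not Platon's actual code: the `Contig` type, the length bounds and both check functions are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Contig:
    name: str
    sequence: str

# Each check returns a reason string if the contig is excluded, else None.
Check = Callable[[Contig], Optional[str]]

def length_check(contig: Contig) -> Optional[str]:
    # Hypothetical bounds; real limits would come from the tool's configuration.
    if not 1_000 <= len(contig.sequence) <= 500_000:
        return "length outside plausible plasmid range"
    return None

def make_expensive_check(log: List[str]) -> Check:
    def expensive_check(contig: Contig) -> Optional[str]:
        # Stand-in for a costly step such as a homology search; records each call.
        log.append(contig.name)
        return None
    return expensive_check

def classify(contig: Contig, checks: List[Check]) -> Optional[str]:
    """Run checks in order and stop at the first one that excludes the contig."""
    for check in checks:
        reason = check(contig)
        if reason is not None:
            return reason
    return None

invoked: List[str] = []
checks = [length_check, make_expensive_check(invoked)]
short = Contig("contig_1", "ACGT" * 10)      # 40 bp, excluded by the cheap check
long_ = Contig("contig_2", "ACGT" * 1_000)   # 4,000 bp, passed on to the costly step
results = {c.name: classify(c, checks) for c in (short, long_)}
```

After running this, `invoked` contains only `contig_2`, showing that the costly step was skipped for the contig that had already been excluded.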

Requirements

Familiarity with Linux and (modular) Python programming (incl. unit testing)
Methodical way of working
Ability to work independently

Contact: Oliver Schwengers

Reconstruction and visualization of KEGG metabolic pathways in the EDGAR platform (M.Sc.)

Background

EDGAR is a web-based platform for analyzing microbial data. It is developed by employees of the Bioinformatics and Systems Biology department at JLU Giessen and provides multifaceted methods for investigating genomes.

KEGG (Kyoto Encyclopedia of Genes and Genomes) provides curated databases and resources for (among other things) the functional annotation and classification of genes. In previous projects, KEGG functional categories for all organisms and their corresponding genes were computed in the EDGAR platform. These are currently displayed directly in two analysis modules, in purely quantitative terms.

MinPath is a program for reconstructing biological/metabolic pathways. It attempts to infer a minimal set of metabolic pathways that can explain the genes found in a given dataset, excluding redundant pathways. The above-mentioned KEGG categories will be used as input for this program.
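
The idea of a minimal pathway set can be illustrated as a set-cover problem. MinPath itself formulates this as an integer program; the sketch below swaps in a simple greedy approximation and uses hypothetical pathway and function identifiers, so it shows the principle rather than MinPath's implementation or output.

```python
def greedy_min_pathways(observed, pathways):
    """Greedily pick pathways until every explainable observed function is covered."""
    # Only functions that belong to at least one pathway can be explained at all.
    uncovered = {f for f in observed if any(f in members for members in pathways.values())}
    chosen = []
    while uncovered:
        # Pick the pathway covering the most still-uncovered functions (ties: by name).
        best = max(sorted(pathways), key=lambda p: len(pathways[p] & uncovered))
        chosen.append(best)
        uncovered -= pathways[best]
    return chosen

# Hypothetical mapping of pathways to the functions (e.g. KO identifiers) they contain.
pathways = {
    "pathway_A": {"K1", "K2", "K3"},
    "pathway_B": {"K3", "K4"},
    "pathway_C": {"K2", "K3"},  # redundant: everything it explains is in pathway_A
}
observed = {"K1", "K2", "K3", "K4"}
selected = greedy_min_pathways(observed, pathways)
```

Here `pathway_C` is never selected because the two other pathways already explain all observed functions. A greedy choice is not guaranteed to be optimal in general, which is one reason to rely on the exact formulation for real data.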

The goal of the project is to develop a comparative analysis module, based on KEGG pathway information, for the EDGAR platform.

Thesis Aims

Parse the available KEGG data in a structured manner and compute KEGG metabolic pathways for all given genomes in EDGAR using MinPath.
Design comparative visualizations for the EDGAR frontend using the resulting data, allowing users to interactively explore their data (see fig. 4 here as an example)
Adjust the project scope in consultation with the student depending on the project status to accommodate shared ideas, as EDGAR incorporates a wide selection of data with potential for creative analysis methods.

Requirements

Programming skills in Python and JavaScript (can also be learned during the process)
Basic SQL database knowledge

PlasmidHunter: Validation of a metagenome-based plasmid search using public plasmid sequences (M.Sc.)

Background

Plasmids play an important role in the genetic variability of organisms. They replicate independently and can be transferred between organisms, both within and between species. Therefore, plasmids are key drivers of horizontal gene transfer. Often, they are effectively the only difference between commensal and pathogenic bacterial strains. In recent years, it has become clear that plasmids are among the main mechanisms for the dissemination of antimicrobial resistance and hence are of special interest in medical microbiology. Detecting plasmids and analyzing their dissemination is an important epidemiological and scientific topic that might help to detect current and prevent future outbreaks of antibiotic resistance.

One promising source of known and unknown plasmids is whole-metagenome datasets of samples from different environments (soil, waste water, the human gut). For many of these samples, sequencing data is freely accessible in public databases, often annotated with additional meta information such as date, source and location of each sample.

Our project processes these datasets from the MGnify database in a standardized way via modern cloud technologies and makes them accessible to users for a fast search of new plasmids within this huge amount of data.

This master's thesis will validate this search using existing plasmid databases (such as PLSDB) and analyze the search results, including comprehensive visualizations.

Thesis Aims

Implementation of a workflow to process PLSDB entries with our existing search workflow
Statistical analysis of the results and screening for potentially interesting candidates for further analysis
Visualization of the results

Prerequisites

Knowledge of command line tools and Python
Interest in cloud technologies
Prior experience with workflow systems, like Nextflow or Snakemake

Contact: Sebastian Beyvers

Webservice for searching gene families in plants (M.Sc.)

Background

The input is a list of protein sequences. In step 1a, a Pfam search is performed with the sequences to find common domains. In step 1b, a multiple sequence alignment of the sequences is calculated. The conserved regions are automatically extracted from the alignment to calculate HMMs. In step 2, the HMMs of the domains from 1a and 1b are used to search a database of plant proteins.
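
The automatic extraction of conserved regions in step 1b could, for example, work on per-column conservation of the alignment. The following is a minimal sketch under assumed parameters: the identity threshold, the minimum block length and the toy alignment are illustrative choices and not part of the project description. Building and searching the HMMs would be left to dedicated tools.

```python
from collections import Counter

def conserved_blocks(alignment, min_identity=0.8, min_length=3):
    """Return (start, end) column ranges (0-based, end exclusive) of conserved blocks."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    conserved = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        conserved.append(top / n_seqs >= min_identity)  # gaps count against the column
    blocks, start = [], None
    for col, ok in enumerate(conserved + [False]):  # sentinel closes a trailing block
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks

# Toy alignment of equal-length rows; "-" marks a gap.
alignment = [
    "MKTAYLLG-A",
    "MKTAFLLGSA",
    "MKTA-LLGTA",
    "MKTAYLLGSA",
]
blocks = conserved_blocks(alignment)
```

For the toy alignment this yields the column ranges 0 to 4 and 5 to 8. Each range could then be cut out of the alignment and passed on as the input for one HMM.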

Thesis Aims

The results are visualized and made available for download
Steps 1 and 2 are also provided as a command-line tool

Prerequisites

The programming language(s) and frameworks can be freely chosen
Test data will be provided

Contact: Oliver Rupp

Ribosomal binding site prediction based on 16S-rRNA (M.Sc.)

Background

Bacterial translation is initiated by the assembly of ribosomal proteins as part of the translation initiation complex at the coding sequence (CDS) start site. For most CDS, there is a ribosomal binding site (RBS) immediately upstream of the gene, consisting of a 5-10 bp spacer and a (partial or complete) Shine-Dalgarno sequence (SD) 5′-AGGAGG-3′ to which the ribosome binds. However, some genes have neither an SD nor a known RBS and are still expressed (Omotajo, D. et al., 2015). The Shine-Dalgarno sequence was first described in E. coli but is found in many bacterial genomes and is complementary to the anti-SD sequence at the 3′-end of 16S-rRNA.

The exact Shine-Dalgarno and spacer sequences vary between bacterial species. However, because the anti-Shine-Dalgarno sequence is present in the 16S-rRNA of each bacterial genome, it can be used to predict RBS in a species-independent manner. Therefore, a deep learning approach using the 16S-rRNA sequences and the sequence upstream of the CDS is promising for accurately predicting the presence of RBS independent of species-specific variants.
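
The complementarity argument can be made concrete with a simple, non-learned baseline: take the reverse complement of the 16S-rRNA 3′ tail and look for its longest match in the region upstream of a start codon. The sketch below is only such a baseline, not the deep learning approach this thesis targets. The tail and upstream sequences are illustrative DNA strings, and the function names and the minimum match length are hypothetical.

```python
def reverse_complement(dna):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_sd_match(upstream, rrna_tail, min_length=4):
    """Find the longest stretch of `upstream` that can base-pair with the 16S tail.

    Returns (motif, spacer), where spacer is the number of bases between the end
    of the motif and the start codon, or None if no match of min_length exists.
    `upstream` must end directly before the start codon.
    """
    template = reverse_complement(rrna_tail)  # what a perfectly pairing mRNA would read
    for length in range(len(template), min_length - 1, -1):
        for i in range(len(template) - length + 1):
            motif = template[i:i + length]
            pos = upstream.rfind(motif)  # prefer the occurrence closest to the start codon
            if pos != -1:
                return motif, len(upstream) - (pos + length)
    return None

# Illustrative inputs: a 16S-rRNA 3' tail in DNA alphabet and the bases upstream of a CDS.
rrna_tail = "GATCACCTCCTTA"
upstream = "GCTTACAGGAGGAACAGCT"
match = best_sd_match(upstream, rrna_tail)
```

For these inputs the match is `AGGAGG` with a spacer of 7 bases. Exact matching ignores wobble pairing and mismatches, which is part of what a learned model would be expected to handle better.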

Thesis Aims

Design and implementation of a neural network for ribosomal binding site prediction in bacteria,
evaluation of the features used by the neural network, and
analysis of the presence of RBS in exemplary bacterial genomes

Prerequisites

Prior experience with deep learning frameworks such as TensorFlow/Keras, or willingness to learn them
Prior experience in the development of documented code and dependency management, or willingness to learn them

Contact: Julian Hahnfeld

Integrative Omics FAIR Workflow (M.Sc.)

Background

Processing and analysing 'omics data often requires applying predefined building blocks of code, e.g. for performing quality control, statistical analysis or machine learning. However, biologists and ecologists are often overwhelmed by the technical complexity of programmatic approaches and interfaces. Hence, scientific workflows can not only automate but also facilitate important recurring processes in high-throughput 'omics analysis.

The existing modularized iESTIMATE pipeline aims at automating and facilitating the complex analysis of ecological metabolomics data and the integration with other phenomics and preparation for sequencing and (meta-)genomics data. The central aim of the pipeline is to extract so-called molecular traits that explain molecular mechanisms in plants or microorganisms.

Thesis Aims

Revision and modularisation of existing code to create the R package "iESTIMATE"
Implementing a workflow in Nextflow or Common Workflow Language (CWL) using test data, implementing unit tests and capturing provenance information
Publishing the R package and the workflow following the FAIR principles

Prerequisites

Knowledge of R and a bit of Python
Knowledge of the Linux command line, containers, Nextflow (Groovy), YAML, or motivation to become acquainted with them
Keen interest in analysis of integrative 'omics data and in topics in molecular ecology

Contact: Kristian Peters
