Project INF: Data management, bioinformatics and statistics

Associated Doctoral Researcher

Associated Principal Investigator

Background and current state of research

Patients with inflammatory bowel disease (IBD) have a significantly altered microbiome compared to healthy individuals. It is not yet clear whether this alteration is a cause or a consequence of the disease. Investigating the microbiome is complex and quickly produces large amounts of data that must be processed and analyzed with appropriate statistical methods. Research data should be handled according to the principles of good scientific practice, so that data are documented and retrievable and results are reproducible. This is especially important for the success of collaborations. However, data management should not keep scientists from their actual work: generating data and answering scientific questions. It is therefore important to relieve scientists of data management tasks as far as possible.

For the reproducibility of results, it has been shown in many fields that establishing complete software workflows is highly advantageous. Such workflows also relieve researchers of the need to learn how to use many individual software tools. We will establish several documented workflows with Nextflow, a platform-independent and scalable workflow system, covering all steps from the input of raw data to the output of results.

This is a project within the DFG Research Unit RU5042 miTarget.

Our goals

Our aims are:

  • Coordinate data management and establish common standards, so that data can be integrated across individual projects for meta-analyses and cross-project comparisons.
  • Develop bioinformatic software pipelines based on Nextflow to enable reproducible and scalable analysis of metagenomic data.

How to get there

  • Data management workflows and data policies are under development. The Yoda platform, which is based on the iRODS system, features a graphical user interface for easy access via a web browser. To enable standardized data storage, metadata masks are created and customized to the requirements of the project. In addition, uploads to the platform are monitored by a data manager to ensure compliance with the standards.
  • For metagenome analysis, software tools must be designed that follow the principles of good scientific practice. The focus is on establishing workflows that are reproducible, scalable and easy to use. For this purpose, we can draw on existing experience with Nextflow for bioinformatics workflows. Nextflow pipelines are easy to use and, thanks to their container-based design, can be deployed on a variety of computing platforms.
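
The idea of a metadata mask that is checked before upload can be illustrated with a minimal Python sketch. It is not the Yoda or iRODS implementation and uses none of their APIs. The field names, the allowed values and the function check_metadata are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One entry of a metadata mask (hypothetical structure)."""
    name: str
    required: bool = True
    allowed: tuple = ()  # empty tuple: any non-empty value is accepted


# Hypothetical mask; a real project would define its own fields.
MASK = (
    Field("sample_id"),
    Field("project"),
    Field("sequencing_type", allowed=("metagenome", "16S")),
    Field("collection_date", required=False),
)


def check_metadata(record, mask=MASK):
    """Return a list of problems; an empty list means the record complies."""
    problems = []
    for field in mask:
        raw = record.get(field.name)
        value = "" if raw is None else str(raw).strip()
        if not value:
            if field.required:
                problems.append(f"missing required field: {field.name}")
            continue
        if field.allowed and value not in field.allowed:
            problems.append(f"invalid value for {field.name}: {value}")
    # Fields that are not part of the mask are reported as well.
    known = {field.name for field in mask}
    for key in record:
        if key not in known:
            problems.append(f"unknown field: {key}")
    return problems


if __name__ == "__main__":
    record = {"sample_id": "S1", "sequencing_type": "RNA"}
    for problem in check_metadata(record):
        print(problem)
```

In this sketch, a record would only be accepted for upload if the returned list is empty. Otherwise the listed problems could be passed back to the submitting researcher or to the data manager.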