This repository contains code to analyse ITS amplicon sequencing data with DADA2, for comparison with RNA-seq data from the same samples. It uses the data collected for Haas et al. (2018), but processes them with DADA2 and Swarm clustering instead of OTU clustering.
The repository consists of two units that are run separately, and an additional folder with R scripts for analysis and plotting of the results:
The demultiplex_wf folder contains instructions on how to run the demultiplexing workflow as a Docker container. The workflow downloads the raw data from the ENA, demultiplexes the sequences into per-sample files, and concatenates technical replicates.
The workflow folder contains a Snakemake workflow that reproduces the preprocessing of the demultiplexed ITS amplicon sequencing data as used in the study.
The workflow is run through Snakemake from the root folder of the repository (where this readme sits). To continue with the demultiplexed data, we first need to move it out of the demultiplex_wf subfolder and into the repository root:
mkdir $(pwd)/data
mv $(pwd)/demultiplex_wf/data/ $(pwd)/
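The move above can also be wrapped in a small guard so that it fails loudly when the demultiplexing step has not been run yet, or when data has already been moved. The following is only a sketch: the function name `move_demultiplexed_data` is hypothetical and not part of this repository, and it assumes the directory layout described above.

```shell
# Hypothetical helper (not part of the repository): move the demultiplexed
# data into the repository root, refusing to overwrite existing files.
move_demultiplexed_data() {
    repo_root="${1:-$(pwd)}"
    src="$repo_root/demultiplex_wf/data"
    dest="$repo_root/data"

    # The source only exists after the demultiplexing workflow has finished.
    if [ ! -d "$src" ]; then
        echo "error: $src not found; run the demultiplexing step first" >&2
        return 1
    fi
    # A non-empty destination suggests the data was already moved.
    if [ -d "$dest" ] && [ -n "$(ls -A "$dest")" ]; then
        echo "error: $dest already contains files" >&2
        return 1
    fi
    # Remove an empty placeholder directory so mv can rename cleanly.
    if [ -d "$dest" ]; then
        rmdir "$dest"
    fi
    mv "$src" "$dest"
}
```

Called without arguments from the repository root, it operates on the current directory; a different root can be passed as the first argument.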
Next we will create a conda environment ("its_wf") needed to execute the snakemake workflow, and then activate it:
conda env create -n its_wf -f $(pwd)/environment.yml
conda activate its_wf
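If it is unclear whether the activation succeeded, the active environment can be checked through the `CONDA_DEFAULT_ENV` variable that conda sets on activation. This is a minimal sketch; the function name `check_conda_env` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): confirm that the
# expected conda environment is active before starting the workflow.
check_conda_env() {
    expected="${1:-its_wf}"
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "error: conda environment '$expected' is not active" >&2
        return 1
    fi
    echo "conda environment '$expected' is active"
}
```

Running `check_conda_env its_wf` returns a non-zero status when another environment (or none) is active, so it can be chained in front of other commands with `&&`.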
Once the conda environment has been created successfully, we can execute the workflow with the following command:
snakemake -s $(pwd)/workflow/Snakefile -pr -j 4 --use-conda
This will output the final count matrix and other results (such as sequences for every Swarm OTU and taxonomic assignments) into the results/ folder.
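Before launching the full run, it can help to preview which jobs Snakemake would execute. The sketch below uses Snakemake's dry-run mode; the wrapper name `dry_run_workflow` is hypothetical and not part of this repository, and it assumes the Snakefile path used in the command above.

```shell
# Hypothetical wrapper (not part of the repository): preview the planned
# jobs without executing any of them.
dry_run_workflow() {
    repo_root="${1:-$(pwd)}"
    # -n (--dry-run) lists the jobs that would run; -p prints their shell commands.
    snakemake -s "$repo_root/workflow/Snakefile" -n -p
}
```

A dry run writes nothing to the results/ folder, so it is a cheap way to check that the input data is where the workflow expects it.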
The folder contains R scripts to reproduce the figures in the publication.