This is a fork of DiVA.wgs (DNA Variant Analysis of WGS), a Snakemake-based pipeline for the analysis of Next-Generation Sequencing whole-genome data, developed at the CRS4 Next Generation Sequencing Core Facility. Software dependencies are managed directly by Snakemake through Conda, ensuring the reproducibility of the workflow in line with the FAIR principles.
- Clone the repository from GitHub:
git clone https://github.com/igg-bioinfo/diva.wgs.git
- Rename the folder from diva.wgs to your PROJECT_NAME:
mv diva.wgs PROJECT_NAME
- Move into the renamed folder:
cd PROJECT_NAME
- Edit the configuration files in the conf subfolder:
  - config.yaml: paths to your reference files (genome, target regions, etc.)
  - samples.tsv: associates samples with their FASTQ files
  - samples.ped: pedigree file in PED format
  - units.tsv: paths to the FASTQ files
- Edit the Snakefile and uncomment the output files you need.
- If the conda package manager is not available, install Miniconda.
- Create a Conda environment containing Snakemake, as suggested here. First, install mamba as a replacement for the default conda solver:
conda install -c conda-forge mamba
- Then, create the snakemake environment from the environment.yaml file in the repository:
mamba env create --name snakemake --file environment.yaml
- Activate the environment:
conda activate snakemake
- Run Snakemake in dry-run mode to check that everything is set up correctly. YOUR_WORKING_DIR could be named after the run date, in YYYY-MM-DD format.
snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --dryrun
- For verbose output, which also reports why each job is scheduled:
snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --verbose --reason --dryrun
- If the dry run looks correct, run Snakemake:
snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --conda-frontend mamba
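Before launching a long run, the files edited in the configuration step can be sanity-checked with a small shell sketch. It is not part of DiVA.wgs: check_ped and check_fastq_paths are hypothetical helpers, and the sketch assumes whitespace-separated fields, lines starting with # as comments, and FASTQ paths without spaces that end in .fastq, .fq, .fastq.gz or .fq.gz and resolve from the directory where the check is run. A PED line carries six standard columns: family ID, individual ID, paternal ID, maternal ID, sex and phenotype.

```shell
# Hypothetical pre-flight checks for conf/ (a sketch, not part of DiVA.wgs).

# check_ped FILE: report non-comment lines with fewer than the 6 PED columns
# (family ID, individual ID, paternal ID, maternal ID, sex, phenotype).
check_ped() {
  awk 'NF > 0 && $1 !~ /^#/ && NF < 6 {
         printf "%s:%d: expected 6 columns, found %d\n", FILENAME, FNR, NF
         bad = 1
       }
       END { exit bad + 0 }' "$1"
}

# check_fastq_paths FILE: every field that looks like a FASTQ path must be
# an existing regular file. Assumes paths contain no spaces.
check_fastq_paths() {
  [ -r "$1" ] || { echo "cannot read $1"; return 1; }
  # Split fields onto separate lines (tabs, spaces, CR), keep FASTQ-like ones.
  tr '\t\r ' '\n\n\n' < "$1" | grep -E '\.(fastq|fq)(\.gz)?$' | (
    status=0
    while IFS= read -r path; do
      if [ ! -f "$path" ]; then
        echo "missing FASTQ: $path"
        status=1
      fi
    done
    exit "$status"
  )
}
```

For example, from the PROJECT_NAME folder: check_ped conf/samples.ped && check_fastq_paths conf/units.tsv. Both functions return a non-zero status on problems, so they can gate the dry run in a script.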
Tip: For large projects, we suggest running Snakemake inside a screen session, so that the run keeps going if the terminal is closed or the SSH connection drops.
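The date-based working directory and the screen tip can be combined in a small launcher. This is a minimal POSIX shell sketch meant to be sourced from the PROJECT_NAME folder after conda activate snakemake (the screen session inherits the activated environment). The function names diva_workdir, diva_run_cmd and diva_launch_in_screen and the DIVA_CORES variable are hypothetical, not part of the repository; the Snakemake flags are the ones used in the full run above.

```shell
# Hypothetical launcher helpers (a sketch, not part of DiVA.wgs).
# DIVA_CORES is an assumed variable overriding the 32 cores used above.

# Today's date as YYYY-MM-DD, used as YOUR_WORKING_DIR.
diva_workdir() {
  date +%Y-%m-%d
}

# Print the full-run Snakemake command for working directory "$1".
diva_run_cmd() {
  printf '%s ' snakemake --cores "${DIVA_CORES:-32}" --use-conda \
    --configfile conf/config.yaml --printshellcmds -d "$1" \
    --rerun-incomplete --keep-going --conda-frontend mamba
  printf '\n'
}

# Start the run in a detached screen session named diva_<workdir>.
# Optional "$1" overrides the date-based working directory (no spaces).
diva_launch_in_screen() (
  workdir=${1:-$(diva_workdir)}
  screen -dmS "diva_$workdir" sh -c "$(diva_run_cmd "$workdir")"
  echo "Started screen session diva_$workdir; reattach with: screen -r diva_$workdir"
)
```

After sourcing the file that holds these functions, diva_launch_in_screen starts the run and screen -ls lists the active sessions. Passing the command through sh -c keeps a single definition of the flags, at the cost of requiring a working directory name without spaces. The session closes when Snakemake exits; Snakemake keeps its own logs under .snakemake/log/ inside the working directory.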