Giter VIP home page Giter VIP logo

16s_sarcoidosis's Introduction

About

Code used for the analysis of the 16S rRNA gene sequencing dataset of pulmonary microbiota from sarcoidosis patients and patients with other interstitial lung disease.

The raw reads were submitted to Sequence Read Archive (SRA): BioProject ID PRJNA510549.

Requirements

Python

# create virt. env.
virtualenv -p /usr/bin/python3 venv
# activate it
source venv/bin/activate
# install requirements
pip install -r requirements.txt

R

See file requirements_R.txt.

Tools

  • FastQC: v0.11.7
  • KronaTools: 2.5
  • Optional: (see note below: "Note: precomputed output")
    • LotuS: 1.565
    • BLAST: 2.6.0+
      • Requires the NCBI nr/nt database
    • EDirect: 9.50

You may need to adjust the paths to these tools in pipeline.json. Currently all paths are linking to tools/.

Data

Raw FASTQ files are expected to be in data/fastq/ (otherwise adjust the path in pipeline.json) and their naming scheme should be {sid}_{aid}_{rid}.fastq where sid is the sample ID (same as in the column SampleID in data/meta_01.tsv), aid the amplicon ID (V1V2 or V3V4) and rid the read ID (R1 or R2). A mapping of the SRA IDs to the sample IDs can be found in data/sra_map.tsv.

Note: precomputed output

LotuS output

Rerunning LotuS should produce the same OTUs but their IDs can change. Thus, the output files are provided in data/precomputed/lotus and will be used by the script. If you want to rerun LotuS, comment out the currently used rule rule run_lotus_placeholder and remove the comment flags for the rule run_lotus (see comments in pipeline.snake).

BLASTn hits

Since the results depend on the used database version, the output files are provided in data/precomputed/blastn_nt and will be used by the script. The original rules run_blastn_asn, run_blastn_tab, and run_blastn_tax were commented out and a placeholder for run_blastn_tax was added (see comments in pipeline.snake).

Data

  • Raw reads (FASTQ files, paired): BioProject ID PRJNA510549
  • SRA ID mapping: data/sra_map.tsv
  • Sample meta information: files meta_01.tsv and meta_02.tsv in data/

There are: 51 subjects

  • 31 sarcoidosis subjects
  • 19 interstitial lung disease subjects
  • 1 NTC

For each subject there are V1/V2 and V3/V4 amplicons (i.e. 102 samples in total) and for each sample there are R1/R2 FASTQ files (104 FASTQ files in total).

Pipeline

Run

# virt. env. must be activated
snakemake -s pipeline.snake

Input data

  • Samples
    • data/meta_01.tsv: Meta data including IDs and Indication
    • data/ID_map.tsv: Mapping of sample IDs to new IDs for the manuscript:
      • NTC for NTC sample
      • S<i> for sarcoidosis samples
      • C<i> for "control" samples
    • data/meta_02.tsv: Additional sample information, e.g. smoker status etc.
  • NGS data (see section "Data" in "Requirements")
    • data/fastq/
  • Other
    • data/primer.txt: Primer sequences (needed for LotuS; see "Note: precomputed output")
    • data/blastn_nt/: BLASTn hits (see "Note: precomputed output")

Steps/resulting files

  • Creating quality reports:
    • FASTQC on eahc FASTQ file: data/fastqc/
    • MultiQC: Summary over all files: data/multiqc/
      • multiqc.html: MultiQC report
      • multiqc_plots.pdf: Plots based on MultiQC report
        • Read counts per sample, group and amplicon
      • multiqc_data/multiqc_general_stats.txt: General statistics of FASTQ files (e.g. read count)
  • LotuS analysis to create OTUs: data/lotus/ (see "Note: precomputed output")
  • OTU filtering: data/filtered_otus/
    • <amplicon>.tsv: Table with OTU taxonomy (LotuS, BLAST), filtering flag, and sample counts
    • <amplicon>.pdf: Plots
      • Percentage of counts covered by removed/kept OTUs
      • Phylogenetic tree of OTUs where removed OTUs are highlighted
  • Krona plots (taxonomic composition): data/krona/krona.html
    • Per amplicon, over all samples and per indication group
  • Analysis using ampvis2: data/ampvis2
    • <amplicon>.pdf:
      • Rarefaction curves
      • Alpha-diversity
    • <amplicon>_alphadiv.tsv: Alpha-diversity values per sample
  • CLR-transformation of OTU counts: data/clr
  • Analysis of transforomed values: data/clr_analysis/
    • <amplicon>_trans_pca.pdf: PCA plots
    • <amplicon>_trans_pheatmap.pdf: Heatmap plots
  • Looking at specific taxa using transformed values: data/clr_taxa/
    • <amplicon>.pdf: Heatmaps using OTUs from specified taxa
      • All OTUs
      • Per taxon
  • Diff. analysis using ALDEx2: aldex2_analysis/
    • <amplicon>_aldex2.tsv: Table with results, important columns are:
      • OTU ID (OTU)
      • Difference within/between groups (diff.win, diff.btw)
      • Effect size (effect)
      • P-value of the Wilcoxon test (wi.ep)
    • <amplicon>_aldex2_ov.pdf: Rresults summary plot
      • Within diff. vs. between diff. + boundaries where both are equal
      • CLR-value vs. between diff.
      • Between diff. vs. log(p-value) + p-value boundary
      • Effect size vs. log(p-value) + cutoff boundaries

16s_sarcoidosis's People

Stargazers

Ldh avatar

Watchers

Valentina Galata avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.