This is the repository for storing scripts attempting to compare pipelines for TGIRT-seq gene quantifications.
We used MAQC samples as test data.
Set up environment
conda create -n benchmark \
python=3.6 \
bioconductor-deseq2=1.14 \
salmon=0.8.2 \
kallisto=0.43 \
bioconductor-tximport=1.4.0 \
r-tidyverse \
hisat2=2.1.0 \
bowtie2=2.3.3
python download_data/download.py
and run all the command it generates
- download Ensembl transcripts
- download ERCC transcripts
- get rRNA fasta file from NCBI
- merge with customized tRNA fasta
- Make table matching transcripts to genes
- bowtie2 index [rRNA and tRNA]
- downlaod ensembl genes from biomart
- Make bed file with rRNA, tRNA and ERCC and all ensembl genes
- Make SAF file for featureCounts
- split bed file for counts
- download Ensembl hg38 genome
- spike in rRNA and ERCC fasta
- make HISAT2 index from the reference
cd download_data
bash make_reference.sh
All path should be changed accordingly if running on other machines.
All scripts are under: pipelines/alignment_free/kallisto/
General command is as followed:
kallisto quant \
-i ${INDEX} -o ${COUNT_PATH}/${SAMPLENAME} \
--fr-stranded --threads=${THREADS}\
${R1} ${R2}
bash build_index.sh
bash kallisto.sh > command.sh
bash command.sh
Rscript kallisto_DESeq.R #Tximport then DESeq2
All scripts are under: pipelines/alignment_free/salmon/
General command is as followed:
salmon quant \
--seqBias --gcBias \
--index $INDEX --libType ISF \
--writeMappings \
--threads=${THREADS} \
--auxDir $RESULTPATH/${SAMPLENAME} \
--numBootstraps 100 \
--mates1 ${R1} --mates2 ${R2} \
--output $RESULTPATH/${SAMPLENAME} \
\| samtools view -b@ $THREADS \
\> ${RESULTPATH}/${SAMPLENAME}.bam
bash build_index.sh
bash salmon.sh > command.sh
bas command.sh
Rscript salmon_DESeq.R #Tximport then DESeq2
All scripts are under: pipelines/genome_mapping/conventional/
bash hisat2.sh > command.sh
bash command.sh
python feature_counts.py
python clean_output.py # for cleaning the featureCounts output for DESeq2
All scripts are under: pipelines/genome_mapping/tgirt_map_pipeline/
The pipeline can be downloaded here
prepair_command.sh > command.sh
bash command.sh
python makeCountTable.py #for making a gene count table
Rscript pipelines/genome_mapping/deseq.R
Analysis and plottings are mostly done in R scripts under find_DE/
.