REMOve CONtaminant reads before variant calling
- Perl - https://www.perl.org
- BWA - http://bio-bwa.sourceforge.net
- SAMtools 1.x - http://www.htslib.org
- Common Linux commands: bash, gzip, ...
If Git (https://git-scm.com) is installed, you can clone the latest development version:
git clone https://github.com/jiwoongbio/REMOCON.git
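Before running the scripts, it may help to confirm that the required tools listed above are on your PATH. The following is a minimal sketch, not part of REMOCON: `check_tools` is a hypothetical helper name, and it only checks that each command can be found, not its version.

```shell
# Hypothetical helper (not shipped with REMOCON): report required tools
# that cannot be found on PATH. Returns 0 if all are present, 1 otherwise.
check_tools() {
    _ct_missing=0
    for _ct_tool in "$@"; do
        if ! command -v "$_ct_tool" >/dev/null 2>&1; then
            echo "missing: $_ct_tool" >&2
            _ct_missing=1
        fi
    done
    return "$_ct_missing"
}

# Example call (commented out so that sourcing this file has no side effects):
# check_tools perl bwa samtools bash gzip
```

Each missing tool is printed to standard error on its own line, so the call can be placed at the top of a wrapper script to fail early.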
- Prepare BWA index files
bwa index <genome.fasta>
bwa index <genome.contaminant.fasta>
- Remove contaminant reads
- Use remocon.sh
./remocon.sh <output.prefix> <genome.fasta> <genome.contaminant.fasta> <threads> <input.1.fastq> [input.2.fastq]
- Use remocon.pl
# Align reads
bwa mem <genome.fasta> <input.1.fastq> [input.2.fastq] | gzip > output.sam.gz
bwa mem <genome.contaminant.fasta> <input.1.fastq> [input.2.fastq] | gzip > output.contaminant.sam.gz
# (Optional) Recalculate alignment scores
perl remocon_alignment_score.pl output.sam.gz | gzip > output.alignment_score_added.sam.gz && mv output.alignment_score_added.sam.gz output.sam.gz
perl remocon_alignment_score.pl output.contaminant.sam.gz | gzip > output.contaminant.alignment_score_added.sam.gz && mv output.contaminant.alignment_score_added.sam.gz output.contaminant.sam.gz
# Compare alignment scores and remove contaminant reads
perl remocon.pl output.sam.gz output.contaminant.sam.gz | gzip > output.contaminant_removed.sam.gz
- Use remocon.sort.sh, which takes SAM files as input instead of FASTQ files
./remocon.sort.sh <output.prefix> <input.sam> <input.contaminant.sam>
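To illustrate the idea behind comparing alignment scores, here is a rough sketch in shell and awk. It is not the logic of remocon.pl, whose exact rule is not described here. It assumes that each alignment carries an `AS:i:` tag (as bwa mem output does), that a read is treated as a contaminant when its best score against the contaminant genome is strictly higher than its best score against the target genome, and that mates are told apart by SAM flag bits 0x40 and 0x80. The function names `best_scores` and `contaminant_keys` are hypothetical.

```shell
# best_scores: read an uncompressed SAM stream on stdin and print
# "read_name/mate<TAB>best_AS" for every read that has an AS:i: tag.
best_scores() {
    awk -F '\t' '
        /^@/ { next }    # skip SAM header lines
        {
            # mate is 1 if flag bit 0x40 is set, 2 if bit 0x80 is set, else 0
            mate = (int($2 / 64) % 2) ? 1 : ((int($2 / 128) % 2) ? 2 : 0)
            key = $1 "/" mate
            found = 0
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^AS:i:/) { score = substr($i, 6) + 0; found = 1 }
            }
            if (!found) next
            if (!(key in best) || score > best[key]) best[key] = score
        }
        END { for (key in best) print key "\t" best[key] }
    '
}

# contaminant_keys GENOME_SCORES CONTAMINANT_SCORES: print the keys whose
# contaminant score is strictly higher than the genome score, or which
# have no genome score at all (assumed rule, see the text above).
contaminant_keys() {
    awk -F '\t' '
        NR == FNR { genome[$1] = $2 + 0; next }
        !($1 in genome) || ($2 + 0) > genome[$1] { print $1 }
    ' "$1" "$2"
}
```

With the files from the steps above, the score tables could be produced with `gzip -dc output.sam.gz | best_scores > genome.scores.tsv` and likewise for `output.contaminant.sam.gz`; `genome.scores.tsv` is an illustrative file name. For actual filtering, use remocon.pl.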