crg-cnag / callings-nf

GATK RNA-Seq Variant Calling in Nextflow

License: Mozilla Public License 2.0

R 35.82% Nextflow 64.18%

nextflow ngs variant-calling rna-seq gatk bioinformatics genomics

callings-nf's Issues

Documentation issue?

The documentation at https://github.com/CRG-CNAG/CalliNGS-NF/blob/master/docker/README.adoc states

A Docker container with all tools except the Genome Analysis Toolkit can be built from the Dockerfile present in this folder

However, the Docker file does install GATK as well in https://github.com/CRG-CNAG/CalliNGS-NF/blob/master/docker/Dockerfile#L46:

&& curl -fsSL https://github.com/broadinstitute/gatk/releases/download/4.1.1.0/gatk-4.1.1.0.zip > gatk-4.zip \

Process POST_PROCESS_VCF fails when the order of chromosomes in result.DP8.vcf differs from that in GRCm38/Annotation/Variation/Mus_musculus.vcf

To fix this, I added a call to vcf-sort in the middle of the POST_PROCESS_VCF script. I tried installing and using bcftools instead, but it requires a header with the "contig" section, which is not present in these intermediate files, and vcftools is already included in the container. I will submit a PR for review.

Error from vcftools on process failure is:

Comparing sites in VCF files...
  Error: Cannot determine chromosomal ordering of files, both files must contain the same chromosomes to use the diff functions.
  Found 10 in file 1 and 1 in file 2.

Looking in the working directory associated with the failing task, POST_PROCESS_VCF produces the file result.DP8.vcf with chromosomes in the order shown by the first command below (output is given in the comment lines):

grep -v "#" result.DP8.vcf | cut -f 1 | uniq | tr "\n" " "
# 10 11 12 13 14 15 16 17 18 19 1 2 3 4 5 6 7 8 9 MT X Y
singularity exec callings-nf_gatk4.sif vcf-sort result.DP8.vcf > result.DP8.vcf.sorted
# unix command printed on execution is "sort -k1,1d -k2,2n"
grep -v "#" result.DP8.vcf.sorted | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y
grep -v "#" filtered.recode.vcf | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y
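The reordering above can be sketched in Python. This is a minimal illustration, not the code used in the pipeline: it assumes tab-separated VCF lines and compares chromosome names as plain text and positions as numbers, in the spirit of the `sort -k1,1d -k2,2n` call printed by vcf-sort. The function names and the example records are hypothetical, and the ordering produced by `sort` itself can depend on the locale.

```python
def sort_vcf_lines(lines):
    """Keep header lines first, then sort records by (chromosome text, numeric position)."""
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln and not ln.startswith("#")]

    def key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return (chrom, int(pos))

    return header + sorted(records, key=key)


def chromosome_order(lines):
    """Chromosome names with adjacent repeats collapsed, like `cut -f 1 | uniq`."""
    order = []
    for ln in lines:
        if not ln or ln.startswith("#"):
            continue
        chrom = ln.split("\t", 1)[0]
        if not order or order[-1] != chrom:
            order.append(chrom)
    return order


# Hypothetical records, only to show the effect on ordering.
example = [
    "##fileformat=VCFv4.2",
    "10\t500\t.\tA\tG",
    "10\t100\t.\tC\tT",
    "1\t900\t.\tG\tA",
    "2\t5\t.\tT\tC",
    "X\t7\t.\tA\tC",
    "MT\t3\t.\tG\tT",
]

if __name__ == "__main__":
    print(chromosome_order(example))
    print(chromosome_order(sort_vcf_lines(example)))
```

Comparing the two printed orders for both input files is a quick way to see whether they agree before vcftools is asked to diff them.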

A possible confounder here is that we are trying to use the Boyle lab's https://github.com/Boyle-Lab/Blacklist for mm10 together with the Ensembl build of GRCm38 available from iGenomes. I added a profile to the config as follows:

singularity {
    singularity.enabled = true
    singularity.cacheDir = './singularity_cache'
    process {
        container = 'quay.io/nextflow/callings-nf:gatk4'
        executor = 'slurm'
        queue = 'our_queue'
        memory = 16.GB
        errorStrategy = 'finish'
        withLabel: mem_large { memory = 48.GB }
        withLabel: mem_xlarge { memory = 64.GB }
    }
    params {
        genome   = "iGenomes/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/genome.fa"
        reads    = "reads/*_{1,2}.fastq.gz"
        variants = "iGenomes/Mus_musculus/Ensembl/GRCm38/Annotation/Variation/Mus_musculus.vcf"
        denylist = "iGenomes/Blacklist/lists/mm10-blacklist.v2.bed"
        results  = "./results"
    }
}

ERROR: Input files reference and features have incompatible contigs

I don't know how to set the --variants parameter. What is wrong with the following command?
nextflow run main.nf --reads '/home/liukai/postd/msi_project/RNAseq1101/00.CleanData/C15*RNA{1,2}.clean.fq.gz' --denylist ~/db/human_genome_index/hg38/S1667195179_agilent_region.hg38.bed —variants /home/liukai/db/human_genome_index/hg38/hg38_VCF/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --results my_msi_results_newref --genome /home/liukai/db/human_genome_index/hg38/hg38_VCF/Homo_sapiens_assembly38.fasta -profile docker -resume
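One thing that may be worth checking: in the command as quoted, the variants option appears to start with a single typographic dash rather than two ASCII hyphens. If that is what was typed in the shell, Nextflow would probably not read it as the `--variants` parameter, and the pipeline default could be used instead, which might explain the contig mismatch. This is an assumption based only on the quoted text. A small sketch for spotting such tokens follows; the helper name and the example arguments are hypothetical.

```python
# Characters that look like "-" but are not the ASCII hyphen-minus.
LOOKALIKE_DASHES = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}


def find_lookalike_options(argv):
    """Return (index, token, dash name) for tokens that start with a look-alike dash."""
    hits = []
    for i, token in enumerate(argv):
        if token and token[0] in LOOKALIKE_DASHES:
            hits.append((i, token, LOOKALIKE_DASHES[token[0]]))
    return hits


# Hypothetical argument list with one pasted typographic dash.
argv = [
    "nextflow", "run", "main.nf",
    "--reads", "reads/*_{1,2}.fq.gz",
    "\u2014variants", "known.vcf.gz",
    "-profile", "docker",
]

if __name__ == "__main__":
    for index, token, name in find_lookalike_options(argv):
        print(f"argument {index} starts with an {name}: {token!r}")
```

If the check reports a hit, retyping the option with two plain hyphens is the first thing to try.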

Error in SplitNCigarReads step

Thanks for developing and (maintaining?) this pipeline!
I tried to run it but ran into some issues. Do you have any ideas?

ERROR ~ Error executing process > '3_rnaseq_gatk_splitNcigar (S31)'

Caused by:
  Process `3_rnaseq_gatk_splitNcigar (S31)` terminated with an error exit status (1)

Command executed:

  # SplitNCigarReads and reassign mapping qualities
  java -jar /DATA/resources/gatk/GATK-3.7/GenomeAnalysisTK.jar -T SplitNCigarReads           -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam           -o split.bam           -rf ReassignOneMappingQuality           -RMQF 255 -RMQT 60           -U ALLOW_N_CIGAR_READS           --fix_misencoded_quality_scores

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO  01:01:07,799 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,801 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
  INFO  01:01:07,801 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
  INFO  01:01:07,802 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
  INFO  01:01:07,802 HelpFormatter - [Wed Mar 06 01:01:07 CET 2019] Executing on Linux 4.4.0-142-generic amd64
  INFO  01:01:07,802 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12
  INFO  01:01:07,806 HelpFormatter - Program Args: -T SplitNCigarReads -R Homo_sapiens.GRCh38.dna.primary_assembly.fa -I Aligned.sortedByCoord.out.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores
  INFO  01:01:07,813 HelpFormatter - Executing as m.slagter@coley on Linux 4.4.0-142-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12.
  INFO  01:01:07,813 HelpFormatter - Date/Time: 2019/03/06 01:01:07
  INFO  01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,814 HelpFormatter - --------------------------------------------------------------------------------
  INFO  01:01:07,889 GenomeAnalysisEngine - Strictness is SILENT
  INFO  01:01:08,231 GenomeAnalysisEngine - Downsampling Settings: No downsampling
  INFO  01:01:08,241 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
  INFO  01:01:08,286 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
  INFO  01:01:08,537 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
  INFO  01:01:08,545 GenomeAnalysisEngine - Done preparing for traversal
  INFO  01:01:08,546 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
  INFO  01:01:08,546 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
  INFO  01:01:08,547 ProgressMeter -        Location |     reads | elapsed |     reads | completed | runtime |   runtime
  INFO  01:01:08,572 ReadShardBalancer$1 - Loading BAM index data
  INFO  01:01:08,574 ReadShardBalancer$1 - Done loading BAM index data
  ##### ERROR ------------------------------------------------------------------------------------------
  ##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
  ##### ERROR
  ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
  ##### ERROR The error message below tells you what is the problem.
  ##### ERROR
  ##### ERROR If the problem is an invalid argument, please check the online documentation guide
  ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
  ##### ERROR
  ##### ERROR Visit our website and forum for extensive documentation and answers to
  ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
  ##### ERROR
  ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
  ##### ERROR
  ##### ERROR MESSAGE: Bad input: while fixing mis-encoded base qualities we encountered a read that was correctly encoded; we cannot handle such a mixture of reads so unfortunately the BAM must be fixed with some other tool
  ##### ERROR ------------------------------------------------------------------------------------------
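The message above says that GATK was asked to fix mis-encoded qualities (`--fix_misencoded_quality_scores`) but met a read that was already correctly encoded. Dropping that flag may be enough, although this was not confirmed in the thread. A rough way to check the encoding of the input first is to look at the lowest quality character in the data. The sketch below is a heuristic under that assumption; the function name and the example strings are hypothetical.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality offset from the lowest ASCII code seen.

    Returns 33, 64, or None when the data do not settle the question.
    """
    lowest = min((ord(ch) for q in quality_strings for ch in q), default=None)
    if lowest is None:
        return None
    if lowest < 59:
        # Codes below 59 (';') do not occur in the +64 encodings.
        return 33
    if lowest >= 64:
        # Consistent with Phred+64, though uniformly high Phred+33 data is possible.
        return 64
    # Codes 59..63 are ambiguous between Phred+33 and Solexa+64.
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#5<", "FFFF,,:"]))
    print(guess_phred_offset(["hhhhfffe"]))
```

A result of 33 on a sample of reads would suggest the qualities do not need fixing.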

GATK4 branch does not require separate Picard

The GATK4 jar has all of the Picard tools integrated, so it is no longer necessary to include a separate jar for Picard in the docker image nor to use the old command syntax which differs from the new GATK4 style.

I have made these changes in a local repository and can submit a PR if you'd like.

Using Single end data

Hi-

I have a single-end RNA-seq data set that I would like to run the pipeline on. I've tried, but it seems to complete only processes 1A-1D and doesn't begin any of the others. I'm guessing this is because there is only one FASTQ file per sample, but I'm not 100% sure that's the issue.

Is there a way to specify single-end data, or could you point me in the right direction to update the pipeline for this purpose?

Any help would be appreciated.

Thanks,
Ben

Implement support for GATK4

CalliNGS should be upgraded to support GATK4. Unfortunately, the new GATK version is not command-line compatible with the previous version.

Using GATK4 the process 3_rnaseq_gatk_splitNcigar returns the following error:

 org.broadinstitute.hellbender.exceptions.UserException: '-T' is not a valid command.

To replicate the error run the pipeline with the gatk4 profile, eg:

nextflow run CRG-CNAG/CalliNGS-NF -profile gatk4
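To make the incompatibility concrete, here is a sketch that builds the SplitNCigarReads call in both styles. The GATK3 form mirrors the `-T`/`-o` flags seen in the pipeline's commands; the GATK4 form uses the tool name as the first argument and `-O` for the output. It is only an illustration: the function name is hypothetical, and the read-filter options of the GATK3 command are left out because their GATK4 equivalents are not covered here.

```python
def split_n_cigar_command(reference, bam_in, bam_out, gatk4=True,
                          gatk3_jar="GenomeAnalysisTK.jar"):
    """Return an argument list for SplitNCigarReads in GATK4 or GATK3 style."""
    if gatk4:
        # GATK4: the tool is a subcommand of the `gatk` wrapper; there is no -T.
        return ["gatk", "SplitNCigarReads",
                "-R", reference, "-I", bam_in, "-O", bam_out]
    # GATK3: the tool is selected with -T and the output flag is lowercase.
    return ["java", "-jar", gatk3_jar, "-T", "SplitNCigarReads",
            "-R", reference, "-I", bam_in, "-o", bam_out]


if __name__ == "__main__":
    print(" ".join(split_n_cigar_command("genome.fa", "in.bam", "split.bam")))
    print(" ".join(split_n_cigar_command("genome.fa", "in.bam", "split.bam", gatk4=False)))
```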

The on-the-fly two-pass option could be used to avoid the genome regeneration step

According to the STAR manual...

https://raw.githubusercontent.com/alexdobin/STAR/master/doc/STARmanual.pdf

8.3 2-pass mapping with re-generated genome.

This is the original 2-pass method which involves genome re-generation step in-between 1st and 2nd
passes. Since 2.4.1a, it is recommended to use the on the fly 2-pass options as described above.

It seems to say that genome regeneration is not recommended.

8.1 Multi-sample 2-pass mapping.
For a study with multiple samples, it is recommended to collect 1st pass junctions from all samples.

Run 1st mapping pass for all samples with "usual" parameters. Using annotations is recommended either at the genome generation step, or mapping step.

Run 2nd mapping pass for all samples, listing SJ.out.tab files from all samples in --sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab ....

Honestly, I am not sure what 2-pass mapping is, but maybe the following script can be improved by omitting the genome re-generation.

CalliNGS-NF/modules.nf

Lines 113 to 142 in 6492702

  # ngs-nf-dev Align reads to genome
  STAR --genomeDir $genomeDir \
  --readFilesIn $reads \
  --runThreadN $task.cpus \
  --readFilesCommand zcat \
  --outFilterType BySJout \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1 \
  --outFilterMismatchNmax 999

  # 2nd pass (improve alignments using table of splice junctions and create a new index)
  mkdir genomeDir
  STAR --runMode genomeGenerate \
  --genomeDir genomeDir \
  --genomeFastaFiles $genome \
  --sjdbFileChrStartEnd SJ.out.tab \
  --sjdbOverhang 75 \
  --runThreadN $task.cpus

  # Final read alignments
  STAR --genomeDir genomeDir \
  --readFilesIn $reads \
  --runThreadN $task.cpus \
  --readFilesCommand zcat \
  --outFilterType BySJout \
  --alignSJoverhangMin 8 \
  --alignSJDBoverhangMin 1 \
  --outFilterMismatchNmax 999 \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattrRGline ID:$replicateId LB:library PL:illumina PU:machine SM:GM12878
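For reference, the "on the fly" option mentioned in the STAR manual is `--twopassMode Basic`, which runs both passes in a single STAR invocation for one sample. The sketch below assembles such a command from the options used in the script above. It is an untested suggestion rather than the pipeline's code: the function name is hypothetical, the read-group line is omitted, and whether the results match the re-generated-genome approach for this pipeline has not been checked.

```python
def star_two_pass_command(genome_dir, reads, threads):
    """Build a single STAR call that does per-sample two-pass mapping on the fly."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--runThreadN", str(threads),
        "--readFilesCommand", "zcat",
        "--outFilterType", "BySJout",
        "--alignSJoverhangMin", "8",
        "--alignSJDBoverhangMin", "1",
        "--outFilterMismatchNmax", "999",
        # Replaces the separate first pass and the genomeGenerate step.
        "--twopassMode", "Basic",
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]


if __name__ == "__main__":
    cmd = star_two_pass_command("genome_dir", ["rep1_1.fq.gz", "rep1_2.fq.gz"], 4)
    print(" \\\n  ".join(cmd))
```

Note that this is the per-sample variant; the multi-sample procedure quoted above still needs the junction files from all samples to be collected between passes.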

Cannot locate GATK.jar file

I have over 200 raw RNA-seq files and I want to run this pipeline on our HPC, which uses Slurm as a job scheduler. I keep getting this error even after downloading the correct version (3.7) of the GATK jar file. Can you help me configure this pipeline to run on HPC?

Reading single end reads

Hi,

I have single-end reads and I would like to use this powerful pipeline to process my samples. Which command can I use to process this type of data? As far as I can see, the pipeline only accepts paired-end reads.

mm39 variants / black list

Hello,
This is more a question than an issue: where would you look for a good resource of known variants for the mm39 assembly (and one for the "deny list")?
Is it just too soon after the mm39 release to find such data?
Thank you!

samtools: command not found

nextflow run CRG-CNAG/CalliNGS-NF --gatk /home/cllcentosvm/GenomeAnalysisTK.jar

N E X T F L O W ~ version 19.04.1
Launching `CRG-CNAG/CalliNGS-NF` [irreverent_pasteur] - revision: e9e0fcf [master]
C A L L I N G S - N F v 1.0
genome : /home/cllcentosvm/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/genome.fa
reads : /home/cllcentosvm/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/reads/rep1_{1,2}.fq.gz
variants : /home/cllcentosvm/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/known_variants.vcf.gz
blacklist: /home/cllcentosvm/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/blacklist.bed
results : results
gatk : /home/cllcentosvm/GenomeAnalysisTK.jar
[warm up] executor > local
executor > local (4)
[22/5f3c38] process > 1C_prepare_star_genome_index [100%] 1 of 1, failed: 1
[5c/fed7e8] process > 1A_prepare_genome_samtools [100%] 1 of 1, failed: 1 ✘
[db/b888c0] process > 1B_prepare_genome_picard [100%] 1 of 1, failed: 1
[13/3808c1] process > 1D_prepare_vcf_file [200%] 2 of 1, failed: 2 ✘
WARN: Killing pending tasks (3)
ERROR ~ Error executing process > '1A_prepare_genome_samtools (genome)'

Caused by:
Process 1A_prepare_genome_samtools (genome) terminated with an error exit status (127)

Command executed:

samtools faidx genome.fa

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: samtools: command not found

Work dir:
/home/cllcentosvm/work/5c/fed7e8c936daf9d7177e140378b822

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option -resume

-- Check '.nextflow.log' file for details
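Exit status 127 with "command not found" means the shell could not find `samtools` on the PATH. The command above was run without a container profile, so the tools would have to be installed on the host; running with `-profile docker` is probably the simpler route. If a host install is intended, a preflight check along these lines may help. The tool list is an assumption about what the pipeline calls, not something taken from its source.

```python
import shutil


def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed tool names; adjust to what the pipeline actually invokes.
ASSUMED_TOOLS = ["samtools", "STAR", "java", "vcftools", "bgzip", "tabix"]

if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found.")
```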

Operator `phase` is deprecated

When trying to run `nextflow run CRG-CNAG/CalliNGS-NF -profile docker` with Nextflow version 22.10.3, I receive the following error: Operator 'phase' is deprecated -- it will be removed in a future release.

Checking the log reveals:

Dec-06 14:15:45.034 [main] ERROR nextflow.cli.Launcher - @unknown
groovy.lang.DeprecationException: Operator `phase` is deprecated -- it will be removed in a future release
	at nextflow.extension.OpCall.checkDeprecation(OpCall.groovy:327)
	at nextflow.extension.OpCall.invoke1(OpCall.groovy:319)
	at nextflow.extension.OpCall.invoke0(OpCall.groovy:306)
	at nextflow.extension.OpCall.invoke(OpCall.groovy:166)
	at nextflow.extension.OpCall.call(OpCall.groovy:113)
	at nextflow.plugin.extension.PluginExtensionProvider.invokeExtensionMethod(PluginExtensionProvider.groovy:279)
	at groovy.runtime.metaclass.NextflowDelegatingMetaClass.invokeMethod(NextflowDelegatingMetaClass.java:59)
	at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at Script_6487ce9d.group_per_sample(Script_6487ce9d:371)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at nextflow.script.FunctionDef.invoke_a(FunctionDef.groovy:65)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:41)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:94)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:408)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:350)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:194)
	at Script_3558d273$_runScript_closure1$_closure2.doCall(Script_3558d273:122)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:205)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:189)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ChainableDef$invoke_a.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at nextflow.script.BaseScript.runDsl2(BaseScript.groovy:208)
	at nextflow.script.BaseScript.run(BaseScript.groovy:217)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:230)
	at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:225)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:131)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:354)
	at nextflow.cli.Launcher.run(Launcher.groovy:487)
	at nextflow.cli.Launcher.main(Launcher.groovy:646)

I would appreciate any advice that helps me get forward.

Much obliged,

Blaž

Error: Unable to access jarfile /scratch/oknjav001/transcriptomics/proteogenomics/variabtcalling/gatk/gatk3.7/GenomeAnalysisTK.jar

I am getting the following error. Can you please help with this
Error executing process > '3_rnaseq_gatk_splitNcigar (rep1)'

Caused by:
Process 3_rnaseq_gatk_splitNcigar (rep1) terminated with an error exit status (1)

Command executed:

# SplitNCigarReads and reassign mapping qualities
java -jar /scratch/oknjav001/transcriptomics/proteogenomics/variabtcalling/gatk/gatk3.7/GenomeAnalysisTK.jar -T SplitNCigarReads -R genome.fa -I Aligned.sortedByCoord.out.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS --fix_misencoded_quality_scores

Command exit status:
1

Command output:
(empty)

Command error:
Error: Unable to access jarfile /scratch/oknjav001/transcriptomics/proteogenomics/variabtcalling/gatk/gatk3.7/GenomeAnalysisTK.jar

Work dir:
/scratch/oknjav001/transcriptomics/proteogenomics/analscripts/work/84/39ebb85b80c6ec21edc450c6f70222

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
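Java prints "Unable to access jarfile" when the path given to `-jar` cannot be opened from where the task runs. Possible causes include a typo in the path, missing read permission, or a directory that is not visible on the compute node or inside the container; which of these applies here is not known. A small sketch for narrowing it down, to be run in the same environment as the task (the function name is hypothetical):

```python
import os


def jar_problem(path):
    """Return a short description of why `path` cannot be used, or None if it looks fine."""
    if not os.path.exists(path):
        return "does not exist"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.R_OK):
        return "not readable by the current user"
    return None


if __name__ == "__main__":
    # Hypothetical location; substitute the value passed to --gatk.
    jar = "/path/to/GenomeAnalysisTK.jar"
    print(jar, "->", jar_problem(jar) or "looks accessible")
```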

The genomeDir dir name is confusing

The name genomeDir is used twice, first as a variable name:

CalliNGS-NF/main.nf

Line 181 in 8913434

STAR --genomeDir $genomeDir \

and as a local directory name:

CalliNGS-NF/main.nf

Line 193 in 8913434

--genomeDir genomeDir \

CalliNGS-NF/main.nf

Line 200 in 8913434

STAR --genomeDir genomeDir \

Although it works, it is really confusing when reading the task command. A different name should be used.

containerOverrides for AWS BATCH

Hello,
I am running the CalliNGS workflow using AWS Batch from AWS SageMaker, with data stored on AWS S3. I am using the following containerOverrides:

containerOverrides={
        'command': [
            "s3://{0}/{1}".format(workflowBucket, workflowFolderPrefix),
            "--reads", "s3://nextflowdataegenesis1/RNASeq_workflow/payload_9/raw_fastq_test1/1839-{1,2}_R{1,2}_001.fastq.gz",
            "--genome", "s3://nextflowdataegenesis1/RNASeq_workflow/payload_9/reference_test1/Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa",
            "--variants", "s3://ngsexperiments/processed_data/WGS_Payload_9_pigs_05_2019/1839_Huck/1839_PL9_sample_short_reads_raw.snps.indels.vcf",
            "--results", "s3://nextflowdataegenesis/RNASeq_workflow/results_payload_9/output_RNASeq_variants_payload_9/1839_Huck"
            
        ]
    }

I am getting the following error:

Waiting for head job to start...
Head job is running...
s3://nextflow1/scripts --reads s3://payload_9/raw_fastq_test1/1839-{1,2}_R{1,2}_001.fastq.gz --genome s3://nextflow/RNASeq/payload_9/reference_test1/Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa --variants s3://ngsexperiments/processed_data/WGS_Payload_9_pigs_05_2019/1839_Huck/1839_PL9_sample_short_reads_raw.snps.indels.vcf --results s3://nextflow1/RNASeq_workflow/results_payload_9/output_RNASeq_variants_payload_9/1839_Huck
Transitioning to Nextflow
nextflow run ./main.nf --reads s3://nextflow/RNASeq/payload_9/raw_fastq_test1/1839-{1,2}_R{1,2}_001.fastq.gz --genome s3://nextflow/RNASeq/payload_9/reference_test1/Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa --variants s3://ngsexperiments/processed_data/WGS_Payload_9_pigs_05_2019/1839_Huck/1839_PL9_sample_short_reads_raw.snps.indels.vcf --results s3://nextflow1/RNASeq_workflow/results_payload_9/output_RNASeq_variants_payload_9/1839_Huck
N E X T F L O W  ~  version 19.04.0
Launching `./main.nf` [fervent_shockley] - revision: ee02720434
C A L L I N G S  -  N F    v 1.0 
================================
genome   : s3://nextflow/RNASeq/payload_9/reference_test1/Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa
reads    : s3://nextflow/RNASeq/payload_9/raw_fastq_test1/1839-{1,2}_R{1,2}_001.fastq.gz
variants : s3://ngsexperiments/processed_data/WGS_Payload_9_pigs_05_2019/1839_Huck/1839_PL9_sample_short_reads_raw.snps.indels.vcf
blacklist: /opt/work/aa6904a6-b74e-4350-a1c5-e631aebfa737/1/data/blacklist.bed
results  : s3://nextflow1/RNASeq_workflow/results_payload_9/output_RNASeq_variants_payload_9/1839_Huck
gatk     : /opt/work/aa6904a6-b74e-4350-a1c5-e631aebfa737/1/GenomeAnalysisTK.jar
Uploading local `bin` scripts folder to s3://nextflow1/dharm_nextflow_logs/runs/tmp/49/0dbd091c08849fbb2c2adcdd095920/bin
executor >  awsbatch (4)
[f6/1503b6] process > 1C_prepare_star_genome_index [  0%] 0 of 1
[ff/3f1fef] process > 1B_prepare_genome_picard     [  0%] 0 of 1
[a4/92c1de] process > 1D_prepare_vcf_file          [  0%] 0 of 1
[3f/a4a3d4] process > 1A_prepare_genome_samtools   [  0%] 0 of 1
Head job FAILED
executor >  awsbatch (4)
[f6/1503b6] process > 1C_prepare_star_genome_index [  0%] 0 of 1
[ff/3f1fef] process > 1B_prepare_genome_picard     [100%] 1 of 1, failed: 1 ✘
[a4/92c1de] process > 1D_prepare_vcf_file          [  0%] 0 of 1
[3f/a4a3d4] process > 1A_prepare_genome_samtools   [  0%] 0 of 1
ERROR ~ Error executing process > '1B_prepare_genome_picard (Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN)'
Caused by:
  Process `1B_prepare_genome_picard (Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN)` terminated with an error exit status (137)
Command executed:
  PICARD=`which picard.jar`
  java -jar $PICARD CreateSequenceDictionary R= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa O= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict
Command exit status:
  137
Command output:
  (empty)
Command error:
  [Thu May 16 14:32:17 UTC 2019] picard.sam.CreateSequenceDictionary REFERENCE=Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa OUTPUT=Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
  [Thu May 16 14:32:17 UTC 2019] Executing as root@ip-10-68-96-187 on Linux 4.14.101-75.76.amzn1.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
  .command.sh: line 3:   106 Killed                  java -jar $PICARD CreateSequenceDictionary R= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa O= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict
Work dir:
  s3://nextflow1/dharm_nextflow_logs/runs/ff/3f1fef1a119d9c598d6dfaddb2bfa7
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
 -- Check '.nextflow.log' file for details
executor >  awsbatch (4)
[f6/1503b6] process > 1C_prepare_star_genome_index [100%] 1 of 1, failed: 1
[ff/3f1fef] process > 1B_prepare_genome_picard     [100%] 1 of 1, failed: 1 ✘
[a4/92c1de] process > 1D_prepare_vcf_file          [100%] 1 of 1, failed: 1
[3f/a4a3d4] process > 1A_prepare_genome_samtools   [100%] 1 of 1, failed: 1
WARN: Killing pending tasks (3)
ERROR ~ Error executing process > '1B_prepare_genome_picard (Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN)'
Caused by:
  Process `1B_prepare_genome_picard (Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN)` terminated with an error exit status (137)
Command executed:
  PICARD=`which picard.jar`
  java -jar $PICARD CreateSequenceDictionary R= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa O= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict
Command exit status:
  137
Command output:
  (empty)
Command error:
  [Thu May 16 14:32:17 UTC 2019] picard.sam.CreateSequenceDictionary REFERENCE=Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa OUTPUT=Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
  [Thu May 16 14:32:17 UTC 2019] Executing as root@ip-10-68-96-187 on Linux 4.14.101-75.76.amzn1.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
  .command.sh: line 3:   106 Killed                  java -jar $PICARD CreateSequenceDictionary R= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.fa O= Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL9_full_plus_pBACN.dict
Work dir:
  s3://nextflow1/dharm_nextflow_logs/runs/ff/3f1fef1a119d9c598d6dfaddb2bfa7
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
 -- Check '.nextflow.log' file for details

If you don't mind, could you please let me know whether I am using the right commands and flags in containerOverrides, or whether I am making some other mistake in running this on AWS Batch?

Thanks,

With Regards,
Dharm
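A note on reading the log above: exit status 137 is 128 + 9, meaning the Java process received SIGKILL, which matches the "Killed" line in the command error. On container platforms this often happens when the job exceeds its memory limit, so the job definition may simply need more memory for a reference of this size. That is a likely cause, not a confirmed one. The decoding step can be sketched as follows (the function name is hypothetical):

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status, treating values above 128 as signal deaths."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by signal {signum}"
    return f"exited with code {status}"


if __name__ == "__main__":
    for status in (1, 127, 137):
        print(status, "->", describe_exit_status(status))
```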

could not execute mkdir

Command error:
mkdir: cannot create directory 'genome_dir': Permission denied
I did make all folders writable before

suggest adding docker runOptions to nextflow.config

hi there,

I'm looking at this after taking your nextflow class last week at Fred Hutch - thanks again for that: it was really helpful.

It might be good to add to the nextflow.config file something like this:
docker {
    enabled = true
    runOptions = '-u $(id -u):$(id -g)'
}

because I just ran the pipeline as is from github and now I have files in a work dir that I cannot delete!

Also enabling docker here would prevent us naive users from having to figure this issue out:
#9

thanks!

Janet
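For readers unfamiliar with the option: the `id -u` and `id -g` commands print the numeric user and group IDs of whoever launches the run, so the container writes files as that user instead of root. The sketch below shows the string the shell would produce. It is only an illustration for Unix-like systems, and the function name is hypothetical.

```python
import os


def docker_user_option():
    """Build the `-u UID:GID` option for the current user (Unix-like systems only)."""
    return f"-u {os.getuid()}:{os.getgid()}"


if __name__ == "__main__":
    print(docker_user_option())
```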

Process `ASE_KNOWNSNPS (SAMPLE123XYZ)` terminated with an error exit status (2) (More then one variant context at position: chr3:XXXXXXX)

Variant calling step failing when using process scratch

When using local scratch folder, the step 5_rnaseq_call_variants returns an error.

This happens because the genome.dict input file contains a reference to a file created in temporary folder not accessible to the task, for example:

# cat genome.dict 
@HD	VN:1.5
@SQ	SN:chr22	LN:51304566	M5:a718acaa6135fdca8357d5bfe94211dd	UR:file:/tmp/nxf.Mzz6eisI1J/genome.fa
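A quick way to detect this situation is to read the `UR` field of each `@SQ` line and see whether it points into a temporary directory. The sketch below assumes the tab-separated `TAG:value` layout shown in the example; the function names and the `file:/tmp/` prefix check are illustrative choices, not part of the pipeline.

```python
def sequence_uris(dict_lines):
    """Map each @SQ sequence name to its UR field (None when the field is absent)."""
    uris = {}
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        uris[fields.get("SN")] = fields.get("UR")
    return uris


def points_into_tmp(uri, tmp_prefix="file:/tmp/"):
    """True when the URI refers to a file under the given temporary prefix."""
    return uri is not None and uri.startswith(tmp_prefix)


example = [
    "@HD\tVN:1.5",
    "@SQ\tSN:chr22\tLN:51304566\tM5:a718acaa6135fdca8357d5bfe94211dd"
    "\tUR:file:/tmp/nxf.Mzz6eisI1J/genome.fa",
]

if __name__ == "__main__":
    for name, uri in sequence_uris(example).items():
        flag = "temporary path" if points_into_tmp(uri) else "ok"
        print(name, uri, flag)
```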

picard.jar not found

nextflow run CRG-CNAG/CalliNGS-NF --gatk /home/ubuntu/tools/GenomeAnalysisTK.jar
N E X T F L O W ~ version 19.04.1
Launching `CRG-CNAG/CalliNGS-NF` [curious_kilby] - revision: `8416386` [master]
C A L L I N G S - N F v 1.0

genome : /home/ubuntu/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/genome.fa
reads : /home/ubuntu/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/reads/rep1_{1,2}.fq.gz
variants : /home/ubuntu/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/known_variants.vcf.gz
blacklist: /home/ubuntu/.nextflow/assets/CRG-CNAG/CalliNGS-NF/data/blacklist.bed
results : results
gatk : /home/ubuntu/tools/GenomeAnalysisTK.jar
[warm up] executor > local
executor > local (4)
[b0/9254bf] process > 1D_prepare_vcf_file [100%] 1 of 1, failed: 1
[6c/73da36] process > 1B_prepare_genome_picard [100%] 1 of 1, failed: 1 ✘
[28/6cc9ac] process > 1C_prepare_star_genome_index [100%] 1 of 1, failed: 1
[7f/0b0c2c] process > 1A_prepare_genome_samtools [100%] 1 of 1, failed: 1
WARN: Killing pending tasks (3)
ERROR ~ Error executing process > '1B_prepare_genome_picard (genome)'

Caused by:
Process 1B_prepare_genome_picard (genome) terminated with an error exit status (1)

Command executed:

PICARD=`which picard.jar`
java -jar $PICARD CreateSequenceDictionary R= genome.fa O= genome.dict

Command exit status:
1

Command output:
(empty)

Work dir:
/mnt/volume1/data/todo/rnaseq/work/6c/73da36c9a0400e4514e65534e58d6d

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
(base) ubuntu$ which picard
/home/ubuntu/anaconda3/bin/picard
(base) ubuntu$ which picard.jar
(base) ubuntu$
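
The last two commands show that `which picard.jar` finds nothing, while a `picard` wrapper is on the PATH. A minimal sketch of a lookup that tolerates both layouts is given here. The function name `resolve_picard` is hypothetical and not part of the pipeline, and it assumes the wrapper accepts the same tool arguments as the jar.

```shell
# Hypothetical helper: print a command prefix that runs Picard.
# Prefers an executable picard.jar on PATH, falls back to a `picard`
# wrapper script, and fails with a clear message otherwise.
resolve_picard() {
    if jar="$(command -v picard.jar)"; then
        printf 'java -jar %s\n' "$jar"
    elif wrapper="$(command -v picard)"; then
        printf '%s\n' "$wrapper"
    else
        echo "picard.jar or picard not found on PATH" >&2
        return 1
    fi
}

# Usage:
#   PICARD="$(resolve_picard)" || exit 1
#   $PICARD CreateSequenceDictionary R=genome.fa O=genome.dict
```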

Can STAR index generation be skipped if an index is provided directly?

denylisted genome file

I am working on a project that requires us to test a couple of pipelines, and I am really interested in incorporating this pipeline. However, I do not know what the 'denylisted genome' file is or why it matters for this type of work. Could someone help me understand this? Thanks.
In addition, the link 'http://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail' to the documentation of the GATK workflow is invalid; kindly fix that as well.

Will there be a Nextflow pipeline for DNA variant calling that needs paired samples?

Thanks a lot for building such a powerful project.
Will there be a Nextflow pipeline for DNA variant calling that needs paired samples?

For example, something like GATK4 for DNA-seq?

Could not build fai index genome.fa.fai

I am getting the below error when I try to run this pipeline on HPC with -profile singularity. Our HPC does not support docker. Could you help in solving this?

`nextflow run CalliNGS-NF/ -profile singularity --genome /scratch/oknjav001/sarsCovRNA/CalliNGS-NF/data/genome.fa -c CalliNGS-NF/nextflow.config`
N E X T F L O W ~ version 21.10.6
Launching `CalliNGS-NF/main.nf` [pensive_kalman] - revision: d02d9193b8
C A L L I N G S - N F v 2.1

genome : /scratch/oknjav001/sarsCovRNA/CalliNGS-NF/data/genome.fa
reads : /scratch/oknjav001/sarsCovRNA/CalliNGS-NF/data/reads/rep1_{1,2}.fq.gz
variants : /scratch/oknjav001/sarsCovRNA/CalliNGS-NF/data/known_variants.vcf.gz
denylist : /scratch/oknjav001/sarsCovRNA/CalliNGS-NF/data/denylist.bed
results : results

executor > local (4)
[77/d01b20] process > PREPARE_GENOME_SAMTOOLS (genome) [ 0%] 0 of 1
[72/3c89cf] process > PREPARE_GENOME_PICARD (genome) [ 0%] 0 of 1
[03/63ccdb] process > PREPARE_STAR_GENOME_INDEX (genome) [ 0%] 0 of 1
executor > local (4)
[77/d01b20] process > PREPARE_GENOME_SAMTOOLS (genome) [100%] 1 of 1, failed: 1 ✘
[- ] process > PREPARE_GENOME_PICARD (genome) -
[03/63ccdb] process > PREPARE_STAR_GENOME_INDEX (genome) [100%] 1 of 1, failed: 1 ✘
[66/5dbe33] process > PREPARE_VCF_FILE (known_variants.vcf) [100%] 1 of 1 ✔
[- ] process > RNASEQ_MAPPING_STAR -
[- ] process > RNASEQ_GATK_SPLITNCIGAR -
[- ] process > RNASEQ_GATK_RECALIBRATE -
[- ] process > RNASEQ_CALL_VARIANTS -
[- ] process > POST_PROCESS_VCF -
[- ] process > PREPARE_VCF_FOR_ASE -
[- ] process > ASE_KNOWNSNPS -
Error executing process > 'PREPARE_GENOME_SAMTOOLS (genome)'
Caused by:
Process PREPARE_GENOME_SAMTOOLS (genome) terminated with an error exit status (255)

Command executed:

samtools faidx genome.fa

Command exit status:
255

Command output:
(empty)

Command error:
[fai_build] fail to open the FASTA file genome.fa
Could not build fai index genome.fa.fai

Work dir:
/scratch/oknjav001/sarsCovRNA/work/77/d01b200797821f93eb4177ceaa3c77

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
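
The report does not establish why samtools cannot open genome.fa. One possibility, stated here purely as an assumption, is that the file staged into the work directory is a symbolic link whose target lies outside the paths mounted in the Singularity container. A minimal sketch of a check that could be run from the work directory, inside the container, is shown here. The function name `check_staged_input` is hypothetical. If the target turns out to be unmounted, Nextflow's `singularity.autoMounts` setting is one documented option worth looking at.

```shell
# Hypothetical diagnostic: report whether a staged input can be read,
# and where it points if it is a symbolic link.
check_staged_input() {
    f="$1"
    if [ -L "$f" ]; then
        echo "symlink -> $(readlink "$f")"
    fi
    if [ -r "$f" ] && [ -s "$f" ]; then
        echo "readable"
    else
        echo "unreadable or empty"
        return 1
    fi
}

# Usage, from the task work directory:
#   check_staged_input genome.fa
```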

Typo in the title of the flow-chart picture.

The title reads "Simplfied" instead of "Simplified".

	# ngs-nf-dev Align reads to genome
	STAR --genomeDir $genomeDir \
	--readFilesIn $reads \
	--runThreadN $task.cpus \
	--readFilesCommand zcat \
	--outFilterType BySJout \
	--alignSJoverhangMin 8 \
	--alignSJDBoverhangMin 1 \
	--outFilterMismatchNmax 999

	# 2nd pass (improve alignments using table of splice junctions and create a new index)
	mkdir genomeDir
	STAR --runMode genomeGenerate \
	--genomeDir genomeDir \
	--genomeFastaFiles $genome \
	--sjdbFileChrStartEnd SJ.out.tab \
	--sjdbOverhang 75 \
	--runThreadN $task.cpus

	# Final read alignments
	STAR --genomeDir genomeDir \
	--readFilesIn $reads \
	--runThreadN $task.cpus \
	--readFilesCommand zcat \
	--outFilterType BySJout \
	--alignSJoverhangMin 8 \
	--alignSJDBoverhangMin 1 \
	--outFilterMismatchNmax 999 \
	--outSAMtype BAM SortedByCoordinate \
	--outSAMattrRGline ID:$replicateId LB:library PL:illumina PU:machine SM:GM12878


SplitNCigarReads and reassign mapping qualities
