
Pipeline for the identification of extrachromosomal circular DNA (ecDNA) from Circle-seq, WGS, and ATAC-seq data that were generated from cancer and other eukaryotic cells.

Home Page: https://nf-co.re/circdna

License: MIT License


circdna's Introduction

nf-core/circdna

[![GitHub Actions CI Status](https://github.com/nf-core/circdna/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/circdna/actions?query=workflow%3A%22nf-core+CI%22) [![GitHub Actions Linting Status](https://github.com/nf-core/circdna/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/circdna/actions?query=workflow%3A%22nf-core+linting%22)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/circdna/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.8085422?labelColor=000000)](https://doi.org/10.5281/zenodo.8085422)


Introduction

nf-core/circdna is a bioinformatics best-practice analysis pipeline for the identification of extrachromosomal circular DNAs (ecDNAs) in eukaryotic cells. The pipeline can process WGS, ATAC-seq, or Circle-seq data generated with short-read sequencing technologies. Depending on the input data and the selected analysis branch, nf-core/circdna can identify various types of ecDNAs. These include smaller ecDNAs, often referred to as eccDNAs or microDNAs, as well as larger ecDNAs that exhibit amplification. The analyses rely on software tools that are widely used in ecDNA and circular DNA research.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

  1. Merge re-sequenced FastQ files (cat)
  2. Read QC (FastQC)
  3. Adapter and quality trimming (Trim Galore!)
  4. Map reads using BWA-MEM (BWA)
  5. Sort and index alignments (SAMtools)
  6. Choice of multiple ecDNA identification routes
    1. Circle-Map ReadExtractor -> Circle-Map Realign
    2. Circle-Map ReadExtractor -> Circle-Map Repeats
    3. CIRCexplorer2
    4. Samblaster -> Circle_finder (does not use the filtered BAM file specified with --keep_duplicates false)
    5. Identification of circular amplicons: AmpliconArchitect
    6. De novo assembly of ecDNAs: Unicycler -> Minimap2
  7. Present QC for raw reads (MultiQC)

Functionality Overview

A graphical view of the pipeline and its diverse branches can be seen below.

[Figure: nf-core/circdna metromap]

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

FASTQ input data:

sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz

BAM input data:

sample,bam
CONTROL_REP1,AEG588A1_S1_L002_R1_001.bam

Each row represents a pair of fastq files (paired end) or a single bam file generated from paired-end reads.
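The two layouts above can also be generated programmatically. The following is a minimal sketch, assuming only the column names shown in the examples; the helper name `build_samplesheet` is hypothetical and not part of the pipeline.

```python
import csv
import io

# Column layouts taken from the samplesheet examples above.
HEADERS = {
    "FASTQ": ["sample", "fastq_1", "fastq_2"],
    "BAM": ["sample", "bam"],
}


def build_samplesheet(rows, input_format):
    """Return samplesheet CSV text (hypothetical helper, not a pipeline API).

    rows: iterable of tuples, (sample, fastq_1, fastq_2) for FASTQ input
          or (sample, bam) for BAM input.
    """
    if input_format not in HEADERS:
        raise ValueError(f"input_format must be one of {sorted(HEADERS)}")
    header = HEADERS[input_format]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Every row must supply exactly one value per column.
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}: {row}")
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    text = build_samplesheet(
        [("CONTROL_REP1", "AEG588A1_S1_L002_R1_001.fastq.gz", "AEG588A1_S1_L002_R2_001.fastq.gz")],
        "FASTQ",
    )
    print(text, end="")
```

Redirecting the printed text to samplesheet.csv gives a file that can be passed to --input.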

Now, you can run the pipeline using:

   nextflow run nf-core/circdna --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --circle_identifier <CIRCLE_IDENTIFIER> --input_format <"FASTQ"/"BAM">

Test AmpliconSuite-Pipeline with a test data-set

To test the correct installation of the pipeline and the use of AmpliconArchitect inside the AmpliconSuite-Pipeline, a small WGS data set has been uploaded to AWS; it can be downloaded and used with the parameter -profile test_AA_local. You only need to specify your local paths to the aa_data_repo and the mosek_license_dir. See AmpliconSuite-Pipeline for information about the data repository and the Mosek license. Note that the Mosek license file must be named mosek.lic inside the mosek_license_dir.

You can test the pipeline using:

   nextflow run nf-core/circdna -profile test_AA_local,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR> --aa_data_repo <path/to/aa_data_repo/> --mosek_license_dir <path/to/mosek_license_directory/>

Available ecDNA identifiers

Please specify the parameter circle_identifier depending on the pipeline branch used for circular DNA identification. Please note that some branches/software are only tested with specific NGS data sets.

Identification of putative ecDNA junctions with ATAC-seq or Circle-seq data

  • circle_finder uses Circle_finder
  • circexplorer2 uses CIRCexplorer2
  • circle_map_realign uses Circle-Map Realign
  • circle_map_repeats uses Circle-Map Repeats for the identification of repetitive ecDNA

Identification of amplified ecDNAs with WGS data

ampliconarchitect uses AmpliconArchitect inside the AmpliconSuite-Pipeline

De novo assembly of ecDNAs with Circle-seq data

unicycler uses Unicycler for de novo assembly of ecDNAs and Minimap2 for accurate mapping of the identified circular sequences.
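Several identifiers can be combined in one comma-separated value, as in circle_map_realign,unicycler. The sketch below checks such a value before a run is launched. It is a hedged illustration: the identifier names and data types come from the list above, while the function name `parse_circle_identifier` is hypothetical and the pipeline's own validation may differ.

```python
# Identifier -> data type it is described for in the section above.
KNOWN_IDENTIFIERS = {
    "circle_finder": "ATAC-seq or Circle-seq",
    "circexplorer2": "ATAC-seq or Circle-seq",
    "circle_map_realign": "ATAC-seq or Circle-seq",
    "circle_map_repeats": "ATAC-seq or Circle-seq",
    "ampliconarchitect": "WGS",
    "unicycler": "Circle-seq",
}


def parse_circle_identifier(value):
    """Split a comma-separated circle_identifier value and reject unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        raise ValueError("circle_identifier is empty")
    unknown = [name for name in names if name not in KNOWN_IDENTIFIERS]
    if unknown:
        raise ValueError(f"unknown circle_identifier value(s): {', '.join(unknown)}")
    return names


if __name__ == "__main__":
    for name in parse_circle_identifier("circle_map_realign,unicycler"):
        print(f"{name}: described for {KNOWN_IDENTIFIERS[name]} data")
```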

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
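As an illustration of the -params-file route, the sketch below renders pipeline parameters as JSON, one of the formats Nextflow accepts for that option. The parameter names mirror the command-line flags used earlier in this README; the values and the helper name `render_params` are placeholders chosen for this example.

```python
import json


def render_params(params):
    """Serialize pipeline parameters as JSON text for Nextflow's -params-file."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Placeholder values; adjust them to your own data.
    example = {
        "input": "samplesheet.csv",
        "outdir": "results",
        "genome": "GRCh38",
        "circle_identifier": "circle_map_realign",
        "input_format": "FASTQ",
    }
    print(render_params(example), end="")
```

After saving the printed text as params.json, a run could look like `nextflow run nf-core/circdna -profile docker -params-file params.json`.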

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/circdna was originally written by Daniel Schreyer, University of Glasgow, Institute of Cancer Sciences, Peter Bailey Lab.

We thank the following people for their extensive assistance in the development of this pipeline:

  • Sébastian Guizard: Review and Discussion of Pipeline
  • Alex Peltzer: Code Review
  • Phil Ewels: Help in setting up the pipeline repository and directing the pipeline development
  • nf-core community: Answering all nextflow and nf-core related questions
  • Peter Bailey: Discussion of Software and Pipeline Architecture

This pipeline has been developed by Daniel Schreyer as part of the PRECODE project. PRECODE received funding from the European Union’s Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement No 861196.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #circdna channel (you can join with this invite).

Citations

If you use nf-core/circdna for your analysis, please cite it using the following doi: 10.5281/zenodo.6685250

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.


circdna's Issues

Fails to load AmpliconArchitect files

Description of the bug

Hi!

I'm running the pipeline with the following command:

nextflow run nf-core/circdna \
-r 1.0.4 \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 21.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_mouse/samplesheets/CIRCDNA.csv \
--outdir results/test_mouse/CIRCDNA \
--genome GRCm38 \
--reference_build mm10 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCm38/genome.fasta \
--aa_data_repo database/indexes/GRCm38/aa_data_repo

In the process NFCORE_CIRCDNA:CIRCDNA:AMPLICONCLASSIFIER_AMPLICONSIMILARITY I get the following error:

Command executed:

  REF=mm10
  export AA_DATA_REPO=/data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo
  export AA_SRC=/home/nanoneuro/.nextflow/assets/nf-core/circdna/bin
  
  amplicon_similarity.py \
      --ref $REF \
       \
      --input ampliconclassifier.input
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:AMPLICONCLASSIFIER_AMPLICONSIMILARITY":
      AmpliconClassifier: $(echo $(amplicon_classifier.py --version | sed 's/amplicon_classifier //g' | sed 's/ .*//g'))
  END_VERSIONS

Command exit status:
  1

Command output:
  Required classifications set to
  set()

Command error:
  Required classifications set to
  set()
  Traceback (most recent call last):
    File "/usr/local/bin/amplicon_similarity.py", line 456, in <module>
      lcD, cg5D = set_lcd(AA_DATA_REPO, args.no_LC_filter)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/bin/amplicon_similarity.py", line 360, in set_lcd
      with open(AA_DATA_REPO + "file_list.txt") as infile:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  FileNotFoundError: [Errno 2] No such file or directory: 'database/indexes/GRCm38/aa_data_repo/mm10/file_list.txt'

Work dir:
  /data/Proyectos/NGS_pipeline/work/26/406c0d8f5f873a353686003b4eaaee

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

At first I thought the relative path might be the problem, but using

--aa_data_repo /data/Proyectos/NGS_pipeline/database/indexes/GRCm38/aa_data_repo

yields the same error, only with the full path.
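The traceback shows that amplicon_similarity.py opens file_list.txt inside the reference-build subdirectory of the data repo (here mm10). A small pre-flight check along those lines can confirm whether the file is really where the script looks for it. This is a minimal sketch: only the file_list.txt location is taken from the error message, the function name `missing_aa_files` is hypothetical, and the real repo contains other files that this check does not cover.

```python
from pathlib import Path


def missing_aa_files(aa_data_repo, reference_build, required=("file_list.txt",)):
    """Return the required files that are absent from <aa_data_repo>/<reference_build>/."""
    ref_dir = Path(aa_data_repo) / reference_build
    return [str(ref_dir / name) for name in required if not (ref_dir / name).is_file()]


if __name__ == "__main__":
    # Paths copied from the report above; replace them with your own.
    missing = missing_aa_files("database/indexes/GRCm38/aa_data_repo", "mm10")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("file_list.txt found")
```

If the file is reported missing, the data repo for that reference build is likely incomplete or located one directory level away from the path given to --aa_data_repo.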

Command used and terminal output

No response

Relevant files

No response

System information

No response

Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF

Description of the bug

Hi, I ran the circdna pipeline on BAM files using the circle_finder identifier. It failed with exit status 140 while executing the process 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF'. Here is the error message:

Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF (e06588d9bca6003c0f0ec945f109b30a)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF (e06588d9bca6003c0f0ec945f109b30a)` terminated with an error exit status (140)

Command executed:

  samtools sort -n -@ 6 -o e06588d9bca6003c0f0ec945f109b30a.qname.sorted.bam -T e06588d9bca6003c0f0ec945f109b30a.qname.sorted e06588d9bca6003c0f0ec945f109b30a.md.filtered.sorted.bam
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  140

Command output:
  (empty)

Command error:
  [bam_sort_core] merging from 504 files and 6 in-memory blocks...

Work dir:
  /gpfs/share/home/1710305101/testNextflow/circfindertest/work/3e/d2171697cb00563ea61eda353bc918

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The "Command error" output does not look like an error. How can I solve this problem?
I hope to hear from you. Thanks!
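One general convention may help when reading this status: POSIX shells report a process that was killed by signal N as exit status 128 + N, so 140 would correspond to signal 12. Whether the scheduler actually killed the job here (for example for exceeding a time or memory limit) is an assumption that the log above does not confirm. The sketch below only does the arithmetic, and the function name is hypothetical.

```python
def signal_from_exit_status(status):
    """Return the signal number implied by a shell-style exit status, or None.

    Shells conventionally report "killed by signal N" as 128 + N.
    """
    if 128 < status < 256:
        return status - 128
    return None


if __name__ == "__main__":
    for status in (1, 127, 140):
        print(status, "->", signal_from_exit_status(status))
```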

Command used and terminal output

command used:
nextflow run nf-core/circdna -profile singularity -r 1.0.1 -resume --circle_identifier ampliconarchitect,circle_finder --email [email protected] --input samplesheet.csv --input_format BAM --outdir . --fasta /gpfs/share/home/1710305101/ref/genome/ICGC/GRCh37/genome.fa --igenomes_base /gpfs/share/home/1710305101/ref/igenomes --custom_config_base /gpfs/share/home/1710305101/.nextflow/configs/singularity --aa_data_repo /gpfs/share/home/1710305101/ref/aa_repo --mosek_license_dir /gpfs/share/home/1710305101/mosek/mosek.lic --reference_build GRCh37
terminal output:
executor >  slurm (47)
[a9/a25249] process > NFCORE_CIRCDNA:CIRCDNA:INPU... [100%] 1 of 1 ✔
[e3/70d852] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2 ✔
[6c/4bc278] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2 ✔
[dc/adcc33] process > NFCORE_CIRCDNA:CIRCDNA:BAM_... [100%] 2 of 2 ✔
[ce/9e995e] process > NFCORE_CIRCDNA:CIRCDNA:BAM_... [100%] 2 of 2 ✔
[8c/011c42] process > NFCORE_CIRCDNA:CIRCDNA:BAM_... [100%] 2 of 2 ✔
[07/59c8d8] process > NFCORE_CIRCDNA:CIRCDNA:MARK... [100%] 2 of 2 ✔
[92/b57b98] process > NFCORE_CIRCDNA:CIRCDNA:MARK... [100%] 2 of 2 ✔
[22/7a8e3d] process > NFCORE_CIRCDNA:CIRCDNA:MARK... [100%] 2 of 2 ✔
[42/28ac59] process > NFCORE_CIRCDNA:CIRCDNA:MARK... [100%] 2 of 2 ✔
[87/5918e4] process > NFCORE_CIRCDNA:CIRCDNA:MARK... [100%] 2 of 2 ✔
[a2/dd7889] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2 ✔
[a8/0c631d] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2 ✔
[d4/1413fd] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2 ✔
[b5/2101d4] process > NFCORE_CIRCDNA:CIRCDNA:CNVK... [100%] 2 of 2 ✔
[a2/a6b51d] process > NFCORE_CIRCDNA:CIRCDNA:CNVK... [100%] 2 of 2 ✔
[eb/1415bd] process > NFCORE_CIRCDNA:CIRCDNA:COLL... [100%] 2 of 2 ✔
[4b/7d6393] process > NFCORE_CIRCDNA:CIRCDNA:AMPL... [100%] 2 of 2 ✔
[50/bbe7f6] process > NFCORE_CIRCDNA:CIRCDNA:AMPL... [100%] 2 of 2 ✔
[b9/3ecfa3] process > NFCORE_CIRCDNA:CIRCDNA:AMPL... [100%] 2 of 2 ✔
[9b/ad753e] process > NFCORE_CIRCDNA:CIRCDNA:AMPL... [100%] 2 of 2 ✔
[08/c49e51] process > NFCORE_CIRCDNA:CIRCDNA:SUMM... [100%] 2 of 2 ✔
[7b/5f5ca5] process > NFCORE_CIRCDNA:CIRCDNA:SAMT... [100%] 2 of 2, failed: 2 ✘
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMB... -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BEDT... -
[9b/e4dd70] process > NFCORE_CIRCDNA:CIRCDNA:BEDT... [100%] 2 of 2 ✔
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRC... -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CUST... -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:MULTIQC -
-[nf-core/circdna] Sent summary e-mail to [email protected] (sendmail)-
-[nf-core/circdna] Pipeline completed with errors-
Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF (e06588d9bca6003c0f0ec945f109b30a)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF (e06588d9bca6003c0f0ec945f109b30a)` terminated with an error exit status (140)

Command executed:

  samtools sort -n -@ 6 -o e06588d9bca6003c0f0ec945f109b30a.qname.sorted.bam -T e06588d9bca6003c0f0ec945f109b30a.qname.sorted e06588d9bca6003c0f0ec945f109b30a.md.filtered.sorted.bam
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  140

Command output:
  (empty)

Command error:
  [bam_sort_core] merging from 504 files and 6 in-memory blocks...

Work dir:
  /gpfs/share/home/1710305101/testNextflow/circfindertest/work/c4/cd12e5099a0ac7eb86cac6f62d36e8

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Relevant files

nextflow.log

System information

nextflow version: 22.10.0.5826
hardware: HPC
executor: slurm
container: singularity 3.5.2
OS: Red Hat 4.8.5
version of nf-core/circdna: 1.0.1

CNVKIT_BATCH fails

Description of the bug

The pipeline runs through without problems (on the HPC cluster), but then at the cnvkit step it crashes, saying the file 'GRCh37_cnvkit_filtered_ref.cnn' doesn't exist. It does exist though.

Please assist as to what might be causing this issue. Thanks a lot!

Command used and terminal output

nextflow run nf-core/circdna -work-dir /path/to/wdir --outdir /path/to/results --genome GRCh37 -profile singularity --circle_identifier ampliconarchitect --aa_data_repo /path/to/nfcore_circdna --mosek_license_dir /path/to/nfcore_circdna/mosek --reference_build GRCh37 --max_cpus 64 --max_memory 200.GB --max_time 500.h -c /path/to/circdna.config -with-timeline /path/to/wdir/timeline.html --input path/to/samplesheet.csv


Output:

Command error:
 WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
 CNVkit 0.9.9
 Traceback (most recent call last):
   File "/usr/local/bin/cnvkit.py", line 9, in <module>
     args.func(args)
   File "/usr/local/lib/python3.9/site-packages/cnvlib/commands.py", line 118, in _cmd_batch
     ref_arr = read_cna(args.reference)
   File "/usr/local/lib/python3.9/site-packages/cnvlib/cmdutil.py", line 12, in read_cna
     return tabio.read(infile, into=CNA, sample_id=sample_id, meta=meta)
   File "/usr/local/lib/python3.9/site-packages/skgenome/tabio/__init__.py", line 74, in read
     dframe = reader(infile, **kwargs)
   File "/usr/local/lib/python3.9/site-packages/skgenome/tabio/tab.py", line 17, in read_tab
     dframe = pd.read_csv(infile, sep='\t', dtype={'chromosome': 'str'})
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 610, in read_csv
     return _read(filepath_or_buffer, kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 462, in _read
     parser = TextFileReader(filepath_or_buffer, **kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 819, in __init__
     self._engine = self._make_engine(self.engine)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1050, in _make_engine
     return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1867, in __init__
     self._open_handles(src, kwds)
   File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers.py", line 1362, in _open_handles
     self.handles = get_handle(
   File "/usr/local/lib/python3.9/site-packages/pandas/io/common.py", line 642, in get_handle
     handle = open(
 FileNotFoundError: [Errno 2] No such file or directory: '/path/to/aa_data_repo/GRCh37/GRCh37_cnvkit_filtered_ref.cnn'

Relevant files

No response

System information

nextflow version 21.10.6
on HPC cluster
container: singularity ( apptainer version 1.1.3-1.el7)
OS: Linux odcf-worker01 3.10.0-1160.76.1.el7.x86_64
nf-core/circdna 1.0.1

Bug in line 214 of the nextflow.config file

Description of the bug

When trying to run your circdna pipeline I get this error:

$ nextflow run nf-core/circdna --input samplesheet.csv --outdir results2 --genome GRCh38 --circle_identifier circle_map_realign,unicycler

NOTE: Nextflow is not tested with Java 1.8.0_121 -- It's recommended the use of version 11 up to 18

N E X T F L O W  ~  version 22.04.3
Pulling nf-core/circdna ...
Project config file is malformed -- Cause: Compile failed for sources FixedSetSources[name='/groovy/script/Script6453BF79C0729536D1DCD438CC9F093E/_nf_config_cbad999a']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/groovy/script/Script6453BF79C0729536D1DCD438CC9F093E/_nf_config_cbad999a: 214: Unexpected character: '\'' @ line 214, column 23.
       version         = '1.0.0
                         ^

1 error

It turns out that a closing single quote (') is missing at the end of line 214 of that file:
version = '1.0.0

Command used and terminal output

$ nextflow run nf-core/circdna --input samplesheet.csv --outdir results2 --genome GRCh38 --circle_identifier circle_map_realign,unicycler

NOTE: Nextflow is not tested with Java 1.8.0_121 -- It's recommended the use of version 11 up to 18

N E X T F L O W  ~  version 22.04.3
Pulling nf-core/circdna ...
Project config file is malformed -- Cause: Compile failed for sources FixedSetSources[name='/groovy/script/Script6453BF79C0729536D1DCD438CC9F093E/_nf_config_cbad999a']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/groovy/script/Script6453BF79C0729536D1DCD438CC9F093E/_nf_config_cbad999a: 214: Unexpected character: '\'' @ line 214, column 23.
       version         = '1.0.0
                         ^

1 error

Relevant files

No response

System information

Version: 22.04.3 build 5703
System: Linux 2.6.32-696.28.1.el6.x86_64
Runtime: Groovy 3.0.10 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b31
Encoding: UTF-8 (UTF-8)
No container engine used

No validation of --input_format

Description of the bug

When using circdna with AmpliconArchitect on an already aligned BAM file, I ran into the following problem:


Access to 'SAMTOOLS_INDEX_BAM.out' is undefined since the process 'SAMTOOLS_INDEX_BAM' has not been invoked before accessing the output attribute

-- Check script '.nextflow/assets/nf-core/circdna/./workflows/circdna.nf' at line: 302 or see '.nextflow.log' file for more details

This seems to happen because --input_format was given without capital letters.

I think it would be nice if --input_format were either validated, with a clear statement that it must be in capital letters, or simply treated as case-insensitive.
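The case-insensitive handling suggested here could look roughly like the sketch below. It is written in Python purely for illustration, since the pipeline itself is implemented in Nextflow and Groovy; the allowed values come from the usage section, and the function name is hypothetical.

```python
ALLOWED_INPUT_FORMATS = ("FASTQ", "BAM")


def normalize_input_format(value):
    """Upper-case the value and fail early with a clear message if it is unsupported."""
    normalized = value.strip().upper()
    if normalized not in ALLOWED_INPUT_FORMATS:
        allowed = ", ".join(ALLOWED_INPUT_FORMATS)
        raise ValueError(f"--input_format must be one of {allowed}; got {value!r}")
    return normalized


if __name__ == "__main__":
    print(normalize_input_format("bam"))
```

Failing at parameter parsing would surface the mistake before any process is scheduled, rather than as an undefined-output error deep in the workflow.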

By the way thanks for implementing the tool! :)

Command used and terminal output

No response

Relevant files

No response

System information

No response

Access Denied for Amazon S3 download

Description of the bug

I am running circdna with the following command:
nextflow run nf-core/circdna --input samplesheet_bam.csv --outdir ./output --genome mm10 -profile singularity --circle_identifier ampliconarchitect

However, we get the following error:

Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: M7TW9JYNH1PSHSAH; S3 Extended Request ID: Jy3SRFtmkJqSmeBhMrymGKyYRPU7uxsKFxKeP++PbPUL6B1Qf0dxzbbXicNiQa6AB0EPpKGPQsI=; Proxy: null)

Is there something to be specifically configured (aws s3 access?) to get circdna to run? Thanks

-- Isaac

Command used and terminal output

No response

Relevant files

No response

System information

No response

Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN

Description of the bug

Hello,

I'm reaching out regarding an intermittent error I've been encountering in my workflow. Specifically, I've observed that on occasion, a particular sample runs smoothly, while other times it fails. Additionally, upon rerunning the script, there are instances where it executes successfully.

It appears that the root of the problem may stem from the filename inconsistency. The file in question, "DNA.circular_read_candidates.bam," seems to occasionally be referenced incorrectly as "DNA.circular_read_candidates.circular_read_candidates.bam."

I would greatly appreciate any guidance or assistance you can provide in resolving this issue.

Command used and terminal output

ERROR ~ Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN (circDNA_cov15)'

Caused by:
  Missing output file(s) `*.bed` expected by process `NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN (circDNA_cov15)`

Command executed:

  circle_map.py \
      Realign \
       \
      -i circDNA_cov15.circular_read_candidates.sorted.bam \
      -qbam circDNA_cov15.qname.sorted.bam \
      -sbam circDNA_cov15.md.bam \
      -fasta genome.fasta \
      --threads 12 \
      -o circDNA_cov15_circularDNA_coordinates.bed

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN":
      Circle-Map: $(echo $(circle_map.py --help 2<&1 | grep -o "version=[0-9].[0-9].[0-9]" | sed 's/version=//g'))
  END_VERSIONS

Command exit status:
  0

Command output:
  2024-04-19 11:12:01: Realigning reads using Circle-Map

  2024-04-19 11:12:01: Clustering structural variant reads

  2024-04-19 11:12:09: Splitting clusters to to processors

  2024-04-19 11:12:22: An error happenend during execution. Exiting

Command error:
  [E::idx_find_and_load] Could not retrieve index file for 'circDNA_cov15.qname.sorted.bam'

    0%|          | 0/1200 [00:00<?, ?it/s]

    0%|          | 1/1200 [00:12<4:05:32, 12.29s/it]
    0%|          | 1/1200 [00:12<4:06:14, 12.32s/it]

  0it [00:12, ?it/s]

Work dir:
  /scratch/azabala/work_circdna_short/ed/9b33357cb56bccecc18bb2193ff18a

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

AmpliconArchitect should use AmpliconSuite-pipeline as mode of entry

Description of feature

Thank you Daniel and the rest of the nf-circdna team for putting together this nextflow pipeline!

From my understanding of the nf-circdna workflow file, https://github.com/nf-core/circdna/blob/master/workflows/circdna.nf#L395, the following steps are taken

  1. Run CNVKit (mode with matched normal not currently supported?)
  2. Feed .cns file into collect_seeds.py, which is ported from the PrepareAA.py script.
  3. Run amplified_intervals.py on the output bed from step 2.
  4. Run AmpliconArchitect.py on the output bed from step 3.

However, one important function from PrepareAA is missing in this workflow. PrepareAA.py calls a function between steps 2 and 3 from above called cnv_prefilter (line 828 of PrepareAA). This applies an additional set of filters needed before using amplified_intervals.py. This isn't reflected in the current workflow and for best reproducibility of AmpliconSuite-pipeline on nextflow, it should be included either by adding it in a custom way, as other parts of PrepareAA.py have been added, or by just using the PrepareAA.py wrapper script to take the .cns file, convert and filter it (prefilter + amplified_intervals.py).

Instead of breaking the PrepareAA.py code into pieces, it may be simplest to place AmpliconSuite-pipeline (PrepareAA) in the workflow and give the CNVKit .cns file to PrepareAA.py. It can then make the seeds, and the AA_CNV_SEEDS.bed file can be given to AmpliconArchitect as is currently done.

I am happy to provide additional guidance on this if needed.

Thanks again,
Jens

How about switching Circle-Map to the C++ version?

Description of feature

The original version of Circle-Map is rather slow, especially when it processes reads that contain a high percentage of ChrM. The C++ reimplementation of Circle-Map handles this situation and would be a faster alternative, so it would be a good idea to use the C++ version in the pipeline.

Test run fails with singularity `-r 1.0.1`

Description of the bug

Hi all,

I tried to run the circdna test with singularity nextflow run nf-core/circdna -r 1.0.1 -profile test,singularity --outdir test_circdna and it fails for me on the HPC cluster. Below you have the error messages:

Command used and terminal output

# HPC cluster

nextflow run nf-core/circdna -r 1.0.1 -profile test,singularity --outdir test_circdna

# error
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/circdna] Pipeline completed with errors-
WARN: Access to undefined parameter `singularity_pull_docker_container` -- Initialise it to a default value eg. `params.singularity_pull_docker_container = some_value`
Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)` terminated with an error exit status (127)

Command executed:

  check_samplesheet.py \
      samplesheet.csv \
      samplesheet.valid.csv \
      FASTQ
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  127

Command output:
  (empty)

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  WARNING: Skipping mount /fast/work/users/giurgium_c/miniconda/envs/nextflow/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  .command.sh: line 3: check_samplesheet.py: command not found
  INFO:    Cleaning up image...

Work dir:
  /fast/work/users/giurgium_c/amplicon-architect/TR14_nextflow/work/49/26322661fc9e58b820527ba24a85c7

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Relevant files

No response

System information

HPC cluster

Circexplorer2 Annotate

Description of feature

Hello,

Would it be possible to add the annotate module of circexplorer2?

genome.fa: No such file or directory

Description of the bug

Strangely, the commands that run samtools sort report errors about a FASTA file, in messages labelled "cram".

Command used and terminal output

executor >  local (3)
[0c/beb229] process > NFCORE_CIRCDNA:CIRCDNA:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_snu16_circleseq... [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CAT_FASTQ                                                     -
[80/3c44bc] process > NFCORE_CIRCDNA:CIRCDNA:FASTQC (SNU16-HMW)                                            [100%] 2 of 2, cached: 2 ✔
[19/813d2c] process > NFCORE_CIRCDNA:CIRCDNA:TRIMGALORE (SNU16-ZHI)                                        [100%] 2 of 2, cached: 2 ✔
[96/00ac24] process > NFCORE_CIRCDNA:CIRCDNA:BWA_INDEX (genome.fa)                                         [100%] 1 of 1, cached: 1 ✔
[7c/032898] process > NFCORE_CIRCDNA:CIRCDNA:BWA_MEM (SNU16-HMW)                                           [100%] 2 of 2, cached: 2 ✔
[c9/30c7d4] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_INDEX_BAM (SNU16-HMW)                                [100%] 2 of 2, cached: 2 ✔
[71/4b7aaa] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS_RAW:SAMTOOLS_STATS (SNU16-HMW)             [100%] 2 of 2, cached: 2 ✔
[45/8b4bc4] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS_RAW:SAMTOOLS_FLAGSTAT (SNU16-HMW)          [100%] 2 of 2, cached: 2 ✔
[e1/ef10d2] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS_RAW:SAMTOOLS_IDXSTATS (SNU16-HMW)          [100%] 2 of 2, cached: 2 ✔
[48/4a0270] process > NFCORE_CIRCDNA:CIRCDNA:MARK_DUPLICATES_PICARD:PICARD_MARKDUPLICATES (SNU16-HMW)      [100%] 2 of 2, cached: 2 ✔
[77/be7b27] process > NFCORE_CIRCDNA:CIRCDNA:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX (SNU16-HMW)             [100%] 2 of 2, cached: 2 ✔
[93/c0e657] process > NFCORE_CIRCDNA:CIRCDNA:MARK_DUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS (... [100%] 2 of 2, cached: 2 ✔
[5f/b87a29] process > NFCORE_CIRCDNA:CIRCDNA:MARK_DUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTA... [100%] 2 of 2, cached: 2 ✔
[29/8e65bd] process > NFCORE_CIRCDNA:CIRCDNA:MARK_DUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTAT... [100%] 2 of 2, cached: 2 ✔
[2e/b22b8c] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_VIEW_FILTER (SNU16-HMW)                              [100%] 2 of 2, cached: 2 ✔
[c8/a1b395] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_FILTERED (SNU16-HMW)                            [100%] 2 of 2, failed: 2 ✘
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_INDEX_FILTERED                                       -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF                                        -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMBLASTER                                                    -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BEDTOOLS_SPLITBAM2BED                                         -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BEDTOOLS_SORTEDBAM2BED                                        -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEFINDER                                                  -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CM                                        -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_READEXTRACTOR                                       -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_RE                                              -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_INDEX_RE                                             -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REPEATS                                             -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN                                             -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCEXPLORER2_PARSE                                           -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS                                   -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:MULTIQC                                                       -
-[nf-core/circdna] Pipeline completed with errors-
Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_FILTERED (SNU16-ZHI)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_FILTERED (SNU16-ZHI)` terminated with an error exit status (1)

Command executed:

  samtools sort  -@ 12 -o SNU16-ZHI.md.filtered.sorted.bam -T SNU16-ZHI.md.filtered.sorted SNU16-ZHI.md.filtered.bam
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_FILTERED":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1412516-1418583

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418434-1418633

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418484-1418709

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418560-1418792

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418643-1418864

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418715-1418957

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418809-1419032

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1418886-1425146

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1424998-1455718

  /data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa: No such file or directory
  [E::refs_load_fai] Failed to open reference file '/data3/wsx/nxf/work/29/2381d613e63ff7f2e52f9eabcba45e/genome.fa'
  [E::cram_get_ref] Failed to populate reference for id 0
  [E::cram_decode_slice] Unable to fetch reference #0:1455570-1483018

Work dir:
  /data3/wsx/nxf/work/80/d4633c50cd4e803a872c37d84b9696

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`


Relevant files

No response

System information

OS: CentOS
circdna version: dev
Executor: docker

error in NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS (1)

Hello,

  1. Thank you for making this really nice pipeline. I am trying to test-run AA with the following command on an LSF cluster node:

nextflow run nf-core/circdna -profile test_AA, singularity --outdir test_circdna

Most of the tests passed, but the run failed at the dump software versions step (part of the error message is listed below).

Will this affect the function of circdna, or is this last step only for 'recording purposes' for the analysis?

  2. Another somewhat related question concerns this passage:
If you are using singularity, please use the [nf-core download](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [NXF_SINGULARITY_CACHEDIR or singularity.cacheDir](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.

from the documentation: https://nf-co.re/circdna/1.0.1

As a newbie to nextflow/nf-core, I am wondering whether this step is necessary, and how to download the images using the nf-core command. Running the command above directly seems to have completed most of the tests for me (except the last step).

Thanks a lot!
Isaac

  3. Error message:
[6b/ca555a] process > NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS (1)                      [100%] 1 of 1, failed: 1 ✘
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:MULTIQC                                              -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/circdna] Pipeline completed with errors-
Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS (1)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS (1)` terminated with an error exit status (1)

Command executed [/home/isaac/.nextflow/assets/nf-core/circdna/./workflows/../modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py]:

  #!/usr/bin/env python

  import yaml
  import platform
  from textwrap import dedent


  def _make_versions_html(versions):
      html = [
          dedent(
              """\
              <style>
              #nf-core-versions tbody:nth-child(even) {
                  background-color: #f2f2f2;
              }
              </style>
              <table class="table" style="width:100%" id="nf-core-versions">
                  <thead>
                      <tr>
                          <th> Process Name </th>
                          <th> Software </th>
                          <th> Version  </th>
                      </tr>
                  </thead>
              """
          )
      ]
      for process, tmp_versions in sorted(versions.items()):
          html.append("<tbody>")
          for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
              html.append(
                  dedent(
                      f"""\
                      <tr>
                          <td><samp>{process if (i == 0) else ''}</samp></td>
                          <td><samp>{tool}</samp></td>
                          <td><samp>{version}</samp></td>
                      </tr>
                      """
                  )
              )
          html.append("</tbody>")
      html.append("</table>")
      return "\n".join(html)


  versions_this_module = {}
  versions_this_module["NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS"] = {
      "python": platform.python_version(),
      "yaml": yaml.__version__,
  }

  with open("collated_versions.yml") as f:
      versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module

  # aggregate versions by the module name (derived from fully-qualified process name)
  versions_by_module = {}
  for process, process_versions in versions_by_process.items():
      module = process.split(":")[-1]
      try:
          assert versions_by_module[module] == process_versions, (
              "We assume that software versions are the same between all modules. "
              "If you see this error-message it means you discovered an edge-case "
              "and should open an issue in nf-core/tools. "
          )
      except KeyError:
          versions_by_module[module] = process_versions

  versions_by_module["Workflow"] = {

AmpliconArchitect related paths must be absolute not relative

Description of the bug

The AmpliconArchitect related processes use this structure to set an environment variable:
export AA_DATA_REPO=${params.aa_data_repo}
without staging the files in that path into the work directory.
If given an absolute path, this works, but if a relative path is given, the pipeline fails with a non-obvious error message ending in:

ERROR:root:#TIME 1679058301.012 interval_list: Unable to open interval file "data_repo/GRCh38/".
Traceback (most recent call last):
  File "~/.nextflow/assets/nf-core/circdna/bin/amplified_intervals.py", line 154, in <module>
    bamFileb2b = b2b.bam_to_breakpoint(bamFile, coverage_stats=cstats)
  File "~/.nextflow/assets/nf-core/circdna/bin/bam_to_breakpoint.py", line 129, in __init__
    self.median_coverage(window_list=coverage_windows)
  File "~/.nextflow/assets/nf-core/circdna/bin/bam_to_breakpoint.py", line 540, in median_coverage
    coverage_stats_file = open(hg.DATA_REPO + "/coverage.stats", "a")
IOError: [Errno 2] No such file or directory: 'data_repo/coverage.stats'

I see several ways to fix this:

  • Ensure that an absolute path is provided, and fail otherwise.
  • Convert the relative path to an absolute path.
  • Pass the folder into the process as an input so that it is staged (this would be my preferred solution, and may be required for cases other than a relative path being given instead of an absolute one).
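
The first two options above (fail early, or convert to an absolute path) can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline's code; the helper name `resolve_data_repo` and its `base_dir` argument are assumptions made for the example.

```python
import os


def resolve_data_repo(path, base_dir=None):
    """Return an absolute path for the AA data repo, or fail with a clear message.

    Hypothetical helper: relative paths are resolved against `base_dir`
    (default: the current working directory, i.e. the launch directory).
    """
    base = base_dir if base_dir is not None else os.getcwd()
    # os.path.join keeps `path` unchanged when it is already absolute.
    absolute = os.path.abspath(os.path.join(base, os.path.expanduser(path)))
    if not os.path.isdir(absolute):
        raise FileNotFoundError(f"AA data repo not found: {absolute}")
    return absolute
```

Resolving the path once at launch means the value exported as `AA_DATA_REPO` stays valid inside each process work directory. Staging the folder as a process input would still be the more robust fix.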

Command used and terminal output

No response

Relevant files

No response

System information

No response

make --input_format a required argument when running local command line

Description of feature

I ran
nextflow run nf-core/circdna -r dev --input colo320dm_samplesheet.csv --outdir colo320dmlc_nftest --genome GRCh38 -profile docker --circle_identifier ampliconarchitect --mosek_license_dir $HOME/mosek --aa_data_repo $AA_DATA_REPO --reference_build GRCh38 --fasta $AA_DATA_REPO/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa

on my local machine. However, the pipeline defaulted to --input_format FASTQ, whereas my samplesheet.csv was

sample,bam
COLO320DM_LC,/home/jens/Desktop/research/bams/COLO320DM_Hung2021.bam

The pipeline then crashed, and the cause was not clear until I revisited the web form and saw that --input_format is a required field there. Can we update the pipeline so that --input_format is either required on the command line or deduced from the samplesheet?

Thanks!
Jens
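
Deducing the format from the samplesheet could look roughly like the sketch below. It assumes that a BAM samplesheet has a `bam` column (as in the example above) and that a FASTQ samplesheet has a `fastq_1` column. The second column name is an assumption here, and `deduce_input_format` is a hypothetical helper rather than existing pipeline code.

```python
import csv
import io


def deduce_input_format(samplesheet_text):
    """Infer the input format from the samplesheet header row.

    Hypothetical helper; the 'fastq_1' column name is an assumption.
    """
    reader = csv.reader(io.StringIO(samplesheet_text))
    header = [column.strip().lower() for column in next(reader)]
    if "bam" in header:
        return "BAM"
    if "fastq_1" in header:
        return "FASTQ"
    raise ValueError(f"Cannot deduce input format from header: {header}")
```

With the samplesheet shown above, `deduce_input_format("sample,bam\n...")` returns `"BAM"`, so the pipeline could either set the format itself or stop early with a clear message when `--input_format` disagrees.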

Pipeline test is not working

Description of the bug

The test profile is not working; it appears to need samtools, which is not found when the processes run.

Command used and terminal output

(base) [feshap@n102 ecdna]$ ~/raid/nextflow run nf-core/circdna --outdir ./nf-circ_res/test -profile test
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/circdna` [special_lalande] DSL2 - revision: 8e0e14c84f [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/circdna v1.1-g8e0e14c
------------------------------------------------------
Core Nextflow options
  revision                  : master
  runName                   : special_lalande
  launchDir                 : /net/mraid14/export/tgdata/users/yonshap/proj/ecdna
  workDir                   : /net/mraid14/export/tgdata/users/yonshap/proj/ecdna/work
  projectDir                : /home/feshap/.nextflow/assets/nf-core/circdna
  userName                  : feshap
  profile                   : test
  configFiles               :

Input/output options
  input                     : https://raw.githubusercontent.com/nf-core/test-datasets/circdna/samplesheet/samplesheet.csv
  input_format              : FASTQ
  outdir                    : ./nf-circ_res/test

Circular DNA identifier options
  circle_identifier         : circexplorer2,circle_finder,circle_map_realign,circle_map_repeats,unicycler

Reference genome options
  fasta                     : https://raw.githubusercontent.com/nf-core/test-datasets/circdna/reference/genome.fa
  igenomes_ignore           : true

ampliconarchitect options
  reference_build           : GRCh38

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Max job request options
  max_cpus                  : 2
  max_memory                : 6.GB
  max_time                  : 6.h

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/circdna for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7712010

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/circdna/blob/master/CITATIONS.md

executor >  local (3)
[8a/932aee] process > NFCORE_CIRCDNA:CIRCDNA:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)                [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CAT_FASTQ                                                      -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:FASTQC                                                         -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:TRIMGALORE                                                     -
[8e/b0a417] process > NFCORE_CIRCDNA:CIRCDNA:BWA_INDEX (genome.fa)                                          [100%] 1 of 1, failed: 1 ✘
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BWA_MEM                                                        -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_INDEX_BAM                                             -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS                              -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT                           -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS                           -
[d7/61c691] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_FAIDX (genome.fa)                                     [100%] 1 of 1, failed: 1 ✘
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_MARKDUPLICATES_PICARD:PICARD_MARKDUPLICATES                -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX                       -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS    -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BAM_MARKDUPLICATES_PICARD:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CF                                         -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMBLASTER                                                     -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BEDTOOLS_SPLITBAM2BED                                          -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:BEDTOOLS_SORTEDBAM2BED                                         -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEFINDER                                                   -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_QNAME_CM                                         -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_READEXTRACTOR                                        -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_SORT_RE                                               -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_INDEX_RE                                              -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REPEATS                                              -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCLEMAP_REALIGN                                              -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CIRCEXPLORER2_PARSE                                            -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:UNICYCLER                                                      -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:SEQTK_SEQ                                                      -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:GETCIRCULARREADS                                               -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:MINIMAP2_ALIGN                                                 -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:CUSTOM_DUMPSOFTWAREVERSIONS                                    -
[-        ] process > NFCORE_CIRCDNA:CIRCDNA:MULTIQC                                                        -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/circdna] Pipeline completed with errors-
WARN: Access to undefined parameter `enable_conda` -- Initialise it to a default value eg. `params.enable_conda = some_value`
ERROR ~ Error executing process > 'NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_FAIDX (genome.fa)'

Caused by:
  Process `NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_FAIDX (genome.fa)` terminated with an error exit status (127)

Command executed:

  samtools \
      faidx \
      genome.fa \


  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCDNA:CIRCDNA:SAMTOOLS_FAIDX":
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 3: samtools: command not found

Work dir:
  /net/mraid14/export/tgdata/users/yonshap/proj/ecdna/work/d7/61c691844c59c23edbd02b587fc0a8

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response
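
Exit status 127 together with `samtools: command not found` means the shell could not find the tool on the `PATH`. The run above specifies only `-profile test`, so unless a software profile such as `docker`, `singularity`, or `conda` is added, the tools have to be installed on the host. A quick pre-flight check, written as a hedged sketch (the helper `missing_tools` and the tool list are illustrative only), could be:

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Illustrative subset of tools named in the log above.
    absent = missing_tools(["samtools", "bwa"])
    if absent:
        print("Not on PATH: " + ", ".join(absent))
    else:
        print("All listed tools were found on PATH.")
```

If the tools are reported missing, combining the test profile with a container profile (for example `-profile test,docker`, following the usual nf-core convention) is likely the simpler route.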

Latest default circdna version is 1.0.0 and not 1.0.1

Description of the bug

By default, the latest version of circdna resolves to 1.0.0, not 1.0.1. Could you please update the latest version tag?
The current state might be confusing.
Thank you.

nextflow pull nf-core/circdna
Checking nf-core/circdna ...
 done - revision: b0152a3629 [1.0.0]

nextflow.config contains the version 1.0dev:

manifest {
    name            = 'nf-core/circdna'
    author          = 'Daniel Schreyer'
    homePage        = 'https://github.com/nf-core/circdna'
    description     = 'Pipeline for the identification of circular DNAs'
    mainScript      = 'main.nf'
    nextflowVersion = '!>=21.10.3'
    version         = '1.0dev'
}

Command used and terminal output

No response

Relevant files

No response

System information

No response

Make duplicate marking optional when BAM files are provided

Description of feature

When BAM files are provided, the pipeline runs Picard MarkDuplicates, but duplicate marking may already have been performed upstream.
For instance, I have BAM files with Unique Molecular Identifiers (UMIs) on the reads, which I have already deduplicated in another pipeline. Picard will therefore mark reads as duplicates even though the UMIs show that they come from different molecules.

--bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)

Description of the bug

Hi! I'm using nf-core/circdna with the following configuration:

nextflow run nf-core/circdna \
-r 1.0.4 \
-profile docker \
-resume \
--max_cpus 9 \
--max_memory 21.GB \
--max_time 500.h \
--circle_identifier circle_map_realign,circle_map_repeats,circle_finder,circexplorer2,ampliconarchitect \
--input work/test_mouse/samplesheets/CIRCDNA.csv \
--outdir results/test_mouse/CIRCDNA \
--genome GRCm38 \
--bwa_index database/indexes/GRCm38/BWA/genome.bwt \
--reference_build mm10 \
--mosek_license_dir src/others \
--fasta database/genomes/GRCm38/genome.fasta \
--aa_data_repo database/indexes/GRCm38/aa_data_repo

And it throws this error:

ERROR ~ ERROR: Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
ERROR ~ * --bwa_index: string [database/indexes/GRCm38/BWA/genome.bwt] does not match pattern ^\S+\.\{amb,ann,bwt,pac,sa\}$ (database/indexes/GRCm38/BWA/genome.bwt)

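I've been searching, and I think the regex pattern is wrong: the escaped braces are matched as literal characters rather than as a list of alternatives. The pattern should instead be ^\S+\.(amb|ann|bwt|pac|sa)$ (at least according to ChatGPT, and I tested it at https://regex101.com/).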
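
The difference between the two patterns can be checked with a short script. This is a sketch using Python's `re` module, which is an assumption: the pipeline's parameter validation may use a different regex engine, although escaped braces and alternation groups behave the same way in common engines. The `matches` helper and the `literal_brace_name` value are illustrative only.

```python
import re

# Pattern quoted in the validation error: the escaped braces are literal characters.
BRACE_PATTERN = r"^\S+\.\{amb,ann,bwt,pac,sa\}$"
# Pattern proposed in this report: a group with alternation.
ALTERNATION_PATTERN = r"^\S+\.(amb|ann|bwt|pac|sa)$"

# Path taken from the failing command.
path = "database/indexes/GRCm38/BWA/genome.bwt"
# Illustrative string that contains the braces literally.
literal_brace_name = "genome.{amb,ann,bwt,pac,sa}"


def matches(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the anchored pattern."""
    return re.match(pattern, value) is not None


print(matches(BRACE_PATTERN, path))                # False
print(matches(ALTERNATION_PATTERN, path))          # True
print(matches(BRACE_PATTERN, literal_brace_name))  # True
```

With the brace pattern, only a file name that literally ends in `.{amb,ann,bwt,pac,sa}` passes, which explains why a real index file such as `genome.bwt` is rejected.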

Command used and terminal output

No response

Relevant files

No response

System information

No response

ERROR ~ Argument of `file` function cannot be null

Description of the bug

Hi, I am a first-time Nextflow and circdna user. I am trying to run circdna on a tumor BAM file that has already been processed. I followed the instructions in the README and encountered the error in the title.

I have tested the pipeline using the -profile test_AA,singularity flag and it completed without issue.

The only thing I could find on Google suggested this might be a Nextflow version issue, but my Nextflow is up to date. I am running the pipeline from a conda environment that was created for Nextflow.

Thanks in advance for the help, and apologies if this is a trivial fix!

Command used and terminal output

nextflow run nf-core/circdna --input samplesheet.csv --outdir results --genome GRCh38 -profile singularity --circle_identifier ampliconarchitect
N E X T F L O W  ~  version 23.04.1
Launching `https://github.com/nf-core/circdna` [nice_brown] DSL2 - revision: 09a5015cf8 [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/circdna v1.0.4-g09a5015
------------------------------------------------------
Core Nextflow options
  revision         : master
  runName          : nice_brown
  containerEngine  : singularity
  launchDir        : /scratch/users/tbencomo/aus-wgs
  workDir          : /scratch/users/tbencomo/aus-wgs/work
  projectDir       : /home/users/tbencomo/.nextflow/assets/nf-core/circdna
  userName         : tbencomo
  profile          : singularity
  configFiles      : /home/users/tbencomo/.nextflow/assets/nf-core/circdna/nextflow.config

Input/output options
  input            : samplesheet.csv
  outdir           : results

Reference genome options
  genome           : GRCh38
  fasta            : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa

Circular DNA identifier options
  circle_identifier: ampliconarchitect

ampliconarchitect options
  reference_build  : null

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/circdna for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7712010

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/circdna/blob/master/CITATIONS.md
------------------------------------------------------
ERROR ~ Argument of `file` function cannot be null

 -- Check script '/home/users/tbencomo/.nextflow/assets/nf-core/circdna/./workflows/circdna.nf' at line: 57 or see '.nextflow.log' file for more details
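
The parameter summary above reports `reference_build : null`, and the command passes none of `--reference_build`, `--aa_data_repo`, or `--mosek_license_dir`, all of which appear in the other ampliconarchitect commands on this page. Whether one of these is the null argument at line 57 of circdna.nf is an assumption; the log does not confirm it. A minimal sketch of a pre-flight check follows, using a hypothetical helper `missing_aa_options` that is not part of the pipeline.

```python
import shlex
from typing import List

# Options that the other ampliconarchitect commands on this page pass explicitly.
# Treating them as required is an assumption, not documented pipeline behaviour.
ASSUMED_AA_OPTIONS = ("--reference_build", "--aa_data_repo", "--mosek_license_dir")


def missing_aa_options(command: str) -> List[str]:
    """List the assumed ampliconarchitect options that a command line omits."""
    tokens = shlex.split(command)
    # Only check commands that select the ampliconarchitect branch.
    if not any("ampliconarchitect" in token for token in tokens):
        return []
    return [
        opt
        for opt in ASSUMED_AA_OPTIONS
        if not any(t == opt or t.startswith(opt + "=") for t in tokens)
    ]


command = (
    "nextflow run nf-core/circdna --input samplesheet.csv --outdir results "
    "--genome GRCh38 -profile singularity --circle_identifier ampliconarchitect"
)
print(missing_aa_options(command))
# ['--reference_build', '--aa_data_repo', '--mosek_license_dir']
```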

Relevant files

Jul-24 21:54:57.292 [main] DEBUG nextflow.cli.Launcher - $> nextflow run nf-core/circdna --input samplesheet.csv --outdir results --genome GRCh38 -profile singularity --circle_identifier ampliconarchitect
Jul-24 21:54:57.481 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 23.04.1
Jul-24 21:54:57.515 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/users/tbencomo/.nextflow/plugins; core-plugins: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]
Jul-24 21:54:57.541 [main] INFO org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jul-24 21:54:57.543 [main] INFO org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jul-24 21:54:57.545 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jul-24 21:54:57.580 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Jul-24 21:54:57.596 [main] DEBUG nextflow.scm.ProviderConfig - Using SCM config path: /home/users/tbencomo/.nextflow/scm
Jul-24 21:54:59.826 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/circdna.git
Jul-24 21:54:59.855 [main] DEBUG nextflow.scm.RepositoryFactory - Found Git repository result: [RepositoryFactory]
Jul-24 21:54:59.885 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/.git/config; branch: master; remote: origin; url: https://github.com/nf-core/circdna.git
Jul-24 21:55:01.106 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/nextflow.config
Jul-24 21:55:01.127 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/nextflow.config
Jul-24 21:55:01.160 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: singularity
Jul-24 21:55:03.593 [main] DEBUG nextflow.config.ConfigBuilder - Available config profiles: [cfc_dev, ifb_core, denbi_qbic, alice, mjolnir_globe, uppmax, incliva, uge, rosalind_uge, lugh, unibe_ibu, vai, czbiohub_aws, jax, ccga_med, scw, tigem, tubingen_apg, google, ipop_up, googlels, eddie, medair, biowulf, apptainer, bi, bigpurple, sbc_sharc, adcra, cedars, vsc_kul_uhasselt, pawsey_nimbus, ucl_myriad, utd_ganymede, charliecloud, icr_davros, ceres, munin, arm, rosalind, hasta, cfc, uzh, ebi_codon_slurm, ebc, ku_sund_dangpu, ccga_dx, crick, marvin, biohpc_gen, shifter, mana, mamba, wehi, awsbatch, imperial, maestro, genotoul, abims, janelia, nu_genomics, googlebatch, oist, sahmri, mpcdf, leicester, vsc_ugent, sage, cambridge, podman, ebi_codon, cheaha, xanadu, test, computerome, seg_globe, sanger, dkfz, pasteur, test_full, azurebatch, hki, crukmi, docker, engaging, gis, hypatia, psmn, eva, fgcz, conda, crg, singularity, test_AA, uw_hyak_pedslabs, prince, utd_sysbio, debug, genouest, cbe, phoenix, gitpod, seawulf, fub_curta, uct_hpc, aws_tower, binac]
Jul-24 21:55:03.701 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
Jul-24 21:55:03.701 [main] INFO nextflow.cli.CmdRun - Launching https://github.com/nf-core/circdna [nice_brown] DSL2 - revision: 09a5015 [master]
Jul-24 21:55:03.703 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jul-24 21:55:03.703 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Jul-24 21:55:03.705 [main] DEBUG nextflow.secret.LocalSecretsProvider - Secrets store: /home/users/tbencomo/.nextflow/secrets/store.json
Jul-24 21:55:03.708 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@3005623b] - activable => nextflow.secret.LocalSecretsProvider@3005623b
Jul-24 21:55:03.819 [main] DEBUG nextflow.Session - Session UUID: f51fdcac-7cc0-43d7-beb2-a0adec6639e2
Jul-24 21:55:03.819 [main] DEBUG nextflow.Session - Run name: nice_brown
Jul-24 21:55:03.820 [main] DEBUG nextflow.Session - Executor pool size: 2
Jul-24 21:55:03.830 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-24 21:55:03.893 [main] DEBUG nextflow.cli.CmdRun -
Version: 23.04.1 build 5866
Created: 15-04-2023 06:51 UTC (14-04-2023 23:51 PDT)
System: Linux 3.10.0-1160.92.1.el7.x86_64
Runtime: Groovy 3.0.16 on OpenJDK 64-Bit Server VM 11.0.13+7-b1751.21
Encoding: UTF-8 (UTF-8)
Process: [email protected] [10.18.1.57]
CPUs: 1 - Mem: 4 GB (811.9 MB) - Swap: 0 (0)
Jul-24 21:55:03.952 [main] DEBUG nextflow.Session - Work-dir: /scratch/users/tbencomo/aus-wgs/work [lustre]
Jul-24 21:55:03.987 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Jul-24 21:55:04.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jul-24 21:55:04.183 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jul-24 21:55:04.205 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 2; maxThreads: 1000
Jul-24 21:55:04.405 [main] DEBUG nextflow.Session - Session start
Jul-24 21:55:04.410 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow started -- trace file: /scratch/users/tbencomo/aus-wgs/results/pipeline_info/execution_trace_2023-07-24_21-55-02.txt
Jul-24 21:55:04.444 [main] DEBUG nextflow.Session - Using default localLib path: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/lib
Jul-24 21:55:04.449 [main] DEBUG nextflow.Session - Adding to the classpath library: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/lib
Jul-24 21:55:04.450 [main] DEBUG nextflow.Session - Adding to the classpath library: /home/users/tbencomo/.nextflow/assets/nf-core/circdna/lib/nfcore_external_java_deps.jar
Jul-24 21:55:06.587 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jul-24 21:55:06.659 [main] INFO nextflow.Nextflow -


                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/circdna v1.0.4-g09a5015

Core Nextflow options
revision : master
runName : nice_brown
containerEngine : singularity
launchDir : /scratch/users/tbencomo/aus-wgs
workDir : /scratch/users/tbencomo/aus-wgs/work
projectDir : /home/users/tbencomo/.nextflow/assets/nf-core/circdna
userName : tbencomo
profile : singularity
configFiles : /home/users/tbencomo/.nextflow/assets/nf-core/circdna/nextflow.config

Input/output options
input : samplesheet.csv
outdir : results

Reference genome options
genome : GRCh38
fasta : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa

Circular DNA identifier options
circle_identifier: ampliconarchitect

ampliconarchitect options
reference_build : null

!! Only displaying parameters that differ from the pipeline defaults !!

If you use nf-core/circdna for your analysis please cite:


Jul-24 21:55:08.408 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-amazon version: 1.16.2
Jul-24 21:55:08.469 [main] INFO org.pf4j.AbstractPluginManager - Plugin 'nf-amazon@1.16.2' resolved
Jul-24 21:55:08.469 [main] INFO org.pf4j.AbstractPluginManager - Start plugin 'nf-amazon@1.16.2'
Jul-24 21:55:08.509 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-amazon@1.16.2
Jul-24 21:55:08.547 [main] DEBUG nextflow.file.FileHelper - > Added 'S3FileSystemProvider' to list of installed providers [s3]
Jul-24 21:55:08.547 [main] DEBUG nextflow.file.FileHelper - Started plugin 'nf-amazon' required to handle file: s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa
Jul-24 21:55:08.562 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Jul-24 21:55:08.569 [main] DEBUG nextflow.Global - Using AWS credential defined in default section in file: /home/users/tbencomo/.aws/credentials
Jul-24 21:55:08.572 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=4F+oMa.., region=us-east-1, max_error_retry=5, access_key=AKIAR6..}
Jul-24 21:55:09.251 [main] DEBUG com.upplication.s3fs.AmazonS3Client - Setting S3 glacierRetrievalTier=null
Jul-24 21:55:11.333 [main] DEBUG nextflow.Session - Session aborted -- Cause: Argument of file function cannot be null
Jul-24 21:55:11.372 [main] DEBUG nextflow.Session - The following nodes are still active:
[operator] map
[operator] collect

Jul-24 21:55:11.389 [main] ERROR nextflow.cli.Launcher - Argument of file function cannot be null
java.lang.IllegalArgumentException: Argument of file function cannot be null
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:72)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:59)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
at nextflow.Nextflow.file(Nextflow.groovy:99)
at nextflow.Nextflow.file(Nextflow.groovy)
at nextflow.Nextflow$file$0.callStatic(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallStatic(CallSiteArray.java:55)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:217)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callStatic(AbstractCallSite.java:231)
at Script_61043458.runScript(Script_61043458:57)
at nextflow.script.BaseScript.run0(BaseScript.groovy:145)
at nextflow.script.BaseScript.run(BaseScript.groovy:192)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:215)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:205)
at nextflow.script.IncludeDef.memoizedMethodPriv$loadModule0PathMapSession(IncludeDef.groovy:151)
at nextflow.script.IncludeDef.access$0(IncludeDef.groovy)
at nextflow.script.IncludeDef$__clinit__closure2.doCall(IncludeDef.groovy)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
at groovy.lang.Closure.call(Closure.java:412)
at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.java:137)
at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:137)
at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentCommonCache.java:113)
at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
at nextflow.script.IncludeDef.loadModule0(IncludeDef.groovy)
at nextflow.script.IncludeDef.load0(IncludeDef.groovy:123)
at nextflow.script.IncludeDef$load0$1.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
at Script_dc663737.runScript(Script_dc663737:36)
at nextflow.script.BaseScript.run0(BaseScript.groovy:145)
at nextflow.script.BaseScript.run(BaseScript.groovy:192)
at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:229)
at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:224)
at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:130)
at nextflow.cli.CmdRun.run(CmdRun.groovy:368)
at nextflow.cli.Launcher.run(Launcher.groovy:494)
at nextflow.cli.Launcher.main(Launcher.groovy:653)

System information

Nextflow: 23.04.1
Hardware: HPC
Executor: Local
Container: Singularity
OS: Cent OS Linux
Version: v1.0.4-g09a5015

bwa index not working

Description of the bug

The pattern for the --bwa_index option (https://nf-co.re/circdna/1.0.1/parameters#bwa_index) does not match my index files. The pattern also lists an .alt extension, but bwa index did not produce an .alt file here:

-rwxrwxrwx  1 GuoHuangChun GuoHuangChun 2.9G Aug 13  2022 genome.fa
-rw-rw-r--  1 wsx          wsx           17K Feb 15 17:39 genome.fa.amb
-rw-rw-r--  1 wsx          wsx           954 Feb 15 17:39 genome.fa.ann
-rw-rw-r--  1 wsx          wsx          2.9G Feb 15 17:38 genome.fa.bwt
-rwxrwxrwx  1 zhaoqi       zhaoqi       1.1K Feb 15 16:57 genome.fa.fai
-rw-rw-r--  1 wsx          wsx          737M Feb 15 17:39 genome.fa.pac
-rw-rw-r--  1 wsx          wsx          1.5G Feb 15 17:55 genome.fa.sa

Command used and terminal output

Launching `https://github.com/nf-core/circdna` [mighty_saha] DSL2 - revision: 4f03e2641b [dev]

ERROR: Validation of pipeline parameters failed!


* --bwa_index: string [/data1/database/human/hg38/genome.fa.alt] does not match pattern ^\S+\.\{alt,amb,ann,bwt,pac,sa\}$ (/data1/database/human/hg38/genome.fa.alt)


Relevant files

No response

System information

No response

reference_build parameter doesn't accept 'mm10' as input

Description of the bug

Running the following command:

nextflow run nf-core/circdna -r 1.0.1 --input samplesheet_bam.csv --outdir ./output --genome mm10 -profile singularity --circle_identifier ampliconarchitect --aa_data_repo ./aa_data_repo --fasta ... --mosek_license_dir ~/.config/mosek --reference_build mm10

will generate the error:

Reference Build not given! Please specify --reference_build 'hg19', 'GRCh38', or 'GRCh37'.

The code checking this condition is at:

if (params.reference_build != "hg19" & params.reference_build != "GRCh38" & params.reference_build != "GRCh37"){

This condition probably needs to be updated to allow the pipeline to run on mm10 mouse samples.
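
The quoted condition compares the parameter against three fixed strings, so any other value, including mm10, falls through to the error. A minimal sketch of the same check as a set membership test follows, written in Python rather than the pipeline's Groovy. The function name and the inclusion of mm10 in the accepted set are assumptions that illustrate the requested change, not the pipeline's actual code.

```python
# Builds accepted by the condition quoted above.
ORIGINAL_BUILDS = frozenset({"hg19", "GRCh38", "GRCh37"})
# Hypothetical extended set that also accepts the mouse build requested here.
PROPOSED_BUILDS = ORIGINAL_BUILDS | {"mm10"}


def is_supported_build(reference_build, supported=PROPOSED_BUILDS):
    """Return True if the given build name is in the supported set."""
    return reference_build in supported


print(is_supported_build("mm10", ORIGINAL_BUILDS))  # False
print(is_supported_build("mm10"))                   # True
print(is_supported_build(None))                     # False
```

Keeping the accepted builds in one collection means the check and the error message can both be derived from the same list when a new build is added.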

Thanks

Command used and terminal output

No response

Relevant files

No response

System information

No response
