
hic's Introduction

nf-core/hic

AWS CI · Cite with Zenodo

Nextflow · run with conda · run with docker · run with singularity · Launch on Nextflow Tower

Get help on Slack · Follow on Twitter · Follow on Mastodon · Watch on YouTube

Introduction

nf-core/hic is a best-practice bioinformatics pipeline for the analysis of Chromosome Conformation Capture (Hi-C) data.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

  1. Read QC (FastQC)
  2. Hi-C data processing
    1. HiC-Pro
      1. Mapping using a two-step strategy to rescue reads spanning the ligation sites (bowtie2); see the sketch after this list
      2. Detection of valid interaction products
      3. Duplicate removal
      4. Generation of raw and normalized contact maps (iced)
  3. Create genome-wide contact maps at various resolutions (cooler)
  4. Contact map normalization using a balancing algorithm (cooler)
  5. Export to various contact map formats (HiC-Pro, cooler)
  6. Quality controls (HiC-Pro, HiCExplorer)
  7. Compartment calling (cooltools)
  8. TAD calling (HiCExplorer, cooltools)
  9. Quality control report (MultiQC)
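
As a hedged illustration of the two-step mapping rescue in step 2 (file names, index path, thread count, and the HindIII ligation motif AAGCTAGCTT are assumptions; the bowtie2 options mirror the defaults shown in the logs further down, and this is a sketch rather than the pipeline's exact code):

# Step 1: end-to-end alignment; unmapped reads are kept aside.
bowtie2 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder \
    -p 4 -x bowtie2_index/genome \
    --un sample_R1_unmap.fastq -U sample_R1.fastq.gz \
    | samtools view -F 4 -bS - > sample_R1.bam

# Step 2: trim the unmapped reads at the ligation junction (cutsite_trimming
# ships with HiC-Pro) and re-align the remaining 5' fragment.
cutsite_trimming --fastq sample_R1_unmap.fastq --cutsite AAGCTAGCTT \
    --out sample_R1_trimmed.fastq
bowtie2 --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder \
    -p 4 -x bowtie2_index/genome -U sample_R1_trimmed.fastq \
    | samtools view -bS - > sample_R1_trimmed.bam

# Merge the two alignment steps into a single BAM per mate.
samtools merge sample_R1_merged.bam sample_R1.bam sample_R1_trimmed.bam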

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
HIC_ES_4,SRR5339783_1.fastq.gz,SRR5339783_2.fastq.gz

Each row represents a pair of FastQ files (paired-end). Now, you can run the pipeline using:

nextflow run nf-core/hic \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --genome GRCh37 \
   --outdir <OUTDIR>

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
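
For example, a minimal sketch of a -params-file (hypothetical file name and values) that replaces the --input, --genome, and --outdir flags above:

cat > params.yaml <<'EOF'
input: 'samplesheet.csv'
genome: 'GRCh37'
outdir: './results'
EOF

nextflow run nf-core/hic -profile docker -params-file params.yaml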

For more details, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/hic was originally written by Nicolas Servant.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #hic channel (you can join with this invite).

Citations

If you use nf-core/hic for your analysis, please cite it using the following DOI: 10.5281/zenodo.2669512

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

hic's People

Contributors

adamrtalbot, apeltzer, edmundmiller, ewels, jzohren, kevinmenden, maxulysse, nf-core-bot, nservant, plarosa, robomics


hic's Issues

parameters format

Thanks! It's one of those unwritten rules that we've discussed and implemented to various degrees in some of the main pipelines but haven't had the time to officially document 😅 You could change the other parameters too? If it helps, if things do change then I'll have to update numerous pipelines! I've also updated the template today to change the last remaining parameter to this format...
https://github.com/nf-core/tools/pull/425/files#diff-4ac32a78649ca5bdd8e0ba38b7006a1eR13

Originally posted by @drpatelh in #37

Error when running workflow

I got this error message in one of the steps:

executor > local (16)
[90/a19a61] process > get_software_versions [100%] 1 of 1 ✔
[65/61c1e5] process > makeChromSize (virilis.SPolished.asm.wengan.fasta) [100%] 1 of 1 ✔
[18/b98020] process > getRestrictionFragments (virilis.SPolished.asm.wengan.fasta [A^AGCTT]) [100%] 1 of 1 ✔
[5d/0c23ab] process > bowtie2_end_to_end (SRR1536175_1) [100%] 2 of 2 ✔
[22/1671da] process > trim_reads (SRR1536175_1) [100%] 2 of 2 ✔
[93/d338c7] process > bowtie2_on_trimmed_reads (SRR1536175_1) [100%] 2 of 2 ✔
[db/1e8ef6] process > merge_mapping_steps (SRR1536175 = SRR1536175_1.bam + SRR1536175_1_trimmed.bam) [100%] 2 of 2 ✔
[d7/7c9b3b] process > combine_mapped_files (input.1 = input.1 + null) [100%] 2 of 2, failed: 2 ✘
[- ] process > get_valid_interaction -
[- ] process > remove_duplicates -
[c4/4fc814] process > merge_sample (mmapstat) [100%] 2 of 2
[- ] process > build_contact_maps -
[- ] process > run_ice -
[- ] process > generate_cool -
[- ] process > multiqc -
[30/204d24] process > output_documentation (1) [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'combine_mapped_files (SRR1536175 = SRR1536175_1 + null)'

Caused by:
Process `combine_mapped_files (SRR1536175 = SRR1536175_1 + null)` terminated with an error exit status (1)

Command executed:

mergeSAM.py -f SRR1536175_1_bwt2merged.bam -r null -o SRR1536175_bwt2pairs.bam -t -q 10

Command exit status:
1

Command output:
(empty)

Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[E::hts_open_format] Failed to open file null
Traceback (most recent call last):
File "/home/mgabriel/.nextflow/assets/nf-core/hic/bin/mergeSAM.py", line 222, in <module>
with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2:
File "pysam/libcalignmentfile.pyx", line 736, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 935, in pysam.libcalignmentfile.AlignmentFile._open
IOError: [Errno 2] could not open alignment file `null`: No such file or directory

Work dir:
/data/mgabriel/wengan/next/work/83/3d5ae4d7e50b043d86dfef8195fcf3

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

My command was

nextflow run nf-core/hic -profile docker --reads /home/mgabriel/Downloads/data/wengan/*_{1,2}.fastq.gz --fasta /home/mgabriel/Downloads/data/wengan/virilis.SPolished.asm.wengan.fasta --bwt2_index /home/mgabriel/Downloads/data/wengan/hic/wengan_vir --igenomesIgnore

Any thoughts?
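
One hedged observation: the --reads glob above is unquoted, so the shell expands it before Nextflow sees the pattern, which can leave the mate channel empty (hence the `null` R2 file in the error). Quoting the pattern, as Nextflow expects, is worth trying:

nextflow run nf-core/hic -profile docker \
    --reads '/home/mgabriel/Downloads/data/wengan/*_{1,2}.fastq.gz' \
    --fasta /home/mgabriel/Downloads/data/wengan/virilis.SPolished.asm.wengan.fasta \
    --bwt2_index /home/mgabriel/Downloads/data/wengan/hic/wengan_vir \
    --igenomesIgnore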

add a parameter for sort -T

Sorting Hi-C data usually requires swapping to a temporary directory.
So far, it is fixed to /tmp/, but in the future it might be good to let the user specify a temporary folder.
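
A minimal sketch of what this could look like (the scratch path is an assumption; the sort keys are those used by the pipeline, see the get_valid_interaction log further down):

sort -T /path/to/scratch -k2,2V -k3,3n -k5,5V -k6,6n \
    -o sample_bwt2pairs.validPairs sample_bwt2pairs.validPairs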

Usage page

I am not sure if you actually need all the descriptions of the different parameters since this is already in the nextflow_schema.json => less redundancy / fewer outdated descriptions. If you want to list different options for the two different pipelines, I would just list them without definitions or defaults (if you want you could even link to them, but that might be tricky with different versions...)

Originally posted by @mashehu in #101 (comment)

conda environment error

Hi, I have an error while trying to run the hic pipeline on the test data with the conda environment.

Looks like a python module is missing.

Thanks for your help

$ ./nextflow  run nf-core/hic -profile test,conda
N E X T F L O W  ~  version 21.04.3
Launching `nf-core/hic` [distracted_swanson] - revision: ac74763a91 [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/hic v1.3.0
------------------------------------------------------

WARN: The `into` operator should be used to connect two or more target channels -- consider to replace it with `.set { fasta_for_index }`
Core Nextflow options
  revision                     : master
  runName                      : distracted_swanson
  container                    : nfcore/hic:1.3.0
  launchDir                    : /env/export/v_bigtmp/eb/test
  workDir                      : /env/export/v_bigtmp/eb/test/work
  projectDir                   : /home/bonnet/.nextflow/assets/nf-core/hic
  userName                     : bonnet
  profile                      : test,conda
  configFiles                  : /home/bonnet/.nextflow/assets/nf-core/hic/nextflow.config

Input/output options
  input_paths                  : [[SRR4292758_00, [https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R1.fastq.gz, https://github.com/nf-core/test-datasets/raw/hic/data/SRR4292758_00_R2.fastq.gz]]]

Reference genome options
  fasta                        : https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa

Digestion Hi-C
  restriction_site             : A^AGCTT
  ligation_site                : AAGCTAGCTT

DNAse Hi-C
  min_cis_dist                 : 0

Alignments
  bwt2_opts_end2end            : --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
  bwt2_opts_trimmed            : --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

Valid Pairs Detection
  max_insert_size              : 600
  min_insert_size              : 100
  max_restriction_fragment_size: 100000
  min_restriction_fragment_size: 100

Contact maps
  bin_size                     : 1000
  ice_filter_high_count_perc   : 0

Downstream Analysis
  res_dist_decay               : 1000
  tads_caller                  : insulation,hicexplorer
  res_tads                     : 1000
  res_compartments             : 1000

Generic options
  max_multiqc_email_size       : 25 MB

Max job request options
  max_cpus                     : 2
  max_memory                   : 4 GB
  max_time                     : 1h

Institutional config options
  config_profile_name          : Hi-C test data from Schalbetter et al. (2017)
  config_profile_description   : Minimal test dataset to check pipeline function

------------------------------------------------------
 Only displaying parameters that differ from defaults.
------------------------------------------------------
executor >  local (29)
[c1/b39a04] process > get_software_versions                                                                                      [100%] 1 of 1 ✔
[9b/c38a4f] process > makeBowtie2Index (W303_SGD_2015_JRIU00000000)                                                              [100%] 1 of 1 ✔
[d5/16c011] process > makeChromSize (W303_SGD_2015_JRIU00000000.fsa)                                                             [100%] 1 of 1 ✔
[a6/f68e54] process > getRestrictionFragments (W303_SGD_2015_JRIU00000000.fsa A^AGCTT)                                           [100%] 1 of 1 ✔
[33/2a727e] process > bowtie2_end_to_end (SRR4292758_00_R1)                                                                      [100%] 2 of 2 ✔
[ea/c7188f] process > trim_reads (SRR4292758_00_R2)                                                                              [100%] 2 of 2 ✔
[3e/3ca322] process > bowtie2_on_trimmed_reads (SRR4292758_00_R2)                                                                [100%] 2 of 2 ✔
[ce/d74ebe] process > bowtie2_merge_mapping_steps (SRR4292758_00_R2 = SRR4292758_00_R2.bam + SRR4292758_00_R2_unmap_trimmed.bam) [100%] 2 of 2 ✔
[bd/fd6221] process > combine_mates (SRR4292758_00 = SRR4292758_00_R1 + SRR4292758_00_R2)                                        [100%] 1 of 1 ✔
[c9/c1e0bc] process > get_valid_interaction (SRR4292758_00)                                                                      [100%] 1 of 1 ✔
[f5/2f1982] process > remove_duplicates (SRR4292758_00)                                                                          [100%] 1 of 1 ✔
[3f/680f09] process > merge_stats (mRSstat)                                                                                      [100%] 4 of 4 ✔
[-        ] process > build_contact_maps                                                                                         -
[-        ] process > run_ice                                                                                                    -
[1a/f1bc79] process > convert_to_pairs (SRR4292758_00)                                                                           [100%] 1 of 1 ✔
[74/2a2b0e] process > cooler_raw (SRR4292758_00 - 1000)                                                                          [100%] 1 of 1 ✔
[7e/a1b205] process > cooler_balance (SRR4292758_00 - 1000)                                                                      [100%] 1 of 1 ✔
[98/37cb6f] process > cooler_zoomify (SRR4292758_00)                                                                             [100%] 1 of 1 ✔
[76/f1f2bd] process > dist_decay (SRR4292758_00)                                                                                 [100%] 1 of 1 ✔
[33/859499] process > compartment_calling (SRR4292758_00 - 1000)                                                                 [100%] 1 of 1, failed: 1 ✘
[80/ec5fcd] process > tads_hicexplorer (SRR4292758_00 - 1000)                                                                    [100%] 1 of 1 ✔
[d5/3959f7] process > tads_insulation (SRR4292758_00 - 1000)                                                                     [100%] 1 of 1, failed: 1 ✘
[a3/21a9d8] process > multiqc                                                                                                    [100%] 1 of 1 ✔
[e5/9db365] process > output_documentation                                                                                       [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/hic] Pipeline completed with errors-
Error executing process > 'tads_insulation (SRR4292758_00 - 1000)'

Caused by:
  Process `tads_insulation (SRR4292758_00 - 1000)` terminated with an error exit status (1)

Command executed:

  cooltools diamond-insulation --window-pixels SRR4292758_00_1000_norm.cool 15 25 50 > SRR4292758_00_insulation.tsv

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/env/export/v_bigtmp/eb/test/work/conda/nf-core-hic-1.3.0-6b228b9d59279339a55b4a54b5df58de/bin/cooltools", line 5, in <module>
      from cooltools.cli import cli
    File "/env/export/v_bigtmp/eb/test/work/conda/nf-core-hic-1.3.0-6b228b9d59279339a55b4a54b5df58de/lib/python3.7/site-packages/cooltools/__init__.py", line 18, in <module>
      from .lib import numutils, download_data, print_available_datasets, get_data_dir, download_file, get_md5sum
    File "/env/export/v_bigtmp/eb/test/work/conda/nf-core-hic-1.3.0-6b228b9d59279339a55b4a54b5df58de/lib/python3.7/site-packages/cooltools/lib/numutils.py", line 8, in <module>
      import numba
  ModuleNotFoundError: No module named 'numba'

Work dir:
  /env/export/v_bigtmp/eb/test/work/d5/3959f75f7d599155c053cca573ae37

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

sample name

Issue with the sample names when we extract the prefix information

sample = prefix.toString() - ~/(_R1|_R2|_val_1|_val_2|_1|_2)/

The pipeline crashes at the MultiQC step because the stats files do not share the same prefix.

To reproduce the issue, use the following sample names:

  • PRD_Pdev_2031_S9_L001_R1_001.fastq.gz
  • PRD_Pdev_2031_S9_L001_R2_001.fastq.gz
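
The behaviour can be reproduced with sed, which, like the Groovy subtraction above, removes only the leftmost match; here the `_2` inside `_2031` is stripped instead of the read-pair token, so the two mates end up with different prefixes:

echo "PRD_Pdev_2031_S9_L001_R1_001" | sed -E 's/(_R1|_R2|_val_1|_val_2|_1|_2)//'
# -> PRD_Pdev031_S9_L001_R1_001   (the "_R1" token survives)
echo "PRD_Pdev_2031_S9_L001_R2_001" | sed -E 's/(_R1|_R2|_val_1|_val_2|_1|_2)//'
# -> PRD_Pdev031_S9_L001_R2_001   (prefixes differ, so the stats files mismatch)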

stepwise mode

Offer the possibility to run the pipeline from intermediate steps, i.e. from aligned data or from valid pairs files.

Set up AWS megatests

AWS megatests is now running nicely and we’re trying to set up all (most) nf-core pipelines to run a big dataset. We need to identify a set of public data to run benchmarks for the pipeline.

The idea is that this will run automatically for every release of the nf-core/hic pipeline. The results will then be publicly accessible from s3 and viewable through the website: https://nf-co.re/hic/results - this means that people can manually compare differences in output between pipeline releases if they wish.

We need a dataset that is as “normal” as possible, mouse or human, sequenced relatively recently and with a bunch of replicates etc. It can be a fairly large project

I'm hoping that @nservant can help here, but suggestions from anyone and everyone are more than welcome! ✋🏻

In practical terms, once decided we need to:

  • Upload the FastQ files to s3: s3://nf-core-awsmegatests/hic/input_data/ (I can help with this)
  • Update test_full.config to work with these file paths
  • Check .github/workflows/awsfulltest.yml (should be no changes required I think?)
  • Merge, and try running the dev branch manually

get_valid_interaction fails with exitcode 137 with --digestion arima

There isn't much context provided with the following error

$ nextflow run main.nf \
  -profile docker \
  --genome GRCh38 \
  --digestion arima \
  --input '/home/ec2-user/*{1,2}.fastq.gz'

...

Error executing process > 'get_valid_interaction (sample)'

Caused by:
  Process `get_valid_interaction (sample)` terminated with an error exit status (137)

Command executed:

  mapped_2hic_fragments.py -f restriction_fragments.bed -r sample_bwt2pairs.bam --all
  sort -k2,2V -k3,3n -k5,5V -k6,6n -o sample_bwt2pairs.validPairs sample_bwt2pairs.validPairs

Command exit status:
  137

Command output:
  (empty)

Command error:
  .command.sh: line 2:
    27 Killed                  mapped_2hic_fragments.py -f restriction_fragments.bed -r sample_bwt2pairs.bam --all

...

$ cat .exitcode
137

$ cat .command.err
/home/ec2-user/hic/work/19/3e1cb12b33b4f95171a7c8141e2566/.command.sh: line 2:
    27 Killed                  mapped_2hic_fragments.py -f restriction_fragments.bed -r sample_bwt2pairs.bam --all

Might you happen to know of any test datasets that use the ARIMA protocol, so that I may create a reproducible case?
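
Exit status 137 corresponds to 128 + 9 (SIGKILL) and usually means the task was killed for exceeding its memory allocation. A hedged workaround sketch (the process name is taken from the log above; the config file name and memory value are assumptions):

cat > more_mem.config <<'EOF'
process {
  // assumption: give the OOM-killed step more memory
  withName: get_valid_interaction {
    memory = '64 GB'
  }
}
EOF

nextflow run main.nf -profile docker -c more_mem.config \
    --genome GRCh38 --digestion arima --input '/home/ec2-user/*{1,2}.fastq.gz'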


Bowtie2 end to end executing process error

Hello,

I am running the nf-core/hic pipeline on breast samples (6 samples in total, including replicates) with 500 million reads each.
I am getting the following error:


executor >  local (26)
[1f/560b97] process > get_software_versions                    [100%] 1 of 1 ✔
[d3/5e1c5f] process > makeChromSize (genome.fa)                [100%] 1 of 1 ✔
[93/6a6ae3] process > getRestrictionFragments (genome.fa [^... [100%] 1 of 1 ✔
[a9/b6cc6d] process > bowtie2_end_to_end (HiChIP_MCF10A-A_S... [  9%] 22 of 234, failed...
[-        ] process > trim_reads                               [  0%] 0 of 5
[-        ] process > bowtie2_on_trimmed_reads                 -
[-        ] process > merge_mapping_steps                      -
[-        ] process > combine_mapped_files                     -
[-        ] process > get_valid_interaction                    -
[-        ] process > remove_duplicates                        -
[-        ] process > merge_sample                             -
[-        ] process > build_contact_maps                       -
[-        ] process > run_ice                                  -
[-        ] process > generate_cool                            -
[-        ] process > multiqc                                  -
[b7/f6bd1d] process > output_documentation (1)                 [100%] 1 of 1 ✔

Error executing process > 'bowtie2_end_to_end (HiChIP_MCF10A-A_S7_R2_001.15)'

Caused by:
  Process exceeded running time limit (5h)

Command executed:

  bowtie2 --rg-id BMG --rg SM:HiChIP_MCF10A-A_S7_R2_001.15 \
  --very-sensitive --end-to-end --reorder \
  -p 4 \
  -x Bowtie2Index/genome \
  --un HiChIP_MCF10A-A_S7_R2_001.15_unmap.fastq \
  	-U HiChIP_MCF10A-A_S7_R2_001.15.fastq | samtools view -F 4 -bS - > HiChIP_MCF10A-A_S7_R2_001.15.bam

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/koushik/hichip_fastq/work/58/23e74799c7ab12315e8a9e3a2dd28b

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

The following is the script I ran


 sudo nextflow run nf-core/hic -r 1.0.0  \
       --reads '/home/koushik/hichip_fastq/HiChIP_MCF10A-{A,B}_S{7,8}_R{1,2}_001.fastq.gz' \
       -profile docker \
       -resume \
       --splitFastq 10000000 \
       --max_memory '100.GB' \
       --max_cpus 60 \
       --outdir "/media/HD/HiChIP_results_Sep_16_2021" \
       --genome GRCh37 \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '1000,5000,20000,40000,150000,500000,1000000' \
       --saveReference 

What can I do to overcome this error?
Is it because of the time limit? Can I change the running time limit in the base.config file?

Thank you!
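
One hedged possibility: per-process limits can be overridden with a custom config passed via -c, instead of editing base.config (the process name is taken from the log above; the config file name and time value are assumptions):

cat > more_time.config <<'EOF'
process {
  // assumption: lift the 5h limit on the end-to-end mapping step
  withName: bowtie2_end_to_end {
    time = '24h'
  }
}
EOF

nextflow run nf-core/hic -r 1.0.0 -profile docker -c more_time.config \
    --reads '/home/koushik/hichip_fastq/HiChIP_MCF10A-{A,B}_S{7,8}_R{1,2}_001.fastq.gz'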

--split_fastq option is not working

if ( params.split_fastq ){
   // note: the guard tests `params.split_fastq`, but the chunk size below
   // reads `params.splitFastq` -- this name mismatch may be why the
   // documented --split_fastq option has no effect
   raw_reads_full = raw_reads.concat( raw_reads_2 )
   raw_reads = raw_reads_full.splitFastq( by: params.splitFastq , file: true)
 }else{
   raw_reads = raw_reads.concat( raw_reads_2 ).dump(tag: "data")
}

--skip options

Add --skip options for optional steps, such as:

matrix generation
normalization
cooler file

DSL2 implementation of nf-core-hic pipeline

Here is a place where we can discuss the different modules for the DSL2 implementation.
Here is the list of modules I have in mind:

Quality Controls

  • fastqc

Hi-C data processing

From FastQ files to a list of interactions (sub-workflows):

  • hicpro
    • bowtie2 step1 mapping
    • reads trimming (hicpro C++ implementation)
    • bowtie2 step2
    • hicpro bam pairing
    • hicpro valid pairs detection
  • hicup
    • hicup trimming
    • bowtie2 mapping
    • hicup valid pairs detection
  • bwa-mem
    • bwa-mem with Hi-C options (-SP5M); see the sketch after this list
    • valid pairs detection (could use the hicpro module there)
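
A hedged sketch of the bwa-mem Hi-C mapping mode mentioned above (paths and thread count are assumptions; -S and -P skip mate rescue and read pairing, -5 reports the leftmost alignment of a split read as primary, -M marks shorter split hits as secondary):

bwa mem -SP5M -t 8 /path/to/index sample_R1.fastq.gz sample_R2.fastq.gz \
    | samtools view -bhS - > sample_hic.bam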

Format conversion

  • pairs file (4D)
  • (m)cool file
  • hic file

Downstream analysis

All from cool files if possible

  • counts~distance decay with hicExplorer
  • Compartment calling (cooltools)
  • insulation score (cooltools)
  • TADs calling (hicExplorer)
  • loops calling (cooltools, fithic)

design file

Add a design file to specify replicates or groups of samples which have to be merged before contact map generation.

installation process

The current 1.0dev version contains compiled C++ code.
Check whether an installation process would be required for this code.

Add Zenodo DOI for release to main README on master

Would be good to add the Zenodo DOI for the release to the main README of the pipeline in order to make it citable. You will have to do this via a branch pushed to the repo in order to directly update master. See PR below for example and file changes:
nf-core/atacseq#38

See https://zenodo.org/record/2669513#.XVZ06OhKhPY

Web-hooks are already set up for this repo so that a unique Zenodo DOI is generated every time a new version of the pipeline is released. Would be good to add this in after every release 👍

makeBowtie2Index finishes with an error code but no error message is shown

Sorry if this is redundant with issue #66, but it felt like a different problem.

I have run the Hi-C pipeline without issue on a maize sample and a maize genome, but I encountered this problem with a genome freshly downloaded from NCBI, and I don't understand it, as there seems to be no error message:

executor >  slurm (5)
[37/acbb6d] process > get_software_versions    [100%] 1 of 1 ✔
[37/118a25] process > makeBowtie2Index         [100%] 1 of 1, failed: 1 ✘
[45/277a25] process > makeChromSize            [100%] 1 of 1 ✔
[57/48d373] process > getRestrictionFragments  [100%] 1 of 1 ✔
[-        ] process > bowtie2_end_to_end       -
[-        ] process > trim_reads               -
[-        ] process > bowtie2_on_trimmed_reads -
[-        ] process > merge_mapping_steps      -
[-        ] process > combine_mapped_files     -
[-        ] process > get_valid_interaction    -
[-        ] process > remove_duplicates        -
[-        ] process > merge_sample             -
[-        ] process > build_contact_maps       -
[-        ] process > run_ice                  -
[-        ] process > generate_cool            -
[-        ] process > multiqc                  -
[a9/d4b5e3] process > output_documentation     [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/hic] Pipeline completed with errors-
Error executing process > 'makeBowtie2Index (GCF_002263795.1_ARS-UCD1.2_genomic.fna)'

Caused by:
  Process `makeBowtie2Index (GCF_002263795.1_ARS-UCD1.2_genomic.fna)` terminated with an error exit status (247)

Command executed:

  mkdir bowtie2_index
  bowtie2-build GCF_002263795.1_ARS-UCD1.2_genomic.fna bowtie2_index/GCF_002263795.1_ARS-UCD1.2_genomic.fna

Command exit status:
  247

Command output:
    Max bucket size, len divisor: 4
    Difference-cover sample period: 1024
    Endianness: little
    Actual local endianness: little
    Sanity checking: disabled
    Assertions: disabled
    Random seed: 0
    Sizeofs: void*:8, int:4, long:8, size_t:8
  Input files DNA, FASTA:
    GCF_002263795.1_ARS-UCD1.2_genomic.fna
  Reading reference sizes
    Time reading reference sizes: 00:00:21
  Calculating joined length
  Writing header
  Reserving space for joined string
  Joining reference sequences
    Time to join reference sequences: 00:00:14
  bmax according to bmaxDivN setting: 678956407
  Using parameters --bmax 509217306 --dcv 1024
    Doing ahead-of-time memory usage test
    Passed!  Constructing with these parameters: --bmax 509217306 --dcv 1024
  Constructing suffix-array element generator
  Building DifferenceCoverSample
    Building sPrime
    Building sPrimeOrder
    V-Sorting samples
    V-Sorting samples time: 00:01:14
    Allocating rank array
    Ranking v-sort output
    Ranking v-sort output time: 00:00:21
    Invoking Larsson-Sadakane on ranks
    Invoking Larsson-Sadakane on ranks time: 00:00:42
    Sanity-checking and returning
  Building samples
  Reserving space for 12 sample suffixes
  Generating random suffixes
  QSorting 12 sample offsets, eliminating duplicates
  QSorting sample offsets, eliminating duplicates time: 00:00:00
  Multikey QSorting 12 samples
    (Using difference cover)
    Multikey QSorting samples time: 00:00:00
  Calculating bucket sizes
  Splitting and merging
    Splitting and merging time: 00:00:00
  Avg bucket size: 2.71583e+09 (target: 509217305)
  Converting suffix-array elements to index image
  Allocating ftab, absorbFtab
  Entering Ebwt loop
  Getting block 1 of 1
    No samples; assembling all-inclusive block

Command error:
  Building a SMALL index

Work dir:
  /work/sbsuser/test/roxane/nextflow/hi-c/MiSeq-Bovin/work/37/118a2510eb4fad5ef4ce5bb5402d27

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

And in the pointed work directory, the .error file is empty. I am not that familiar with bowtie2, and I don't really understand what is wrong here.

Sorry if this is the wrong channel or if the issue is not really related to the pipeline itself... Thanks for your help!

Process get_software_versions crashes

Hi!
I am experiencing difficulties trying to run nf-core/hic on my data, although the test run went completely fine. I would really appreciate your help figuring out what goes wrong here.

I'm attaching the log file:

N E X T F L O W  ~  version 19.10.0
Launching `nf-core/hic` [wise_hypatia] - revision: 481964d91c [master]
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/atacseq v1.1.0
----------------------------------------------------
WARN: Access to undefined parameter `max_restriction_framgnet_size` -- Initialise it to a default value eg. `params.max_restriction_framgnet_size = some_value`
Pipeline Release  : master
Run Name          : wise_hypatia
Reads             : /nfs/data/Hennighausen/p6/merged/final/sample1/*_R{1,2}.fastq.gz
splitFastq        : false
Fasta Ref         : s3://ngi-igenomes/igenomes//Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa
Restriction Motif : ^GATC
Ligation Motif    : ^GATCGATC
DNase Mode        : false
Remove Dup        : true
Min MAPQ          : 30
Min Fragment Size : 100
Max Fragment Size : null
Min Insert Size   : 100
Max Insert Size   : 600
Min CIS dist      : true
Maps resolution   : 1000000,500000
Max Memory        : 10.GB
Max CPUs          : 40
Max Time          : 2.0
Output dir        : ./results
Working dir       : /nfs/data/Hennighausen/work
Container Engine  : null
Current home      : /nfs/home/users/olgala
Current user      : olgala
Current path      : /nfs/data/Hennighausen
Script dir        : /nfs/home/users/olgala/.nextflow/assets/nf-core/hic
Config Profile    : conda
----------------------------------------------------
[-        ] process > get_software_versions    -
[-        ] process > makeChromSize            -
[-        ] process > getRestrictionFragments  -
[-        ] process > bowtie2_end_to_end       -
[-        ] process > trim_reads               -
[-        ] process > bowtie2_on_trimmed_reads -
[-        ] process > merge_mapping_steps      -
[-        ] process > combine_mapped_files     -
[-        ] process > get_valid_interaction    -
[-        ] process > remove_duplicates        -
[-        ] process > merge_sample             -
[-        ] process > build_contact_maps       -
[-        ] process > run_ice                  -
[-        ] process > generate_cool            -
[-        ] process > multiqc                  -
[-        ] process > output_documentation     -
Creating Conda env: /nfs/home/users/olgala/.nextflow/assets/nf-core/hic/environment.yml [cache /nfs/data/Hennighausen/work/conda/nf-core-hic-1.1.0-b0d9faeab5a09c5a9485e611e955124d]

Staging foreign file: s3://ngi-igenomes/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa
Staging foreign file: s3://ngi-igenomes/igenomes/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index


executor >  local (1)
[af/5f9e5c] process > get_software_versions    [  0%] 0 of 1
[-        ] process > makeChromSize            -
[-        ] process > getRestrictionFragments  -
[-        ] process > bowtie2_end_to_end       -
[-        ] process > trim_reads               -
[-        ] process > bowtie2_on_trimmed_reads -
[-        ] process > merge_mapping_steps      -
[-        ] process > combine_mapped_files     -
[-        ] process > get_valid_interaction    -
[-        ] process > remove_duplicates        -
[-        ] process > merge_sample             -
[-        ] process > build_contact_maps       -
[-        ] process > run_ice                  -
[-        ] process > generate_cool            -
[-        ] process > multiqc                  -
[-        ] process > output_documentation     -



executor >  local (6)
[af/5f9e5c] process > get_software_versions          [  0%] 0 of 1
[2c/5c2ead] process > makeChromSize (genome.fa)      [  0%] 0 of 1
[32/3d0dff] process > getRestrictionFragments (ge... [  0%] 0 of 1
[35/16b000] process > bowtie2_end_to_end (paired_R1) [  0%] 0 of 2
[-        ] process > trim_reads                     -
[-        ] process > bowtie2_on_trimmed_reads       -
[-        ] process > merge_mapping_steps            -
[-        ] process > combine_mapped_files           -
[-        ] process > get_valid_interaction          -
[-        ] process > remove_duplicates              -
[-        ] process > merge_sample                   -
[-        ] process > build_contact_maps             -
[-        ] process > run_ice                        -
[-        ] process > generate_cool                  -
[-        ] process > multiqc                        -
[88/d216aa] process > output_documentation (1)       [  0%] 0 of 1
Error executing process > 'get_software_versions'

Caused by:
  Process `get_software_versions` terminated with an error exit status (1)

Command executed:

  echo 1.1.0 > v_pipeline.txt
  echo 19.10.0 > v_nextflow.txt
  bowtie2 --version > v_bowtie2.txt
  python --version > v_python.txt 2>&1
  samtools --version > v_samtools.txt
  multiqc --version > v_multiqc.txt
  scrape_software_versions.py &> software_versions_mqc.yaml

Command exit status:
  1

Command output:
  (empty)

Command wrapper:
  .command.run: line 260: activate: No such file or directory

Work dir:
  /nfs/data/Hennighausen/work/af/5f9e5ca56bf657ef850571d4f19ce5

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

executor >  local (6)
[af/5f9e5c] process > get_software_versions          [100%] 1 of 1, failed: 1 ✘
[2c/5c2ead] process > makeChromSize (genome.fa)      [100%] 1 of 1, failed: 1 ✘
[32/3d0dff] process > getRestrictionFragments (ge... [100%] 1 of 1, failed: 1 ✘
[35/16b000] process > bowtie2_end_to_end (paired_R1) [100%] 2 of 2, failed: 2 ✘
[-        ] process > trim_reads                     -
[-        ] process > bowtie2_on_trimmed_reads       -
[-        ] process > merge_mapping_steps            -
[-        ] process > combine_mapped_files           -
[-        ] process > get_valid_interaction          -
[-        ] process > remove_duplicates              -
[-        ] process > merge_sample                   -
[-        ] process > build_contact_maps             -
[-        ] process > run_ice                        -
[-        ] process > generate_cool                  -
[-        ] process > multiqc                        -
[88/d216aa] process > output_documentation (1)       [100%] 1 of 1, failed: 1 ✘
Execution cancelled -- Finishing pending tasks before exit

TAD callers and compartments with more than one resolution

Hi,
I have launched the pipeline with 6 resolutions:

--res_compartments '800000,400000,200000,100000,50000' \
--tads_caller 'insulation,hicexplorer' \
--res_tads '200000,100000,50000,25000,10000,5000'

And the pipeline has launched 6 jobs.

[be/c8092b] process > tads_hicexplorer (Bovin-365... [100%] 6 of 6 ✔
[22/7ac14e] process > tads_insulation (Bovin-3654... [100%] 6 of 6 ✔

However I have only one output:

ls ./tads/hicexplorer/
tad_boundaries.bed  tad_boundaries.gff  tad_domains.bed  tad_score.bedgraph

The same for insulation and compartement calling.

ls ./tads/insulation/
Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_insulation.tsv

ls ./compartments
Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_compartments.cis.E1.bedgraph
Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_compartments.cis.lam.txt
Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_compartments.cis.vecs.tsv

It seems that they are overwritten by each job because they all have the same prefix.
Here is the command run by the pipeline for hicexplorer:

hicFindTADs --matrix Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_5000_norm.cool \
    --outPrefix tad \
    --correctForMultipleTesting fdr \
    --numberOfProcessors 4

For more clarity, I would recommend keeping the full prefix of the input (I was wondering whether it used the normalized matrices or not).

hicFindTADs --matrix Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_5000_norm.cool \
    --outPrefix Bovin-3654_CCGCGGTT-CTAGCGCT-AHT2HCDSX2_L004_5000_norm \
    --correctForMultipleTesting fdr \
    --numberOfProcessors 4

A/B compartments

Tools available for A/B compartment calling:

FANC

fanc DATA.cool output.ab -d compartments.fanc

HiTC

call_compartments.r --matrix ${mat} --bed ${bed}                                                                                                                                                           

cooltools

cooltools genome gc BIN_PATH FASTA_PATH
cooltools call-compartments results/contact_maps/cool/SRR400264_01_1000000.cool -o cooltools_AB --bigwig --reference-track GC_FILE

HiCExplorer

hicPCA --matrix DATA.h5 --outputFileName hicexplorer_ABcompartments

Preserve the QC PDFs from HiC-Pro

HiC-Pro produces five PDFs of QC metrics; together they account for 100% of the reads, so users know exactly where each read goes, e.g. pairs with singletons. See my attached examples.

[attached: HiC-Pro QC plots]

The Nextflow pipeline has a MultiQC output, but the total reads do not add up, so it is confusing where the reads get filtered out. See my attached examples.

[attached: MultiQC read totals]

In the HiC-Pro portion, simply preserving the five PDFs would help users like me a lot.
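
If it helps, HiC-Pro can regenerate these plots on its own through its stepwise mode; a hedged sketch (the input/output layout and config file name are assumptions):

HiC-Pro -i /path/to/hicpro_results/bowtie_results -o hicpro_qc \
    -c config-hicpro.txt -s quality_checks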

TADs calling

Add a TAD calling step

cooltools:

cooltools diamond-insulation ...

HiCExplorer

hicFindTADs --matrix ${h5mat} \
    --outPrefix tad \
    --correctForMultipleTesting fdr \
    --numberOfProcessors ${task.cpus}

FANC

fanc insulation ...

Fix TEMPLATE branch

Hi there!

It looks like this pipeline has a TEMPLATE branch (great!), however it has no shared history, which means that the automated template synchronisation does not work.

In order to fix this for future pipeline syncs, you need to do one fully manual merge. The method for this is documented here: https://nf-co.re/developers/sync#merge-template-into-main-branches
The key is basically to supply --allow-unrelated-histories when running the merge.

You'll need to pull the template changes to your fork first, see these docs.

Once you've done this once, there will be shared history, and future automated sync PRs should work. Let me know if you need any help.

Phil

x-ref nf-core/tools#548 (comment)

Typo in the doc for the genome parameter

In usage.md, this:
nextflow run nf-core/hic --reads '*_R{1,2}.fastq.gz' -genome GRCh37 -profile docker

should be replaced by (--genome instead of -genome):
nextflow run nf-core/hic --reads '*_R{1,2}.fastq.gz' --genome GRCh37 -profile docker

Problem submitting hic jobs using aws batch submit-job

Hi there,

I have a job that runs successfully when I launch nf-core/hic on an EC2 instance with:

nextflow run nf-core/hic -profile docker,awsbatch ...

But when I run the same job as:

aws batch submit-job ...

it can't seem to find the bwt2_index I specify in S3.

full command is:

$ aws batch submit-job
    --job-name hic-test
    --job-queue highpriority-XXX
    --job-definition nextflow
    --container-overrides '{"command": ["nf-core/hic", "-profile", "awsbatch,docker", "--awsregion", "us-west-2", "--awsqueue", "default-XXX", "--outdir", "s3://s3-omics-gears-batch-dev-results/hic-batch/", "-work-dir", "s3://s3-omics-gears-batch-dev-results/hic-batch/work", "--reads", "s3://omics-reference-dev/hic/test_data/dixon_2M/*_R{1,2}.fastq.gz", "--fasta", "s3://omics-reference-dev/references/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa", "--bwt2_index", "s3://omics-reference-dev/references/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome"]}'

(note: aws batch in my case is on an aws-genomics-workflow stack instance)

error msg is:

May-14 22:25:16.992 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: USER for class: Script_6f34e000
May-14 22:25:17.023 [main] DEBUG nextflow.Session - The following nodes are still active:
  [operator] separate
  [operator] concat
  [operator] ifEmpty
  [operator] into

May-14 22:25:17.036 [Actor Thread 5] DEBUG nextflow.Nextflow - Ignore exit because execution is already aborted -- message=Genome index: Provided index not found: s3://omics-reference-dev/references/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome

Now I know that the path is correct, since it works when I launch via nextflow run ... Also, I've checked it about 7000 times.

wondering if you've ever seen this behaviour before?

thanks

Arima HiC ligation motif

Hi, I have a short question concerning the Arima kit:
according to https://nf-co.re/hic/docs/usage the following is stated:
--restriction_site
ARIMA kit: ^GATC,^GANT

--ligation_site '[Ligation motif]'
Exemple of the ARIMA kit: GATCGATC,GATCGANT,GANTGATC,GANTGANT

However, on the Arima webpage they say:
"The Arima-HiC chemistry uses restriction enzymes that digest chromatin at ^GATC and G^ANTC, where N can be any of the 4 genomic bases. Our multiple restriction enzyme chemistry produces the following possible ligation junction motifs: GATCGATC, GANTGATC, GANTANTC, GATCANTC."

I am wondering where the difference comes from. I'd appreciate if you could help me with that!

Best,
Katrin
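
For reference, the motifs matching the Arima description quoted above (and used in the bovine run reported further down in these issues) would be:

nextflow run nf-core/hic -profile docker \
    --input 'reads_{1,2}.fastq.gz' \
    --restriction_site '^GATC,G^ANTC' \
    --ligation_site 'GATCGATC,GANTGATC,GANTANTC,GATCANTC'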

Error executing process > combine_mapped_files

Hello,
I got this error message:

executor >  slurm (376)
[82/2114a9] process > get_software_versions          [100%] 1 of 1 ✔
[5d/c7b324] process > makeBowtie2Index (GCF_00325... [100%] 1 of 1 ✔
[a8/2d5744] process > getRestrictionFragments (GC... [100%] 1 of 1 ✔
[a3/aa9969] process > bowtie2_end_to_end (DRR1980... [100%] 74 of 74 ✔
[0a/e3b0e2] process > trim_reads (DRR198069_2.34)    [100%] 74 of 74 ✔
[f5/6afae9] process > bowtie2_on_trimmed_reads (D... [100%] 74 of 74 ✔
[23/953a16] process > merge_mapping_steps (DRR198... [100%] 74 of 74 ✔
[2f/0c7453] process > combine_mapped_files (DRR19... [100%] 74 of 74, failed:...
[-        ] process > get_valid_interaction          -
[-        ] process > remove_duplicates              -
[7d/d7d092] process > merge_sample (mmapstat)        [100%] 2 of 2
[-        ] process > build_contact_maps             -
[-        ] process > run_ice                        -
[-        ] process > generate_cool                  -
[-        ] process > multiqc                        -
[04/53aece] process > output_documentation           [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/hic] Pipeline completed with errors-
Error executing process > 'combine_mapped_files (DRR198069_1.2 = DRR198069_1.2 + null)'

Caused by:
  Process `combine_mapped_files (DRR198069_1.2 = DRR198069_1.2 + null)` terminated with an error exit status (1)

Command executed:

  mergeSAM.py -f DRR198069_1.2_bwt2merged.bam -r null -o DRR198069_1.2_bwt2pairs.bam -t -q 10

Command exit status:
  1

Command output:
  (empty)

Command error:
  [E::hts_open_format] Failed to open file null
  Traceback (most recent call last):
    File "/home/nmary/.nextflow/assets/nf-core/hic/bin/mergeSAM.py", line 218, in <module>
      with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2:
    File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
    File "pysam/libcalignmentfile.pyx", line 940, in pysam.libcalignmentfile.AlignmentFile._open
  FileNotFoundError: [Errno 2] could not open alignment file `null`: No such file or directory

Work dir:
  /work/nmary/Abeille/Nfcore_hic/work/2e/b101bb042ff0bcf393936214cd72e2

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

My command is

module load bioinfo/nfcore-Nextflow-v20.10.0
nextflow run nf-core/hic \
-r 1.2.2 \
-profile genotoul \
-resume \
--input '/work/nmary/Abeille/*DRR198069_{1,2}.fastq' \
--fasta '/work/nmary/Abeille/Assemblage/GCF_003254395.2_Amel_HAv3.1_genomic.fna' \
--chromosome_size '/work/nmary/Abeille/Assemblage/HAv3_1_Chromosomes.list' \
--restriction_site '^GATC' \
--ligation_site 'GATC' \
--bins_size '10000' \
--split_fastq --fastq_chunks_size '5000000' \
--save_interaction_bam

Any idea how to solve this error?

HIC: Convert param documentation to JSON schema documentation

Hi!

this is not necessarily an issue with the pipeline, but in order to streamline the documentation group next week for the hackathon, I'm opening issues in all repositories / pipeline repos that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

Typo

It says max_restriction_fraMGnet_size at lines 50 and 250 of the pipeline. In the config it is ok.

`--bin_size`

I also found another issue when we want only one bin size: --bin_size '10000'

N E X T F L O W  ~  version 20.10.0
Launching `nf-core/hic` [gloomy_volta] - revision: 52e5f048a6 [1.2.2]
WARN: It appears you have never run this project before -- Option `-resume` is ignored
WARN: Access to undefined parameter `genomes` -- Initialise it to a default value eg. `params.genomes = some_value`
Unknown method invocation `tokenize` on Integer type

 -- Check script '/home/nmary/.nextflow/assets/nf-core/hic/main.nf' at line: 230 or see '.nextflow.log' file for more details
Unexpected error [ClosedByInterruptException]

It works with --bin_size '10000,25000'

I did not have this issue previously because I used --bins_size '10000' with an "s", so it took the default values ('1000000,500000').
You might update the manual. Moreover, you use both --bin_size and --bins_size in https://nf-co.re/hic/usage

Originally posted by @Nico-FR in #83 (comment)

bowtie2_end_to_end failed on 1.3.0 but worked on 1.2.2

Hello,
I am trying to run nf-core/hic on cow data. It worked on 1.2.2 with this command line:

nextflow run nf-core/hic \
-r 1.2.2 \
-profile genotoul \
-name v1.2.2 \
--input '/work/nmary/Bovin/Nf-core/977*_R{1,2}.fastq.gz' \
--fasta '/bank/bowtie2db/ensembl_bos_taurus_genome' \
--bwt2_index '/bank/bowtie2db/ensembl_bos_taurus_genome' \
--restriction_site '^GATC,G^ANTC' \
--ligation_site 'GATCGATC,GANTGATC,GANTANTC,GATCANTC' \
--bin_size '200000,10000' \
--min_insert_size 20 \
--max_insert_size 1000 \
--split_fastq --fastq_chunks_size '10000000'

But not on the 1.3.0 with the same command:

nextflow run nf-core/hic \
-r 1.3.0 \
-profile genotoul \
-name test11 \
--input '/work/nmary/Bovin/Nf-core/977*_R{1,2}.fastq.gz' \
--fasta '/bank/bowtie2db/ensembl_bos_taurus_genome' \
--bwt2_index '/bank/bowtie2db/ensembl_bos_taurus_genome' \
--restriction_site '^GATC,G^ANTC' \
--ligation_site 'GATCGATC,GANTGATC,GANTANTC,GATCANTC' \
--bin_size '200000,10000' \
--min_insert_size 20 \
--max_insert_size 1000 \
--split_fastq --fastq_chunks_size '10000000' \
--res_dist_decay '200000,50000,10000,5000' \
--res_compartments '200000,50000,10000' \
--tads_caller 'hicexplorer' \
--res_tads '200000,100000,50000,25000,10000,5000'
[43/2a6685] process > bowtie2_end_to_end (977_TTA... [100%] 1 of 1, failed: 1

Execution cancelled -- Finishing pending tasks before exit
- Ignore this warning: params.schema_ignore_params = "igenomesIgnore" 
- Ignore this warning: params.schema_ignore_params = "igenomesIgnore" 
WARN: Found unexpected parameters:
* --igenomesIgnore: true
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
WARN: Found unexpected parameters:
* --igenomesIgnore: true
Error executing process > 'bowtie2_end_to_end (977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1)'

Caused by:
  Process `bowtie2_end_to_end (977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1)` terminated with an error exit status (255)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'`
    bowtie2 --rg-id BMG --rg SM:977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1 \
  --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder \
  -p 4 \
  -x ${INDEX} \
  --un 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1_unmap.fastq \
        -U 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1.fastq.gz | samtools view -F 4 -bS - > 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1.bam

Command exit status:
  255

Command output:
  (empty)

Command error:
  (ERR): "--passthrough" does not exist or is not a Bowtie 2 index
  Exiting now ...

So I tried to build the index in the current directory with bowtie2-build ./Bos_taurus.ARS-UCD1.2.dna_sm.toplevel.fa ./Bos_taurus.ARS-UCD1.2.dna_sm.toplevel.fa, with the same issue:

[e1/a532aa] process > bowtie2_end_to_end (977_TTA... [100%] 1 of 1, failed: 1

Execution cancelled -- Finishing pending tasks before exit
- Ignore this warning: params.schema_ignore_params = "igenomesIgnore" 
- Ignore this warning: params.schema_ignore_params = "igenomesIgnore" 
WARN: Found unexpected parameters:
* --igenomesIgnore: true
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
WARN: Found unexpected parameters:
* --igenomesIgnore: true
Error executing process > 'bowtie2_end_to_end (977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1)'

Caused by:
  Process `bowtie2_end_to_end (977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1)` terminated with an error exit status (255)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'`
    bowtie2 --rg-id BMG --rg SM:977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1 \
  --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder \
  -p 4 \
  -x ${INDEX} \
  --un 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1_unmap.fastq \
        -U 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1.fastq.gz | samtools view -F 4 -bS - > 977_TTATAACC-TCGATATC-BHV5JHDSXY_L003_R1.1.bam

Command exit status:
  255

Command output:
  (empty)

Command error:
  (ERR): "--passthrough" does not exist or is not a Bowtie 2 index
  Exiting now ...

reported bugs

I am running the nf-core/hic v1.1.0 pipeline from Nextflow v20.01.0 and I have a few comments and questions:

  1. If a single number is provided as bin_size (eg: --bin_size '1000000') an error is obtained:
    Unknown method tokenize on Integer type
    -- Check script '/home/sfoissac/.nextflow/assets/nf-core/hic/main.nf' at line: 226 or see '.nextflow.log' file for more details
    This does not happen when two numbers are provided (eg: --bin_size '1000000,500000')

  2. There is a typo in the main.nf about "max_restriction_framgnet_size" (with "fragmants" in the description). I get a warning:
    WARN nextflow.script.ScriptBinding - Access to undefined parameter max_restriction_framgnet_size -- Initialise it to a default value eg. params.max_restriction_framgnet_size = some_value

  3. I think the information about the Arima Hi-C kit is wrong on the documentation page
    https://github.com/nf-core/hic/blob/master/docs/usage.md
    Instead of "ARIMA kit: ^GATC,^GANT", I believe it should be "ARIMA kit: ^GATC,G^ANTC",
    and
    "Example of the ARIMA kit: GATCGATC,GATCGANT,GANTGATC,GANTGANT" should probably be
    "Example of the ARIMA kit: GATCGATC,GANTGATC,GANTANTC,GATCANTC".

This is what I understood from https://arimagenomics.com/public/pdf/ArimaGenomics_Genome-Assembly_Datasheet_01-2019.pdf
and https://www.bioinformatics.babraham.ac.uk/projects/hicup/read_the_docs/html/index.html
--arima | Set the --re1 option to that used by the Arima protocol: ^GATC,DpnII:G^ANTC,Arima
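
Regarding point 1, here is a minimal sketch of the kind of fix that avoids the tokenize error; this is illustrative Groovy, not the pipeline's actual code. Coercing the parameter to a string first makes a single numeric value behave like a one-element list:

    // --bin_size '1000000' reaches the script as an Integer, which has no
    // tokenize() method; converting to a String first avoids the crash
    def binSizes = params.bin_size.toString().tokenize(',')
    // '1000000'        -> ['1000000']
    // '1000000,500000' -> ['1000000', '500000']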

  1. I get a strange error while running a small dataset of ~2M reads:
Jun-10 12:28:32.563 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'combine_mapped_files (MB_HiC_liver_1_11_S2_all_R2_001 = MB_HiC_liver_1_11_S2_all_R2_001 + null)'
Caused by:
  Process `combine_mapped_files (MB_HiC_liver_1_11_S2_all_R2_001 = MB_HiC_liver_1_11_S2_all_R2_001 + null)` terminated with an error exit status (1)
Command executed:
  mergeSAM.py -f MB_HiC_liver_1_11_S2_all_R2_001_bwt2merged.bam -r null -o MB_HiC_liver_1_11_S2_all_R2_001_bwt2pairs.bam -t -q 10
Command exit status:
  1
Command output:
  (empty)
Command error:
  [E::hts_open_format] Failed to open file null
  Traceback (most recent call last):
    File "/home/sfoissac/.nextflow/assets/nf-core/hic/bin/mergeSAM.py", line 222, in <module>
      with  pysam.Samfile(R1file, "rb") as hr1,  pysam.Samfile(R2file, "rb") as hr2:
    File "pysam/libcalignmentfile.pyx", line 736, in pysam.libcalignmentfile.AlignmentFile.__cinit__
    File "pysam/libcalignmentfile.pyx", line 935, in pysam.libcalignmentfile.AlignmentFile._open
  IOError: [Errno 2] could not open alignment file `null`: No such file or directory

I am guessing that something went wrong during one of the previous mapping steps?
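
The -r null in the failed command suggests the R2 alignments never reached this step, so mergeSAM.py was handed only one mate. For comparison, a successful invocation should name both mate BAMs; the paths below are hypothetical:

    mergeSAM.py -f sample_R1_bwt2merged.bam -r sample_R2_bwt2merged.bam \
        -o sample_bwt2pairs.bam -t -q 10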

The only results files I have are the mmapstat files:

cat hic_results/stats/MB_HiC_liver_1_11_S2_all_R1_001/mstats/MB_HiC_liver_1_11_S2_all_R1_001/MB_HiC_liver_1_11_S2_all_R1_001.mmapstat
total_R2 2294915
mapped_R2 2081255
global_R2 1967565
local_R2 113690
cat hic_results/stats/MB_HiC_liver_1_11_S2_all_R2_001/mstats/MB_HiC_liver_1_11_S2_all_R2_001/MB_HiC_liver_1_11_S2_all_R2_001.mmapstat
total_R2 2294915
mapped_R2 2050220
global_R2 1938636
local_R2 111584

Any idea about what I missed? I am attaching the log file.

Issues with bowtie2 index generation

Hello,

Thanks for the nice pipeline and great documentation!

I am trying to run nf-core/hic with a custom genome, and bowtie2 seems to be using the wrong path when looking for the index.
Here is the command line I used:

nextflow run nf-core/hic \
    -r 1.1.0 \
    -profile docker \
    --reads 'my_reads.end{1,2}.fq.gz' \
    --fasta my_genome.fa \
    --restriction_site "^GATC"

The pipeline finishes makeBowtie2Index, but crashes during bowtie2_end_to_end. The error indicates:
(ERR): "bowtie2_index/my_genome.fa" does not exist or is not a Bowtie 2 index

When looking at the content of the bowtie2_index directory, I see files named my_genome.1.bt2, ...
This suggests the .fa extension is removed when building the index, but not when calling bowtie2.
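
To make the mismatch concrete (the listing below is assumed, following the report above): the index files carry the fasta basename without its extension, while the aligner is called with the extension still attached.

    ls bowtie2_index/
    # my_genome.1.bt2  my_genome.2.bt2  ...  my_genome.rev.1.bt2  my_genome.rev.2.bt2

    bowtie2 -x bowtie2_index/my_genome -U reads.fq.gz     # basename without .fa: found
    bowtie2 -x bowtie2_index/my_genome.fa -U reads.fq.gz  # what the pipeline does: not found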

I also tried specifying a pre-built index:

nextflow run nf-core/hic \
    -r 1.1.0 \
    -profile docker \
    --reads 'my_reads.end{1,2}.fq.gz' \
    --fasta my_genome.fa \
    --bwt2_index my_index \
    --restriction_site "^GATC"

But the pipeline crashes instantly with error: Missing `fromPath` parameter
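
That message is what Nextflow typically raises when Channel.fromPath() is handed a null value, which would mean the --bwt2_index value never made it into the channel factory. A standalone sketch of that failure mode, assuming this is indeed the call involved (it is not taken from the pipeline code):

    // minimal Nextflow script reproducing the error
    params.index = null
    Channel.fromPath( params.index )   // throws: Missing `fromPath` parameter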

Did I miss something? Any help would be greatly appreciated.

Input File Naming Convention

There is some sort of bug (I don't know if I'd call it that) in how input file names are parsed. Basically, if a file name contains a number standing on its own (for example, our data was named like xxx_HiC_2_S43_L007_R1.fastq.gz), there is a problem parsing the name during merging and subsequent steps.

I did not notice this with HiC-Pro, only in nf-core/hic. A possible workaround is sketched below.
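
A hedged illustration of that workaround; the renaming scheme is an assumption, the point being only to remove the bare numeric field from the sample name:

    # hypothetical rename removing the standalone "2" field
    mv xxx_HiC_2_S43_L007_R1.fastq.gz xxx_HiC2_S43_L007_R1.fastq.gz
    mv xxx_HiC_2_S43_L007_R2.fastq.gz xxx_HiC2_S43_L007_R2.fastq.gz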

Remove duplicates execution process error

Hello,

I am running the nf-core/hic pipeline on breast samples (6 samples in total, including replicates) with 500 million reads each.
I am getting the following error:

executor >  local (4)
[1f/560b97] process > get_software_versions                                         [100%] 1 of 1, cached: 1 ✔
[e6/dd369f] process > makeChromSize (genome.fa)                                     [100%] 1 of 1, cached: 1 ✔
[3e/5eccce] process > getRestrictionFragments (genome.fa [^GATC])                   [100%] 1 of 1, cached: 1 ✔
[5c/13bee9] process > bowtie2_end_to_end (HiChIP_MCF10A-B_S8_R2_001.35)             [100%] 234 of 234, cache...
[20/529041] process > trim_reads (HiChIP_MCF10A-B_S8_R2_001.35)                     [100%] 234 of 234, cache...
[c0/19592a] process > bowtie2_on_trimmed_reads (HiChIP_MCF10A-B_S8_R2_001.27)       [100%] 234 of 234, cache...
[37/3b2c66] process > merge_mapping_steps (HiChIP_MCF10A-B_S8_001.34 = HiChIP_MC... [100%] 234 of 234, cache...
[63/d5a7a2] process > combine_mapped_files (HiChIP_MCF10A-B_S8_001.26 = HiChIP_M... [100%] 117 of 117, cache...
[5e/368d6e] process > get_valid_interaction (HiChIP_MCF10A-B_S8_001)                [100%] 117 of 117, cache...
[84/34a0f2] process > remove_duplicates (HiChIP_MCF10A-A_S7_001)                    [100%] 3 of 3, failed: 3...
[a9/485599] process > merge_sample (mRSstat)                                        [100%] 8 of 8, cached: 8 ✔
[-        ] process > build_contact_maps                                            -
[-        ] process > run_ice                                                       -
[-        ] process > generate_cool                                                 -
[-        ] process > multiqc                                                       -
[b7/f6bd1d] process > output_documentation (1)                                      [100%] 1 of 1, cached: 1 ✔

Error executing process > 'remove_duplicates (HiChIP_MCF10A-B_S8_001)'

Caused by:
  Process `remove_duplicates (HiChIP_MCF10A-B_S8_001)` terminated with an error exit status (137)

Command exit status:
  137

Command output:
  (empty)

Command error:
  .command.sh: line 5:    31 Killed                  sort -T /tmp/ -S 50% -k2,2V -k3,3n -k5,5V -k6,6n -m HiChIP_MCF10A-B_S8_001.3_bwt2pairs.validPairs HiChIP_MCF10A-B_S8_001.21_bwt2pairs.validPairs 
          32 Done                    | awk -F"\t" 'BEGIN{c1=0;c2=0;s1=0;s2=0}(c1!=$2 || c2!=$5 || s1!=$3 || s2!=$6){print;c1=$2;c2=$5;s1=$3;s2=$6}' > HiChIP_MCF10A-B_S8_001.allValidPairs

Work dir:
  /mnt/hichip_fastq/work/fe/551ceae621f99884fd3f0b8c061957

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
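
Exit status 137 is 128 + 9, i.e. the sort was killed with SIGKILL, which together with sort -S 50% usually points at the machine or container running out of memory rather than at a pipeline bug. One possible mitigation is a custom config raising the limits for this process; the selector matches the process name in the log above, but the values are illustrative:

    // custom.config -- pass it to the run with `-c custom.config`
    process {
        withName: 'remove_duplicates' {
            memory = '64.GB'
            time = '24.h'
        }
    }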

The following is the script I ran:

sudo nextflow run nf-core/hic -r 1.0.0  \
       --reads '/mnt/hichip_fastq/MCF10A_2021/HiChIP_MCF10A-{A,B}_S{7,8}_R{1,2}_001.fastq.gz' \
       -profile docker \
       -resume \
       --splitFastq 10000000 \
       --max_memory '80.GB' \
       --max_time '10.h' \
       --max_cpus 20 \
       --outdir "/mnt/hicpro_results/MCF10A_2021" \
       --genome GRCh37 \
       --bwt2_opts_end2end '--very-sensitive --end-to-end --reorder' \
       --bwt2_opts_trimmed '--very-sensitive --end-to-end --reorder' \
       --ligation_site 'GATCGATC' \
       --restriction_site '^GATC' \
       --min_cis_dist 1000 \
       --min_mapq 20 \
       --bin_size '5000,20000,40000,150000,500000,1000000' \
       --saveReference 

I tried running version 1.3.0, but the pipeline was not able to complete the bowtie2 end-to-end process, so I am running version 1.0.0 instead.

Any thoughts?

read name extension _1/_2

The pipeline crashed at the 'combine_mapped_files' step when input files have _1/_2 mate suffixes.
Note that replacing the _1/_2 suffix with _R1/_R2 fixes the issue; a sketch follows.
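
A sketch of that rename, assuming gzip-compressed fastq files whose mate suffix is a plain _1/_2:

    # rename mates from _1/_2 to _R1/_R2
    for f in *_1.fastq.gz; do mv "$f" "${f%_1.fastq.gz}_R1.fastq.gz"; done
    for f in *_2.fastq.gz; do mv "$f" "${f%_2.fastq.gz}_R2.fastq.gz"; done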

Add Zenodo DOI

Would be great to add a DOI badge like this one for hic:

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2669513.svg)](https://doi.org/10.5281/zenodo.2669513)

input sample sheets

Add input sample sheets allowing samples from multiple lanes to be merged, etc.
See the ChIP-seq pipeline as an example; a possible format is sketched below.
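
A hedged sketch of what such a samplesheet could look like, with the sample name repeated once per lane so that lanes are merged; column names and file paths are assumptions:

    sample,fastq_1,fastq_2
    SAMPLE1,SAMPLE1_L001_R1.fastq.gz,SAMPLE1_L001_R2.fastq.gz
    SAMPLE1,SAMPLE1_L002_R1.fastq.gz,SAMPLE1_L002_R2.fastq.gz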
