The phaseimpute from nf-core

Include stitch imputation tool

Description of feature

Add STITCH software as one of the imputation modes.

Description of feature

Add QUILT software as one of the imputation modes.

Description of the bug

Most of the small tests we use have phased = true. A test should be added that evaluates the output of the VCF_PHASE_PANEL subworkflow. Specifically this part:

    if (params.phased == false) {
        VCF_PHASE_SHAPEIT5(
            ch_vcf.map { meta, vcf, csi -> [meta, vcf, csi, [], meta.region] },
            Channel.of([[],[],[]]).collect(),
            Channel.of([[],[],[]]).collect(),
            Channel.of([[],[]]).collect()
        )
        ch_versions = ch_versions.mix(VCF_PHASE_SHAPEIT5.out.versions)
        ch_panel_phased = VCF_PHASE_SHAPEIT5.out.variants_phased
            .combine(VCF_PHASE_SHAPEIT5.out.variants_index, by: 0)
    } else {
        ch_panel_phased = ch_vcf
    }

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add required arguments as external params to start the pipeline

Description of feature

Currently, the imputation modes can only run with the channels produced by --step panelprep. To start the pipeline directly from a later step (such as --step impute), some files are required depending on the tool used (such as chunks, posfile, bams and panel). The bams and panel can already be supplied as CSV files with --input and --panel. We should add the other files as external params.

Tasks

Add --chunks param for --step impute #59
simplify posfile generation and processing #57 (enhancement)
Handle --posfile param for --step impute (all tools)
design tests for each --step and --param
Check mandatory params for each step #60 (bug)

Renaming chromosome automatically

Description of feature

Generate the chromosome renaming list automatically.
Use the contig names in the fai to add or remove the "chr" prefix.
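
A minimal sketch of the idea above, in Python for illustration only (the pipeline itself is Nextflow). It assumes the standard .fai layout, where the first tab-separated column is the contig name. The function names and the example lines are hypothetical. The "old new" output format is what a tool such as bcftools annotate --rename-chrs expects, but which tool consumes the list is an assumption here.

```python
from typing import Dict, Iterable, List


def read_fai_contigs(fai_lines: Iterable[str]) -> List[str]:
    """Return contig names: the first tab-separated column of each .fai line."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def build_rename_map(contigs: Iterable[str], add_prefix: bool) -> Dict[str, str]:
    """Map each contig name to its 'chr'-prefixed or un-prefixed form."""
    mapping = {}
    for name in contigs:
        if add_prefix:
            new_name = name if name.startswith("chr") else "chr" + name
        else:
            new_name = name[3:] if name.startswith("chr") else name
        mapping[name] = new_name
    return mapping


# Illustrative .fai lines (lengths and offsets are made up).
fai = ["chr21\t1000\t7\t60\t61", "chr22\t2000\t1031\t60\t61"]
rename = build_rename_map(read_fai_contigs(fai), add_prefix=False)

# One "old new" pair per line, whitespace separated.
rename_file = "".join(f"{old} {new}\n" for old, new in rename.items())
```

Names that already have the requested form map to themselves, so the same list can be applied to panels with mixed naming.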

CSV tests need to point to nf-core repository

Description of the bug

While the current testing of the pipeline works, the CSVs are not stored in the nf-core repository. This should be corrected for reproducibility.

Example phaseimpute/tests/csv:

sample,vcf,csi
NA12878,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf.csi
NA19401,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA19401/NA19401.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA19401/NA19401.s.1x.bcf.csi
NA20359,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA20359/NA20359.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA20359/NA20359.s.1x.bcf.csi
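
A small check like the following could flag such rows. This is only a sketch in Python, not part of the pipeline: the function name is hypothetical, and the expected prefix is assumed from the nf-core/test-datasets raw URLs used elsewhere in these issues.

```python
import csv
import io

# Assumed prefix of files hosted in the nf-core test-datasets repository.
NF_CORE_PREFIX = "https://raw.githubusercontent.com/nf-core/test-datasets/"


def urls_outside_nf_core(csv_text, prefix=NF_CORE_PREFIX):
    """Return (sample, column) pairs whose URL is not under `prefix`."""
    offenders = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            is_url = isinstance(value, str) and value.startswith("http")
            if is_url and not value.startswith(prefix):
                # Assumes the samplesheet has a "sample" column, as above.
                offenders.append((row["sample"], column))
    return offenders


sheet = (
    "sample,vcf,csi\n"
    "NA12878,"
    "https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf,"
    "https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf.csi\n"
)
offenders = urls_outside_nf_core(sheet)
```

With the row above, both the vcf and csi columns are reported, since they point to a personal fork.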

Command used and terminal output

No response

Relevant files

No response

System information

No response

Uniformize tools name

Description of the bug

We need to normalize the names of tools used such as BCFTOOLS_INDEX.
Same for the use of GAWK.

What can be done:

Every process used twice in a subworkflow gets a "_1"/"_2" suffix to tell the calls apart
All subworkflow names follow the "input_process_tool" pattern

Correctly separate panel preparation and imputation

Description of the bug

The panel preparation and the imputation are not yet done separately for glimpse and quilt.
The aim would be to do all the preprocessing in the get_panel subworkflow, to avoid duplicated modules and improve readability.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Tasks

Chunk panel inside get_panel for glimpse1 and quilt
Add normalisation identical for glimpse1 and quilt
Separate get panel into subworkflows #41

modules.config should include the subworkflow

Description of the bug

There are several modules that are used many times in the pipeline. However, in modules.config the configuration for these modules is global and not specific for the subworkflows. An example:

    withName: GLIMPSE_CHUNK {
        ext.args = [
            "--window-size 200000",
            "--buffer-size 20000"
        ].join(' ')
        ext.prefix = { "${meta.id}" }
    }

This global config affects all the subworkflows that use the same module, even when those have their own config.

Specific config:

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK' {

        ext.prefix = { "${meta.id}_${meta.chr}" }
    }

Example of error:

Caused by:
  Process `NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK (1000G_phased)` terminated with an error exit status (1)

Command executed:

  GLIMPSE_chunk \
      --window-size 200000 --buffer-size 20000 \
      --input ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
      --region chr22 \
      --thread 2 \
      --output 1000G_phased_chr22.txt

Notice that --window-size and --buffer-size are NOT defined in the NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK configuration, but they are passed to the command anyway.

Therefore, each module configuration should be scoped to the subworkflow it belongs to. This should be solved before adding new functionality, as the problem can snowball.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Update metromap to match pipeline version

Description of feature

The metromap has not yet been updated to match the current pipeline.
This should be addressed before the first release.

Move to nf-schema

Description of feature

Move to the new nf-schema plugin instead of the current nf-validation

subworkflow: remove sample from reference panel

Description of feature

Proposal: Adding an optional subworkflow to remove a specified sample from the reference panel. For example, NA12878 is always included in the reference panel and is the typical sample used for assessing performance.

Specify a fixed outdir parameter folder in the pipeline configuration

Description of feature

Edit the test.config file and set a fixed folder for storing the output files with the "--outdir" argument

Version ready for review v0.99.0

Description of feature

Before the first release, here are the issues that need to be addressed:

Tasks

Update metromap to match pipeline version #36 (documentation)
Correctly separate panel preparation and imputation #29 (bug, 3 of 3 tasks done)
Run full scale test #38 (enhancement)
Uniformize tools name #42 (bug, 9 of 9 tasks done)
Rewrite GLIMPSE1 subworkflow to use the panel preparation #55 (bug)
Add required arguments as external params to start the pipeline #52 (enhancement, 0 of 5 tasks done)
Add test for phasing panel #53 (bug)

Run full scale test

The full-scale test should work.
Which test should we run at full scale?
I think the most exhaustive would be a simulation, pre-processing of the panel, phasing of the latter, imputation and validation.

Check mandatory params for each step

Description of the bug

Some params are mandatory for some steps, while others are not.

For instance, users may want to run --step panelprep to obtain all the necessary panel files. However, they do not need an --input or --tools in this case.
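
One way to think about this is a per-step table of mandatory params. The sketch below is Python for illustration only. The table contents are hypothetical and not the pipeline's actual rules, beyond the point above that panelprep needs neither --input nor --tools.

```python
# Hypothetical table: which params each step requires (illustrative only).
REQUIRED_BY_STEP = {
    "panelprep": ["panel"],
    "impute": ["input", "tools", "panel"],
}


def missing_params(step, params):
    """Return the mandatory params for `step` that are absent or empty."""
    if step not in REQUIRED_BY_STEP:
        raise ValueError(f"Unknown step: {step}")
    return [name for name in REQUIRED_BY_STEP[step] if not params.get(name)]


# panelprep passes without --input or --tools.
panelprep_missing = missing_params("panelprep", {"panel": "panel.csv", "outdir": "test"})

# impute reports every param that is still missing, not just the first one.
impute_missing = missing_params("impute", {"panel": "panel.csv"})
```

Reporting all missing params at once, rather than failing on the first, would save users a few round trips.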

Command used and terminal output

nextflow run phaseimpute -profile test_panelprep,docker --outdir test

ERROR ~ No tools provided. Expression: params.tools

 -- Check script 'phaseimpute/./subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf' at line: 291 or see '.nextflow.log' file for more details

Relevant files

No response

System information

No response

Error: Panel index file requirement - Invalid filename format and missing file extension

Description of the bug

The schema_input_panel.json requires that the "panel index file cannot contain spaces and must have extension '.vcf' or '.bcf' with '.csi' or '.tbi' extension". However, 1000G index files, such as s3://1000genomes/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi, end with the extension .vcf.gz.tbi, which this rule rejects.

Solution:

            "index": {
                "type": "string",
                "pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?\\.(tbi|csi)$",
                "errorMessage": "Panel index file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf' with optional '.gz' extension and with '.csi' or '.tbi' extension"
            }
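
The proposed pattern can be sanity-checked outside the pipeline. Below is a quick sketch with Python's re module. It assumes the JSON-escaped pattern above behaves the same once unescaped, which should hold for this simple expression even though the schema validator uses a different regex engine. The helper name and the short file names are made up.

```python
import re

# The pattern from the schema, with the JSON escaping (\\) removed.
INDEX_PATTERN = re.compile(r"^\S+\.(vcf|bcf)(\.gz)?\.(tbi|csi)$")


def is_valid_index(path):
    """True if `path` looks like an index of a (optionally gzipped) VCF/BCF."""
    return INDEX_PATTERN.search(path) is not None


checks = {
    "s3://1000genomes/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi": is_valid_index(
        "s3://1000genomes/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi"
    ),
    "panel.bcf.csi": is_valid_index("panel.bcf.csi"),
    "panel.vcf.gz": is_valid_index("panel.vcf.gz"),  # no index extension
    "my panel.vcf.gz.tbi": is_valid_index("my panel.vcf.gz.tbi"),  # contains a space
}
```

The first two entries are accepted and the last two are rejected.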

Command used and terminal output

No response

Relevant files

No response

System information

No response

Change the name of the main.nf file to clarify how to run the pipeline

Description of feature

Having two main.nf scripts is not good practice. It would be better to name the main workflow phaseimpute.nf

Test profiles configuration

Description of feature

Add the missing test profiles to the nextflow.config file: test_sim and test_panelprep
Modify input files: set AWS paths for test files

Include glimpse2 imputation tool

Description of feature

Add glimpse2 as an alternative imputation tool

Imputation of chrX

Description of feature

Develop a module that can handle the non-PAR regions of chromosome X and perform imputation.

simplify posfile generation and processing

Description of feature

In the panel preparation phase, we generate two types of tsv:

one for GLIMPSE1 with structure as: -f'%CHROM\t%POS\t%REF,%ALT\\n'
another for STITCH with structure as: -f'%CHROM\t%POS\t%REF\t%ALT\\n'

I think it would be convenient to generate only a single type of tsv. This would be useful when using these files as independent inputs with param --posfile, so that they can have the same post-processing.

To make them specific to each tool, we could add a pre-processing step that, for example, replaces the last \t with a comma.
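
The pre-processing step could be as small as this. It is a sketch in Python for illustration, assuming the single shared format is the four-column tab-separated one (CHROM, POS, REF, ALT). The function name and the example record are hypothetical.

```python
def to_comma_separated_alleles(line):
    """Turn 'CHROM<tab>POS<tab>REF<tab>ALT' into 'CHROM<tab>POS<tab>REF,ALT'."""
    body = line.rstrip("\n")
    # Split only on the last tab, so CHROM and POS stay tab-separated.
    head, alt = body.rsplit("\t", 1)
    return f"{head},{alt}\n"


converted = to_comma_separated_alleles("chr21\t16570\tT\tC\n")
```

In the pipeline itself the same replacement could be done with a one-line text-processing step. The point is only that a single posfile format is enough as the stored input.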

Separate get panel into subworkflows

Fix test profiles

Description of the bug

Test profiles currently contain local paths, e.g. test_full.config

The test profiles that need correcting are:
test_full.config
test_panelprep.config
test_sim.config
test.config

Command used and terminal output

No response

Relevant files

No response

System information

dev

It's not clear which values are used to define the input parameters in the main.nf script

Description of feature

The main workflow takes 6 parameters as inputs: ch_input, ch_fasta, ch_panel, ch_region, ch_map, and ch_versions. It is not clear how these variables get their values from params.

Allow binary ref creation

Select two tools at the same time

Description of the bug

The pipeline currently allows selecting only a single imputation tool. This behavior should be changed, as users may want to run more than one tool for comparison.

ERROR ~ ERROR: Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --tools: 'glimpse1,quilt' is not a valid choice (Available choices: glimpse1, glimpse2, quilt)
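
The validation that would be needed is to split the value on commas and check each entry, instead of matching the whole string against the list of choices. Here is a minimal sketch in Python, for illustration only. The function name is hypothetical, and the choices are copied from the error message above.

```python
# Choices as listed in the validation error above.
AVAILABLE_TOOLS = ("glimpse1", "glimpse2", "quilt")


def parse_tools(value):
    """Split a comma-separated --tools value and validate each entry."""
    tools = [item.strip() for item in value.split(",") if item.strip()]
    if not tools:
        raise ValueError("No tools provided")
    unknown = [tool for tool in tools if tool not in AVAILABLE_TOOLS]
    if unknown:
        raise ValueError(f"Invalid tool(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the order given by the user.
    return list(dict.fromkeys(tools))


selected = parse_tools("glimpse1,quilt")
```

Each selected tool could then drive its own imputation branch, with the outputs compared afterwards.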

Command used and terminal output

nextflow run phaseimpute -profile test,singularity --outdir test_both --tools glimpse1,quilt

Relevant files

No response

System information

No response

Reference panel should accept a CSV

Description of feature

The reference panel currently accepts a BCF. Example:

panel = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf"

Reference panels, such as 1000G, are generally stored per chromosome.

It would be useful if the reference panel input flag would accept a CSV. The csv could contain the chromosome names and the URL to the reference panel. For example:

The samplesheet can have as many columns as you desire, however, there is a strict requirement for at least 2 columns to match those defined in the table below.

A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes.

chr,vcf
1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

Column	Description
`chr`	Name of the chromosome. Use the prefix 'chr' if the panel uses the prefix.
`vcf`	Full path to a VCF file for that chromosome. File has to be gzipped and have the extension ".vcf.gz".

Each row represents a chromosome with its corresponding VCF file, containing information about the reference haplotype panel. You can obtain reference panels from publicly available sources such as the 1000 Genomes Project phase 3.

The second column, vcf, can directly point to publicly available remote S3 buckets with the 1000G reference panels.

An example of this is: https://github.com/atrigila/quilt_nextflow/blob/master/docs/usage.md#structure-1
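
To make the proposed samplesheet rules concrete, here is a sketch of how such a CSV could be read and checked. It is written in Python for illustration and is not the pipeline's implementation. The function name is hypothetical. The rules it enforces are the two required columns and the ".vcf.gz" extension described above.

```python
import csv
import io


def read_panel_sheet(text):
    """Parse a 'chr,vcf' samplesheet into a {chromosome: vcf path} dict."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"chr", "vcf"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required column(s): {sorted(missing)}")
    panel = {}
    for row in reader:
        if not row["vcf"].endswith(".vcf.gz"):
            raise ValueError(f"Not a .vcf.gz file: {row['vcf']}")
        panel[row["chr"]] = row["vcf"]
    return panel


sheet = (
    "chr,vcf\n"
    "1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n"
    "2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n"
    "3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n"
)
panel = read_panel_sheet(sheet)
```

Extra columns are ignored, which matches the "as many columns as you desire" rule.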

Add support to nf-co2footprint

Description of feature

It would be valuable to add the nf-co2footprint plugin to the pipeline.

Rewrite GLIMPSE1 subworkflow to use the panel preparation

Description of the bug

For the moment, the glimpse1 subworkflow is self-contained, as available in nf-core.
But it is not compatible with the pipeline as it is.
We should create a new one, like for glimpse2, to allow the chunks to be computed outside the subworkflow.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add --chunks param for --step impute

Add nf-test to the pipeline

Description of feature

It would be nice to test that the pipeline produces the expected files each time we run it.
