
phaseimpute's People

Contributors

atrigila, eugeniafontecha, louislenezet, mrvictorica, nf-core-bot, nschcolnicov

phaseimpute's Issues

Add test for phasing panel

Description of the bug

Most of the small tests we use have phased = true. A test should be added that evaluates the output of the VCF_PHASE_PANEL subworkflow. Specifically this part:

    if (params.phased == false) {
        // Panel is not phased yet: phase it with SHAPEIT5
        VCF_PHASE_SHAPEIT5(
            ch_vcf.map { meta, vcf, csi -> [meta, vcf, csi, [], meta.region] },
            Channel.of([[], [], []]).collect(),
            Channel.of([[], [], []]).collect(),
            Channel.of([[], []]).collect()
        )
        ch_versions = ch_versions.mix(VCF_PHASE_SHAPEIT5.out.versions)
        ch_panel_phased = VCF_PHASE_SHAPEIT5.out.variants_phased
            .combine(VCF_PHASE_SHAPEIT5.out.variants_index, by: 0)
    } else {
        // Panel is already phased: use it as-is
        ch_panel_phased = ch_vcf
    }

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add required arguments as external params to start the pipeline

Description of feature

Currently, the later steps of the pipeline depend on the channels produced by --step panelprep. To start the pipeline directly from another step (such as --step impute), some files are required depending on the tool used (such as chunks, posfile, bams and panel). The bams and panel can already be supplied via a CSV with --input and --panel. We should add the other files as external params.

Tasks

  1. atrigila
  2. enhancement
  3. bug
    atrigila

CSV tests need to point to nf-core repository

Description of the bug

While the current testing of the pipeline works, the CSVs are not stored in the nf-core repository. This should be corrected to ensure reproducibility.

Example from phaseimpute/tests/csv:

sample,vcf,csi
NA12878,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA12878/NA12878.s.1x.bcf.csi
NA19401,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA19401/NA19401.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA19401/NA19401.s.1x.bcf.csi
NA20359,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA20359/NA20359.s.1x.bcf,https://raw.githubusercontent.com/louislenezet/test-datasets/imputation/data/individuals/NA20359/NA20359.s.1x.bcf.csi

Command used and terminal output

No response

Relevant files

No response

System information

No response

Uniformize tools name

Description of the bug

We need to normalize the names of the tools used, such as BCFTOOLS_INDEX.
The same applies to the use of GAWK.

What can be done:

  • For every process used twice in a subworkflow, add a "_1"/"_2" suffix to distinguish the instances
  • Name all subworkflows as "input_process_tool"

Correctly separate panel preparation and imputation

Description of the bug

The panel preparation and the imputation are not yet run separately for glimpse and quilt.
The aim is to do all the preprocessing in the get_panel subworkflow, to avoid duplicated modules and improve readability.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Tasks

  1. atrigila

modules.config should include the subworkflow

Description of the bug

There are several modules that are used many times in the pipeline. However, in modules.config the configuration for these modules is global and not specific for the subworkflows. An example:

    withName: GLIMPSE_CHUNK {
        ext.args = [
            "--window-size 200000",
            "--buffer-size 20000"
        ].join(' ')
        ext.prefix = { "${meta.id}" }
    }

This global config affects all the subworkflows that use the same module, even when those have their own config.

Specific config:

    withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK' {

        ext.prefix = { "${meta.id}_${meta.chr}" }
    }

Example of error:

Caused by:
  Process `NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK (1000G_phased)` terminated with an error exit status (1)

Command executed:

  GLIMPSE_chunk \
      --window-size 200000 --buffer-size 20000 \
      --input ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
      --region chr22 \
      --thread 2 \
      --output 1000G_phased_chr22.txt

Notice how --window-size and --buffer-size are NOT defined in the NFCORE_PHASEIMPUTE:PHASEIMPUTE:MAKE_CHUNKS:GLIMPSE_CHUNK configuration, yet they are passed to the command anyway.

Therefore, we should strive to add the corresponding subworkflow for each module configuration. This should be solved before adding new functionality as it can have a snowball effect.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Move to nf-schema

Description of feature

Move to the new nf-schema plugin, replacing the current nf-validation.

subworkflow: remove sample from reference panel

Description of feature

Proposal: Adding an optional subworkflow to remove a specified sample from the reference panel. For example, NA12878 is always included in the reference panel and is the typical sample used for assessing performance.

Version ready for review v0.99.0

Description of feature

Before the first release, here are the issues that need to be addressed:

Tasks

  1. documentation
    LouisLeNezet
  2. 3 of 3
    bug
    LouisLeNezet
  3. enhancement
  4. 9 of 9
    bug
    LouisLeNezet
  5. bug
    LouisLeNezet
  6. 0 of 5
    enhancement
    atrigila
  7. bug

Run full scale test

The full-scale test should work.
Which test should we run at full scale?
I think the most exhaustive would be a simulation, pre-processing of the panel, phasing of the latter, imputation and validation.

Check mandatory params for each step

Description of the bug

Some params are mandatory for some steps, while others are not.

For instance, users may want to run --step panelprep to obtain all the necessary panel files. However, they do not need an --input or --tools in this case.

Command used and terminal output

nextflow run phaseimpute -profile test_panelprep,docker --outdir test

ERROR ~ No tools provided. Expression: params.tools

 -- Check script 'phaseimpute/./subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf' at line: 291 or see '.nextflow.log' file for more details
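
The step-dependent check described above can be sketched as a lookup from step to required params. This is a minimal illustration in Python, not the pipeline's Groovy code: the `REQUIRED_BY_STEP` mapping and the `missing_params` helper are hypothetical, and the exact set of params each step needs is an assumption, apart from the point made in this issue that panelprep needs neither --input nor --tools.

```python
# Hypothetical mapping of pipeline step -> params that must be set.
# Only "panelprep does not need input or tools" comes from the issue;
# the remaining entries are assumptions for illustration.
REQUIRED_BY_STEP = {
    "panelprep": ("panel",),
    "impute": ("input", "tools"),
}


def missing_params(step, params):
    """Return the required params that are absent or empty for this step."""
    if step not in REQUIRED_BY_STEP:
        raise ValueError(f"Unknown step: {step}")
    return [name for name in REQUIRED_BY_STEP[step] if not params.get(name)]


if __name__ == "__main__":
    # panelprep without --input or --tools is accepted
    print(missing_params("panelprep", {"panel": "panel.csv", "outdir": "test"}))
    # impute without --tools is reported
    print(missing_params("impute", {"input": "samples.csv"}))
```

With a check of this shape, the "No tools provided" error would only be raised for steps that actually use --tools.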

Relevant files

No response

System information

No response

Error: Panel index file requirement - Invalid filename format and missing file extension

Description of the bug

The schema_input_panel.json requires that the "panel index file cannot contain spaces and must have extension '.vcf' or '.bcf' with '.csi' or '.tbi' extension". However, 1000G index files, such as s3://1000genomes/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi, end with the extension .vcf.gz.tbi, so they fail validation.

Solution:

            "index": {
                "type": "string",
                "pattern": "^\\S+\\.(vcf|bcf)(\\.gz)?\\.(tbi|csi)$",
                "errorMessage": "Panel index file must be provided, cannot contain spaces and must have extension '.vcf' or '.bcf' with optional '.gz' extension and with '.csi' or '.tbi' extension"
            }
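
As a quick sanity check, the proposed pattern can be tried against the 1000G index path quoted above. The sketch below uses Python's `re` module, which should behave the same as the schema validator for this simple pattern. `OLD_PATTERN` is an assumed reconstruction of the stricter rule described in the error message, not a copy of the original schema, and `index_is_valid` is a hypothetical helper.

```python
import re

# Pattern from the proposed fix, with the JSON string escaping removed.
NEW_PATTERN = re.compile(r"^\S+\.(vcf|bcf)(\.gz)?\.(tbi|csi)$")

# Assumed reconstruction of the stricter rule (no optional ".gz").
OLD_PATTERN = re.compile(r"^\S+\.(vcf|bcf)\.(tbi|csi)$")


def index_is_valid(path, pattern=NEW_PATTERN):
    """Return True if the index path satisfies the given pattern."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    url = (
        "s3://1000genomes/release/20130502/"
        "ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a"
        ".20130502.genotypes.vcf.gz.tbi"
    )
    print("old pattern accepts:", index_is_valid(url, OLD_PATTERN))
    print("new pattern accepts:", index_is_valid(url, NEW_PATTERN))
```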

Command used and terminal output

No response

Relevant files

No response

System information

No response

Test profiles configuration

Description of feature

  • Add the missing test profiles to the nextflow.config file: test_sim & test_panelprep
  • Modify input files: set AWS paths for test files

Imputation of chrX

Description of feature

Develop a module that can handle non-PAR chrX regions and perform imputation.

simplify posfile generation and processing

Description of feature

In the panel preparation phase, we generate two types of tsv:

  • one for GLIMPSE1 with structure as: -f'%CHROM\t%POS\t%REF,%ALT\\n'
  • another for STITCH with structure as: -f'%CHROM\t%POS\t%REF\t%ALT\\n'

I think it would be convenient to generate only a single type of tsv. This would be useful when using these files as independent inputs with param --posfile, so that they can have the same post-processing.

To make them specific to each tool, we could add a pre-processing step where we replace the last \t with a comma (,), for example.
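
A minimal sketch of that pre-processing step, written in Python for illustration only (in the pipeline it would more likely be a small awk or sed call). It assumes the four-column tab-separated layout is the one kept, and the function name `stitch_to_glimpse1` and the example coordinates are hypothetical.

```python
def stitch_to_glimpse1(line):
    """Turn a CHROM/POS/REF/ALT tab-separated line into CHROM/POS/REF,ALT.

    Only the last tab is replaced, so the first two columns stay
    tab-separated.
    """
    head, sep, alt = line.rstrip("\n").rpartition("\t")
    if not sep:
        raise ValueError(f"No tab found in posfile line: {line!r}")
    return f"{head},{alt}"


if __name__ == "__main__":
    # Illustrative record, not taken from a real panel
    print(stitch_to_glimpse1("chr21\t16570070\tA\tG\n"))
```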

Fix test profiles

Description of the bug

Test profiles currently contain local paths, e.g. test_full.config.

The test profiles that need correcting are:
test_full.config
test_panelprep.config
test_sim.config
test.config

Command used and terminal output

No response

Relevant files

No response

System information

dev

Select two tools at the same time

Description of the bug

The pipeline currently allows selecting only a single imputation tool. This behavior should be changed, as users may want to use more than one tool for comparison.

ERROR ~ ERROR: Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --tools: 'glimpse1,quilt' is not a valid choice (Available choices: glimpse1, glimpse2, quilt)
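
One way to accept several tools is to treat the value as a comma-separated list and validate each entry on its own. The Python sketch below only illustrates that idea and is not the pipeline's schema validation. The list of choices is taken from the error message above, and `parse_tools` is a hypothetical helper.

```python
# Choices as listed in the validation error message.
AVAILABLE_TOOLS = ("glimpse1", "glimpse2", "quilt")


def parse_tools(value):
    """Split a comma-separated --tools value and validate each entry."""
    tools = [t.strip() for t in value.split(",") if t.strip()]
    if not tools:
        raise ValueError("No tools provided")
    unknown = [t for t in tools if t not in AVAILABLE_TOOLS]
    if unknown:
        raise ValueError(
            f"Invalid tool(s): {', '.join(unknown)} "
            f"(Available choices: {', '.join(AVAILABLE_TOOLS)})"
        )
    # Drop duplicates while keeping the order given by the user
    return list(dict.fromkeys(tools))


if __name__ == "__main__":
    print(parse_tools("glimpse1,quilt"))
```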

Command used and terminal output

nextflow run phaseimpute -profile test,singularity --outdir test_both --tools glimpse1,quilt

Relevant files

No response

System information

No response

Reference panel should accept a CSV

Description of feature

The reference panel currently accepts a BCF. Example:

panel = "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/data/panel/21_22/1000GP.chr21_22.s.norel.bcf"

Reference panels, such as 1000G, are generally stored per chromosome.

It would be useful if the reference panel input flag would accept a CSV. The csv could contain the chromosome names and the URL to the reference panel. For example:

The samplesheet can have as many columns as you desire, however, there is a strict requirement for at least 2 columns to match those defined in the table below.

A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes.

chr,vcf
1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

Column  Description
chr     Name of the chromosome. Use the prefix 'chr' if the panel uses the prefix.
vcf     Full path to a VCF file for that chromosome. File has to be gzipped and have the extension ".vcf.gz".

Each row represents a chromosome with its corresponding VCF file, containing information about the reference haplotype panel. You can obtain reference panels from publicly available sources such as the 1000 Genomes Project phase 3.

The second column, vcf, can directly point to publicly available remote S3 buckets with the 1000G reference panels.

An example of this is: https://github.com/atrigila/quilt_nextflow/blob/master/docs/usage.md#structure-1
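
To make the proposed format concrete, here is a small Python sketch that reads such a samplesheet and applies the two rules stated above (the chr and vcf columns must be present, and the VCF must end in ".vcf.gz"). It is an illustration under those assumptions, not the pipeline's schema-based validation, and `read_panel_sheet` is a hypothetical name.

```python
import csv
import io


def read_panel_sheet(text):
    """Parse a panel samplesheet and return a list of (chr, vcf) pairs.

    Extra columns are allowed; 'chr' and 'vcf' are mandatory.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"chr", "vcf"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(sorted(missing))}")
    rows = []
    for row in reader:
        if not row["vcf"].endswith(".vcf.gz"):
            raise ValueError(f"VCF must end with .vcf.gz: {row['vcf']}")
        rows.append((row["chr"], row["vcf"]))
    return rows


if __name__ == "__main__":
    sheet = (
        "chr,vcf\n"
        "1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n"
        "2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz\n"
    )
    for chrom, vcf in read_panel_sheet(sheet):
        print(chrom, vcf)
```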

Rewrite GLIMPSE1 subworkflow to use the panel preparation

Description of the bug

For the moment, the glimpse1 subworkflow is self-contained, as available in nf-core.
However, it is not compatible with the pipeline as it stands.
We should create a new one, as was done for glimpse2, so that the chunks can be computed outside the subworkflow.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add nf-test to the pipeline

Description of feature

It would be good to test that the pipeline produces the expected files each time we run it.
