ncov-dehoster's People

Contributors

connorchato, darianhole, takadonet

ncov-dehoster's Issues

Docs update, Flow update, and Help statement update

Want to change how the input data and pipeline are parsed to bring them in line with more recent developments.

Ideal Flow Changes

  • Validate inputs with a Groovy script
    • Will change how the main.nf script flows (simplifies it)
  • Help command with a Groovy script
  • Print given inputs to log at the start
  • Fix trace/execution outputs
  • Add to tests

Docs:

  • Better description of the inputs and of when to use certain flags, such as --flat
  • Update Manifest

Other:

  • Add license

Better Fastq Filtering [ Nanopore Minimap2 ]

#17 works as a quick fix for outputs that contain no fastq data, preventing the next process, regenerateFast5s_MM2, from failing.

However, it still creates the fastq file and folder in the final fastq_pass output directory. A change should be made so that the empty fastq file is also filtered out, to give a clearer output.

regenerateFastqFiles not working

Hi,
I am really happy to see such great code. Maybe I am doing something wrong, but I can't get any filtered fastqs after running the command line below. In addition, can we use the resume function with this pipeline? I tried to find contact information but couldn't, so I decided to leave my question here. Please let me know if you would prefer this question not be here, and I will delete it.

Thank you,

nextflow run phac-nml/ncov-dehoster -profile conda --nanopore --minimap2 --fastq_directory MG_adapted --human_ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --run_name adative_seq -resume --min_length 1

Empty nanopore fastq files cause fast5 regeneration to fail [ Nanopore Minimap2 ]

Basically, when a dehosted nanopore fastq file is empty (0 reads) in the minimap2 process, the fast5-dehost-regenerate.sh script fails to run.

We either need to change the two processes (regenerateFastqFiles and regenerateFast5s_MM2) or find a way to filter the regenerateFastqFiles tuple on fastq count.
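
As a rough illustration of filtering on fastq count, here is a minimal Python sketch. The helper names `count_fastq_reads` and `keep_non_empty` are hypothetical and not part of the pipeline, and the sketch assumes plain four-line FASTQ records (optionally gzipped). In the pipeline itself the check would more likely live in the Nextflow channel logic.

```python
import gzip
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    path = Path(fastq_path)
    # Transparently handle gzipped input based on the file extension.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def keep_non_empty(fastq_paths, min_reads=1):
    """Return only the paths that hold at least `min_reads` records."""
    return [p for p in fastq_paths if count_fastq_reads(p) >= min_reads]
```

Only the paths returned by `keep_non_empty` would then be passed on to fast5 regeneration, so an empty file never reaches the failing script.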

Flat Output Option [ Nanopore Minimap2 ]

Allow the user to specify a flat output option.

The current output is organized as follows to make it easier to work with demultiplexed data from the instrument:

YYYYY
    ├── removal_summary.csv
    └── run
        ├── fast5_pass
        │   ├── barcode27
        │   │   ├── barcode27_0.fast5
        │   │   ├── barcode27_10.fast5
        │   │   ├── barcode27_11.fast5
        │   │   ├── barcode27_1.fast5
        │   │   ├── barcode27_2.fast5
        │   │   ├── barcode27_3.fast5
        │   │   ├── barcode27_4.fast5
        │   │   ├── barcode27_5.fast5
        │   │   ├── barcode27_6.fast5
        │   │   ├── barcode27_7.fast5
        │   │   ├── barcode27_8.fast5
        │   │   ├── barcode27_9.fast5
        │   │   └── filename_mapping.txt
        │   ├── barcode51
        │   │   ├── barcode51_0.fast5
        │   │   ├── barcode51_1.fast5
        │   │   ├── barcode51_2.fast5
        │   │   ├── barcode51_3.fast5
        │   │   ├── barcode51_4.fast5
        │   │   ├── barcode51_5.fast5
        │   │   ├── barcode51_6.fast5
        │   │   ├── barcode51_7.fast5
        │   │   ├── barcode51_8.fast5
        │   │   └── filename_mapping.txt
        │   └── barcode96
        │       ├── barcode96_0.fast5
        │       └── filename_mapping.txt
        ├── fastq_pass
        │   ├── barcode27
        │   │   └── barcode27.host_removed.fastq
        │   ├── barcode29
        │   │   └── barcode29.host_removed.fastq
        │   ├── barcode51
        │   │   └── barcode51.host_removed.fastq
        │   └── barcode96
        │       └── barcode96.host_removed.fastq
        └── sequencing_summary.txt

The change would be to make fastq_pass and fast5_pass flat (no additional directories):

YYYYY
    ├── removal_summary.csv
    └── run
        ├── fast5_pass
        │   ├── barcode27_0.fast5
        │   ├── barcode27_10.fast5
        │   ├── barcode27_11.fast5
        │   ├── barcode27_1.fast5
        │   ├── barcode27_2.fast5
        │   ├── barcode27_3.fast5
        │   ├── barcode27_4.fast5
        │   ├── barcode27_5.fast5
        │   ├── barcode27_6.fast5
        │   ├── barcode27_7.fast5
        │   ├── barcode27_8.fast5
        │   ├── barcode27_9.fast5
        │   ├── barcode51_0.fast5
        │   ├── barcode51_1.fast5
        │   ├── barcode51_2.fast5
        │   ├── barcode51_3.fast5
        │   ├── barcode51_4.fast5
        │   ├── barcode51_5.fast5
        │   ├── barcode51_6.fast5
        │   ├── barcode51_7.fast5
        │   ├── barcode51_8.fast5
        │   ├── barcode96_0.fast5
        │   └── filename_mapping.txt
        ├── fastq_pass
        │   ├── barcode27.host_removed.fastq
        │   ├── barcode29.host_removed.fastq
        │   ├── barcode51.host_removed.fastq
        │   └── barcode96.host_removed.fastq
        └── sequencing_summary.txt
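
The flattening shown above could be sketched in Python roughly as follows. This is only an illustration: `flatten_output` is a hypothetical helper, and appending the per-barcode filename_mapping.txt files into a single top-level file is an assumption about how the one mapping file in the flat layout would be produced.

```python
import shutil
from pathlib import Path


def flatten_output(pass_dir):
    """Move files out of per-barcode subdirectories into `pass_dir` itself.

    Assumption: per-barcode filename_mapping.txt files are appended into one
    top-level filename_mapping.txt instead of overwriting each other.
    """
    pass_dir = Path(pass_dir)
    merged_mapping = pass_dir / "filename_mapping.txt"
    # Materialise the directory list first, since the loop modifies pass_dir.
    for sub_dir in sorted(p for p in pass_dir.iterdir() if p.is_dir()):
        for item in sorted(sub_dir.iterdir()):
            if item.name == "filename_mapping.txt":
                with merged_mapping.open("a") as out:
                    out.write(item.read_text())
                item.unlink()
            else:
                target = pass_dir / item.name
                if target.exists():
                    raise FileExistsError(f"{target} already exists")
                shutil.move(str(item), str(target))
        # The subdirectory is empty at this point and can be removed.
        sub_dir.rmdir()
```

Calling `flatten_output` once on fast5_pass and once on fastq_pass would give the second layout. Name collisions raise an error rather than silently overwriting data.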

Reformat Nanopore Host Removal Codeblock

I think that this should work if you want to cut the pass in 30-33.

if (read.reference_name != contig_ID and read.mapping_quality >= remove_minimum_quality) or read.query_name in reads_to_remove_set:
    h_count += 1
    pass

If you never plan on passing an actual value to reads_to_remove_set, you can cut that as well and do:

if read.reference_name != contig_ID and read.mapping_quality >= remove_minimum_quality:
    h_count += 1
    pass

Originally posted by @ConnorChato in #14 (comment)

This is a significant speed enhancement that should be done sooner rather than later.
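
To make the merged condition concrete, here is a small self-contained sketch. The `Read` tuple is a hypothetical stand-in that models only the three attributes used in the condition, and `split_host_reads` is not a function from the repository. It also assumes that reads failing the condition are the ones to keep, which the snippet above does not show.

```python
from collections import namedtuple

# Hypothetical stand-in for an alignment record; only the attributes used
# in the condition are modelled.
Read = namedtuple("Read", ["query_name", "reference_name", "mapping_quality"])


def split_host_reads(reads, contig_ID, remove_minimum_quality,
                     reads_to_remove_set=frozenset()):
    """Return (kept_reads, h_count) using the single merged condition."""
    kept = []
    h_count = 0
    for read in reads:
        # A read counts as host if it maps confidently to anything other
        # than the target contig.
        is_host = (
            read.reference_name != contig_ID
            and read.mapping_quality >= remove_minimum_quality
        )
        if is_host or read.query_name in reads_to_remove_set:
            h_count += 1
            continue
        # Assumption: everything else is retained.
        kept.append(read)
    return kept, h_count
```

Leaving `reads_to_remove_set` at its empty default reduces the check to the shorter second form of the condition.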

Keep Full Fastq Read Identifiers [ Nanopore Minimap2 ]

As the title says, keep the full name of each fastq read generated in the nanopore minimap2 pipeline, so that we can do strict demultiplexing after host removal and other steps that rely on that information.

Example. Currently:

@2f33f219-86dd-4b14-9442-2b1e068b7b22 runid=28fdafe3bad100kjfd94mh2bmfda932 read=84325 ch=426 start_time=2021-04-15T08:56:07.367881+00:00 flow_cell_id=FAR protocol_group_id=YYYY sample_id=YYYY barcode=barcode96 barcode_alias=barcode96 barcode=barcode96
ATGC
+
%$%'

Becomes

@2f33f219-86dd-4b14-9442-2b1e068b7b22
ATGC
+
%$%'

We want to keep the whole line.
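
One way to keep the full header is to use the read ID only for matching and write the original header line back unchanged. The following is a minimal sketch under that assumption. `filter_fastq_keep_headers` is a hypothetical helper, not the pipeline's code, and it assumes four-line FASTQ records.

```python
def filter_fastq_keep_headers(in_handle, out_handle, keep_ids):
    """Copy records whose read ID is in `keep_ids`, header line untouched."""
    written = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        sequence = in_handle.readline()
        plus = in_handle.readline()
        quality = in_handle.readline()
        # The read ID is the first whitespace-delimited token after '@'.
        # It is used for matching only, never to rewrite the header.
        read_id = header[1:].split(None, 1)[0]
        if read_id in keep_ids:
            out_handle.write(header + sequence + plus + quality)
            written += 1
    return written
```

Because the header string is written back verbatim, fields such as runid, ch, and barcode survive host removal.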

Add Nanopore Dehosting

Add in nanopore dehosting support with the following stipulations:

  • dehosting fast5 files
  • re-basecalling fastq files
  • size selection of fastq reads

This is slightly tricky, as Guppy is proprietary and must be configured on the system beforehand for this to run properly.
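
The size selection item in the list above could look roughly like this in Python. This is a sketch only: `select_by_length` and its `max_length` parameter are hypothetical, and records are assumed to be already parsed into (header, sequence, quality) tuples.

```python
def select_by_length(records, min_length, max_length=None):
    """Yield (header, sequence, quality) tuples within the length bounds."""
    for header, sequence, quality in records:
        length = len(sequence)
        if length < min_length:
            continue
        # max_length is optional; None means no upper bound.
        if max_length is not None and length > max_length:
            continue
        yield header, sequence, quality
```

Writing it as a generator lets it sit between a FASTQ parser and a writer without loading the whole file into memory.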
