
Comments (10)

muffato commented on August 15, 2024

Hi @abdo3a.
I haven't seen that error before. Could you still try the pipeline with the code from the fixes_for_prod branch, in case it's a side effect of something else? I've fixed quite a few bugs on that branch and was going to make the v0.3 release out of it.

If you're still seeing the issue, could you check the input file (in this example /home/sharafa/Ahmep_genome/work/5f/0f95c92e19f521dc31245971f4fd9a/Ahemp.tsv)? In particular, does it have any data at all? It looks like an earlier step produced an empty file, and we may need to trace those steps back.
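One quick way to run that check is sketched below in Python. The helper name `count_data_rows` and the choice to skip blank lines and `#` comment lines are assumptions made for illustration; the pipeline itself does not provide this function, and a header line (if the file has one) is counted like any other row.

```python
from pathlib import Path


def count_data_rows(path, comment_prefix="#"):
    """Return the number of non-blank, non-comment lines in a text file."""
    rows = 0
    with Path(path).open() as handle:
        for line in handle:
            stripped = line.strip()
            # Skip empty lines and lines that start with the comment prefix.
            if stripped and not stripped.startswith(comment_prefix):
                rows += 1
    return rows


# Example call (path taken from the error message in this thread):
# count_data_rows("/home/sharafa/Ahmep_genome/work/5f/0f95c92e19f521dc31245971f4fd9a/Ahemp.tsv")
```

A result of 0 would support the idea that an earlier step wrote an empty file.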

from blobtoolkit.

abdo3a commented on August 15, 2024

Thanks @muffato for the quick reply.
I tried the code from the fixes_for_prod branch as you suggested, but I got a new error from the BLAST_BLASTN process.

-[sanger-tol/blobtoolkit] Pipeline completed with errors-
ERROR ~ Error executing process > 'SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN (Ahemp)'

Caused by:
  Process `SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN (Ahemp)` terminated with an error exit status (1)

Command executed:

  if [ "false" == "true" ]; then
      gzip -c -d Ahemp.chunks.fasta > Ahemp.chunks.fasta
  fi
  
  DB=`find -L ./ -name "*.nin" | sed 's/\.nin$//'`
  blastn \
      -num_threads 6 \
      -db $DB \
      -query Ahemp.chunks.fasta \
       \
      -outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1' \
      -out Ahemp.txt
  
  cat <<-END_VERSIONS > versions.yml
  "SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN":
      blast: $(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Unable to find image 'quay.io/biocontainers/blast:2.14.1--pl5321h6f7f691_0' locally
  2.14.1--pl5321h6f7f691_0: Pulling from biocontainers/blast
  642efca944a0: Already exists
  bd9ddc54bea9: Already exists
  bfa1a70cade6: Pulling fs layer
  bfa1a70cade6: Verifying Checksum
  bfa1a70cade6: Download complete
  bfa1a70cade6: Pull complete
  Digest: sha256:0fa116b90c6411d5b09cdda5ca81a857167d218c49915104e7e1588b16baedf7
  Status: Downloaded newer image for quay.io/biocontainers/blast:2.14.1--pl5321h6f7f691_0
  USAGE
    blastn [-h] [-help] [-import_search_strategy filename]
      [-export_search_strategy filename] [-task task_name] [-db database_name]
      [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
      [-negative_gilist filename] [-negative_seqidlist filename]
      [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
      [-negative_taxidlist filename] [-entrez_query entrez_query]
      [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
      [-subject subject_input_file] [-subject_loc range] [-query input_file]
      [-out output_file] [-evalue evalue] [-word_size int_value]
      [-gapopen open_penalty] [-gapextend extend_penalty]
      [-perc_identity float_value] [-qcov_hsp_perc float_value]
      [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
      [-xdrop_gap_final float_value] [-searchsp int_value] [-penalty penalty]
      [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
      [-template_type type] [-template_length int_value] [-dust DUST_options]
      [-filtering_db filtering_database]
      [-window_masker_taxid window_masker_taxid]
      [-window_masker_db window_masker_db] [-soft_masking soft_masking]
      [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
      [-best_hit_score_edge float_value] [-subject_besthit]
      [-window_size int_value] [-off_diagonal_range int_value]
      [-use_index boolean] [-index_name string] [-lcase_masking]
      [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
      [-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
      [-line_length line_length] [-html] [-sorthits sort_hits]
      [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
      [-num_threads int_value] [-mt_mode int_value] [-remote] [-version]
  
  DESCRIPTION
     Nucleotide-Nucleotide BLAST 2.14.1+
  
  Use '-help' to print detailed descriptions of command line arguments
  ========================================================================
  
  Error: Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
  Error:  (CArgException::eSynopsis) Too many positional arguments (1), the offending value: Ahemp.chunks.fasta

Work dir:
  /home/sharafa/Ahmep_genome/work/ad/bab0769e69a33818fbc5793659fa56

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details


abdo3a commented on August 15, 2024

Hi again. I managed to solve the BLAST_BLASTN issue by using the blastn module from the original branch, but I'm still struggling with BLOBTOOLKIT_WINDOWSTATS. I think it's related to the YAML file: I created my own from the released-genome example, since the link to the draft-genome example is not working.


muffato commented on August 15, 2024

Hi @abdo3a. What database are you using for blastn? If it is the complete NT database, then you must use the version of the BLAST_BLASTN module from the fixes_for_prod branch. This is because very large databases have many .nin files, which confuse the original version of the module.

Secondly, the Nextflow pipeline does not take the YAML file as input to configure its steps. It's only used to populate some fields, such as the taxonomy, for the final blobdir. All the configuration is done via Nextflow parameters.
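To illustrate the point about .nin files: the `DB=` line in the command logged earlier in this thread strips `.nin` from every index file it finds, so a multi-volume database (for example files named `nt.00.nin`, `nt.01.nin`) yields several values instead of one. The Python sketch below shows one possible way to collapse such names into a single prefix. The function name and the volume-suffix heuristic are assumptions for illustration only; the actual fix on the fixes_for_prod branch may work differently.

```python
import re


def blast_db_prefixes(nin_files):
    """Collapse BLAST .nin index file names into unique database prefixes."""
    prefixes = set()
    for name in nin_files:
        # Same effect as the sed call in the logged command: drop ".nin".
        stem = re.sub(r"\.nin$", "", str(name))
        # Heuristic (assumption): drop a trailing numeric volume suffix like ".00".
        stem = re.sub(r"\.\d+$", "", stem)
        prefixes.add(stem)
    return sorted(prefixes)


print(blast_db_prefixes(["./db/nt.00.nin", "./db/nt.01.nin", "./db/nt.02.nin"]))
```

Note that this heuristic would also trim a database whose own name ends in a dot followed by digits, so it is only a starting point.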


abdo3a commented on August 15, 2024

Hi @muffato,
Yes, I'm using the full NT database, but when I use the version of the BLAST_BLASTN module from the fixes_for_prod branch, it produces this error:

Error: Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
Error:  (CArgException::eSynopsis) Too many positional arguments (1), the offending value: Ahemp.chunks.fasta

Regarding the YAML metadata file: I noted the comment about the Nextflow pipeline ignoring the YAML file, but when I tried running it without the YAML file, it produced the following error:

ERROR ~ Error executing process > 'SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG (Ahemp)'

Caused by:
  Process `SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG (Ahemp)` terminated with an error exit status (1)

Command executed:

  btk pipeline \
      generate-config \
      Ahemp \
       \
      --reads Ahemp
  
  cat <<-END_VERSIONS > versions.yml
  "SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG":
      blobtoolkit: $(btk --version | cut -d' ' -f2 | sed 's/v//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  2024-01-31 11:03:34.557 [INFO] Fetching assembly metadata
  Traceback (most recent call last):
    File "/opt/conda/envs/btk_env/bin/btk", line 8, in <module>
      sys.exit(cli())
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/btk/btk.py", line 80, in cli
      sys.exit(subcommand())
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/btk/lib/pipeline.py", line 11, in main
      cli("btk pipeline")
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/blobtoolkit_pipeline.py", line 52, in cli
      sys.exit(subcommand(rename))
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/lib/generate_config.py", line 736, in main
      meta = parse_assembly_meta(accession)
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/lib/generate_config.py", line 399, in parse_assembly_meta
      root = ET.fromstring(xml)
    File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/defusedxml/common.py", line 126, in fromstring
      parser.feed(text)
    File "/opt/conda/envs/btk_env/lib/python3.9/xml/etree/ElementTree.py", line 1717, in feed
      self.parser.Parse(data, False)
  TypeError: a bytes-like object is required, not 'NoneType'

Work dir:
  /home/sharafa/Ahmep_genome/work/c9/da89ad80fa3fcc0d1d699592ddba96

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details


muffato commented on August 15, 2024

  1. Can you print the file .command.out that is in the work directory of the BLAST_BLASTN job? There should be a line that starts with "Using", followed by the name of the directory in which you have the NT database and then /nt.

  2. TypeError: a bytes-like object is required, not 'NoneType' makes sense in that context, since you have a pre-curation assembly. It's meant to be saying that Ahemp is not a valid accession number. I'll raise an error upstream so that the tool prints a clearer error message.

  3. Can you paste your sample-sheet? I wonder if something's not being parsed correctly.
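As a rough illustration of the kind of upstream check mentioned in point 2, the sketch below fails early with a readable message when no metadata comes back, instead of passing `None` to the XML parser. The function name `parse_assembly_xml` is hypothetical and this is not the actual btk code; it uses the standard-library parser rather than the defusedxml wrapper seen in the traceback.

```python
import xml.etree.ElementTree as ET


def parse_assembly_xml(xml, accession):
    """Parse assembly metadata XML, failing early with a readable message."""
    if xml is None:
        # Nothing was fetched, so explain why instead of letting the parser crash.
        raise ValueError(
            f"No assembly metadata was returned for {accession!r}. "
            "It may not be a valid accession number (for example, a draft assembly)."
        )
    return ET.fromstring(xml)
```

With a guard like this, `parse_assembly_xml(None, "Ahemp")` raises a `ValueError` that names the offending accession.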


abdo3a commented on August 15, 2024

  1. I can re-produce this error again. Maybe I used the wrong branch.
  2. Then what should I use for a draft genome? I used Ahemp as the TAG.
  3. Here is my sample-sheet:
sample,datatype,datafile
Ahemp,ont,/home/sharafa/Ahmep_genome/ont.cram
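For anyone wanting to sanity-check a sample-sheet like this before launching the pipeline, here is a small Python sketch. It only verifies the three column names shown above and that every field is filled in; the function name is hypothetical, and the pipeline's own validation (including which datatype values it accepts) may be stricter.

```python
import csv
import io

EXPECTED_COLUMNS = ["sample", "datatype", "datafile"]


def parse_samplesheet(text):
    """Parse sample-sheet text and return its rows as a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # DictReader stores extra fields under the key None and fills
        # missing fields with None, so both cases are rejected here.
        if None in row or any(not (value or "").strip() for value in row.values()):
            raise ValueError(f"Malformed row near line {reader.line_num}: {row}")
        rows.append(row)
    return rows


sheet = "sample,datatype,datafile\nAhemp,ont,/home/sharafa/Ahmep_genome/ont.cram\n"
for entry in parse_samplesheet(sheet):
    print(entry["sample"], entry["datatype"], entry["datafile"])
```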


muffato commented on August 15, 2024

Hi @abdo3a. Sorry for the silence!

I've given the pipeline a go with a config file and managed to get it to work on a non-accessioned genome.

First, here is the minimal YAML file I had to provide:

assembly:
  level: bar
settings:
  foo: 0
similarity:
  diamond_blastx:
    foo: 0
taxon:
  class: class_name
  family: family_name
  genus: genus_name
  kingdom: kingdom_name
  name: species_name
  order: order_name
  phylum: phylum_name
  superkingdom: superkingdom_name
  taxid: 0

All those keys have to be present, but the values are ignored and don't matter. Everything else you would find in a typical BlobToolKit yaml file is superfluous.
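A small Python sketch for checking a config against that list is given below. It works on an already-parsed dictionary (for instance the result of `yaml.safe_load` from PyYAML, which is an assumption about tooling, not something the pipeline requires). Treating the `foo` entries as placeholders rather than required keys is also an assumption, as are the names `REQUIRED_PATHS` and `missing_keys`.

```python
TAXON_KEYS = [
    "class", "family", "genus", "kingdom", "name",
    "order", "phylum", "superkingdom", "taxid",
]

# Key paths taken from the minimal file above; "foo" entries are assumed
# to be placeholders and are not checked.
REQUIRED_PATHS = [
    ("assembly", "level"),
    ("settings",),
    ("similarity", "diamond_blastx"),
] + [("taxon", key) for key in TAXON_KEYS]


def missing_keys(config):
    """Return the dotted paths from REQUIRED_PATHS that are absent in config."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

An empty list from `missing_keys(config)` means every key path listed above is present; the values themselves are not inspected, in line with the note that they are ignored.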

Then, to run the pipeline, your command-line from #91 (comment) should work, i.e.:

  • Provide this YAML file to the --yaml parameter
  • Give the name of your assembly as --accession. That's used to name various files.
  • Give the name of your species as --taxon. That's used to select the relevant Busco lineages
  • Give your samplesheet as --input. The one from #91 (comment) seems fine

For completeness, here is the command I've used for my tests:

nextflow run ~/workspace/tol-it/nextflow/sanger-tol/blobtoolkit -profile test,singularity --yaml $PWD/test.yml --accession draft

I mentioned the branch fixes_for_prod in an earlier comment. This branch has now been merged into the dev branch, and we'll make the 0.3.0 release out of it very soon.


abdo3a commented on August 15, 2024

Hello @muffato,
Just reporting that the pipeline works now after following your suggestion. Thanks! I will close this issue.


muffato commented on August 15, 2024

Super, thank you for confirming, @abdo3a.

The next version will simplify usage on draft assemblies.
