Comments (10)
Hi @abdo3a .
I haven't seen that error before. Can you still try the pipeline with the code from the fixes_for_prod branch, just in case it's a side effect of something else? I've fixed quite a few bugs on that branch and was going to make the v0.3 release out of it.
If you're still seeing the issue, would you be able to check the input file (in this example /home/sharafa/Ahmep_genome/work/5f/0f95c92e19f521dc31245971f4fd9a/Ahemp.tsv)? Especially: does it have any data at all? It feels like an earlier step produced an empty file, and we may need to trace those steps back.
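A quick way to answer the "does it have any data at all" question is to look at the file size and count the non-blank lines. The helper below is only a sketch: the name describe_file is made up for illustration and is not part of the pipeline.

```python
from pathlib import Path


def describe_file(path):
    """Return (size_in_bytes, number_of_non_blank_lines) for a text file.

    Hypothetical helper, not part of sanger-tol/blobtoolkit.
    """
    p = Path(path)
    size = p.stat().st_size
    with p.open() as handle:
        non_blank = sum(1 for line in handle if line.strip())
    return size, non_blank


if __name__ == "__main__":
    import sys

    # Usage: python describe_file.py /path/to/work/dir/Ahemp.tsv
    for name in sys.argv[1:]:
        size, rows = describe_file(name)
        print(f"{name}: {size} bytes, {rows} non-blank lines")
```

A result of 0 bytes or 0 non-blank lines would point at an earlier step having written an empty file.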
from blobtoolkit.
Thanks @muffato for the quick reply.
I tried the code from the fixes_for_prod branch as you suggested, but I got a new error from the BLAST_BLASTN process.
-[sanger-tol/blobtoolkit] Pipeline completed with errors-
ERROR ~ Error executing process > 'SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN (Ahemp)'
Caused by:
Process `SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN (Ahemp)` terminated with an error exit status (1)
Command executed:
if [ "false" == "true" ]; then
gzip -c -d Ahemp.chunks.fasta > Ahemp.chunks.fasta
fi
DB=`find -L ./ -name "*.nin" | sed 's/\.nin$//'`
blastn \
-num_threads 6 \
-db $DB \
-query Ahemp.chunks.fasta \
\
-outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1' \
-out Ahemp.txt
cat <<-END_VERSIONS > versions.yml
"SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:RUN_BLASTN:BLAST_BLASTN":
blast: $(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*$//')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
Unable to find image 'quay.io/biocontainers/blast:2.14.1--pl5321h6f7f691_0' locally
2.14.1--pl5321h6f7f691_0: Pulling from biocontainers/blast
642efca944a0: Already exists
bd9ddc54bea9: Already exists
bfa1a70cade6: Pulling fs layer
bfa1a70cade6: Verifying Checksum
bfa1a70cade6: Download complete
bfa1a70cade6: Pull complete
Digest: sha256:0fa116b90c6411d5b09cdda5ca81a857167d218c49915104e7e1588b16baedf7
Status: Downloaded newer image for quay.io/biocontainers/blast:2.14.1--pl5321h6f7f691_0
USAGE
blastn [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-seqidlist filename]
[-negative_gilist filename] [-negative_seqidlist filename]
[-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
[-negative_taxidlist filename] [-entrez_query entrez_query]
[-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-out output_file] [-evalue evalue] [-word_size int_value]
[-gapopen open_penalty] [-gapextend extend_penalty]
[-perc_identity float_value] [-qcov_hsp_perc float_value]
[-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]
[-xdrop_gap_final float_value] [-searchsp int_value] [-penalty penalty]
[-reward reward] [-no_greedy] [-min_raw_gapped_score int_value]
[-template_type type] [-template_length int_value] [-dust DUST_options]
[-filtering_db filtering_database]
[-window_masker_taxid window_masker_taxid]
[-window_masker_db window_masker_db] [-soft_masking soft_masking]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-subject_besthit]
[-window_size int_value] [-off_diagonal_range int_value]
[-use_index boolean] [-index_name string] [-lcase_masking]
[-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
[-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
[-line_length line_length] [-html] [-sorthits sort_hits]
[-sorthsps sort_hsps] [-max_target_seqs num_sequences]
[-num_threads int_value] [-mt_mode int_value] [-remote] [-version]
DESCRIPTION
Nucleotide-Nucleotide BLAST 2.14.1+
Use '-help' to print detailed descriptions of command line arguments
========================================================================
Error: Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
Error: (CArgException::eSynopsis) Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
Work dir:
/home/sharafa/Ahmep_genome/work/ad/bab0769e69a33818fbc5793659fa56
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
Hi again, I managed to solve the BLAST_BLASTN process issue by using the blastn module from the original branch, but I'm still struggling with BLOBTOOLKIT_WINDOWSTATS. I think it's related to the yaml file: I created my own using the released genome example, since the link for the draft genome example is not working.
Hi @abdo3a . What database are you using for blastn? If it's the complete NT database, then you must use the version of the BLAST_BLASTN module from the fixes_for_prod branch. This is because very large databases have many .nin files, which confuse the module.
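To illustrate why the number of .nin files matters, here is a rough Python equivalent of the shell idiom in the failing command earlier in the thread (find the .nin files, strip the extension, pass the result to -db). It is a sketch under assumptions: the function names are invented, and it is not the module's actual code. The point is that the idiom only yields a usable -db value when exactly one prefix is found.

```python
from pathlib import Path


def blast_db_prefixes(db_dir):
    """List candidate BLAST database prefixes under db_dir.

    Each .nin index file contributes one prefix (its path without ".nin").
    Hypothetical helper, for illustration only.
    """
    return sorted(str(p.with_suffix("")) for p in Path(db_dir).rglob("*.nin"))


def db_argument(db_dir):
    """Return the single database prefix, or fail with a readable message."""
    prefixes = blast_db_prefixes(db_dir)
    if len(prefixes) != 1:
        raise ValueError(
            f"expected exactly one .nin prefix in {db_dir}, "
            f"found {len(prefixes)}: {prefixes}"
        )
    return prefixes[0]
```

With several volumes (for example nt.00.nin, nt.01.nin, ...), the list has more than one entry, so a plain -db $DB would receive several words. With no match at all, $DB is empty; in that case -db would presumably take the next token (-query) as its value and leave the FASTA file as a stray positional argument. That reading is consistent with the "Too many positional arguments" message, but it is an assumption and has not been confirmed in this thread.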
Secondly, the Nextflow pipeline does not take the yaml file as input to configure its steps. It's only used to populate some fields, such as the taxonomy, for the final blobdir. All the configuration is done via Nextflow parameters.
Hi @muffato,
Yes, I'm using the full NT database, but when I use the version of the BLAST_BLASTN module from the fixes_for_prod branch, it produces this error:
Error: Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
Error: (CArgException::eSynopsis) Too many positional arguments (1), the offending value: Ahemp.chunks.fasta
Regarding the YAML metadata file, I noticed the comment about the YAML file being ignored by Nextflow, but when I tried running the pipeline without it, it produced the following error:
ERROR ~ Error executing process > 'SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG (Ahemp)'
Caused by:
Process `SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG (Ahemp)` terminated with an error exit status (1)
Command executed:
btk pipeline \
generate-config \
Ahemp \
\
--reads Ahemp
cat <<-END_VERSIONS > versions.yml
"SANGERTOL_BLOBTOOLKIT:BLOBTOOLKIT:INPUT_CHECK:BLOBTOOLKIT_CONFIG":
blobtoolkit: $(btk --version | cut -d' ' -f2 | sed 's/v//')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
2024-01-31 11:03:34.557 [INFO] Fetching assembly metadata
Traceback (most recent call last):
File "/opt/conda/envs/btk_env/bin/btk", line 8, in <module>
sys.exit(cli())
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/btk/btk.py", line 80, in cli
sys.exit(subcommand())
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/btk/lib/pipeline.py", line 11, in main
cli("btk pipeline")
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/blobtoolkit_pipeline.py", line 52, in cli
sys.exit(subcommand(rename))
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/lib/generate_config.py", line 736, in main
meta = parse_assembly_meta(accession)
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/lib/generate_config.py", line 399, in parse_assembly_meta
root = ET.fromstring(xml)
File "/opt/conda/envs/btk_env/lib/python3.9/site-packages/defusedxml/common.py", line 126, in fromstring
parser.feed(text)
File "/opt/conda/envs/btk_env/lib/python3.9/xml/etree/ElementTree.py", line 1717, in feed
self.parser.Parse(data, False)
TypeError: a bytes-like object is required, not 'NoneType'
Work dir:
/home/sharafa/Ahmep_genome/work/c9/da89ad80fa3fcc0d1d699592ddba96
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
- Can you print the file .command.out that is in the work directory of the BLAST_BLASTN job? There will be a line that starts with Using, and it should contain the name of the directory in which you have the NT database, followed by /nt.
- TypeError: a bytes-like object is required, not 'NoneType' makes sense in that context, since you have a pre-curation assembly. It's meant to be saying that Ahemp is not a valid accession number. I'll raise an error upstream so that the tool prints a clearer error message.
- Can you paste your sample-sheet? I wonder if something's not being parsed correctly.
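For the TypeError point, the kind of upstream check meant here could look roughly like the sketch below. It assumes, based on the traceback, that the XML passed to the parser was None because no metadata was returned for Ahemp. The function name parse_assembly_xml is hypothetical, and the standard library parser is used instead of defusedxml to keep the example self-contained; this is not the actual btk code.

```python
import xml.etree.ElementTree as ET


def parse_assembly_xml(xml, accession):
    """Parse assembly metadata, failing early with a readable message.

    Hypothetical guard: `xml` is assumed to be None when the lookup
    returned nothing for `accession`.
    """
    if xml is None:
        raise ValueError(
            f"No assembly metadata found for {accession!r}: "
            "is it a valid accession number?"
        )
    return ET.fromstring(xml)
```

With a guard like this, running on a non-accessioned assembly would report the accession problem directly instead of a TypeError from deep inside the XML parser.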
- I can re-produce this error again. Maybe I used the wrong branch.
- Then what should I use for a draft genome? I used Ahemp as TAG.
- Here is my sample-sheet:
sample,datatype,datafile
Ahemp,ont,/home/sharafa/Ahmep_genome/ont.cram
Hi @abdo3a . Sorry for the silence !
I've given the pipeline a go with a config file and managed to get it to work on a non-accessioned genome.
First, here is the minimal Yaml file I had to provide:
assembly:
  level: bar
settings:
  foo: 0
similarity:
  diamond_blastx:
    foo: 0
taxon:
  class: class_name
  family: family_name
  genus: genus_name
  kingdom: kingdom_name
  name: species_name
  order: order_name
  phylum: phylum_name
  superkingdom: superkingdom_name
  taxid: 0
All those keys have to be present, but the values are ignored and don't matter. Everything else you would find in a typical BlobToolKit yaml file is superfluous.
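If you want to check a file before launching the pipeline, something along these lines could work. It is only a sketch: the key paths are copied from the minimal file above, the function name missing_paths is made up, and it expects the Yaml to be already loaded into a Python dict (for example with PyYAML's yaml.safe_load, which is an assumption about your environment). Values are not checked, since they are ignored.

```python
# Key paths copied from the minimal Yaml file in this comment.
REQUIRED_PATHS = [
    ("assembly", "level"),
    ("settings", "foo"),
    ("similarity", "diamond_blastx", "foo"),
] + [
    ("taxon", key)
    for key in (
        "class", "family", "genus", "kingdom", "name",
        "order", "phylum", "superkingdom", "taxid",
    )
]


def missing_paths(config):
    """Return the dotted key paths that are absent from a loaded config dict."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            # Stop at the first level where the key cannot be found.
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

An empty list means every key from the minimal file is present.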
Then, to run the pipeline, your command-line from #91 (comment) should work, i.e.:
- Provide this Yaml file to the --yaml parameter.
- Give the name of your assembly as --accession. That's used to name various files.
- Give the name of your species as --taxon. That's used to select the relevant Busco lineages.
- Give your samplesheet as --input. The one from #91 (comment) seems fine.
For completeness, here is the command I've used for my tests:
nextflow run ~/workspace/tol-it/nextflow/sanger-tol/blobtoolkit -profile test,singularity --yaml $PWD/test.yml --accession draft
I mentioned the branch fixes_for_prod in an earlier comment. This branch has now been merged into the dev branch, and we'll make the 0.3.0 release out of it very soon.
Hello @muffato,
Just reporting that the pipeline works now after following your suggestion. Thanks, I will close this.
Super, thank you for confirming, @abdo3a .
The next version will simplify usage on draft assemblies.