Comments (4)
Hi,
There is a warning message at the beginning of that output that might help out:
INFO Checking if sequence data files exist and if sequence IDs are compatible with wgd pipeline...
WARNING Poa trivialis: sequence IDs in FASTA file [cds_final.fasta] could raise an error due to:
WARNING - ID length longer than 50 characters, it is advised to shorten them
WARNING - ID name contains one or more characters that are not allowed: =
INFO Completed
We noticed in the past that long sequence names, or sequence names with unusual characters (like =), can cause problems. We therefore suggest shortening the sequence names (while making sure that each one remains unique) and removing any spaces and unusual characters (in this case the =).
Afterwards, delete the already generated BLAST table and MCL file and rerun the paralogs-ks command from scratch.
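The ID cleanup described above could be sketched in Python roughly as follows. This is only an illustration, not part of ksrates: the function name `clean_fasta_ids` is hypothetical, the 50-character limit is taken from the warning message, and the set of "safe" characters (letters, digits, `_`, `.`, `-`) is an assumption, since the warning only names `=` explicitly.

```python
import re

MAX_ID_LEN = 50  # limit quoted in the ksrates warning
# Assumption: anything outside this set is treated as unsafe; the warning
# only names "=" and does not list the full set of allowed characters.
UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")


def clean_fasta_ids(lines, max_len=MAX_ID_LEN):
    """Return (new_lines, mapping) with shortened, sanitized, unique IDs.

    The ID is taken to be the first whitespace-delimited token of each
    header line; any description after it is dropped.
    """
    seen = set()
    mapping = {}  # original ID -> new ID, useful for tracing back later
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            out.append(line)  # sequence lines are passed through unchanged
            continue
        fields = line[1:].split()
        old_id = fields[0] if fields else ""
        base = UNSAFE.sub("_", old_id)[:max_len] or "seq"
        new_id = base
        n = 1
        # If truncation made two IDs identical, append a numeric suffix
        # while staying within the length limit.
        while new_id in seen:
            suffix = f"_{n}"
            new_id = base[: max_len - len(suffix)] + suffix
            n += 1
        seen.add(new_id)
        mapping[old_id] = new_id
        out.append(">" + new_id)
    return out, mapping
```

To apply it to a file, one could read the lines with `open(path).read().splitlines()`, pass them to `clean_fasta_ids`, and write the returned lines to a new FASTA file. Keeping the returned mapping makes it possible to translate the shortened IDs back to the original names afterwards.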
I hope this can be of help, let me know how it goes!
Cecilia
from ksrates.
Hi Cecilia
Thank you for your quick reply. I edited the FASTA file to remove the equal signs and shorten the sequence IDs while keeping them unique, but I am still getting the same error. Any other suggestions would be appreciated. Thank you!
(WGD) [cpb5881@p-sc-2001 wgd]$ ksrates paralogs-ks config_filename.txt --n-threads 16
INFO - - - - - - - - - - - - - - - - - - - -
INFO Paralog wgd analysis for species [poatr]
INFO Thu Jun 2 10:43:48 2022
INFO - - - - - - - - - - - - - - - - - - - -
INFO Checking if sequence data files exist and if sequence IDs are compatible with wgd pipeline...
INFO Completed
INFO Creating directory [paralog_distributions/]
INFO Running wgd paralog Ks pipeline...
INFO ---
INFO Checking external software...
INFO makeblastdb: 2.5.0+
INFO blastp: 2.5.0+
INFO mcl 14-137
INFO muscle 5.1.linux64 []
INFO AAML in paml version 4.9j, February 2020
INFO Usage for FastTree version 2.1.11 Double precision (No SSE3):
INFO Creating output directory /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr
INFO Translating CDS file cds_final_edited_edited_edited_edited.fasta...
INFO ---
INFO Running all versus all Blastp
INFO Writing protein Blastdb sequences to /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/...
INFO Writing protein query sequences to /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/...
INFO Performing all versus all Blastp (this might take a while)...
INFO Making Blastdb
INFO makeblastdb -in /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta -dbtype prot
INFO makeblastdb output:
Building a new DB, current time: 06/02/2022 10:43:55
New DB name: /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta
New DB title: /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 33248 sequences in 1.24499 seconds.
INFO Running Blastp
INFO blastp -db /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.db.fasta -query /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast_tmp/poatr.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 16 -out /storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.blast.tsv
INFO All versus all Blastp done
INFO Removing tmp directory
INFO ---
INFO Running gene family construction (MCL clustering with inflation factor = 2.0)
INFO Started MCL clustering (mcl)
INFO ---
INFO Running whole paranome Ks analysis...
WARNING Filtered out the 3 largest gene families because their size is > 200
WARNING If you want to analyse these large families anyhow, please raise the `max_gene_family_size` parameter
INFO Started analysis of 5019 gene families in parallel using 16 threads
INFO Performing analysis on gene family GF_000004 (size 165)
ERROR Unexpected internal error during analysis of gene family GF_000004:
Traceback (most recent call last):
File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/ks_distribution.py", line 278, in analyse_family_try_except
analysis_function(family_id, family, nucleotide, tmp, codeml, preserve,
File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/ks_distribution.py", line 371, in analyse_family
msa_path, stats, successful = prepare_aln(msa_path_protein, nucleotide)
File "/storage/home/cpb5881/.local/lib/python3.8/site-packages/wgd_ksrates/alignment.py", line 43, in prepare_aln
with open(msa_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp/GF_000004.fasta.msa'
ERROR Skipping gene family
INFO Performing analysis on gene family GF_000005 (size 138)
... [same error for the other 54 gene families]
FileNotFoundError: [Errno 2] No such file or directory: '/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp/GF_000054.fasta.msa'
ERROR Skipping gene family
ERROR Too many gene family analyses failed, terminating threads...
ERROR Too many gene family analyses failed, terminating threads...
ERROR Too many gene family analyses failed, terminating threads...
ERROR Too many gene family analyses failed, terminating threads...
ERROR Too many gene family analyses failed, terminating threads...
ERROR Too many gene family analyses failed, terminating threads...
ERROR --
ERROR The analyses of more than 1% of gene families [51/5019] have failed due to unexpected internal errors
ERROR Please check the nature of the error(s), remove the tmp directory [/storage/group/cpb5881/default/WGD/wgd/paralog_distributions/wgd_poatr/poatr.ks_tmp] and rerun the Ks analysis
ERROR See the tracebacks above for the following gene family IDs:
ERROR GF_000004
ERROR GF_000005
ERROR GF_000006
ERROR GF_000007
ERROR GF_000008
ERROR GF_000009
ERROR GF_000010
ERROR GF_000011
ERROR GF_000012
ERROR GF_000013
ERROR GF_000014
ERROR GF_000015
ERROR GF_000016
ERROR GF_000017
ERROR GF_000018
ERROR GF_000019
ERROR GF_000020
ERROR GF_000021
ERROR GF_000022
ERROR GF_000023
ERROR GF_000024
ERROR GF_000025
ERROR GF_000026
ERROR GF_000027
ERROR GF_000028
ERROR GF_000029
ERROR GF_000030
ERROR GF_000031
ERROR GF_000032
ERROR GF_000033
ERROR GF_000034
ERROR GF_000035
ERROR GF_000036
ERROR GF_000037
ERROR GF_000038
ERROR GF_000039
ERROR GF_000040
ERROR GF_000041
ERROR GF_000042
ERROR GF_000043
ERROR GF_000044
ERROR GF_000045
ERROR GF_000046
ERROR GF_000047
ERROR GF_000048
ERROR GF_000049
ERROR GF_000050
ERROR GF_000051
ERROR GF_000052
ERROR GF_000053
ERROR GF_000054
ERROR Exiting
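A log like the one above can be summarized with a few lines of Python to see which gene families failed and which files were reported missing. This is a minimal sketch that assumes the log has been saved to a text file and that its lines follow the wording shown above; the function name `summarize_failures` is hypothetical and not part of ksrates.

```python
import re

# Patterns based on the wording of the log lines shown above.
FAILED = re.compile(
    r"ERROR\s+Unexpected internal error during analysis of gene family (GF_\d+)"
)
MISSING = re.compile(r"FileNotFoundError: .*?'([^']+)'")


def summarize_failures(log_text):
    """Return (sorted failed family IDs, sorted missing file paths)."""
    families = sorted(set(FAILED.findall(log_text)))
    missing = sorted(set(MISSING.findall(log_text)))
    return families, missing
```

If every missing path ends in `.fasta.msa`, as it does here, the alignment step did not produce its output file for any family, which suggests looking at the aligner rather than at the individual gene families.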
from ksrates.
I think I figured out what the issue was (or at least what solved it). I downgraded muscle from 5.1 to 3.8; this step now completes and a Ks file is generated.
from ksrates.
Hi,
Great that you could find the cause! Sorry about this versioning issue; we indeed developed and tested ksrates with muscle 3.8.31.
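A quick check of the installed muscle version before starting a run could look like the sketch below. It is not part of ksrates. The function names are hypothetical, and it assumes that `muscle -version` prints a banner containing the version number, such as the `muscle 5.1.linux64 []` string in the log above or the `MUSCLE v3.8.31 by Robert C. Edgar` banner printed by version 3.8.31.

```python
import re
import subprocess


def muscle_major_version(banner):
    """Extract the major version from a muscle version banner, or None."""
    match = re.search(r"(\d+)\.(\d+)", banner)
    return int(match.group(1)) if match else None


def installed_muscle_banner():
    """Return the banner printed by the muscle found on PATH.

    Assumption: the executable is called `muscle` and accepts `-version`.
    Raises FileNotFoundError if no such executable is on PATH.
    """
    result = subprocess.run(
        ["muscle", "-version"], capture_output=True, text=True
    )
    return (result.stdout or result.stderr).strip()
```

Calling `muscle_major_version(installed_muscle_banner())` and comparing the result with 3 would flag an environment like the one in this thread, where version 5.1 was installed, before any gene family is analysed.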
Cecilia
from ksrates.