
arzwa / wgd


Python package and CLI for whole-genome duplication related analyses. This package is deprecated in favor of https://github.com/heche-psb/wgd.

Home Page: http://wgd.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Jupyter Notebook 29.66% Python 70.07% Singularity 0.28%
wgd duplication polyploidy bioinformatics genomics evolution

wgd's People

Contributors

arzwa, cecilia-sensalari, heche-psb


wgd's Issues

Plotting Ks distributions

Hi,
Sorry for disturbing you again.

I tried wgd last week, and it is a great tool! While using it, I came up with two questions:

  1. How to plot Ks distribution of one-to-one orthologs of C. papaya and A. thaliana, as shown in Fig1. A and E in "wgd—simple command line tools for the analysis of ancient whole-genome duplications"?
  2. Can I use the result of ksd directly to plot the Ks distribution?

Greetings!
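On the second question, a minimal sketch of binning Ks values from a tab-separated table may help. It assumes the ksd result is a TSV with a header row containing a column named `Ks`; the column name, the Ks range and the bin count are assumptions to adapt, and the inline example table is made up for illustration.

```python
import csv
import io


def ks_histogram(tsv_text, column="Ks", ks_min=0.0, ks_max=5.0, n_bins=50):
    """Bin Ks values from tab-separated text into equal-width bins.

    Returns (edges, counts). Rows with a missing, non-numeric, NaN or
    out-of-range Ks value are skipped.
    """
    width = (ks_max - ks_min) / n_bins
    counts = [0] * n_bins
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            ks = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if ks != ks or not (ks_min <= ks < ks_max):  # ks != ks is True for NaN
            continue
        index = min(int((ks - ks_min) / width), n_bins - 1)
        counts[index] += 1
    edges = [ks_min + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up example table; a real run would read the ksd output file instead.
example = "pair\tKs\nA__B\t0.12\nA__C\t0.18\nB__C\t1.40\nC__D\tnan\n"
edges, counts = ks_histogram(example, ks_min=0.0, ks_max=2.0, n_bins=4)
print(edges)
print(counts)
```

The resulting edges and counts can be passed to any plotting library, for example as a bar chart with one bar per bin.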

encoding error

Hi, I used wgd with the command:

 ksd --n_threads 8   $DIR/final_cds.out/format.final.cds.fa.blast.tsv.mcl    $DIR/format.final.cds.fa

 and got an error like this:
2020-10-08 10:07:41: INFO	Performing analysis on gene family GF_000280
--- Logging error ---
Traceback (most recent call last):
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/logging/__init__.py", line 994, in emit
    stream.write(msg)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/encodings/iso8859_15.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c9' in position 32: character maps to <undefined>
Call stack:
  File "/ds3512/home/panyp/ruanjian/python3/bin/wgd", line 11, in <module>
    load_entry_point('wgd==1.1', 'console_scripts', 'wgd')()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd_cli.py", line 632, in ksd
    max_pairwise=max_pairwise
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd_cli.py", line 773, in ksd_
    max_pairwise=max_pairwise,
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/ks_distribution.py", line 645, in ks_analysis_paranome
    ) for family in sorted_families)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 749, in __call__
    n_jobs = self._initialize_backend()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 547, in _initialize_backend
    **self._backend_args)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 317, in configure
    self._pool = MemmapingPool(n_jobs, **backend_args)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/pool.py", line 600, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/pool.py", line 420, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
    self._repopulate_pool()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
    w.start()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/ks_distribution.py", line 289, in analyse_family
    os.path.basename(msa_path), preserve=preserve, times=times)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/codeml.py", line 312, in run_codeml
    d, likelihood = _parse_codeml_out(self.out_file)
  File "/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.1-py3.6.egg/wgd/codeml.py", line 127, in _parse_codeml_out
    logging.warning("No \u03c9 value for {0} - {1}!".format(gene_1, gene_2))
Message: 'No \u03c9 value for Maker00014056 - Maker00020689!'

My input files are a FASTA file and a GFF file like these:

>Maker00026579
ATGGCCACAGGAAAGCGGAAACTCACATTCATAGCCAACGATTCTCAAAG
AAAAACAGTATGCAAGAAAAGGAAGCAGTCACTGCTGAAGAAAACGGAGG
AACTCAGCACCCTTTGTGGCGTTGAAGCATGTGCTATAGTTTATGGCCCC
AATGATCATCGGCCAGAGATCTGGCCATCTGAATCGGGTGTCAAAAATGT
ACTGGGAAAGTTCATGAACAAGCCACAATGGGAGCAAAGCAAAAAGATGA
TGAACCAAGAGAGTTTCATTGCACAAAGTATCATGAAGAGTAAAGACAAG
TTACAGAAAGTTGTGAAGGAAAACAAGGAGATTGAAATGTCCTTGTTCAT
GGCTCAGTGCTTTCAGACAGGTATGTTTCAGCCTGATATCAATATGACCG
CAGCTGATATGAATGTTCTTTCATCGGAGATTGAACAGAACCTGAAGGAC
ATTGATAAAAGGATGGAAATGCTGAAAGCCAACCAGGTGACACCAAACCA
ACCCGATATTGAATCGTCAACATTCCAACCCCAGATAATGCAAACATCAG
CATTCCAACCCCAGATTCAAATACCAGCATTCGAAACCCAGATCCAAACA
CAAACATACCAATCCCAGATGGAAACACCAACATTTCAACCCCAGATGCA
ATCACCAGCATTATTCCAACCCCAGATACAAACTGCATCATACCAACCCC
ATATGCAAACACAGTCATACCATCCCCATATGCAAGCACCATCATTCCCA

and

 HiC_scaffold_3  maker   mRNA    52904539        52906776        .       -       .       ID=Maker00000001;_AED=0.83;_eAED=0.85;_QI=0|0|0|0.5|0|0|2|0|95;
HiC_scaffold_3  maker   CDS     52906635        52906776        .       -       0       Parent=Maker00000001;
HiC_scaffold_3  maker   CDS     52904539        52904684        .       -       2       Parent=Maker00000001;
HiC_scaffold_3  maker   mRNA    52889299        52891610        .       -       .       ID=Maker00000002;_AED=0.81;_eAED=1.00;_QI=0|0|0|0.5|0|0.5|2|0|82;
HiC_scaffold_3  maker   CDS     52891518        52891610        .       -       0       Parent=Maker00000002;
HiC_scaffold_3  maker   CDS     52889299        52889454        .       -       0       Parent=Maker00000002;
HiC_scaffold_3  maker   mRNA    52850941        52853577        .       -       .       ID=Maker00000003;_AED=0.56;_eAED=0.66;_QI=0|0|0|1|0|0|2|0|128;
HiC_scaffold_3  maker   CDS     52853434        52853577        .       -       0       Parent=Maker00000003;
HiC_scaffold_3  maker   CDS     52850941        52851183        .       -       0       Parent=Maker00000003;
HiC_scaffold_3  maker   mRNA    52876283        52881803        .       +       .       ID=Maker00000004;_AED=0.87;_eAED=1.00;_QI=0|0|0|0.5|0|0|2|453|51;

Could you help me fix it? Thank you very much.
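The traceback above shows the log stream encoding its output with `iso8859_15`, which has no mapping for the Greek letter ω (`\u03c9`) used in the warning message. The sketch below only reproduces that mechanism with a placeholder message (`gene_1`, `gene_2` are illustrative names); it is not taken from wgd itself. A commonly used workaround, offered here as a suggestion rather than a confirmed fix, is to run under a UTF-8 locale or to set the `PYTHONIOENCODING=utf-8` environment variable before starting the command.

```python
import io

# Same shape as the message in the traceback, with placeholder gene names.
message = "No \u03c9 value for gene_1 - gene_2!"


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(message, "iso8859_15"))
print(can_encode(message, "utf-8"))

# A text stream created with errors="backslashreplace" writes an escape
# sequence instead of raising when a character cannot be encoded.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="iso8859_15", errors="backslashreplace")
stream.write(message)
stream.flush()
print(raw.getvalue())
```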

standard error of Ks from codeml

Hello,

I really enjoy using this tool for my Ks analysis. I need the standard error of Ks, and it would be ideal to obtain it during the Ks analysis with wgd. As a crude workaround, I set getSE = 1 in codeml.py and ran with --preserve to keep the codeml results. With that change, wgd ksd failed before producing Ks results for most gene families, although a few families still succeeded. I have attached the error report. The error is triggered by getSE = 1 in codeml.py; the run completes normally with getSE = 0.
Do you have any advice?

Thank you very much for your help!

nohup.txt

paranome is empty

Hi

I am having a problem running wgd dmd on some of my samples. The diamond result in the wgd_dmd directory is empty. This happens for only some genomes; others work well. I supposed that there might be no paralogs, but when I ran diamond directly I found paralogs.

I would appreciate any help.

The verbosity debug log is attached.

Thank you.
dmd_log.txt

Downstream analyses after wgd - dating WGD ?

Hi,
It's not really an issue. I would like to date the WGD events using the method in Vanneste et al. (2014), using the anchors identified by wgd. As both wgd and the method used by Vanneste et al. came from the same lab, I hope there is some congruence or integration in the output file formats.
So, can I use outputs from wgd as input to inparanoid, or is there an easier pipeline that would integrate the output of wgd into the Vanneste et al. method? (The method used in Vanneste et al. seems entirely custom-made, or at least mostly manual.)

wgd viz error

Hi,
Sorry for disturbing you.
wgd is a good tool for WGD analysis. However, I am puzzled by an error from wgd viz.

$ wgd viz -i -ks cds.cpa.fasta.ks.tsv,cds.mtr.fasta.ks.tsv 
BokehDeprecationWarning: 'WidgetBox' is deprecated and will be removed in Bokeh 3.0, use 'bokeh.models.Column' instead
[0]
/public-supool/home/shenchen/.local/lib/python3.6/site-packages/bokeh/models/plots.py:764: UserWarning: 
You are attempting to set `plot.legend.items` on a plot that has zero legends added, this will have no effect.

Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.

  warnings.warn(_LEGEND_EMPTY_WARNING % attr)
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
BokehDeprecationWarning: 'legend' keyword is deprecated, use explicit 'legend_label', 'legend_field', or 'legend_group' keywords instead
Traceback (most recent call last):
  File "/public-supool/home/shenchen/.local/bin/wgd", line 11, in <module>
    load_entry_point('wgd==1.1', 'console_scripts', 'wgd')()
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/wgd_cli.py", line 1265, in viz
    output_file, filters, ks_range, bins, interactive, weighted
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/wgd_cli.py", line 1329, in viz_
    histogram_bokeh(dists_files, labels)
  File "/public-supool/home/shenchen/.local/lib/python3.6/site-packages/wgd/viz.py", line 623, in histogram_bokeh
    session.loop_until_closed()  # run forever
AttributeError: 'ClientSession' object has no attribute 'loop_until_closed'

My computer skills are limited, so I hope you can tell me what I should do.
Thank you very much!
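The final `AttributeError` points at the installed Bokeh rather than at the input files. To the best of my knowledge, `ClientSession.loop_until_closed` was deprecated in the Bokeh 1.x series and removed in Bokeh 2.0, so the interactive mode would need an older Bokeh; treat that version boundary as an assumption to verify against the Bokeh release notes. A small check along those lines:

```python
def has_loop_until_closed(bokeh_version):
    """Guess whether this Bokeh version still ships loop_until_closed.

    Assumption: the method was removed in Bokeh 2.0, so any major
    version below 2 is treated as compatible.
    """
    major = int(bokeh_version.split(".")[0])
    return major < 2


try:
    import bokeh
    print(bokeh.__version__, has_loop_until_closed(bokeh.__version__))
except ImportError:
    print("bokeh is not installed in this environment")
```

If the check reports an incompatible version, installing a 1.x release into the same environment (for example with `pip install "bokeh<2"`) is one possible way forward; the non-interactive plots should not depend on this method.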

issue on mcl stage

Hi, I'm trying to run the mcl stage on a genome I received, and the wgd_blast directory is empty after the stage.
I'm not sure why.
This is what I get on the screen:

2019-08-15 16:53:27: INFO
2019-08-15 16:53:27: INFO
2019-08-15 16:53:27: INFO       mcl 14-137
Copyright (c) 1999-2014, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
2019-08-15 16:53:27: INFO       CDS sequences provided, will first translate.
Sequence length != multiple of 3 for chr1!
In-frame STOP codon in chr1 at position 0:3
In-frame STOP codon in chr2 at position 27:30
Sequence length != multiple of 3 for chr3!
In-frame STOP codon in chr3 at position 45:48
In-frame STOP codon in chr4 at position 51:54
Sequence length != multiple of 3 for chr5!
In-frame STOP codon in chr5 at position 3:6
In-frame STOP codon in chr6 at position 57:60
Sequence length != multiple of 3 for chr7!
In-frame STOP codon in chr7 at position 171:174
Sequence length != multiple of 3 for chr8!
In-frame STOP codon in chr8 at position 45:48
In-frame STOP codon in chr9 at position 231:234
Sequence length != multiple of 3 for chr10!
In-frame STOP codon in chr10 at position 60:63
Sequence length != multiple of 3 for chr11!
In-frame STOP codon in chr11 at position 42:45
Sequence length != multiple of 3 for chr12!
In-frame STOP codon in chr12 at position 45:48
Sequence length != multiple of 3 for chr13!
In-frame STOP codon in chr13 at position 117:120
In-frame STOP codon in chrUn at position 9:12
100% (15 of 15) |########################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
2019-08-15 16:53:37: WARNING    There were 23 warnings during translation
2019-08-15 16:53:37: INFO       Writing blastdb sequences to db.fasta.
2019-08-15 16:53:37: INFO       Writing query sequences to query.fasta.
2019-08-15 16:53:37: INFO       Performing all-vs.-all Blastp (this might take a while)
2019-08-15 16:53:37: INFO       Making Blastdb
makeblastdb: symbol lookup error: /powerapps/share/mpi/openmpi-1.10.4.c7/lib/libmpi_cxx.so.1: undefined symbol: ompi_mpi_char
2019-08-15 16:53:37: INFO       Running Blastp
2019-08-15 16:53:37: INFO       blastp -db wgd_blast/37a19200e9d1ae.db.fasta -query wgd_blast/37a19200ea5b22.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 4 -out wgd_blast/botznik-chr.fa.blast.tsv
2019-08-15 16:53:37: INFO       All versus all Blastp done
rm: cannot remove ‘wgd_blast/37a19200e9d1ae.db.fasta.phr’: No such file or directory
rm: cannot remove ‘wgd_blast/37a19200e9d1ae.db.fasta.pin’: No such file or directory
rm: cannot remove ‘wgd_blast/37a19200e9d1ae.db.fasta.psq’: No such file or directory
2019-08-15 16:53:37: INFO       Blast done
2019-08-15 16:53:37: INFO       Performing MCL clustering (inflation factor = 2.0)
Traceback (most recent call last):
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/bin/wgd", line 11, in <module>
    sys.exit(cli())
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/wgd_cli.py", line 293, in mcl
    inflation_factor, eval_cutoff, output_dir, n_threads)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/wgd_cli.py", line 442, in blast_mcl
    ava_graph = ava_blast_to_abc(blast_results)
  File "/powerapps/share/centos7/miniconda/miniconda3-4.5.11/lib/python3.7/site-packages/wgd/blast_mcl.py", line 137, in ava_blast_to_abc
    with open(ava_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'wgd_blast/botznik-chr.fa.blast.tsv'
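The translation warnings in this log name sequences `chr1` to `chrUn`, which suggests (though the issue does not confirm it) that whole chromosomes were supplied where coding sequences were expected. The sketch below mimics the two kinds of warning shown in the log for a single sequence. It assumes the standard genetic code and does not reproduce wgd's own translation routine; `check_cds` is a hypothetical helper name.

```python
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq):
    """Return a list of problems that make seq unlikely to be a clean CDS."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    usable = len(seq) - len(seq) % 3
    # The last complete codon is not checked, since a terminal stop is expected.
    for i in range(0, usable - 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            problems.append("in-frame stop codon at {}:{}".format(i, i + 3))
            break  # report only the first one, as the log above does
    return problems


print(check_cds("ATGGCCTAA"))
print(check_cds("ATGTAAGCCTAA"))
print(check_cds("ATGGC"))
```

Sequences that come back with an empty list are at least structurally plausible coding sequences; anything else is worth inspecting before running the BLAST step.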

Installation stuck at pandas wheel

Hi, I've been trying to install wgd on my Linux computer with pip3 and Python 3.8.5 (3.8 is the default in Ubuntu 20.04). However, it has trouble building the pandas wheel. I've read online (on Stack Overflow) that this may be caused by an incompatibility between pandas 0.24 and Python 3.8.
Here is what the installation log looked like. It got stuck for some time and I cancelled it (perhaps it just takes 20 minutes, as the post says).

csensa@Data:~/Documents/wgd-1.1.1$ pip3 install .
Processing /home/csensa/Documents/wgd-1.1.1
Requirement already satisfied: click>=7.0 in /usr/lib/python3/dist-packages (from wgd==1.1) (7.0)
Requirement already satisfied: biopython>=1.75 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (1.78)
Requirement already satisfied: seaborn>=0.9.0 in /usr/lib/python3/dist-packages (from wgd==1.1) (0.10.0)
Requirement already satisfied: coloredlogs>=10.0 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (14.0)
Requirement already satisfied: fastcluster==1.1.25 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (1.1.25)
Requirement already satisfied: numpy>=1.16 in /usr/lib/python3/dist-packages (from wgd==1.1) (1.17.4)
Requirement already satisfied: sklearn in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (0.0)
Requirement already satisfied: scipy>=1.2 in /usr/lib/python3/dist-packages (from wgd==1.1) (1.3.3)
Requirement already satisfied: matplotlib>=3.0.2 in /usr/lib/python3/dist-packages (from wgd==1.1) (3.1.2)
Requirement already satisfied: plumbum>=1.6.7 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (1.6.9)
Collecting pandas==0.24.1
  Using cached pandas-0.24.1.tar.gz (11.8 MB)
Requirement already satisfied: progressbar2>=3.39 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (3.53.1)
Requirement already satisfied: joblib==0.11 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (0.11)
Requirement already satisfied: ete3>=3.1 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (3.1.2)
Requirement already satisfied: bokeh>=1.0.4 in /home/csensa/.local/lib/python3.8/site-packages (from wgd==1.1) (1.4.0)
Requirement already satisfied: humanfriendly>=7.1 in /home/csensa/.local/lib/python3.8/site-packages (from coloredlogs>=10.0->wgd==1.1) (8.2)
Requirement already satisfied: scikit-learn in /home/csensa/.local/lib/python3.8/site-packages (from sklearn->wgd==1.1) (0.23.2)
Requirement already satisfied: python-dateutil>=2.5.0 in /usr/lib/python3/dist-packages (from pandas==0.24.1->wgd==1.1) (2.7.3)
Requirement already satisfied: pytz>=2011k in /usr/lib/python3/dist-packages (from pandas==0.24.1->wgd==1.1) (2019.3)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from progressbar2>=3.39->wgd==1.1) (1.14.0)
Requirement already satisfied: python-utils>=2.3.0 in /home/csensa/.local/lib/python3.8/site-packages (from progressbar2>=3.39->wgd==1.1) (2.4.0)
Requirement already satisfied: Jinja2>=2.7 in /home/csensa/.local/lib/python3.8/site-packages (from bokeh>=1.0.4->wgd==1.1) (2.11.2)
Requirement already satisfied: tornado>=4.3 in /home/csensa/.local/lib/python3.8/site-packages (from bokeh>=1.0.4->wgd==1.1) (6.0.4)
Requirement already satisfied: pillow>=4.0 in /home/csensa/.local/lib/python3.8/site-packages (from bokeh>=1.0.4->wgd==1.1) (7.2.0)
Requirement already satisfied: packaging>=16.8 in /usr/lib/python3/dist-packages (from bokeh>=1.0.4->wgd==1.1) (20.3)
Requirement already satisfied: PyYAML>=3.10 in /usr/lib/python3/dist-packages (from bokeh>=1.0.4->wgd==1.1) (5.3.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/csensa/.local/lib/python3.8/site-packages (from scikit-learn->sklearn->wgd==1.1) (2.1.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/lib/python3/dist-packages (from Jinja2>=2.7->bokeh>=1.0.4->wgd==1.1) (1.1.0)
Building wheels for collected packages: wgd, pandas
  Building wheel for wgd (setup.py) ... done
  Created wheel for wgd: filename=wgd-1.1-py3-none-any.whl size=74849 sha256=5d5701b55793be0d34affd7b74af9757878a2550c9fcfb1eeb6a204731b4b173
  Stored in directory: /home/csensa/.cache/pip/wheels/c5/17/a8/ba6431f85d4761f80672f2de7c246331f429c6dd10a843c5df
  Building wheel for pandas (setup.py) ... |^canceled

For comparison, I edited setup.py to ask for the latest pandas version (1.2.0), and the installation completed immediately. I tried the example A. thaliana dataset with the mcl and ksd commands and they worked.

running problem

Hi,
I am trying to run cDNA data and the mcl command doesn't create any file.
I get a lot of warnings about codons that contain 'N' and about in-frame stop codons (I can't do much about this; this is the data I got), and then:
100% (84303 of 84303) |#################################################################################################################| Elapsed Time: 0:00:07 Time: 0:00:07
2019-09-12 10:25:43: WARNING There were 139382 warnings during translation
2019-09-12 10:25:43: INFO Writing blastdb sequences to db.fasta.
2019-09-12 10:25:43: INFO Writing query sequences to query.fasta.
2019-09-12 10:25:44: INFO Performing all-vs.-all Blastp (this might take a while)
2019-09-12 10:25:44: INFO Making Blastdb
makeblastdb: symbol lookup error: /powerapps/share/mpi/openmpi-1.10.4.c7/lib/libmpi_cxx.so.1: undefined symbol: ompi_mpi_char
2019-09-12 10:25:44: INFO Running Blastp
2019-09-12 10:25:44: INFO blastp -db wgd_blast/37b75c7448cc96.db.fasta -query wgd_blast/37b75c7464ccc4.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 4 -out wgd_blast/ESTs_Cisavi_2018.fasta.blast.tsv
2019-09-12 10:25:44: INFO All versus all Blastp done
rm: cannot remove ‘wgd_blast/37b75c7448cc96.db.fasta.phr’: No such file or directory
rm: cannot remove ‘wgd_blast/37b75c7448cc96.db.fasta.pin’: No such file or directory
rm: cannot remove ‘wgd_blast/37b75c7448cc96.db.fasta.psq’: No such file or directory
followed by some more messages about trying to perform MCL clustering, but I think the problem is upstream of that.

Could you let me know what I'm doing wrong?
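In the log above, `makeblastdb` itself fails with a symbol lookup error before any database is written, so the later `rm` messages and the missing BLAST table look like consequences of that rather than of the input data. A minimal sketch for checking that the external tools start correctly in the current shell environment is shown below; `tool_runs` is a hypothetical helper, and the check relies on the assumption that a failed library lookup makes the process exit with a non-zero status.

```python
import subprocess
import sys


def tool_runs(command):
    """Return True if the command can be started and exits with status 0."""
    try:
        result = subprocess.run(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except OSError:
        # Raised when the executable is missing or cannot be executed.
        return False
    return result.returncode == 0


# Both BLAST+ programs accept -version; a failure here points at the
# environment (PATH, loaded modules, shared libraries), not at the data.
for tool in (["makeblastdb", "-version"], ["blastp", "-version"]):
    print(" ".join(tool), "->", "ok" if tool_runs(tool) else "FAILED")
```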

NameError: global name 'FileNotFoundError' is not defined

Hi, I am trying to run the all-versus-all Blastp analysis and MCL clustering, but I got this error.
Any suggestions?

File "/rhome/ccoleine/.conda/envs/MIC-test/lib/python2.7/site-packages/wgd_cli.py", line 293, in mcl
    inflation_factor, eval_cutoff, output_dir, n_threads)
  File "/rhome/ccoleine/.conda/envs/MIC-test/lib/python2.7/site-packages/wgd_cli.py", line 334, in blast_mcl
    if can_i_run_software(software) == 1:
  File "/rhome/ccoleine/.conda/envs/MIC-test/lib/python2.7/site-packages/wgd/utils.py", line 67, in can_i_run_software
    except FileNotFoundError:
NameError: global name 'FileNotFoundError' is not defined
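The paths in this traceback show a Python 2.7 environment. `FileNotFoundError` is a built-in exception only from Python 3.3 onwards, which explains the `NameError`; running the tool from a Python 3 environment should avoid it. A small guard of the kind that could be run first (the function name is illustrative):

```python
import sys


def has_file_not_found_error(version_info=sys.version_info):
    """FileNotFoundError exists as a built-in from Python 3.3 onwards."""
    return tuple(version_info[:2]) >= (3, 3)


if not has_file_not_found_error():
    sys.exit("This interpreter is too old; please use a Python 3 environment.")
print("Python {}.{} is recent enough".format(*sys.version_info[:2]))
```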

WGD is not responding on second stage

Hi,

I am writing here to seek your advice on using wgd for my study.

Based on the paper and manual, I first obtained my genome_cds.fasta and then used this command:

wgd mcl --cds --mcl -s genome_cds.fasta -o ./ -n 8

This produced the genome_cds.mcl file as output. I used this output file to run the next step as follows:

wgd ksd -o ./ -n32 --pairwise --wm phyml ../genome_cds.mcl ../genome_cds.fa

This is part of the output from the run:

(py3) amit8chiba@amit8chiba-Precision-Tower-7910:/mnt/md0/genome_cds_final/Comparitive_genomics/wgd/run_results/genome_cds_analysis$ wgd ksd -o ./ -n32 --pairwise --wm phyml ../genome_cds.mcl ../genome_cds.fa
2019-03-02 02:37:05: INFO
2019-03-02 02:37:05: INFO       codeml found
2019-03-02 02:37:05: INFO       MUSCLE v3.8.1551 by Robert C. Edgar
2019-03-02 02:37:05: INFO       . Command line: phyml --version

. This is PhyML version 3.3.20180621.
2019-03-02 02:37:05: WARNING    Output directory exists, will possibly overwrite
2019-03-02 02:37:06: INFO       Translating CDS file
100% (32390 of 32390) |##################################################################################################################################| Elapsed Time: 0:00:07 Time:  0:00:07
2019-03-02 02:37:14: WARNING    There were 0 warnings during translation
2019-03-02 02:37:14: INFO       Started whole paranome Ks analysis
2019-03-02 02:37:31: WARNING    Filtered out the 9 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-03-02 02:37:31: WARNING    If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter.
2019-03-02 02:37:31: INFO       Started analysis in parallel (n_threads = 32)
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000010
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000011
2019-03-02 02:37:31: INFO       Performing analysis on gene family GF_000012
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000013
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000014
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000015
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000016
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000017
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000018
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000019
2019-03-02 02:37:32: INFO       Performing analysis on gene family GF_000020
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000022
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000021
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000023
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000024
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000025
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000026
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000027
2019-03-02 02:37:33: INFO       Performing analysis on gene family GF_000028
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000029
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000030
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000031
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000032
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000033
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000034
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000035
2019-03-02 02:37:34: INFO       Performing analysis on gene family GF_000036
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000038
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000037
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000039
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000041
2019-03-02 02:37:35: INFO       Performing analysis on gene family GF_000042

This step produced several files in the temp directory, but after almost 12 hours the output file has not been generated. The program seems to be stuck, as no new files are being generated, but I cannot see any error. I am wondering whether this run time is expected. My genome is 400 Mb and contains 35000 genes.

Please let me know if you need any further information in order to help me out here.

Thank you so much in advance,

with best regards
Amit
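The log above filters out families for which n*(n-1)/2 exceeds `max_pairwise`. Because the number of pairwise comparisons grows quadratically with family size, a few large families can dominate the run time. The sketch below works through that arithmetic; the family names, sizes and the threshold of 10000 are made-up illustration values, not wgd defaults.

```python
def n_pairs(n):
    """Number of unordered gene pairs in a family of n genes: n*(n-1)/2."""
    return n * (n - 1) // 2


def split_by_max_pairwise(family_sizes, max_pairwise):
    """Split families into those kept and those skipped for being too large."""
    kept, skipped = {}, {}
    for name, n in family_sizes.items():
        target = skipped if n_pairs(n) > max_pairwise else kept
        target[name] = n
    return kept, skipped


# Hypothetical family sizes and threshold, for illustration only.
sizes = {"GF_000001": 300, "GF_000010": 40, "GF_000042": 12}
kept, skipped = split_by_max_pairwise(sizes, max_pairwise=10000)
for name, n in sizes.items():
    print(name, n, n_pairs(n))
print("kept:", sorted(kept), "skipped:", sorted(skipped))
```

A family of 300 genes already needs 44850 pairwise comparisons, while one of 40 genes needs 780, so the first families processed (the largest ones that pass the filter) may take far longer than the rest.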

wgd ksd error

Dear,
When I use wgd ksd, the following error occurs at the end:

/Python/3.5.2/lib/python3.5/site-packages/joblib/externals/loky/process_executor.py:706: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak

Could you help me check this issue? Thank you!

No .ovo.tsv file produced after one_v_one mcl step

Hi,
Sorry to raise another issue; I'm testing each function of wgd extensively ;-)
When I run wgd mcl for the paranome, I get a file "xxxx.blast.tsv.mcl" that I use for the next step (ksd).
If I want to get the one-to-one orthologs among different species, I don't get this file, right?
I tried with my species and Amborella, and I got a file "xxxxxx.ovo.tsv" that I used in step two.
But when I tried with other species, I only got the "xxxxx.blast.tsv" file, which I cannot use in the second step without an error. I didn't get any error message during the first step, just no ovo.tsv.
Do you know what could be the cause of this, and how to modify the .blast.tsv file to use it in the second step (I suspect it only needs minor edits to be usable)?
Thanks a lot in advance!
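One common way to derive one-to-one orthologs from a BLAST table is to keep reciprocal best hits. The sketch below does that for a table in the standard 12-column tabular layout (query, subject, then ten numeric fields ending in bit score). It assumes the table holds only between-species hits in both directions, and it is not a reconstruction of how wgd builds its .ovo.tsv file, whose exact layout is not shown in this issue; gene names and scores are invented.

```python
import csv
import io


def best_hits(rows):
    """Map each query to its highest-scoring subject (bit score in column 12)."""
    best = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query == subject:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: hit[0] for query, hit in best.items()}


def reciprocal_best_hits(blast_text):
    """Return sorted gene pairs that are each other's best hit."""
    reader = csv.reader(io.StringIO(blast_text), delimiter="\t")
    rows = [row for row in reader if len(row) >= 12]
    best = best_hits(rows)
    pairs = set()
    for query, subject in best.items():
        if best.get(subject) == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


def fake_row(query, subject, bitscore):
    """Build one made-up 12-column BLAST line for the example."""
    fields = [query, subject, "90.0", "100", "10", "0",
              "1", "100", "1", "100", "1e-50", str(bitscore)]
    return "\t".join(fields)


table = "\n".join([
    fake_row("a1", "b1", 200), fake_row("a1", "b2", 150),
    fake_row("b1", "a1", 190), fake_row("b2", "a1", 140),
    fake_row("a2", "b2", 90), fake_row("b2", "a2", 80),
]) + "\n"
print(reciprocal_best_hits(table))
```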

wgd mcl error

Hi,

I am running the command

wgd mcl --cds -s NR_Error_corrected_K19_rename.fasta -o ./ -n 8

using PacBio isoform sequencing data, but for some reason the clustering step didn't produce an mcl file. Do you know what the problem is?

Part of the error message is below:

wgd mcl --cds -s NR_Error_corrected_K19_rename.fasta -o ./ -n 8
2019-07-16 13:40:11: INFO	makeblastdb: 2.9.0+
 Package: blast 2.9.0, build Mar 11 2019 15:20:05
2019-07-16 13:40:11: INFO	blastp: 2.9.0+
 Package: blast 2.9.0, build Mar 11 2019 15:20:05
2019-07-16 13:40:11: INFO	CDS sequences provided, will first translate.
Invalid codon gaa in transcript/0
Sequence length != multiple of 3 for transcript/6137!                                                                                     
Invalid codon gTC in transcript/6137
Invalid codon agt in transcript/4090
Sequence length != multiple of 3 for transcript/2047!
Invalid codon ggc in transcript/2047
.
.
.
Ignoring sequence 'lcl|38592' as it has no sequence data
Ignoring sequence 'lcl|38593' as it has no sequence data
Ignoring sequence 'lcl|38594' as it has no sequence data
Ignoring sequence 'lcl|38595' as it has no sequence data

In the end, I have only the BLAST tsv file.
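The codons reported as invalid above (`gaa`, `gTC`, `agt`, `ggc`) are ordinary codons apart from containing lowercase letters, which suggests, without proving it, that the translation step is case-sensitive. Under that assumption, upper-casing the sequence lines before running the command would be worth trying. A minimal sketch follows; the example records are invented. Note that the "length != multiple of 3" warnings are a separate matter, since full-length transcripts usually include untranslated regions and would need their coding regions extracted first.

```python
def uppercase_fasta(text):
    """Upper-case sequence lines of FASTA text, leaving header lines untouched."""
    lines = []
    for line in text.splitlines():
        lines.append(line if line.startswith(">") else line.upper())
    return "\n".join(lines) + "\n"


# Invented two-record example with mixed-case sequence lines.
example = ">transcript/0\natggaa\nGTCagt\n>transcript/1\nggcTAA\n"
print(uppercase_fasta(example), end="")
```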

Issue during MSA (second step)

Hi,
I ran the first step (getting the paranome) without trouble, but the second step failed on the cluster I use.
With this command: wgd --verbosity debug ksd -o ./ -n 12 --pairwise --wm fasttree peptides_cleaned.fa.blast.tsv.mcl ../peptides_cleaned.fa
I got this error:

2019-01-31 19:38:30: INFO	Performing analysis on gene family GF_003161
2019-01-31 19:38:30: DEBUG	Performing MSA (muscle) for GF_003161
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'evm.model.scaf_173.199'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/wgd-1.0/venv/lib/python3.6/site-packages/wgd/ks_distribution.py", line 425, in analyse_family_pairwise
    'Ks': results_dict['Ks'][g1][g2],
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'evm.model.scaf_173.199'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python-3.6.5.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/python-3.6.5/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
KeyError                                           Thu Jan 31 19:23:15 2019
PID: 31068              Python 3.6.5: /usr/local/wgd-1.0/venv/bin/python3.6
...........................................................................
/usr/local/python-3.6.5/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function analyse_family_pairwise>, ('GF_000136', {'evm.model.scaf_117.352': 'MAAASLIFSPCLLLLLFLVSSPSLSARVALLSEEQRQHKQPPLFTHVC...SFSINACKALSMVTESAYVVLPWGTHSISVGDGEKAVTFPVHVSYEFSA', 'evm.model.scaf_147.358': 'MVPAALFSTQLRSALTPSPACLIPGTTNNGKMTSFAFVLLLLLFCIAP...VISPCEHLSKTEEDGSKVLEGGSHFLVVGDEEYQVNIVSSKRNEWSSLV', 'evm.model.scaf_173.199': 'MGGRLPITWYYNDYVKHIPMTSIQLRPDLANKYPARTYKFFDGSVVYP...APTAKFSLRSLRGIEPCHRRCLHLSASTLSLNDGALTSPFSLCFKRLKK', 'evm.model.scaf_173.201': 'MQITGPPKIFRGKSIPCRYSKPVDACGKYTKVNFMRGCNGAKCPHPSWVANAVKTSQTLDATISHVGLDLSTEAEGLDRTDLLLPGF', 'evm.model.scaf_173.202': 'MVNSHRFLGNAPEDAVRQVLSAGLDLDCGDYYRNYALITTNKGNVDNA...DSLGKEDFCSDEHMELATEAARQGTVLLKNDHNTLPLDACNLKSVAAVS', 'evm.model.scaf_173.203': 'MSKTSQLYGLRERGPGLDLDCGDFYPKYLKSAVEQGKVREGDIDKALI...FSINACEALGLVTETAYKVLPWGRHTISIGDGSGAITFPLQVNFKFFSN', 'evm.model.scaf_212.169': 'MAALDLLLLVCISLLIISTSSRTIQPVRRSYPRRGIQTLGMNATNFNH...SKTIPYDLNICESLKVVTGSAYTVVPYGQHTITVGDGDGSISFSFEVKF', 'evm.model.scaf_226.59': 'MAAAVRLISLVLLFSLLSILFSQAQSRPAFACGGGSARTFPFCQTSLP...ARVTVGLDVCKHLSFVDEQGIRRIPIGDHSLHVGDLTHSLSLHVEGTGI', 'evm.model.scaf_244.127': 'MSKTPLLSVLLLLFLLVSPSASHPHRPFACLGPESSLPFCNAALSIPD...VSNVGPGTRFGGKFPAATSFPQVILTAAAFNASLWEEIGRVRSLLSSRR', 'evm.model.scaf_244.128': 'MYNKGWGGLTYWSPNVNIFRDPRWGRGQETPGEDPVVAGKYAASYVRG...QTRVAVNIHVCKHLSVVDTSGIRRIPIGDHSLQVGDLTHSISLLGETLP', ...}, {'evm.model.scaf_1.1': 'ATGTCTTTGAATTTTAATAATTCCTCCAGCACAAAGGATCACTTCCAG...ATATGGATTGCAAGCTGCAGTGGGTGCCATGCTGTCTCCATTATTGTGA', 'evm.model.scaf_1.14': 'ATGCTAGCGAACACCTCGATAAGGGTGCTGTACAAATCTCCCTTCAGA...AGAAGAAGATGAAGAAGGTGAGGGAGATAGAGTAGAAACAAACAAGTAG', 'evm.model.scaf_1.15': 'ATGGACGCTGCTGCGTCTCTGCTTCGCCCATATTCCATCCTCCGGCTG...CTACTGCTTCTTCCAGGGTGCTGGCGACTCGCTAAAATATTCTCGTTGA', 'evm.model.scaf_1.16': 'ATGGTGGATGGCAAAAGAATCGCCCTCGATTTTTGGGGGTTCATCTGG...AGCAGAGGATGAAGCATTGAAAGAAGACTGGCAGAGGATGAAGCATTGA', 'evm.model.scaf_1.19.1': 
'ATGTCCGTCAGCGAAATCGCCTGCACCTACGCCGCCCTGCTTCTATAC...TCCTGGTTTGTTGTATTGTCGTGCAATTAACGATGAAACTTGTCGTTGA', 'evm.model.scaf_1.2': 'ATGATACCAAATATCTTGAGCACATGCCTTCTGTTGGTTCATCGACAA...CACAACAGCAGAGGATGAAGCATTGCAAGCTGTGGATGGGGAGCGCTAG', 'evm.model.scaf_1.22': 'ATGAGAGCCGGGACTAGGTCAATTCGGCTTCAATCTTCCATTCAGGGA...AGGAGAAGAATACTTATTTCCAATGGAAAACCATTTTACAGAAGTATGA', 'evm.model.scaf_1.23': 'ATGGATTTGCTGCAAAATTACTCTGCAAAGAGCGATTCCTCTGATGGC...GAGCAAAGTTGCAACGTGTGGTTGGGACGGTTTGATCAAATACTGGTAA', 'evm.model.scaf_1.27': 'ATGGCAGCTGTAACTGCCTCCCTCTTTGTATCAAGAAGCAACAATTTG...TGGACATGAACTTGCTCCACTTTCAGTGGATAATGTGGCTAATCCTTGA', 'evm.model.scaf_1.28': 'ATGGCAACTGTAAGCGCCTCCCACTTTGTATCAAGAAGTTTCCATTTC...TCATTTTCTCCTTTTCTGTCAAAAATGTCGTGATATTTTTCTTGTATAA', ...}, '/tmp/hinsinger/data_comparative_genomics/paranome/ks_tmp.3707af5a12e8b8', 'codeml', False, 1, 100, 'fasttree', 'muscle', '/tmp/hinsinger/data_comparative_genomics/paranome'), {})]
    132 
    133     def __len__(self):
    134         return self._size
    135 

Then other partial alignments and more messages follow.
How can I fix it? It doesn't seem related to the cluster settings, but rather to the multiprocessing in wgd.
Thanks a lot in advance!

Problem with ksd

Hi - Thank you for making this analysis tool available. I am running an analysis on the genome of a bacterial environmental isolate. Similar to another issue, the ksd command results in multiple errors. I used the .mcl file and the FASTA file containing nucleotide sequences as input for this command. I also tried the --verbosity debug option. Any idea what might cause the problem? I have also attached my input files.

Thanks for your time,

Markus

Here are the commands I used:

wgd mcl --cds --mcl -s Input.fasta -o Input.wgd -n 40
wgd ksd --n_threads 40 Input.fasta.blast.tsv.mcl Input.fasta
wgd --verbosity debug ksd --n_threads 40 Input.fasta.blast.tsv.mcl Input.fasta

These are some of the errors I get, which ultimately result in a "TypeError: 'NoneType' object is not iterable" error:

2019-07-15 22:20:53: INFO	codeml found
2019-07-15 22:20:53: INFO	MUSCLE v3.8.1551 by Robert C. Edgar
2019-07-15 22:20:53: INFO	
2019-07-15 22:20:53: WARNING	Output directory exists, will possibly overwrite
2019-07-15 22:20:53: INFO	Translating CDS file
100% (6427 of 6427) |######################################################################################| Elapsed Time: 0:00:01 Time:  0:00:01
2019-07-15 22:20:55: WARNING	There were 0 warnings during translation
2019-07-15 22:20:55: INFO	Started whole paranome Ks analysis
2019-07-15 22:20:55: WARNING	Filtered out the 0 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-07-15 22:20:55: WARNING	If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2019-07-15 22:20:55: INFO	Started analysis in parallel (n_threads = 40)
2019-07-15 22:20:55: INFO	Performing analysis on gene family GF_000001
2019-07-15 22:20:55: INFO	Performing analysis on gene family GF_000002
2019-07-15 22:20:55: INFO	Performing analysis on gene family GF_000003
2019-07-15 22:20:55: INFO	Performing analysis on gene family GF_000004
........
........
2019-07-15 22:20:56: INFO	Performing analysis on gene family GF_000040
2019-07-15 22:20:57: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000028.codeml not found
2019-07-15 22:20:57: INFO	Performing analysis on gene family GF_000041
2019-07-15 22:20:57: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000020.codeml not found
2019-07-15 22:20:57: INFO	Performing analysis on gene family GF_000042
2019-07-15 22:20:58: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000038.codeml not found
2019-07-15 22:20:58: INFO	Performing analysis on gene family GF_000043
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000032.codeml not found
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000017.codeml not found
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000044
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000045
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000025.codeml not found
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000029.codeml not found
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000035.codeml not found
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000046
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000040.codeml not found
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000047
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000030.codeml not found
2019-07-15 22:20:59: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000041.codeml not found
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000048
2019-07-15 22:20:59: INFO	Performing analysis on gene family GF_000049
........
........
2019-07-15 21:24:26: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000008.codeml not found
2019-07-15 21:24:29: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000010.codeml not found
2019-07-15 21:24:30: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000054.codeml not found
2019-07-15 21:24:35: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000005.codeml not found
2019-07-15 21:24:47: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000007.codeml not found
2019-07-15 21:25:12: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000004.codeml not found
2019-07-15 21:26:00: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000002.codeml not found
2019-07-15 21:26:26: WARNING	Codeml output file /media/LargeStorage/Markus wgd/CG23_2.wgd/ks_tmp.3789a71badfa0c/GF_000001.codeml not found
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/software/miniconda2/envs/wgd/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/software/miniconda2/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/software/miniconda2/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/software/miniconda2/envs/wgd/lib/python3.6/site-packages/wgd/ks_distribution.py", line 287, in analyse_family
    os.path.basename(msa_path), preserve=preserve, times=times)
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/software/miniconda2/envs/wgd/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/software/miniconda2/envs/wgd/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
.
.
.

Input.fasta.blast.tsv.mcl.txt
Input.fasta.txt
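One detail that stands out in the log above is that the working path contains a space ("Markus wgd"). Whether this is what prevents the codeml output files from being written is an assumption, not something the log confirms, but paths with whitespace are a common source of trouble for tools driven through control files. A minimal sketch for spotting such path components follows; `whitespace_components` is a hypothetical helper, not part of wgd.

```python
import os
import re
from typing import List


def whitespace_components(path: str) -> List[str]:
    """Return the components of a path that contain whitespace."""
    parts = os.path.normpath(path).split(os.sep)
    return [part for part in parts if re.search(r"\s", part)]


# Path copied from one of the warnings in the log above.
log_path = "/media/LargeStorage/Markus wgd/CG23_2.wgd/TEXT/GF_000028.codeml"
offending = whitespace_components(log_path)
print(offending)
```

If the list is not empty, rerunning the analysis from a directory whose full path has no spaces is a cheap way to test the hypothesis.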

Install Error

Hi,
When I try to install wgd, it reports the error below.

  ERROR: Could not install packages due to an EnvironmentError: [(/software/wgd/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.idx', '/tmp/pip-req-build-hxhrkvnl/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.idx', "[Errno 13] Permission denied: '/tmp/pip-req-build-hxhrkvnl/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.idx'"), ('/software/wgd/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.pack', '/tmp/pip-req-build-hxhrkvnl/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.pack', "[Errno 13] Permission denied: '/tmp/pip-req-build-hxhrkvnl/.git/objects/pack/pack-0f9dbb53be2abbe014d924ad3322cd845548336c.pack'")]

I do not have root privileges. Can you give me some advice?
Thanks,

result figures are empty

Hi, I ran wgd as follows; format.SoyC09.CDS.fasta and format.SoyC09.gff are my input files:

wgd mcl -n 8 --cds --mcl -s format.SoyC09.CDS.fasta -o SoyC09.CDS.out
wgd ksd --n_threads 8 --pairwise SoyC09.CDS.out/format.SoyC09.CDS.fasta.blast.tsv.mcl format.SoyC09.CDS.fasta
wgd syn  format.SoyC09.gff format.SoyC09.CDS.fasta
wgd kde wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv
wgd mix wgd_ksd/format.SoyC09.CDS.fasta.ks.tsv

I don't find any errors in the log, but the output figures such as ks.svg and dotplot.svg are empty. Could you help me fix this problem? Thank you very much. Here is my log; I deleted some repeated INFO lines because it was too big:

2020-10-31 22:45:08: INFO	makeblastdb stdout: makeblastdb: 2.2.26+
Package: blast 2.2.26, build Feb  9 2012 16:01:46
2020-10-31 22:45:08: INFO	makeblastdb stderr: 
2020-10-31 22:45:08: INFO	blastp stdout: blastp: 2.2.26+
Package: blast 2.2.26, build Feb  9 2012 16:01:46
2020-10-31 22:45:08: INFO	blastp stderr: 
2020-10-31 22:45:09: INFO	mcl stdout: mcl 14-137
Copyright (c) 1999-2014, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
2020-10-31 22:45:09: INFO	mcl stderr: 
2020-10-31 22:45:09: INFO	Output directory: /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out does not exist, will make it.
2020-10-31 22:45:09: INFO	CDS sequences provided, will first translate.
N/A% (0 of 55927) |                      | Elapsed Time: 0:00:00 ETA:  --:--:--
  0% (94 of 55927) |                     | Elapsed Time: 0:00:00 ETA:   0:00:59
 94% (52991 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 94% (53096 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 95% (53334 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 95% (53589 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 96% (53754 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:01
 96% (53945 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:00
 96% (54206 of 55927) |################# | Elapsed Time: 0:00:31 ETA:   0:00:00
 97% (54402 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 97% (54606 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 97% (54805 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 98% (55019 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 98% (55213 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 99% (55420 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
 99% (55623 of 55927) |################# | Elapsed Time: 0:00:32 ETA:   0:00:00
100% (55927 of 55927) |##################| Elapsed Time: 0:00:32 Time:  0:00:32
2020-10-31 22:45:48: WARNING	There were 1 warnings during translation
2020-10-31 22:45:48: INFO	Writing blastdb sequences to db.fasta.
2020-10-31 22:45:48: INFO	Writing query sequences to query.fasta.
2020-10-31 22:45:49: INFO	Performing all-vs.-all Blastp (this might take a while)
2020-10-31 22:45:49: INFO	Making Blastdb


Building a new DB, current time: 10/31/2020 22:45:49
New DB name:   /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta
New DB title:  /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 55926 sequences in 2.31642 seconds.
2020-10-31 22:45:52: INFO	Running Blastp
2020-10-31 22:45:52: INFO	blastp -db /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b02beba6.db.fasta -query /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/38fdb5b09ed912.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 8 -out /ds3512/home/panyp/NN1138-2/04.WGD_data/NN_data/SoyC09.CDS.out/format.SoyC09.CDS.fasta.blast.tsv
2020-11-01 03:00:51: INFO	All versus all Blastp done
2020-11-01 03:00:51: INFO	Blast done
2020-11-01 03:00:52: INFO	Performing MCL clustering (inflation factor = 2.0)
2020-11-01 03:01:05: INFO	Started MCL clustering (mcl)
2020-11-01 03:01:50: INFO	Done
2020-11-01 03:02:01: INFO	codeml stdout: AAML in paml version 4.9j, February 2020
2020-11-01 03:02:01: INFO	codeml stderr: Error: file name empty..
2020-11-01 03:02:01: INFO	codeml found
2020-11-01 03:02:01: INFO	mafft stdout: 
2020-11-01 03:02:01: INFO	mafft stderr: v7.158b (2014/06/27)
2020-11-01 03:02:02: INFO	FastTree stdout: 
2020-11-01 03:02:02: INFO	FastTree stderr: Unknown or incorrect use of option --version
  FastTree protein_alignment > tree
  FastTree < protein_alignment > tree
  FastTree -out tree protein_alignment
  FastTree -nt nucleotide_alignment > tree
  FastTree -nt -gtr < nucleotide_alignment > tree
  FastTree < nucleotide_alignment > tree
FastTree accepts alignments in fasta or phylip interleaved formats

Common options (must be before the alignment file):
  -quiet to suppress reporting information
  -nopr to suppress progress indicator
  -log logfile -- save intermediate trees, settings, and model details
  -fastest -- speed up the neighbor joining phase & reduce memory usage
        (recommended for >50,000 sequences)
  -n <number> to analyze multiple alignments (phylip format only)
        (use for global bootstrap, with seqboot and CompareToBootstrap.pl)
  -nosupport to not compute support values
  -intree newick_file to set the starting tree(s)
  -intree1 newick_file to use this starting tree for all the alignments
        (for faster global bootstrap on huge alignments)
  -pseudo to use pseudocounts (recommended for highly gapped sequences)
  -gtr -- generalized time-reversible model (nucleotide alignments only)
  -lg -- Le-Gascuel 2008 model (amino acid alignments only)
  -wag -- Whelan-And-Goldman 2001 model (amino acid alignments only)
  -quote -- allow spaces and other restricted characters (but not ' ) in
           sequence names and quote names in the output tree (fasta input only;
           FastTree will not be able to read these trees back in)
  -noml to turn off maximum-likelihood
  -nome to turn off minimum-evolution NNIs and SPRs
        (recommended if running additional ML NNIs with -intree)
  -nome -mllen with -intree to optimize branch lengths for a fixed topology
  -cat # to specify the number of rate categories of sites (default 20)
      or -nocat to use constant rates
  -gamma -- after optimizing the tree under the CAT approximation,
      rescale the lengths to optimize the Gamma20 likelihood
  -constraints constraintAlignment to constrain the topology search
       constraintAlignment should have 1s or 0s to indicates splits
  -expert -- see more options
For more information, see http://www.microbesonline.org/fasttree/
2020-11-01 03:02:02: WARNING	Output directory exists, will possibly overwrite
2020-11-01 03:02:02: INFO	Translating CDS file
N/A% (0 of 55927) |                      | Elapsed Time: 0:00:00 ETA:  --:--:--
  0% (166 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:33
  0% (351 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:31
  1% (580 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:28
  1% (708 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:28
  1% (970 of 55927) |                    | Elapsed Time: 0:00:00 ETA:   0:00:25
  2% (1235 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  2% (1416 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  2% (1668 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:23
  3% (1853 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  3% (2014 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  3% (2124 of 55927) |                   | Elapsed Time: 0:00:00 ETA:   0:00:24
  4% (2322 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  4% (2485 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  4% (2648 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (2832 of 55927) |                   | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (3023 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
  5% (3207 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:26
  6% (3410 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
  6% (3540 of 55927) |#                  | Elapsed Time: 0:00:01 ETA:   0:00:25
 95% (53553 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (53725 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (53926 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:01
 96% (54176 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:00
 97% (54356 of 55927) |################# | Elapsed Time: 0:00:27 ETA:   0:00:00
 97% (54512 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 97% (54673 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (54853 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (55032 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 98% (55204 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55395 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55561 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
 99% (55814 of 55927) |################# | Elapsed Time: 0:00:28 ETA:   0:00:00
100% (55927 of 55927) |##################| Elapsed Time: 0:00:28 Time:  0:00:28
2020-11-01 03:02:31: WARNING	There were 1 warnings during translation
2020-11-01 03:02:31: INFO	Started whole paranome Ks analysis
2020-11-01 03:02:31: WARNING	Filtered out the 1 largest gene families because n*(n-1)/2 > `max_pairwise`
2020-11-01 03:02:31: WARNING	If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2020-11-01 03:02:31: INFO	Started analysis in parallel (n_threads = 8)
2020-11-01 03:02:32: INFO	Performing analysis on gene family GF_000002
2020-11-01 03:02:33: INFO	Performing analysis on gene family GF_000003
2020-11-01 03:02:33: INFO	Performing analysis on gene family GF_000004
2020-11-01 03:02:34: INFO	Performing analysis on gene family GF_000005
2020-11-01 03:02:34: INFO	Performing analysis on gene family GF_000006
2020-11-01 03:02:34: INFO	Performing analysis on gene family GF_000007
2020-11-01 03:02:35: INFO	Performing analysis on gene family GF_000008
2020-11-01 03:02:35: INFO	Performing analysis on gene family GF_000009
2020-11-01 03:45:53: INFO	Performing analysis on gene family GF_000010
2020-11-01 03:49:08: INFO	Performing analysis on gene family GF_000011
2020-11-01 03:58:49: INFO	Performing analysis on gene family GF_000012
2020-11-01 04:01:43: INFO	Performing analysis on gene family GF_000013
2020-11-01 04:09:27: INFO	Performing analysis on gene family GF_000014
2020-11-01 04:12:15: INFO	Performing analysis on gene family GF_000015
2020-11-01 06:35:45: INFO	Performing analysis on gene family GF_000306
2020-11-01 06:36:03: INFO	Performing analysis on gene family GF_000307
2020-11-01 06:36:09: INFO	Performing analysis on gene family GF_000308
2020-11-01 06:36:12: INFO	Performing analysis on gene family GF_000309
2020-11-01 06:36:13: INFO	Performing analysis on gene family GF_000310
2020-11-01 06:36:16: INFO	Performing analysis on gene family GF_000311
2020-11-01 06:36:20: INFO	Performing analysis on gene family GF_000312
2020-11-01 06:36:33: INFO	Performing analysis on gene family GF_000313
2020-11-01 06:36:37: INFO	Performing analysis on gene family GF_000314
2020-11-01 06:36:52: INFO	Performing analysis on gene family GF_000315
2020-11-01 08:03:23: INFO	Performing analysis on gene family GF_011430
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011431
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011432
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011433
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011434
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011435
2020-11-01 08:03:24: INFO	Performing analysis on gene family GF_011436
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011437
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011438
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011439
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011440
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011441
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011442
2020-11-01 08:03:25: INFO	Performing analysis on gene family GF_011443
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011444
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011445
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011446
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011447
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011448
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011449
2020-11-01 08:03:26: INFO	Performing analysis on gene family GF_011450
2020-11-01 08:03:27: INFO	Performing analysis on gene family GF_011451
2020-11-01 08:03:27: INFO	Performing analysis on gene family GF_011452
2020-11-01 08:03:28: INFO	Analysis done
2020-11-01 08:03:28: INFO	Making results data frame
2020-11-01 08:13:15: INFO	Removing tmp directory
2020-11-01 08:13:34: INFO	Computing weights, outlier cut-off at Ks > 5
2020-11-01 08:13:34: INFO	Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-11-01 08:13:34: INFO	NumExpr defaulting to 8 threads.
2020-11-01 08:13:39: INFO	Generating plots
2020-11-01 08:13:39: INFO	Will plot **node-weighted** histograms
2020-11-01 08:13:41: INFO	Done
2020-11-01 08:13:45: INFO	i-adhore stdout: This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on compute-0-1.local.
2020-11-01 08:13:45: INFO	i-adhore stderr: Error opening the settings file: -version
2020-11-01 08:13:45: WARNING	Output directory already exists, will possibly overwrite
2020-11-01 08:13:45: INFO	Parsing GFF file
2020-11-01 08:13:48: INFO	Writing gene lists
2020-11-01 08:13:49: INFO	Writing families file
2020-11-01 08:13:51: INFO	Writing configuration file
2020-11-01 08:13:51: INFO	Running I-ADHoRe 3.0
2020-11-01 08:13:55: WARNING	WARNING: Maximum allowed number of gaps in the alignment not specified.  Setting to cluster_gap.
WARNING: Tandem gap size not correct in settings file. Using default (gap_size / 2)

2020-11-01 08:13:55: INFO	
This is i-ADHoRe v3.0.
Copyright (c) 2002-2010, Flanders Interuniversity Institute for Biotechnology, VIB.
Algorithm designed by Klaas Vandepoele, Cedric Simillion, Jan Fostier, Dieter De Witte,
Koen Janssens, Sebastian Proost, Yvan Saeys and Yves Van de Peer.

Process 1/1 is alive on compute-0-1.local.


************* i-ADHoRe parameters *************
	Number of genelists = 54
	Blast table = ./wgd_syn/families.tsv
	Output path = ./wgd_syn/i-adhore-out/
	Gap size = 30
	Cluster gap size = 35
	Cloud gap size = 0
	Cloud cluster gap size = 0
	Max gaps in alignment = 35
	Tandem gap = 15
	Flush output = 1000
	Q-value = 0.75
	Anchorpoints = 3
	Probability cutoff = 0.01
	Cloud filtering method = Binomial
	Level 2 only = false
	Use family = true
	Write statistics = false
	Alignment method = GreedyGraphbased4
	Multiple hypothesis correction = FDR
	Number of threads = 1
	Compare aligners = false
	Collinear searches only
	Visualize GHM.png = false
	Visualize Alignment = true
	Verbose output = true
************ END i-AdDHoRe parameters *********

Creating dataset...			done. (time: 0.0401988s)
Mapping gene families...		done. (time: 0.424732s)
Remapping tandem duplicates...	done. (time: 0.050889s)
Writing genelists file...		done. (time: 0.0950758s)
Collinear Search
Level 2 multiplicon detection...	done. (time: 3.18828s)
Profile detection...
Flushing output files...Visualize AlignedProfiles
done.
Time for Higher Level Detection: 0.00403285s.


All Done!  Bye...



2020-11-01 08:13:55: INFO	Drawing co-linearity dotplot
2020-11-01 08:13:55: INFO	Done
/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd/viz.py:223: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
/ds3512/home/panyp/ruanjian/python3/lib/python3.6/site-packages/wgd-1.2-py3.6.egg/wgd/viz.py:224: UserWarning: Attempting to set identical bottom == top == 0 results in singular transformations; automatically expanding.
2020-11-01 08:14:02: INFO	Preparing data frame
2020-11-01 08:14:03: INFO	 .. max_iter = 1000
2020-11-01 08:14:03: INFO	 .. n_init   = 1
2020-11-01 08:14:03: INFO	Method is GMM, interpret best model with caution!
2020-11-01 08:14:03: INFO	Fitting GMM with 1 components
2020-11-01 08:14:04: INFO	Component mean, variance, weight: 
2020-11-01 08:14:04: INFO	.. 0.283, 1.347, 1.000
2020-11-01 08:14:04: INFO	Fitting GMM with 2 components
2020-11-01 08:14:04: INFO	Component mean, variance, weight: 
2020-11-01 08:14:04: INFO	.. 0.915, 0.486, 0.390
2020-11-01 08:14:04: INFO	.. 0.134, 0.456, 0.610
2020-11-01 08:14:04: INFO	Fitting GMM with 3 components
2020-11-01 08:14:04: INFO	Component mean, variance, weight: 
2020-11-01 08:14:04: INFO	.. 0.656, 0.142, 0.231
2020-11-01 08:14:04: INFO	.. 0.136, 0.458, 0.631
2020-11-01 08:14:04: INFO	.. 1.948, 0.079, 0.138
2020-11-01 08:14:04: INFO	Fitting GMM with 4 components
2020-11-01 08:14:04: INFO	Component mean, variance, weight: 
2020-11-01 08:14:04: INFO	.. 0.130, 0.165, 0.467
2020-11-01 08:14:04: INFO	.. 1.874, 0.098, 0.147
2020-11-01 08:14:04: INFO	.. 0.065, 0.947, 0.079
2020-11-01 08:14:04: INFO	.. 0.550, 0.206, 0.306
2020-11-01 08:14:04: INFO	
2020-11-01 08:14:04: INFO	AIC assessment:
2020-11-01 08:14:04: INFO	min(AIC) = 97487.41 for model 4
2020-11-01 08:14:04: INFO	Relative probabilities compared to model 4:
2020-11-01 08:14:04: INFO	   /                          \
2020-11-01 08:14:04: INFO	   |      (min(AIC) - AICi)/2 |
2020-11-01 08:14:04: INFO	   | p = e                    |
2020-11-01 08:14:04: INFO	   \                          /
2020-11-01 08:14:04: INFO	.. model   1: p = 0.0000
2020-11-01 08:14:04: INFO	.. model   2: p = 0.0000
2020-11-01 08:14:04: INFO	.. model   3: p = 0.0000
2020-11-01 08:14:04: INFO	.. model   4: p = 1.0000
2020-11-01 08:14:04: INFO	
2020-11-01 08:14:04: INFO	
2020-11-01 08:14:04: INFO	Delta BIC assessment: 
2020-11-01 08:14:04: INFO	min(BIC) = 97580.00 for model 4
2020-11-01 08:14:04: INFO	.. model   1: delta(BIC) =  7250.57 (    >10: Very Strong)
2020-11-01 08:14:04: INFO	.. model   2: delta(BIC) =  4139.71 (    >10: Very Strong)
2020-11-01 08:14:04: INFO	.. model   3: delta(BIC) =  2174.50 (    >10: Very Strong)
2020-11-01 08:14:04: INFO	.. model   4: delta(BIC) =     0.00 (0 to  2:   Very weak)
2020-11-01 08:14:04: INFO	
2020-11-01 08:14:04: INFO	Plotting AIC & BIC
2020-11-01 08:14:04: INFO	Plotting mixtures
2020-11-01 08:14:07: INFO	Writing component-wise probabilities to file
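
The AIC and BIC summaries printed in the log above can be recomputed by hand. Below is a minimal sketch of the two formulas shown there: the relative probability e^((min(AIC) - AICi)/2) and the difference from the minimum BIC. The AIC values are hypothetical, since the log only reports the minimum. The BIC values are reconstructed by adding the reported deltas to the reported minimum. The function names are illustrative and are not part of wgd.

```python
import math

def aic_relative_probabilities(aics):
    """Relative probability of each model versus the best one:
    p_i = exp((min(AIC) - AIC_i) / 2)."""
    best = min(aics)
    return [math.exp((best - a) / 2.0) for a in aics]

def delta_bic(bics):
    """Difference between each model's BIC and the smallest BIC."""
    best = min(bics)
    return [b - best for b in bics]

# Hypothetical AIC values (the log only reports min(AIC) = 97487.41).
aics = [97491.41, 97489.41, 97487.41]
for i, p in enumerate(aic_relative_probabilities(aics), start=1):
    print("model %d: p = %.4f" % (i, p))

# BIC values reconstructed from the log: min(BIC) plus each reported delta.
bics = [104830.57, 101719.71, 99754.50, 97580.00]
for i, d in enumerate(delta_bic(bics), start=1):
    print("model %d: delta(BIC) = %.2f" % (i, d))
```

With differences as large as those in the log, the relative probabilities of models 1 to 3 round to 0.0000, which matches the output above. The log itself adds the caveat that the best GMM should be interpreted with caution.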

Error: Invalid value for '--ks_range' / '-r'

Hi,

When I execute the command ( wgd viz -r 0.005,2 -ks ./ks_dir/), I encounter this error:
Error: Invalid value for '--ks_range' / '-r': 0.005,2 is not a valid floating point value

Is this parameter (-r) in the wrong format? How should this parameter be set?

Looking forward to your reply.

with best regards
Xupeng
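
The message says the whole string `0.005,2` was passed to a float parser, which suggests the option expects two separate values (for example `-r 0.005 2`) rather than one comma-joined token. This is inferred from the error text only and has not been checked against the wgd source. The sketch below uses the standard-library `argparse` as a stand-in (wgd's own CLI is built on click, as the tracebacks on this page show) to illustrate how a two-value float option behaves. The parser name and the default range are placeholders.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="viz-sketch")
    # Assumption: the range is given as two separate float values.
    # The default here is a placeholder, not wgd's actual default.
    parser.add_argument("-r", "--ks_range", nargs=2, type=float,
                        default=(0.05, 5.0))
    return parser

def parse_range(argv):
    """Return (low, high), or None if the arguments are rejected."""
    try:
        namespace = build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the problem on stderr and exits; catch it here.
        return None
    return tuple(namespace.ks_range)

print(parse_range(["-r", "0.005", "2"]))  # two space-separated values
print(parse_range(["-r", "0.005,2"]))     # one comma-joined token is rejected
```

If the real option works the same way, the command would become `wgd viz -r 0.005 2 -ks ./ks_dir/`.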

installation issue

Hi,

This might be silly: I used this program a while ago and everything worked great. Now I am trying to install it in a new environment using pip install wgd, but the installation no longer works and gives the error message below.

ERROR: Could not find a version that satisfies the requirement wgd (from versions: none)
ERROR: No matching distribution found for wgd

As a positive control, I ran pip install bar to make sure pip is working properly. My current Python version is 3.6. Could you please take a look and see what the reason might be?

Thank you!

install error

Hi, when I install wgd using the command:
pip install .
the result is:

Installing collected packages: wgd
  Found existing installation: wgd 1.0
    Uninstalling wgd-1.0:
      Successfully uninstalled wgd-1.0
  Running setup.py install for wgd ... done
Successfully installed wgd-1.0
But when I run wgd, I encounter this error:
Traceback (most recent call last):
  File "/usr/local/bin/wgd", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 2603, in <module>
  File "/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 666, in require
  File "/usr/local/lib/python2.7/dist-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 565, in resolve
pkg_resources.DistributionNotFound: bokeh

why?

Singularity I-ADHoRe not working

Hi

I am using WGD on an insect genome, and I have been using the singularity container within a Centos 7 Virtual Machine.
I was following the basic tutorials, but I came across an error on the third command line (singularity exec wgd.simg wgd syn):

ERROR i-adhora executable not found!
ERROR Could not run all software, exit here.

I believe this is an error with the container installation?

best;
Pedro

I got an Intel MKL FATAL ERROR

Hi Dr. Zwaenepoel,
I successfully installed the software, but got an error when I tried to run "wgd ksd" as follows:
wgd ksd -o ./ -n 12 ./blabla.blast.tsv.mcl blabla.cds.fa
The mcl file is produced by "wgd mcl"

The error says it cannot load /root/anaconda3/lib/python3.7/site-packages/sklearn/metrics/../../../../libmkl_core.so.

But I can see there is a file called libmkl_core.so in that path,
and r+x permission on the file (as well as on the whole path to the file) has been given to every user.

Could you please help me to see what's wrong with it?
Many thanks,
Bowen

2019-03-15 10:38:08: INFO	Performing analysis on gene family GF_006004
2019-03-15 10:38:08: INFO	Performing analysis on gene family GF_006005
2019-03-15 10:38:08: INFO	Performing analysis on gene family GF_006006
2019-03-15 10:38:08: INFO	Performing analysis on gene family GF_006007
2019-03-15 11:44:23: INFO	Analysis done
2019-03-15 11:44:23: INFO	Making results data frame
2019-03-15 11:45:55: INFO	Removing tmp directory
2019-03-15 11:45:57: INFO	Computing weights, outlier cut-off at Ks > 5
Intel MKL FATAL ERROR: Cannot load /root/anaconda3/lib/python3.7/site-packages/sklearn/metrics/../../../../libmkl_core.so.

Can I use wgd for de novo transcriptomes

Dear Arthur,

I have several questions about wgd.
1) Can I use wgd for my de novo transcriptome data?
2) I have tried wgd on my de novo transcriptome data. The maximum numbers of duplications in the samples are around 100 - 350. Are they too low?
3) Can I use wgd to estimate a between-species Ks distribution? I tried the command 'wgd dmd species1.cds.fa species2.cds.fa'. I suppose the Ks peak recovered from this analysis should correspond to the split between species1 and species2. Then, by comparing the location of the Ks peak (a WGD event) of species1, the Ks peak (a WGD event) of species2, and the Ks peak between the two species, I can tell whether the WGD events happened before or after the split of the two species. Is my understanding right? However, after running the command, I did not get an svg file showing the Ks distribution. What additional steps are needed?
4) I want to plot the Ks distributions for multiple species in one figure (e.g. Fig. 1a in your Zwaenepoel and Van de Peer (2019) paper). How can I achieve this on my Linux workstation?

I appreciate your help.
Best,

Lingyun

Extract CDS for WGD events

This tool helped my analysis a lot, Arthur. I have a question to understand the output files. How can I derive the CDS assigned to WGD events from the wgd mix output tsv? I see the gene families but don't see an obvious way to extract the corresponding CDS pair per row.

Concretely, on my data:
content of the GMM mix output:
image
content of the ksd output for gene family 1:
image
Can I extract the CDS pairs of the ksd output from the rows in the mix output? (I might compare stats like alignment coverage, identity and length, but is there a less ambiguous way of doing it?)

Let me know if you need more information.

Unable to run wgd

Hi,

I am interested in using this tool to perform Ks and WGD analysis. I am using OrthoFinder for clustering genes and hope to use this tool to plot comparisons between species.

I followed the instructions to install this software, and it was very easy. When I type wgd or other commands, I can see the options, which I think means that the tool was probably installed properly.

However, when I try to run the program (with any options), I get this error:

(py3) amit8chiba@amit8chiba-Precision-Tower-7910:/mnt/md0/Opu_r1.2_final/Comparitive_genomics/Ks_analysis_test/on-going_analysis$ wgd wf2 -n 16 cac_hc_gene_models.pep.fa, cro_v2.proteins.fasta ./cac_cro_Ks_out/
Traceback (most recent call last):
  File "/home/amit8chiba/miniconda2/bin/wgd", line 11, in <module>
    sys.exit(cli())
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd_cli.py", line 1349, in wf2
    output_dir=blast_dir)
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd_cli.py", line 334, in blast_mcl
    if can_i_run_software(software) == 1:
  File "/home/amit8chiba/miniconda2/lib/python2.7/site-packages/wgd/utils.py", line 67, in can_i_run_software
    except FileNotFoundError:
NameError: global name 'FileNotFoundError' is not defined

I first thought that it could be due to the Python version, so I created a py3 environment in conda with Python 3.5 and activated it. Then I reinstalled the tool, but I am still getting the same message.

I would really appreciate your help in getting this tool working.

Hope to get your advice, and please let me know if you need any further information.
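
The paths in the traceback above point to miniconda2/lib/python2.7, so the `wgd` entry point on the PATH is still being run by Python 2.7 even though the shell prompt shows `(py3)`. `FileNotFoundError` only exists as a built-in from Python 3.3 onwards, which would explain the `NameError`. A minimal sketch for checking which interpreter is actually in use (the function name is illustrative):

```python
import sys

def interpreter_report(version_info=sys.version_info, executable=sys.executable):
    """Summarise which interpreter is running and whether it is Python 3."""
    major, minor = version_info[0], version_info[1]
    return {
        "executable": executable,
        "version": "%d.%d" % (major, minor),
        "is_python3": major >= 3,
    }

report = interpreter_report()
print(report)
if not report["is_python3"]:
    print("This interpreter is Python 2; FileNotFoundError is not defined here.")
```

Running this with the same interpreter that appears in the first line of the `wgd` script (the shebang of the file under bin/) shows whether the tool was installed into the Python 3 environment or into the base Python 2 installation.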

errors in ksd step with the example dataset, newick error

Hello,

I ran into the following errors when trying to run the command 'wgd ksd sample.mcl sample.fasta' on the Arabidopsis samples provided:
It goes on to a series of "Performing analysis on gene family GF_00000#", but suddenly spits out the following (complete error attached: error_wgd.txt):

##
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/ks_distribution.py", line 305, in analyse_family
    results_dict, msa=msa_path_protein, method=method)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/ks_distribution.py", line 98, in _weighting
    tree_path, pairwise_estimates['Ks'])
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/wgd/phy.py", line 123, in phylogenetic_tree_to_cluster_format
    t = Tree(tree)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/ete3/coretype/tree.py", line 213, in __init__
    quoted_names=quoted_node_names)
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/ete3/parser/newick.py", line 264, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/BIOTECH/psharma/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/BIOTECH/psharma/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
NewickError                                        Fri Oct 23 13:45:53 2020
PID: 18908        Python 3.7.7: /home/BIOTECH/psharma/miniconda3/bin/python


###

It goes on to run parallel.py and distribution.py, but this newick error appears several times:
NewickError: Unexisting tree file or Malformed newick tree structure.

The wgd_ksd is empty after the run is done.
Do you know what the problem may be?
Thank you for your help!
Best,
Guilherme

Issue with mcl analysis (step 1)

Hi Arthur, I am trying to use the wgd tool to detect whole-genome duplication in a black fungus, but I got an error during step 1 (BLAST+ and MCL analysis) and I cannot get a .mcl file to proceed with the analysis. (I also tried with other genome sequences but got the same error.)
Thank you very much in advance.
I am using this command:

source activate wgd
module load ncbi-blast
module load mcl 
export LC_ALL=en_US.utf8
wgd mcl --cds --mcl -s Friedmanniomyces_endolithicus_CCFEE_5311.cds.fasta -o wgd__mcl_blast_output_Friedmanniomyces -n 4

This is the error:

2019-05-06 06:22:28: INFO       makeblastdb: 2.2.30+
Package: blast 2.2.30, build Oct 27 2014 16:58:06
2019-05-06 06:22:28: INFO       blastp: 2.2.30+
Package: blast 2.2.30, build Oct 27 2014 16:58:06
2019-05-06 06:22:28: INFO       mcl 14-137
Copyright (c) 1999-2014, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
2019-05-06 06:22:28: INFO       Output directory: wgd__mcl_blast_output_Friedmanniomyces does not exist, will make it.
2019-05-06 06:22:28: INFO       CDS sequences provided, will first translate.
Sequence length != multiple of 3 for B0A54_02501-T1!
Invalid codon  CT in B0A54_02501-T1
Sequence length != multiple of 3 for 625353)]!
Invalid codon   T in 625353)]
Sequence length != multiple of 3 for 279254)]!
Invalid codon  AA in 279254)]
....
|#########################################################################################################| Elapsed Time: 0:00:21 Time:  0:00:21
2019-05-06 06:22:51: WARNING    There were 146 warnings during translation
2019-05-06 06:22:51: INFO       Writing blastdb sequences to db.fasta.
2019-05-06 06:22:51: INFO       Writing query sequences to query.fasta.
2019-05-06 06:22:51: INFO       Performing all-vs.-all Blastp (this might take a while)
2019-05-06 06:22:51: INFO       Making Blastdb


Building a new DB, current time: 05/06/2019 06:22:51
New DB name:   wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta
New DB title:  wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

volume: wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta

file: wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.pin
file: wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.phr
file: wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.psq

BLAST Database creation error: FASTA-Reader: No residues given
2019-05-06 06:22:52: INFO       Running Blastp
2019-05-06 06:22:52: INFO       blastp -db wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta -query wgd__mcl_blast_output_Friedmanniomyces/37522ff94b818a.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 4 -out wgd__mcl_blast_output_Friedmanniomyces/Friedmanniomyces_endolithicus_CCFEE_5311.cds.fasta.blast.tsv
2019-05-06 06:22:52: INFO       All versus all Blastp done
rm: cannot remove ‘wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.phr’: No such file or directory
rm: cannot remove ‘wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.pin’: No such file or directory
rm: cannot remove ‘wgd__mcl_blast_output_Friedmanniomyces/37522ff93b3a44.db.fasta.psq’: No such file or directory
2019-05-06 06:22:52: INFO       Blast done
2019-05-06 06:22:52: INFO       Performing MCL clustering (inflation factor = 2.0)
2019-05-06 06:22:52: INFO       Started MCL clustering (mcl)
2019-05-06 06:22:52: INFO       Done
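
The log above shows translation warnings (sequence lengths that are not a multiple of 3, and identifiers such as `625353)]` that look like fragments of longer header lines), followed by "FASTA-Reader: No residues given" from makeblastdb. One way to narrow this down is to inspect the CDS FASTA file before running wgd. The sketch below is not wgd's own translation code. It is a small standalone check that reports records with an empty sequence or a length that is not a multiple of 3, and it assumes the identifier is the first whitespace-separated word of each header.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def check_cds(lines):
    """Return a list of (identifier, problem) for suspicious CDS records."""
    problems = []
    for name, seq in read_fasta(lines):
        if len(seq) == 0:
            problems.append((name, "empty sequence"))
        elif len(seq) % 3 != 0:
            problems.append((name, "length %d is not a multiple of 3" % len(seq)))
    return problems

# Toy input; in practice pass an open file handle instead of a list.
example = [">gene1 some description", "ATGAAATAG",
           ">gene2", "ATGAA",
           ">gene3",
           ">gene4", "ATGTAA"]
for name, problem in check_cds(example):
    print(name, problem)
```

Records flagged as empty are candidates for the "No residues given" error, and unexpected identifiers in the report can reveal header lines that were split or contain stray `>` characters.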

compute a Ks distribution error

Hello, I am running the code in the example file to test the software installation. When I run the command wgd ksd sample.mcl sample.fasta, an error occurs.
wgd out of bounds

I hope you can reply when you see this. Thank you.

noninteractive apt-get

There is an issue in wgd.shub. The tzdata package asks for the timezone interactively, but the build cannot answer the prompt. For that reason, the apt-get line should be modified as below to work:

apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -yq install python3-pip python3-tk git wget \
	    build-essential mcl ncbi-blast+ muscle mafft prank fasttree phyml paml

python error when running wgd

I have installed wgd on python 3.6.4, with all the prerequisites. OS linux, centos 6.6. The error I had is below. Your help is appreciated

[ashermoshe@lecs2 ~/dorothee]$ wgd --verbosity debug ksd schlosseri.mcl Botryllus_schlosseri.fas 
2019-02-26 11:14:26: DEBUG	CACHEDIR=/groups/pupko/ashermoshe/.cache/matplotlib
2019-02-26 11:14:26: DEBUG	Using fontManager instance from /groups/pupko/ashermoshe/.cache/matplotlib/fontList.json
2019-02-26 11:14:26: DEBUG	backend agg version v2.2
2019-02-26 11:14:27: INFO	
2019-02-26 11:14:27: INFO	codeml found
2019-02-26 11:14:27: INFO	MUSCLE v3.7 by Robert C. Edgar
2019-02-26 11:14:27: INFO	
2019-02-26 11:14:27: WARNING	Output directory exists, will possibly overwrite
2019-02-26 11:14:27: DEBUG	Reading CDS sequences
2019-02-26 11:14:28: INFO	Translating CDS file
2019-02-26 11:14:28: DEBUG	wrapping excepthook
100% (65587 of 65587) |#################################################| Elapsed Time: 0:00:21 Time:  0:00:21
2019-02-26 11:14:49: WARNING	There were 0 warnings during translation
2019-02-26 11:14:49: INFO	Started whole paranome Ks analysis
2019-02-26 11:14:49: WARNING	Filtered out the 0 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-02-26 11:14:49: WARNING	If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2019-02-26 11:14:49: INFO	Started analysis in parallel (n_threads = 4)
2019-02-26 11:14:50: INFO	Analysis done
2019-02-26 11:14:50: INFO	Making results data frame
2019-02-26 11:14:50: INFO	Removing tmp directory
2019-02-26 11:14:50: INFO	Computing weights, outlier cut-off at Ks > 5
Traceback (most recent call last):
  File "/share/apps/anaconda3-5.1.0/bin/wgd", line 11, in <module>
    sys.exit(cli())
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd_cli.py", line 545, in ksd
    max_pairwise=max_pairwise
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd_cli.py", line 686, in ksd_
    max_pairwise=max_pairwise,
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py", line 665, in ks_analysis_paranome
    results_frame = compute_weights(results_frame)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py", line 709, in compute_weights
    df["WeightOutliersIncluded"] = 1 / df.groupby(['Family', 'Node'])[
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/pandas/core/generic.py", line 5162, in groupby
    **kwargs)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/pandas/core/groupby.py", line 1848, in groupby
    return klass(obj, by, **kwds)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/pandas/core/groupby.py", line 516, in __init__
    mutated=self.mutated)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/pandas/core/groupby.py", line 2934, in _get_grouper
    raise KeyError(gpr)
KeyError: 'Node'

error in plot generation in wgd ksd

Hi,

I get the below error when running wgd ksd in the plot generating step. I am running wgd in a singularity shell on a server running ubuntu. Is there a way to circumvent this error using this runtime environment?

And when I come to restart the ksd command, does it resume and use the previously generated intermediate files (I did run it with the --preserve flag)?

Any help greatly appreciated.

[...]
2020-12-10 18:49:17: INFO       Analysis done
2020-12-10 18:49:17: INFO       Making results data frame
2020-12-10 18:57:22: INFO       Removing tmp directory
2020-12-10 18:57:33: INFO       Computing weights, outlier cut-off at Ks > 5
2020-12-10 18:57:39: INFO       Generating plots
Traceback (most recent call last):
  File "/usr/local/bin/wgd", line 8, in <module>
    sys.exit(cli())
  File "/home1/joerg/.local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home1/joerg/.local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home1/joerg/.local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home1/joerg/.local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home1/joerg/.local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home1/joerg/.local/lib/python3.8/site-packages/wgd_cli.py", line 633, in ksd
    ksd_(
  File "/home1/joerg/.local/lib/python3.8/site-packages/wgd_cli.py", line 790, in ksd_
    plot_selection(
  File "/home1/joerg/.local/lib/python3.8/site-packages/wgd/viz.py", line 103, in plot_selection
    fig = plt.figure(figsize=(15, 9.27))
  File "/home1/joerg/.local/lib/python3.8/site-packages/matplotlib/pyplot.py", line 687, in figure
    figManager = new_figure_manager(num, figsize=figsize,
  File "/home1/joerg/.local/lib/python3.8/site-packages/matplotlib/pyplot.py", line 315, in new_figure_manager
    return _backend_mod.new_figure_manager(*args, **kwargs)
  File "/home1/joerg/.local/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/home1/joerg/.local/lib/python3.8/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
    window = tk.Tk(className="matplotlib")
  File "/usr/lib/python3.8/tkinter/__init__.py", line 2261, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:13.0"
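
The traceback ends in the Tk backend of matplotlib trying to open a window on `localhost:13.0`, a display that cannot be reached from inside the container. Matplotlib reads the `MPLBACKEND` environment variable, and `Agg` is a non-interactive backend that writes figures to files without a display. The sketch below builds an environment with that variable set and `DISPLAY` removed. Whether this is enough depends on how wgd selects its backend, which is not verified here. The helper name is hypothetical.

```python
import os

def headless_env(base_env=None):
    """Copy an environment, force matplotlib's Agg backend and drop DISPLAY."""
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = "Agg"  # non-interactive backend, no window needed
    env.pop("DISPLAY", None)   # avoid any attempt to contact an X server
    return env

# Demonstration on a toy environment resembling the one in the traceback.
env = headless_env({"DISPLAY": "localhost:13.0", "PATH": "/usr/bin"})
print(env)
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the command. The shell equivalent is to unset `DISPLAY` and export `MPLBACKEND=Agg` before starting the run.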

Why can't wgd run syn in one_v_one mode?

I want to know why wgd can't run syn between two different genomes. Other pipelines (such as MCscan) always run the collinearity analysis first, and then calculate the Ks of collinear paralogs.

Problem in ksd step

Hello,

Firstly, I am aware there is an open ticket about a similar problem but I didn't want to hijack that ticket because the problem might not be with codeml. And thank you for this great pipeline!

$ wgd --verbosity debug ksd cds.lactea_sample.mcl cds.lactea_sample.fa 
2019-08-01 15:25:06: DEBUG	CACHEDIR=/home/papaya/.cache/matplotlib
2019-08-01 15:25:06: DEBUG	Using fontManager instance from /home/papaya/.cache/matplotlib/fontlist-v310.json
2019-08-01 15:25:06: DEBUG	Loaded backend qt5agg version unknown.
2019-08-01 15:25:06: DEBUG	Loaded backend tkagg version unknown.
2019-08-01 15:25:06: DEBUG	Loaded backend TkAgg version unknown.
2019-08-01 15:25:06: INFO	
2019-08-01 15:25:06: INFO	codeml found
2019-08-01 15:25:06: INFO	MUSCLE v3.8.31 by Robert C. Edgar
2019-08-01 15:25:06: INFO	
2019-08-01 15:25:06: WARNING	Output directory exists, will possibly overwrite
2019-08-01 15:25:06: DEBUG	Reading CDS sequences
2019-08-01 15:25:06: INFO	Translating CDS file
2019-08-01 15:25:06: DEBUG	wrapping excepthook
100% (1001 of 1001) |##################################################################################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
2019-08-01 15:25:06: WARNING	There were 0 warnings during translation
2019-08-01 15:25:06: INFO	Started whole paranome Ks analysis
2019-08-01 15:25:06: WARNING	Filtered out the 0 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-08-01 15:25:06: WARNING	If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2019-08-01 15:25:06: INFO	Started analysis in parallel (n_threads = 4)
2019-08-01 15:25:06: INFO	Analysis done
2019-08-01 15:25:06: INFO	Making results data frame
2019-08-01 15:25:06: INFO	Removing tmp directory
2019-08-01 15:25:06: INFO	Computing weights, outlier cut-off at Ks > 5
Traceback (most recent call last):
  File "/home/papaya/anaconda2/envs/wgd/bin/wgd", line 10, in <module>
    sys.exit(cli())
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/wgd_cli.py", line 545, in ksd
    max_pairwise=max_pairwise
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/wgd_cli.py", line 686, in ksd_
    max_pairwise=max_pairwise,
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/wgd/ks_distribution.py", line 665, in ks_analysis_paranome
    results_frame = compute_weights(results_frame)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/wgd/ks_distribution.py", line 709, in compute_weights
    df["WeightOutliersIncluded"] = 1 / df.groupby(['Family', 'Node'])[
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/pandas/core/generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "/home/papaya/anaconda2/envs/wgd/lib/python3.6/site-packages/pandas/core/groupby/grouper.py", line 578, in _get_grouper
    raise KeyError(gpr)
KeyError: 'Node'

Now I am using a small sample from my whole cds data as in the supplementary method example from the publication. Can this be related to the problem? I am not very strong in bioinformatics so I would be really glad if you can help me out.

I am adding the input files in the attachment. Thanks in advance.

Edit: Spelling.

cds.lactea_sample.mcl.txt

cds.lactea_sample.fa.txt

Warnings and errors during ksd analysis

Hi, I am performing ksd with wgd and getting loads of warnings and errors like these:

2020-06-21 11:34:49: WARNING	No ω value for PG7P017240 - PG2P020626!
2020-06-21 11:34:56: ERROR	Not all gene pairs present in /home/alex/wgd_venv/wgd/ks_tmp.3895c2cde7f7ac/GF_000303.codeml

What is the reason for this?
Not sure what should be attached for you to answer, but maybe you will be able to give me a hint anyway..?

wgd dmd error

Dear team,
When I use wgd dmd, I get the following error:

Traceback (most recent call last):
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/bin/wgd", line 8, in <module>
    sys.exit(cli())
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/wgd_cli.py", line 528, in dmd
    s[0].get_paranome(inflation=inflation, eval=eval)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/wgd/diamond.py", line 94, in get_paranome
    df = self.run_diamond(self, eval=eval)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/wgd/diamond.py", line 71, in run_diamond
    self.make_diamond_db()
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/site-packages/wgd/diamond.py", line 65, in make_diamond_db
    out = sp.run(cmd, capture_output=True)
  File "/Bio/home/Yanglab/Yuzhenpeng/miniconda2/envs/wgd/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'

My computer uses a Python 3 environment, and it can run 'wgd mcl' without problems. Could you please help me?
Thank you.

Zhenpeng
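
The `capture_output` argument of `subprocess.run` was added in Python 3.7, while the paths in the traceback show a Python 3.6 environment, which is why `Popen.__init__()` rejects the keyword. Upgrading the environment to Python 3.7 or later is one way out. The sketch below shows the equivalent call that also works on 3.6. It illustrates the incompatibility and is not a patch taken from the wgd repository; the helper name is made up.

```python
import subprocess
import sys

def run_captured(cmd):
    """Same effect as subprocess.run(cmd, capture_output=True),
    written so that it also runs on Python 3.5 and 3.6."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Demonstration with a harmless command: the current interpreter printing "ok".
out = run_captured([sys.executable, "-c", "print('ok')"])
print(out.returncode, out.stdout.decode().strip())
```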

Error during ksd with some gene families

Hi,

I am calculating ks with "wgd ksd". I really enjoy using wgd, and the whole process went well with most datasets, but in some cases, the program first went well and then stopped with an error report for a particular gene family, and it seems to be related to the failure of creating the Ks file.

The input files and the error report are here:
test.zip

My command is:
wgd ksd -n 10 Bo.Bo.homo Bo.Bo.cds

I would be grateful if you could help me with this error. Many thanks for your help!

Problem with gene families containing 2 sequences ?

Hi,
I would like to know whether the smallest multi-copy families can be an issue for wgd. I can imagine phyml might run into trouble building a tree with only 2 sequences, but does wgd handle these families?
I ask because the analysis seems to stall when using phyml for the second step (ksd), each time at a gene family containing 2 members. I would like to rule this out before troubleshooting further :-)
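
To see whether the stalls really coincide with two-member families, it can help to tabulate family sizes first. The following is a minimal sketch, assuming the gene family file has one family per line with tab-separated gene IDs (an assumption about the file layout, not confirmed in this thread). `family_sizes` is a hypothetical helper, not a wgd function.

```python
from collections import Counter


def family_sizes(lines):
    """Count how many families have each size.

    Assumes one family per line, gene IDs separated by tabs.
    Blank lines are skipped.
    """
    sizes = Counter()
    for line in lines:
        genes = [g for g in line.strip().split("\t") if g]
        if genes:
            sizes[len(genes)] += 1
    return sizes


# Toy input: two families of size 2, one of size 3, one blank line.
example = ["g1\tg2\n", "g3\tg4\tg5\n", "g6\tg7\n", "\n"]
sizes = family_sizes(example)
print(sorted(sizes.items()))
```

With a real file, the same function can be given an open file handle instead of the toy list.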

NewickError: Unexisting tree file or Malformed newick tree structure.

I installed wgd with all prerequisites. After preparing the mcl file, I ran the following command:
[ashermoshe@login-0-0 ~/dorothee]$ wgd ksd schlosseri.mcl Botryllus_schlosseri.fas
but I got an error somewhere downstream.
I would appreciate your help in understanding how to approach it.

2019-02-27 17:18:23: INFO
2019-02-27 17:18:23: INFO       codeml found
2019-02-27 17:18:23: INFO       MUSCLE v3.7 by Robert C. Edgar
2019-02-27 17:18:23: INFO
2019-02-27 17:18:23: WARNING    Output directory exists, will possibly overwrite
2019-02-27 17:18:24: INFO       Translating CDS file
100% (65587 of 65587) |##################################################################################################################| Elapsed Time: 0:00:17 Time:  0:00:17
2019-02-27 17:18:42: WARNING    There were 0 warnings during translation
2019-02-27 17:18:42: INFO       Started whole paranome Ks analysis
2019-02-27 17:18:42: WARNING    Filtered out the 5 largest gene families because n*(n-1)/2 > `max_pairwise`
2019-02-27 17:18:42: WARNING    If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter.
2019-02-27 17:18:42: INFO       Started analysis in parallel (n_threads = 4)
2019-02-27 17:18:42: INFO       Performing analysis on gene family GF_000006
2019-02-27 17:18:43: INFO       Performing analysis on gene family GF_000007
2019-02-27 17:18:43: INFO       Performing analysis on gene family GF_000008
2019-02-27 17:18:43: INFO       Performing analysis on gene family GF_000009
2019-02-27 17:18:49: INFO       Performing analysis on gene family GF_000010
2019-02-27 17:19:12: INFO       Performing analysis on gene family GF_000011
2019-02-27 17:19:13: INFO       Performing analysis on gene family GF_000012
2019-02-27 17:21:05: INFO       Performing analysis on gene family GF_000013
2019-02-27 17:21:57: INFO       Performing analysis on gene family GF_000014
2019-02-27 17:22:08: INFO       Performing analysis on gene family GF_000015
2019-02-27 17:22:35: INFO       Performing analysis on gene family GF_000016
2019-02-27 17:23:03: INFO       Performing analysis on gene family GF_000017
2019-02-27 17:23:22: INFO       Performing analysis on gene family GF_000018
2019-02-27 17:23:50: INFO       Performing analysis on gene family GF_000019
2019-02-27 17:24:02: INFO       Performing analysis on gene family GF_000020
2019-02-27 17:26:30: INFO       Performing analysis on gene family GF_000021
2019-02-27 17:26:35: INFO       Performing analysis on gene family GF_000022
2019-02-27 17:26:40: INFO       Performing analysis on gene family GF_000023
2019-02-27 17:26:42: INFO       Performing analysis on gene family GF_000024
2019-02-27 17:26:50: INFO       Performing analysis on gene family GF_000025
2019-02-27 17:26:59: INFO       Performing analysis on gene family GF_000026
2019-02-27 17:39:17: INFO       Performing analysis on gene family GF_000027
2019-02-27 17:39:17: INFO       Performing analysis on gene family GF_000028
2019-02-27 17:40:25: INFO       Performing analysis on gene family GF_000029
2019-02-27 17:40:52: INFO       Performing analysis on gene family GF_000030
2019-02-27 17:41:04: INFO       Performing analysis on gene family GF_000031
2019-02-27 17:41:09: INFO       Performing analysis on gene family GF_000032
2019-02-27 17:48:02: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN77472_c4_g3::TRINITY_DN77472_c4_g3_i12::g.41868::m.41868 - Botryllus_schlosseri_TRINITY_DN77472_c4_g3::TRINITY_DN77472_c4_g3_i7::g.41841::m.41841!
2019-02-27 17:48:02: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82389_c1_g1::TRINITY_DN82389_c1_g1_i1::g.79015::m.79015 - Botryllus_schlosseri_TRINITY_DN82597_c2_g1::TRINITY_DN82597_c2_g1_i1::g.145711::m.145711!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82579_c2_g2::TRINITY_DN82579_c2_g2_i2::g.147899::m.147899 - Botryllus_schlosseri_TRINITY_DN82597_c2_g1::TRINITY_DN82597_c2_g1_i1::g.145711::m.145711!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82579_c2_g2::TRINITY_DN82579_c2_g2_i2::g.147899::m.147899 - Botryllus_schlosseri_TRINITY_DN82389_c1_g1::TRINITY_DN82389_c1_g1_i1::g.79015::m.79015!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN79293_c3_g3::TRINITY_DN79293_c3_g3_i1::g.306985::m.306985 - Botryllus_schlosseri_TRINITY_DN70319_c5_g1::TRINITY_DN70319_c5_g1_i6::g.86980::m.86980!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80015_c3_g1::TRINITY_DN80015_c3_g1_i11::g.69428::m.69428 - Botryllus_schlosseri_TRINITY_DN83038_c1_g1::TRINITY_DN83038_c1_g1_i3::g.203559::m.203559!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN72881_c1_g2::TRINITY_DN72881_c1_g2_i9::g.188509::m.188509 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179 - Botryllus_schlosseri_TRINITY_DN83038_c1_g1::TRINITY_DN83038_c1_g1_i3::g.203559::m.203559!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179 - Botryllus_schlosseri_TRINITY_DN80015_c3_g1::TRINITY_DN80015_c3_g1_i11::g.69428::m.69428!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65697_c0_g1::TRINITY_DN65697_c0_g1_i1::g.293174::m.293174 - Botryllus_schlosseri_TRINITY_DN83038_c1_g1::TRINITY_DN83038_c1_g1_i3::g.203559::m.203559!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65697_c0_g1::TRINITY_DN65697_c0_g1_i1::g.293174::m.293174 - Botryllus_schlosseri_TRINITY_DN80015_c3_g1::TRINITY_DN80015_c3_g1_i11::g.69428::m.69428!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65697_c0_g1::TRINITY_DN65697_c0_g1_i1::g.293174::m.293174 - Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i1::g.106604::m.106604 - Botryllus_schlosseri_TRINITY_DN79594_c0_g1::TRINITY_DN79594_c0_g1_i1::g.278650::m.278650!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i1::g.106604::m.106604 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i2::g.106605::m.106605 - Botryllus_schlosseri_TRINITY_DN79594_c0_g1::TRINITY_DN79594_c0_g1_i1::g.278650::m.278650!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i2::g.106605::m.106605 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i2::g.106605::m.106605 - Botryllus_schlosseri_TRINITY_DN65038_c0_g1::TRINITY_DN65038_c0_g1_i1::g.106604::m.106604!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83231_c0_g1::TRINITY_DN83231_c0_g1_i1::g.112228::m.112228 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83231_c0_g1::TRINITY_DN83231_c0_g1_i1::g.112228::m.112228 - Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN83038_c1_g1::TRINITY_DN83038_c1_g1_i3::g.203559::m.203559!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN80015_c3_g1::TRINITY_DN80015_c3_g1_i11::g.69428::m.69428!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN65697_c0_g1::TRINITY_DN65697_c0_g1_i1::g.293174::m.293174!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347 - Botryllus_schlosseri_TRINITY_DN83231_c0_g1::TRINITY_DN83231_c0_g1_i1::g.112228::m.112228!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83038_c0_g1::TRINITY_DN83038_c0_g1_i2::g.203556::m.203556 - Botryllus_schlosseri_TRINITY_DN76803_c3_g2::TRINITY_DN76803_c3_g2_i4::g.291848::m.291848!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83038_c0_g1::TRINITY_DN83038_c0_g1_i2::g.203556::m.203556 - Botryllus_schlosseri_TRINITY_DN82877_c1_g1::TRINITY_DN82877_c1_g1_i7::g.185179::m.185179!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83038_c0_g1::TRINITY_DN83038_c0_g1_i2::g.203556::m.203556 - Botryllus_schlosseri_TRINITY_DN83231_c0_g1::TRINITY_DN83231_c0_g1_i1::g.112228::m.112228!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN83038_c0_g1::TRINITY_DN83038_c0_g1_i2::g.203556::m.203556 - Botryllus_schlosseri_TRINITY_DN80868_c1_g1::TRINITY_DN80868_c1_g1_i17::g.220347::m.220347!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN81132_c0_g1::TRINITY_DN81132_c0_g1_i4::g.47004::m.47004 - Botryllus_schlosseri_TRINITY_DN77472_c4_g3::TRINITY_DN77472_c4_g3_i7::g.41841::m.41841!
2019-02-27 17:48:03: WARNING    No Ks value for Botryllus_schlosseri_TRINITY_DN81132_c0_g1::TRINITY_DN81132_c0_g1_i4::g.47004::m.47004 - Botryllus_schlosseri_TRINITY_DN77472_c4_g3::TRINITY_DN77472_c4_g3_i12::g.41868::m.41868!
2019-02-27 17:48:05: INFO       Performing analysis on gene family GF_000033
2019-02-27 17:51:44: INFO       Performing analysis on gene family GF_000034
2019-02-27 17:51:56: INFO       Performing analysis on gene family GF_000035
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py", line 303, in analyse_family
    results_dict, msa=msa_path_protein, method=method)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py", line 98, in _weighting
    tree_path, pairwise_estimates['Ks'])
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/phy.py", line 123, in phylogenetic_tree_to_cluster_format
    t = Tree(tree)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/ete3/coretype/tree.py", line 211, in __init__
    quoted_names=quoted_node_names)
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/ete3/parser/newick.py", line 249, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
ete3.parser.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 359, in __call__
    raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
NewickError                                        Wed Feb 27 18:10:41 2019
PID: 62472             Python 3.6.8: /share/apps/anaconda3-5.1.0/bin/python
...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function analyse_family>, ('GF_000012', {'Botryllus_schlosseri_TRINITY_DN47590_c0_g1::TRINITY_DN47590_c0_g1_i1::g.253656::m.253656': 'MGETKSTITNLYGHNNRAKRRTQLRELAIRTRTEKSHLIAGDFNGIDS...ARRQKDQLRSDEKEMQKVRNRVIQNQQYLSDYRKIKKKIWKEKAESLHK', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g1::TRINITY_DN50048_c0_g1_i1::g.248848::m.248848': 'GAVISIDFLKAYDSVDHSFLHNTLEEAGFGVKVRAFFKAIYQGGSAKV...SGMKGKIATPSYADDVTITLAKEEESTKALQIVAEFGKASGLQINRKKT', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g2::TRINITY_DN50048_c0_g2_i1::g.248849::m.248849': 'QKTTSQFARGIIKTIFKKGDKEDIRNYRPITILNVDYKIISKVITNRIQKVLPTITHRHQFINPPNTIGDLNLLLREVTSDMRERSRGA', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g3::TRINITY_DN50048_c0_g3_i1::g.248850::m.248850': 'AGDFNGIDDIELDRDPVNIRHDAADAKYSKRVMEVIGVTDAFRQVHGS...CNHMPCPFSDHGATTALVKLTDHRPRRPNTWKNNTKVYEMEAFETELEV', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g4::TRINITY_DN50048_c0_g4_i1::g.248851::m.248851': 'KTWADKATDLGKLQLEAKADAEEALGKRPHLLCEKIKVRRDAVSITAIKDATGKTTEDPEEIRETVEEFYQKLYSKRETDKCTANSFHRYQDAKLSTR', 'Botryllus_schlosseri_TRINITY_DN50200_c0_g1::TRINITY_DN50200_c0_g1_i1::g.61823::m.61823': 'GEDGLSSELYMVNLDLMKKELTEVYNEIYEAQGTTTSLGRAVLKIIHK...KNYRPISLLNSDYKILSKILTNRLKQALPSITHQHQHVNPPKTIGQINL', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g2::TRINITY_DN52423_c0_g2_i1::g.262149::m.262149': 'GKANISIGGKLGGNIRLGRGIKQGDPISMLLFTMATDPLLQRLNHDLD...DVNITLAHQADVNEALKIIQDFEEASALKLNKNKSKGITYHPKPPPGSK', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g3::TRINITY_DN52423_c0_g3_i1::g.262150::m.262150': 'PPPGSKNVLKWVQSMEVLGHVINRHPPNDHETWNGLANKAKDLMREIK...YVATLKKMPINTRRELETAVTELLFGKSMRPDYRKLIQRREAGGIGLVD', 'Botryllus_schlosseri_TRINITY_DN58171_c0_g1::TRINITY_DN58171_c0_g1_i1::g.196397::m.196397': 'LEYEIVEFLFGKGKRPEYKKLVQQEIAGGREVKDIPTITDIIFIKPAV...RTDHQLALTLGWLRERPINNSRPHTWQPRQHWAEMAKIMKEMEYKRDYI', 'Botryllus_schlosseri_TRINITY_DN61293_c1_g1::TRINITY_DN61293_c1_g1_i1::g.134659::m.134659': 
'AAFVMDLDGKIEYEKIIQARESGGLELVDIPTMTDLAFVKPALRYLQR...MKRYGMRKINNAIPHVFQPLQHWQETEKTMRSLGRQQQDIKSKRRERYR', ...}, {'Botryllus_schlosseri_TRINITY_DN10080_c0_g1::TRINITY_DN10080_c0_g1_i1::g.308497::m.308497': 'CTTAATTGTGCTTCATTTTTTAGCTCTCAAATGGCTCGATCAACGAAA...CATGGCGGAAGGCACGAAGGCCGTCGCCAAGCTCGCTGCCAGCAAATAA', 'Botryllus_schlosseri_TRINITY_DN10087_c0_g1::TRINITY_DN10087_c0_g1_i1::g.308496::m.308496': 'ATGGCGGTGGTGACACTGTTCTCGGTGGGGCATTCTAGGGAGGTGCGG...CTACCTCGACCCTCGGCTAGACCAACCATGGCCCCGCGTCGCGAAGGCC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g1::TRINITY_DN1013_c0_g1_i1::g.248880::m.248880': 'TGGGCAGTCCTGACTGGCACTATCAAGAATAACAGCAACATCCAATGC...GAAAAACTATGGGGACATTATTAACCCTGCTGATGGAATCTGTTTGACC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g2::TRINITY_DN1013_c0_g2_i1::g.248881::m.248881': 'CTGATTCTTCTCATCGGCTCTCTCTGCTTCACTCTGACCCACGGCCTT...TCTGAAGGCCTACGCGATTGCCAAGACGGCCAAAGACTCTTACAACTGG', 'Botryllus_schlosseri_TRINITY_DN10166_c0_g2::TRINITY_DN10166_c0_g2_i1::g.27418::m.27418': 'GTACATTTGAAGAAAATGTCTGAATTGATCCATCACGAAAAAGCTTTC...GGTCATCTCGAACAAAATGCACAGAACCATCGTGATCAGAAGAGACTAC', 'Botryllus_schlosseri_TRINITY_DN1017_c0_g1::TRINITY_DN1017_c0_g1_i1::g.248877::m.248877': 'TGCCATCTTCCGGCACTCATGGCCAAGATTGACGAGGTGCACCACTCG...CCAAGTTAAGACCTTGCCCACTGAGAAGCATACCCAGCCGGAGTGCTAG', 'Botryllus_schlosseri_TRINITY_DN101_c0_g1::TRINITY_DN101_c0_g1_i1::g.75736::m.75736': 'GAGCGATATCACGCTCGCTCGTTCAGCAAGCGCGAACGGCTGTATCGG...GAACCGGACCTTCGTCGATCTACCTGAGTTCATAGATGAAATGGATCCG', 'Botryllus_schlosseri_TRINITY_DN10203_c0_g1::TRINITY_DN10203_c0_g1_i1::g.234676::m.234676': 'GAAGGGCGGTTCGTGCCGCACCTGGTGGTCCGCATCGACGCCGGCACG...GGTCAACACAGATAAGGTATCGGCTACCACGGCCATCGAGGTCTTGCGC', 'Botryllus_schlosseri_TRINITY_DN10220_c0_g2::TRINITY_DN10220_c0_g2_i1::g.234678::m.234678': 'GATCATCGGTATGGACTACCCGTGGAAATCGAGATGACTACAATGGAC...TCGAGTGGTGCTAAATGCTGCTACTGATACATTTCGTGGTATACTAGAT', 'Botryllus_schlosseri_TRINITY_DN10249_c0_g1::TRINITY_DN10249_c0_g1_i1::g.234674::m.234674': 
'GAAGGACTCTCGTTGATCGAACGCCGCAACTGCATGGAGCTGTTCCGA...GGATCGTCTCGACAAGGTTTACGCCAGGCACAGCGACATGGTCCTGTTG', ...}, '/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862', 'codeml', False, 1, 100, 'fasttree', 'muscle', '/groups/pupko/ashermoshe/dorothee/wgd_ksd'), {})]
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function analyse_family>
        args = ('GF_000012', {'Botryllus_schlosseri_TRINITY_DN47590_c0_g1::TRINITY_DN47590_c0_g1_i1::g.253656::m.253656': 'MGETKSTITNLYGHNNRAKRRTQLRELAIRTRTEKSHLIAGDFNGIDS...ARRQKDQLRSDEKEMQKVRNRVIQNQQYLSDYRKIKKKIWKEKAESLHK', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g1::TRINITY_DN50048_c0_g1_i1::g.248848::m.248848': 'GAVISIDFLKAYDSVDHSFLHNTLEEAGFGVKVRAFFKAIYQGGSAKV...SGMKGKIATPSYADDVTITLAKEEESTKALQIVAEFGKASGLQINRKKT', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g2::TRINITY_DN50048_c0_g2_i1::g.248849::m.248849': 'QKTTSQFARGIIKTIFKKGDKEDIRNYRPITILNVDYKIISKVITNRIQKVLPTITHRHQFINPPNTIGDLNLLLREVTSDMRERSRGA', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g3::TRINITY_DN50048_c0_g3_i1::g.248850::m.248850': 'AGDFNGIDDIELDRDPVNIRHDAADAKYSKRVMEVIGVTDAFRQVHGS...CNHMPCPFSDHGATTALVKLTDHRPRRPNTWKNNTKVYEMEAFETELEV', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g4::TRINITY_DN50048_c0_g4_i1::g.248851::m.248851': 'KTWADKATDLGKLQLEAKADAEEALGKRPHLLCEKIKVRRDAVSITAIKDATGKTTEDPEEIRETVEEFYQKLYSKRETDKCTANSFHRYQDAKLSTR', 'Botryllus_schlosseri_TRINITY_DN50200_c0_g1::TRINITY_DN50200_c0_g1_i1::g.61823::m.61823': 'GEDGLSSELYMVNLDLMKKELTEVYNEIYEAQGTTTSLGRAVLKIIHK...KNYRPISLLNSDYKILSKILTNRLKQALPSITHQHQHVNPPKTIGQINL', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g2::TRINITY_DN52423_c0_g2_i1::g.262149::m.262149': 'GKANISIGGKLGGNIRLGRGIKQGDPISMLLFTMATDPLLQRLNHDLD...DVNITLAHQADVNEALKIIQDFEEASALKLNKNKSKGITYHPKPPPGSK', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g3::TRINITY_DN52423_c0_g3_i1::g.262150::m.262150': 'PPPGSKNVLKWVQSMEVLGHVINRHPPNDHETWNGLANKAKDLMREIK...YVATLKKMPINTRRELETAVTELLFGKSMRPDYRKLIQRREAGGIGLVD', 'Botryllus_schlosseri_TRINITY_DN58171_c0_g1::TRINITY_DN58171_c0_g1_i1::g.196397::m.196397': 'LEYEIVEFLFGKGKRPEYKKLVQQEIAGGREVKDIPTITDIIFIKPAV...RTDHQLALTLGWLRERPINNSRPHTWQPRQHWAEMAKIMKEMEYKRDYI', 'Botryllus_schlosseri_TRINITY_DN61293_c1_g1::TRINITY_DN61293_c1_g1_i1::g.134659::m.134659': 'AAFVMDLDGKIEYEKIIQARESGGLELVDIPTMTDLAFVKPALRYLQR...MKRYGMRKINNAIPHVFQPLQHWQETEKTMRSLGRQQQDIKSKRRERYR', ...}, 
{'Botryllus_schlosseri_TRINITY_DN10080_c0_g1::TRINITY_DN10080_c0_g1_i1::g.308497::m.308497': 'CTTAATTGTGCTTCATTTTTTAGCTCTCAAATGGCTCGATCAACGAAA...CATGGCGGAAGGCACGAAGGCCGTCGCCAAGCTCGCTGCCAGCAAATAA', 'Botryllus_schlosseri_TRINITY_DN10087_c0_g1::TRINITY_DN10087_c0_g1_i1::g.308496::m.308496': 'ATGGCGGTGGTGACACTGTTCTCGGTGGGGCATTCTAGGGAGGTGCGG...CTACCTCGACCCTCGGCTAGACCAACCATGGCCCCGCGTCGCGAAGGCC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g1::TRINITY_DN1013_c0_g1_i1::g.248880::m.248880': 'TGGGCAGTCCTGACTGGCACTATCAAGAATAACAGCAACATCCAATGC...GAAAAACTATGGGGACATTATTAACCCTGCTGATGGAATCTGTTTGACC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g2::TRINITY_DN1013_c0_g2_i1::g.248881::m.248881': 'CTGATTCTTCTCATCGGCTCTCTCTGCTTCACTCTGACCCACGGCCTT...TCTGAAGGCCTACGCGATTGCCAAGACGGCCAAAGACTCTTACAACTGG', 'Botryllus_schlosseri_TRINITY_DN10166_c0_g2::TRINITY_DN10166_c0_g2_i1::g.27418::m.27418': 'GTACATTTGAAGAAAATGTCTGAATTGATCCATCACGAAAAAGCTTTC...GGTCATCTCGAACAAAATGCACAGAACCATCGTGATCAGAAGAGACTAC', 'Botryllus_schlosseri_TRINITY_DN1017_c0_g1::TRINITY_DN1017_c0_g1_i1::g.248877::m.248877': 'TGCCATCTTCCGGCACTCATGGCCAAGATTGACGAGGTGCACCACTCG...CCAAGTTAAGACCTTGCCCACTGAGAAGCATACCCAGCCGGAGTGCTAG', 'Botryllus_schlosseri_TRINITY_DN101_c0_g1::TRINITY_DN101_c0_g1_i1::g.75736::m.75736': 'GAGCGATATCACGCTCGCTCGTTCAGCAAGCGCGAACGGCTGTATCGG...GAACCGGACCTTCGTCGATCTACCTGAGTTCATAGATGAAATGGATCCG', 'Botryllus_schlosseri_TRINITY_DN10203_c0_g1::TRINITY_DN10203_c0_g1_i1::g.234676::m.234676': 'GAAGGGCGGTTCGTGCCGCACCTGGTGGTCCGCATCGACGCCGGCACG...GGTCAACACAGATAAGGTATCGGCTACCACGGCCATCGAGGTCTTGCGC', 'Botryllus_schlosseri_TRINITY_DN10220_c0_g2::TRINITY_DN10220_c0_g2_i1::g.234678::m.234678': 'GATCATCGGTATGGACTACCCGTGGAAATCGAGATGACTACAATGGAC...TCGAGTGGTGCTAAATGCTGCTACTGATACATTTCGTGGTATACTAGAT', 'Botryllus_schlosseri_TRINITY_DN10249_c0_g1::TRINITY_DN10249_c0_g1_i1::g.234674::m.234674': 'GAAGGACTCTCGTTGATCGAACGCCGCAACTGCATGGAGCTGTTCCGA...GGATCGTCTCGACAAGGTTTACGCCAGGCACAGCGACATGGTCCTGTTG', ...}, 
'/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862', 'codeml', False, 1, 100, 'fasttree', 'muscle', '/groups/pupko/ashermoshe/dorothee/wgd_ksd')
        kwargs = {}
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py in analyse_family(family_id='GF_000012', family={'Botryllus_schlosseri_TRINITY_DN47590_c0_g1::TRINITY_DN47590_c0_g1_i1::g.253656::m.253656': 'MGETKSTITNLYGHNNRAKRRTQLRELAIRTRTEKSHLIAGDFNGIDS...ARRQKDQLRSDEKEMQKVRNRVIQNQQYLSDYRKIKKKIWKEKAESLHK', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g1::TRINITY_DN50048_c0_g1_i1::g.248848::m.248848': 'GAVISIDFLKAYDSVDHSFLHNTLEEAGFGVKVRAFFKAIYQGGSAKV...SGMKGKIATPSYADDVTITLAKEEESTKALQIVAEFGKASGLQINRKKT', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g2::TRINITY_DN50048_c0_g2_i1::g.248849::m.248849': 'QKTTSQFARGIIKTIFKKGDKEDIRNYRPITILNVDYKIISKVITNRIQKVLPTITHRHQFINPPNTIGDLNLLLREVTSDMRERSRGA', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g3::TRINITY_DN50048_c0_g3_i1::g.248850::m.248850': 'AGDFNGIDDIELDRDPVNIRHDAADAKYSKRVMEVIGVTDAFRQVHGS...CNHMPCPFSDHGATTALVKLTDHRPRRPNTWKNNTKVYEMEAFETELEV', 'Botryllus_schlosseri_TRINITY_DN50048_c0_g4::TRINITY_DN50048_c0_g4_i1::g.248851::m.248851': 'KTWADKATDLGKLQLEAKADAEEALGKRPHLLCEKIKVRRDAVSITAIKDATGKTTEDPEEIRETVEEFYQKLYSKRETDKCTANSFHRYQDAKLSTR', 'Botryllus_schlosseri_TRINITY_DN50200_c0_g1::TRINITY_DN50200_c0_g1_i1::g.61823::m.61823': 'GEDGLSSELYMVNLDLMKKELTEVYNEIYEAQGTTTSLGRAVLKIIHK...KNYRPISLLNSDYKILSKILTNRLKQALPSITHQHQHVNPPKTIGQINL', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g2::TRINITY_DN52423_c0_g2_i1::g.262149::m.262149': 'GKANISIGGKLGGNIRLGRGIKQGDPISMLLFTMATDPLLQRLNHDLD...DVNITLAHQADVNEALKIIQDFEEASALKLNKNKSKGITYHPKPPPGSK', 'Botryllus_schlosseri_TRINITY_DN52423_c0_g3::TRINITY_DN52423_c0_g3_i1::g.262150::m.262150': 'PPPGSKNVLKWVQSMEVLGHVINRHPPNDHETWNGLANKAKDLMREIK...YVATLKKMPINTRRELETAVTELLFGKSMRPDYRKLIQRREAGGIGLVD', 'Botryllus_schlosseri_TRINITY_DN58171_c0_g1::TRINITY_DN58171_c0_g1_i1::g.196397::m.196397': 'LEYEIVEFLFGKGKRPEYKKLVQQEIAGGREVKDIPTITDIIFIKPAV...RTDHQLALTLGWLRERPINNSRPHTWQPRQHWAEMAKIMKEMEYKRDYI', 'Botryllus_schlosseri_TRINITY_DN61293_c1_g1::TRINITY_DN61293_c1_g1_i1::g.134659::m.134659': 
'AAFVMDLDGKIEYEKIIQARESGGLELVDIPTMTDLAFVKPALRYLQR...MKRYGMRKINNAIPHVFQPLQHWQETEKTMRSLGRQQQDIKSKRRERYR', ...}, nucleotide={'Botryllus_schlosseri_TRINITY_DN10080_c0_g1::TRINITY_DN10080_c0_g1_i1::g.308497::m.308497': 'CTTAATTGTGCTTCATTTTTTAGCTCTCAAATGGCTCGATCAACGAAA...CATGGCGGAAGGCACGAAGGCCGTCGCCAAGCTCGCTGCCAGCAAATAA', 'Botryllus_schlosseri_TRINITY_DN10087_c0_g1::TRINITY_DN10087_c0_g1_i1::g.308496::m.308496': 'ATGGCGGTGGTGACACTGTTCTCGGTGGGGCATTCTAGGGAGGTGCGG...CTACCTCGACCCTCGGCTAGACCAACCATGGCCCCGCGTCGCGAAGGCC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g1::TRINITY_DN1013_c0_g1_i1::g.248880::m.248880': 'TGGGCAGTCCTGACTGGCACTATCAAGAATAACAGCAACATCCAATGC...GAAAAACTATGGGGACATTATTAACCCTGCTGATGGAATCTGTTTGACC', 'Botryllus_schlosseri_TRINITY_DN1013_c0_g2::TRINITY_DN1013_c0_g2_i1::g.248881::m.248881': 'CTGATTCTTCTCATCGGCTCTCTCTGCTTCACTCTGACCCACGGCCTT...TCTGAAGGCCTACGCGATTGCCAAGACGGCCAAAGACTCTTACAACTGG', 'Botryllus_schlosseri_TRINITY_DN10166_c0_g2::TRINITY_DN10166_c0_g2_i1::g.27418::m.27418': 'GTACATTTGAAGAAAATGTCTGAATTGATCCATCACGAAAAAGCTTTC...GGTCATCTCGAACAAAATGCACAGAACCATCGTGATCAGAAGAGACTAC', 'Botryllus_schlosseri_TRINITY_DN1017_c0_g1::TRINITY_DN1017_c0_g1_i1::g.248877::m.248877': 'TGCCATCTTCCGGCACTCATGGCCAAGATTGACGAGGTGCACCACTCG...CCAAGTTAAGACCTTGCCCACTGAGAAGCATACCCAGCCGGAGTGCTAG', 'Botryllus_schlosseri_TRINITY_DN101_c0_g1::TRINITY_DN101_c0_g1_i1::g.75736::m.75736': 'GAGCGATATCACGCTCGCTCGTTCAGCAAGCGCGAACGGCTGTATCGG...GAACCGGACCTTCGTCGATCTACCTGAGTTCATAGATGAAATGGATCCG', 'Botryllus_schlosseri_TRINITY_DN10203_c0_g1::TRINITY_DN10203_c0_g1_i1::g.234676::m.234676': 'GAAGGGCGGTTCGTGCCGCACCTGGTGGTCCGCATCGACGCCGGCACG...GGTCAACACAGATAAGGTATCGGCTACCACGGCCATCGAGGTCTTGCGC', 'Botryllus_schlosseri_TRINITY_DN10220_c0_g2::TRINITY_DN10220_c0_g2_i1::g.234678::m.234678': 'GATCATCGGTATGGACTACCCGTGGAAATCGAGATGACTACAATGGAC...TCGAGTGGTGCTAAATGCTGCTACTGATACATTTCGTGGTATACTAGAT', 'Botryllus_schlosseri_TRINITY_DN10249_c0_g1::TRINITY_DN10249_c0_g1_i1::g.234674::m.234674': 
'GAAGGACTCTCGTTGATCGAACGCCGCAACTGCATGGAGCTGTTCCGA...GGATCGTCTCGACAAGGTTTACGCCAGGCACAGCGACATGGTCCTGTTG', ...}, tmp='/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862', codeml=<wgd.codeml.Codeml object>, preserve=False, times=1, min_length=100, method='fasttree', aligner='muscle', output_dir='/groups/pupko/ashermoshe/dorothee/wgd_ksd')
    298         logging.debug("Distance will be in Ks units!")
    299         clustering, pairwise_distances, tree_path = _weighting(
    300                 results_dict, msa=msa_path_protein, method="alc")
    301     else:
    302         clustering, pairwise_distances, tree_path = _weighting(
--> 303                 results_dict, msa=msa_path_protein, method=method)
        results_dict = {'Ka':                                                 ...

[94 rows x 94 columns], 'Ks':                                                 ...

[94 rows x 94 columns], 'Omega':                                                 ...

[94 rows x 94 columns]}
        msa_path_protein = '/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa'
        method = 'fasttree'
    304     if clustering is not None:
    305         out = _calculate_weighted_ks(
    306                 clustering, results_dict, pairwise_distances, family_id
    307         )

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/ks_distribution.py in _weighting(pairwise_estimates={'Ka':                                                 ...      

[94 rows x 94 columns], 'Ks':                                                 ...

[94 rows x 94 columns], 'Omega':                                                 ...

[94 rows x 94 columns]}, msa='/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa', method='fasttree')
     93     elif method == 'fasttree':
     94         # FastTree tree construction
     95         logging.debug('Constructing phylogenetic tree with FastTree')
     96         tree_path = run_fasttree(msa)
     97         clustering, pairwise_distances = phylogenetic_tree_to_cluster_format(
---> 98                 tree_path, pairwise_estimates['Ks'])
        tree_path = '/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa.nw'
        pairwise_estimates = {'Ka':                                                 ...

[94 rows x 94 columns], 'Ks':                                                 ...

[94 rows x 94 columns], 'Omega':                                                 ...

[94 rows x 94 columns]}
     99
    100     else:
    101         # Average linkage clustering based on Ks
    102         logging.debug('Performing average linkage clustering on Ks values.')

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/wgd/phy.py in phylogenetic_tree_to_cluster_format(tree='/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa.nw', pairwise_estimates=                                                ...

[94 rows x 94 columns])
    118         (only the index is used)
    119     :return: clustering data structure, pairwise distances dictionary
    120     """
    121     id_map = {
    122         pairwise_estimates.index[i]: i for i in range(len(pairwise_estimates))}
--> 123     t = Tree(tree)
        t = undefined
        tree = '/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa.nw'
    124
    125     # midpoint rooting
    126     midpoint = t.get_midpoint_outgroup()
    127     if not midpoint:  # midpoint = None when their are only two leaves

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/ete3/coretype/tree.py in __init__(self=Tree node '' (-0x7ffff800877d65c4), newick='/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa.nw', format=0, dist=None, support=None, name=None, quoted_node_names=False)
    206
    207         # Initialize tree
    208         if newick is not None:
    209             self._dist = 0.0
    210             read_newick(newick, root_node = self, format=format,
--> 211                         quoted_names=quoted_node_names)
        quoted_node_names = False
    212
    213
    214     def __nonzero__(self):
    215         return True

...........................................................................
/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/ete3/parser/newick.py in read_newick(newick='/groups/pupko/ashermoshe/dorothee/ks_tmp.371cd0d9072862/GF_000012.fasta.msa.nw', root_node=Tree node '' (-0x7ffff800877d65c4), format=0, quoted_names=False)
    244         nw = nw.strip()
    245         if not nw.startswith('(') and nw.endswith(';'):
    246             #return _read_node_data(nw[:-1], root_node, "single", matcher, format)
    247             return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
    248         elif not nw.startswith('(') or not nw.endswith(';'):
--> 249             raise NewickError('Unexisting tree file or Malformed newick tree structure.')
    250         else:
    251             return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
    252
    253     else:

NewickError: Unexisting tree file or Malformed newick tree structure.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
___________________________________________________________________________
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/site-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/share/apps/anaconda3-5.1.0/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
joblib.my_exceptions.TransportableException: TransportableException
___________________________________________________________________________
NewickError                                        Wed Feb 27 18:10:41 2019
PID: 62472             Python 3.6.8: /share/apps/anaconda3-5.1.0/bin/python
...........................................................................
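
The ete3 frame above shows the check that fails: the text handed to `Tree` has to start with `(` and end with `;`, and the error message covers both a missing file and malformed content. Below is a minimal sketch for inspecting the `.nw` files left in the temporary directory. The helper name `looks_like_newick` is hypothetical (it is not part of wgd or ete3), and the check is slightly stricter than ete3's, which also accepts a single-node string that merely ends with `;`.

```python
import os


def looks_like_newick(path):
    """Return True if `path` is an existing file whose stripped content
    starts with '(' and ends with ';' (hypothetical helper)."""
    if not os.path.isfile(path):
        return False
    with open(path) as handle:
        text = handle.read().strip()
    return text.startswith("(") and text.endswith(";")
```

If this returns False for the file named in the traceback, the tree-building step for that family probably produced no usable output, which would be the place to look next.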

Ks distribution construction error

Hi,
I used the wgd mcl subcommand to get the GENE_FAMILIES file and then ran the ksd analysis like this:
./software/wgd_venv/bin/wgd ksd --n_threads 2 --wm phyml -p ALATA.PEP.fa ALATA.PEP.fa.blast.tsv.mcl
and got this error:

2018-11-29 15:59:57: INFO
2018-11-29 15:59:57: INFO codeml found
2018-11-29 15:59:57: INFO MUSCLE v3.8.31 by Robert C. Edgar
2018-11-29 15:59:57: INFO . command-line: phyml --version
. This is PhyML version 20160207.
2018-11-29 15:59:57: ERROR No gene families or no sequences provided.

why?
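
One thing that may be worth ruling out, though it is not confirmed as the cause here: the file passed is named `ALATA.PEP.fa`, which suggests protein sequences, while the other `wgd ksd` reports on this page pass a CDS (nucleotide) FASTA file. Below is a rough sketch for checking what a FASTA file contains. `fraction_nucleotide` is a hypothetical helper, not a wgd function.

```python
def fraction_nucleotide(fasta_path, max_records=50):
    """Fraction of sequence characters that are A, C, G, T or N,
    computed over the first `max_records` FASTA records."""
    dna = set("ACGTN")
    total = hits = records = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records += 1
                if records > max_records:
                    break
                continue
            seq = line.upper()
            total += len(seq)
            hits += sum(1 for ch in seq if ch in dna)
    return hits / total if total else 0.0
```

A value close to 1.0 points to nucleotide sequences; protein files usually score far lower.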

codeml stalling with default tmp folder name

Hi,

Running "wgd ks" with the default temporary folder name causes codeml to fail to find the .ctrl file, and the process then never completes.
The cause is probably some character used in the folder name.
When I specify a simple name with "-tmp", everything runs smoothly.

Cheers,
Ricardo

wgd ksd python error

Hello, I ran into a problem using the software. The first step, generating the mcl file, worked normally, but when I ran
wgd ksd sample.blast.tsv.mcl genome.cds.fa -n 60
it reported an error. Here is the error log:

/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=60), iterable=<generator object ks_analysis_paranome.<locals>.<genexpr>>)
    784             if pre_dispatch == "all" or n_jobs == 1:
    785                 # The iterable was consumed all at once by the above for loop.
    786                 # No need to wait for async callbacks to trigger to
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=60)>
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time
    792             self._print('Done %3i out of %3i | elapsed: %s finished',
    793                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
IndexError                                         Thu Jan 28 17:52:58 2021
PID: 91670   Python 3.6.10: /home/zhangxc/miniconda3/envs/wgd/bin/python3.6
...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function analyse_family>, ('GF_000240', {'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, {'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 
'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', 'codeml', True, 1, 100, 'fasttree', 'mafft', '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd'), {})]
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function analyse_family>
        args = ('GF_000240', {'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, {'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 
'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', 'codeml', True, 1, 100, 'fasttree', 'mafft', '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd')
        kwargs = {}
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/ks_distribution.py in analyse_family(family_id='GF_000240', family={'evm.model.CTG_1185.7': 'MMKNSAVVRNVPHSWNMSAWGGNLILNNPEKHEVIFLYKCLYWLGLLK...IRPERGPFDKPEFEFTACGILNYKRSNMLAVIGALLTYTIIVFNSKDSS', 'evm.model.CTG_1357.17': 'MDFLLCFYKFIGLVDKSERNRPVNIGSLMFLIYLVLLFLDNVMMVYMN...NPVVSTTTTYTFEDTKVATTTAAPITKAPSPALTTAAAKKGKNRGNKKQ', 'evm.model.CTG_1405.53': 'MQPGSVTSYSGFITVNKEYNSNTFFWFFPAMNDNKKAPIILWLQGGPG...DKTDVAGYVRNYGNLFGILVRNAGHMVPHDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_246.62': 'MHINTLQMDNTKSWICKTDFLISFLKLAGLFENSGTNQCTNIGFKIFL...ERLEMLLCEKTDMVLTGGNVIHFRRSLILTFLGTVLTYTFLLMNTNCTK', 'evm.model.CTG_247.55_evm.model.CTG_247.56_evm.model.CTG_247.57': 'MTFLKSFIFVLSCLSITFATHPDEVQDLPGLSFGLNYKHYSGYLNATS...KFNDQIAGFVKSYKNLSYLTIKGSGHMVPQDKPGPAFKMIDSFLNNKPY', 'evm.model.CTG_450.26': 'MNSTFVVLLCSLLALAIVSVSSGPVEKKEAWGYVNVRKDAYMFWWLYY...ASSSTQTGGYVKSFKQFRLFWILKAGHMIPADAPEAALQMLDMILSKKK', 'evm.model.CTG_482.10': 'MVRKAIHVGNLAFNDISETANHYLKEDPVVSTSTAQVATLANNYKVLI...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFVKDIPYA', 'evm.model.CTG_482.8': 'MVLLFTILCVVLIKLAVAEDMSSPTGKPLFLTPYIENGDVQKGRLLSR...DKNDVAGYVRNYGNLFGILVRNAGHLVPYDQPEAALDLITRFIKDIPYA', 'evm.model.CTG_591.1': 'MHPILNQLVTLLFFRLCHQCCENLRKITNDIHNCPVLNFTHSVQIEMF...EPKEHSYYIDYVSTEMVRKAIHVGNLAFNDLSETANHYLKVDPVVSTST'}, nucleotide={'evm.model.CTG_1.1': 'ATGAAGGGTGGGGGTACTGAGGGGATGCGCGACGTGCGCGTGGACTTG...ACCACTTTTATCAGCGTTATTATTTGCAAAAATCTTGTTATCCATCTAA', 'evm.model.CTG_1.100': 'ATGCCTGCTAGTCTGATAAGCGAAATGGATCGGAGTAGTCGGGTGTTG...CGTGTACTGCATGCAACCGGCCGCCCAGGCCATCCGCATCTACAACTAG', 'evm.model.CTG_1.102': 'ATGGGGAATAAGGTGGTTACGTTTACTGAAGAACAGCTAGAAGACTAT...CTACGCGTTCAGGGTGTACGATTACGATGGGACAAATTCATCGCGCTGA', 'evm.model.CTG_1.103': 'ATGGATGCAAAGGAAATTAAATCGAATCTTAAGCAAGCTAGAGAAGCT...AATGCATGTAAAGACTGATGAACGTGAGAAATGGACTACAAGTTGGTGA', 'evm.model.CTG_1.104': 'ATGTATCGCCAACATTTAAAAATGGTTAATAAAGAGTTGGAAAGAAAT...TCTTGTATTATTTAATGTAACTCAAAACCAAAGTATAGCTGTAACGTAA', 'evm.model.CTG_1.105': 
'ATGATGATAGAGGTGCATGCAAGATTACCCCATGGCTCGGTTTTCCTT...AGTGGCTCATGGCTTACCTTCTAAACTGGATCATATGATAACAATTTAA', 'evm.model.CTG_1.106': 'ATGTCTTACATGTTGGAGCACCTTCACAACGGATGGCAAGTCGACCGA...GGGTTTGGTTATTTCTCCCAAAGATTATTCCACAAAATACAGATATTAA', 'evm.model.CTG_1.107': 'ATGGGAGCTCTTGGATCAAAACGAAGGGAATACAATATCAATAGTGAT...GTGTGTCAATCTTGGAGAAACCATACTTAAGAGTCAGATTGATTCTTAA', 'evm.model.CTG_1.108': 'ATGACTCGATTAAGAAACGATTTTTTCTCCCTACTCATCTTTGGATTA...ACATAGAGACCAAAAAGGGAAAGGACAAGACGTTAGGCAAAGAAAATGA', 'evm.model.CTG_1.109': 'ATGAAACTGGAACAAGAGAGAGAAAAATGTAACACATTTAAAAAGGAT...TGGGGAAAACGAACTACAAGAGGAGGATTTTGAATTGAATATTGTTTAA', ...}, tmp='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac', codeml=<wgd.codeml.Codeml object>, preserve=True, times=1, min_length=100, method='fasttree', aligner='mafft', output_dir='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/wgd_ksd', **kwargs={})
    284
    285     # Calculate Ks values (codeml) ---------------------------------------------
    286     codeml = Codeml(codeml=codeml, tmp=tmp, id=family_id, **kwargs)
    287     logging.debug('Performing codeml analysis on {}'.format(family_id))
    288     results_dict, codeml_out = codeml.run_codeml(
--> 289             os.path.basename(msa_path), preserve=preserve, times=times)
        msa_path = '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.fasta.msa.nuc'
        preserve = True
        times = 1
    290     if not results_dict:
    291         logging.warning('No codeml results for {}!'.format(family_id))
    292         return
    293

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/codeml.py in run_codeml(self=<wgd.codeml.Codeml object>, msa='GF_000240.fasta.msa.nuc', raw=False, preserve=True, times=1)
    311             if not os.path.isfile(self.out_file):
    312                 logging.warning(
    313                         'Codeml output file {} not found'.format(self.out_file))
    314                 return None
    315
--> 316             d, likelihood = _parse_codeml_out(self.out_file)
        d = undefined
        likelihood = undefined
        self.out_file = '/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml'
    317             output.append(d)
    318             if not best or likelihood > best:
    319                 best = likelihood
    320                 best_index = i

...........................................................................
/home/zhangxc/miniconda3/envs/wgd/lib/python3.6/site-packages/wgd/codeml.py in _parse_codeml_out(codeml_out='/data/zhangxc/biosoft/wgd/Genome/Zhuby/wgd_out/wgd_mcl/ks_tmp.39437c46c24bac/GF_000240.codeml')
     63     likelihood = re.compile('lnL\s*=(\D*\d+.\d+)')
     64
     65     # read codeml output file
     66     with open(codeml_out, 'r') as f:
     67         fcont = f.read()
---> 68     n = int(fcont.split("\n")[3].split()[2].strip())
        n = undefined
        fcont.split.split.strip = undefined
     69     codeml_results = fcont.split('pairwise comparison')[-1].split("\n\n\n")[1:]
     70     if len(codeml_results) != n*(n-1)/2:
     71         logging.error("Not all gene pairs present in {}".format(codeml_out))
     72         return None, None

IndexError: list index out of range

I look forward to your reply!
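
The failing line reads the fourth line of the codeml output and takes its third whitespace-separated token, so an empty or truncated `.codeml` file raises `IndexError`. Below is a minimal sketch for checking such a file by hand, assuming only what the traceback shows. `read_sequence_count` is a hypothetical helper, not a wgd function.

```python
def read_sequence_count(codeml_out):
    """Mimic the lookup from the traceback (line 4, token 3) but return
    None instead of raising when the file is too short or malformed."""
    with open(codeml_out) as handle:
        lines = handle.read().split("\n")
    if len(lines) < 4:
        return None
    fields = lines[3].split()
    if len(fields) < 3:
        return None
    try:
        return int(fields[2])
    except ValueError:
        return None
```

Getting `None` for `GF_000240.codeml` in the temporary directory would suggest that codeml wrote little or no output for that family, rather than that the parser itself is at fault.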

cannot find testdata

Hey,
I just started using wgd and wanted to run it with the test data, but I couldn't find the data.
I used:
wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04//Fasta/cds.ath.fasta.gz
and
wget ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_04//GFF/ath/Arabidopsis_thaliana.COL0.Araport11.longest_transcript.all_features.gff3.gz

but for both I got the error "No such file".
Have the files perhaps been renamed or moved somewhere else?
Thanks for any help.
Regards

wgd ksd never finishes

Hi,

I am having a bit of trouble with the wgd ksd step.

The program starts smoothly and produces a number of files in the tmp folder (including fasta, msa, and Ks files), but at some point it just freezes and nothing happens after that. No information is printed to the command line.
I have let my current run go for 4 days, but nothing has changed.
Here is a screenshot of the last INFO message given by the program:

[image: screenshot of the last INFO message]

I tried running a smaller subset of the data (1000 random ones, similar to the supplemental info in the paper), and the program had no problem producing the output.

Do you know what went wrong? Could it be due to the size of the data or other problems?
FYI, the CDS fasta file is ~30 Mb, and the mcl file is ~369 kb.

mcl error about Not all gene pairs present

Hello, when I run wgd on the test data, it works well. But when I run it on my own data, I get the error below at the mcl step. Could you give me some advice? Thank you very much.

99% (55851 of 55927) |################# | Elapsed Time: 0:00:29 ETA:   0:00:00
100% (55927 of 55927) |##################| Elapsed Time: 0:00:29 Time:  0:00:29
2020-10-25 03:30:40: WARNING	There were 1 warnings during translation
2020-10-25 03:30:40: INFO	Started whole paranome Ks analysis
2020-10-25 03:30:40: WARNING	Filtered out the 1 largest gene families because n*(n-1)/2 > `max_pairwise`
2020-10-25 03:30:40: WARNING	If you want to analyse these large families anyhow, please raise the `max_pairwise` parameter. 
2020-10-25 03:30:40: INFO	Started analysis in parallel (n_threads = 8)
2020-10-25 03:30:41: INFO	Performing analysis on gene family GF_000002
2020-10-25 03:30:42: INFO	Performing analysis on gene family GF_000003
2020-10-25 03:30:43: INFO	Performing analysis on gene family GF_000004
2020-10-25 03:30:43: INFO	Performing analysis on gene family GF_000005
2020-10-25 03:30:44: INFO	Performing analysis on gene family GF_000006
2020-10-25 03:30:45: INFO	Performing analysis on gene family GF_000007
2020-10-25 03:30:45: INFO	Performing analysis on gene family GF_000008
2020-10-25 03:30:46: INFO	Performing analysis on gene family GF_000009
2020-10-25 03:31:30: ERROR	Not all gene pairs present in /ds3512/home/panyp/NN1138-2/04.WGD_data/ks_tmp.38f85d40a97f8c/GF_000003.codeml
2020-10-25 03:31:30: WARNING	No codeml results for GF_000003!
2020-10-25 03:31:30: INFO	Performing analysis on gene family GF_000010
2020-10-25 03:31:58: ERROR	Not all gene pairs present in /ds3512/home/panyp/NN1138-2/04.WGD_data/ks_tmp.38f85d40a97f8c/GF_000004.codeml
2020-10-25 03:31:58: WARNING	No codeml results for GF_000004!
2020-10-25 03:32:00: ERROR	Not all gene pairs present in /ds3512/home/panyp/NN1138-2/04.WGD_data/ks_tmp.38f85d40a97f8c/GF_000010.codeml
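
The warning and the error in this log both refer to the number of gene pairs in a family, n*(n-1)/2. Below is a small sketch of that arithmetic. The helper names are hypothetical and the threshold is a made-up value, since the actual default of `max_pairwise` is not stated here.

```python
def n_pairs(n):
    """Number of unordered gene pairs in a family of n genes."""
    return n * (n - 1) // 2


def is_filtered(family_size, max_pairwise):
    """True if the family exceeds the pairwise limit, following the
    n*(n-1)/2 > max_pairwise condition quoted in the warning."""
    return n_pairs(family_size) > max_pairwise


if __name__ == "__main__":
    limit = 10000  # made-up value, for illustration only
    for size in (2, 9, 100, 200):
        print(size, n_pairs(size), is_filtered(size, limit))
```

The same count is what the "Not all gene pairs present" check compares against, so a family of n genes is expected to yield n*(n-1)/2 pairwise results in its `.codeml` file.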
