benoitmorel / generax

License: GNU Affero General Public License v3.0

CMake 4.64% C++ 74.18% Shell 0.58% Python 20.61%

generax's Introduction

News:

03.2021: GeneRax 2 is released, allowing rooted species tree inference from gene trees (I am still working on updating the bioconda package)
11.2020: we just created a new GeneRax Google group for asking questions and reporting issues: https://groups.google.com/g/generaxusers

GeneRax

GeneRax is a parallel tool for species tree-aware maximum likelihood based gene family tree inference under gene duplication, transfer, and loss.

It infers gene family trees from their aligned sequences, the mapping between genes and species, and a rooted undated species tree. In addition, it infers the duplication, transfer and loss events that best (in terms of maximum likelihood) reconcile the gene family trees with the species tree.

It accounts for sequence substitutions, gene duplication, gene loss and horizontal gene transfer.

When using GeneRax, please cite: https://academic.oup.com/mbe/article/doi/10.1093/molbev/msaa141/5851843

GeneRax is also available on bioconda

SpeciesRax

SpeciesRax is part of the GeneRax tool and is available since GeneRax v2.0.0. SpeciesRax infers a rooted species tree from a set of unrooted gene trees. When using SpeciesRax, please cite https://academic.oup.com/mbe/article/39/2/msab365/6503503

Requirements

(If you are not installing with bioconda)

A Linux or macOS environment
gcc 5.0 or later
CMake 3.6 or later
MPI

Installation

(Please note that you can also install through bioconda)

To download GeneRax, please use git and clone with --recursive:

git clone --recursive https://github.com/BenoitMorel/GeneRax

To build the sources:

./install.sh

Running

See the wiki (https://github.com/BenoitMorel/GeneRax/wiki)

Issues and questions

For questions, issues or feedback, please post on our Google group: https://groups.google.com/g/generaxusers
When reporting an issue, please send us at least the command line you ran, the log file and the families file. The more information we get, the quicker we can solve the problems :-)

generax's Issues

Support more gene-species mapping formats

We currently support the same format as Phyldog:

species1:gene1;gene2
species2:gene3;gene4;gene5

We should also support (as in Treerecs):

gene1 species1
gene2 species1
gene3 species2
gene4 species2
gene5 species2

And also a mapping based on the gene and species names (in this format, gene1 should be named species1_gene1) as in Astral.

The format should be auto-detected.
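
A minimal Python sketch of what such auto-detection could look like, assuming the format is decided from the first non-empty line only. The function name parse_mapping is hypothetical and this is not GeneRax code. For the name-based format, the input is assumed to be a list of gene names, one per line, with the species name before the first underscore.

```python
def parse_mapping(text):
    """Return {gene: species}, guessing the format from the first non-empty line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    mapping = {}
    if ":" in lines[0]:
        # Phyldog-style: species:gene1;gene2
        for ln in lines:
            species, genes = ln.split(":", 1)
            for gene in genes.split(";"):
                if gene:
                    mapping[gene] = species
    elif len(lines[0].split()) == 2:
        # Treerecs-style: gene species
        for ln in lines:
            gene, species = ln.split()
            mapping[gene] = species
    else:
        # Name-based (assumed layout): one gene name per line, species_gene
        for ln in lines:
            species, _, _ = ln.partition("_")
            mapping[ln] = species
    return mapping
```

Note that the name-based branch breaks down when species names themselves contain underscores, so a real implementation would probably need the species tree labels to resolve the split.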

JointSearch: crash with random trees

Because we try to read "random" as a file in GeneSpeciesMapping

tip label related assertion fails

Hi Benoit,

I'm getting this error when running GeneRax on an alignment dataset (normal mode, not reconciliation sampling):

generax: /home/florent/software/GeneRax/src/core/trees/PLLTreeInfo.cpp:119: pll_partition_t* PLLTreeInfo::buildPartition(const PLLSequencePtrs&, unsigned int*): Assertion `tipsLabelling.find(leaf->label) != tipsLabelling.end()' failed.

I assume this comes from invalid characters in the sequence labels, or an invalid format, but I don't know what to do to fix it.
Here is a sample of the sequence header in the input fasta alignment files:

>BRADIA5_USDA110__1
>BRADYR67_ORS285__1
>BRADYR68_ORS3257__1
>BRAELK4_USDA76__1
>BRADYR20_th.b2__1
>BRADYR25_EC3.3__1
>BRAELK12_BLY3-8__1
>BRAELK11_BLY6-1__1

and the species tree is of this form:

(BRAYUA4:6.94934184835e-06,BRAYUA1:4.22488966695e-05)clade78:0.00738623926217

I tried to substitute the . dot characters with - hyphen characters in the sequence names but it did not help. I also ran it with simpler sequence labels like:

>BRADYR25_1
>BRAELK12_1
>BRAELK11_1

but I get the same error message.

Would you have any idea on what is the problem and guidance on how to fix it?

Below is the error log I get (it starts with some warnings about the MPI interface, which I am not sure works either). I used a simplified dataset with just one family and with the minimal sequence headings (SPECIES_1) in the fasta alignment input:

beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
--------------------------------------------------------------------------
[[19240,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: pantagrueltestb

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
[00:00:00] GeneRax v1.1.0
Logs will also be printed into /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test/generax.log
GeneRax was called as follow:
generax -r UndatedDTL --max-spr-radius 5 --strategy SPR -s /pantagruel_databases/Brady2019/05.core_genome/core-genome-based_reference_tree_Brady2019.full_clade_defs.nwk -f /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test_generax.families -p /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test

Parameters summary:
Families information: /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test_generax.families
Species tree: /pantagruel_databases/Brady2019/05.core_genome/core-genome-based_reference_tree_Brady2019.full_clade_defs.nwk
Species Strategy: SPR
Strategy: SPR
Reconciliation model: UndatedDTL
Reconciliation opt: grid
DTL rates: global rates
Prefix: /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test
Unrooted reconciliation likelihood: OFF
Prune species tree: OFF
Reconciliation radius: 0
MPI Ranks: 1
Max gene SPR radius: 5
Gene support threshold: -1
Reconciliation likelihood weight: 1
Random seed: 123
Infer ML reconciliation: OFF

[00:00:00] Filtering invalid families...

[00:00:00] Filtering invalid families based on the starting species tree...


[00:00:00] [Initialization] Initial optimization of the starting random gene trees
[00:00:00] [Initialization] All the families will first be optimized with sequences only
generax: /home/florent/software/GeneRax/src/core/trees/PLLTreeInfo.cpp:119: pll_partition_t* PLLTreeInfo::buildPartition(const PLLSequencePtrs&, unsigned int*): Assertion `tipsLabelling.find(leaf->label) != tipsLabelling.end()' failed.
Aborted (core dumped)
[00:00:00] [Initialization] Finished optimizing some of the gene trees

[00:00:00] Gathering statistics about the families...
terminate called after throwing an instance of 'LibpllException'
  what():  Could not load open newick file /pantagruel_databases/Brady2019/albingenes/fullgenetree_GeneRax_recs/generax_fullgenetree_pointestimate_1.test/results/brady_nodA/geneTree.newick
[pantagrueltestb:02033] *** Process received signal ***
[pantagrueltestb:02033] Signal: Aborted (6)
[pantagrueltestb:02033] Signal code:  (-6)
[pantagrueltestb:02033] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f6b000be890]
[pantagrueltestb:02033] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f6affcf9e97]
[pantagrueltestb:02033] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f6affcfb801]
[pantagrueltestb:02033] [ 3] /home/linuxbrew/.linuxbrew/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x7f6b006f75cd]
[pantagrueltestb:02033] [ 4] /home/linuxbrew/.linuxbrew/lib/libstdc++.so.6(+0x8c636)[0x7f6b006f5636]
[pantagrueltestb:02033] [ 5] /home/linuxbrew/.linuxbrew/lib/libstdc++.so.6(+0x8c681)[0x7f6b006f5681]
[pantagrueltestb:02033] [ 6] /home/linuxbrew/.linuxbrew/lib/libstdc++.so.6(+0x8c898)[0x7f6b006f5898]
[pantagrueltestb:02033] [ 7] generax(_ZN13LibpllParsers18readNewickFromFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x802)[0x46a382]
[pantagrueltestb:02033] [ 8] generax(_ZN13LibpllParsers20parallelGetTreeSizesERKSt6vectorI10FamilyInfoSaIS1_EE+0xb7)[0x46b047]
[pantagrueltestb:02033] [ 9] generax(_ZN16PerCoreGeneTreesC2ERKSt6vectorI10FamilyInfoSaIS1_EE+0x60)[0x49d9b0]
[pantagrueltestb:02033] [10] generax(_ZN6Family10printStatsERSt6vectorI10FamilyInfoSaIS1_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_+0x1b9)[0x45af99]
[pantagrueltestb:02033] [11] generax(_ZN11GeneRaxCore10printStatsER15GeneRaxInstance+0x348)[0x43f8c8]
[pantagrueltestb:02033] [12] generax(_Z12generax_mainiPPcPv+0x2ea)[0x43b23a]
[pantagrueltestb:02033] [13] generax(_Z13internal_mainiPPcPv+0x2b)[0x43bf1b]
[pantagrueltestb:02033] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f6affcdcb97]
[pantagrueltestb:02033] [15] generax(_start+0x2a)[0x43ae6a]
[pantagrueltestb:02033] *** End of error message ***

GeneRax is compiled from the latest source code on Ubuntu 18.04; GNU C/CXX compilers v5.5.0; Flex v2.6.4; Bison v3.0.4.

Thanks for your help!

Recommended reconciliation model

Hi Benoit,

Firstly, thanks for a great tool, really helpful and easy to use!

Secondly, I was idly perusing the wiki and saw (under SpeciesRax), that you recommend the use of UndatedDTL over UndatedDL even in cases where there is no horizontal transfer expected. I was wondering if this applies only to the (to-be-released) species tree inference, or if it is also recommended when inferring reconciliations on a previously inferred species tree? We are working with a plant group where we expect no transfer events, so have been using UndatedDL until now.

Thanks in advance for your help!

Nat

What is the --reconciliation-samples parameter doing?

Hi Benoit,

Sorry for yet another simple question. This one is pretty self-explanatory. I've read through the paper and the documentation again and can't find a reference to the parameter. I've done runs with and without it, and found differences in _events.nwk, and seen the newly generated files when running with --reconciliation-samples 100. However, I didn't fix the seed, so I couldn't say if differences in the reconciliation were truly due to that difference. Is it sampling the likelihood space of the reconciliations?

Thanks very much!

Nat

Add explicit error messages in the following cases:

the species tree is unrooted
the substitution model does not match the sequences (DNA vs PROTEINS)
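
For the first case, a rough Python sketch of a rootedness test on a Newick string, assuming that a rooted binary tree has exactly two children at its outermost node and that an unrooted one has three. The helper name root_degree is hypothetical, and the sketch ignores quoted labels and bracketed comments.

```python
def root_degree(newick):
    """Count the children of the outermost node of a Newick string."""
    text = newick.strip().rstrip(";").strip()
    if not text.startswith("("):
        return 0  # a single leaf, or not a tree at all
    depth = 0
    children = 1
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break  # closed the outermost node; the rest is label/length
        elif ch == "," and depth == 1:
            children += 1
    return children

def check_rooted(newick):
    """Raise an explicit error instead of failing later on an assertion."""
    degree = root_degree(newick)
    if degree != 2:
        raise ValueError(
            f"the species tree must be rooted (2 children at the root), found {degree}"
        )
```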

new visualizations

I found visualization software that is based on recPhyloXML.
It is more usable because the output of the recommended website cannot be downloaded as a PDF.

Here is the link:
https://github.com/UdeS-CoBIUS/DoubleRecViz

It is also based on recPhyloXML, but I found small differences between the XML you generate and its own.
This is just a suggestion, in case you want to write a converter.

using GeneRax on symbiont-host trees with multiple symbionts per host

Dear All

I wanted to ask if it is possible to use GeneRax on symbiont-host trees where we have >1 symbiont per host sample.
In a recent paper, Satler et al. 2019 showed that if there are multiple symbionts per host, then it's important to generate sister host tips for each symbiont per host so that there is an equal number of tips in both trees (line "shared wasp tips into two sister tips for all fig wasp species" in the paper). I wanted to confirm this step. Would GeneRax overestimate DTL events if the symbiont tree has more tips than the host tree?

Looking forward to your reply!

nhx output is missing

Thank you for developing this useful program. I've recently updated the version to v1.1.0 using bioconda and found that this version does not generate .nhx output which was available in v1.0.0. Is it intended behavior? Because the node annotations in nhx were quite informative, I would like to use it in my downstream analysis. Input files are attached.
data.zip

Do we need to consider the whole species tree?

Try restricting the species tree to the subtree under the LCA of all the species leaves that the genes of a gene tree are mapped to: if a gene family is found in mammals and not in fishes, and if the input species tree includes fishes and mammals, we can ignore the fishes part.

This might make the computations faster when the input species tree is larger than needed.
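
A minimal Python sketch of the idea, assuming the species tree is given as nested tuples with species names as leaves. This representation and the names leaves and restrict_to_lca are hypothetical and chosen for illustration only; GeneRax itself works on libpll tree structures.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def restrict_to_lca(tree, covered):
    """Return the smallest subtree of `tree` containing every species in `covered`."""
    covered = set(covered)
    if not covered:
        raise ValueError("no covered species given")
    if not covered <= leaves(tree):
        raise ValueError("some covered species are absent from the species tree")
    node = tree
    while not isinstance(node, str):
        # Descend as long as a single child still contains all covered species.
        for child in node:
            if covered <= leaves(child):
                node = child
                break
        else:
            return node  # covered species are spread over several children: LCA found
    return node

species_tree = ((("human", "mouse"), "chicken"), ("zebrafish", "fugu"))
mammal_family = {"human", "mouse"}
print(restrict_to_lca(species_tree, mammal_family))
```

Recomputing leaves at every step is quadratic in the worst case; it keeps the sketch short, but a real implementation would do a single post-order traversal.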

A step in the search sometimes leads to a worse likelihood

To reproduce, on phobos:
python scripts/generax/launch_generax.py jsimdtl_s5_f200_sites200_dna4_bl1.0_d0.1_l0.2_t0.1 SPR raxml-ng normald 40 --run generax-hoho --rec-model UndatedDTL

Likelihoods: joint = -248784, libpll = -247813, rec = -971.106
Likelihoods: joint = -248824, libpll = -247810, rec = -1013.84

This might be an issue with the rates optimization.
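
The two log lines are consistent with the joint likelihood being the sum of the libpll (sequence) and reconciliation log-likelihoods. That relation is an assumption read off the printed numbers, not something stated in this issue. Under that assumption, a short Python sketch of the arithmetic shows which term is responsible for the loss:

```python
# Values copied from the two "Likelihoods" lines above.
before = {"libpll": -247813.0, "rec": -971.106}
after = {"libpll": -247810.0, "rec": -1013.84}

def joint(ll):
    # Assumed relation: joint = libpll + rec (matches the rounded values printed).
    return ll["libpll"] + ll["rec"]

delta_libpll = after["libpll"] - before["libpll"]  # change in sequence likelihood
delta_rec = after["rec"] - before["rec"]           # change in reconciliation likelihood
delta_joint = joint(after) - joint(before)         # net change in joint likelihood

print(f"libpll: {delta_libpll:+.3f}, rec: {delta_rec:+.3f}, joint: {delta_joint:+.3f}")
```

The step gains about 3 log-likelihood units on the sequence side but loses about 42.7 on the reconciliation side, which fits the suspicion that the reconciliation rates are involved.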

Add unit tests in travis

I already have some "obvious" trees for which RAxML finds the wrong tree but JointSearch always finds the true tree. Use these trees to build unit tests.

Check several options (all reconciliation models, rooted vs unrooted, hardcoded rates etc.)
Check proteins and DNA.
Check all supported file formats (MSA, mapping etc.)

Check gcc and clang compilation.

Check openMPI and mpich implementations.

Availability of Aleml-like (.rec file) output in newick format?

Hey all!

I am interested in examining some of the main transfer events in my species tree-gene tree reconciliation, but both .xml and .nhx give me all the transfer events. I was wondering if it's possible to extract the reconciliation in Newick format, similar to aleml's .rec file that contains the DTL events on the gene tree and species tree?

Assertion fail in reconciliation sampling

Can I use Reference strain as a rooting strain for a SNP based species tree?

I have generated a phylogenetic tree based on whole-genome SNPs, in which I used the reference strain as an outgroup to root my species tree. The gene family tree also includes the reference strain. Is this the correct way to carry out the GeneRax analysis, or should I include an outgroup organism other than the reference strain?

Account for incompleteness in the reconciliation model

Implement the fraction of missing genes, as done in ALE. Use the same file format to make users' lives easier.

Compilation error

Dear Benoît,

Our institute deployed a new server and shut down the previous one, so I had to recompile generax. I tried the conda version, but our new system uses AMD, so I am not sure whether the conda version is compiled with AVX or not. Also, the conda version seems to stall after the step:
[00:00:50] [Initialization] All the families will first be optimized with sequences only
Nothing more is printed. Is there a way to know the progress of the job?

So, I tried to compile directly and I encounter the error below. What do you think?

[ 45%] Building C object ext/pll-modules/libs/libpll/src/CMakeFiles/pll_obj.dir/core_derivatives_sse.c.o
Scanning dependencies of target mpi-scheduler
[ 46%] Building C object ext/pll-modules/libs/libpll/src/CMakeFiles/pll_obj.dir/core_likelihood_sse.c.o
[ 47%] Building CXX object ext/MPIScheduler/src/CMakeFiles/mpi-scheduler.dir/main.cpp.o
c++: fatal error: no input files
compilation terminated.
/bin/sh: -pthread: command not found
make[2]: *** [ext/MPIScheduler/src/CMakeFiles/mpi-scheduler.dir/build.make:63: ext/MPIScheduler/src/CMakeFiles/mpi-scheduler.dir/main.cpp.o] Error 127
make[1]: *** [CMakeFiles/Makefile2:861: ext/MPIScheduler/src/CMakeFiles/mpi-scheduler.dir/all] Error 2

Thanks!

Try using species BL to forbid some transfers

Try this hack: for each species branch, compute all the contemporary species branches (those that share a time slice). Then only allow transfers from and to these branches.
Cons: this might be much more expensive than the other models, and would require an ultrametric species tree.
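
A minimal Python sketch of the contemporaneity test, assuming an ultrametric species tree stored as two dictionaries: parent (child to parent) and length (node to the length of the branch above it). The data layout and function names are hypothetical. Two branches are treated as contemporary when their time intervals, measured from the root, overlap on more than a single point.

```python
def branch_intervals(parent, length):
    """Map each non-root node to the (start, end) times of the branch above it."""
    def depth(node):
        # Distance from the root; the root is the only node without a parent.
        if node not in parent:
            return 0.0
        return depth(parent[node]) + length[node]
    return {node: (depth(parent[node]), depth(node)) for node in parent}

def contemporaries(parent, length):
    """For each branch, return the set of other branches sharing a time slice with it."""
    spans = branch_intervals(parent, length)
    result = {node: set() for node in spans}
    for a, (a_start, a_end) in spans.items():
        for b, (b_start, b_end) in spans.items():
            if a != b and max(a_start, b_start) < min(a_end, b_end):
                result[a].add(b)
    return result

# Example tree: ((A:1,B:1)AB:1,C:2)root
parent = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
length = {"A": 1.0, "B": 1.0, "AB": 1.0, "C": 2.0}
allowed = contemporaries(parent, length)
```

In this example the branch above AB ends exactly where the branches above A and B begin, so AB is only contemporary with C. The pairwise comparison is quadratic in the number of branches, which illustrates the cost concern raised above.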

Clarification on output rates and weird transfer-less scenarios

Hi Benoit,

I have been successfully running GeneRax on a few gene families with protein alignment and a species tree as input, using per family rates and the undated DTL model, but I am having trouble with interpreting the output and assessing whether it is sound.

First, I have issues identifying what is what in the output due to the absence of headings in the files like the report of per-family DTL rates in results/*/stats.txt files: I don't know which is D, T or L.
Same with the reconciliations/*_speciesEventCounts.txt files, I don't know which is S, D, T or L - I assume those are the four represented events, unless one is for gene presence?
Could you please specify the format of this output, either in the documentation or directly as a header of the files?

Second, the problem above then prevents me from fully assessing the result of my run. I think there is something wrong, as none of the families had any Transfer event detected! Instead, the scenario came up with a very dubious combination of many D+L, just as if I was running an UndatedDL model instead of UndatedDTL, which I selected for this run.
Checking the estimated event rates in the results/*/stats.txt files, I see that for all families, the first one is stuck at what seems to be the minimum value, 1e-7. One example:

-9489.08 -318.888
Reconciliation rates = 1e-07 0.24704 0.15936

Maybe that first rate is the T rate, that somehow got stuck?

I was running GeneRax as compiled from the dev branch, on commit 2192e79. I'm now re-running it with the master branch (released version v1.1.0) to see if it makes any difference.

Here is the header of the log:

GeneRax was called as follow:
generax -r UndatedDTL --max-spr-radius 5 --strategy SPR -s /pantagruel_databases/Brady2019/05.core_genome/core-genome-based_reference_tree_Brady2019.full_clade_defs.nwk -f /pantagruel_databases/Brady2019/albingenes/prot_fullgenetree_GeneRax_recs/generax_prot_fullgenetree_pointestimate_2_generax.families -p /pantagruel_databases/Brady2019/albingenes/prot_fullgenetree_GeneRax_recs/generax_prot_fullgenetree_pointestimate_2/generax_prot_fullgenetree_pointestimate_2_generax --per-family-rates --reconcile 

Parameters summary: 
Families information: /pantagruel_databases/Brady2019/albingenes/prot_fullgenetree_GeneRax_recs/generax_prot_fullgenetree_pointestimate_2_generax.families
Species tree: /pantagruel_databases/Brady2019/05.core_genome/core-genome-based_reference_tree_Brady2019.full_clade_defs.nwk
Species Strategy: SPR
Strategy: SPR
Reconciliation model: UndatedDTL
DTL rates: per family rates
Prefix: /pantagruel_databases/Brady2019/albingenes/prot_fullgenetree_GeneRax_recs/generax_prot_fullgenetree_pointestimate_2/generax_prot_fullgenetree_pointestimate_2_generax
Unrooted reconciliation likelihood: OFF
Prune species tree: OFF
Reconciliation radius: 0
MPI Ranks: 8
Max gene SPR radius: 5
Gene support threshold: -1
Reconciliation likelihood weight: 1
Random seed: 123
Infer ML reconciliation: ON

Can you please provide some help on this one? I can supply example output files separately if needed.

All the best,

Florent

Generax results interpretation

Hello Benoit!
I tried to run generax on data from my research, and the results seemed very strange to me. Can I ask you some questions about them?
I study magnetotactic bacteria. These bacteria synthesize nanomagnets called magnetosomes.
To date, genes for magnetosome synthesis have been found in genomes from several different phyla, and I would like to study their evolution.
I built a species tree and rooted it (species_tree.nwk in attached files). I also have a fasta file with genes of magnetosome synthesis (for example, MamA is one of these genes) and ran generax with this code:
python /path/to/generax/build_families_file.py /path/to/generax/alignment/MamA/ NONE NONE LG+I+F+G4 /path/to/generax/families/families_mamA.txt
mpiexec -np 4 generax -f /path/to/generax/families/families_mamA.txt -s /path/to/generax/species_tree/species_tree.nwk --rec-model UndatedDTL --reconcile -p /path/to/generax/output/output_mamA
And I got as a result a file with DTLs (PDF attached).
Could you please tell me whether it is normal that the DTL lines are outside the species tree? Or am I doing something wrong?
Also, some genomes have 2 MamA genes, which control the synthesis of different magnetosomes. Do I understand correctly that these genes need to be named as species-name_gene1 and species-name_gene2?

Thanks in advance,
Maria

MamA_reconcilliated.pdf
mamA1.txt
species_tree1.txt

SpeciesRax: assertion fail with slow SPR radius

To reproduce:

python scripts/speciesrax/launch_speciesrax.py ensembl_96_ncrna_primates GTR+G random raxml-ng haswelld 16 --rec-model UndatedDL --fast-radius 6 --slow-radius 6

[00:01:47] Starting DTL rates optimization
[00:01:54] Starting species SPR search (FAST, radius=6, bestLL=-251686)
[00:02:02] Trying to re-root the species tree
RecLL = -251686
FIRST SLOW RADIUS SEARCH, WITHOUT TRANSFER 
speciesrax: /hits/basement/cme/morel/github/GeneRax/src/core/search/SPRSearch.cpp:146: static bool SPRSearch::applySPRRound(JointTree&, int, double&, bool): Assertion `fabs(ll - bestLoglk) < 0.00000001' failed.
[haswell-111:20120] *** Process received signal ***

Fix reconciliation backtracking origin

Add a mode to start from a random tree

Add a mode to stop a SPR round when a better tree is found

This might not be easy with the current parallelization scheme

Error when cloning the repository (with suggested solution)

When I run this command:

git clone --recursive git@github.com:BenoitMorel/GeneRax.git

I get the following error:

fatal: could not read from remote repository

However, I was able to download the repository using the following command instead:

git clone --recursive https://github.com/BenoitMorel/GeneRax

Also personally I was missing Flex and Bison and had to run:

sudo apt install flex bison

So you might want to add a dependencies section or something.

Add a --check mode, as in raxml-ng

try opening each alignment
check whether it's protein or DNA, compare with the input model.
Check all the trees files
Check whether all gene taxa are mapped to a species taxa
Check the species tree, and that it's rooted
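
A rough Python sketch of two of these checks (data type versus model, and gene-to-species mapping coverage) for a single family. All names here (read_fasta, guess_datatype, check_family) are hypothetical, and the DNA/protein guess is only a heuristic based on the characters seen. This is not how raxml-ng or GeneRax implement it.

```python
# IUPAC nucleotide codes plus common gap/missing symbols.
DNA_CHARS = set("ACGTUNRYKMSWBDHV-?.")

def read_fasta(text):
    """Parse FASTA text into {label: sequence}; the label is the first word of the header."""
    seqs, label = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            label = words[0] if words else ""
            seqs[label] = ""
        elif line and label is not None:
            seqs[label] += line
    return seqs

def guess_datatype(seqs):
    """Heuristic: DNA if every character is a nucleotide code, else PROTEIN."""
    chars = set("".join(seqs.values()).upper())
    return "DNA" if chars <= DNA_CHARS else "PROTEIN"

def check_family(fasta_text, mapping, species, model_datatype):
    """Return a list of problems found for one family (empty list means no problem)."""
    problems = []
    seqs = read_fasta(fasta_text)
    found = guess_datatype(seqs)
    if found != model_datatype:
        problems.append(f"alignment looks like {found} but the model expects {model_datatype}")
    for gene in seqs:
        if gene not in mapping:
            problems.append(f"gene {gene} is not mapped to any species")
        elif mapping[gene] not in species:
            problems.append(f"gene {gene} is mapped to unknown species {mapping[gene]}")
    return problems
```

Collecting all problems in a list, rather than stopping at the first one, lets a check mode report everything that is wrong with the input in a single pass.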

Add GeneRax unit tests to travis

Current tests only test the old tool JointSearch.
Tests should include:

building a family file with the helper script I provide
running GeneRax without MPI, and check the output trees (maybe output the hash of the trees from GeneRax)
running GeneRax with MPI, if supported by travis. Else add at least a separate script to run manually
UndatedDL and UndatedDTL models
starting from random and raxml trees
EVAL and SPR modes

redundancy in transfer output files in sampling mode

Hi Benoit,

I first wanted to confirm that v1.2.0 works well, with scenarios that make sense and no bugs. Well done!

The only thing I found that was not correct is that the output in the reconciliations/*_transfers.txt files is redundant:
For every file reconciliations/*_$n_transfers.txt, with $n an integer corresponding to a sampled scenario, the content is the record of the transfer events in that scenario appended to the content of the transfer file for the previous scenario in the sample, $(( $n - 1 )).
This concatenation accumulates over each iteration, so the transfer file for scenario n=99 is roughly a hundred times bigger than the file for scenario n=0.

Once the redundancy is removed, the records of events are distinct, showing that the sampling is taking place correctly.
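
As a workaround on the user side, the per-scenario records can be recovered by stripping the content of the previous file from each file. A minimal Python sketch, assuming each file starts with the exact text of the previous one, as described above. The function name split_cumulative is hypothetical, and the example lines are placeholders, not the actual transfer file format.

```python
def split_cumulative(contents):
    """Given file texts in sample order, return the part each file adds to the previous one."""
    records = []
    previous = ""
    for text in contents:
        if not text.startswith(previous):
            raise ValueError("file is not a cumulative extension of the previous one")
        records.append(text[len(previous):])
        previous = text
    return records

# Placeholder contents for samples 0, 1 and 2.
files = ["spA spB\n", "spA spB\nspC spD\n", "spA spB\nspC spD\nspA spD\n"]
print(split_cumulative(files))
```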

JointSearch testing problem

Dear Benoît,
I ran run_tests.py and got this error:
[[41208,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: CalculateUbuntu

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.

JointSearch: /home/uzun/tools/GeneRax/src/core/likelihoods/ReconciliationEvaluation.cpp:44: void ReconciliationEvaluation::setRates(const Parameters&): Assertion `0 == parameters.dimensions() % freeParameters' failed.
[CalculateUbuntu:23868] *** Process received signal ***
[CalculateUbuntu:23868] Signal: Aborted (6)
[CalculateUbuntu:23868] Signal code: (-6)
[CalculateUbuntu:23868] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12dd0)[0x7f546c7c7dd0]
[CalculateUbuntu:23868] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f546c60a077]
[CalculateUbuntu:23868] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f546c5eb535]
[CalculateUbuntu:23868] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7f546c5eb40f]
[CalculateUbuntu:23868] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x32142)[0x7f546c5fb142]
[CalculateUbuntu:23868] [ 5] /home/uzun/tools/GeneRax/build/bin/JointSearch(_ZN24ReconciliationEvaluation8setRatesERK10Parameters+0x258)[0x557abc71e458]
[CalculateUbuntu:23868] [ 6] /home/uzun/tools/GeneRax/build/bin/JointSearch(_ZN9JointTreeC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_S7_S7_S7_8RecModel6RecOptbddbbRK10Parameters+0x28c)[0x557abc749f4c]
[CalculateUbuntu:23868] [ 7] /home/uzun/tools/GeneRax/build/bin/JointSearch(_Z13internal_mainiPPcPv+0x602)[0x557abc708632]
[CalculateUbuntu:23868] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f546c5ed09b]
[CalculateUbuntu:23868] [ 9] /home/uzun/tools/GeneRax/build/bin/JointSearch(_start+0x2a)[0x557abc70764a]
[CalculateUbuntu:23868] *** End of error message ***
Traceback (most recent call last):
  File "/home/uzun/tools/GeneRax/tests/run_tests.py", line 67, in <module>
    test_family(family_dir)
  File "/home/uzun/tools/GeneRax/tests/run_tests.py", line 63, in test_family
    test_family_with_parameters(family_dir, True)
  File "/home/uzun/tools/GeneRax/tests/run_tests.py", line 53, in test_family_with_parameters
    subprocess.check_call(command, stdout = FNULL)
  File "/home/uzun/tools/miniconda3/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/uzun/tools/GeneRax/build/bin/JointSearch', '-s', '/home/uzun/tools/GeneRax/data/simulated_1/speciesTree.newick', '-a', '/home/uzun/tools/GeneRax/data/simulated_1/alignment.msa', '-m', '/home/uzun/tools/GeneRax/data/simulated_1/mapping.link', '-p', '/home/uzun/tools/GeneRax/tests/outputs/test', '--strategy', 'SPR', '-g', '/home/uzun/tools/GeneRax/data/simulated_1/raxmlGeneTree.newick']' died with <Signals.SIGABRT: 6>.

The other 2 tests work well.
Could you please help me to solve this problem?

Best regards,
Maria.

incomplete event reporting with NHX reconciliated tree format

Hi Benoit,

I am implementing a parser for GeneRax output, specifically for the reconciled trees in NHX format, as that's how they come out of the program with the --reconciliation-samples option.

Doing so, I noticed that some transfer events were not reported in the NHX format, while they were in the recPhyloXML equivalent. See example below:

(BRADYR26_WSM3983__1:0.249317[&&NHX:S=BRADYR26:D=N:H=N:B=0.249317],(...))n338:0.028047[&&NHX:S=clade112:D=N:H=N:B=0.028047]

<clade>
	<name>NULL</name>
	<eventsRec>
		<speciation speciesLocation="clade112"/>
	</eventsRec>
	<clade>
		<name>BRADYR26_WSM3983__1</name>
		<eventsRec>
			<branchingOut speciesLocation="clade113"/>
		</eventsRec>
		<clade>
			<name>loss</name>
			<eventsRec>
				<loss speciesLocation="clade113"/>
			</eventsRec>
		</clade>
		<clade>
			<name>BRADYR26_WSM3983__1</name>
			<eventsRec>
				<transferBack destinationSpecies="BRADYR26"/>
				<leaf speciesLocation="BRADYR26"/>
			</eventsRec>
		</clade>
	</clade>
	<clade>
             ...
	</clade>
</clade>

(I omitted the content of a big clade but I will send you the full files by email)

I don't know if this is an error or just something that is not covered by this tree format.
If it is the latter, or in any case, may I suggest generating the reconciliation output in the recPhyloXML format instead of, or in addition to, the NHX format, so as to avoid information loss?

I know that the recPhyloXML format is more verbose (~ 6x more) than the NHX format, but still that would not lead to too crazy file sizes, even with 1000 samples in a file; in addition such files would naturally lend themselves to efficient compression given their repetitive nature.

Thanks again for providing this great tool!

All the best,

Florent
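
For readers writing a similar parser, a minimal Python sketch that extracts the key=value fields from each [&&NHX:...] comment of a tree string such as the one quoted in this issue. The function name nhx_annotations is hypothetical, and the sketch only collects the fields without interpreting them. It assumes values contain neither ':' nor ']'.

```python
import re

def nhx_annotations(newick):
    """Return one {key: value} dict per [&&NHX:...] comment, in order of appearance."""
    result = []
    for body in re.findall(r"\[&&NHX:([^\]]*)\]", newick):
        fields = {}
        for item in body.split(":"):
            key, sep, value = item.partition("=")
            if sep:  # skip anything that is not a key=value pair
                fields[key] = value
        result.append(fields)
    return result

line = ("(BRADYR26_WSM3983__1:0.249317[&&NHX:S=BRADYR26:D=N:H=N:B=0.249317],(...))"
        "n338:0.028047[&&NHX:S=clade112:D=N:H=N:B=0.028047]")
for fields in nhx_annotations(line):
    print(fields)
```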

Fix hardcoded generaxslave path

One way would be to call generax itself again (and get its path using windows and unix specific calls), with an additional argument that would redirect the main to generaxslave main

Find what causes the recent worse results on simulations

Maybe the infinite precision switch?

GeneRax: crash when starting from random tree with less than 3 taxa

Either remove the corresponding family from the analysis or stop the run
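
The first option could look like the following Python sketch, assuming families are available as a dictionary from family name to the list of gene names in its alignment. The layout and the name split_small_families are hypothetical.

```python
def split_small_families(families, min_taxa=3):
    """Separate families with at least `min_taxa` distinct genes from the ones to drop."""
    kept = {
        name: genes
        for name, genes in families.items()
        if len(set(genes)) >= min_taxa
    }
    dropped = sorted(set(families) - set(kept))
    return kept, dropped

families = {
    "famA": ["sp1_g1", "sp2_g1", "sp3_g1"],
    "famB": ["sp1_g2", "sp2_g2"],
}
kept, dropped = split_small_families(families)
print("kept:", sorted(kept), "dropped:", dropped)
```

Returning the dropped names as well makes it possible to log which families were removed instead of silently ignoring them.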

Handle non binary species trees

recPhyloxml output placing gene branches outside species tree

I have run generax with the following options
generax -s MainSample_CATGTR_editedfixed.newick -f Families_acsc.txt -r UndatedDL -p acsc

The reconciliation appears to have worked, with the program completing and not showing any error messages but when I visualize the reconciliation .xml file the output is obviously wrong with gene tree branches placed everywhere, violating the bounds of the species tree and not forming any recognizable tree structure.

The output newick genetree file works fine and all the inputs seem correct.

I have emailed you the inputs and the output genetree.newick, .xml and .svg from recphylo.

Have you encountered this problem before?

double free or corruption

I am not sure what kind of mistake I ran into.
Here is the error. The memory map part is collapsed.

*** Error in `generax': double free or corruption (out): 0x00007f412ed43450 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7c503)[0x7f412bf20503]
generax(_ZN27AbstractReconciliationModelI11ScaledValueE18setInitialGeneTreeEP11pll_utree_s+0x192)[0x7f412cdfa832]
generax(_ZN15UndatedDTLModelI11ScaledValueE18setInitialGeneTreeEP11pll_utree_s+0x1a)[0x7f412cdfaf8a]
generax(_ZN24ReconciliationEvaluation19buildRecModelObjectE8RecModelb+0x3f0)[0x7f412cdec140]
generax(_ZN24ReconciliationEvaluationC2ER13PLLRootedTreeR15PLLUnrootedTreeRK18GeneSpeciesMapping8RecModelbb+0xf5)[0x7f412cdec655]
generax(_ZN20SpeciesTreeOptimizer17updateEvaluationsEv+0x263)[0x7f412cdb5d83]
generax(_ZN20SpeciesTreeOptimizerC2ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorI10FamilyInfoSaIS7_EE8RecModelRK10ParametersbbbdRKS5_SH_+0x690)[0x7f412cdb8480]
generax(_ZN11GeneRaxCore17rerootSpeciesTreeER15GeneRaxInstance+0x13e)[0x7f412cd66a4e]
generax(_Z12generax_mainiPPcPv+0x38f)[0x7f412cd557ff]
generax(_Z13internal_mainiPPcPv+0x3c)[0x7f412cd5650c]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f412bec5b35]
generax(+0x35bc1)[0x7f412cd54bc1]
======= Memory map: ========
...
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[cl001:04754] *** Process received signal ***
[cl001:04754] Signal: Aborted (6)
[cl001:04754] Signal code:  (-6)
[cl001:04754] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x7f412c274370]
[cl001:04754] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7f412bed91d7]
[cl001:04754] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7f412beda8c8]
[cl001:04754] [ 3] /usr/lib64/libc.so.6(+0x74f07)[0x7f412bf18f07]
[cl001:04754] [ 4] /usr/lib64/libc.so.6(+0x7c503)[0x7f412bf20503]
[cl001:04754] [ 5] generax(_ZN27AbstractReconciliationModelI11ScaledValueE18setInitialGeneTreeEP11pll_utree_s+0x192)[0x7f412cdfa832]
[cl001:04754] [ 6] generax(_ZN15UndatedDTLModelI11ScaledValueE18setInitialGeneTreeEP11pll_utree_s+0x1a)[0x7f412cdfaf8a]
[cl001:04754] [ 7] generax(_ZN24ReconciliationEvaluation19buildRecModelObjectE8RecModelb+0x3f0)[0x7f412cdec140]
[cl001:04754] [ 8] generax(_ZN24ReconciliationEvaluationC2ER13PLLRootedTreeR15PLLUnrootedTreeRK18GeneSpeciesMapping8RecModelbb+0xf5)[0x7f412cdec655]
[cl001:04754] [ 9] generax(_ZN20SpeciesTreeOptimizer17updateEvaluationsEv+0x263)[0x7f412cdb5d83]
[cl001:04754] [10] generax(_ZN20SpeciesTreeOptimizerC2ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorI10FamilyInfoSaIS7_EE8RecModelRK10ParametersbbbdRKS5_SH_+0x690)[0x7f412cdb8480]
[cl001:04754] [11] generax(_ZN11GeneRaxCore17rerootSpeciesTreeER15GeneRaxInstance+0x13e)[0x7f412cd66a4e]
[cl001:04754] [12] generax(_Z12generax_mainiPPcPv+0x38f)[0x7f412cd557ff]
[cl001:04754] [13] generax(_Z13internal_mainiPPcPv+0x3c)[0x7f412cd5650c]
[cl001:04754] [14] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f412bec5b35]
[cl001:04754] [15] generax(+0x35bc1)[0x7f412cd54bc1]
[cl001:04754] *** End of error message ***
[1]    4754 abort      generax -s ./trees/iqtree/Planctomycetes.reroot.newick -p  -f  >

Here are the input files I used.
generax_Input.zip

Questions?

Hi -
Thanks for developing GeneRax; it seems like a very promising tool. I have two questions: (1) The species tree has to be rooted, but in my experience the root can only be placed on a single species. That is quite inconvenient when one wants to use an outgroup that includes several species. (2) The reconciliations output folder is missing from my run. Do I need to enable an option to obtain it?
Thanks a lot!

Display the inferred D(T)L events in the output tree, and the mapping with the species tree

Maybe talk with other reconciliation teams to find out whether there is a standard file format for that, or whether we could create one.

SpeciesRax: deadlock on haswell when using more than 1 node

Deadlock seems to happen when trying all the roots.

generax crashes when reconciling gene trees with the species tree

Hi,
I'm trying to run GeneRax on ~20k gene families and a rooted species tree, but it crashes when it starts to reconcile the gene trees with the species tree. I've attached the log file here. Could you have a look?
Thank you very much,
Hien

generax.log

Detect polytomies in the input gene trees and return a clean error

Reported here https://groups.google.com/g/generaxusers/c/jFZhyEJfbA4
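As an illustration of the kind of check requested here, a minimal Python sketch (not GeneRax's implementation) that flags polytomies in a Newick string before it is passed to the tool. The function names `count_children` and `find_polytomies` are hypothetical, and the sketch assumes that a top-level trifurcation is acceptable for an unrooted gene tree while every other internal node must have exactly two children.

```python
def count_children(newick):
    """Return (depth, number_of_children) for every internal node, in closing order."""
    result = []
    commas = []          # one comma counter per currently open parenthesis
    in_comment = False   # inside a [...] comment (e.g. NHX annotations)
    in_quote = False     # inside a '...' quoted label
    for ch in newick:
        if in_comment:
            in_comment = ch != "]"
            continue
        if in_quote:
            in_quote = ch != "'"
            continue
        if ch == "[":
            in_comment = True
        elif ch == "'":
            in_quote = True
        elif ch == "(":
            commas.append(0)
        elif ch == "," and commas:
            commas[-1] += 1
        elif ch == ")":
            if not commas:
                raise ValueError("unbalanced ')' in Newick string")
            depth = len(commas)
            result.append((depth, commas.pop() + 1))
    if commas:
        raise ValueError("unbalanced '(' in Newick string")
    return result


def find_polytomies(newick):
    """Return (depth, children) for nodes with too many children.

    Assumption: the outermost node (depth 1) may have up to 3 children,
    as in an unrooted binary tree; all other nodes may have at most 2.
    """
    bad = []
    for depth, children in count_children(newick):
        limit = 3 if depth == 1 else 2
        if children > limit:
            bad.append((depth, children))
    return bad


print(find_polytomies("((A,B),(C,D),E);"))  # []
print(find_polytomies("((A,B,C),D,E);"))    # [(2, 3)]
```

A non-empty result could be turned into a readable error message naming the offending file, instead of letting the run fail later.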

Error reading species tree

Dear Benoît,

I am having trouble getting GeneRax to recognize my species tree. I have built the program using conda on Linux and Mac systems, and I can successfully run the data in your "simulated_1" folder. This makes me think the problem is with my tree. The error message and my tree file are below. Please let me know if there is any additional information you need.

Thank you for the help,
David

[00:00:00] Filtering invalid families...

libc++abi.dylib: terminating with uncaught exception of type LibpllException: Error while reading tree from file: Species_Tree.newick
*** Process received signal ***
Signal: Abort trap: 6 (6)
Signal code: (0)
[ 0] 0 libsystem_platform.dylib 0x00007fff6bd2142d _sigtramp + 29
[ 1] 0 ??? 0x0000000000000400 0x0 + 1024
[ 2] 0 libsystem_c.dylib 0x00007fff6bbf6a1c abort + 120
[ 3] 0 libc++abi.dylib 0x00007fff68c94be8 __cxa_bad_cast + 0
[ 4] 0 libc++abi.dylib 0x00007fff68c94d84 _ZL28demangling_terminate_handlerv + 238
[ 5] 0 libobjc.A.dylib 0x00007fff6a7bc792 _ZL15_objc_terminatev + 104
[ 6] 0 libc++abi.dylib 0x00007fff68ca1dc7 _ZSt11__terminatePFvvE + 8
[ 7] 0 libc++abi.dylib 0x00007fff68ca1b6c __cxa_get_exception_ptr + 0
[ 8] 0 libc++abi.dylib 0x00007fff68c9345d __cxa_get_globals + 0
[ 9] 0 generax 0x000000010c31350a _ZN13LibpllParsers18readRootedFromFileERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE + 266
[10] 0 generax 0x000000010c3133cf _ZN13LibpllParsers15labelRootedTreeERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEES8_ + 15
[11] 0 generax 0x000000010c2ec5d3 _ZN11GeneRaxCore12initInstanceER15GeneRaxInstance + 851
[12] 0 generax 0x000000010c2eac55 _Z12generax_mainiPPcPv + 117
[13] 0 generax 0x000000010c2eb362 main + 82
[14] 0 libdyld.dylib 0x00007fff6bb287fd start + 1
*** End of error message ***
Abort trap: 6

Here is the Newick tree:

Species_Tree.newick.txt

Number of parentheses does not match in .nhx output

Hi, first of all, thank you for developing a wonderful tool. I would like to report that the numbers of opening and closing parentheses in the .nhx output do not match for my dataset:
"("=31
")"=32
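A count like the one above can be reproduced with a few lines of Python. This is only a sketch for checking an output file; the helper names `paren_counts` and `first_unmatched` are hypothetical, and the scan does not try to skip parentheses that might appear inside labels or comments.

```python
def paren_counts(text):
    """Return the number of '(' and ')' characters in the text."""
    return text.count("("), text.count(")")


def first_unmatched(text):
    """Return the index of the first unmatched parenthesis, or None if balanced."""
    open_positions = []
    for i, ch in enumerate(text):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                return i          # a ')' with no '(' left to match
            open_positions.pop()
    # Any positions still stored are '(' that were never closed.
    return open_positions[0] if open_positions else None


tree = "((A,B),C));"
print(paren_counts(tree))     # (2, 3)
print(first_unmatched(tree))  # 9
```

The reported position helps locate which node in the output carries the extra parenthesis.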

Here are the input files and some outputs including the nhx.
Archive.zip

Command:

generax \
--species-tree "${species_tree}" \
--families generax_families.txt \
--strategy "SPR" \
--rec-model "UndatedDL" \
--prefix "generax_${og_id}" \
--seed 12345

Thank you.

Kenji

"reconciliations" folder missing

Hi Benoit,

Thank you for developing such a nice tool. I ran GeneRax and was surprised that some of the described output files were missing. No errors are reported in any of the log files. Eventually I found issue #31, in which you explain that the "reconciliations" folder can be missing if the "--reconcile" flag is not used. A colleague told me that this issue is fixed in the latest version. It turns out that my conda installation is v1.1.0, which is apparently not the latest version. If you are already aware of this problem, feel free to close this issue directly.

Best wishes,
Boas

Automatic tree generation: problematic characters retained

Hi, I have some FASTA files whose headers contain the characters '(', ')', '|' and ':'. When GeneRax automatically generates random starting trees from these files, the characters are retained in the tree labels, which causes a segmentation fault with no indication of the cause.

It could be worth adding a warning or a note. Some tree software automatically converts those problematic characters into '_', but then prebuilt trees no longer match the input alignments; in that case, at least, an error is shown to the user.
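Until such a warning exists, the headers can be cleaned before running GeneRax. The following is a minimal Python sketch, not part of GeneRax: the names `PROBLEMATIC`, `sanitize_header` and `sanitize_fasta` are hypothetical, and the character set is limited to the four characters mentioned above (it may need extending, for example with ',' or ';').

```python
import re

# Characters that have a structural meaning in Newick and were reported above.
PROBLEMATIC = re.compile(r"[():|]")


def sanitize_header(line):
    """Replace problematic characters in a FASTA header line with '_'.

    Sequence lines (not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    return ">" + PROBLEMATIC.sub("_", line[1:])


def sanitize_fasta(lines):
    """Apply sanitize_header to every line of a FASTA file."""
    return [sanitize_header(line) for line in lines]


records = [">sp|P12345|GENE(human):1\n", "ACGT\n"]
print(sanitize_fasta(records))
# ['>sp_P12345_GENE_human__1\n', 'ACGT\n']
```

The same renaming has to be applied to any prebuilt trees and gene-to-species mapping files, otherwise the labels will no longer match the alignments.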

Also, Hi Benoit, hope you're doing well, we met on the CoME2019 course a few months ago!

NOTE: You can disable this warning by setting the MCA parameter btl_base_warn_component_unused to 0.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.