evoldoers / historian
Reconstruction of phylogenetic insertion-deletion histories
License: BSD 3-Clause "New" or "Revised" License
I am trying to reconstruct the ancestors of a publicly-available HIV sequence data set called target using Historian by feeding it a guide MSA and a guide tree. However, when I provide my guide tree, Historian creates a problematic output.
Here's an overview of what I've tried as input to Historian already (target is the data set causing problems):
Guide MSA and guide tree on a completely unrelated data set (F1):
historian r -guide F1_MSA.fasta -tree F1.tree -output fasta > F1_recon.txt
...which works fine and provides normal output: F1_recon.txt
Raw sequence file OR guide MSA file with no guide tree
historian r target.fasta -output fasta > target_recon.fasta
historian r -guide target_MSA.fasta -output fasta > target_recon.fasta
...which generates normal output too, like the reconstruction file shown below
Guide MSA file with a guide tree
historian r -guide target_MSA.fasta -tree target.tree -output fasta > target_recon.fasta
...which creates a problematic reconstruction file (first 4 sequences only): target_recon.txt
...and aligns like this in Aliview:
Tree: target_tree.txt
MSA: target_MSA.txt
Any advice or help would be greatly appreciated.
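One check that may be worth running before digging further is whether the sequence names in the guide MSA and the leaf names in the guide tree agree. The sketch below is only a diagnostic aid, not a known cause of this problem. It assumes the tree is in Newick format, that a sequence's name is the first word of its FASTA header, and that labels are unquoted; the helper names are hypothetical and not part of Historian.

```python
import re


def fasta_names(text):
    """Return the set of sequence names (first word of each '>' header line)."""
    names = set()
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def newick_leaf_names(newick):
    """Return the set of leaf labels in a Newick string.

    A leaf label directly follows '(' or ','. Internal-node labels follow ')'
    and are therefore skipped. Quoted labels and comments are not handled.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def compare_names(fasta_text, newick):
    """Return (names only in the alignment, names only in the tree), sorted."""
    msa = fasta_names(fasta_text)
    tree = newick_leaf_names(newick)
    return sorted(msa - tree), sorted(tree - msa)
```

To apply it to the files from this report, read target_MSA.fasta and target.tree into strings and pass them to compare_names. Two empty lists mean the names match, in which case the cause lies elsewhere.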
On current master, I get:
$ historian a.fasta
You can't fix both tree and alignment when doing MCMC - you must sample one of them!
The goal is to update the HMMs of Historian (ML and MCMC components) to use the systematic approximation to the indel model described here: https://academic.oup.com/genetics/article/216/4/1187/6065876
The primary difficulties in doing this are as follows:
(A limited strategy may be to leave the current inference code in place for the ML alignment/reconstruction phase (and associated HMMs), and update only the MCMC and parameter-fitting code.
The viability of this strategy is unclear, though; it seems a little risky from a correctness standpoint.)
For a full update, at a bare minimum, the following code will need to be updated:
ProbModel::transProb
IndelCounts, EventCounts
One question is... would it be better to build a generic, parallelizable, MCMC-only system using Machine Boss?
Pros of doing it in Machine Boss:
Cons of doing it in Machine Boss:
Not just parents that are in range defined by current parent (which violates ergodicity)
Originally requested by @bredelings in #6
Actually two tests:
Hi there,
I've been using Historian to reconstruct the ancestral sequences within multiple different sequence data sets. My largest data set, which contains roughly 1000 sequences, has been unable to complete a run with Historian. The error message is displayed below with a high verbosity setting, simply stating "Killed" when it crashes.
I was first wondering whether this is expected when handling a large data set like this one. Is Historian able to handle sequence sets of this size?
If this failure isn't expected, would you know of a way to retrieve more information about the problem? Or know of possible adjustments to try?
Thanks in advance,
John
I'm very impressed at what historian can do! It seems like it can/should replace PRANK, at the very least. I will make sure to include it in any future benchmarks.
For the ancestral sequence reconstructions, two thoughts:
* for an unknown letter? I think that * can mean "stop codon" in some cases, and I don't think many people in bioinformatics use *
On my desktop, but not the remote cluster, I get this:
clang++ -std=c++11 -g -O3 -I/usr/include -Isrc -c -o obj/jcrna.o src/jcrna.cpp
clang++ -lstdc++ -lm -lgsl -lgslcblas -lm -lz -o bin/historian obj/historian.o obj/alignpath.o obj/ctok.o obj/dayhoff.o obj/diagenv.o obj/ECMrest.o obj/ECMunrest.o obj/fastseq.o obj/forward.o obj/gamma.o obj/gason.o obj/jc.o obj/jcrna.o obj/jones.o obj/jsonutil.o obj/knhx.o obj/lg.o obj/logger.o obj/logsumexp.o obj/memsize.o obj/model.o obj/nexus.o obj/optparser.o obj/pairhmm.o obj/presets.o obj/profile.o obj/quickalign.o obj/recon.o obj/refiner.o obj/sampler.o obj/seqgraph.o obj/simulator.o obj/span.o obj/stockholm.o obj/sumprod.o obj/tree.o obj/util.o obj/wag.o
/usr/bin/ld: obj/logger.o: in function `std::recursive_timed_mutex::_M_clocklock(int, timespec const&)':
/usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/mutex:336: undefined reference to `pthread_mutex_clocklock'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Hi Ian,
I hope all is well with you!
If I include zero-length sequences in the alignment, I get the following:
Abort: Zero forward likelihood even in the absence of guide alignment constraints - this is not good
Stack trace:
historian() [0x51302e]
historian() [0x4a7aaa]
historian() [0x4afa9b]
historian() [0x418079]
historian() [0x417dc4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7ffb37193d0a]
historian() [0x406b5a]
terminate called without an active exception
Aborted
I'm doing some polishing of genome-to-genome multiple-alignments ("pan-genome alignments") by realigning chunks, and historian does quite well on these, I think with -careful. Sometimes these chunks do have zero-length sequences.
No pressure, just logging the issue.
-BenRI
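Until zero-length sequences are handled, one possible workaround is to set them aside before realigning a chunk and add them back as all-gap rows afterwards. The following is a minimal sketch of the filtering step, assuming FASTA input; the function names are hypothetical and not part of Historian.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs, in input order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_empty(records, gap_chars="-."):
    """Separate records that contain residues from those that are empty or all gaps."""
    strip_gaps = str.maketrans("", "", gap_chars)
    kept, empty = [], []
    for name, seq in records:
        target = kept if seq.translate(strip_gaps) else empty
        target.append((name, seq))
    return kept, empty


def to_fasta(records):
    """Serialize (name, sequence) pairs back to FASTA text, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Only the kept records would be written out and passed to historian; the names in the empty list can be reinserted into the resulting alignment as rows consisting entirely of gaps.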