richelbilderbeek / pirouette

R package that estimates the error BEAST2 makes from a given phylogeny

License: GNU General Public License v3.0

R 99.54% Shell 0.46%

beast2 tree compare-distributions pirouette twin-tree phylogeny error-beast2 beast

pirouette

Branch
`master`
`develop`

pirouette is an R package that estimates the error BEAST2 makes from a given phylogeny. This phylogeny can be created using any (non-BEAST) speciation model, for example the Protracted Birth-Death or Multiple-Birth-Death models.

Common abbreviations

nsm: Nucleotide Substitution Model
tral: TRue ALignment
twal: TWin ALignment

FAQ

See the FAQ.

There is a feature I miss

See CONTRIBUTING, at 'Submitting use cases'

I want to collaborate

See CONTRIBUTING, at 'Submitting code'

I think I have found a bug

See CONTRIBUTING, at 'Submitting bugs'

There's something else I want to say

Sure, just add an Issue. Or send an email.

Package dependencies

`master` branches

Package
beautier
beastier
mauricer
mcbette
tracerer

`develop` branches

Package
beautier
beastier
mauricer
mcbette
tracerer

Windows

Package	Status
babette_on_windows
beastier_on_windows
beautier_on_windows
mauricer_on_windows
tracerer_on_windows

External links

BEAST2 GitHub

References

Bilderbeek, RJC, Laudanno, G, Etienne, RS. Quantifying the impact of an inference model in Bayesian phylogenetics. Methods Ecol Evol. 2020; 00: 1–8. https://doi.org/10.1111/2041-210X.13514

pirouette's Issues

Bug: pir_run incorrectly assumes model_select_param$verbose exists

Cause:

create_fig_3()

Error:

Error in if (verbose) print(msg) : argument is of length zero
Calls: create_fig_3_file ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

Bug:

richel@oldskool:~/GitHubs/pirouette$ egrep -R "verbose" --include=*.R
# ...
R/pir_run.R:        verbose = model_select_param$verbose

model_select_param$verbose does not exist.
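
The error message above is what R reports when `if` receives `NULL`, and `$` returns `NULL` for a list element that does not exist. A minimal sketch of a defensive lookup follows; `get_verbose` is a hypothetical helper introduced for illustration, not part of pirouette and not necessarily how the bug was fixed.

```r
# Hypothetical helper, not part of pirouette: read 'verbose' from a
# parameter list, falling back to FALSE when the element is absent.
get_verbose <- function(model_select_param) {
  verbose <- model_select_param$verbose
  if (is.null(verbose)) {
    return(FALSE)
  }
  verbose
}

# A parameter list without a 'verbose' element, as in the bug report
model_select_param <- list(epsilon = 1e-12)

# '$' on a missing element returns NULL, so 'if (verbose)' would fail
is.null(model_select_param$verbose)

# With the fallback, the condition is always a single logical value
verbose <- get_verbose(model_select_param)
if (verbose) print("running model selection")
```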

Write a script to create figure 3 of the article

See richelbilderbeek/pirouette_article#3.

Depends on #36.

Merge model_select_params and inference_params into a list of experiments

Depends on #87
Depends on beautier
Depends on beastier
Depends on babette
Depends on mcbette
an experiment must have a beast2_options_inference and beast2_options_est_evidence, #80

Remove obsolete pirouette install scripts

In the scripts folder, there are

install_pirouette (created by Richel)
install_pirouette_develop (created by Giovanni)
install_pirouette.bash (created by Giovanni)
install_pirouette_develop.bash (created by Giovanni)

I would enjoy having only one install file.

In case Italians love their file extensions, I can live with renaming install_pirouette to install_pirouette.sh 🌈

Good luck 👍

`pir_run` should save the marginal likelihoods somewhere findable

Currently, if run with a most_evidence setup, pir_run returns only the weight of the model used in inference. The weights of the other models are discarded. A user may be interested in the weights for these models.

One way is to let pir_run return the usual data frame, with all model weights (if requested) and NA values for the errors. Another way, which I think is superior, is to store a filename where the evidences are saved when the most_evidence params are created!

Create Peregrine install script

Build under Windows

Depends on:

#69:

I see there are some vignettes that ruthlessly ignore Windows users. No wonder AppVeyor's build does not work. It would be good if a Windows user would try to get 'Check Package' working. If that works, assign me/you to #70 to get it built under AppVeyor.

I think @Giappo would like to express his love for his OS by doing this Issue 🌈🌈🌈

Add 'error_measure_params' functions

The user should be able to:

select the error measure, #32
specify the burn-in, #29

phylogeny -> posterior

Setup option to use twinning

Check 'error_function' argument

In 'check_error_measure_params', check that error_function is:

a function
a function with at least 2 parameters
a function that has its lowest value for identical trees

The tests are in check_error_measure_params, in the 'use' section.
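
The three checks could be sketched as below. This is only an illustration under stated assumptions: `check_error_function` and `toy_error` are hypothetical names, the `(phylogeny, phylogenies)` interface is taken from the 'Allow to select error measure' Issue, and numeric vectors stand in for real phylogenies so the example runs without other packages.

```r
# Hypothetical sketch of the three checks; not pirouette's actual code.
# 'reference' and 'other' are two different trees supplied by the caller.
check_error_function <- function(error_function, reference, other) {
  # Check 1: it is a function
  if (!is.function(error_function)) {
    stop("'error_function' must be a function")
  }
  # Check 2: it has at least two parameters
  if (length(formals(error_function)) < 2) {
    stop("'error_function' must have at least two parameters")
  }
  # Check 3: identical trees give the lowest error
  error_same <- error_function(reference, list(reference))
  error_other <- error_function(reference, list(other))
  if (any(error_same > error_other)) {
    stop("'error_function' must be lowest for identical trees")
  }
  invisible(TRUE)
}

# Toy stand-in: 'trees' are numeric vectors of branching times
toy_error <- function(phylogeny, phylogenies) {
  vapply(phylogenies, function(p) sum(abs(p - phylogeny)), numeric(1))
}

check_error_function(toy_error, reference = c(1, 2, 3), other = c(1, 2, 4))
```

The third check compares against only one other tree, so it can reject a bad error function but cannot prove a good one.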

Installation error: running command 'R --no-site-file --no-environ --no-save --no-restore --quiet CMD config CC' had status 2

From @Giappo :

[...] I am trying to install pirouette also on my laptop. This time I
get this error:

devtools::install_github("richelbilderbeek/pirouette")
Downloading GitHub repo richelbilderbeek/pirouette@master
from URL https://api.github.com/repos/richelbilderbeek/pirouette/zipball/master
Installing pirouette
Downloading GitHub repo richelbilderbeek/babette@master
from URL https://api.github.com/repos/richelbilderbeek/babette/zipball/master
Installing babette
Downloading GitHub repo richelbilderbeek/beastier@master
from URL https://api.github.com/repos/richelbilderbeek/beastier/zipball/master
Installing beastier
Downloading GitHub repo richelbilderbeek/tracerer@master
from URL https://api.github.com/repos/richelbilderbeek/tracerer/zipball/master
Error in system(full, intern = TRUE, ignore.stderr = quiet, ...) :
running command '"D:/Program Files/Microsoft/R
Open/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save
--no-restore --quiet CMD config CC' had status 2

Do you know if there is a way out?

Fix figure

The overview figure gave some confusion. Improve from that feedback.

Allow to specify site and clock model used in alignment simulation

Currently, the alignment is always simulated with a JC69 site model and a strict clock model. Allow the user to pick a different site and clock model.

Copy-paste some doc in pirouette.R

In the R folder, there is a file called pirouette.R. It is common practice to put a package's help page in a file named after the package.

However, the file is clearly only a stub, while there is plenty of doc in the README and the vignettes.

Copy-paste some documentation into pirouette.R. It does not need to be perfect. Also don't forget to add a reference to our article-in-preparation 👍

phylogeny -> nLTT

Call pirouette from razzo

A nice test case.

Allow to select error measure

Currently, the nLTT statistic is used as an error measure. It would be better to let the user specify which error measure he/she wants.

This need not be hard (code only a guideline):

pir_run(
  error = function(phylogeny, phylogenies) { nLTT::nltts_diff(phylogeny, phylogenies) },
  ...
)

Get AppVeyor build working

Depends on:

I volunteer to do so, but Giappo would be just as welcome to try.

Add pirouette call in razzo

Depends on:

See progress at the razzo 'Launch' project page

pirouette is not silent

From @Giappo:

I noticed that in "pir_run" even if "verbose" is set to its default
(which is FALSE) you will still get a very abundant printed output. Do
you think user might need a method to drastically cut the printed
output or remove it completely?

Improve codecov

twin tree creation: simplify

Currently, the create_twin_tree function has a very clean interface. In the back-end, there is room for improvement, as now new functionality (e.g. dododo::phylo2L) exists.

Also, the tests now rely on razzo. pirouette cannot depend on razzo, so the tests need to be rewritten not to do so, yet achieving 100% code coverage.

Simplify 'sim_alignment_file'

Currently sim_alignment_file takes three arguments:

  sim_alignment_file(
    fasta_filename, # same as alignment_params$fasta_filename
    phylogeny,
    alignment_params
  )

Because a fasta_filename is already part of an alignment_params, change its interface to:

  sim_alignment_file <- function(
    phylogeny,
    alignment_params
  ) {
    fasta_filename <- alignment_params$fasta_filename
    # ...
  }

Collect test functions in a single file

The function load_tree used to load the pre-simulated trees from extdata is used in many tests. However this function is written many times across all tests. It would be desirable to have it only once.

Setup option to run model with most evidence

Specify where the BEAST2 output files are stored

Depends on:

It should be possible to specify to pirouette where the BEAST2 output files are stored. This is essential for both raket and razzo.

Currently, due to the model selection, this feature is lost.

And additionally, I never liked those model selection functions.

I think it's best to ditch the model_select_params in favour of a list of inference models. Each inference model must have:

a type: generative or candidate
a site model, clock model, tree prior
an optional MRCA prior
the BEAST2 input and output filenames

Separate alignment params from BEAST2 params

pirouette should import dependencies

Currently, pirouette does not import any of its dependencies. Due to this, the user must know the correct namespace of all functions. Also, this makes the pirouette article hard to read.

Import all dependencies, as babette does.

Mutation rate can be a function working on a phylogeny

Currently, the mutation rate must be a known and constant value, set when creating an alignment params.

If pirouette would be given phylogenies of different crown ages, the mutation rate must be re-set each time.

A superior alternative would be to allow the user to specify a function to determine the mutation rate, based on the crown age:

create_alignment_params(
  mutation_rate = function(crown_age) { 1.0 / crown_age },
  ...
)

pirouette can then calculate the mutation rate itself in pir_run.

The old interface should keep working:

create_alignment_params(
  mutation_rate = 0.1,
  ...
)
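
A minimal sketch of how both interfaces could be supported at once: accept either a number or a function of the crown age, and resolve it to a number when the crown age is known. `resolve_mutation_rate` is a hypothetical helper name, not an existing pirouette function.

```r
# Hypothetical helper, not part of pirouette: turn a mutation rate
# specification (a number or a function of the crown age) into a number.
resolve_mutation_rate <- function(mutation_rate, crown_age) {
  if (is.function(mutation_rate)) {
    mutation_rate <- mutation_rate(crown_age)
  }
  stopifnot(
    is.numeric(mutation_rate),
    length(mutation_rate) == 1,
    mutation_rate >= 0
  )
  mutation_rate
}

# New interface: the rate depends on the crown age
resolve_mutation_rate(function(crown_age) 1.0 / crown_age, crown_age = 4.0)

# Old interface: a constant rate keeps working
resolve_mutation_rate(0.1, crown_age = 4.0)
```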

Allow to generate twin tree using a Yule model

Currently, the twin tree simulated follows a BD model, and no other speciation model is allowed.

Yet, a user may want to choose to let the twin tree follow a Yule (pure-birth) model.

It would be good if the user can specify the tree prior of the twin tree, which can be BD and Yule.

There are three more tree priors (coalescent bayesian skyline, coalescent constant population, coalescent exponential population) in BEAST2, but I do not think these are relevant enough.

Tag v1.0

When all Issues in project 'v1.0' are done, the package needs to be tagged as version 1.0.

Write a script to create figure 4 of the article

See richelbilderbeek/pirouette_article#4.

Depends on #36.

Fix test-phylo_to_errors

If you try to get nltts calling phylo_to_errors in its test you get the following error message:

Error: file.exists(alignment_params$fasta_filename) is not TRUE
5.
stop(sprintf(ngettext(length(val), "%s is not TRUE", "%s are not all TRUE"),
deparse_key(expr)), call. = FALSE, domain = NA)
4.
assert2(fact, if (one) mc[[2]][-1] else mc[-1], parent.frame(),
!one)
3.
testit::assert(file.exists(alignment_params$fasta_filename)) at alignment_params_to_posterior_trees.R#28
2.
alignment_params_to_posterior_trees(alignment_params = alignment_params,
inference_model = inference_model, inference_param = inference_param) at phylo_to_errors.R#31
1.
phylo_to_errors(phylogeny = phylogeny, alignment_params = create_alignment_params(root_sequence = "acgt",
mutation_rate = 0, site_model = beautier::create_jc69_site_model(),
clock_model = beautier::create_strict_clock_model(), rng_seed = 0),
inference_model = create_inference_model(site_model = beautier::create_jc69_site_model(), ...

This is related to the fact that one of the inputs of create_alignment_params is fasta_filename, which defaults to tempfile(fileext = ".fasta"). That default only names a file that does not exist yet, which causes the testit::assert at alignment_params_to_posterior_trees.R#28 to trigger.
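
The behaviour of `tempfile` behind this error can be shown in base R: it returns a unique filename but does not create the file. The FASTA content written below is made up purely for illustration.

```r
# tempfile() only generates a name; no file is created on disk
fasta_filename <- tempfile(fileext = ".fasta")
file.exists(fasta_filename)

# The file exists only after something writes to it
writeLines(c(">t1", "acgt"), fasta_filename)
file.exists(fasta_filename)
```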

pir_run's model_select_params may be a model_select_params (instead of a list of 1 element which is a model_select_params)

Currently, pir_run's model_select_params argument assumes a list of 1 or more elements:

model_select_params <- list(
  create_gen_model_select_param(
    alignment_params = alignment_params,
    tree_prior = create_bd_tree_prior()
  )
)
pir_run(
  model_select_params = model_select_params,
  ...
)

Creating a list is awkward for the user. pir_run must wrap the params in a list itself, allowing this interface:

# This should work
model_select_param <- create_gen_model_select_param(
  alignment_params = alignment_params,
  tree_prior = create_bd_tree_prior()
)
pir_run(
  model_select_params = model_select_param,
  ...
)

Doing this is simple: in pir_run if is_model_select_param(model_select_params) == TRUE, model_select_params is not a list and must be made so. Something like this:

if (is_model_select_param(model_select_params) == TRUE) {
  model_select_params <- list(model_select_params)
}

Write a test that the listing marked with # This should work works fine, using expect_silent.

Fix vignette "twinning.rmd"

At lines 83-90 pir_run cannot run.
I report the full error message here:

Error in if (verbose) print(msg) : argument is of length zero
7.
value[3L]
6.
tryCatchOne(expr, names, parentenv, handlers[[1L]])
5.
tryCatchList(expr, classes, parentenv, handlers)
4.
tryCatch({
marg_lik <- babette::bbt_run(fasta_filename = fasta_filename,
site_model = site_model, clock_model = clock_model, tree_prior = tree_prior,
mcmc = beautier::create_mcmc_nested_sampling(epsilon = epsilon), ...
3.
mcbette::est_marg_liks(fasta_filename = alignment_params$fasta_filename,
site_models = model_select_param$site_models, clock_models = model_select_param$clock_models,
tree_priors = model_select_param$tree_priors, epsilon = model_select_param$epsilon,
verbose = model_select_param$verbose) at pir_run.R#92
2.
pir_run_tree(phylogeny = phylogeny, tree_type = "true", alignment_params = alignment_params,
model_select_params = model_select_params, inference_param = inference_param) at pir_run.R#37
1.
pirouette::pir_run(phylogeny = phylogeny, twinning_params = twinning_params,
alignment_params = alignment_params, model_select_params = model_select_params,
inference_param = inference_param)

twin tree creation: why is the test wrong?

When creating a twin tree, one expects that the closest related taxa remain the closest related and vice versa for the most distant related.

Multiple tests confirm that create_twin_tree does that perfectly well, I show only one here:

test_that("node distances should remain in the same order, 4 taxa, hard", {

  tree <- ape::read.tree(text = "((A:2, (B:1, C:1):1):1, D:3);")
  twin_tree <- create_twin_tree(tree)
  n_tips <- ape::Ntip(tree)
  expect_equal(
    order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
    order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips])
  )
})

However, for a tree obtained from brute-forcing for errors:

test_that("node distances should remain in the same order, 4 taxa", {

  skip("twin tree creation")
  tree <- ape::read.tree(text = "(t2:1.9827033,((t4:0.2338486712,t3:0.2338486712):0.4930762889,t1:0.7269249601):1.25577834);") # nolint indeed this is a long line, but it is what the brute-force below generated
  twin_tree <- create_twin_tree(tree)
  n_tips <- ape::Ntip(tree)
  expect_equal(
    order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
    order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips])
  )
})

The trees do look correct:

Original tree:

Twin tree:

So, why does the test think something is wrong? Or: how to test correctly?

The goal of this Issue is to test that, when twinning, the closest related taxa remain the closest related in the correct way. The brute-force test below should always work. If the brute-force test tests correctly and passes, this Issue can be closed.

test_that("node distances should remain in the same order, brute-force", {

  skip("twin tree creation")

  if (!is_on_travis()) return()
  # Or:
  #  - taxa that are closest, should remain closest in the twin tree
  #  - taxa that are farthest, should remain farthest in the twin tree
  for (i in seq(1, 100)) {
    set.seed(i)
    tree <- beastier:::create_random_phylogeny(n_taxa = 4)
    ape::write.tree(tree)
    ape::plot.phylo(tree)
    twin_tree <- create_twin_tree(tree)
    n_tips <- ape::Ntip(tree)
    # Only care about nodes that are tips
    expect_equal(
      order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
      order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips]),
      info = paste("seed:", i)
    )
  }
})
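
One possible explanation, not confirmed in this Issue, is that `order` is sensitive to near-ties. In an ultrametric tree many tip-to-tip distances are equal, and tiny floating-point differences in the twin tree can split such a tie and change the ordering. The sketch below shows the effect on plain numbers and compares ranks after rounding instead; `rank_distances` and the 8-digit tolerance are assumptions made for illustration.

```r
# Hypothetical helper: rank distances after rounding, so that distances
# that are equal up to 'digits' decimals share the same rank.
rank_distances <- function(distances, digits = 8) {
  rank(round(distances, digits = digits), ties.method = "min")
}

# Two tied distances in the original, split by rounding error in the twin
original <- c(2, 4, 4, 6)
twin <- c(2, 4 + 1e-12, 4, 6)

# order() breaks ties by position, so the near-tie changes the result
identical(order(original), order(twin))

# Ranks computed after rounding agree
all(rank_distances(original) == rank_distances(twin))
```

Rounding can still separate two values that straddle a rounding boundary, so this reduces the fragility rather than removing it.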

Rename 'inference_param'

Rename:

| From | To |
|---|---|
| `inference_param` | `inference_params` |
| `check_inference_param` | `check_inference_params` |
| `create_inference_param` | `create_inference_params` |

Twin trees should have an ideal branch length distribution

Currently, the twin trees are simulated using a desired diversification process.

Goal of the twin tree is to assess the minimal level of noise (i.e. error) BEAST2 gives.

To achieve a minimal level of noise, an idealized tree should be used, i.e. a tree that exactly follows the expected branch lengths distribution. Cool bonus: the RNG seed for the twin tree creation can be removed.

There is something to be said about using a random twin tree, but I think an idealized tree would be more useful (and described in the manuscript).

@Giappo: go ahead and share your thoughts 🌈

If you agree, go ahead and assign yourself 👍

Deprecate 'get_crown_age'

Deprecate 'get_crown_age' and use 'beautier::get_phylo_crown_age' instead.

Improve dododo

See the dododo Issue.

I think @Giappo will volunteer to do so 👍

(if not, sorry, just let me know)

Specify burn-in

raket has a burn-in of 20% (that is, removing the first 20% of a posterior). pirouette does not provide for this yet.
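
A minimal sketch of what removing a 20% burn-in means, in base R. `remove_burn_in_sketch` is a hypothetical name, and the error values are invented for the example.

```r
# Hypothetical helper: drop the first fraction of a posterior sample.
# Works on any vector or list, e.g. a list of posterior trees.
remove_burn_in_sketch <- function(posterior, burn_in_fraction = 0.2) {
  stopifnot(burn_in_fraction >= 0, burn_in_fraction <= 1)
  n_removed <- floor(length(posterior) * burn_in_fraction)
  # Logical indexing also behaves correctly when nothing is removed
  posterior[seq_along(posterior) > n_removed]
}

# Made-up errors: the first samples are far from the stationary values
errors <- c(0.9, 0.5, 0.12, 0.10, 0.11, 0.09, 0.10, 0.11, 0.10, 0.12)
remove_burn_in_sketch(errors)
```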

pir_run should also return the BEAST2 filenames

Currently, pir_run returns the model setup and errors. The BEAST2 files are all discarded. The user should have the choice to see these: BEAST2 input file, .trees, .xml.state and .log filename.

Put in figures of pirouette paper

Before #58, then assign me again. OK?

Rename 'get_phylo_crown_age' to 'get_crown_age', in babette and all its dependencies

If there is an MRCA prior specified, pirouette has to set its alignment ID and taxon names

An MRCA prior needs an alignment ID and taxon names. Because pirouette simulates an alignment, this ID is unknown beforehand. Only pirouette can, after simulating an alignment, set an MRCA prior's alignment ID and taxon names. Make pirouette do so.
