pirouette's Introduction

pirouette

Branch   GitHub Actions  Codecov
master   R-CMD-check     codecov.io
develop  R-CMD-check     codecov.io

pirouette is an R package that estimates the error BEAST2 makes from a given phylogeny. This phylogeny can be created using any (non-BEAST) speciation model, for example the Protracted Birth-Death or Multiple-Birth-Death models.

Common abbreviations

  • nsm: Nucleotide Substitution Model
  • tral: TRue ALignment
  • twal: TWin ALignment

See the FAQ.

There is a feature I miss

See CONTRIBUTING, at 'Submitting use cases'

I want to collaborate

See CONTRIBUTING, at 'Submitting code'

I think I have found a bug

See CONTRIBUTING, at 'Submitting bugs'

There's something else I want to say

Sure, just add an Issue. Or send an email.

Package dependencies

master branches

Package   GitHub Actions  Codecov
beautier  R-CMD-check     codecov.io
beastier  R-CMD-check     codecov.io
mauricer  R-CMD-check     codecov.io
mcbette   R-CMD-check     codecov.io
tracerer  R-CMD-check     codecov.io

develop branches

Package   GitHub Actions  Codecov
beautier  R-CMD-check     codecov.io
beastier  R-CMD-check     codecov.io
mauricer  R-CMD-check     codecov.io
mcbette   R-CMD-check     codecov.io
tracerer  R-CMD-check     codecov.io

Windows

Package Status
babette_on_windows Build status
beastier_on_windows Build status
beautier_on_windows Build status
mauricer_on_windows Build status
tracerer_on_windows Build status

External links

References

  • Bilderbeek, RJC, Laudanno, G, Etienne, RS. Quantifying the impact of an inference model in Bayesian phylogenetics. Methods Ecol Evol. 2020; 00: 1–8. https://doi.org/10.1111/2041-210X.13514

pirouette's People

Contributors

giappo, richelbilderbeek, thijsjanzen

pirouette's Issues

Bug: pir_run incorrectly assumes model_select_param$verbose exists

Cause:

create_fig_3()

Error:

Error in if (verbose) print(msg) : argument is of length zero
Calls: create_fig_3_file ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

Bug:

richel@oldskool:~/GitHubs/pirouette$ egrep -R "verbose" --include=*.R
# ...
R/pir_run.R:        verbose = model_select_param$verbose

model_select_param$verbose does not exist.
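
The error message can be reproduced without pirouette: reading a missing list element with `$` gives NULL, and `if (NULL)` fails with "argument is of length zero". A minimal sketch in base R, in which `model_select_param` is a stand-in list and the `isTRUE()` guard is only one possible fix (an assumption, not necessarily what was done in the package):

```r
# Stand-in for a model_select_param that lacks a 'verbose' element
model_select_param <- list(epsilon = 1e-12)

# Reading a missing list element with `$` gives NULL, not an error
verbose <- model_select_param$verbose
is.null(verbose)

# `if (NULL)` is what triggers "argument is of length zero"
result <- tryCatch(
  {
    if (verbose) print("msg")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
result

# One possible guard (assumption): isTRUE() is FALSE for NULL
safe_verbose <- isTRUE(model_select_param$verbose)
if (safe_verbose) print("msg")
```

With the guard, a missing `verbose` element is treated as FALSE instead of halting execution.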

Remove obsolete pirouette install scripts

In the scripts folder, there are

  • install_pirouette (created by Richel)
  • install_pirouette_develop (created by Giovanni)
  • install_pirouette.bash (created by Giovanni)
  • install_pirouette_develop.bash (created by Giovanni)

I would enjoy having only one install file.

In case Italians love their file extensions, I can live with renaming install_pirouette to install_pirouette.sh 🌈

Good luck 👍

`pir_run` should save the marginal likelihoods somewhere findable

Currently, if run with a most_evidence setup, pir_run returns only the weight of the model used in inference. The weights of the other models are discarded. A user may be interested in the weights for these models.

One way is to let pir_run return the usual data frame, with all model weights (if requested), and NA values for the errors. Another way, which I think is superior, is to store a filename where the evidences are saved when the most_evidence params are created.
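
A rough sketch of the second option, in base R only. The field name `evidence_filename`, the column names and the numbers are all made up for illustration; the weights assume equal prior probability for each model:

```r
# Hypothetical: 'evidence_filename' is an assumed field name,
# set when the most_evidence params are created
model_select_param <- list(
  type = "most_evidence",
  evidence_filename = tempfile(fileext = ".csv")
)

# Stand-in marginal log-likelihoods for two candidate models (made-up values)
marg_liks <- data.frame(
  site_model_name = c("JC69", "HKY"),
  marg_log_lik = c(-123.4, -120.1)
)

# Model weights, assuming equal prior probability per model
relative <- exp(marg_liks$marg_log_lik - max(marg_liks$marg_log_lik))
marg_liks$weight <- relative / sum(relative)

# Save all evidences, so none are discarded
utils::write.csv(
  marg_liks,
  file = model_select_param$evidence_filename,
  row.names = FALSE
)

# Later, the user can read them back from the stored filename
evidences <- utils::read.csv(model_select_param$evidence_filename)
```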

Build under Windows

Depends on:

I see there are some vignettes that ruthlessly ignore Windows users. No wonder AppVeyor's build does not work. It would be good if a Windows user would try to get 'Check Package' working. If that works, assign me/you to #70 to get it built under AppVeyor.

I think @Giappo would like to express his love for his OS by doing this Issue 🌈🌈🌈

Check 'error_function' argument

In 'check_error_measure_params', check that error_function:

  • is indeed a function
  • has at least 2 parameters
  • has its lowest value for identical trees

The tests are in check_error_measure_params, in the 'use' section.
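
The three checks could look roughly like this. `check_error_function` is a hypothetical helper, not an existing pirouette function, and the 'lowest value' check is approximated by comparing against one different tree. The example uses numeric vectors as stand-ins for phylogenies:

```r
# Hypothetical helper: name and arguments are assumptions
check_error_function <- function(error_function, phylogeny, other_phylogeny) {
  if (!is.function(error_function)) {
    stop("'error_function' must be a function")
  }
  # formals() is NULL for primitive functions, so length() is then 0
  if (length(formals(error_function)) < 2) {
    stop("'error_function' must have at least two arguments")
  }
  # The error for an identical tree must not exceed
  # the error for a different tree
  error_same <- error_function(phylogeny, list(phylogeny))
  error_different <- error_function(phylogeny, list(other_phylogeny))
  if (any(error_same > error_different)) {
    stop("'error_function' must be lowest for identical trees")
  }
  invisible(TRUE)
}

# Stand-in error function working on numeric vectors instead of trees
abs_diff <- function(tree, trees) {
  vapply(trees, function(t) sum(abs(t - tree)), numeric(1))
}
check_error_function(abs_diff, phylogeny = c(1, 2), other_phylogeny = c(1, 3))
```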

Installation error: running command 'R --no-site-file --no-environ --no-save --no-restore --quiet CMD config CC' had status 2

From @Giappo :

[...] I am trying to install pirouette also on my laptop. This time I
get this error:

devtools::install_github("richelbilderbeek/pirouette")
Downloading GitHub repo richelbilderbeek/pirouette@master
from URL https://api.github.com/repos/richelbilderbeek/pirouette/zipball/master
Installing pirouette
Downloading GitHub repo richelbilderbeek/babette@master
from URL https://api.github.com/repos/richelbilderbeek/babette/zipball/master
Installing babette
Downloading GitHub repo richelbilderbeek/beastier@master
from URL https://api.github.com/repos/richelbilderbeek/beastier/zipball/master
Installing beastier
Downloading GitHub repo richelbilderbeek/tracerer@master
from URL https://api.github.com/repos/richelbilderbeek/tracerer/zipball/master
Error in system(full, intern = TRUE, ignore.stderr = quiet, ...) :
running command '"D:/Program Files/Microsoft/R
Open/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save
--no-restore --quiet CMD config CC' had status 2

Do you know if there is a way out?

Fix figure

The overview figure gave some confusion. Improve from that feedback.

Copy-paste some doc in pirouette.R

In the R folder, there is a file called pirouette.R. It is common practice to put a package's help page in a file named after the package.

However, the file is clearly only a stub, while there is plenty of doc in the README and the vignettes.

Copy-paste some documentation into pirouette.R. It does not need to be perfect. Also don't forget to add a reference to our article-in-preparation 👍

Allow to select error measure

Currently, the nLTT statistic is used as an error measure. It would be better to let the user specify which error measure he/she wants.

This need not be hard (code only a guideline):

pir_run(
  error = function(phylogeny, phylogenies) { nLTT::nltts_diff(phylogeny, phylogenies) },
  ...
)

pirouette is not silent

From @Giappo:

I noticed that in "pir_run" even if "verbose" is set to its default
(which is FALSE) you will still get a very abundant printed output. Do
you think user might need a method to drastically cut the printed
output or remove it completely?

twin tree creation: simplify

Currently, the create_twin_tree function has a very clean interface. In the back-end, there is room for improvement, as now new functionality (e.g. dododo::phylo2L) exists.

Also, the tests now rely on razzo. pirouette cannot depend on razzo, so the tests need to be rewritten not to do so, yet achieving 100% code coverage.

Simplify 'sim_alignment_file'

Currently sim_alignment_file takes three arguments:

  sim_alignment_file(
    fasta_filename, # same as alignment_params$fasta_filename
    phylogeny,
    alignment_params
  )

Because a fasta_filename is already part of an alignment_params, change its interface to:

  sim_alignment_file <- function(
    phylogeny,
    alignment_params
  ) {
    fasta_filename <- alignment_params$fasta_filename
    # ...
  }

Collect test functions in a single file

The function load_tree used to load the pre-simulated trees from extdata is used in many tests. However this function is written many times across all tests. It would be desirable to have it only once.

Specify where the BEAST2 output files are stored

Depends on:

It should be possible to specify to pirouette where the BEAST2 output files are stored. This is essential for both raket and razzo.

Currently, due to the model selection, this feature is lost.

Additionally, I never liked those model selection functions.

I think it's best to ditch the model_select_params in favour of a list of inference models. Each inference model must have:

  • a type: generative or candidate
  • a site model, clock model, tree prior
  • an optional MRCA prior
  • the BEAST2 input and output filenames
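
As a sketch of what such a list could look like, using plain lists and character stand-ins for the models. The constructor name and all field names are assumptions for illustration, not pirouette's actual interface:

```r
# Hypothetical constructor; name and field names are assumptions
create_inference_model_sketch <- function(
  type,
  site_model,
  clock_model,
  tree_prior,
  mrca_prior = NA, # optional
  beast2_input_filename = tempfile(fileext = ".xml"),
  beast2_output_log_filename = tempfile(fileext = ".log"),
  beast2_output_trees_filename = tempfile(fileext = ".trees")
) {
  if (!type %in% c("generative", "candidate")) {
    stop("'type' must be 'generative' or 'candidate'")
  }
  list(
    type = type,
    site_model = site_model,
    clock_model = clock_model,
    tree_prior = tree_prior,
    mrca_prior = mrca_prior,
    beast2_input_filename = beast2_input_filename,
    beast2_output_log_filename = beast2_output_log_filename,
    beast2_output_trees_filename = beast2_output_trees_filename
  )
}

# Character strings stand in for the real site model, clock model
# and tree prior objects
inference_models <- list(
  create_inference_model_sketch("generative", "JC69", "strict", "BD"),
  create_inference_model_sketch("candidate", "HKY", "strict", "Yule")
)
types <- vapply(inference_models, function(m) m$type, character(1))
```

Because each inference model carries its own filenames, the caller decides where the BEAST2 files end up.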

pirouette should import dependencies

Currently, pirouette does not import any of its dependencies. Due to this, the user must know the correct namespace of all functions. Also, this makes the pirouette article hard to read.

Import all dependencies, like babette does.

Mutation rate can be a function working on a phylogeny

Currently, the mutation rate must be a known and constant value, set when creating an alignment params.

If pirouette is given phylogenies of different crown ages, the mutation rate must be reset each time.

A superior alternative would be to allow the user to specify a function to determine the mutation rate, based on the crown age:

create_alignment_params(
  mutation_rate = function(crown_age) { 1.0 / crown_age },
  ...
)

pirouette can then calculate the mutation rate itself in pir_run.

The old interface should keep working:

create_alignment_params(
  mutation_rate = 0.1,
  ...
)
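
Supporting both interfaces could be as simple as resolving the value just before it is needed. A sketch in base R, where `resolve_mutation_rate` is a hypothetical helper name:

```r
# Hypothetical helper: accepts either a number or a function of the crown age
resolve_mutation_rate <- function(mutation_rate, crown_age) {
  if (is.function(mutation_rate)) {
    mutation_rate <- mutation_rate(crown_age)
  }
  if (!is.numeric(mutation_rate) || length(mutation_rate) != 1 ||
      is.na(mutation_rate) || mutation_rate < 0) {
    stop("'mutation_rate' must resolve to one non-negative number")
  }
  mutation_rate
}

# Old interface: a constant value
constant_rate <- resolve_mutation_rate(0.1, crown_age = 4)

# New interface: a function of the crown age
derived_rate <- resolve_mutation_rate(
  function(crown_age) { 1.0 / crown_age },
  crown_age = 4
)
```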

Allow to generate twin tree using a Yule model

Currently, the twin tree simulated follows a BD model, and no other speciation model is allowed.

Yet, a user may want to choose to let the twin tree follow a Yule (pure-birth) model.

It would be good if the user can specify the tree prior of the twin tree, which can be BD and Yule.

There are three more tree priors (coalescent Bayesian skyline, coalescent constant population, coalescent exponential population) in BEAST2, but I do not think these are relevant enough.

Tag v1.0

Once all Issues in project 'v1.0' are done, the package needs to be tagged as version 1.0.

Fix test-phylo_to_errors

If you try to get nltts calling phylo_to_errors in its test you get the following error message:

Error: file.exists(alignment_params$fasta_filename) is not TRUE
5.
stop(sprintf(ngettext(length(val), "%s is not TRUE", "%s are not all TRUE"),
deparse_key(expr)), call. = FALSE, domain = NA)
4.
assert2(fact, if (one) mc[[2]][-1] else mc[-1], parent.frame(),
!one)
3.
testit::assert(file.exists(alignment_params$fasta_filename)) at alignment_params_to_posterior_trees.R#28
2.
alignment_params_to_posterior_trees(alignment_params = alignment_params,
inference_model = inference_model, inference_param = inference_param) at phylo_to_errors.R#31
1.
phylo_to_errors(phylogeny = phylogeny, alignment_params = create_alignment_params(root_sequence = "acgt",
mutation_rate = 0, site_model = beautier::create_jc69_site_model(),
clock_model = beautier::create_strict_clock_model(), rng_seed = 0),
inference_model = create_inference_model(site_model = beautier::create_jc69_site_model(), ...

This is because one of the inputs of "create_alignment_params" is "fasta_filename", which defaults to "tempfile(fileext = ".fasta")". tempfile only returns a filename and does not create the file, so the testit::assert in "alignment_params_to_posterior_trees.R#28" is triggered.

pir_run's model_select_params may be a model_select_params (instead of a list of 1 element which is a model_select_params)

Currently, pir_run's model_select_params argument assumes a list of 1 or more elements:

model_select_params <- list(
  create_gen_model_select_param(
    alignment_params = alignment_params,
    tree_prior = create_bd_tree_prior()
  )
)
pir_run(
  model_select_params = model_select_params,
  ...
)

Creating a list is awkward for the user. pir_run must wrap a single set of params in a list itself, allowing this interface:

# This should work
model_select_param <- create_gen_model_select_param(
  alignment_params = alignment_params,
  tree_prior = create_bd_tree_prior()
)
pir_run(
  model_select_params = model_select_param,
  ...
)

Doing this is simple: in pir_run if is_model_select_param(model_select_params) == TRUE, model_select_params is not a list and must be made so. Something like this:

if (is_model_select_param(model_select_params) == TRUE) {
  model_select_params <- list(model_select_params)
}

Write a test that the listing marked with # This should work works fine, using expect_silent.

Fix vignette "twinning.rmd"

At lines 83-90 pir_run cannot run.
I report the full error message here:

Error in if (verbose) print(msg) : argument is of length zero
7.
value[3L]
6.
tryCatchOne(expr, names, parentenv, handlers[[1L]])
5.
tryCatchList(expr, classes, parentenv, handlers)
4.
tryCatch({
marg_lik <- babette::bbt_run(fasta_filename = fasta_filename,
site_model = site_model, clock_model = clock_model, tree_prior = tree_prior,
mcmc = beautier::create_mcmc_nested_sampling(epsilon = epsilon), ...
3.
mcbette::est_marg_liks(fasta_filename = alignment_params$fasta_filename,
site_models = model_select_param$site_models, clock_models = model_select_param$clock_models,
tree_priors = model_select_param$tree_priors, epsilon = model_select_param$epsilon,
verbose = model_select_param$verbose) at pir_run.R#92
2.
pir_run_tree(phylogeny = phylogeny, tree_type = "true", alignment_params = alignment_params,
model_select_params = model_select_params, inference_param = inference_param) at pir_run.R#37
1.
pirouette::pir_run(phylogeny = phylogeny, twinning_params = twinning_params,
alignment_params = alignment_params, model_select_params = model_select_params,
inference_param = inference_param)

twin tree creation: why is the test wrong?

When creating a twin tree, one expects that the closest related taxa remain the closest related and vice versa for the most distant related.

Multiple tests confirm that create_twin_tree does that perfectly well. I show only one here:

test_that("node distances should remain in the same order, 4 taxa, hard", {

  tree <- ape::read.tree(text = "((A:2, (B:1, C:1):1):1, D:3);")
  twin_tree <- create_twin_tree(tree)
  n_tips <- ape::Ntip(tree)
  expect_equal(
    order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
    order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips])
  )
})

However, the same test fails for a tree obtained from brute-forcing for errors:

test_that("node distances should remain in the same order, 4 taxa", {

  skip("twin tree creation")
  tree <- ape::read.tree(text = "(t2:1.9827033,((t4:0.2338486712,t3:0.2338486712):0.4930762889,t1:0.7269249601):1.25577834);") # nolint indeed this is a long line, but it is what the brute-force below generated
  twin_tree <- create_twin_tree(tree)
  n_tips <- ape::Ntip(tree)
  expect_equal(
    order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
    order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips])
  )
})

The trees do look correct:

Original tree:

[image: 2018-12-20-085828_1202x849_scrot]

Twin tree:

[image: 2018-12-20-085834_1202x849_scrot]

So, why does the test think something is wrong? Or: how to test correctly?

Goal of this Issue is to test, in the correct way, that when twinning, the closest related taxa remain the closest related. The brute-force test below should always work. If the brute-force test tests correctly and passes, this Issue can be closed.

test_that("node distances should remain in the same order, brute-force", {

  skip("twin tree creation")

  if (!is_on_travis()) return()
  # Or:
  #  - taxa that are closest, should remain closest in the twin tree
  #  - taxa that are farthest, should remain farthest in the twin tree
  for (i in seq(1, 100)) {
    set.seed(i)
    tree <- beastier:::create_random_phylogeny(n_taxa = 4)
    ape::write.tree(tree)
    ape::plot.phylo(tree)
    twin_tree <- create_twin_tree(tree)
    n_tips <- ape::Ntip(tree)
    # Only care about nodes that are tips
    expect_equal(
      order(ape::dist.nodes(tree)[1:n_tips, 1:n_tips]),
      order(ape::dist.nodes(twin_tree)[1:n_tips, 1:n_tips]),
      info = paste("seed:", i)
    )
  }
})
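
One candidate explanation, which is a guess and not confirmed in this Issue: in an ultrametric tree many tip-to-tip distances are tied, and `order` breaks ties by position. If the twin tree's distances differ from a tie only by floating-point rounding, `order` changes although the relationships did not. A base R sketch of the effect, with `tie_aware_ranks` as a hypothetical helper that rounds before ranking:

```r
# Two sets of pairwise distances that are equal up to rounding error
original <- c(2.0, 2.0, 1.0)
twin <- c(2.0 + 1e-15, 2.0, 1.0)

# order() breaks the tie by position in 'original',
# but by value in 'twin', so the results differ
order_original <- order(original)
order_twin <- order(twin)

# Hypothetical helper: round away floating-point noise,
# then give tied distances the same rank
tie_aware_ranks <- function(distances, digits = 8) {
  rank(round(distances, digits = digits), ties.method = "min")
}
ranks_original <- tie_aware_ranks(original)
ranks_twin <- tie_aware_ranks(twin)
```

In the tests above, `expect_equal` could then compare `tie_aware_ranks` of both distance matrices instead of `order`. Rounding can still split values that sit exactly on a rounding boundary, so this is a mitigation rather than a proof of correctness.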

Rename 'inference_param'

Rename:

From                    To
inference_param         inference_params
check_inference_param   check_inference_params
create_inference_param  create_inference_params

Twin trees should have an ideal branch length distribution

Currently, the twin trees are simulated using a desired diversification process.

Goal of the twin tree is to assess the minimal level of noise (i.e. error) BEAST2 gives.

To achieve a minimal level of noise, an idealized tree should be used, i.e. a tree that exactly follows the expected branch lengths distribution. Cool bonus: the RNG seed for the twin tree creation can be removed.

There is something to be said about using a random twin tree, but I think an idealized tree would be more useful (and described in the manuscript).

@Giappo: go ahead and share your thoughts 🌈

If you agree, go ahead and assign yourself 👍

Specify burn-in

raket has a burn-in of 20% (that is, removing the first 20% of a posterior). pirouette does not provide for this yet.
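
What a 20% burn-in means can be sketched in base R. `remove_burn_in_sketch` is a hypothetical helper written for this example, and the integer vector stands in for the sampled states of a posterior:

```r
# Hypothetical helper: drop the first fraction of a posterior's samples
remove_burn_in_sketch <- function(posterior, burn_in_fraction = 0.2) {
  stopifnot(burn_in_fraction >= 0, burn_in_fraction <= 1)
  n <- length(posterior)
  # Number of samples to discard, rounded down
  n_remove <- floor(n * burn_in_fraction)
  posterior[seq_len(n) > n_remove]
}

# Stand-in posterior of 10 samples: the first 2 are removed
kept <- remove_burn_in_sketch(1:10)
```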

pir_run should also return the BEAST2 filenames

Currently, pir_run returns the model setup and errors. The BEAST2 files are all discarded. The user should have the choice to see these: the BEAST2 input filename and the .trees, .xml.state and .log filenames.
