metacal's Introduction

metacal

The metacal package provides tools for bias estimation and calibration in marker-gene and metagenomics sequencing experiments. It implements the methods described in McLaren MR, Willis AD, Callahan BJ (2019) and is used for the analysis associated with that manuscript, available at the manuscript's repository.

Installation

Install the development version of metacal from GitHub:

# install.packages("devtools")
devtools::install_github("mikemc/metacal")

Usage

See the package tutorial for a demonstration of how to estimate bias from control samples with known composition (i.e., mock community samples), and how to calibrate the relative abundances in unknown samples of the taxa that were in the controls.

The primary utility of this package is quantitatively estimating the bias of protocols in quality control experiments, where samples with known composition are measured or samples with unknown composition are measured by multiple protocols.

It is currently not possible to calibrate the composition of a natural community without making strong and untested assumptions about bias being the same for constructed and natural samples and about the efficiencies of taxa not in the controls (e.g., approximating them by that of the closest relative or the average efficiency). For this and other limitations described in the Discussion of our manuscript, calibration as a practical method to obtain quantitatively accurate composition measurements is not currently feasible using this or any package. However, calibration using a hypothesized bias (perhaps partially informed by experimental measurement) can still be useful to analyze the sensitivity of downstream results to bias, a use case we will illustrate in a future vignette.
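The model behind both bias estimation and calibration, as described in the manuscript, is that observed proportions equal the actual proportions multiplied taxon-by-taxon by the efficiencies (the bias) and then renormalized. The following is a minimal base R sketch of that relationship. It does not use metacal's own functions, and the taxon names, sample names, efficiency values, and the `close_rows()` helper are made up for illustration.

```r
# Hypothetical efficiencies (bias) for three taxa; values are illustrative only.
bias <- c(T1 = 1, T2 = 4, T3 = 0.25)

# Actual proportions in two made-up samples (rows = samples, columns = taxa).
actual <- rbind(
  S1 = c(T1 = 1/3, T2 = 1/3, T3 = 1/3),
  S2 = c(T1 = 0.6, T2 = 0.2, T3 = 0.2)
)

# Illustrative helper: normalize each row to sum to 1.
close_rows <- function(x) x / rowSums(x)

# Apply bias: multiply each taxon (column) by its efficiency, then renormalize.
observed <- close_rows(sweep(actual, 2, bias, `*`))

# Calibrate: divide each taxon by its efficiency and renormalize.
# With the true bias, this recovers `actual`.
calibrated <- close_rows(sweep(observed, 2, bias, `/`))
```

Swapping in a hypothesized `bias` vector in the last step is one simple way to explore how sensitive a downstream result is to bias.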

metacal's Issues

Can I use this package for correcting batch effect using technical replicates?

Hi,

I came across your package through an issue (batch effect) that I was trying to solve with Benjamin Callahan (dada2) - benjjneb/dada2#876

Benjamin suggested that I check metacal. I don't have mock communities but I did add technical replicates in the different sequencing runs. Based on the description of the package, I was left with the impression that I could use this package with the technical replicates, but from the tutorial, it is not clear.

Can you please clarify if the technical replicates are useful for the usage of this package?

Thanks

Add `estimate_bias()` and `calibrate()` functions that work with phyloseq objects

Idea is to create a higher-level interface for estimating bias using data already stored in a phyloseq object.

Bias estimation. User has a phyloseq object or otu_table object that contains the observed abundances for target + control samples, or just the control samples, and an otu_table object that contains the actual abundances for the control samples. This function returns an estimate of bias, perhaps w/ bootstrap replicates and standard errors.

Calibration. The user supplies a phyloseq object or otu_table of observed abundances, and an estimated bias vector. Returns a modified phyloseq object with calibrated abundances.

Eventually, when the new function for differential bias estimation via compositional regression is implemented, these functions should also support it.

allow calibrate() to work with mc_bias_fit objects

How it should work: If the 'bias' argument of calibrate() is an 'mc_bias_fit' object, then the estimated bias should be used for the calibration. An additional feature could be an option to use the bootreps, returning an array of abundances calibrated by each bootrep.

Should we force setting the 'type' argument in `mean_efficiency()`?

As I've been working with the new mean_efficiency() function with phyloseq objects of observed read counts, I have several times forgotten to set type = 'observed', creating some confusing results that took some time to debug. It might be best to remove the default value of type = 'actual' so that the user is always forced to specify (except when calling on an 'mc_bias_fit' object).
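A minimal sketch of the "no default" pattern in base R, for illustration only: `mean_eff_sketch()` is a hypothetical stand-in, not the package's `mean_efficiency()`, and it handles a single sample rather than a phyloseq object. It assumes the model in which observed proportions are actual proportions times efficiencies, renormalized.

```r
# Hypothetical stand-in with a required `type` argument (no default value).
mean_eff_sketch <- function(x, bias, type) {
  if (missing(type)) {
    stop("`type` must be set to \"actual\" or \"observed\".", call. = FALSE)
  }
  type <- match.arg(type, c("actual", "observed"))
  p <- x / sum(x)  # proportions of the taxa in this sample
  if (type == "actual") {
    # Mean efficiency is the efficiency averaged over actual proportions.
    sum(p * bias)
  } else {
    # For observed proportions, the equivalent quantity is a harmonic-type mean.
    1 / sum(p / bias)
  }
}

bias <- c(1, 4)
from_actual   <- mean_eff_sketch(c(1, 1), bias, type = "actual")
from_observed <- mean_eff_sketch(c(1, 4), bias, type = "observed")
```

Here `c(1, 4)` is what an even two-taxon sample would look like after the bias is applied, so both calls describe the same sample and return the same mean efficiency. Calling the function without `type` stops with an error instead of silently assuming one interpretation.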

`center()` fails when `.data` doesn't have rownames

I think center() should be able to handle this case; rows correspond to samples and there is no need for the samples to have names for computing the center to make sense. Note, the function works fine w/o column names if enframe is not used.

Z <- matrix(c(
        NaN, 1, 3.5, 
        -1, NaN, 4,
        -2, 3, NaN,
        -1, 2, 3), ncol = 3, byrow = TRUE)
# colnames(Z) <- paste0("T", 1:3)
# rownames(Z) <- paste0("S", 1:4)
metacal::center(Z, in_scale = "log")
#> Object passed to `as_tibble()` must have row names if the `rownames`
#> argument is set.

Created on 2019-08-06 by the reprex package (v0.3.0)

Create function to facilitate performing calibration from a reference species

It might be convenient to have a function that performs a simple plug-in version of the 'reference-species' approach to calibration described in https://github.com/mikemc/differential-abundance-theory. In its simplest form, the function needs only an 'observed' matrix and a set of reference measurements for one or more species; it can then calibrate all species in observed by multiplying by the geometric mean of the reference measurements divided by the observed measurements for those species. However, since I don't necessarily recommend such a non-statistical approach except for exploration and demonstration, it might be better to instead make a function that facilitates applying sample-specific normalizations: essentially, an easier-to-use version of 'sweep()'. This function would allow any type of normalization, including to the total abundance (as in so-called 'quantitative microbiome profiling').
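The plug-in computation described above can be sketched in base R as follows. This is only an illustration under stated assumptions: `gm_mean()` and `calibrate_to_reference()` are hypothetical helper names, samples are rows, and the reference measurements are made-up numbers.

```r
# Geometric mean of a positive numeric vector.
gm_mean <- function(x) exp(mean(log(x)))

# Hypothetical helper: rescale each sample (row) of `observed` so that the
# reference taxa agree, on geometric average, with their reference measurements.
# `reference` is a samples x reference-taxa matrix with matching dimnames.
calibrate_to_reference <- function(observed, reference) {
  ref_taxa <- colnames(reference)
  stopifnot(
    all(ref_taxa %in% colnames(observed)),
    identical(rownames(observed), rownames(reference))
  )
  # Per-sample ratio of reference to observed for each reference taxon.
  ratios <- reference / observed[, ref_taxa, drop = FALSE]
  # One scaling factor per sample: geometric mean over the reference taxa.
  scale_factor <- apply(ratios, 1, gm_mean)
  # Sample-specific normalization via sweep() over rows.
  sweep(observed, 1, scale_factor, `*`)
}

observed <- rbind(
  S1 = c(A = 10, B = 40, C = 50),
  S2 = c(A = 30, B = 30, C = 40)
)
# Made-up reference measurements for taxon A only.
reference <- cbind(A = c(S1 = 200, S2 = 150))
calibrated <- calibrate_to_reference(observed, reference)
```

With a single reference taxon the scaling factor is simply the reference value divided by the observed value, so taxon A in `calibrated` matches its reference measurement exactly. The final `sweep()` call is the general "sample-specific normalization" step; any other per-sample factor could be substituted for `scale_factor`.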

Installation from github using devtools, not working

Hi Mike,

I tried installing the package with devtools, as described in the README.md, and got the following error.

> devtools::install_github("mikemc/metacal")
Error: Failed to install 'unknown package' from GitHub:
  HTTP error 404.
  No commit found for the ref master

  Did you spell the repo owner (`mikemc`) and repo name (`metacal`) correctly?
  - If spelling is correct, check that you have the required permissions to access the repo.

I installed it using the source code and installation was successful.

Cheers !!!
Anubhav

`build_matrix()` behaves incorrectly on grouped tibbles

E.g.

tb %>%
  group_by(var1, var2) %>%
  summarize_at("var3", sum) %>%
  build_matrix(var1, var2, var3)

First time noticing this bug, I got a message about the grouped row var1 being added, and the elements (var3) being coerced from numeric to characters.

Allow `estimate_bias()` to take "observed" objects with extra samples

Use case: You have a full OTU table with natural and mock samples; you want to estimate bias from just the mock samples. As long as the sample names match with actual, we can just subset the samples rather than making the user do this first. As long as all the samples in actual are found in observed then we can be pretty confident this is good input and can proceed. I think we should still throw an error if there are samples in actual that are not in observed.
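The subsetting rule proposed above could look roughly like this in base R. `match_controls()` is a hypothetical helper name used only for this sketch, and it assumes both inputs are matrices with samples as rows and sample names as row names.

```r
# Hypothetical helper: keep only the rows of `observed` that correspond to the
# control samples in `actual`, in the same order as `actual`.
match_controls <- function(observed, actual) {
  missing_samples <- setdiff(rownames(actual), rownames(observed))
  if (length(missing_samples) > 0) {
    # Samples in `actual` that are absent from `observed` indicate bad input.
    stop(
      "Samples in `actual` not found in `observed`: ",
      paste(missing_samples, collapse = ", "),
      call. = FALSE
    )
  }
  observed[rownames(actual), , drop = FALSE]
}

# Made-up data: two mock samples and one natural sample in the full table.
observed <- rbind(
  mock1 = c(A = 5, B = 20),
  gut1  = c(A = 7, B = 3),
  mock2 = c(A = 10, B = 30)
)
actual <- rbind(
  mock2 = c(A = 1, B = 1),
  mock1 = c(A = 1, B = 1)
)
controls <- match_controls(observed, actual)
```

Extra samples in `observed` are dropped silently, while a sample that appears only in `actual` raises an error, matching the behavior suggested in this issue.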

Comments on the package

I came here because of this post.

I don't know where (or if) you plan to submit this, CRAN or Bioconductor. I would recommend Bioconductor given the topic of the package. Either way, you'll get a more thorough review if you submit to one of these repositories.

In any case, it seems that the package doesn't work well with other packages like phyloseq, or metagenomeSeq, or with other useful classes like SummarizedExperiment (used in Bioconductor to store data about a sequencing experiment). Doing so would help to use the package in existing pipelines/scripts.

Some functions would need more documentation of the parameters that they need and have some examples (at least that is a requirement for Bioconductor packages).

To get the error matrix, it would be ideal if we could distinguish which kind of NA comes from a 0/0 (which, IMHO, should then be 0 for the purposes of the error matrix) and which comes from a 500/0.
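For what it is worth, base R already keeps these two cases apart when the ratio is computed directly on numeric vectors: 0/0 evaluates to `NaN`, while a positive number divided by 0 evaluates to `Inf`. The sketch below assumes the error is a plain observed/actual ratio, which may not be how the package computes it, and the numbers are made up.

```r
# Made-up observed and actual abundances for four taxa in one sample.
observed <- c(0, 500, 0, 20)
actual   <- c(0,   0, 10, 5)

ratio <- observed / actual

# 0/0 yields NaN; a positive value over 0 yields Inf.
is_zero_over_zero <- is.nan(ratio)
is_pos_over_zero  <- is.infinite(ratio)

# One possible convention, following the suggestion above: treat 0/0 as 0
# and leave x/0 (x > 0) flagged as Inf.
ratio_clean <- ifelse(is_zero_over_zero, 0, ratio)
```

Note that `is.na()` returns `TRUE` for `NaN` as well as `NA`, so `is.nan()` is the check that isolates the 0/0 case.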

The vignette clearly explains how the package works. It would be interesting to know how to use this information in downstream analyses. It also focuses a lot on tidy data frames, which might reduce the memory footprint of the data if it is very sparse, but there are other solutions like data.table or Matrix, so I'm not sure that so much space should be given to them in the tutorial.
The vignette focuses on the error matrix and estimating bias, but I couldn't find any function to do it.

I've seen the tests and they should be more minimal, include just the data and the tests (you can create and have data just for tests). But at the same time it should test more than just the center function.

Many thanks for taking the effort to create this nice package. I'm sure it will be very well received by the community.

Create a function to facilitate mean efficiency computations

Idea: Have a mean efficiency function that acts on mc_bias_fit objects or bias vectors + actual and/or observed matrices and returns the mean efficiency in each sample.

The mean efficiency for a set of samples in a matrix with samples as rows can be calculated by first normalizing to proportions and then doing perturb(mat, bias, margin = 1, norm = 'none') %>% rowSums; or we could compute weighted means with apply() and weighted.mean()
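Both routes can be sketched in base R. In this sketch `sweep()` stands in for the package's `perturb()` step (multiplying each taxon by its efficiency without renormalizing), and the bias values and abundances are made up for illustration.

```r
# Hypothetical efficiencies for three taxa.
bias <- c(T1 = 1, T2 = 4, T3 = 0.25)

# Made-up actual abundances; rows are samples, columns are taxa.
actual <- rbind(
  S1 = c(T1 = 2, T2 = 2, T3 = 4),
  S2 = c(T1 = 5, T2 = 0, T3 = 5)
)

# Route 1: normalize to proportions, multiply by the bias, sum over taxa.
props <- actual / rowSums(actual)
mean_eff_sum <- rowSums(sweep(props, 2, bias, `*`))

# Route 2: efficiency averaged over taxa, weighted by abundance, per sample.
mean_eff_wm <- apply(actual, 1, function(x) weighted.mean(bias, w = x))
```

The two results agree because a weighted mean with abundances as weights is the same as the sum of proportion times efficiency. Route 2 skips the explicit normalization, since `weighted.mean()` rescales the weights itself.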

Taxa names sometimes lost when using `pairwise_ratios()`

The problem seems to arise specifically when there are only two taxa in the original phyloseq object (so just one taxon in the result) and ratios are computed on taxa prior to being computed on samples.
This suggests the problem arises when ratios are computed on samples and there is just one taxon.

library(phyloseq)
library(magrittr)
data(enterotype)
p <- enterotype %>%
  prune_taxa(taxa_names(.)[2:3], .)
p %>%
  metacal::pairwise_ratios("taxa") %>%
  metacal::pairwise_ratios("samples", filter = FALSE) %>%
  taxa_names
#> [1] "sp1"
p %>%
  metacal::pairwise_ratios("samples", filter = FALSE) %>%
  metacal::pairwise_ratios("taxa") %>%
  taxa_names
#> [1] "Bacteria:Prosthecochloris"

Created on 2021-06-15 by the reprex package (v2.0.0)
