biocore / gemelli

Gemelli is a tool box for running Robust Aitchison PCA (RPCA), Joint Robust Aitchison PCA (Joint-RPCA), TEMPoral TEnsor Decomposition (TEMPTED), and Compositional Tensor Factorization (CTF) on sparse compositional omics datasets.

License: BSD 3-Clause "New" or "Revised" License

gemelli's Introduction

Gemelli

RPCA can be used on cross-sectional datasets where each subject is sampled only once. CTF can be used on repeated-measures data where each subject is sampled multiple times (e.g. longitudinal sampling). TEMPTED is specifically designed for longitudinal (time series) repeated-measures studies, especially when subjects are sampled at irregular time points. Joint-RPCA allows for the exploration of multiple omics datasets with shared samples at once. All these methods are unsupervised and aim to describe sample/subject variation and the biological features that separate them.

The preprocessing transform for both RPCA and CTF is the robust centered log-ratio transform (rclr), which accounts for sparse data (i.e. many missing/zero values). Details on the rclr can be found here and an interactive introduction to the transformation can be found here. In short, the rclr log-transforms the observed (nonzero) values before centering. RPCA and CTF then perform a matrix or tensor factorization on only the observed values after rclr transformation, similar to Aitchison PCA performed on dense data. If the data also has an associated phylogeny, it can be incorporated through the phylogenetic rclr; details can be found here.
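
As a rough illustration of the rclr described above, the following NumPy sketch log-transforms only the nonzero entries of each sample and centers them by the mean of that sample's observed log values. The function name `rclr_sketch` and the samples-as-rows orientation are assumptions made for this example; it is not gemelli's own implementation.

```python
import numpy as np


def rclr_sketch(counts):
    """Robust clr of a (samples x features) count matrix; zeros become NaN."""
    counts = np.asarray(counts, dtype=float)
    observed = counts > 0
    # Log-transform only the observed (nonzero) entries.
    logged = np.full(counts.shape, np.nan)
    logged[observed] = np.log(counts[observed])
    # Center each sample by the mean of its observed log values.
    # A sample with no nonzero entries stays all-NaN (NumPy warns about it).
    centers = np.nanmean(logged, axis=1, keepdims=True)
    return logged - centers


# Toy table (assumed layout: 2 samples x 4 features).
table = np.array([[1.0, 10.0, 0.0, 100.0],
                  [0.0, 5.0, 5.0, 0.0]])
transformed = rclr_sketch(table)
```

The zeros are left as missing values rather than replaced with pseudocounts, which is what lets a later factorization step fit only the observed entries.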

Installation

To install the most up-to-date version of gemelli, run the following command:

# pip (only supported for QIIME2 >= 2018.8)
pip install gemelli

Note: gemelli is not compatible with Python 2; it requires Python 3.4 or later.

Documentation

Gemelli can be run standalone or through QIIME2, either as a Python API or as a CLI.

Cross-sectional / multi-omics study (i.e. one sample per subject) with RPCA

If you have a cross-sectional study design with only one sample per subject then RPCA is the appropriate method to use in gemelli. For examples of using RPCA we provide tutorials below exploring the microbiome between body sites.

Joint-RPCA allows for the exploration of those features that separate jointly across sample groupings and the potential interactions of those features.

Tutorials

Tutorials with QIIME2

Standalone tutorial outside of QIIME2

Repeated measures study (i.e. multiple samples per subject) with CTF & TEMPTED

Tutorials

If you have a repeated measures study design with multiple samples per subject over time or space then CTF is the appropriate method to use in gemelli. For optimal results CTF requires samples for each subject in each time or space measurement. If that is not the case and your study has irregular time sampling, then TEMPTED should be used. TEMPTED also allows for the projection of new data into an existing factorization which is necessary for machine learning. For examples, explore the tutorials below.
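
One way to decide between the two methods is to check whether every subject has a sample at every time or space point. The pandas sketch below does that for a metadata table; the helper name `sampling_is_complete` and the column names are hypothetical, and reading "complete" as "at least one sample per subject per state" is an assumption of this example.

```python
import pandas as pd


def sampling_is_complete(metadata, subject_col, state_col):
    """True if every subject has at least one sample in every state."""
    counts = pd.crosstab(metadata[subject_col], metadata[state_col])
    return bool((counts.to_numpy() > 0).all())


# Hypothetical metadata: subject "b" is missing timepoint 2 in the second table.
regular = pd.DataFrame({"subject": ["a", "a", "b", "b"],
                        "timepoint": [1, 2, 1, 2]})
irregular = pd.DataFrame({"subject": ["a", "a", "b"],
                          "timepoint": [1, 2, 1]})

regular_ok = sampling_is_complete(regular, "subject", "timepoint")
irregular_ok = sampling_is_complete(irregular, "subject", "timepoint")
```

A `False` result points toward the irregular-sampling case described above.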

Tutorials with QIIME2

Standalone tutorial outside of QIIME2

Performing parameter optimization and QC on results

For an introduction to these QC methods see the tutorial here. Examples are also provided in the RPCA tutorials here (RPCA QIIME2 CLI) & here (RPCA Python API & CLI). Users are encouraged to report the QC/CV results for their data.

Citations

If you found this tool useful please cite the method(s) you used:

Citation for CTF

Martino, C. and Shenhav, L. et al. Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. (2020) doi:10.1038/s41587-020-0660-7
@article {Martino2020,
	author = {Martino, Cameron and Shenhav, Liat and Marotz, Clarisse A and Armstrong, George and McDonald, Daniel and V{\'a}zquez-Baeza, Yoshiki and Morton, James T and Jiang, Lingjing and Dominguez-Bello, Maria Gloria and Swafford, Austin D and Halperin, Eran and Knight, Rob},
	title = {Context-aware dimensionality reduction deconvolutes gut microbial community dynamics},
	year = {2020},
	journal = {Nature biotechnology},
}

Citation for RPCA

Martino, C. et al. A Novel Sparse Compositional Technique Reveals Microbial Perturbations. mSystems 4, (2019)
@article {Martino2019,
	author = {Martino, Cameron and Morton, James T. and Marotz, Clarisse A. and Thompson, Luke R. and Tripathi, Anupriya and Knight, Rob and Zengler, Karsten},
	editor = {Neufeld, Josh D.},
	title = {A Novel Sparse Compositional Technique Reveals Microbial Perturbations},
	volume = {4},
	number = {1},
	elocation-id = {e00016-19},
	year = {2019},
	doi = {10.1128/mSystems.00016-19},
	publisher = {American Society for Microbiology Journals},
	URL = {https://msystems.asm.org/content/4/1/e00016-19},
	eprint = {https://msystems.asm.org/content/4/1/e00016-19.full.pdf},
	journal = {mSystems}
}

Citation for Phylogenetic RPCA

Martino, C. et al. Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype. mSystems 7, (2022)
@ARTICLE{Martino2022,
  author = {Martino, Cameron and McDonald, Daniel and Cantrell, Kalen and
            Dilmore, Amanda Hazel and Vázquez-Baeza, Yoshiki and Shenhav,
            Liat and Shaffer, Justin P and Rahman, Gibraan and Armstrong,
            George and Allaband, Celeste and Song, Se Jin and Knight, Rob},
  title = {Compositionally Aware Phylogenetic {Beta-Diversity} Measures
           Better Resolve Microbiomes Associated with Phenotype},
  volume = {7},
  number = {3},
  elocation-id = {e0005022},
  year =  {2022},
  doi = {10.1128/msystems.00050-22},
  publisher = {American Society for Microbiology Journals},
  URL = {http://dx.doi.org/10.1128/msystems.00050-22},
  journal = {mSystems},
}

Citation for TEMPTED

Shi, P. et al. Time-Informed Dimensionality Reduction for Longitudinal Microbiome Studies. bioRxiv, (2023)
@ARTICLE{Shi2023,
  author = {Shi, Pixu and Martino, Cameron and Han, Rungang and Janssen,
            Stefan and Buck, Gregory and Serrano, Myrna and Owzar, Kouros and
            Knight, Rob and Shenhav, Liat and Zhang, Anru R},
  title = {{Time-Informed} Dimensionality Reduction for Longitudinal
           Microbiome Studies},
  year =  {2023},
  doi = {10.1101/2023.07.26.550749},
  URL = {https://www.biorxiv.org/content/10.1101/2023.07.26.550749v1},
  journal = {bioRxiv},
}

Other Resources

gemelli's People

Contributors

ahdilmore, cameronmartino, gibsramen, gwarmstrong, kwcantrell, lisa55asil

gemelli's Issues

Tying features to modalities in joint-RPCA

Is there a way to easily tie which feature came from which modality at the moment? As far as I can tell, the current implementation stores all features across tables in an OrdinationResults instance. You can certainly make a bespoke mapping of feature:table but I think it would be useful to incorporate this into the codebase. Specifically, this would be useful for visualization if, for example, you wanted to plot a network and color the nodes by 16S/18S/ITS/etc.
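
A minimal sketch of the bespoke feature-to-table mapping mentioned above, assuming each modality is available as a pandas DataFrame with feature IDs on the index. The helper `feature_modality_map`, the table names, and the toy loadings are hypothetical and not part of gemelli.

```python
import pandas as pd


def feature_modality_map(tables):
    """Map each feature ID to the name of the table (modality) it came from."""
    pieces = [pd.Series(name, index=table.index) for name, table in tables.items()]
    mapping = pd.concat(pieces)
    duplicated = mapping.index[mapping.index.duplicated()]
    if len(duplicated) > 0:
        # A feature ID shared by two tables cannot be assigned unambiguously.
        raise ValueError(f"feature IDs shared across tables: {list(duplicated)}")
    return mapping.rename("modality")


# Toy features-by-samples tables standing in for two modalities.
tables = {
    "16S": pd.DataFrame(1, index=["asv1", "asv2"], columns=["s1", "s2"]),
    "ITS": pd.DataFrame(1, index=["otu1"], columns=["s1", "s2"]),
}
modality = feature_modality_map(tables)

# Toy feature loadings, annotated with the modality column for plotting.
feature_loadings = pd.DataFrame({"PC1": [0.1, -0.2, 0.3]},
                                index=["asv1", "otu1", "asv2"])
annotated = feature_loadings.join(modality)
```

The resulting `modality` column could then drive node colors in a network plot.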

Are species scores appropriate to calculate when using RPCA using vegan's dbRDA+sppscores

Hey @cameronmartino,

In the past I've used QIIME 2 to create the species/sites biplot using gemelli's phylo-/RPCA. This biplot is of course unconstrained by env variables, which leaves much to desire, so I've been playing with importing these into R to work with RDA.

The choices are to either import the rclr transformed table (actually you can do this transform directly in vegan now) and use vegan::rda() or import the distance matrix and use dbrda. I'm inclined to use the latter approach as I'm guessing that the additional matrix completion layer gemelli adds does improve the signal. Is this first assumption right?

If so, then the 2nd challenge is that unlike the capscale function dbrda doesn't calculate species scores, you have to add those after using sppscores. There was a nice discussion about why this is the case here with regards to the limitations of calculating these species scores. One line worth highlighting here wrt to the scores:

It is strictly correct only with Euclidean distances and can be misleading with other distances (even metric ones)

From what I gather the phylo-/RPCA would be considered Euclidean and so ok to calculate species scores with. Is my speculation correct here, meaning we can technically get an RPCA triplot! or, is there something I'm missing here?

Thanks!

Update CI to GH actions

It looks like Gemelli is still using Travis, which has changed its free CI model, making it much less desirable. Much of Biocore has migrated to GH actions. See here for a minimal-ish workflow config file.

Gemelli for spatial data

Hi,

Sorry if this is a naive question, but I'm not familiar with the methods used in gemelli and am having a bit of trouble understanding how exactly the method works.

I have some microbiome data measured across space, but only along one axis i.e. the sampling points vary along the longitudinal axis but not laterally. Thus this dataset is essentially 1D in that sense.

In this case, would roughly following the CTF tutorial be appropriate for my dataset and allow me to see changes between conditions across space rather than time?

thanks,

Adam

Adding min-feature-frequency to CTF

A convenient thing, I noticed that with auto-rpca one can filter based on min-feature-frequency which is very useful, but this isn't available for ctf. Would be handy if all filtering options were available for all gemelli commands. I'm on version 0.0.7.
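
Until such an option exists for ctf, the table can be filtered before it is passed in. The pandas sketch below assumes a features-by-samples DataFrame and assumes that the frequency threshold means "percentage of samples in which the feature is nonzero", with a strict comparison; the helper name is hypothetical and the exact rule used by gemelli should be checked against its documentation.

```python
import pandas as pd


def filter_by_feature_frequency(table, min_feature_frequency):
    """Keep features (rows) present in more than the given percentage of samples."""
    percent_present = (table > 0).mean(axis=1) * 100
    return table.loc[percent_present > min_feature_frequency]


# Toy table: f1 is in 100% of samples, f2 in 25%, f3 in 50%.
table = pd.DataFrame([[5, 3, 2, 1],
                      [0, 0, 7, 0],
                      [4, 0, 0, 9]],
                     index=["f1", "f2", "f3"],
                     columns=["s1", "s2", "s3", "s4"])
filtered = filter_by_feature_frequency(table, 30)
```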

Misordered labels from constructed tensor after tensor factorization

Hey @cameronmartino, @gwarmstrong,
Thanks for this awesome tool.

I was concerned about the sorting procedure for loadings when fitting the tensor factorization: could it cause the loading labels to be misordered in the label step?

Code for the sorting procedure, excerpted from factorization.py lines 219-224 in the _fit function:

# save array of loadings for subjects
self.subjects = loads[0].copy()
self.subjects = self.subjects[self.subjects[:, 0].argsort()]
# save array of loadings for features
self.features = loads[1].copy()
self.features = self.features[self.features[:, 0].argsort()]

Code for the labeling step, excerpted from factorization.py starting at line 335 in the label function:

# DataFrame single non-condition dependent loadings
self.subjects = pd.DataFrame(self.subjects,
                             columns=self.biplot_labels,
                             index=construct.subject_order)  # self.subjects reordered, but construct.subject_order didn't
self.features = pd.DataFrame(self.features,
                             columns=self.biplot_labels,
                             index=construct.feature_order)  # self.features reordered, but construct.feature_order didn't
......
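
The concern can be reproduced in isolation with toy data. The sketch below sorts a small loading matrix by its first column and then attaches labels in two ways: in the original order, and reordered with the same `argsort` indices. The arrays and label names are made up for illustration, and the example does not establish whether gemelli's `label` step is actually affected.

```python
import numpy as np
import pandas as pd

# Toy loadings for three subjects, in the order the tensor was constructed.
loadings = np.array([[0.3, 1.0],
                     [-0.5, 2.0],
                     [0.1, 3.0]])
subject_order = ["s1", "s2", "s3"]

# Sort rows by the first component, as in the excerpt above.
order = loadings[:, 0].argsort()
sorted_loadings = loadings[order]

# Attaching the unsorted labels pairs rows with the wrong subjects.
mislabeled = pd.DataFrame(sorted_loadings,
                          index=subject_order,
                          columns=["PC1", "PC2"])

# Reordering the labels with the same indices keeps rows and labels paired.
relabeled = pd.DataFrame(sorted_loadings,
                         index=[subject_order[i] for i in order],
                         columns=["PC1", "PC2"])
```

In this toy case `mislabeled` assigns subject s2's loadings to s1, while `relabeled` preserves the original pairing.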

Memory usage problems

I'm trying to use gemelli to generate a distance matrix on a large, sparse ASV table that has 97791 ASVs and 285 samples. I have a couple of larger datasets that I'd also like to analyze to get the rclr-based Aitchison distances. Unfortunately, this dataset is seg faulting when I give it 98 GB of RAM and the two other datasets throw an error indicating that numpy is unable to allocate 309 and 450 GB of space for the array which are on the order of 200k by 200k. I readily admit that I'm not 100% that I'm using gemelli correctly and wondering whether there's a way around the memory limitations to generate the distance matrix. Here's the script I'm running along with a compressed version of the data for the 97791 ASV by 285 sample dataset. Any help you can offer would be great.

#!/usr/bin/env python3

import numpy as np
from biom import Table
import pandas as pd
from gemelli.preprocessing import rclr_transformation
from gemelli.rpca import rpca

np.seterr(divide = 'ignore') # for rclr

df = pd.read_csv("data.gemelli.tsv", sep = "\t")

# convert to biom-formatted file
biom = Table(df.to_numpy(),
             df.index,
             df.columns) 

ordination, distance_matrix = rpca(biom) 

distance_matrix.to_data_frame().to_csv("data.gemelli.dist", sep='\t')

data.gemelli.tsv.gz
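
A quick back-of-the-envelope check of the numbers above: a dense float64 distance matrix between 285 samples is tiny, while one between tens of thousands of IDs needs tens of GiB. The helper below is a hypothetical illustration of that arithmetic. One possible reading, not confirmed in this thread, is that the distances are being computed across the ASV axis rather than the sample axis, so it may be worth checking which axis of the `Table` holds samples (`biom.Table` takes the data as observations by samples, followed by observation IDs and then sample IDs).

```python
def dense_distance_matrix_gib(n_ids, bytes_per_value=8):
    """Memory needed for a dense n x n matrix of float64 values, in GiB."""
    return n_ids * n_ids * bytes_per_value / 2**30


# Distances between the 285 samples versus between the 97791 ASVs.
by_samples = dense_distance_matrix_gib(285)
by_features = dense_distance_matrix_gib(97791)
# A matrix on the order of 200k x 200k, as in the reported allocation errors.
by_200k = dense_distance_matrix_gib(200_000)
```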

Baseline correction in tensor factorisation.

Hello, I have a dataset with timepoints and would like to apply Tensor factorisation.

I would like to subtract the baseline in clr space in order to normalize the inter-individual differences.

How would I do this with gemelli?
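
For reference, subtracting a per-subject baseline in clr space can be sketched with pandas outside of gemelli. The helper below assumes a samples-by-features DataFrame of (r)clr values, metadata indexed by sample ID, and at most one baseline sample per subject; all names are hypothetical, and whether this is an appropriate preprocessing step before tensor factorization is not settled here.

```python
import pandas as pd


def subtract_baseline(clr, metadata, subject_col, time_col, baseline):
    """Subtract each subject's baseline row from all of that subject's samples."""
    meta = metadata.loc[clr.index]
    is_baseline = (meta[time_col] == baseline).to_numpy()
    # Baseline rows, re-indexed by subject instead of by sample ID.
    baselines = clr.loc[is_baseline].copy()
    baselines.index = meta.loc[is_baseline, subject_col].to_numpy()
    # One baseline row per sample, aligned to the rows of `clr`.
    # Subjects without a baseline sample end up as NaN.
    per_sample = baselines.reindex(meta[subject_col].to_numpy())
    per_sample.index = clr.index
    return clr - per_sample


# Toy data: two subjects, two timepoints, two features.
clr = pd.DataFrame([[1.0, -1.0], [2.0, -2.0], [0.5, -0.5], [0.0, 0.0]],
                   index=["a_t0", "a_t1", "b_t0", "b_t1"],
                   columns=["f1", "f2"])
metadata = pd.DataFrame({"subject": ["a", "a", "b", "b"],
                         "time": [0, 1, 0, 1]},
                        index=clr.index)
change = subtract_baseline(clr, metadata, "subject", "time", baseline=0)
```

With rclr input, features that are unobserved (NaN) at baseline stay NaN after subtraction.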

Qiime2 Phylo-RPCA ValueError

Hello,

Firstly, thanks for making such a cool statistical test!
Unfortunately, I'm having some troubles running the phylo-RPCA.

I'm importing my data from DADA2 that's been run through R and have followed KeyError when attempting phylo-RPCA with Gemelli. I've specifically looked into the qza artifacts so I don't think it's a formatting error (but to be fair I am new to qiime2). Either way, when I run the phylo-RPCA I get the following error:

Plugin error from gemelli:

  No requested tips found

Debug info has been saved to /tmp/qiime2-q2cli-err-a24gvsx5.log

The Debug info is as follows

Traceback (most recent call last):
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/commands.py", line 520, in __call__
    results = self._execute_action(
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/commands.py", line 581, in _execute_action
    results = action(**arguments)
  File "<decorator-gen-46>", line 2, in phylogenetic_rpca_with_taxonomy
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
    outputs = self._callable_executor_(
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/action.py", line 566, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/gemelli/rpca.py", line 82, in phylogenetic_rpca_with_taxonomy
    output = phylogenetic_rpca(table=table,
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/gemelli/rpca.py", line 252, in phylogenetic_rpca
    phylogeny = bp_read_phylogeny(table,
  File "/home/aball/anaconda/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/gemelli/preprocessing.py", line 270, in bp_read_phylogeny
    phylogeny = phylogeny.shear(names_to_keep).collapse()
  File "bp/_bp.pyx", line 758, in bp._bp.BP.shear
  File "bp/_bp.pyx", line 800, in bp._bp.BP.shear
ValueError: No requested tips found

I've found a similar error in Greengenes2 taxonomy's forums, but I didn't see an obvious connection to this code.

Here are the various qiime2 artifacts i'm using:
Qiime2inputartifacts.zip

and the qiime2 code

# For the phylogenetic tree
qiime phylogeny align-to-tree-mafft-fasttree \
 --i-sequences rep-seqs.qza \
 --o-alignment aligned-rep-seqs.qza \
 --o-masked-alignment masked-aligned-rep-seqs.qza \
 --o-tree unrooted-tree.qza \
 --o-rooted-tree rooted-tree.qza \
 --p-n-threads 14

#for the phylo-RPCA
qiime gemelli phylogenetic-rpca-with-taxonomy \
    --i-table ts_feature-table.qza \
    --i-phylogeny rooted-tree.qza \
    --m-taxonomy-file taxonomy.qza \
    --p-min-feature-count 10 \
    --p-min-sample-count 500 \
    --o-biplot phylo-ordination.qza \
    --o-distance-matrix phylo-distance.qza \
    --o-counts-by-node-tree phylo-tree.qza \
    --o-counts-by-node phylo-table.qza \
    --o-t2t-taxonomy phylo-taxonomy.qza

Thanks in advance!
Angus
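
The traceback above ends in `phylogeny.shear(names_to_keep)`, and the message says that none of the requested names were found among the tree tips. A first diagnostic is to compare the feature IDs in the table with the tip names in the tree. The helper below is a hypothetical sketch that works on plain lists of IDs; how the IDs are extracted from the artifacts is left out, and the example IDs are made up.

```python
def summarize_id_overlap(table_ids, tip_names, n_examples=3):
    """Report how many feature IDs in the table also appear as tree tips."""
    table_ids, tip_names = set(table_ids), set(tip_names)
    table_only = table_ids - tip_names
    return {
        "shared": len(table_ids & tip_names),
        "table_only": len(table_only),
        # A few IDs from each side, to compare their format by eye.
        "example_table_only": sorted(table_only)[:n_examples],
        "example_tips": sorted(tip_names)[:n_examples],
    }


# Made-up IDs: sequence-like feature IDs versus hash-like tip names.
report = summarize_id_overlap(["ACGT", "TTGA"], ["a1b2", "c3d4"])
```

A `shared` count of zero, with visibly different ID formats on the two sides, would be consistent with the error shown.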

Qiime2 Version - not able to do multiple states

Working on doing CTF using the qiime2 CLI, but got an error I don't quite understand.

!qiime gemelli ctf \
    --i-table  ../data/Microbiome/pool_filtered/RPCA/gemelli_filtered.qza \
    --m-sample-metadata-file ../data/Microbiome/14577_fixed.txt \
    --m-feature-metadata-file ../../shotgun_scripts/woltka_v2_taxonomy.qza \
    --p-state-column timepoint_number \
    --p-state-column timepoint_group \
    --p-individual-id-column marmoset_id \
    --output-dir ../data/Microbiome/pool_filtered/ctf-results

Error: (1/1?) Option '--p-state-column' was specified multiple times in the command.

But...

 --p-state-column TEXT  Metadata column containing state (e.g.,Time,
                         BodySite) across which samples are paired. At least
                         one is required but up to four are allowed by other
                         state inputs.                              [required]

So how do I specify more than one state?
--p-state-column timepoint_number,timepoint_group \
also did not work.

@gibsramen dug into this a bit, but sounds like it might not actually allow multiple states

Installing the standalone version without qiime2

Hello Cameron, I am trying to use CTF for some analysis and am trying to install it in a fresh python3 environment without qiime2. I used the pip install gemelli command; however, the installation was halted due to some errors, which appeared to be attributed to missing packages, numpy and cython. After manually installing these two packages (using the qiime2 channel), the installation went through. Perhaps you could check why numpy and cython were not automatically downloaded and installed? Thanks! -Jincheng

Issue with citations.bib when pip installing into Qiime2 (with solution)

After installing gemelli in my Qiime2 2019.7 environment via the following commands, I got the error message below:
Steps to reproduce:
conda activate qiime2-2019.7
pip install gemelli
qiime gemelli --help
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Traceback (most recent call last):
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/bin/qiime", line 11, in <module>
sys.exit(qiime())
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 1132, in invoke
cmd_name, cmd, args = self.resolve_command(ctx, args)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/click/core.py", line 1171, in resolve_command
cmd = self.get_command(ctx, cmd_name)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 101, in get_command
plugin = self._plugin_lookup[name]
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 77, in _plugin_lookup
import q2cli.core.cache
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/core/cache.py", line 403, in <module>
CACHE = DeploymentCache()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/core/cache.py", line 61, in __init__
self._state = self._get_cached_state(refresh=refresh)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/core/cache.py", line 107, in _get_cached_state
self._cache_current_state(current_requirements)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/core/cache.py", line 200, in _cache_current_state
state = self._get_current_state()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/core/cache.py", line 238, in _get_current_state
plugin_manager = qiime2.sdk.PluginManager()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
self._init()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
plugin = entry_point.load()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2322, in load
return self.resolve()
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2328, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/q2/plugin_setup.py", line 50, in <module>
'citations.bib', package='gemelli')
File "/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/cite.py", line 30, in load
with open(path) as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/citations.bib'

The solution was to do:
wget -P /home/adswafford/miniconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/ https://raw.githubusercontent.com/cameronmartino/gemelli/master/gemelli/citations.bib
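
To check whether a data file such as citations.bib was shipped with an installed package, without importing the package (and so without triggering the plugin error above), something like the following can be used. The helper `package_data_path` is hypothetical; the demo uses the standard-library `json` package so that it runs anywhere.

```python
import importlib.util
from pathlib import Path


def package_data_path(package, filename):
    """Return the path of `filename` inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# Demo on a standard-library package; for this issue the call would be
# package_data_path("gemelli", "citations.bib").
found = package_data_path("json", "decoder.py")
missing = package_data_path("json", "citations.bib")
```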

Add % var explained in trajectory plots for q2

This may be a bit difficult because we would like to have the percent var. explained on the axis for the volatility plot -- while still specifying it in the command.

For example in the tutorial we have:

qiime longitudinal volatility \
   ...
    --p-default-metric PC1 \
   ...

but it would be difficult to have this as "PC1 (#%)" so default % is removed. This may be helpful to build into the trajectory types that could be a mix between Metadata and Ordination types.
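
One way to keep the plain column name for the command while still showing the percentage is to build the display label separately from the column name. The sketch below only formats such labels from a list of proportions explained; the helper name and the toy proportions are hypothetical, and wiring this into the volatility plot is not covered.

```python
def axis_labels(proportion_explained):
    """Build labels such as 'PC1 (42.0%)' from proportions in [0, 1]."""
    return [
        f"PC{i} ({100 * p:.1f}%)"
        for i, p in enumerate(proportion_explained, start=1)
    ]


# Toy proportions of variance explained for three components.
labels = axis_labels([0.42, 0.3, 0.1])
```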

ValueError: No more features left. Check to make sure that the sample names between `sample-metadata` and `table` are consistent

Running the stand-alone version of gemelli on the example data used in the tutorial I get the error ValueError: No more features left. Check to make sure that the sample names between `sample-metadata` and `table` are consistent

As I'm not a Python person, I filter the example data in R.

mdat <- read.table("IBD-2538/data/metadata.tsv", sep='\t', header=T) # nrow(mdat) 516
ftbl <- biomformat::read_biom("IBD-2538/data/table.biom")
ftbl <- as(biomformat::biom_data(ftbl), "matrix") # ncol(ftbl) 470

mdat <- mdat %>% filter(sample_name %in% colnames(ftbl))
rownames(mdat) <- mdat $sample_name

ps <- phyloseq(otu_table(ftbl, taxa_are_rows=T),
                   sample_data(mdat))
# here I skip adding the taxonomy

ps <- metagMisc::phyloseq_filter_prevalence(ps, prev.trh=0.2, abund.trh=10, abund.type="total", threshold_condition="AND")

> ps
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 236 taxa and 318 samples ]
sample_data() Sample Data:       [ 318 samples by 128 sample variables ]

# Do we need to filter to only keep subjects with >=t timepoints?

biomformat::write_biom(biomformat::make_biom(t(otu_table(ps))), "table_filt.biom")
write.table(sample_data(ps), "metadata_filt.txt", sep="\t", quote=F)

Having made sure that samples match between the feature table and the metadata (plus filtered out the rare stuff), I run gemelli and get the following error

gemelli \
--in-biom table_filt.biom \
--sample-metadata-file metadata_filt.txt \
--individual-id-column 'host_subject_id' \
--state-column-1 'timepoint' \
--output-dir results      

Traceback (most recent call last):
  File "/Users/johannesbjork/python/miniconda3/bin/gemelli", line 8, in <module>
    sys.exit(standalone_ctf())
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/gemelli/scripts/_standalone_ctf.py", line 131, in standalone_ctf
    feature_metadata)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/gemelli/ctf.py", line 97, in ctf_helper
    raise ValueError(("No more features left.  Check to make sure that "
ValueError: No more features left.  Check to make sure that the sample names between `sample-metadata` and `table` are consistent
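
Since the R code above transposes the table with `t(otu_table(ps))` before writing it, one thing worth ruling out is that samples ended up on the feature axis of the BIOM file. That is only a guess and is not confirmed in this thread. The pandas sketch below counts how many row labels and column labels of a table appear in the metadata; the helper name and toy data are hypothetical.

```python
import pandas as pd


def axis_matching_metadata(table, metadata_ids):
    """Count how many row labels and column labels appear in the metadata."""
    metadata_ids = set(metadata_ids)
    return {
        "rows": sum(label in metadata_ids for label in table.index),
        "columns": sum(label in metadata_ids for label in table.columns),
    }


# Toy table with samples on the rows, i.e. transposed relative to the
# features-by-samples layout.
table = pd.DataFrame([[3, 0], [1, 4], [0, 2]],
                     index=["s1", "s2", "s3"],
                     columns=["f1", "f2"])
matches = axis_matching_metadata(table, ["s1", "s2", "s3"])
```

If the matches fall on the rows rather than the columns, the table needs to be transposed back before it is written.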

Validate links from tutorials

There are a number of static links scattered in #30 , after the PR is merged into master we should go through and double check that all of them work.

ctf won't work if metadata file has extra samples

Hey @cameronmartino,

A little feature-request:

QIIME 2 Plugin 'gemelli' version 0.0.8 (from package 'gemelli' version 0.0.8)
q2cli version 2021.8.0

In the example below, my table ends up with 5 fewer samples than that in the sample-metadata file after filtering based on min-sample-count

!qiime gemelli ctf \
  --i-table table.qza \
  --m-sample-metadata-file ../clean_metadata.tsv \
  --p-state-column stage_char \
  --p-min-sample-count 5000 \
  --p-individual-id-column host_subject_id \
  --output-dir gemelli/ctf-results \
  --verbose 

And so I get the following error:

/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/preprocessing.py:884: RuntimeWarning: Subject(s) (131-05,131-09,131-15,131-12,131-07,131-13,131-02,131-08,131-06,131-11) contains multiple samples. Multiple subject counts will be meaned across samples by subject.
  warnings.warn(''.join(["Subject(s) (", str(duplicated_ids),
Traceback (most recent call last):
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-564>", line 2, in ctf
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/ctf.py", line 620, in ctf
    helper_results = ctf_helper(table,
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/ctf.py", line 669, in ctf_helper
    tensal_results = tensals_helper(table,
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/ctf.py", line 799, in tensals_helper
    tensor.construct(table, sample_metadata,
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/preprocessing.py", line 855, in construct
    self._construct()
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/gemelli/preprocessing.py", line 891, in _construct
    table[dup[0]] = table.loc[:, dup].mean(axis=1).astype(int)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1069, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 775, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1113, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1053, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/home/mestaki/miniconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/pandas/core/indexing.py", line 1321, in _validate_read_indexer
    raise KeyError(
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['14010.131.08.1SC.swab.28'], dtype='object'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

Plugin error from gemelli:

  "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['14010.131.08.1SC.swab.28'], dtype='object'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

See above for debug info.

The issue is resolved if I manually remove those 5 samples from my metadata beforehand.

Would be nice if gemelli could do a check/filter on the sample-metadata after filtering the input table, or, include an ignore-missing-samples option like in empress.
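
Until such a check exists, the manual workaround can be scripted. The pandas sketch below keeps only the metadata rows whose sample IDs are still present in the filtered table; the helper name `align_metadata` and the toy IDs are hypothetical, and the metadata is assumed to be indexed by sample ID.

```python
import pandas as pd


def align_metadata(metadata, table_sample_ids):
    """Keep only the metadata rows whose sample IDs are still in the table."""
    kept = metadata.index.intersection(table_sample_ids)
    return metadata.loc[kept]


# Toy metadata with one sample (s2) that was filtered out of the table.
metadata = pd.DataFrame({"host_subject_id": ["131-05", "131-05", "131-09"]},
                        index=["s1", "s2", "s3"])
aligned = align_metadata(metadata, ["s1", "s3"])
```

Selecting through `Index.intersection` avoids the `KeyError` that `.loc` raises when any requested label is missing.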

QIIME2 phylogenetic_rpca_with_taxonomy error

The phylogenetic_rpca_with_taxonomy is erroring out in the LCA phase. The standalone is working fine after exporting the same data so it is not a file formatting issue. Something to do with passing the data in QIIME2.

error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
</home/cmartino/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-523> in phylogenetic_rpca_with_taxonomy(table, phylogeny, taxonomy, n_components, min_sample_count, min_feature_count, min_feature_frequency, min_depth, max_iterations)

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py in bound_callable(*args, **kwargs)
    238                 # Execute
    239                 outputs = self._callable_executor_(scope, callable_args,
--> 240                                                    output_types, provenance)
    241 
    242                 if len(outputs) != len(self.signature.outputs):

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py in _callable_executor_(self, scope, view_args, output_types, provenance)
    381 
    382     def _callable_executor_(self, scope, view_args, output_types, provenance):
--> 383         output_views = self._callable(**view_args)
    384         output_views = tuplize(output_views)
    385 

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/rpca.py in phylogenetic_rpca_with_taxonomy(table, phylogeny, taxonomy, n_components, min_sample_count, min_feature_count, min_feature_frequency, min_depth, max_iterations)
     87                                min_feature_frequency=min_feature_frequency,
     88                                min_depth=min_depth,
---> 89                                max_iterations=max_iterations)
     90     ord_res, dist_res, phylogeny, counts_by_node, result_taxonomy = output
     91 

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/rpca.py in phylogenetic_rpca(table, phylogeny, taxonomy, n_components, min_sample_count, min_feature_count, min_feature_frequency, min_depth, max_iterations)
    262     if taxonomy is not None:
    263         # collect taxonomic information for all tree nodes.
--> 264         traversed_taxonomy = retrieve_t2t_taxonomy(phylogeny, taxonomy)
    265         result_taxonomy = create_taxonomy_metadata(phylogeny,
    266                                                    traversed_taxonomy)

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/preprocessing.py in retrieve_t2t_taxonomy(phylogeny, taxonomy)
     87     consensus_tree = phylogeny.copy()
     88     # validate and convert taxonomy into a StringIO stream
---> 89     consensus_map = _get_taxonomy_io_stream(taxonomy)
     90 
     91     tipname_map = nl.load_consensus_map(consensus_map, False)

~/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/gemelli/preprocessing.py in _get_taxonomy_io_stream(taxonomy)
    188                 tax_col_index = i
    189                 # ("col" has already been set to lowercase)
--> 190                 tax_col_name = taxonomy.columns[i]
    191             else:
    192                 # Error condition 1 -- multiple possible "taxonomy columns" :(

KeyError: 0

ValueError: cannot reindex from a duplicate axis (could use better error)

I've been playing around with CTF and have occasionally noticed the following error:

Traceback (most recent call last):
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-538>", line 2, in ctf
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/gemelli/ctf.py", line 40, in ctf
    feature_metadata)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/gemelli/ctf.py", line 187, in ctf_helper
    straj = concat([straj.reindex(all_sample_metadata.index),
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/util/_decorators.py", line 221, in wrapper
    return func(*args, **kwargs)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/frame.py", line 3976, in reindex
    return super().reindex(**kwargs)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/generic.py", line 4514, in reindex
    axes, level, limit, tolerance, method, fill_value, copy
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/frame.py", line 3864, in _reindex_axes
    index, method, copy, level, fill_value, limit, tolerance
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/frame.py", line 3886, in _reindex_index
    allow_dups=False,
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/generic.py", line 4577, in _reindex_with_indexers
    copy=copy,
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1251, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/juermieboop/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3362, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

It turns out that the error pops up if there are duplicates in the state-column within a host, and it goes away if these duplicates are removed (i.e. if week 3 appears twice for the same host, it'll throw an error).

It would be nice if a more informative error were thrown.
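
A check along these lines could produce a clearer message. This is only a sketch of a pre-flight validation, assuming the metadata is a pandas DataFrame with one column for the subject and one for the state; `check_unique_states` and the column names are hypothetical, not gemelli's API.

```python
import pandas as pd


def check_unique_states(metadata, individual_col, state_col):
    """Raise a descriptive error if any subject repeats a state value.

    Hypothetical pre-flight check, not part of gemelli.
    """
    # keep=False marks every row of a duplicated (subject, state) pair
    dup_mask = metadata.duplicated(subset=[individual_col, state_col], keep=False)
    if dup_mask.any():
        offenders = metadata.loc[dup_mask, [individual_col, state_col]]
        raise ValueError(
            "Each subject may have only one sample per state; "
            "duplicates found:\n" + offenders.to_string()
        )


metadata = pd.DataFrame(
    {"host": ["h1", "h1", "h2"], "week": [3, 3, 1]},
    index=["s1", "s2", "s3"],
)
try:
    check_unique_states(metadata, "host", "week")
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

Running such a check before the tensor is built would point directly at the offending samples instead of surfacing pandas' reindexing error.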

possible numpy-related issue

A numpy-related failure was brought up in the context of DEICODE's rpca action on the QIIME 2 Forum here. @cameronmartino thought it might also impact gemelli, so I'm posting this here. See the forum post for the command and traceback.

Add multiple state column support to ctf.py

Currently, the ctf() function in ctf.py seems to accept only one state column. Copying the relevant part of the standalone_ctf code should fix the issue.
--Jincheng

Plugin error from gemelli: "['sample-id'] not found in axis"

Hi,
Please excuse me if this is not easy to follow, as this is my first issue on this forum.

Files used for this analysis
Archive.zip

I am performing 16S rRNA analysis with QIIME 2. I want to study the longitudinal distribution of features over the period of sample collection (sampling-day). In my metadata file, 'sample-id' is the first column. However, when I run the following command for the gemelli analysis:
qiime gemelli ctf --i-table lab_wise_analysis/bacteriology/bacteriology_table.qza --m-sample-metadata-file lab_wise_analysis/bacteriology/metadata_bacteriology.tsv --m-feature-metadata-file taxonomy.qza --p-state-column sampling-day --p-individual-id-column sample-id --output-dir lab_wise_analysis/bacteriology/CTF-results-GEMELLI

it gives me the following error:
Plugin error from gemelli:
"['sample-id'] not found in axis"
Debug info has been saved to /var/folders/31/7jhpcwd91kq6rmrbrjdqtvm40000gn/T/qiime2-q2cli-err-3f1eij0c.log

When I view the debug info, it says:
Traceback (most recent call last):
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in ctf
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 244, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/gemelli/ctf.py", line 30, in ctf
    state_ordn, ord_res, dists, straj, ftraj = ctf_helper(table,
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/gemelli/ctf.py", line 71, in ctf_helper
    all_sample_metadata = sample_metadata.drop(keep_cols, axis=1)
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/pandas/core/frame.py", line 4308, in drop
    return super().drop(
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/pandas/core/generic.py", line 4153, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/pandas/core/generic.py", line 4188, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/pridelab/opt/miniconda3/envs/qiime2-2021.4/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5591, in drop
    raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['sample-id'] not found in axis"

Could someone please help as I am a beginner in this field.
Thank you
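
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when the first metadata column is read as the ID column, it becomes the DataFrame index rather than a regular column, so it cannot be dropped or selected by name along axis=1. The sketch below reproduces that behaviour with plain pandas; the `subject` column and the sample values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical metadata: the first column holds the sample IDs.
tsv = "sample-id\tsubject\tsampling-day\ns1\tA\t1\ns2\tA\t7\ns3\tB\t1\n"
metadata = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# 'sample-id' is now the index name, not one of the columns.
columns = list(metadata.columns)

try:
    metadata.drop(["sample-id"], axis=1)
    error = None
except KeyError as err:
    error = str(err)
print(columns, error)
```

If this is what is happening, passing a separate column that identifies the subject (here `subject`) as the individual ID, rather than the per-sample ID, would avoid the KeyError and also matches the repeated-measures design that CTF expects.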

Error trying to do qc-rarefy

Okay, trying to run qc-rarefy in qiime2-2023.5 using CLI:

qiime gemelli qc-rarefy \
    --i-table feature-table.qza \
    --i-rarefied-distance RPCA-rarefy-distance.qza \
    --i-unrarefied-distance RPCA-dm.qza \
    --o-visualization RPCA-rarefy-qc.qzv

I get this error:

 Plugin error from gemelli:

  [Errno 2] No such file or directory: '/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/gemelli/q2/qc_assests/index.html'

Debug info has been saved to random.log

Log file:

Traceback (most recent call last):
  File "/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in __call__
    results = action(**arguments)
  File "<decorator-gen-88>", line 2, in qc_rarefy
  File "/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
    outputs = self._callable_executor_(
  File "/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 558, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/gemelli/q2/_visualizer.py", line 83, in qc_rarefy
    q2templates.render(index, output_dir, context=context)
  File "/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2templates/_templates.py", line 44, in render
    with open(source_file, 'r') as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/miniforge3/envs/qiime2-2023.5/lib/python3.8/site-packages/gemelli/q2/qc_assests/index.html'

I tried to do pip install gemelli --upgrade again, in case I missed an update, but no luck.

misprint

hi, there's a simple misprint in the header to this section which I think should say multiple samples per subject

cheers

Error "ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats'"

Hi all,

I got a long list of errors when going through the q2-gemelli tutorial, using QIIME 2 version amplicon-2023.9.
The error occurred when I ran:
qiime gemelli ctf \

The list of errors is quite long, but it ends with:
ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats' (/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/scipy/stats/__init__.py)

Since then, the error has appeared for every command in the QIIME 2 environment; even activating a new QIIME 2 environment doesn't work. So I reinstalled the QIIME 2 environment, then installed q2-gemelli, q2-qurro, and q2-deicode.
When I repeated the tutorial command after installing q2-gemelli, it worked fine.

However, after I installed q2-qurro, an error occurred:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. umap-learn 0.5.4 requires tbb>=2019.0, which is not installed.
Since then, the situation has repeated itself: the "ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats'" reappears.

Any help would be highly appreciated~!
Thanks.
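
The traceback in this report ends in scikit-bio importing `ConstantInputWarning` from `scipy.stats`, which suggests the installed SciPy is older than the one scikit-bio expects, possibly because a later pip install changed it. The sketch below checks the installed SciPy version against a minimum of 1.9; that threshold is an assumption based on when `ConstantInputWarning` appears to have been introduced, and both helper functions are hypothetical.

```python
import re
from importlib import metadata


def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def scipy_is_new_enough(version, minimum=(1, 9)):
    """True if `version` is at least `minimum` (an assumed threshold)."""
    return parse_version(version) >= minimum


try:
    installed = metadata.version("scipy")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("scipy is not installed in this environment")
else:
    status = "ok" if scipy_is_new_enough(installed) else "too old"
    print(f"scipy {installed}: {status}")
```

Running this before and after each plugin install would show which step changes the SciPy version.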

Here is the full traceback:
Traceback (most recent call last):
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/util.py", line 279, in get_plugin_manager
    return qiime2.sdk.PluginManager.reuse_existing()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/plugin_manager.py", line 58, in reuse_existing
    raise UninitializedPluginManagerError
qiime2.sdk.plugin_manager.UninitializedPluginManagerError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/click/core.py", line 1682, in invoke
    cmd_name, cmd, args = self.resolve_command(ctx, args)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/click/core.py", line 1729, in resolve_command
    cmd = self.get_command(ctx, cmd_name)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/commands.py", line 100, in get_command
    plugin = self._plugin_lookup[name]
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/commands.py", line 76, in _plugin_lookup
    import q2cli.core.cache
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/core/cache.py", line 285, in <module>
    CACHE = DeploymentCache()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/core/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/core/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/core/cache.py", line 205, in _cache_current_state
    state = self._get_current_state()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/core/cache.py", line 253, in _get_current_state
    plugin_manager = q2cli.util.get_plugin_manager()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/q2cli/util.py", line 291, in get_plugin_manager
    return qiime2.sdk.PluginManager()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/plugin_manager.py", line 67, in __new__
    self._init(add_plugins=add_plugins)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/qiime2/sdk/plugin_manager.py", line 105, in _init
    plugin = entry_point.load()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/deicode/q2/plugin_setup.py", line 12, in <module>
    from deicode.rpca import rpca, auto_rpca
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/deicode/rpca.py", line 2, in <module>
    import skbio
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/skbio/__init__.py", line 11, in <module>
    import skbio.io # noqa
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/skbio/io/__init__.py", line 248, in <module>
    import_module('skbio.io.format.lsmat')
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/skbio/io/format/lsmat.py", line 77, in <module>
    from skbio.stats.distance import DissimilarityMatrix, DistanceMatrix
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/skbio/stats/distance/__init__.py", line 197, in <module>
    from ._mantel import mantel, pwmantel
  File "/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/skbio/stats/distance/_mantel.py", line 16, in <module>
    from scipy.stats import ConstantInputWarning
ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats' (/Users/ivanllampy/miniconda3/envs/qiime2-amplicon-2023.9/lib/python3.8/site-packages/scipy/stats/__init__.py)
(qiime2-amplicon-2023.9)

How to access the phylogenetic_rpca object

In the 2022 paper, you show (in Fig. 1) an overview of the algorithm for phylo-RPCA, which generates a weighted, robust centered log-ratio table before Robust Aitchison PCA. Is there a way to access that table in gemelli?

won't allow bool metadata columns

If the metadata file has a column with bool entries, it returns an error:

There was an issue with viewing the artifact /projects/cmi_proj/seed_grants/conturie_charlotte/gemelli/output/state_subject_ordination.qza as QIIME 2 Metadata:

Metadata column 'dna_extracted' has an unsupported pandas dtype of bool. Supported dtypes: float, int, object

If you need more details, let me know!
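
As a workaround, boolean columns can be converted to strings in the input metadata before running gemelli, since the error lists only float, int, and object as supported dtypes. A minimal sketch, assuming the metadata is held in a pandas DataFrame; `stringify_bool_columns` is a hypothetical helper, not part of gemelli or QIIME 2.

```python
import pandas as pd


def stringify_bool_columns(metadata):
    """Return a copy with bool columns converted to 'True'/'False' strings."""
    out = metadata.copy()
    bool_cols = out.select_dtypes(include="bool").columns
    for col in bool_cols:
        # Cast to str, then to object so the column has a supported dtype.
        out[col] = out[col].astype(str).astype(object)
    return out


metadata = pd.DataFrame(
    {"dna_extracted": [True, False], "depth": [1200, 900]},
    index=pd.Index(["s1", "s2"], name="sample-id"),
)
fixed = stringify_bool_columns(metadata)
print(fixed.dtypes)
```

The original DataFrame is left untouched, so the boolean values remain available for any downstream analysis that needs them.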

Metadata state-column can't have a zero value or be non-numeric category

Hey @cameronmartino,
Finally getting around to trying out this awesome tool, huge congrats on the paper btw.

I was trying a dataset with paired-data with just 2 timepoints when I ran into a problem with my metadata state-column.
I tried 2 different columns that could have represented my states:
timepoint <- has numeric values of 0 and 6 which correspond to weeks, or
Est_status <- categorical values of Baseline, and 6mosEst

Both represent the same values; one is numeric, the other categorical. I'm on q2-2020.6, with q2-deicode 0.2.4 and gemelli 0.0.6 installed.
When I run the below with either of those 2 columns:

!qiime gemelli ctf \
    --i-table table.qza \
    --m-sample-metadata-file pilot_metadata2.txt \
    --m-feature-metadata-file taxonomy_silva132.qza \
    --p-state-column Est_status \
    --p-individual-id-column Patient_ID \
    --output-dir gemelli/ \
    --verbose

I get the following error:

/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
Traceback (most recent call last):
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-555>", line 2, in ctf
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/ctf.py", line 40, in ctf
    feature_metadata)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/ctf.py", line 129, in ctf_helper
    n_initializations=n_initializations).fit(tensor_rclr(tensor.counts))
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/factorization.py", line 171, in fit
    self._fit()
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/factorization.py", line 214, in _fit
    fillna=self.fillna)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/factorization.py", line 534, in tenals
    if rank_estimate(obs_tmp, eps_tmp) >= (min(obs_tmp.shape) - 1):
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/gemelli/optspace.py", line 462, in rank_estimate
    r_one = np.argmin(cost)
  File "<__array_function__ internals>", line 6, in argmin
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1267, in argmin
    return _wrapfunc(a, 'argmin', axis=axis, out=out)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/home/mestaki/miniconda3/envs/qiime2-2020.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 47, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: attempt to get argmin of an empty sequence

Plugin error from gemelli:

  attempt to get argmin of an empty sequence

See above for debug info.

If I add a random 3rd value in either of these columns it runs successfully. So can gemelli not be used on data with only 2 timepoints then? It doesn't fit in the cross-sectional version either, so this would be a big loss since many projects do use only before/after designs.

How to interpret the rank of features?

Does the rank of a feature in CTF indicate the importance of that feature, just like the rank in PCA?
In 2.2.0-compare-feature-ranks/1.0.0-rankings.ipynb, what is the reason for claiming 'pos in both is more vaginal and more neg is C-section'?
Thank you for your help!

tutorial fixes/improvements

  • Need to update tables from Qiita so that sample IDs cannot be interpreted as numeric on import
  • Add a standalone tutorial (CLI)
  • Add a standalone tutorial (CLI) with R plotting/processing
