Currently, TDAstats::calculate_homology only returns a matrix. After converting it to a data frame with as.data.frame, users still need to convert the dimension column from numeric to a factor so that a barcode plot uses discrete colors rather than a continuous color spectrum. An easy fix would be to add a parameter that returns a properly formatted data frame, so that users don't have to do any extra steps.
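As a minimal sketch of the conversion described above: the toy matrix, its column names, and the helper name as_phom_df are assumptions made for illustration, not part of the current TDAstats API.

```r
# Toy matrix mimicking the shape of calculate_homology() output
# (values and column names are assumptions for this sketch).
phom <- matrix(
  c(0, 0.0, 0.5,
    0, 0.0, 0.8,
    1, 0.9, 1.4),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("dimension", "birth", "death"))
)

# Hypothetical helper: return a data frame whose dimension column is a factor,
# so that plotting code maps it to discrete colors.
as_phom_df <- function(mat) {
  df <- as.data.frame(mat)
  df$dimension <- factor(as.integer(df$dimension))
  df
}

phom_df <- as_phom_df(phom)
str(phom_df)
```

A parameter on calculate_homology could simply call such a helper before returning.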
Test output, e.g. from permutation_test(), currently returns a somewhat unwieldy list. I think the following methods would be useful to implement:
print() (see various methods for {stats} test output for inspiration)
summary(), possibly
tidy() and glance() from {generics}, as used in {broom} and its extensions
autoplot(), e.g. histograms of null samples to illustrate p-values
However, this requires that the tests return objects of some S3 class. This could be a new class or classes, or the existing 'htest' class---or possibly both, in case some tasks can be dispatched to 'htest' methods but others must be more specific.
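To illustrate the "possibly both" option, here is a rough sketch in base R. The constructor name new_phom_test, the class name 'phom_test', and the stored fields are all hypothetical; only the 'htest' class and its print method come from {stats}.

```r
# Hypothetical constructor: wrap permutation-test output in an object that
# carries a specific class in front of the existing 'htest' class.
new_phom_test <- function(statistic, null_samples, data_name) {
  # One-sided permutation p-value with the usual +1 correction
  p_value <- (1 + sum(null_samples >= statistic)) / (1 + length(null_samples))
  structure(
    list(
      statistic    = c(distance = statistic),
      p.value      = p_value,
      method       = "Permutation test on persistent homology (sketch)",
      data.name    = data_name,
      null_samples = null_samples  # kept for a future autoplot()/tidy() method
    ),
    class = c("phom_test", "htest")
  )
}

set.seed(1)
res <- new_phom_test(statistic = 2.5, null_samples = rnorm(100),
                     data_name = "x and y")

# No print.phom_test() is defined, so dispatch falls through to print.htest()
print(res)
```

More specific methods (say, an autoplot() histogram of null_samples) could then be registered for 'phom_test' alone.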
Would it make sense to supplement the plot_*() shorthands with stat_*() and geom_*() layers that perform the transformation and visualization duties separately? This would allow users to
produce plots using recognizable and (more) customizable ggplot2 syntax
render alternative visualizations of birth–death PH data
visualize data generated by other means (e.g. via construction of a Čech complex) in persistence and barcode diagrams
Since the plot_*() functions don't require a specific class of data frames (just recognizable column names), this could, i think, be done without any changes to current functionality.
Since i use Mac OS X 10.9 on my laptop, i only have R version 3.3.2. As an experiment, i cloned this repo, changed the dependency to R (>= 3.3), and installed using devtools::install(). It worked fine, and i was able to work through all of the examples. Are there specific reasons for requiring version 3.4?
Need to adjust documentation to reflect above fact. Potentially add a parameter to the permutation test function that lets users supply their own distance function (one that takes the persistent homology of two datasets as parameters and returns a numeric).
Thanks to @kisungyou for bringing this to my attention.
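A rough sketch of what such a parameter might look like, in base R. The function perm_test_sketch and its arguments dist_fun and feature_fun are hypothetical; feature_fun is a stand-in for the persistent homology calculation so that the example runs without TDAstats.

```r
# Hypothetical sketch: permutation test with a pluggable distance function.
# `feature_fun` stands in for the persistent homology computation
# (e.g. TDAstats::calculate_homology); `dist_fun` takes the two results
# and must return a single numeric.
perm_test_sketch <- function(x, y, dist_fun, feature_fun = identity,
                             iterations = 200) {
  observed <- dist_fun(feature_fun(x), feature_fun(y))
  pooled <- rbind(x, y)
  n_x <- nrow(x)
  null <- replicate(iterations, {
    idx <- sample(nrow(pooled))                      # shuffle group labels
    px <- pooled[idx[seq_len(n_x)], , drop = FALSE]
    py <- pooled[idx[-seq_len(n_x)], , drop = FALSE]
    dist_fun(feature_fun(px), feature_fun(py))
  })
  list(statistic = observed, null = null,
       p.value = (1 + sum(null >= observed)) / (1 + iterations))
}

# Toy user-supplied distance: difference between column means
toy_dist <- function(a, b) sum(abs(colMeans(a) - colMeans(b)))

set.seed(42)
x <- matrix(rnorm(40), ncol = 2)
y <- matrix(rnorm(40, mean = 3), ncol = 2)
out <- perm_test_sketch(x, y, dist_fun = toy_dist)
out$p.value
```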
The following code chunk doesn't produce an error, but fails (for me) to produce the diagonal line in the persistence plot (i had trouble using reprex::reprex()):
Is the problem on my end? It prevents me from fully performing the visual comparison suggested in the "inference" vignette. I will try to reproduce it on a different machine tomorrow.
Hi! When I use calculate_homology on a graph with 7 vertices (for example), I obtain only 6 features in dimension 0, all of which start at filtration weight 0. Why is that? Shouldn't there be 7 features? I couldn't find the reason in your guidelines or vignettes.
Thanks!
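One likely explanation, offered as an assumption rather than something confirmed in the package documentation: all 7 vertices start as separate connected components at weight 0, and a finite dimension-0 feature ends each time two components merge. Seven components can merge only 6 times; the last surviving component never dies, and its infinite interval appears to be omitted from the output. The counting can be sketched in base R with single-linkage clustering on a toy point cloud (not the original graph):

```r
set.seed(7)
pts <- matrix(runif(14), ncol = 2)   # 7 toy points in the plane

# Single-linkage merge heights: the scales at which two components join
merges <- hclust(dist(pts), method = "single")$height

length(merges)   # 6 merges for 7 points
```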
Is it in the scope of this package to re-adjust the ripser source code to support other coefficients in a prime field?
As you know, this should simply involve replacing areas such as the one here with the code that is compiled when the USE_COEFFICIENTS preprocessor variable is enabled in the ripser package.
Data frames are better for visualization (using the grammar of graphics system in ggplot2); should be a quick switch to return as a data frame instead of matrix.
Hi,
We can improve the vignettes.
Honestly, there are several parts of the vignettes that I don't understand.
What is @roadmap-ph in the vignette "Introduction to persistent homology with TDAstats"?
I think it would be helpful for newcomers like me if you explained what @roadmap-ph is.
I listed the parts I don't understand below.
The parts I don't understand in "Introduction to persistent homology with TDAstats"
@roadmap-ph
[@Rcpp-paper]
[@ggplot2-book]
The parts I don't understand in "Hypothesis testing with TDAstats"
@resampling-book
@hyptest
[@wasserstein-calc]
@resampling-book
By the way, the references themselves are missing from the vignettes, although each vignette has a references section at the bottom.
You should either list the references or remove those sections.
Thanks!
Hello! I am using the phom.dist() function to compute the distance between persistence diagrams. Can you clarify what distance measure is being computed by this function? Is there a reference/citation/source for the distance measure being computed? I was under the impression phom.dist() returned the Wasserstein distance based on the function naming, but looking at a previous issue (#13) I see that that isn't the case.
Since persistence plots implicitly rely on a 1:1 aspect ratio for visual interpretability, would it be appropriate to add + ggplot2::coord_fixed(ratio = 1) to the 'ggplot' object returned by plot_persist()? (Certainly lower- and higher-persistence features are discriminable regardless of the aspect ratio, but for professional publications this has, to my knowledge, been the rule. And, like the other plot specs, this could be overridden by the user.)
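For concreteness, a small sketch of the suggestion using ggplot2 directly. The toy data frame and its column names are assumptions, and the plot is built by hand here rather than by plot_persist().

```r
library(ggplot2)

# Toy birth-death data (values made up for illustration)
phom_df <- data.frame(
  dimension = factor(c(0, 0, 1)),
  birth     = c(0, 0, 0.9),
  death     = c(0.5, 0.8, 1.4)
)

p <- ggplot(phom_df, aes(x = birth, y = death, colour = dimension)) +
  geom_abline(slope = 1, intercept = 0) +  # reference diagonal
  geom_point() +
  coord_fixed(ratio = 1)                   # one unit of birth = one unit of death

# A user can still override the default by adding another coordinate system;
# ggplot2 replaces the existing one (with a message).
p_free <- p + coord_cartesian()
```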
The toy examples provided with the package are helpful and wholly appropriate. The package would also benefit, i think, from an illustration using real-world data (as suggested but not required by the JOSS review checklist). Is there a dataset you've used for this purpose that could be included in the package and demonstrated either in the function documentation or in a separate vignette? This isn't a sticking point for my review, but i do think the package would achieve its aims more effectively with such a case study.