diazrenata / scads

Statistically constrained abundance distributions
License: MIT License
A number of functions are popular for fitting the SAD, but it's not clear how much of the support for them comes from the fact that they generate hollow curves, versus whether they meaningfully predict observed vectors above and beyond the constraint imposed by the feasible set.
The logic here has some nuance, and possibly some circularity, to it, and probably needs more thought. But:
Where do vectors drawn from the fitted distribution (lognormal, geometric, etc), that have been constrained/selected to have the correct S and N, fall in the feasible set compared to empirical distributions?
This has some nuance to it, because constraining the samples to fall within the feasible set (i.e., to have the right S and N) may drag them away from what is likely for the function. An alternative might be to calculate the likelihood of each of the FS samples given the function, and see whether the empirical vector has an especially high likelihood compared to the bulk of the FS.
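One hedged way to make the likelihood comparison concrete: enumerate the full feasible set for small S and N, evaluate the likelihood of every element under the fitted function, and ask where the empirical vector falls. This is a minimal Python sketch, not the repo's R code; the example vector, the Poisson stand-in for the fitted function, and all names here are illustrative assumptions. (A Poisson is used rather than a geometric because a geometric assigns identical likelihood to every vector with the same S and N, which would make the comparison uninformative.)

```python
# Sketch: where does the empirical vector's likelihood, under a fitted
# distribution, fall relative to the likelihoods of feasible-set samples?
# Small S and N so the whole feasible set can be enumerated exactly.
import math

def partitions(n, k, max_part=None):
    """Yield all partitions of n into exactly k positive parts (nonincreasing)."""
    if max_part is None:
        max_part = n
    if k == 1:
        if n <= max_part:
            yield (n,)
        return
    for first in range(min(n - k + 1, max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest

def pois_loglik(vec, lam):
    """Log-likelihood of abundances under Poisson(lam).

    Note: a geometric would be degenerate here, since its likelihood
    p^S * (1 - p)^(N - S) depends only on S and N, which are fixed
    across the feasible set.
    """
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in vec)

obs = (12, 4, 2, 1, 1)            # hypothetical empirical vector: S = 5, N = 20
S, N = len(obs), sum(obs)
lam_hat = N / S                   # MLE for the Poisson mean

fs = list(partitions(N, S))       # the full feasible set for S = 5, N = 20
logliks = [pois_loglik(v, lam_hat) for v in fs]
obs_ll = pois_loglik(obs, lam_hat)

# fraction of feasible-set elements with likelihood <= the empirical vector's
percentile = sum(ll <= obs_ll for ll in logliks) / len(fs)
```

At real S and N the feasible set can't be enumerated, so `fs` would be a sample of draws instead; the percentile logic is unchanged.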
e1071::skewness calculates skewness via one of three formulas (type = 1, 2, or 3). Try the other ones and see if you get the same results.
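For reference, the three formulas behind e1071's `type` argument (after Joanes & Gill 1998) can be written out directly; this is a Python transcription for comparison, not the package's code:

```python
# The three skewness formulas implemented by e1071::skewness (types 1-3).
import math

def skewness(x, type=3):
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    if m2 == 0:
        return 0.0                             # flat vector: no spread, call it 0
    g1 = m3 / m2 ** 1.5                        # type 1: classical moment ratio
    if type == 1:
        return g1
    if type == 2:                              # type 2: adjusted Fisher-Pearson
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return g1 * ((n - 1) / n) ** 1.5           # type 3: e1071's default
```

Types 2 and 3 are monotone rescalings of type 1 for a fixed n, so the *percentile* of the observed skewness within a feasible set (where every vector has the same S) should be identical under all three; the raw values will differ.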
https://onlinelibrary.wiley.com/doi/epdf/10.1111/ecog.03424 Is this data available? Would be super fun.
https://zenodo.org/record/1120445
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112850
What is the (real) sampling limit? So far I have been running up against my own patience + disk space in my personal hpg division.
The skewness percentile effectively gives a P-value-like measure of how consistent the difference is. It would be nice to also be able to describe the magnitude of the difference.
Current idea: the difference between the actual vector and the 1:1 line? The actual vector should always fall below the 1:1 line, so there's no issue with absolute values.
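A minimal sketch of that magnitude metric, assuming the comparison is between paired observed and expected values (e.g., rank-by-rank observed abundance vs. the feasible-set central tendency, plotted against a 1:1 line); the function name and framing are illustrative, not from the repo:

```python
# Mean signed deviation of paired (expected, observed) points from the 1:1 line.
# If the observed vector always sits below the line, this is always <= 0,
# so its magnitude can be compared across sites without taking absolute values.
def mean_deviation_from_identity(expected, observed):
    assert len(expected) == len(observed)
    return sum(o - e for e, o in zip(expected, observed)) / len(expected)
```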
Especially as you pull more distinct samples from the feasible set, you're not mapping what is likely so much as what is possible; all possible draws from any other distribution must be within the feasible set.
That said, especially as S and N get big and the feasible set balloons, it should take a while to happen upon a really weird one (for example, a flat vector).
See issue 5 in scadsplants: diazrenata/scadsplants#5. Carry this forward in future analyses.
Behavior at small S, small N, small N/S, and then increasing:
Dimensions within:
S <- c(2:10)
N <- c(2:19, seq(20, 200, by = 10))
S <- c(10, 20, 30, seq(50, 250, by = 50))
N <- seq(50, 250, by = 50)
S <- c(seq(5, 40, by = 5), 50, 75)
N <- c(seq(500, 2500, by = 500), 5000, 7500)
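The R vectors above define the grids; crossing them still needs a feasibility filter, since every species requires at least one individual (N >= S). A Python sketch of the cross for the first regime (values copied from above; the filter is the point):

```python
# Cross S and N values for the small-S, small-N regime and keep only
# (S, N) combinations that admit a feasible set, i.e. N >= S.
from itertools import product

S_vals = list(range(2, 11))                             # S <- c(2:10)
N_vals = list(range(2, 20)) + list(range(20, 201, 10))  # N <- c(2:19, seq(20, 200, by = 10))
combos = [(s, n) for s, n in product(S_vals, N_vals) if n >= s]
```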
So to do this you would generate feasible sets for the observed vector > calculate skewnesses > calculate the observed percentile; then do the entire process on vectors that have the same values for the abundant species but have singletons added or removed, and see if the percentile changes.
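The pipeline above can be sketched end to end at small S and N, where the feasible set can be enumerated exactly instead of sampled. The vectors and helper names here are illustrative assumptions, not the repo's implementation:

```python
# Singleton-sensitivity check: observed skewness percentile within the full
# feasible set, recomputed after adding one singleton to the vector.

def partitions(n, k, max_part=None):
    """Yield all partitions of n into exactly k positive parts (nonincreasing)."""
    if max_part is None:
        max_part = n
    if k == 1:
        if n <= max_part:
            yield (n,)
        return
    for first in range(min(n - k + 1, max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest

def skew(x):
    """Type-3 (e1071 default) sample skewness; 0 for a perfectly flat vector."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    if m2 == 0:
        return 0.0
    return (m3 / m2 ** 1.5) * ((n - 1) / n) ** 1.5

def skew_percentile(obs):
    """Percentile of the observed skewness within the full feasible set for its S, N."""
    fs_skews = [skew(v) for v in partitions(sum(obs), len(obs))]
    return sum(s <= skew(obs) for s in fs_skews) / len(fs_skews)

obs = (10, 5, 2, 2, 1)                             # S = 5, N = 20
pct = skew_percentile(obs)
pct_plus_singleton = skew_percentile(obs + (1,))   # S = 6, N = 21
```

Comparing `pct` with `pct_plus_singleton` is the sensitivity question: does one undetected singleton move the observed percentile meaningfully?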
What's the maximum S and N I can calculate a p table for in R?
Download the data from Xiao (White?) et al 2012 and see what the approximate maximum S and N are.
Closely related to #17.
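One way to get a feel for the limit is to look at the combinatorial object itself: the number of partitions of N into exactly S positive parts, which is what a p table has to cover. A sketch of the standard recurrence (in Python, not the repo's R; memoizing this counts how fast the table's entries blow up, though the R memory ceiling itself still has to be measured empirically):

```python
# p(n, k) = number of partitions of n into exactly k positive parts,
# via the standard recurrence p(n, k) = p(n - 1, k - 1) + p(n - k, k):
# either the partition contains a 1 (drop it), or it doesn't (subtract 1
# from every part).
from functools import lru_cache

@lru_cache(maxsize=None)
def n_partitions(n, k):
    if k <= 0 or n < k:
        return 0
    if k == 1 or k == n:
        return 1
    return n_partitions(n - 1, k - 1) + n_partitions(n - k, k)
```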
See theories and predicted distributions collected in https://onlinelibrary.wiley.com/doi/full/10.1111/j.1461-0248.2007.01094.x. How close are the predictions to the empirical vectors in terms of their positions within the feasible set?
This gets complex because you're kind of mixing approaches. Many of these predictions are idealized distributions, not actual vectors of counts. Most draws from some of these distributions don't sum to the state variables.
I have previously tried evaluating the likelihood of all samples from the feasible set under the predicted distribution, and then seeing if the empirical vector has the highest likelihood. I think this tells us whether the predicted distribution is pointing at hollow curves in the feasible set generally, or specifically at the empirical hollow curve. But note that winning in this sense is more like a Venn diagram: the empirical is at the intersection of things-predicted-by-the-distribution and the feasible set.