statlab / permute
permutation tests and confidence sets
Home Page: http://statlab.github.io/permute/
License: BSD 2-Clause "Simplified" License
Implement multiplicity adjustments: Bonferroni, Bonferroni-Holm, Westfall-Young.
Also Šidák? Or should we omit methods that require independence or positive regression dependence?
Implement one or more FDR adjustment methods.
Write a "wrapper" for multiplicity adjustment with different methods as parameters, or call different top-level functions?
Westfall-Young is close in structure to NPC (need to track individual permutations).
FDR, Bonferroni, and Bonferroni-Holm operate on p-values (see the sketch below).
Are the p-values we get from two-sample t-tests (inside permute core) corrected for false positives?
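As a concrete reference for the adjustments that operate on p-values, here is a minimal sketch of the Holm step-down adjustment (illustrative only, not a proposed API):
import numpy as np

def holm_adjust(pvalues):
    # Step-down Holm adjustment of a vector of p-values; Bonferroni would simply
    # multiply every p-value by m, with no step-down and no running maximum.
    pvalues = np.asarray(pvalues, dtype=float)
    m = len(pvalues)
    order = np.argsort(pvalues)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted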
Integrate with coveralls.io
The implementation looks like:
# Potential outcomes for all units under treatment
pot_outx = np.concatenate([x, y + shift])
# Potential outcomes for all units under control
pot_outy = np.concatenate([x - shift, y])
pot_out_all = np.column_stack([pot_outx, pot_outy])
dist[i] = tst_stat(pot_out_all[:nx, 0], pot_out_all[nx:, 1])
Based on the above code, pot_out_all[:nx, 0] will give the x and pot_out_all[nx:, 1] will give the y. So we are checking x against y, not x against y + shift or x - shift against y.
Right now, t2p gets called Very Many times, each time requiring order B comparisons and a sum. I think a better approach would be to create, for each test in the combination, the entire vector of p-values at once, by sorting.
I think this is the performance bottleneck in npc().
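A minimal sketch of the sort-based alternative, with illustrative names (dist is the length-B vector of permuted statistics for one test, larger values taken as more extreme):
import numpy as np

def all_pvalues(dist):
    # Upper-tail p-value of every element of dist against the whole permutation
    # distribution, using one sort instead of B separate O(B) passes.
    B = len(dist)
    dist_sorted = np.sort(dist)
    # side='left' counts entries strictly below each value, so B minus that count
    # is the number of entries >= each value.
    return (B - np.searchsorted(dist_sorted, dist, side='left')) / B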
I'm still on the hunt for the full rats cortical mass dataset, with three treatment groups instead of the two shown in FPP. This reference looks promising, but the data aren't included in the paper.
http://www.sciencedirect.com/science/article/pii/0031938468901613
In the core file, permute_within_groups and permute_rows do this correctly, but there's no random-state argument for binom_conf_interval, and two_sample takes a seed as its input.
I made up a toy example for writing tests and calculated what I think the confidence intervals should be. Every time I run the function, the lower limit of the confidence interval it gives me back is one less than the value I expected. I think it's coming from this chunk:
if alternative != "upper" and x > 0:
    f = lambda q: cl - hypergeom.cdf(x-1, N, q, n)
    ci_low = brentq(f, 0.0, G, *kwargs)
because of the x-1 in the call to hypergeom.cdf. Why do we decrease the number of "good" things by 1 here? I realize the confidence intervals we get out of this will tend to be conservative so it's not necessarily bad, but I wonder if it's the "correct" thing to do.
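For reference, a small check (with made-up N, G, n, and x) of the quantity the x-1 produces: 1 - hypergeom.cdf(x-1, N, G, n) is the upper-tail probability P(X >= x).
from scipy.stats import hypergeom

N, G, n = 20, 8, 5   # population size, number of "good" items, sample size (made up)
x = 3                # observed number of good items in the sample
# 1 - cdf(x-1) is P(X >= x); scipy's survival function gives the same value directly.
assert abs((1 - hypergeom.cdf(x - 1, N, G, n)) - hypergeom.sf(x - 1, N, G, n)) < 1e-12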
Need to start a section in the developer's guide on things to know:
How to install cryptorandom (via pip or GitHub) on AppVeyor and ignore it in the conda install command?
Look over #FIXMES
In some small examples, the p-values are behaving in ways we don't understand.
Should we do pre-processing to check if every rater gave a 0? What should the p-value/test statistic be in that case?
Look at irr.py starting at line 262, at the block beginning with "if pvalues is None:".
Potential confusion I discovered: the first element of the returned pair is the p-value, the second is the test statistic. When we use the difference in means as the test stat, we expect to get back the difference in means. However, you get back its negative when you specify alternative="less":
two_sample(x, y, stat="mean", alternative="less")
(0.32283000000000001, 0.4590029743250863)
two_sample(x, y, stat="mean", alternative="greater")
(0.67778000000000005, -0.4590029743250863)
1st dimension should be the 183 domain. 2nd dimension should be the video. Then rater, timestamp (to match input matrix for irr)
consider moving other functions in as well.
In [1]: import permute
In [2]: permute.__version__
Out[2]: 'unbuilt-dev'
Hey,
First, thanks for all the work you've done to deliver this awesome library!
I just noticed a minor issue while using the same sample for both arguments of permute.core.one_sample:
import permute.core
import numpy as np
a = np.random.rand(200)
p_value = permute.core.one_sample(a, a, reps=10000, seed=42, alternative='two-sided')[0]
p_value
0.00019998000199983323
The means are of course identical, so the result should not be significant.
Maybe changing >= to > would solve the problem in:
hits = np.sum([(tst_fun(z * (1 - 2 * prng.binomial(1, .5, size=n)))) >= tst
               for i in range(reps)])
We have the scaffolding up to run the permutation over rows. We need to decide on the optimal way to do this.
We want something in the global scope called "permute_rows" so that we only need to modify that one function rather than every place it appears in our package. We need a "permute" module to handle all this.
permute_rows can be one of
We should be using 2*min(P(X <= x | H_0), P(X >= x | H_0)) instead of P(|X| >= |x| | H_0) for the two-sided p-value.
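A minimal sketch of that convention, using a hypothetical permutation distribution dist and observed statistic t_obs:
import numpy as np

def two_sided_pvalue(dist, t_obs):
    # 2 * min of the two one-sided p-values, capped at 1; this stays valid when the
    # test statistic has a discrete (possibly asymmetric) permutation distribution.
    dist = np.asarray(dist)
    p_lower = np.mean(dist <= t_obs)
    p_upper = np.mean(dist >= t_obs)
    return min(1.0, 2 * min(p_lower, p_upper))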
I am using Pandas for the CSV reader currently. Perhaps it is better to consider storing data as *.npy instead.
For finding the endpoints of confidence intervals for a shift, the p-value will generally be constant over a range of values of the shift.
Brentq will find a legitimate CI, but the lower endpoint might be unnecessarily low and the upper endpoint might be unnecessarily high.
The solution would seem to be to use something other than a root-finder to find the endpoints, or--if we do use a root-finder--to then try marching up or down to see whether the p-value changes.
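A rough sketch of the marching idea for the lower endpoint, assuming a hypothetical pvalue_at(shift) function and a step size chosen relative to the scale of the data:
def tighten_lower(endpoint, pvalue_at, step):
    # brentq may land anywhere inside a flat spot of the p-value function; march
    # upward from the returned endpoint until the p-value actually changes.
    p0 = pvalue_at(endpoint)
    while pvalue_at(endpoint + step) == p0:
        endpoint += step
    return endpoint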
I seem to be having issues running the tests that I write for the IRR simulation functions. Specifically, the outputs of the simulate_ts_dist and simulate_npc_dist functions don't match what is expected when I call "make test" at the command line.
We should figure out a work-around or fix whatever bug is causing this. It'd be nice not to have to skip tests because they always throw an error!
To do
Function is broken
I think your computation of spearman correlation is wrong. In your "spearman_corr" function you have:
xnew = np.argsort(x)+1
ynew = np.argsort(y)+1
However, argsort does not give you the actual ranks of the corresponding elements, just the indexing that sorts them (these are not the same thing). You should replace those lines, e.g., with scipy.stats.rankdata:
xnew = rankdata(x, method='average')
ynew = rankdata(y, method='average')
This also handles possible tied ranks.
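To make the distinction concrete, a small example with made-up data showing where the two disagree and how rankdata handles ties:
import numpy as np
from scipy.stats import rankdata

x = np.array([30., 10., 20.])
print(np.argsort(x) + 1)               # [2 3 1] -- the sorting order, not the ranks
print(rankdata(x, method='average'))   # [3. 1. 2.] -- the actual ranks

y = np.array([10., 20., 20.])
print(rankdata(y, method='average'))   # [1.  2.5 2.5] -- ties get averaged ranks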
Add a nose test to ensure PEP 8 style. Something like pyflakes, pep8, or flake8.
E.g.,
http://stackoverflow.com/questions/12227443/is-there-a-plugin-for-pylint-and-pyflakes-for-nose-tests
To avoid CI errors using scipy 1.9, I changed some of the expected results for hypergeom_conf_interval
to match what was being produced in #202. You can see the changes here: https://github.com/statlab/permute/pull/202/files
I didn't have time to track down the source of the changes, but someone should look into it.
Columns of output data:
Characteristic
Mean of occurrences in vid 1, ..., vid 8
SD of occurrences in vid 1,..., vid 8
Concordance score for each video
P-value of concordance score for each video
Overall p-value
Spearman correlation involves replacing the original data by their ranks. We should implement it as a wrapper; meanwhile, need to correct the terminology in the documentation.
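A minimal sketch of the wrapper idea; corr_test stands in for whichever correlation permutation test the package exposes, since the exact function is left open here:
from scipy.stats import rankdata

def spearman_corr(x, y, corr_test, **kwargs):
    # Spearman correlation is just the correlation test applied to ranks, with ties
    # given averaged ranks; delegate to the supplied permutation test on the ranks.
    return corr_test(rankdata(x, method='average'), rankdata(y, method='average'), **kwargs)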
https://gist.github.com/stefanv/954d21cb13f1aad55b0f
For the three methods implemented, I see:
1 loops, best of 3: 1.52 s per loop
10 loops, best of 3: 153 ms per loop
10 loops, best of 3: 148 ms per loop
If you are willing to get your outputs as lists instead of arrays, it's quite a bit quicker still.
Simplify README.md to include mostly links (website, mailing list, git URL) and simple install instructions. Move current content to project documentation.
Add Travis CI support.
For efficiency, add an option to pass in variables and test stat functions. Do permutations in lock step and broadcast the list of functions over the array of permuted data.
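One way that could look, as a hedged sketch with illustrative names (pooled is the concatenated data, nx the size of the first group, stat_funs a list of two-argument test statistics):
import numpy as np

def permuted_dists(prng, pooled, nx, stat_funs, reps=1000):
    # Permute the pooled data once per repetition and evaluate every statistic on the
    # same split, instead of re-permuting separately for each statistic.
    pooled = np.array(pooled, dtype=float)
    dists = np.empty((len(stat_funs), reps))
    for i in range(reps):
        prng.shuffle(pooled)
        xs, ys = pooled[:nx], pooled[nx:]
        for j, f in enumerate(stat_funs):
            dists[j, i] = f(xs, ys)
    return dists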
I think in lieu of
tst_fun = stats[stat]
we can substitute
if callable(stat):
    tst_fun = stat
else:
    tst_fun = stats[stat]
Thoughts?
For the two-sample problem and the k-sample problem, the PRNG method used to randomly reallocate items should be random_sample(), not shuffle().
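A minimal sketch of reallocation via random_sample(), assuming the PRNG exposes that method (names illustrative):
import numpy as np

def reallocate(prng, pooled):
    # Draw one uniform per item and order the pooled data by those uniforms; with
    # continuous uniforms, every ordering of the items is equally likely.
    u = prng.random_sample(len(pooled))
    return np.asarray(pooled)[np.argsort(u)]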
Design:
Somehow, these need to depend on the alternative. For two-sample, the alternatives are greater than, less than, or not equal. We may also want to consider ordered alternatives.
Pass a triple (test statistic, combination function, alternative) into the function?
We also want to think about naming conventions.
These are some notes for how to simultaneously compute the permutation distribution for each video AND the permutation distribution for the NPC statistic. NB I haven't decided what the NPC test statistic should be - the simplest would be the n^{-1/2} weighted sum of the p-values.
For each video j, j = 1, ..., Ns, store the permuted test statistics rho in a vector. We get these from simulate_ts_dist, setting keep_dist = True and iters = B. Put these in a B x Ns matrix.
Make a copy of the rho matrix and sort each column.
For the ith row in the unsorted matrix:
For the jth column in the unsorted matrix:
Use numpy.searchsorted OR (rho <= rho[i]).sum() to find the percentile of rho_{i,j} --> this is the p-value p-hat_{i,j}
Compute the NPC for the p-values p-hat_{i,1},...,p-hat_{i,Ns}
From this, we get B NPC test statistics. These are an approximate distribution for the NPC test stat and now we can compare the observed NPC test stat to this distribution to get a p-value.
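A sketch of the scheme above, with illustrative names: rho is the B x Ns matrix of permuted test statistics (one column per video), rho_obs holds the Ns observed statistics, and larger statistics are assumed to be more extreme, so partial p-values are upper-tail proportions:
import numpy as np

def npc_pvalue(rho, rho_obs):
    B, Ns = rho.shape
    rho_sorted = np.sort(rho, axis=0)
    phat = np.empty((B, Ns))
    for j in range(Ns):
        # side='left' counts values strictly below rho[i, j]; B minus that count is
        # the number of permuted values >= rho[i, j], i.e. the partial p-value.
        phat[:, j] = (B - np.searchsorted(rho_sorted[:, j], rho[:, j], side='left')) / B
    phat_obs = np.array([(rho[:, j] >= rho_obs[j]).mean() for j in range(Ns)])
    # Simple NPC statistic: the normalized (Ns^{-1/2}-weighted) sum of the partial
    # p-values; under this combining function, smaller values are more extreme.
    npc_dist = phat.sum(axis=1) / np.sqrt(Ns)
    npc_obs = phat_obs.sum() / np.sqrt(Ns)
    return (npc_dist <= npc_obs).mean()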
In routines like the binomial confidence interval, the search for the confidence bound assumes that the bound will be between the sample mean and the appropriate a priori bound (e.g., for an upper confidence bound, between \hat{p} and 1). If you call the routine with a confidence level below 50%, that assumption is false and the search algorithm will not converge.
In my experience, I've never seen a 1-sided confidence bound with a confidence level below 50%. Should we raise an exception (tacitly on the assumption that the caller used \alpha instead of 1-\alpha)? Or should we (correctly) compute 1-sided confidence bounds with confidence levels below 50%?
I get the impression that pythonic style might not require either solution: the user is assumed to be somewhat sophisticated.
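If we go the exception route, one possible guard (assuming the parameter is called cl and holds the confidence level of the one-sided bound):
def check_confidence_level(cl):
    # The search for the bound assumes cl >= 0.5; a value below 50% almost certainly
    # means the caller passed alpha rather than 1 - alpha.
    if cl < 0.5:
        raise ValueError("confidence level below 50%; did you mean 1 - alpha?")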
I noticed a p-value of zero on this page: https://statlab.github.io/permute/user/one-sample.html
Consider using the more conservative
p_u = (hits + 1) / (reps + 1)
instead of
p = hits / reps
The valid but conservative p-value p_u has been recommended by a number of authors, including Davison and Hinkley (1997), Manly (1997), and Ernst (2004). The extent to which p_u over-estimates the exact p-value depends on m_t.
https://www.ncbi.nlm.nih.gov/pubmed/21044043
brentq assumes that the objective function is continuous--but it isn't. It has flat spots and jumps.
brentq assumes that evaluating the p-value at a particular shift always gives the same result, but that isn't true, because the p-value involves simulation. The randomness in the function needs to be taken into account; it will depend on the number of reps.
Add plusone to the calling signature and to the code for npc and t2p
Check code to be sure 1-sided and 2-sided P-values are not (incorrectly) assuming the test statistic has a continuous distribution.
Add install instructions. Add requirements.txt.
I get two-sided p-values of 2.0 in some examples.