statlab / permute
permutation tests and confidence sets
Home Page: http://statlab.github.io/permute/
License: BSD 2-Clause "Simplified" License
Implement multiplicity adjustments: Bonferroni, Bonferroni-Holm, Westfall-Young.
Also Šidák? Or should we omit methods that require independence or positive regression dependence?
Implement one or more FDR adjustment methods.
Write a "wrapper" for multiplicity adjustment with different methods as parameters, or call different top-level functions?
Westfall-Young is close in structure to NPC (need to track individual permutations).
FDR, Bonferroni, and Bonferroni-Holm operate on p-values (see the sketch below).
Are the p-values we get from two-sample t-tests (inside permute core) corrected for false positives?
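As a concrete reference for the adjustments that operate on p-values, here is a minimal sketch of the Holm step-down adjustment (illustrative only, not a proposed API):
import numpy as np

def holm_adjust(pvalues):
    # Step-down Holm adjustment of a vector of p-values; Bonferroni would simply
    # multiply every p-value by m, with no step-down and no running maximum.
    pvalues = np.asarray(pvalues, dtype=float)
    m = len(pvalues)
    order = np.argsort(pvalues)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted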
Integrate with coveralls.io
The implementation looks like:
# Potential outcomes for all units under treatment
pot_outx = np.concatenate([x, y + shift])
# Potential outcomes for all units under control
pot_outy = np.concatenate([x - shift, y])
pot_out_all = np.column_stack([pot_outx, pot_outy])
dist[i] = tst_stat(pot_out_all[:nx, 0], pot_out_all[nx:, 1])
Based on the above code, pot_out_all[:nx, 0] will give the x and pot_out_all[nx:, 1] will give the y. So we are checking x against y, not x against y + shift or x - shift against y.
Right now, t2p gets called Very Many times, each time requiring order B comparisons and a sum. I think a better approach would be to create, for each test in the combination, the entire vector of p-values at once, by sorting.
I think this is the performance bottleneck in npc().
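A minimal sketch of the sort-based alternative, with illustrative names (dist is the length-B vector of permuted statistics for one test, larger values taken as more extreme):
import numpy as np

def all_pvalues(dist):
    # Upper-tail p-value of every element of dist against the whole permutation
    # distribution, using one sort instead of B separate O(B) passes.
    B = len(dist)
    dist_sorted = np.sort(dist)
    # side='left' counts entries strictly below each value, so B minus that count
    # is the number of entries >= each value.
    return (B - np.searchsorted(dist_sorted, dist, side='left')) / B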
I'm still on the hunt for the full rats cortical mass dataset, with three treatment groups instead of the two shown in FPP. This reference looks promising, but the data aren't included in the paper.
http://www.sciencedirect.com/science/article/pii/0031938468901613
In the core file, permute_within_groups and permute_rows do this correctly, but there's no random-state argument for binom_conf_interval, and two_sample takes a seed as its input.
I made up a toy example for writing tests and calculated what I think the confidence intervals should be. Every time I run the function, the lower limit of the confidence interval it gives me back is one less than the value I expected. I think it's coming from this chunk:
if alternative != "upper" and x > 0:
    f = lambda q: cl - hypergeom.cdf(x-1, N, q, n)
    ci_low = brentq(f, 0.0, G, *kwargs)
because of the x-1 in the call to hypergeom.cdf. Why do we decrease the number of "good" things by 1 here? I realize the confidence intervals we get out of this will tend to be conservative so it's not necessarily bad, but I wonder if it's the "correct" thing to do.
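For reference, a small check (with made-up N, G, n, and x) of the quantity the x-1 produces: 1 - hypergeom.cdf(x-1, N, G, n) is the upper-tail probability P(X >= x).
from scipy.stats import hypergeom

N, G, n = 20, 8, 5   # population size, number of "good" items, sample size (made up)
x = 3                # observed number of good items in the sample
# 1 - cdf(x-1) is P(X >= x); scipy's survival function gives the same value directly.
assert abs((1 - hypergeom.cdf(x - 1, N, G, n)) - hypergeom.sf(x - 1, N, G, n)) < 1e-12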
Need to start a section in the developer's guide on things to know:
How to install cryptorandom (via pip or GitHub) on AppVeyor and ignore it in the conda install command?
Look over #FIXMES
In some small examples, the p-values are behaving in ways we don't understand.
Should we do pre-processing to check if every rater gave a 0? What should the p-value/test statistic be in that case?
Look at irr.py starting at line 262, at the block beginning with "if pvalues is None:".
Potential confusion I discovered: the first element of the returned pair is the p-value, the second is the test statistic. When we use the difference in means as the test stat, we expect to get back the difference in means. However, you get back its negative when you specify alternative="less":
two_sample(x, y, stat="mean", alternative="less")
(0.32283000000000001, 0.4590029743250863)
two_sample(x, y, stat="mean", alternative="greater")
(0.67778000000000005, -0.4590029743250863)
1st dimension should be the 183 domain. 2nd dimension should be the video. Then rater, timestamp (to match input matrix for irr)
consider moving other functions in as well.
In [1]: import permute
In [2]: permute.__version__
Out[2]: 'unbuilt-dev'
Hey,
First, thanks for all the work you've done to deliver this awesome library!
I just noticed a minor issue while using the same sample for both arguments of permute.core.one_sample:
import permute.core
import numpy as np
a = np.random.rand(200)
p_value = permute.core.one_sample(a, a, reps=10000, seed=42, alternative='two-sided')[0]
p_value
0.00019998000199983323
The means are of course identical, so the result should not be significant.
Maybe changing >= to > would solve the problem in:
hits = np.sum([(tst_fun(z * (1 - 2 * prng.binomial(1, .5, size=n)))) >= tst
               for i in range(reps)])
We have the scaffolding up to run the permutation over rows. We need to decide on the optimal way to do this.
We want something in the global scope called "permute_rows" so that we only need to modify that one function rather than every place it appears in our package. We need a "permute" module to handle all this.
permute_rows can be one of
We should be using 2*min(P(X <= x | H_0), P(X >= x | H_0)) instead of P(|X| >= |x| | H_0) for the two-sided p-value.
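A minimal sketch of that convention, using a hypothetical permutation distribution dist and observed statistic t_obs:
import numpy as np

def two_sided_pvalue(dist, t_obs):
    # 2 * min of the two one-sided p-values, capped at 1; this stays valid when the
    # test statistic has a discrete (possibly asymmetric) permutation distribution.
    dist = np.asarray(dist)
    p_lower = np.mean(dist <= t_obs)
    p_upper = np.mean(dist >= t_obs)
    return min(1.0, 2 * min(p_lower, p_upper))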
I am using Pandas for the CSV reader currently. Perhaps it is better to consider storing data as *.npy instead.
For finding the endpoints of confidence intervals for a shift, the p-value will generally be constant over a range of values of the shift.
Brentq will find a legitimate CI, but the lower endpoint might be unnecessarily low and the upper endpoint might be unnecessarily high.
The solution would seem to be to use something other than a root-finder to find the endpoints, or--if we do use a root-finder--to then try marching up or down to see whether the p-value changes.
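A rough sketch of the marching idea for the lower endpoint, assuming a hypothetical pvalue_at(shift) function and a step size chosen relative to the scale of the data:
def tighten_lower(endpoint, pvalue_at, step):
    # brentq may land anywhere inside a flat spot of the p-value function; march
    # upward from the returned endpoint until the p-value actually changes.
    p0 = pvalue_at(endpoint)
    while pvalue_at(endpoint + step) == p0:
        endpoint += step
    return endpoint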
I seem to be having issues running the tests that I write for the IRR simulation functions. Specifically, the outputs of the simulate_ts_dist and simulate_npc_dist functions don't match what is expected when I call "make test" at the command line.
We should figure out a work-around or fix whatever bug is causing this. It'd be nice not to have to skip tests because they always throw an error!
To do
Function is broken
I think your computation of spearman correlation is wrong. In your "spearman_corr" function you have:
xnew = np.argsort(x)+1
ynew = np.argsort(y)+1
However, argsort does not give you the actual ranks of the corresponding elements, just the indexing that sorts them (these are not the same thing). You should replace those lines, e.g., with scipy.stats.rankdata:
xnew = rankdata(x, method='average')
ynew = rankdata(y, method='average')
This also handles possible tied ranks.
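To make the distinction concrete, a small example with made-up data showing where the two disagree and how rankdata handles ties:
import numpy as np
from scipy.stats import rankdata

x = np.array([30., 10., 20.])
print(np.argsort(x) + 1)               # [2 3 1] -- the sorting order, not the ranks
print(rankdata(x, method='average'))   # [3. 1. 2.] -- the actual ranks

y = np.array([10., 20., 20.])
print(rankdata(y, method='average'))   # [1.  2.5 2.5] -- ties get averaged ranks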
Add a nose test to ensure PEP 8 style. Something like pyflakes, pep8, or flake8.
E.g.,
http://stackoverflow.com/questions/12227443/is-there-a-plugin-for-pylint-and-pyflakes-for-nose-tests
To avoid CI errors using scipy 1.9, I changed some of the expected results for hypergeom_conf_interval
to match what was being produced in #202. You can see the changes here: https://github.com/statlab/permute/pull/202/files
I didn't have time to track down the source of the changes, but someone should look into it.
Columns of output data:
Characteristic
Mean of occurrences in vid 1, ..., vid 8
SD of occurrences in vid 1,..., vid 8
Concordance score for each video
P-value of concordance score for each video
Overall p-value
Spearman correlation involves replacing the original data by their ranks. We should implement it as a wrapper; meanwhile, need to correct the terminology in the documentation.
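A minimal sketch of the wrapper idea; corr_test stands in for whichever correlation permutation test the package exposes, since the exact function is left open here:
from scipy.stats import rankdata

def spearman_corr(x, y, corr_test, **kwargs):
    # Spearman correlation is just the correlation test applied to ranks, with ties
    # given averaged ranks; delegate to the supplied permutation test on the ranks.
    return corr_test(rankdata(x, method='average'), rankdata(y, method='average'), **kwargs)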
https://gist.github.com/stefanv/954d21cb13f1aad55b0f
For the three methods implemented, I see:
1 loops, best of 3: 1.52 s per loop
10 loops, best of 3: 153 ms per loop
10 loops, best of 3: 148 ms per loop
If you are willing to get your outputs as lists instead of arrays, it's quite a bit quicker still.
Simplify README.md to include mostly links (website, mailing list, git URL) and simple install instructions. Move current content to project documentation.
Add Travis CI support.
For efficiency, add an option to pass in variables and test stat functions. Do permutations in lock step and broadcast the list of functions over the array of permuted data.
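One way that could look, as a hedged sketch with illustrative names (pooled is the concatenated data, nx the size of the first group, stat_funs a list of two-argument test statistics):
import numpy as np

def permuted_dists(prng, pooled, nx, stat_funs, reps=1000):
    # Permute the pooled data once per repetition and evaluate every statistic on the
    # same split, instead of re-permuting separately for each statistic.
    pooled = np.array(pooled, dtype=float)
    dists = np.empty((len(stat_funs), reps))
    for i in range(reps):
        prng.shuffle(pooled)
        xs, ys = pooled[:nx], pooled[nx:]
        for j, f in enumerate(stat_funs):
            dists[j, i] = f(xs, ys)
    return dists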
I think in lieu of
tst_fun = stats[stat]
we can substitute
if callable(stat):
    tst_fun = stat
else:
    tst_fun = stats[stat]
Thoughts?
For the two-sample problem and the k-sample problem, the PRNG method used to randomly reallocate items should be random_sample(), not shuffle().
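A minimal sketch of reallocation via random_sample(), assuming the PRNG exposes that method (names illustrative):
import numpy as np

def reallocate(prng, pooled):
    # Draw one uniform per item and order the pooled data by those uniforms; with
    # continuous uniforms, every ordering of the items is equally likely.
    u = prng.random_sample(len(pooled))
    return np.asarray(pooled)[np.argsort(u)]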
Design:
Somehow, these need to depend on the alternative. For two-sample, the alternatives are greater than, less than, or not equal. We may also want to consider ordered alternatives.
Pass a triple (test statistic, combination function, alternative) into the function?
We also want to think about naming conventions.
These are some notes for how to simultaneously compute the permutation distribution for each video AND the permutation distribution for the NPC statistic. NB I haven't decided what the NPC test statistic should be - the simplest would be the n^{-1/2} weighted sum of the p-values.
For each video j, j = 1, ..., Ns, store the permuted test statistics rho in a vector. We get these from simulate_ts_dist, setting keep_dist = True and iters = B. Put these in a B x Ns matrix.
Make a copy of the rho matrix and sort each column.
For the ith row in the unsorted matrix:
For the jth column in the unsorted matrix:
Use numpy.searchsorted OR (rho <= rho[i]).sum() to find the percentile of rho_{i,j} --> this is the p-value p-hat_{i,j}
Compute the NPC for the p-values p-hat_{i,1},...,p-hat_{i,Ns}
From this, we get B NPC test statistics. These are an approximate distribution for the NPC test stat and now we can compare the observed NPC test stat to this distribution to get a p-value.
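A sketch of the scheme above, with illustrative names: rho is the B x Ns matrix of permuted test statistics (one column per video), rho_obs holds the Ns observed statistics, and larger statistics are assumed to be more extreme, so partial p-values are upper-tail proportions:
import numpy as np

def npc_pvalue(rho, rho_obs):
    B, Ns = rho.shape
    rho_sorted = np.sort(rho, axis=0)
    phat = np.empty((B, Ns))
    for j in range(Ns):
        # side='left' counts values strictly below rho[i, j]; B minus that count is
        # the number of permuted values >= rho[i, j], i.e. the partial p-value.
        phat[:, j] = (B - np.searchsorted(rho_sorted[:, j], rho[:, j], side='left')) / B
    phat_obs = np.array([(rho[:, j] >= rho_obs[j]).mean() for j in range(Ns)])
    # Simple NPC statistic: the normalized (Ns^{-1/2}-weighted) sum of the partial
    # p-values; under this combining function, smaller values are more extreme.
    npc_dist = phat.sum(axis=1) / np.sqrt(Ns)
    npc_obs = phat_obs.sum() / np.sqrt(Ns)
    return (npc_dist <= npc_obs).mean()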
In routines like the binomial confidence interval, the search for the confidence bound assumes that the bound will be between the sample mean and the appropriate a priori bound (e.g., for an upper confidence bound, between \hat{p} and 1). If you call the routine with a confidence level below 50%, that assumption is false and the search algorithm will not converge.
In my experience, I've never seen a 1-sided confidence bound with a confidence level below 50%. Should we raise an exception (tacitly on the assumption that the caller used \alpha instead of 1-\alpha)? Or should we (correctly) compute 1-sided confidence bounds with confidence levels below 50%?
I get the impression that pythonic style might not require either solution: the user is assumed to be somewhat sophisticated.
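If we go the exception route, one possible guard (assuming the parameter is called cl and holds the confidence level of the one-sided bound):
def check_confidence_level(cl):
    # The search for the bound assumes cl >= 0.5; a value below 50% almost certainly
    # means the caller passed alpha rather than 1 - alpha.
    if cl < 0.5:
        raise ValueError("confidence level below 50%; did you mean 1 - alpha?")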
I noticed a p-value of zero on this page: https://statlab.github.io/permute/user/one-sample.html
Consider using the more conservative
p_u = (hits + 1) / (reps + 1)
instead of
p = hits / reps
The valid but conservative p-value p_u has been recommended by a number of authors, including Davison and Hinkley (1997), Manly (1997), and Ernst (2004). The extent to which p_u over-estimates the exact p-value depends on m_t.
https://www.ncbi.nlm.nih.gov/pubmed/21044043
brentq assumes that the objective function is continuous--but it isn't. It has flat spots and jumps.
brentq assumes that evaluating the p-value at a particular shift always gives the same result, but that isn't true, because the p-value involves simulation. The randomness in the function needs to be taken into account; it will depend on the number of reps.
Add plusone to the calling signature and to the code for npc and t2p
Check code to be sure 1-sided and 2-sided P-values are not (incorrectly) assuming the test statistic has a continuous distribution.
Add install instructions. Add requirements.txt.
I get two-sided p-values of 2.0 in some examples.