
permute's People

Contributors

akglazer, jarrodmillman, kellieotto, matthew-brett, pbstark, qqqube, stefanv


permute's Issues

multiple testing, FDR

  • Implement multiplicity adjustments: Bonferroni, Bonferroni-Holm, Westfall-Young.

  • Also Šidák? Or should we omit methods that require independence or positive regression dependence?

  • Implement one or more FDR adjustment methods.

  • Write a "wrapper" for multiplicity adjustment with different methods as parameters, or call different top-level functions?

Westfall-Young is close in structure to NPC: it needs to track individual permutations.

FDR, Bonferroni, and Bonferroni-Holm operate on p-values.
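
For the methods that operate on p-values, a minimal sketch (not the package's API; the names bonferroni and holm are illustrative) might look like:

import numpy as np

def bonferroni(pvalues):
    # Multiply each p-value by the number of tests, capped at 1.
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * len(p), 1.0)

def holm(pvalues):
    # Step-down adjustment: sort, scale by (m - rank), enforce monotonicity.
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.minimum((m - np.arange(m)) * p[order], 1.0)
    adjusted = np.maximum.accumulate(adjusted)
    out = np.empty(m)
    out[order] = adjusted
    return out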

Wrong implementation in "two_sample_shift" when shift != 0.

The implementation looks like:

pot_outx = np.concatenate([x, y + shift])    # potential outcomes under treatment
pot_outy = np.concatenate([x - shift, y])    # potential outcomes under control
pot_out_all = np.column_stack([pot_outx, pot_outy])

dist[i] = tst_stat(pot_out_all[:nx, 0], pot_out_all[nx:, 1])

Based on the above code, pot_out_all[:nx, 0] gives back x and pot_out_all[nx:, 1] gives back y, so we are comparing x and y, not x and y + shift (or x - shift and y).
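
A standalone check of the claim (toy data, not the package code):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0])
shift = 5.0
nx = len(x)

pot_outx = np.concatenate([x, y + shift])
pot_outy = np.concatenate([x - shift, y])
pot_out_all = np.column_stack([pot_outx, pot_outy])

# The slices recover the original, unshifted samples:
assert np.array_equal(pot_out_all[:nx, 0], x)
assert np.array_equal(pot_out_all[nx:, 1], y)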

improve performance of npc

Right now, t2p gets called very many times, each call requiring order-B comparisons and a sum. I think a better approach would be to create, for each test in the combination, the entire vector of p-values at once, by sorting.

I think this is the performance bottleneck in npc().
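
A minimal sketch of the sorting idea, assuming ts is the length-B vector of permutation statistics for one test and that larger values are more extreme:

import numpy as np

def all_pvalues(ts):
    B = len(ts)
    sorted_ts = np.sort(ts)                                  # one O(B log B) sort
    num_less = np.searchsorted(sorted_ts, ts, side='left')   # count strictly below ts[i]
    return (B - num_less) / B                                # fraction >= ts[i]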

irr todo items

  • split by attribute
  • split by video
  • sparse -> full
  • combination method? (NPC, n^{-1/2} weighting, something else)

Pass in RandomState object rather than seed

In the core file, permute_within_groups and permute_rows do this correctly, but there is no random-state argument for binom_conf_interval, and two_sample takes a seed as its input.
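
One common pattern (cf. sklearn's check_random_state; the helper name get_prng here is just illustrative) is a function that accepts either form and always returns a RandomState:

import numpy as np

def get_prng(seed=None):
    # Accept None, an int seed, or an existing RandomState.
    if seed is None or isinstance(seed, int):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)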

Hypergeometric confidence interval - lower limit too low?

I made up a toy example for writing tests and calculated what I think the confidence intervals should be. Every time I run the function, the lower limit of the confidence interval it gives me back is one less than the value I expected. I think it's coming from this chunk:

from scipy.optimize import brentq
from scipy.stats import hypergeom

if alternative != "upper" and x > 0:
    f = lambda q: cl - hypergeom.cdf(x - 1, N, q, n)
    ci_low = brentq(f, 0.0, G, *kwargs)

because of the x-1 in the call to hypergeom.cdf. Why do we decrease the number of "good" things by 1 here? I realize the confidence intervals we get out of this will tend to be conservative so it's not necessarily bad, but I wonder if it's the "correct" thing to do.

dev notes

Need to start section in developer's guide on things to know:

  • np.random.seed vs. RandomState
  • shuffle vs. permutation
  • np.asarray? type-checking
  • optimize notes (e.g., brentq)
  • Python 2/3 recommendations
  • timing issues

IRR p-values when doing NPC

In some small examples, the p-values are behaving in ways we don't understand.

Should we do pre-processing to check if every rater gave a 0? What should the p-value/test statistic be in that case?

Look at irr.py starting at line 262, at the block beginning "if pvalues is None:".

"less than" alternative

Potential confusion I discovered: the first element of the returned tuple is the p-value, and the second is the test statistic. When we use the difference in means as the test stat, we expect to get back the difference in means. However, you get back the negated statistic when you specify alternative="less":

two_sample(x, y, stat="mean", alternative="less")
(0.32283000000000001, 0.4590029743250863)

two_sample(x, y, stat="mean", alternative="greater")
(0.67778000000000005, -0.4590029743250863)

Reshape nsgk

The 1st dimension should be the 183 domains, the 2nd dimension the video, then rater and timestamp (to match the input matrix for irr).

version number

In [1]: import permute

In [2]: permute.version
Out[2]: 'unbuilt-dev'

Significant results when comparing the same sample

Hey,

First, thanks for all the work you've done to deliver this awesome library!

I just noticed a minor issue when calling permute.core.one_sample with the same sample twice:

import permute.core
import numpy as np

a = np.random.rand(200)
p_value = permute.core.one_sample(a, a, reps=10000, seed=42, alternative='two-sided')[0]
print(p_value)   # 0.00019998000199983323

The means are of course identical, so the result should not be significant.

Maybe changing >= to > would solve the problem in:

hits = np.sum([(tst_fun(z * (1 - 2 * prng.binomial(1, .5, size=n)))) >= tst
               for i in range(reps)])

Permute or shuffle?

We have the scaffolding up to run the permutation over rows. We need to decide on the optimal way to do this.

We want something in the global scope called "permute_rows" so that we only need to modify that function, rather than each place it appears in our package. We need a "permute" module to handle all this.

permute_rows can be one of

  • for loop to iterate over rows and shuffle in place?
  • figure out how to implement apply_along_axis with shuffle?
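
Rough sketches of the two candidates, assuming a 2-D array a and a RandomState prng:

import numpy as np

def permute_rows_loop(a, prng):
    # Shuffle each row in place with an explicit loop.
    for row in a:
        prng.shuffle(row)

def permute_rows_apply(a, prng):
    # Shuffle each row via apply_along_axis; returns a new array.
    return np.apply_along_axis(prng.permutation, 1, a)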

Two-sided p-values

We should be using 2*min(P(X <= x | H_0), P(X >= x | H_0)) instead of P(|X| >= |x| | H_0) for the two-sided p-value.
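
A minimal sketch of the proposed definition, assuming dist is the simulated permutation distribution and t the observed statistic:

import numpy as np

def two_sided_p(dist, t):
    dist = np.asarray(dist)
    p_lower = np.mean(dist <= t)   # estimate of P(X <= x | H_0)
    p_upper = np.mean(dist >= t)   # estimate of P(X >= x | H_0)
    return min(1.0, 2 * min(p_lower, p_upper))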

Revisit Pandas dependency

I am using Pandas for the CSV reader currently. Perhaps it would be better to store the data as *.npy files instead.

CIs for shift: brentq for confidence intervals finds *a* root, not necessarily the largest or smallest root

For finding the endpoints of confidence intervals for a shift, the p-value will generally be constant over a range of values of the shift.

brentq will find a legitimate CI, but the lower endpoint might be unnecessarily low and the upper endpoint unnecessarily high.

The solution would seem to be to use something other than a root-finder to find the endpoints, or, if we do use a root-finder, to then march up or down to see whether the p-value changes.
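
A hypothetical sketch of the marching idea for the lower endpoint, where pvalue, step, and the stopping rule are all illustrative. brentq returns some root d0 of pvalue(d) - alpha, but because the p-value is a step function, d0 may sit in a flat spot:

def tighten_lower(pvalue, d0, step=1e-4, max_steps=10000):
    # March up from the root until the p-value actually changes.
    # With simulated p-values, a tolerance would be needed instead of !=.
    p0 = pvalue(d0)
    d = d0
    for _ in range(max_steps):
        if pvalue(d + step) != p0:   # left the flat spot; stop
            break
        d += step
    return d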

Unit tests don't run properly

I seem to be having issues running the tests that I write for the IRR simulation functions. Specifically, the output of the simulate_ts_dist and simulate_npc_dist functions doesn't match what is expected when I call "make test" at the command line.

We should figure out a workaround or fix whatever bug is causing this. It'd be nice not to have to skip tests because they always throw an error!

Constrained matrix permutation

To do

  • Literature search; this looks promising:
    • C. J. Carstens, Proof of uniform sampling of binary matrices with fixed row sums and column sums for the fast Curveball algorithm, Physical Review E, 91, 042812 (2015). (See the sketch after this list.)
    • Snijders, T.A.B., Enumeration and simulation methods for 0-1 matrices with given marginals, Psychometrika, 56 (1991), 397-417. (This enumerates all possible matrices -- probably not what we want.)
  • After finding a proof of uniformity we're happy with, code it up.
  • Compare the distribution of some test statistic between this method and @stefanv's code. The test statistic could be the total number of pairs of sharks observed together.
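
For reference, a rough sketch of one Curveball trade on a 0-1 matrix, with rows kept as sets of column indices; this is only a paraphrase of the Carstens paper, not the package's code:

import random

def curveball_trade(rows):
    # Pick two rows; their shared columns are fixed, the rest are tradeable.
    i, j = random.sample(range(len(rows)), 2)
    shared = rows[i] & rows[j]
    tradeable = list((rows[i] | rows[j]) - shared)
    random.shuffle(tradeable)
    k = len(rows[i]) - len(shared)          # row i keeps its row sum
    rows[i] = shared | set(tradeable[:k])
    rows[j] = shared | set(tradeable[k:])   # column sums are preserved too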

Incorrect spearman correlation calculation (computation of rank)

I think your computation of the Spearman correlation is wrong. In your "spearman_corr" function you have:

xnew = np.argsort(x) + 1
ynew = np.argsort(y) + 1

However, argsort does not give you the actual ranks of the elements, just the indexing that sorts the array, and these are not the same: for x = [30, 10, 20], np.argsort(x) + 1 gives [2, 3, 1], while the ranks are [3, 1, 2]. You should replace those lines, e.g., with scipy.stats.rankdata:

from scipy.stats import rankdata

xnew = rankdata(x, method='average')
ynew = rankdata(y, method='average')

This also handles possible tied ranks.

Reporting NSGK IRR results

Columns of output data:

  • Characteristic
  • Mean of occurrences in vid 1, ..., vid 8
  • SD of occurrences in vid 1, ..., vid 8
  • Concordance score for each video
  • P-value of the concordance score for each video
  • Overall p-value

Fix README.md

Simplify README.md to include mostly links (website, mailing list, git URL) and simple install instructions. Move the current content to the project documentation.

Lockstep permute in NPC

For efficiency, add an option to pass in variables and test statistic functions. Do the permutations in lockstep and broadcast the list of functions over the array of permuted data.
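
A hypothetical sketch of the lockstep idea (names and signature are illustrative): one shared permutation per rep, with every test statistic evaluated on the same permuted data.

import numpy as np

def lockstep_dist(data, tst_funs, reps, prng):
    dist = np.empty((reps, len(tst_funs)))
    for i in range(reps):
        perm = prng.permutation(data)            # one permutation per rep
        dist[i] = [f(perm) for f in tst_funs]    # broadcast all stats over it
    return dist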

use random_sample instead of shuffle

For the 2 sample problem and the k-sample problem, the PRNG method used to randomly reallocate items should be random_sample(), not shuffle()

Stratified permutation test

Design:

  • pass in a string OR a function, just as in the two-sample function, and pass it through to the helper function
  • need to specify a function to combine across strata

Somehow, these need to depend on the alternative. For the two-sample case, the alternatives are greater than, less than, or not equal. We may also want to consider ordered alternatives, among other possibilities.

Pass a triple (test statistic, combination function, alternative) into the function? (See the sketch below.)

We also want to think about naming conventions.
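
A rough sketch of the triple idea, with entirely hypothetical names: permute labels within each stratum, compute the statistic per stratum, and combine across strata.

import numpy as np

def stratified_test(values, labels, strata, stat, combine, reps, prng):
    def combined_stat(lab):
        return combine([stat(values[strata == s], lab[strata == s])
                        for s in np.unique(strata)])

    t = combined_stat(labels)
    perm = np.array(labels)
    dist = np.empty(reps)
    for i in range(reps):
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            perm[idx] = perm[idx][prng.permutation(len(idx))]  # permute within stratum
        dist[i] = combined_stat(perm)
    return np.mean(dist >= t), t   # one-sided (greater) p-value and observed stat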

NPC

These are some notes on how to simultaneously compute the permutation distribution for each video AND the permutation distribution for the NPC statistic. NB: I haven't decided what the NPC test statistic should be; the simplest would be the n^{-1/2} weighted sum of the p-values.

For each video j, j = 1, ..., Ns, store the permutation test statistics rho in a vector. We get these from simulate_ts_dist, setting keep_dist = True and iters = B. Put these in a B x Ns matrix.

Make a copy of the rho matrix and sort each column.
For the ith row in the unsorted matrix:
    For the jth column in the unsorted matrix:
        Use numpy.searchsorted OR (rho <= rho[i, j]).sum() to find the
        percentile of rho_{i,j} --> this is the p-value p-hat_{i,j}
    Compute the NPC for the p-values p-hat_{i,1}, ..., p-hat_{i,Ns}

From this, we get B NPC test statistics. These form an approximate distribution for the NPC test statistic, and we can compare the observed NPC test statistic to this distribution to get a p-value.
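
A minimal sketch of the scheme, assuming rho is the B x Ns matrix of permutation statistics, that larger values are more extreme (so p-hat_{i,j} is an upper-tail fraction), and that the NPC statistic is the Ns^{-1/2} weighted sum of p-values mentioned above:

import numpy as np

def npc_dist(rho):
    B, Ns = rho.shape
    sorted_rho = np.sort(rho, axis=0)          # sort each column once
    phat = np.empty_like(rho, dtype=float)
    for j in range(Ns):
        # number of permuted stats strictly below rho[i, j], via searchsorted
        num_less = np.searchsorted(sorted_rho[:, j], rho[:, j], side='left')
        phat[:, j] = (B - num_less) / B        # fraction >= rho[i, j]
    return phat.sum(axis=1) / np.sqrt(Ns)      # B NPC test statistics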

raise error for 1-sided confidence bounds with confidence levels below 50%?

In routines like the binomial confidence interval, the search for the confidence bound assumes that the bound will be between the sample mean and the appropriate a priori bound (e.g., for an upper confidence bound, between \hat{p} and 1). If you call the routine with a confidence level below 50%, that assumption is false and the search algorithm will not converge.

In my experience, I've never seen a 1-sided confidence bound with a confidence level below 50%. Should we raise an exception (tacitly assuming the caller used \alpha instead of 1-\alpha)? Or should we (correctly) compute 1-sided confidence bounds with confidence levels below 50%?

I get the impression that pythonic style might not require either solution: the user is assumed to be somewhat sophisticated.
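
If we decide to raise, a minimal sketch (cl is the confidence level; the message is illustrative):

if cl < 0.5:
    raise ValueError("confidence level below 50% for a one-sided bound; "
                     "did you pass alpha instead of 1 - alpha?")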

@jarrodmillman @stefanv

permutation p-values should never be zero

I noticed a p-value of zero on this page: https://statlab.github.io/permute/user/one-sample.html

Consider using the more conservative

p_u = (hits + 1) / (reps + 1)

instead of

p = hits / reps

The valid but conservative p-value p_u has been recommended by a number of authors, including Davison and Hinkley (1997), Manly (1997), and Ernst (2004). The extent to which p_u over-estimates the exact p-value depends on m_t.
https://www.ncbi.nlm.nih.gov/pubmed/21044043

CI for shift does not have a desired precision option: may cause brentq to thrash

brentq assumes that the objective function is continuous, but it isn't: it has flat spots and jumps.

brentq assumes that evaluating the p-value at a particular shift always gives the same result, but that isn't true, because the p-value involves simulation. The randomness in the function needs to be taken into account; it will depend on the number of reps.

NPC: plusone and double-check P-values

Add plusone to the calling signature and to the code for npc and t2p

Check code to be sure 1-sided and 2-sided P-values are not (incorrectly) assuming the test statistic has a continuous distribution.
