Comments (14)

larsoner commented on August 28, 2024

I should also mention that I've done a bunch of work in MATLAB to optimize these functions for speed. I'm going to do the same thing in Python. On my 6-core machine, sometimes it takes an hour to perform the full permutation test, and that's with full parallelization.

from mne-python.

larsoner commented on August 28, 2024

Looks like a lot of the components I'll need are already in the code (hooray!), perhaps just by making use of permutation_cluster_test (I think). I'll keep looking into it to see if it will do what I think it does...

agramfort commented on August 28, 2024

Yes, there is already code for this. Ellen has used my script for one of her studies:

https://github.com/KuperbergLab/MEG_scripts/blob/master/source_compute_cluster_stats.py

It's still a bit alpha-quality, but it works with fsaverage and maybe any ico surface.

larsoner commented on August 28, 2024

Excellent. I'll compare it to my code to see if there are any differences in the output.

larsoner commented on August 28, 2024

So far I've noticed one difference. Instead of defining neighbors directly from the mesh, I used a parameter that defines "neighbors" as vertices within some maximal geodesic distance of one another. Is there code for that written already? I basically pulled the information from the source space computed with "mne_add_patch_info --dist 7", but it would be excellent if this could be done using only Python.

Along those lines, I think it could make sense to extend the spatio_temporal_src_connectivity function with an optional third parameter giving the maximum distance between vertices for them to count as "neighbors". Leaving it at the default None would use the mesh (the current behavior), while setting a distance would use geodesic distances (what I had used). What do you think?
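
To make the proposal concrete, here is a hypothetical sketch (not actual MNE-Python code; the function name and the distance-matrix input are assumptions) of building distance-based neighbors from a precomputed sparse matrix of geodesic distances, such as the one produced by the patch-info step:

```python
import numpy as np
from scipy import sparse

def spatial_connectivity_from_dist(dist_matrix, max_dist):
    """Treat vertices within max_dist of each other as neighbors.

    dist_matrix : sparse (n_vertices x n_vertices) matrix of geodesic
    distances (zeros / missing entries mean "farther than the cutoff").
    """
    d = dist_matrix.tocoo()
    keep = (d.data > 0) & (d.data <= max_dist)
    n = dist_matrix.shape[0]
    conn = sparse.coo_matrix(
        (np.ones(int(keep.sum())), (d.row[keep], d.col[keep])),
        shape=(n, n))
    # include self-connections so each vertex can form its own cluster
    return (conn + sparse.eye(n)).tocsr()
```

With dist=None the mesh edges would be used instead, preserving the current behavior.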

larsoner commented on August 28, 2024

I also think it probably makes sense to make a function that just takes the data and various parameters to make spatio-temporal clustering more accessible.

something like:

connectivity = spatio_temporal_src_connectivity(n_times, src=None, dist=None)
# src=None would default to using fsaverage
spatio_temporal_clustering(X, connectivity=connectivity, stat_fun=None, tail=0, threshold=0.05, return_threshold=None, seed=0, n_permutations=None, do_permutations=True, stat_power=0)
  • X and connectivity are the [subjects x times x vertices] data (e.g., a difference between conditions) and connectivity matrices
  • connectivity==None defaults to computing the default connectivity (fsaverage and dist=None)
  • stat_fun==None defaults to a ttest (used by permutation_cluster_1samp_test)
  • tail and seed are also basically passed through
  • threshold is for the first (primary) thresholding
  • return_threshold==None returns all clusters found, return_threshold==0.05 (or whatever) returns only those above a specific threshold
  • do_permutations==True actually performs a full permutation test and gives you p-values; ==False would just cluster the non-permuted case and return those (useful for making sure data are set up correctly, etc.)
  • stat_power is the power to which abs(statistical_value) is raised for each vertex when scoring the clusters. Note that stat_power=0 yields the number of vertices, while stat_power=1 yields the sum of the absolute values of the t values (or whatever the statistic is)

There are probably other parameters, but let me know what you think of this general idea.
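
For concreteness, the stat_power scoring described above could look like this minimal sketch (the helper name is made up for illustration):

```python
import numpy as np

def cluster_score(t_vals, stat_power=0):
    """Score a cluster from the t-values of its member vertices.

    stat_power=0 -> cluster size (number of vertices);
    stat_power=1 -> sum of |t| over the cluster.
    """
    return float(np.sum(np.abs(np.asarray(t_vals, float)) ** stat_power))
```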

larsoner commented on August 28, 2024

Examining the permutation test code, it looks like it performs a random / Monte Carlo resampling (exchanging conditions) as opposed to a permutation test, which goes through all 2^n_subjects possible permutations. Is that correct? If so, I'd be up for modifying the code such that it does a permutation test (doing /all/ combinations of exchanges) when n_perms >= 2^n_subjects. This will achieve an exact test when it's possible, which is nice.
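
The exact-vs-Monte-Carlo switch could be sketched like this (a hypothetical helper for the one-sample sign-flip case, not the existing mne-python code):

```python
import itertools
import numpy as np

def sign_flips(n_subjects, n_permutations, rng=None):
    """Yield sign-flip vectors for a one-sample permutation test.

    If n_permutations >= 2 ** n_subjects, enumerate all possible flips
    (exact test); otherwise draw random flips (Monte Carlo).
    """
    if n_permutations >= 2 ** n_subjects:
        for flips in itertools.product([1, -1], repeat=n_subjects):
            yield np.array(flips)
    else:
        rng = np.random.default_rng(rng)
        for _ in range(n_permutations):
            yield rng.choice([1, -1], size=n_subjects)
```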

larsoner commented on August 28, 2024

One other thing I noticed is that in the _find_clusters code, for two-tailed tests (the only case where this is a concern), it does not consider whether the vertices have the same sign when adding them to a cluster, and it seems to me that it should. In other words, if I'm computing condition A-B, vertices for which A>B should not be clustered with those where B>A. Does this make sense? If so, I can try to modify the code in this way. If not, perhaps we can add an option?
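
A sign-aware variant could cluster the positive and negative tails separately. Here is a minimal 1-D sketch using scipy.ndimage.label (illustrative only; the real code operates on arbitrary connectivity, not just adjacent array positions):

```python
import numpy as np
from scipy import ndimage

def find_signed_clusters(t_vals, threshold):
    """Threshold a 1-D statistic map and cluster the positive and
    negative suprathreshold points separately, so A>B points never
    join a cluster of B>A points."""
    clusters = []
    for sign in (1, -1):
        mask = sign * t_vals > threshold
        labeled, n_clusters = ndimage.label(mask)
        clusters.extend(np.where(labeled == i)[0]
                        for i in range(1, n_clusters + 1))
    return clusters
```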

larsoner commented on August 28, 2024

When I coded my version of clustering, I noticed that since comparisons are effectively against the abs(max()) from each permutation (e.g., lines 127-128 in cluster_level.py), once you've done one permutation---say, 001010 for flipping conditions for 6 subjects---you've effectively also done 110101, so you can get away with doing half the tests, since the opposite permutation will yield the same maximal statistic. If you agree, I'm happy to re-code it to take this into account, especially if I'm also re-coding it to do a full permutation test when possible (two comments above).

EDIT: I meant to say that this only applies to two-tailed tests...
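
The symmetry is easy to verify numerically: negating every subject's sign negates the t-values but leaves the maximal absolute statistic unchanged (toy data and a one-sample t-test, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))  # subjects x vertices

def max_abs_t(X, flips):
    """Max |t| over vertices after sign-flipping each subject."""
    Xf = flips[:, np.newaxis] * X
    n = len(Xf)
    t = Xf.mean(0) / (Xf.std(0, ddof=1) / np.sqrt(n))
    return np.abs(t).max()

flips = np.array([-1, -1, 1, -1, 1, -1])
# the complementary flip vector gives an identical two-tailed statistic,
# so only half of the 2**n flip vectors need to be evaluated
assert np.isclose(max_abs_t(X, flips), max_abs_t(X, -flips))
```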

agramfort commented on August 28, 2024

> connectivity = spatio_temporal_src_connectivity(n_times, src=None, dist=None)
> # src=None would default to using fsaverage
> spatio_temporal_clustering(X, connectivity=connectivity, stat_fun=None, tail=0, threshold=0.05, return_threshold=None, seed=0, n_permutations=None, do_permutations=True, stat_power=0)
>
> X and connectivity are the [subjects x times x vertices] data (e.g., a difference between conditions) and connectivity matrices
> connectivity==None defaults to computing the default connectivity (fsaverage and dist=None)
> stat_fun==None defaults to a ttest (used by permutation_cluster_1samp_test)

ok for defaults, but it could be explicit in the function signature.

> tail and seed are also basically passed through
> threshold is for the first (primary) thresholding
> return_threshold==None returns all clusters found, return_threshold==0.05 (or whatever) returns only those above a specific threshold
> do_permutations==True actually performs a full permutation test and gives you p-values; ==False would just cluster the non-permuted case and return those (useful for making sure data are set up correctly, etc.)
> stat_power is the power to which abs(statistical_value) is raised for each vertex when scoring the clusters
>
> There are probably other parameters, but let me know what you think of this general idea.

sounds reasonable.

FYI, in FieldTrip the randomization code accepts as parameters IDs for the samples (subjects in your case) which say which permutations / resamplings are valid, e.g., samples with the same ID must stay in the same group. It makes their code really generic.

agramfort commented on August 28, 2024

> Examining the permutation test code, it looks like it performs a random / Monte Carlo resampling (exchanging conditions) as opposed to a permutation test, which goes through all 2^n_subjects possible permutations. Is that correct? If so, I'd be up for modifying the code such that it does a permutation test (doing /all/ combinations of exchanges) when n_perms >= 2^n_subjects. This will achieve an exact test when it's possible, which is nice.

you're right.

agramfort commented on August 28, 2024

> One other thing I noticed is that in the _find_clusters code, for the two-tailed tests (which would be the only ones to have this be a concern), it does not consider whether the vertices have the same sign or not when adding them to a cluster, and it seems to me that it should. In other words, if I'm computing condition A-B, vertices for which A>B should not be clustered with those where B>A. Does this make sense? If so, I can try to modify the code in this way. If not, perhaps we can add an option?

you're right.

> When I coded my version of clustering, I noticed that since comparisons are effectively against the abs(max()) from each permutation (e.g., lines 127-128 in cluster_level.py), once you've done one permutation---say, 001010 for flipping conditions for 6 subjects---you've effectively also done 110101, so you can get away with doing half the tests, since the opposite permutation will yield the same maximal statistic. If you agree, I'm happy to re-code it to take this into account, especially if I'm also re-coding it to do a full permutation test when possible (two comments above).

you're right again.

larsoner commented on August 28, 2024

When I used a dataset that was 20484 vertices x 2 time points, the clustering took 8 seconds. Going to 4 time points took 28 seconds, and going up to 8 time points took 97 seconds. In other words, the time goes up roughly with the square of the number of time points. This makes sense, since the connectivity matrix has size (n_times * n_vertices) ** 2.

This is problematic for me, however, since I'm used to clustering data that is 20484 vertices by 200 (or more) time points. Even with 24 GB of memory, I ran out of memory just doing the initial clustering, let alone however long the full test would take :(

I got around this problem in my MATLAB code by exploiting the fact that only immediately adjacent time points (or time points within some specified number of steps) are called neighbors. This makes the time and memory requirements grow linearly with the number of time points (although they still grow with the square of the number of vertices).
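
One way to sketch this idea (an illustration, not the MATLAB implementation; the function name is made up): build a temporal adjacency that only links time points within one step, and combine it with the spatial adjacency via a Kronecker product, so storage grows linearly in n_times rather than quadratically:

```python
import numpy as np
from scipy import sparse

def spatio_temporal_connectivity(spatial_conn, n_times):
    """Combine an (n_vertices x n_vertices) spatial adjacency with
    adjacency between neighboring time points only.

    nnz grows like 3 * n_times * nnz(spatial_conn): linear in n_times.
    """
    # each time point neighbors itself and time points t - 1 and t + 1
    temporal = (sparse.eye(n_times)
                + sparse.eye(n_times, k=1)
                + sparse.eye(n_times, k=-1))
    # block structure: entry ((t1, v1), (t2, v2)) is nonzero only when
    # |t1 - t2| <= 1 and v1, v2 are spatial neighbors
    return sparse.kron(temporal, spatial_conn, format="csr")
```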

Would it be alright if I implemented an alternative method for clustering when data is known to be structured as space x time? Looking at the code, I think it can be done in some sensible way.

larsoner commented on August 28, 2024

Closing to move the discussion to the WIP PR.
