Comments (8)

sappelhoff commented on May 26, 2024

Ahh now I get it. Yes, sounds like a good idea, especially when comparing several algos.

from pyprep.

a-hurst commented on May 26, 2024

Cool! I'll let you know when I find some time to set this up, then. Will be interesting to see how much of a difference the matlab_strict mode makes in terms of prediction accuracy! If it's decisively in PyPREP's favour, it would be a nice plot to be able to show somewhere in the README or docs.

sappelhoff commented on May 26, 2024

very interesting ... thanks for looking into that! :)

Intuitively, the pyprep approach sounds more sensible to me ...

This means that the random state is going to be different for every RANSAC run during re-referencing, resulting in different re-referencing results than MATLAB PREP and **potentially higher variability between iterations (resulting in more channels being flagged as bad)**

I made the important part bold. --> The big question is whether the additional channels (compared to MatPREP) that are flagged bad are sensibly flagged ... or whether it's bad flagging.

That's something to check with visual inspection on 3 or 4 separate datasets I guess. If the finding is: pyprep does flag more channels but it looks sensible, then I'd say we should leave pyprep as is, and provide the matprep way in matlab_strict and document it in your nice doc page, Austin. Otherwise, we should make the matlab way the default.

a-hurst commented on May 26, 2024

That's something to check with visual inspection on 3 or 4 separate datasets I guess.

Given that I'm lacking in the 'practical experience with EEG' department, would you be willing to take a look at this one? I'm not sure I trust my intuition of what a good or bad channel looks like beyond obvious bad ones. The only change you'd need to make in PyPREP's code for a comparison test would be to set this line to a fixed seed:

self.random_state,

Alternatively, if you know of any BIDS datasets where bad channels have already been flagged/annotated in the metadata, I can test against one of those.
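To make the fixed-seed point concrete, here's a minimal sketch (with made-up channel names and a toy sampling function, not PyPREP's actual RANSAC code) of why fixing the seed makes a comparison run reproducible:

```python
import numpy as np

# Minimal sketch (not PyPREP's actual code): with a fixed seed, the random
# channel subsets drawn for each RANSAC iteration are identical across runs,
# so two comparison runs see exactly the same picks. Channel names are made up.
channels = ["Fp1", "Fp2", "Fz", "Cz", "Pz", "O1", "O2", "F3"]

def draw_ransac_subsets(seed, n_subsets=3, subset_size=2):
    rng = np.random.RandomState(seed)
    return [sorted(rng.choice(channels, size=subset_size, replace=False))
            for _ in range(n_subsets)]

run_a = draw_ransac_subsets(seed=42)
run_b = draw_ransac_subsets(seed=42)
assert run_a == run_b  # same seed, same channel picks on every run
```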

Also, a related thought on this question: currently, each random channel sample during RANSAC is completely independent such that you could theoretically get a bunch of the same channel picks for each sample and/or have no picks of a given channel across all of the samples. This is the way MatPREP does it too. For a better guarantee of even sampling, what if we did something like this?

import random

n_samples = 50
sample_size = int(len(good_channel_names) * 0.25)

samples = []
sample_pool = []
for n in range(n_samples):

    # If the sample pool is running low, replace it with a shuffled copy of the full good channel list
    if len(sample_pool) < sample_size:
        sample_pool = good_channel_names.copy()
        random.shuffle(sample_pool)

    # Remove channels from the sample pool to make a RANSAC sample
    sample = []
    for i in range(sample_size):
        sample.append(sample_pool.pop())
    samples.append(sample)

We'd need some extra code for making sure there aren't ever any duplicate channels in a sample, but this would ensure that RANSAC predictor locations are sampled evenly on average for every NoisyChannels run.
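One possible shape for that duplicate guard, as a sketch with placeholder channel names: here the pool keeps its leftovers and is only refilled when empty (rather than discarding remnants), which is exactly the case where a mid-sample refill can hand back a channel already picked for the current sample.

```python
import random

# Sketch of the duplicate guard mentioned above (placeholder channel names).
# The pool keeps its leftovers and is refilled only when empty, so a refill
# can happen mid-sample and produce a channel already in the current sample;
# such picks are deferred to the front of the pool for a later sample.
good_channel_names = ["ch%02d" % i for i in range(16)]
n_samples = 50
sample_size = int(len(good_channel_names) * 0.25)

samples = []
sample_pool = []
for _ in range(n_samples):
    sample = []
    while len(sample) < sample_size:
        if not sample_pool:
            sample_pool = good_channel_names.copy()
            random.shuffle(sample_pool)
        pick = sample_pool.pop()
        if pick in sample:
            sample_pool.insert(0, pick)  # defer duplicate to a later sample
        else:
            sample.append(pick)
    samples.append(sample)
```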

sappelhoff commented on May 26, 2024

Given that I'm lacking in the 'practical experience with EEG' department, would you be willing to take a look at this one? I'm not sure I trust my intuition of what a good or bad channel looks like beyond obvious bad ones.

Unfortunately I don't have time to get my hands dirty and help you right now, but it's never too early to start building your own intuition for visual inspection of EEG data; you'll need it by the time you collect your first dataset yourself, at the latest. I recommend reading chapter 6 of Luck's "An Introduction to the Event-Related Potential Technique". If you (or your library) don't have access to the book, I can share that chapter with you (I bought it when I started my PhD 😉), though I've heard it's also available on some shady sites like Library Genesis.

If you are unsure with a certain channel, you could then share it here and we can have a look together.


For a better guarantee of even sampling, what if we did something like this?

That would deviate from the original PREP, so we'd have to wrap the current way into matlab_strict, I guess.

Overall I am not sure if it's worth the effort, because given enough samples, a random sample will be (theoretically) equivalent to the procedure that you suggest (which is kind of "pushing randomness to uniform even at small samples"?)
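To make that trade-off concrete, here's a quick toy simulation (made-up channel indices and parameters, not real EEG) comparing per-channel pick counts under fully independent sampling versus a shuffled-pool scheme, at a deliberately small number of samples:

```python
import random

random.seed(0)
channels = list(range(16))      # stand-ins for channel indices
n_samples, sample_size = 8, 4   # deliberately few samples

# Scheme A: fully independent random samples (current pyprep/MatPREP style)
indep_counts = {ch: 0 for ch in channels}
for _ in range(n_samples):
    for ch in random.sample(channels, sample_size):
        indep_counts[ch] += 1

# Scheme B: draw from a shuffled pool that is refilled when it runs low
pool_counts = {ch: 0 for ch in channels}
pool = []
for _ in range(n_samples):
    if len(pool) < sample_size:
        pool = channels.copy()
        random.shuffle(pool)
    for _ in range(sample_size):
        pool_counts[pool.pop()] += 1

# Spread = difference between the most- and least-sampled channel
spread_a = max(indep_counts.values()) - min(indep_counts.values())
spread_b = max(pool_counts.values()) - min(pool_counts.values())
```

With these numbers (32 total picks over 16 channels) the pool scheme picks every channel exactly twice (spread of 0), while independent sampling typically leaves some channels over- or under-sampled; as the number of samples grows, both spreads shrink relative to the mean count, which matches the "theoretically equivalent" point above.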

a-hurst commented on May 26, 2024

@sappelhoff I don't have much time to look into it right now, but while searching for Python EMG analysis examples yesterday I stumbled upon something that might help with questions like this: it's called moabb, and it's designed as a framework for empirically testing and comparing how well different BCI algorithms and processing methods classify real/imagined movements. It's built around MNE and includes a bunch of open datasets.

If we choose a BCI method and a dataset or two, we could test the effectiveness of different PyPREP tweaks at improving the real-world SNR of the data, as measured by differences in classification accuracy. This would allow automated and reproducible real-world testing of any functional changes to PyPREP (e.g. comparing the matlab_strict method of trend removal to the default, or comparing MatPREP's random seed approach here to our current one). What do you think, is this worth looking into?

sappelhoff commented on May 26, 2024

It sounds like a nice additional test. However, if a BCI algorithm performs better after pyprep setup A compared to pyprep setup B, I wouldn't take that as evidence that setup A is superior, simply because BCI algorithms may also exploit peculiar noise features of the data that a given pyprep setup happens to enhance.

But if this test is part of a larger battery, then I could see how it would be helpful for evaluating choices.

I would put this lower on the priority list though (not a 0.4 feature). WDYT?

a-hurst commented on May 26, 2024

Oh, I wasn't proposing this as something that would be formally integrated into PyPREP via CI or anything (I'd imagine running the comparisons on full sets of test files would be quite slow). I was just thinking of it as a way to better assess the costs/benefits of different choices in PyPREP (vs the alternative of visually evaluating the channels of test files and comparing that to PyPREP's bad channel picks).

Would an approach like this be more trustworthy if we used a few different BCI algorithms per file, to make sure we weren't falling into the over-fitting trap? The slow part here would presumably be running PyPREP on all the different files, so having multiple BCI algorithms to run afterwards might not come with too much of a speed penalty.
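As a rough sketch of that multi-classifier idea (using random stand-in features rather than real EEG, and generic scikit-learn classifiers rather than moabb's actual pipelines), the comparison loop could look something like this:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Hypothetical sketch: score several classifiers on the same feature matrix,
# as one might do for features extracted from data cleaned by two different
# PyPREP configurations. Random data stands in for real BCI features here.
rng = np.random.RandomState(0)
X = rng.randn(100, 8)            # 100 trials x 8 features (placeholder)
y = rng.randint(0, 2, size=100)  # binary class labels (placeholder)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "LogReg": LogisticRegression(),
    "SVM": SVC(),
}
# Mean 5-fold cross-validated accuracy per classifier; running this once per
# PyPREP configuration would let you compare accuracies across setups without
# leaning on a single algorithm's quirks.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

Since feature extraction and classification are cheap relative to running PyPREP itself, evaluating several classifiers per file should indeed add little overhead.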
