
autoreject's Introduction

autoreject


This is a library to automatically reject bad trials and repair bad sensors in magneto-/electroencephalography (M/EEG) data.

https://autoreject.github.io/stable/_images/sphx_glr_plot_auto_repair_001.png

The documentation can be found at https://autoreject.github.io/

Installation

We recommend the Anaconda Python distribution and a Python version >= 3.9. We furthermore recommend that you install autoreject into an isolated Python environment. To obtain the stable release of autoreject, you can use pip:

pip install -U autoreject

Or conda:

conda install -c conda-forge autoreject

If you want the latest (development) version of autoreject, use:

pip install https://github.com/autoreject/autoreject/archive/refs/heads/main.zip

To check if everything worked fine, you can do:

python -c 'import autoreject'

and it should not give any error messages.

Below, we list the dependencies for autoreject. All required dependencies are installed automatically when you install autoreject.

  • mne (>=1.5.0)
  • numpy (>=1.21.2)
  • scipy (>=1.7.1)
  • scikit-learn (>=1.0.0)
  • joblib
  • matplotlib (>=3.5.0)

Optional dependencies are:

  • openneuro-py (>= 2021.10.1, for fetching data from OpenNeuro.org)

Quickstart

The easiest way to get started is to copy the following three lines of code into your script:

>>> from autoreject import AutoReject
>>> ar = AutoReject()
>>> epochs_clean = ar.fit_transform(epochs)  # doctest: +SKIP

This will automatically clean an epochs object read in using MNE-Python. To get the rejection dictionary, simply do:

>>> from autoreject import get_rejection_threshold
>>> reject = get_rejection_threshold(epochs)  # doctest: +SKIP
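The returned dictionary maps channel types to peak-to-peak rejection thresholds. As a minimal illustration (assuming epochs is an MNE Epochs object), it can be passed directly to MNE's drop_bad:

>>> epochs.drop_bad(reject=reject)  # doctest: +SKIP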

We also implement RANSAC from the PREP pipeline (see PyPREP for a full implementation of the PREP pipeline). The API is the same:

>>> from autoreject import Ransac
>>> rsc = Ransac()
>>> epochs_clean = rsc.fit_transform(epochs)  # doctest: +SKIP

For more details check out the example to automatically detect and repair bad epochs.

Bug reports

Please use the GitHub issue tracker to report bugs.

Cite

[1] Mainak Jas, Denis Engemann, Federico Raimondo, Yousra Bekhti, and Alexandre Gramfort, "Automated rejection and repair of bad trials in MEG/EEG." In 6th International Workshop on Pattern Recognition in Neuroimaging (PRNI), 2016.

[2] Mainak Jas, Denis Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. 2017. "Autoreject: Automated artifact rejection for MEG and EEG data". NeuroImage, 159, 417-429.

autoreject's People

Contributors

adswa, agramfort, alexrockhill, c-torre, callezenwaka, chapochn, dengemann, dependabot[bot], fraimondo, hoechenberger, hubertjb, jasmainak, jona-sassenhagen, larsoner, massich, mattboggess, mkoculak, mscheltienne, paulcbogdan, pstetz, raghavrv, rob-luke, rugolotti, sappelhoff, skjerns, systole-docs


autoreject's Issues

Epoch Cleaning solution stability

Hello again,
is it normal for autoreject to have quite different solutions between runs?

Maybe I'm doing something wrong? I'm using:
ts = autoreject.LocalAutoRejectCV(consensus_percs=np.linspace(0, 0.99, 11))
clean_mne = ts.fit_transform(e_mne.pick_types(eeg=True))

(Two attached screenshots show the differing solutions obtained across runs.)
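One pattern that shows up in other reports on this page is to pin the random state used for threshold estimation; a minimal, untested sketch reusing the names from the snippet above:

import numpy as np
from functools import partial
from autoreject import LocalAutoRejectCV, compute_thresholds

# Fixing random_state makes the threshold search reproducible across runs.
thresh_func = partial(compute_thresholds, random_state=42)
ts = LocalAutoRejectCV(consensus_percs=np.linspace(0, 0.99, 11),
                       thresh_func=thresh_func)
clean_mne = ts.fit_transform(e_mne.pick_types(eeg=True))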

Bad Channels are Used for Interpolation

Hello,

Thanks for making this tool, it filled a much needed gap in my preprocessing pipeline!

I've run into one issue while using it that I wanted to raise though. If the epochs object that you run autoreject on has bad channels marked, these get ignored when running autoreject as expected. However, when actually transforming the original epochs to get the cleaned version, these bad channels are used to interpolate the bad channels that autoreject marked, often resulting in less clean data than before.

This appears to be due to the fact that the bad channels during the interpolation process are reset to the ones determined by autoreject, overwriting the previously marked bad channels. By instead appending the autoreject bad channels, this behavior can be avoided, with the one side effect that the other bad channels not marked by autoreject will get interpolated as well.

In the code, this amounts to changing the = in line 677 of autoreject.py to a +=. I'd be happy to submit a pull request to fix this.
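Conceptually, the proposed change amounts to something like the following sketch (illustrative only, not the actual autoreject source; epochs is an mne.Epochs instance and autoreject_bads is the list of channels autoreject flagged):

def interpolate_with_user_bads(epochs, autoreject_bads):
    """Interpolate autoreject bads without treating user-marked bads as good."""
    epochs = epochs.copy()
    # Append instead of overwrite: previously marked bad channels are then
    # excluded from the set of "good" neighbors and repaired as well.
    epochs.info['bads'] = sorted(set(epochs.info['bads']) | set(autoreject_bads))
    epochs.interpolate_bads(reset_bads=True)
    return epochs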

I realize that the intention of the design was to allow autoreject to do the job of detecting bad channels itself. However, I am unfortunately working with quite noisy data, and I find that marking bad channels and running ICA beforehand allows autoreject to clean up the remaining data and works quite well. If I just run autoreject directly, it throws out most of my data. This fix would enable this use case.

Thanks!

deal with flat channels

@jbschiratti reported to me that he has a use case for running autoreject on his data with flat channels. He has locally bad channels and MNE gets overzealous in rejecting trials. Perhaps we should support this as well.

Version problem

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/macbookpro/Documents/autoreject/examples/plot_auto_repair.py in <module>()
     96 ar = LocalAutoRejectCV(n_interpolates, consensus_percs,
     97                        thresh_func=thresh_func)
---> 98 epochs_clean = ar.fit_transform(epochs['Auditory/Left'])
     99 
    100 evoked = epochs.average()

/Users/macbookpro/Documents/autoreject/autoreject/autoreject.py in fit_transform(self, epochs)
    760             The epochs object which must be cleaned.
    761         """
--> 762         return self.fit(epochs).transform(epochs)

/Users/macbookpro/Documents/autoreject/autoreject/autoreject.py in fit(self, epochs)
    680 
    681         # The thresholds must be learnt from the entire data
--> 682         local_reject.fit(epochs)
    683         self.threshes_ = local_reject.threshes_
    684 

/Users/macbookpro/Documents/autoreject/autoreject/autoreject.py in fit(self, epochs)
    451         self.picks = _handle_picks(info=epochs.info, picks=self.picks)
    452         self.threshes_ = self.thresh_func(
--> 453             epochs.copy(), picks=self.picks, verbose=self.verbose)
    454         return self
    455 

/Users/macbookpro/Documents/autoreject/autoreject/autoreject.py in compute_thresholds(epochs, method, random_state, picks, verbose, n_jobs)
    357     n_epochs = len(epochs)
    358     picks = _handle_picks(info=epochs.info, picks=picks)
--> 359     epochs_interp = clean_by_interp(epochs, picks=picks, verbose=verbose)
    360     data = np.concatenate((epochs.get_data(), epochs_interp.get_data()),
    361                           axis=0)  # non-data channels will be duplicate

/Users/macbookpro/Documents/autoreject/autoreject/utils.py in clean_by_interp(inst, picks, verbose)
    105     ch_names = [inst.info['ch_names'][p] for p in picks]
    106     for ch_idx, (pick, ch) in enumerate(_pbar(list(zip(picks, ch_names)),
--> 107                                         desc=mesg, verbose=verbose)):
    108         inst_clean = inst.copy().pick_channels(ch_names)
    109         inst_clean.info['bads'] = [ch]

/Users/macbookpro/Documents/autoreject/autoreject/utils.py in _pbar(iterable, desc, leave, position, verbose)
     45 
     46         _ProgressBar = ProgressBar
---> 47         if not mne.utils.check_version('mne', '0.14dev0'):
     48             class _ProgressBar(ProgressBar):
     49                 def __iter__(self):

/Users/macbookpro/anaconda/lib/python3.5/site-packages/mne/utils.py in check_version(library, min_version)
   1106         if min_version:
   1107             this_version = LooseVersion(library.__version__)
-> 1108             if this_version < min_version:
   1109                 ok = False
   1110     return ok

/Users/macbookpro/anaconda/lib/python3.5/distutils/version.py in __lt__(self, other)
     50 
     51     def __lt__(self, other):
---> 52         c = self._cmp(other)
     53         if c is NotImplemented:
     54             return c

/Users/macbookpro/anaconda/lib/python3.5/distutils/version.py in _cmp(self, other)
    335         if self.version == other.version:
    336             return 0
--> 337         if self.version < other.version:
    338             return -1
    339         if self.version > other.version:

TypeError: unorderable types: int() < str()

And mne version:

In [2]: import mne

In [3]: mne.__version__
Out[3]: '0.14.1'

ENH/API: ch_type instead of fix param.

And a simple API for multi-channel support:

ar = AutorejectLocalCV()
for ch_type in ('mag', 'grad', 'eeg'):
    ar.ch_type = ch_type
    ar.fit(epochs)  # accumulates columns of fix_log
ar.transform(epochs)  # transform everything that was learned.

number of epochs rejected -- why one more is rejected?

I was examining the code to see whether I can model the number of rejected epochs for different consensus values without re-running the algorithm each time. I consistently found that the number of epochs rejected by autoreject is one more than what I would reject myself using ar.bad_segments and the consensus value (i.e. np.where(ar.bad_segments.sum(axis=1) >= consensus * len(epochs.ch_names)), where ar is my autoreject model). I believe the reason is this piece of code in autoreject.py:

def _get_bad_epochs(self, bad_sensor_counts, ch_type):
    """Get the indices of bad epochs."""
    sorted_epoch_idx = np.argsort(bad_sensor_counts)[::-1]
    bad_sensor_counts = np.sort(bad_sensor_counts)[::-1]
    n_channels = len(self.picks)
    n_consensus = self.consensus_perc[ch_type] * n_channels
    if np.max(bad_sensor_counts) >= n_consensus:
        n_epochs_drop = np.sum(bad_sensor_counts >=
                               n_consensus) + 1
        bad_epochs_idx = sorted_epoch_idx[:n_epochs_drop]
    else:
        n_epochs_drop = 0
        bad_epochs_idx = []

What draws my attention is the line n_epochs_drop = np.sum(bad_sensor_counts >= n_consensus) + 1, which means that the number of epochs to be dropped is the number of epochs in which the count of bad channels is higher than the consensus (which is sensible and well described in the paper), plus one. So in the end one more epoch is rejected than necessary. Is there a particular reason for this, or is it a bug?
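A toy illustration of the suspected off-by-one (the numbers are made up):

import numpy as np

bad_sensor_counts = np.array([0, 2, 5, 7, 1])  # number of bad channels per epoch
n_consensus = 5                                # consensus threshold (in channels)

sorted_epoch_idx = np.argsort(bad_sensor_counts)[::-1]        # [3, 2, 1, 4, 0]
n_epochs_drop = np.sum(bad_sensor_counts >= n_consensus) + 1  # 2 + 1 = 3
print(sorted_epoch_idx[:n_epochs_drop])
# [3 2 1] -- epoch 1 has only 2 bad channels (below consensus), yet it is
# dropped because of the trailing "+ 1".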

[DOC] typo

Hi, just looking at the documentation of LocalAutoRejectCV(), and there seems to be a mixup:

(Screenshot of the LocalAutoRejectCV docstring attached.)

The defaults of consensus_percs and n_interpolates are switched (only in the description; the code looks fine).

Intermittent error while repairing epochs

I'm trying to use autoreject on some EEG data recorded on a BioSemi system. Sometimes I get the following warning during processing - it seems to happen during the initial fit and looks like it might be related to interpolation (mne's bem.py, via _interpolate_bads_eeg). Specifically this happens during a "repairing epochs" step. I can tell it's during the initial fit as it says it's repairing half the trials, not all of them.

C:\Users\Matt\Anaconda3\envs\NewPython\lib\site-packages\mne\bem.py:975: RuntimeWarning: Mean of empty slice.
  radius_init = radii.mean()
C:\Users\Matt\Anaconda3\envs\NewPython\lib\site-packages\numpy\core\_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
C:\Users\Matt\Anaconda3\envs\NewPython\lib\site-packages\numpy\core\fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\Users\Matt\Anaconda3\envs\NewPython\lib\site-packages\numpy\core\_methods.py:73: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)

I'm calling autoreject as follows.

import numpy as np
from functools import partial
from autoreject import LocalAutoRejectCV, compute_thresholds

# run autoreject
thresh_func = partial(compute_thresholds, method='bayesian_optimization',
                      random_state=342)
n_interpolates = np.array([1, 4, 32])
consensus_percs = np.linspace(0, 1.0, 11)
ar = LocalAutoRejectCV(n_interpolates=n_interpolates,
                       consensus_percs=consensus_percs,
                       thresh_func=thresh_func, verbose='tqdm')
# fit the autoreject model to a subset of epochs, then correct all epochs
epochs_ar = ar.fit(epochs[::2]).transform(epochs)

I use epochs[::2] here as I have 480 (~3s long...) trials per participant, and was trying out various options suggested in the comments on issue #16 to deal with instability of solutions (another issue I'm having...). I've had the same issue with [::3] and [::4].

Note that this seems to be contingent on the random state for compute_thresholds. With random_state=342, this occurs reliably for one dataset I have, but goes away with random_state=343 for the same dataset. Obviously, if I leave this random, it then happens randomly on other datasets.

Using autoreject on the Hilbert envelope to detect EMG artefacts?

Hi there,

This is less of an 'issue' than a general question about a use case, but this seemed like the easiest way to get in touch.
First off: thanks for all your hard work on the openness, clarity, and ease of use of everything autoreject. I have a pretty general question:

I am in a lab where we typically would use FieldTrip's z-score based method for detecting segments of data with muscle artefacts.

In short, the process entails:
(1) bp-filter at 110-140 Hz
(2) compute hilbert envelope, followed by some boxcar averaging.
(3) z-score the resulting values and identify bad segments with a fixed cutoff.

Now, if I understand correctly, the peak-to-peak rejection employed by autoreject wouldn't identify EMG distortions per se: the artefacts are 'defined' by power, not raw amplitudes (this is somewhat confirmed by my observations so far comparing both approaches).

However, can I just run autoreject.compute_thresholds() on the Hilbert envelope created in (2) to identify EMG-affected data segments, effectively replacing step (3) above?
Or am I conceptually misinterpreting something here and abusing your package?
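A rough, untested sketch of what I have in mind (raw, events, and epochs are placeholders, the boxcar smoothing is omitted, and I use get_rejection_threshold here although compute_thresholds per channel would be analogous; whether peak-to-peak thresholds on the envelope are a sensible EMG criterion is exactly my question):

import mne
from autoreject import get_rejection_threshold

raw_emg = raw.copy().load_data().filter(110., 140.)   # (1) band-pass in the EMG range
raw_emg.apply_hilbert(envelope=True)                   # (2) Hilbert envelope
epochs_emg = mne.Epochs(raw_emg, events, tmin=-0.2, tmax=0.5,
                        baseline=None, preload=True)
reject = get_rejection_threshold(epochs_emg)           # (3) data-driven cutoff instead of a fixed z-score
epochs_emg.drop_bad(reject=reject)
# epochs_emg.selection / epochs_emg.drop_log now indicate which segments to
# discard from the original epochs.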

Thanks,
Wouter

error upon import

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-840fa4abf5dc> in <module>()
----> 1 import autoreject

/home/anaconda2/lib/python2.7/site-packages/autoreject/__init__.py in <module>()
----> 1 from .autoreject import GlobalAutoReject, LocalAutoReject, LocalAutoRejectCV
      2 from .autoreject import compute_thresholds, validation_curve, get_rejection_threshold
      3 from .ransac import Ransac
      4 from .utils import set_matplotlib_defaults
      5 from .viz import plot_epochs

/home/anaconda2/lib/python2.7/site-packages/autoreject/autoreject.py in <module>()
     13 
     14 from sklearn.base import BaseEstimator
---> 15 from sklearn.model_selection import RandomizedSearchCV
     16 from sklearn.cross_validation import KFold, StratifiedShuffleSplit
     17 

ImportError: No module named model_selection


autoreject.plot_epochs doesn't work

Hi,
I'm using autoreject in my project and found a strange bug, on some computers plot_epochs crashes with:

File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/autoreject/viz.py", line 201, in plot_epochs
  title, picks)
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/autoreject/viz.py", line 447, in _prepare_mne_browse_epochs
  this_log = params['fix_log'][epoch_idx, ch_idx]
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2059, in __getitem__
  return self._getitem_column(key)
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
  return self._get_item_cache(key)
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
  values = self._data.get(item)
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 3541, in get
  loc = self.items.get_loc(item)
File "/dmj/fizmed/mdovgialo/.local/lib/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc
  return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4443)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687)
KeyError: (0, 0)

I don't even know what exactly causes it. I've installed all dependencies for autoreject as instructed on the main page and tried several versions of pandas, but it is still the same. The computer that gets the error is running Debian 8 (jessie); the computer that works runs Ubuntu 16.04.

Maybe you've seen this error or might know what might cause it? I would gladly help you, but I'm not even sure why it keeps happening.

Steps to reproduce (I guess):
Install Debian 8.
Install autoreject as described in the instructions.
Use the plot_epochs function after cleaning some epochs with autoreject, like this:

plot_epochs(e_mne, bad_epochs_idx=ts.bad_epochs_idx,
            fix_log=ts.fix_log, scalings='auto',
            title='', show=True, block=True)

I'll gladly help; please let me know what you would need to even start tackling this bug.

get_rejection_thresholds broken

reported by @kambiz

Script: https://gist.github.com/kambysese/b68b98286b05ecf4c54172cf6ad34415
Data: https://www.dropbox.com/sh/2udqpo5e1rren5w/AADACwsT4rb9mUGX6Mr-p7Ioa?dl=0

"I've an epochs object with 320 events i.e., event_id=None and just the mag threshold estimation procedure takes > 20 mins with no end in sight. Comparatively the example script zips through with 73 events. Should I be sub-selecting channel and/or epoch types? Any chance we get some verbosity or speed up options/args?"

and

"I assume like all other things data SNR impacts the autoreject routines. SKUBI_105212 is infant data so don't be surprised if it's ummm...a little weird; for this the script took 109 mins. SUBJ_01 is an adult data ds and that clocked in at 15 mins on my WS, which may be expected."

Store index of remaining epochs in epochs object

Tracking the index of epochs is useful, for example, for cross-correlations with behavioral data.

For example, to compute epoch-wise signal features and match them with a dataframe with trials (epochs) as rows, one must know the absolute index of each epoch that persists after bad-epoch rejection, to then be able to merge the data into a nice dataframe:

trial     behavioral_scale    signal_feature
0         5.54                     1532.2
1         2.45                     1846.1
2         4.25                     NA
3         1.12                     1132.8
4         8.02                     NA

Currently, I could not find whether this is the case. However, the indices of the remaining epochs can be found as follows:

ar = autoreject.LocalAutoRejectCV(verbose=False, picks=mne.pick_types(epochs.info, meg=False, eeg=True))
epochs = ar.fit_transform(epochs)
remaining = list(range(len(epochs) + len(ar.bad_epochs_idx)))  # Create a range index
remaining = [x for x in remaining if x not in ar.bad_epochs_idx]  # Remove the bads

I can indeed manage to further store this list within the epochs by doing epochs.info["index"] = remaining; however, I was wondering whether it would be a good idea to store it directly within the autoreject functions?
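For what it's worth, a rough sketch of the merge I have in mind (remaining comes from the snippet above; signal_feature stands for any per-surviving-epoch feature):

import pandas as pd

behavior = pd.DataFrame({'trial': [0, 1, 2, 3, 4],
                         'behavioral_scale': [5.54, 2.45, 4.25, 1.12, 8.02]})
features = pd.DataFrame({'trial': remaining,               # absolute indices of kept epochs
                         'signal_feature': signal_feature})
merged = behavior.merge(features, on='trial', how='left')  # NaN where an epoch was dropped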

FIX: track bad channels warnings.

We still have these floating around.

autoreject/autoreject/utils.py:57: UserWarning: 2 channels are marked as bad. These will be ignored.If you want them to be considered by autoreject please remove them from epochs.info["bads"].
  'remove them from epochs.info["bads"].' % n_bads)

ENH/API: only one fix_log

Instead of having bad_segments and fix_log, just have one thing.

Expose it with a method.

ar = AutoRejectCV()
ar.fit(epochs)
X_in_sample = ar.annotate(epochs)  # like predict but better name for domain specific purpose
X_new = ar.annotate(other_epochs)

Where epochs can be the epochs used in fitting or new data.
The usage and semantics are then easy: anything that is > 0 is not good, 1 is bad, 2 is fixed. Document that.
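For instance, downstream code could then read (sketch only; annotate is the proposed, not-yet-existing method):

labels = ar.annotate(epochs)             # (n_epochs, n_channels) integer array
good = labels == 0                       # untouched
bad = labels == 1                        # marked bad
fixed = labels == 2                      # repaired by interpolation
n_flagged_per_epoch = (labels > 0).sum(axis=1)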

speedup median computation

With @TomDLT we realized that the median is much slower to compute than the mean. The only way to speed this up is to compute the median in parallel with the 'threading' backend. It can give speedups of around 30%.
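A minimal sketch of that idea with joblib (assuming NumPy releases the GIL for the heavy lifting, which is what makes the threading backend worthwhile; shapes are made up):

import numpy as np
from joblib import Parallel, delayed

data = np.random.randn(64, 200, 1000)   # (n_channels, n_epochs, n_times)

# Per-channel medians computed in parallel threads, without copying `data`
# to worker processes.
medians = Parallel(n_jobs=4, backend='threading')(
    delayed(np.median)(data[ch], axis=-1) for ch in range(data.shape[0]))
medians = np.array(medians)             # shape (n_channels, n_epochs)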

plot_epochs throws an error

Hi everyone, I'm getting this error when using plot_epochs from autoreject.

Traceback (most recent call last):

  File "<ipython-input-19-0dc82ca5b4c1>", line 1, in <module>
    plot_epochs(epochs_clean, bad_epochs_idx=ar.bad_epochs_idx, fix_log=ar.fix_log.as_matrix(),scalings=dict(eeg=40e-6), title='')

  File "C:\Users\alfine-l\AppData\Local\Continuum\Anaconda2\lib\site-packages\autoreject\viz.py", line 201, in plot_epochs
    title, picks)

  File "C:\Users\alfine-l\AppData\Local\Continuum\Anaconda2\lib\site-packages\autoreject\viz.py", line 439, in _prepare_mne_browse_epochs
    params['colors'][ch_idx][epoch_idx] = (1., 0., 0., 1.)

IndexError: list assignment index out of range

I've installed autoreject using pip following the procedure on http://autoreject.github.io/
The version of mne is 0.13.1 on Anaconda - Spyder (Python 2.7) 64-bit, on Windows 7 64-bit.

The error pops up when trying to execute the last line:

import mne
import numpy as np
from autoreject import LocalAutoRejectCV

t = (-0.5, 1.5)
consense = np.arange(.1, .2, .05)
n_interp = [16, 32, 48]

epochs = mne.Epochs(raw1, events1, ids, tmin=t[0], tmax=t[1], add_eeg_ref=False,
                    preload=True, baseline=None, on_missing='warning')
n = len(epochs.events)
ar = LocalAutoRejectCV(consensus_percs=np.asarray(consense), n_interpolates=np.asarray(n_interp))
epochs_clean = ar.fit_transform(epochs.pick_types(eeg=True, meg=False))

print 'Removed {0} epochs.'.format(n - len(epochs_clean.events))

from autoreject import plot_epochs
plot_epochs(epochs_clean, bad_epochs_idx=ar.bad_epochs_idx, fix_log=ar.fix_log.as_matrix(),
            scalings=dict(eeg=40e-6), title='')

Do you have any idea of a possible cause?
Thank you!

Examples plot auto repair errors

ar.fit_transform(epochs['Auditory/Left']) complains about multiple channel types in the example script.

Traceback (most recent call last):
  File "/home/ktavabi/Downloads/plot_auto_repair.py", line 99, in <module>
    epochs_clean = ar.fit_transform(epochs['Auditory/Left'])
  File "/home/ktavabi/Projects/autoreject/autoreject/autoreject.py", line 762, in fit_transform
    return self.fit(epochs).transform(epochs)
  File "/home/ktavabi/Projects/autoreject/autoreject/autoreject.py", line 657, in fit
    _check_data(epochs, verbose=self.verbose)
  File "/home/ktavabi/Projects/autoreject/autoreject/autoreject.py", line 39, in _check_data
    raise ValueError('AutoReject handles only one channel type for now')
ValueError: AutoReject handles only one channel type for now

Clarifying bad_epochs_idx arg in plot_epochs

Trying to use the bad_epochs_idx argument in viz.plot_epochs, @giuliagennari and I ran into an issue.

Indeed, because of line 182:

epochs.drop_bad()

everything is built on a 'cleaner' subset of epochs.

Notably lines 317-318:

for color_idx in range(len(type_colors)):
    colors.append([type_colors[color_idx]] * len(epochs.events))

colors (later added to params) is a nested list of size len(ch_names) * len(epochs.events), the latter dimension being related to the 'cleaner' subset of epochs.

However, in lines 434-439 we loop over the bad_epochs_idx and write into params['colors'][ch_idx][epoch_idx]:

for epoch_idx in params['bads']:
    params['ax_hscroll'].patches[epoch_idx].set_color((1., 0., 0., 1.))
    params['ax_hscroll'].patches[epoch_idx].set_zorder(3)
    params['ax_hscroll'].patches[epoch_idx].set_edgecolor('w')
    for ch_idx in range(len(params['ch_names'])):
        params['colors'][ch_idx][epoch_idx] = (1., 0., 0., 1.)

which may exceed the highest index of the 'cleaner' subset of epochs (e.g. if one of the bad epochs to display is the last one of the initial epochs object before drop_bad).

So, given the code, I am not sure I understand the aim of the bad_epochs_idx argument in conjunction with the call to drop_bad.
The two do not seem compatible IMHO (not considering the case of users having to anticipate the new indices, in the 'cleaner' epochs subset, of potentially bad epochs which are not rejected by drop_bad...).

BaseEpochs now public

It seems the recent move of making BaseEpochs public causes this error for me:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-d3601f0fe095> in <module>()
      5 #    ar.verbose = False
      6 #try:
----> 7 epochs = ar.transform(epochs)
      8 #except Exception as e:
      9 #    print("ar failed for ", name)

/home/jona/tools/autoreject/autoreject/autoreject.py in transform(self, epochs)
    590             The epochs object which must be cleaned.
    591         """
--> 592         return self._local_reject.transform(epochs)
    593 
    594     def fit_transform(self, epochs):

/home/jona/tools/autoreject/autoreject/autoreject.py in transform(self, epochs)
    352         """
    353         epochs = epochs.copy()
--> 354         _check_data(epochs)
    355 
    356         self._vote_epochs(epochs)

/home/jona/tools/autoreject/autoreject/autoreject.py in _check_data(epochs)
     24 
     25 def _check_data(epochs):
---> 26     if not isinstance(epochs, mne.epochs._BaseEpochs):
     27         raise ValueError('Only accepts MNE epochs objects.')
     28 

AttributeError: module 'mne.epochs' has no attribute '_BaseEpochs'
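A version-robust variant of that check could look like this (sketch; _BaseEpochs was renamed to BaseEpochs in newer MNE releases):

try:
    from mne.epochs import BaseEpochs                   # newer MNE
except ImportError:
    from mne.epochs import _BaseEpochs as BaseEpochs    # older MNE

def _check_data(epochs):
    if not isinstance(epochs, BaseEpochs):
        raise ValueError('Only accepts MNE epochs objects.')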

remove LBFGS-B

suggested by @agramfort

Since we have a finite set of thresholds to evaluate the GP on, we don't need to run an LBFGS. We can simply evaluate the GP on all the thresholds. It will avoid problems like in #58.
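A sketch of the alternative, scoring the surrogate on the fixed candidate grid instead of running L-BFGS-B (threshold values and errors below are made up):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

candidates = np.linspace(40e-6, 200e-6, 50)[:, None]   # candidate peak-to-peak thresholds
tried = candidates[[3, 17, 31]]                        # thresholds evaluated so far
cv_error = np.array([0.8, 0.3, 0.6])                   # their cross-validation errors

gp = GaussianProcessRegressor().fit(tried, cv_error)
predicted = gp.predict(candidates)                     # evaluate the GP on every candidate
best_thresh = candidates[np.argmin(predicted), 0]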

optimizer/optimizer.py:195: UserWarning: The objective has been evaluated at this point before.

I'm getting the same warning message over and over again when trying to use LocalAutoRejectCV using all defaults. My code is roughly:

import mne
from autoreject import LocalAutoRejectCV

raws = [mne.io.read_raw_edf(file, eog=['EXG' + str(i) for i in range(9)],
                            montage=mne.channels.read_montage('biosemi64'),
                            verbose='WARNING')
       for file in files]

picks = [mne.pick_types(raw.info, eeg=True) for raw in raws]

events = [mne.find_events(raw, stim_channel='STI 014', verbose='WARNING')
          for raw in raws]

eventids = [event[:, 2][event[:, 2] < 255].tolist()
            for event in events]

epochs = [mne.Epochs(raw, events=event, event_id=eventid,
                     tmin=0, tmax=16, add_eeg_ref=False,
                     picks=pick,
                     verbose='WARNING')
          for raw, event, eventid, pick in zip(raws, events, eventids, picks)]

epoch = epochs[0]
epoch.load_data()
epoch.resample(256)

reject = LocalAutoRejectCV(epoch)
clean_epochs = reject.fit_transform(epoch)

This warning comes up seemingly on every iteration of the fit_transform call (more than 100 warnings):

/Library/anaconda/envs/py36/lib/python3.6/site-packages/skopt/optimizer/optimizer.py:195:
UserWarning: The objective has been evaluated at this point before.

Am I doing something wrong here?

dropping bad epochs with multiple channel types

I ran autoreject on a dataset containing MEG and EEG channels; however, it failed to transform the epochs due to an IndexError at line 636 in autoreject.py, the reason being that bad_epochs_idx was of type float (causing the drop method to fail). Apparently this happens because np.union1d returns an array of floats when one of the inputs is an empty list. A possible fix would be changing line 612 to

bad_epochs_idx = np.union1d(bad_epochs_idx, bad_epochs_idx_).astype(np.int)

(or setting dtype anywhere after this).
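The behavior is easy to reproduce in isolation:

import numpy as np

idx = np.union1d([], [2, 5, 7])
print(idx.dtype)          # float64 -- the empty list is treated as a float array
print(idx)                # [2. 5. 7.]
idx = idx.astype(int)     # proposed fix: cast back to integer indices before dropping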

autoreject release

Following conversation with @agramfort here I am creating a list of things to do before the release.

Feel free to add comments or edit it @agramfort @dengemann

  • Make GlobalAutoReject and LocalAutoreject private
  • Update API documentation to reflect new class related to RejectLog.
  • Fix verbosity issue with tqdm
  • Don't change CV in fit? # not sure I understood the issue here ... (cf #66 and XXX)
  • Close PR #92
  • LocalAutorejectCV -> AutoRejectCV and
  • repr

API: avoid partial + thresh_func outside to simplify usage / add n_jobs arg

@jasmainak @agramfort Is anything preventing us from hiding the complexity of making partial functions by just adding an n_jobs argument to AutoReject?

I think this would make things look even less surprising and simpler.

Compare:

ar = AutoReject(n_jobs=36)

Versus:

from functools import partial  # noqa

from autoreject import compute_thresholds

thresh_func = partial(compute_thresholds, random_state=42, n_jobs=36)
ar = AutoReject(thresh_func=thresh_func)

todos

  • add narrative to api description
  • remove joblibs for memcaching
  • use n_jobs using joblibs to make script parallel
  • do something about using private functions in mne
  • visualization comparable to drop_log in mne-python but color coded
  • simple example where you get back a rejection dict for each channel type
  • one example using hcp data
  • a method to sort sensors by number of bad segments (useful for maxfilter)
  • allow visualizing bad segments also after correction

gaussian_process warning message 308

Hello everybody!!!
I have a problem, this message appears in each iteration of autoreject during the 'computing thresholds' step:

/Users/ghfc/anaconda/lib/python3.6/site-packages/sklearn/gaussian_process/gpr.py:308: UserWarning: Predicted variances smaller than 0. Setting those variances to 0.
warnings.warn("Predicted variances smaller than 0. "

I'm using version 0.15.dev0 of mne, with this set of versions:

numpy 1.12.1
scikit-learn 0.18.1 np111py36_1
scikit-optimize 0.3
scipy 0.19.0

I have another environment with other versions where those messages never appear, and it seems to execute faster:

mne 0.14.dev0
numpy 1.11.1 py35_0
scikit-learn 0.18.1 np111py36_1
scikit-optimize 0.2
scipy 0.18.1

I would appreciate it a lot if someone has an idea of how to improve this, because computing one file takes about 1 hour with the default settings for LocalAutorejectCV().

RuntimeError raised because of epochs.drop_log

This error is raised when using mne.io.read_raw_egi_mff (#4017) with one of the files; I will continue checking it with the others:

In [20]: epochs_clean = ar.fit(epochs[::50]).transform(epochs)
Loading data for 18 events and 701 original time points ...
1 bad epochs dropped
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-290c8fbf6820> in <module>()
----> 1 epochs_clean = ar.fit(epochs[::50]).transform(epochs)

C:\Program Files\Anaconda3\envs\mne_dev\lib\site-packages\autoreject-0.1.dev0-py3.5.egg\autoreject\autoreject.py in fit(self, epochs)
    515             The epochs object to be fit.
    516         """
--> 517         _check_data(epochs)
    518         if self.cv is None:
    519             self.cv = KFold(len(epochs), n_folds=10)

C:\Program Files\Anaconda3\envs\mne_dev\lib\site-packages\autoreject-0.1.dev0-py3.5.egg\autoreject\autoreject.py in _check_data(epochs)
     38                'incomplete data). Please check that no epoch '
     39                'is dropped when you call epochs.drop_bad_epochs().')
---> 40         raise RuntimeError(msg)
     41
     42

RuntimeError: Some epochs are being dropped (maybe due to incomplete data). Please check that no epoch is dropped when you call epochs.drop_bad_epochs().

epochs.drop_log shows 2 dropped epochs:

[['NO_DATA'],
 [],
 [],
...
 [],
 ['TOO_SHORT']]

@jasmainak

Obviously bad channel not detected

I've been using autoreject to clean up a 24-subject Biosemi EEG dataset. So far, awesome results! However, I have a few subjects with obviously bad channels that are not detected properly in my current pipeline. Any ideas why?

Here is an example. I'm loading data like this:

import mne
import autoreject
from functools import partial

raw = mne.io.read_raw_edf('my_subject.bdf', montage='biosemi32',
                          eog=['EXG1', 'EXG2', 'EXG3', 'EXG4'],
                          misc=['EXG5', 'EXG6', 'EXG7', 'EXG8'],
                          stim_channel='Status', preload=True)

# Apply the correct reference
raw = raw.set_eeg_reference(['EXG5', 'EXG6'])

# Bandpass filter
raw = raw.filter(0.5, 15, n_jobs=4)

# Extract epochs
events = mne.find_events(raw, mask=0xFF)
event_id = dict(related=6, unrelated=7)
epochs = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=1.0, baseline=(-0.2, 0), preload=True)

Plotting the evoked shows that channel AF3 (channel index 1) is bad:

(evoked plot attached)

Autoreject to the rescue! This is how I call it:

picks = mne.pick_types(epochs.info, eeg=True)  # Find indices of all EEG channels
thresh_func = partial(autoreject.compute_thresholds, picks=picks, method='random_search')
ar = autoreject.LocalAutoRejectCV(n_interpolates=[1, 4, 8], picks=picks, thresh_func=thresh_func, cv=10)
ar.fit(epochs)
epochs = ar.transform(epochs)

This is the reject log:
(reject log plot attached)

And here is the "cleaned" data:
(plot of the 'cleaned' data attached)

Are there any parameters I can tweak to make autoreject properly identify the broken channel? In the meantime, a good workaround is to first run RANSAC, which does detect the bad channel, before running autoreject. But autoreject should be able to detect this channel, no? Contact me at [email protected] if you'd like the data.
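For reference, the workaround currently looks roughly like this (a sketch using the same API as above):

import mne
from autoreject import Ransac, LocalAutoRejectCV

picks = mne.pick_types(epochs.info, eeg=True)

# Let RANSAC find and repair the obviously broken channel first ...
ransac = Ransac(picks=picks, verbose=False)
epochs_ransac = ransac.fit_transform(epochs)
print('RANSAC-detected bads:', ransac.bad_chs_)

# ... then run autoreject on the repaired data.
ar = LocalAutoRejectCV(picks=picks)
epochs_clean = ar.fit_transform(epochs_ransac)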

TypeError with plot_auto_repair example

Following the install instructions and running the example plot_auto_repair.py with Python 2.7.13, I get the error:
TypeError: __init__() got an unexpected keyword argument 'picks'

warnings when building docs

reported by @agramfort

Parameters
----------
/Users/alex/work/src/autoreject/autoreject/autoreject.py:docstring of autoreject.validation_curve:19: SEVERE: Unexpected section title.

Returns
-------
/Users/alex/work/src/autoreject/autoreject/viz.py:docstring of autoreject.viz.plot_epochs:9: SEVERE: Unexpected section title.

Parameters
----------
/Users/alex/work/src/autoreject/autoreject/viz.py:docstring of autoreject.viz.plot_epochs:46: SEVERE: Unexpected section title.

Returns
-------
/Users/alex/work/src/autoreject/autoreject/viz.py:docstring of autoreject.viz.plot_epochs:51: SEVERE: Unexpected section title.

Ransac vs LocalAutoRejectCV

Hey guys,
First, thanks for your work on this truly awesome (and so useful) package.

Although I understand it's at an early dev stage, I have a quick question:
From the docs I understood that Ransac is for detecting/repairing bad channels and LocalAutoRejectCV for detecting/repairing bad epochs. Is that correct?

If so, does the following pipeline make any sense to you?

picks = mne.pick_types(epochs.info, eeg=True, stim=False, eog=False)

# Repair channels
ransac = autoreject.Ransac(verbose=False, picks=picks, n_jobs=1)
epochs = ransac.fit_transform(epochs)
bad_channels = ransac.bad_chs_

# Repair Epochs
if method == "repair":
    ar = autoreject.LocalAutoRejectCV(verbose=False, picks=picks)
    epochs = ar.fit_transform(epochs)
    bad_epochs = ar.bad_epochs_idx
    reject = autoreject.get_rejection_threshold(epochs)
elif method == "reject":
    reject = autoreject.get_rejection_threshold(epochs)
    epochs.drop_bad(reject=reject)
    # bad_epochs = ? # I don't know how to get a list of dropped bad epochs
else:
    pass
drop_log = epochs.drop_log

Should I use interpolate_bads() somewhere?

Thanks a lot!!!

ENH/FIX: regression tests / ground truth

It seems we have no regression tests, especially not for autoreject itself.
To avoid losing time and work when refactoring autoreject, we need to make sure the code detects known bad segments. We should think about using simulated/modified data.
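A hedged sketch of what such a test could look like with the current AutoReject API (untested; channel names, amplitudes, and parameters are made up for illustration):

import numpy as np
import mne
from autoreject import AutoReject

ch_names = ['Fz', 'Cz', 'Pz', 'Oz', 'C3', 'C4', 'F3', 'F4']
info = mne.create_info(ch_names, sfreq=100., ch_types='eeg')
rng = np.random.RandomState(42)
data = rng.randn(20, len(ch_names), 100) * 1e-6   # 20 clean-ish 1 s epochs
data[7, 2] += 500e-6                              # ground truth: artifact in epoch 7, channel 'Pz'
epochs = mne.EpochsArray(data, info)
epochs.set_montage(mne.channels.make_standard_montage('standard_1020'))

ar = AutoReject(n_interpolate=[1, 2], random_state=42)
ar.fit(epochs)
reject_log = ar.get_reject_log(epochs)
assert reject_log.labels[7, 2] > 0                # the injected artifact must be flagged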

Typo on autoreject.github.io

On the Github pages (and in doc/index.rst, which I'm guessing is where the site pulls from) - there is this code section:

from autoreject import get_rejection_thresholds
reject = get_rejection_thresholds(epochs)

When I try to run it, I can't import get_rejection_thresholds, and when I check the code it seems to simply be a typo: the function is called get_rejection_threshold (without the s).

Anyways, just a heads up on that.

MISC/API/STY

Starting to collect a few observations and potential issues here.

  1. If following sklearn conventions is a goal, it should be .fix_log_, not .fix_log.
  2. I am wondering if dependencies should be managed more parsimoniously, e.g., refrain from using pandas; the fix log could be a plain matrix.
  3. Dependencies should be caught at construction time; skopt import errors are yelling at me when I'm in the middle of things.
  4. API: LocalAutoReject / AutoRejectLocal / AutoReject(scope='local')
  5. Any parallelisation possible? Some bottleneck analysis may help.

That's it for now. Adding more things as we proceed. All up for discussion.
