Comments (32)
Yes, we currently only look at the first and last timestamps when resampling. Even without resampling, we only look at the time of the first timestamp and then use the effective sampling rate for the remaining samples. I guess we need to consider all timestamps.
from mnelab.
Thanks @DominiqueMakowski, you are correct that we currently assume regularly spaced samples per stream. Using pandas to handle interpolation is actually very clever, I wonder why I haven't thought of it before 😆 (the tradeoff is of course that it is a rather large dependency, but maybe worth it even for MNELAB).
Could you share the file with me so that I can play around with it to get a better grasp of the problem?
Regarding your function, how can I avoid linear interpolation for such a long interruption? By setting `fillmissing=1/fs` (where `fs` is the expected sampling frequency of the stream)?
Finally, I think it would be beneficial to get resampling directly into pyXDF. Did you check the implementation in xdf-modules/pyxdf#1 by any chance?
from mnelab.
> Could you share the file with me
Dropped you an email.
> Regarding your function, how can I avoid linear interpolation for such a long interruption?
I added a `fillmissing` argument that sets a limit in seconds on whether to prevent interpolation of long periods. It seems to work nicely and leaves NaNs, which is not the best for MNE, but that's another issue.
> Finally, I think it would be beneficial to get resampling directly into pyXDF.
I agree...
from mnelab.
> I added a `fillmissing` argument that sets a limit in seconds on whether to prevent interpolation of long periods
So I set it to `1/fs`? The default `None` seems to turn off NaNs completely if I'm not mistaken.
from mnelab.
It does the transformation above, so you just need to specify it in seconds, like 0.5 or 0.1. And yes, `None` by default leaves all interruptions (normally).
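A minimal sketch of how such a limit could behave, assuming NumPy only. The function `resample_with_gap_limit` and its parameters are hypothetical names used for illustration; this is not the actual NeuroKit or MNELAB implementation. It linearly interpolates a stream onto a regular grid and then sets grid points to NaN when they fall inside a gap longer than the limit. With `fillmissing=None`, every gap is interpolated.

```python
import numpy as np


def resample_with_gap_limit(timestamps, values, fs, fillmissing=None):
    """Interpolate onto a regular grid; NaN inside gaps longer than `fillmissing` seconds."""
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    n = int(np.floor((timestamps[-1] - timestamps[0]) * fs)) + 1
    new_times = timestamps[0] + np.arange(n) / fs
    new_values = np.interp(new_times, timestamps, values)
    if fillmissing is not None:
        # Index of the original sample at or before each grid point
        left = np.searchsorted(timestamps, new_times, side="right") - 1
        left = np.clip(left, 0, len(timestamps) - 2)
        gap = timestamps[left + 1] - timestamps[left]
        # Grid points strictly between two original samples
        inside = (new_times > timestamps[left]) & (new_times < timestamps[left + 1])
        new_values[inside & (gap > fillmissing)] = np.nan
    return new_times, new_values


# Made-up 10 Hz stream with no samples between t = 0.5 s and t = 2.0 s
t = np.array([0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25]) / 10.0
x = np.sin(t)
new_t, filled = resample_with_gap_limit(t, x, fs=10)  # gap is interpolated
_, gapped = resample_with_gap_limit(t, x, fs=10, fillmissing=0.5)  # gap becomes NaN
```

In this example, `filled` contains no NaNs, while `gapped` has NaNs at every grid point inside the 1.5 s interruption.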
from mnelab.
If we are to have similar code in NK and mnelab, we might want to outsource some of it to pyxdf.
What I could see is a `pyxdf.xdf_to_dataframe()` function that tries to import pandas (and if not, errors saying that pandas is required for this function), i.e., pandas is an optional dependency, and then converts and resamples the XDF to a dataframe (like what we do in NK now). Then, mnelab can extract the array from it and put that into mne.Raw, and NK can use it as-is.
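The optional-dependency part of this proposal could look roughly like the sketch below. Note that `xdf_to_dataframe` does not exist in pyxdf; the function here is a hypothetical stand-in that only converts each stream to a dataframe indexed by its timestamps and leaves out the resampling step. It assumes streams shaped like the output of `pyxdf.load_xdf`.

```python
import numpy as np


def xdf_to_dataframe(streams):
    """Hypothetical helper: one DataFrame per stream, indexed by timestamp."""
    try:
        import pandas as pd  # optional dependency, imported only when needed
    except ImportError as err:
        raise ImportError("pandas is required for xdf_to_dataframe()") from err
    frames = {}
    for stream in streams:
        name = stream["info"]["name"][0]
        data = np.asarray(stream["time_series"])
        index = pd.Index(stream["time_stamps"], name="time")
        frames[name] = pd.DataFrame(data, index=index)
    return frames


# Made-up stream mimicking the structure returned by pyxdf.load_xdf
fake_streams = [
    {
        "info": {"name": ["EEG"]},
        "time_stamps": [0.0, 0.1, 0.2],
        "time_series": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    }
]
frames = xdf_to_dataframe(fake_streams)
```

Importing pandas inside the function keeps it out of the package's hard requirements; users who never call the function never need it installed.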
from mnelab.
> If we are to have similar code in NK and mnelab, we might want to outsource some of it to pyxdf.
Agreed. Resampling should really be handled by pyXDF, and a proposed solution already exists (although I'm not sure how easy it will be to rebase and if it is still working). But this should be discussed directly with the pyXDF people (maybe in xdf-modules/pyxdf#1).
from mnelab.
For future reference: that method might suffer from some loss of precision. In my small experiments, using the union of existing and new indices gave the best results.
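One possible reading of "the union of existing and new indices", sketched with pandas. The helper name `interpolate_on_union` is an assumption made for illustration, and the code may differ from what was actually tested. The idea is to keep the original timestamps in the index while interpolating, so that new grid points are computed from the true sample times instead of from shifted ones.

```python
import numpy as np
import pandas as pd


def interpolate_on_union(timestamps, values, new_times):
    """Linear interpolation at `new_times` using the union of old and new indices."""
    series = pd.Series(values, index=pd.Index(timestamps, dtype=float))
    new_index = pd.Index(new_times, dtype=float)
    union = series.index.union(new_index)
    # Reindexing inserts NaN at the new time points; method="index" then
    # interpolates linearly, using the index values as x-coordinates
    filled = series.reindex(union).interpolate(method="index")
    # Keep only the requested time points
    return filled.reindex(new_index)


# Irregularly spaced samples of a linear signal, resampled to a regular grid
old_times = np.array([0.0, 0.1, 0.35, 0.5])
new_times = np.array([0.0, 0.125, 0.25, 0.375, 0.5])
resampled = interpolate_on_union(old_times, 2 * old_times, new_times)
```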
from mnelab.
@DominiqueMakowski is the description of the signals in your top post correct? I think you might have mixed up the colors.
Just to be sure, the correct (expected) signal should contain a segment with missing data in the first second?
When loading just stream 4 (with or without resampling), I get this time series:
So I'm wondering if the import worked, and the problem is maybe in the `to_data_frame()` method? I can't investigate further for a couple of days (weeks?), so if you find anything in the meantime please LMK.
from mnelab.
One more observation, the MNELAB GUI doesn't let you choose sampling frequencies greater than the highest sampling frequency in the file, i.e. 1000Hz in this example. Even then, the signals look exactly like in the screenshot, so maybe it's because you resample to 2000Hz (I doubt it, but still worth checking)?
from mnelab.
Although I did spot another (probably unrelated) issue: the suggested resampling when selecting all streams is 52Hz, but it should be 1000Hz. Not sure what's going on here, but this seems like a separate issue.
from mnelab.
> Just to be sure, the correct (expected) signal should contain a segment with missing data in the first second?
No, I think the whole recording is like several minutes so it should be within the first minute or so (the time axis is messed up in my fig)
> so maybe it's because you resample to 2000Hz (I doubt it, but still worth checking)?
The upsampling is done to avoid aliasing when merging signals with uneven sampling rates, but it should have fairly minimal impact
from mnelab.
> No, I think the whole recording is like several minutes so it should be within the first minute or so (the time axis is messed up in my fig)
So your three example plots do not actually show the problem? Sorry, I'm confused now; I don't understand what the problem with MNELAB is...
from mnelab.
Can you zoom out in your fig to see all the signal horizontally?
from mnelab.
Yes:
Looks the same whether I resample to 256Hz or not (in which case the sampling rate is 256.021Hz).
from mnelab.
can you share the code to reproduce this fig?
from mnelab.
This is all done in MNELAB with GUI commands, but here is the corresponding code (available in View – History). For example, here's the code for loading all streams and resampling to 1000Hz:
```python
from copy import deepcopy

import mne
from mnelab.io import read_raw

datasets = []
# load all five streams and resample them to a common rate of 1000 Hz
data = read_raw(
    "/Users/clemens/Data/biosignal-test-data/XDF/sub-01_ses-S001_task-HCT_run-001_eeg.xdf",
    stream_ids=[1, 2, 3, 4, 5],
    fs_new=1000.0,
    preload=True,
)
datasets.insert(0, data)
# plot the first 18 channels
data.plot(n_channels=18)
```
from mnelab.
Haha yes, but that's the whole problem: mnelab assumes that samples are evenly spaced. That's the raw signal:
```python
import matplotlib.pyplot as plt
import pyxdf

streams, _ = pyxdf.load_xdf(
    "./raw/physio/sub-01/ses-S001/eeg/sub-01_ses-S001_task-HCT_run-001_eeg.xdf"
)
# plot the first channel of the fourth loaded stream against its recorded timestamps
plt.plot(streams[3]["time_stamps"], streams[3]["time_series"][:, 0])
plt.show()
```
from mnelab.
OK, now I get it. Here's a screenshot showing the first channel over the entire duration, top (blue): original data as obtained with PyXDF, bottom (black): data imported with MNELAB:
from mnelab.
@DominiqueMakowski I wonder if interpolating missing data is the best solution. Would it not be better to use NaN values instead? Otherwise, it is difficult to determine if data collection (using a device with a given regular sampling frequency) worked, or if there was a gap where no data samples have been recorded. After all, you don't want to process the interpolated data, right?
from mnelab.
I opted for a user-defined duration, which makes it possible to keep interruptions longer than a given time.
from mnelab.
Ah, right! That's a good approach. So every gap longer than that duration is filled with NaNs, right?
from mnelab.
I should have read the thread again, you already mentioned this before! Sorry about the noise!
from mnelab.
no worries haha I'm very often guilty of that as well
from mnelab.
Yes, `None` is probably not the best default, but the right default depends on the signal: 1 second of EEG is probably too much, but for other signals like EDA it could be all right. Another option is not to set a default but to throw warnings if a break is detected.
The reader in neurokit is also made with neurokit in mind, which doesn't deal super well with NaNs.
from mnelab.
Good point. Every regularly sampled XDF stream has a nominal sampling frequency, so we could use it to define a default. Conservatively, we could choose everything > 1/fs to be filled with NaNs, but this is likely too small. Maybe > 2/fs is a better choice? It seems like a value depending on fs makes more sense than an absolute time interval.
from mnelab.
I have another question @DominiqueMakowski. You are using `df.interpolate()` with `method="index"`. This means linear interpolation between index values, right? I'm not sure if I understand the docs correctly.
from mnelab.
But the question still remains: how do you define a discontinuity based on the signal type? You'd have to use type-specific durations to determine it, or no? Technically, I think it's easiest to take the nominal fs to decide if there are gaps in the signal and then emit a warning. This relies only on the fs and not on the type and domain-specific interpretation of a signal (i.e. which gap is still acceptable).
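A sketch of this gap check, assuming NumPy and the standard `warnings` module. The function name `find_gaps` and the `tolerance` parameter are hypothetical; the default of 2 corresponds to the "> 2/fs" threshold floated earlier in the thread.

```python
import warnings

import numpy as np


def find_gaps(timestamps, nominal_srate, tolerance=2.0):
    """Return (start, end) pairs where consecutive samples are more than
    `tolerance / nominal_srate` seconds apart, warning if any are found."""
    timestamps = np.asarray(timestamps, dtype=float)
    threshold = tolerance / nominal_srate
    idx = np.flatnonzero(np.diff(timestamps) > threshold)
    gaps = [(float(timestamps[i]), float(timestamps[i + 1])) for i in idx]
    if gaps:
        warnings.warn(
            f"{len(gaps)} gap(s) longer than {threshold:.4g} s detected",
            RuntimeWarning,
        )
    return gaps


# Made-up 10 Hz stream with one interruption between t = 0.5 s and t = 2.0 s
t = np.array([0, 1, 2, 3, 4, 5, 20, 21, 22]) / 10.0
gaps = find_gaps(t, nominal_srate=10)
```

Because the threshold depends only on the nominal sampling frequency, the check needs no knowledge of the signal type. Users who want more liberal rules can raise `tolerance`.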
from mnelab.
> You are using `df.interpolate()` with `method="index"`. This means linear interpolation between index values, right?
Tbh, I wouldn't be able to explain exactly how pandas works here; their docs are indeed a bit mysterious. All I can say is that, from my trial-and-error attempts, that was the way that worked best at preserving the original signal 🤷
> I think it's easiest to take the nominal fs to decide if there are gaps
I think that's fine, yeah. In general, slower signals will tend to have a lower nominal frequency (at least for some devices). I think we can be fairly conservative with warnings, so users can then explicitly specify more liberal rules.
from mnelab.
Quick comment, this problem also occurs without resampling, i.e. when loading just one stream. MNELAB currently does not handle gaps. It assumes that all data points are available at all time points defined by the nominal sampling frequency. In fact, MNELAB just looks at the first timestamp, but completely ignores all other timestamps.
So to fix this problem, I think we will need to resample (interpolate) all XDF streams, even if it's just one stream. Then we can take a look at how resampling two (or more) streams to a common sampling frequency behaves.
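To illustrate why ignoring all but the first timestamp matters, here is a small sketch with made-up numbers, assuming NumPy. The effective sampling rate is computed here as (number of samples - 1) / duration, which is an assumption and may not match the exact formula used by pyXDF or MNELAB.

```python
import numpy as np

# Made-up 10 Hz stream with 1.5 s of missing data after t = 0.5 s
timestamps = np.array([0, 1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25]) / 10.0

# Effective sampling rate derived from sample count and total duration
fs_effective = (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])

# Sample times implied by "first timestamp plus regular spacing"
assumed = timestamps[0] + np.arange(len(timestamps)) / fs_effective

# Largest mismatch between assumed and recorded sample times, in seconds
max_error = np.max(np.abs(assumed - timestamps))
```

With a single 1.5 s gap, the effective rate drops well below the nominal 10 Hz, and samples on either side of the gap end up more than half a second away from where they were recorded.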
from mnelab.