cbrnr / sleepecg Goto Github PK
View Code? Open in Web Editor NEWSleep stage detection using ECG
License: BSD 3-Clause "New" or "Revised" License
Sleep stage detection using ECG
License: BSD 3-Clause "New" or "Revised" License
(as part of JOSS Review)
When installing SleepECG
with extras, following the instructions in the Contributing guide dev dependencies fails when using zsh
as shell (which is now the default unser ) because square brackets need to be escaped. Thus, I would maybe suggest to add an additional note that zsh users should call pip install -e ".[dev]"
(with quotes). This command should also work when using other shells (such as bash
).
As discussed in #41, we need to wait for the following packages to provide binary wheels for Python 3.10:
It looks like GUDB is not accessible anymore, at least not without registration:
https://berndporr.github.io/ECG-GUDB/
Our reader errors out, e.g.
404 Client Error: Not Found for url: https://berndporr.github.io/ECG-GUDB/experiment_data/subject_22/jogging/annotation_cables.tsv
We might need to use the API provided by the ecg-gudb-database
package.
The tests still import the C implementation directly. We should use the public function instead and perform the tests with all three backends.
This raises the additional question: how do I build the C extension locally?
When we require Python 3.9 in the future, we can make use of the following updates:
If you encounter situations for which Python 3.9 would allow a more elegant solution, please note them here.
The documentation contains different numbers describing the size of the MESA data set:
@hofaflo can you clarify how this fits together?
This is a quick issue relating to the JOSS review comments at: openjournals/joss-reviews#5411 (comment)
For development I'd usually expect to see a file containing the dependencies somewhere (e.g. requirements.txt or environment.yml). Does this exist? I see that the dependencies are listed in README.md, but it would be nice to have a file that can be used by package managers (e.g. pip, conda).
I know the package is still new, but I just did a Google search for "python ecg peak detection", and SleepECG doesn't show up in the first eight pages (we all know that no one ever goes to page 2). Any ideas on how we could increase our visibility? I think people should really be able to find SleepECG's Pan-Tompkins peak detector, because it is the best:tm: implementation available.
Currently, we have read_mitbih()
, which reads two specific data sets from the MIT-BIH database: MITDB and LTDB. It would be useful to have a dedicated reader function for each database, i.e. read_mitdb()
and read_ltdb()
(both of which could use a private version of read_mitbih()
under the hood.
@hofaflo should we make a quick v0.4.2 release so that we get Python 3.10 support and the new website design out there? The classification stuff will go into v0.5.0 then.
(as part of the JOSS Review)
The straightforward installation of SleepECG
fails on my system (Python 3.9 on 2017 Macbook Pro with macOS Ventura 13.3, Intel CPU, no GPU) using the current pyproject.toml
version with the following error message:
with the following error message:
Could not find a version that satisfies the requirement tensorflow-macos>=2.7.0; (sys_platform == "darwin" and python_version < "3.11") and extra == "full" (from sleepecg[full]) (from versions: none)
Replacing the optional dependencies tensorflow-macos
and tensorflow-metal
by tensorflow
in the pyproject.toml
file fixes the issue and correctly installs the package with all dependencies.
Currently, we list all functions and classes explicitly when generating our API docs, e.g.
.. autosummary::
:toctree: generated
:nosignatures:
download_nsrr
download_physionet
export_ecg_record
read_gudb
read_ltdb
read_mesa
read_mitdb
read_shhs
read_slpdb
set_nsrr_token
ECGRecord
SleepRecord
SleepStage
However, these are our public objects (not starting with _
), so is it not possible to tell the plugin to use all public functions so we don't have to list them explicitly?
I noticed that our LICENSE
is not included in the source package. However, when I build it locally with python setup.py sdist
, this file is actually copied into the package. I suspect this is because the CI environment is using older versions of setuptools
etc., so it might be worth updating packages? Alternatively, maybe it is already sufficient if we build on Python 3.9 instead of 3.7 here?
Hi,
I am so excited to see your wonderful work! Is there any release about the prediction performance of the ECG signals only or comparison with EEG signals?
Thanks a lot~
Maybe this is just a very minor and insignificant detail, but I noticed that prepare_data_keras()
returns a float
stages array after padding and one-hot encoding (even when the input argument is int8
). Would it make sense to return an int8
stages array instead?
The function detect_heartbeats()
can be easily confused when presented with time series containing a flat signal e.g. at the beginning of the recording.
from scipy.misc import electrocardiogram
from sleepecg import detect_heartbeats
# This works as expected
ecg = electrocardiogram()
detect_heartbeats(ecg, fs=1000)
# However, even when presented with 60 seconds of flat signal, the function still returns (a lot of) peaks
detect_heartbeats(np.repeat(-1.5, 60000), fs=1000)
The problem is that the flat signal at the beginning or at the end of the time series is not discarded automatically. Of course, better performance is achieved by providing better quality data, but I think this is something that should be automatically handled by the function: no peak detection on the portion of the signal consisting of consecutive constant values over at least n samples, at the beginning or at the end (ideally also if this happened during the recording).
In addition to that, the erroneous detection of (lot of) peaks in these segments appears to considerably slow down the function itself, the following time series (~1h ECG recording starting with ~45 s of flat signal) can for example completely crash the function that will hang forever (or at least for a very long time).
ecg.zip
(as part of the JOSS Review)
The paper is easy written, easy to follow, and nicely outlines the need of the SleepECG
package. It provides a good motivation, especially regarding the reproducibility of the existing approaches. Just some minor comments:
SleepECG
could help in this process (namely everywhere) would be beneficial.Should the readers in the first column on https://sleepecg.readthedocs.io/en/stable/datasets.html get formatted as code (i.e. enclosed in backticks)?
I don't know if this is even possible, but it would be nice to run the examples in the docs automatically and include their output (e.g. figures, text output, ...) in the docs. Currently, this has to be done manually.
(as part of the JOSS review)
In the README, you wrote: "Based only on ECG (and to a lesser extent also movement data), SleepECG
provides functions for [...]". I'm wondering where the movement data comes from since you didn't mention it elsewhere and also have no functions for importing, processing, and extracting features from acceleration or actigraphy data.
The goal is to have functions which download and read all common polysomnography datasets. Each record will be stored in a dataclass containing heartbeat times and annotated sleep stages.
Each reader should be added via a separate PR, this issue gives an overview of the progress.
Datasets:
Planned future extensions:
There seems to be an issue with
sleepecg/sleepecg/io/sleep_readers.py
Line 653 in 61b15a8
There is no channel name 'ECG' in latest MESA release, do not know if there was before or when it was changed.
a list of current channel names is as follows
['EKG', 'EOG-L', 'EOG-R', 'EMG', 'EEG1', 'EEG2', 'EEG3', 'Pres', 'Flow', 'Snore', 'Thor', 'Abdo', 'Leg', 'Therm', 'Pos', 'EKG_Off', 'EOG-L_Off', 'EOG-R_Off', 'EMG_Off', 'EEG1_Off', 'EEG2_Off', 'EEG3_Off', 'Pleth', 'OxStatus', 'SpO2', 'HR', 'DHR']
Really awesome work on this package! I have been using the detect_heartbeats function as my default QRS detector in many projects, and I love it. It is super fast and robust.
This issue is related to #87. I think that the unit tests in test_heartbeats.py could be more comprehensive. Specifically, the following scenarios should be tested:
I have included below some quick-and-dirty code to check these three scenarios, using SciPy's ECG dataset. Let me know if you think this would be a valuable addition.
import mne
import sleepecg
import numpy as np
from scipy.misc import electrocardiogram
ecg = electrocardiogram()
sf = 360
# The ground-truth
pks_360hz = sleepecg.detect_heartbeats(ecg, sf)
def f1_score(y_pred, y_true):
tp, fp, fn = sleepecg.compare_heartbeats(y_pred, y_true, max_distance=5)
tp, fp, fn = len(tp), len(fp), len(fn)
f1 = tp / (tp + 0.5 * (fp + fn))
return f1
Resampling
for new_sf in [72, 90, 100, 120, 180, 256, 360, 720, 1080, 1800, 3600, 7200]:
# Resample
res_factor = new_sf / sf
ecg_res = mne.filter.resample(ecg, up=res_factor)
pks = sleepecg.detect_heartbeats(ecg_res, new_sf)
# Compare to the original SF
pks = np.round(pks / res_factor).astype(int)
f1 = f1_score(pks, pks_360hz)
print(f"{new_sf} Hz (x{res_factor:.2f}) » F1 = {f1:.3f}")
The F1-score is 0.96 at 72 Hz, 0.97 at 120 Hz and 0.99 at 256 Hz and higher frequencies.
Rescaling
for scale in [1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1, 10, 1e2, 1e3, 1e4, 1e5]:
pks = sleepecg.detect_heartbeats(ecg * scale, sf)
# Compare to the original SF
f1 = f1_score(pks, pks_360hz)
print(f"{scale} » F1 = {f1:.3f}")
The F1-score is always 1, so the algorithm is indeed unit-independent.
Padding with flat data
ecg_pad_after = np.pad(ecg, pad_width=(0, ecg.size), mode="edge")
pks = sleepecg.detect_heartbeats(ecg_pad_after, sf)
f1_score(pks, pks_360hz) # F1 = 1
ecg_pad_before = np.pad(ecg, pad_width=(ecg.size, 0), mode="edge")
pks = sleepecg.detect_heartbeats(ecg_pad_before, sf)
f1_score(pks, pks_360hz + ecg.size) # F1 = 0.36
Cheers,
Raphael
We are currently building manylinux-2014
wheels, but in addition to those we also get manylinux-1
wheels. Not sure if this is a bug, because the wheels seem to be identical.
Hello @cbrnr, congrats to the public release!
Are you planning to have a package on conda-forge
too? I can help you create it (or create it for you ;)). Just let me know!
When running examples/classifiers/wrn_gru_mesa.py
, I noticed several warnings during extract_features()
:
sleepecg/sleepecg/feature_extraction.py:363: RuntimeWarning: HR analysis window too short for estimating PSD for feature VLF. 3030.3s required, got 270s
warnings.warn(msg, category=RuntimeWarning)
sleepecg/sleepecg/feature_extraction.py:225: RuntimeWarning: Mean of empty slice
meanNN = np.nanmean(NN, axis=1)
sleepecg/sleepecg/feature_extraction.py:226: RuntimeWarning: All-NaN slice encountered
maxNN = np.nanmax(NN, axis=1)
sleepecg/sleepecg/feature_extraction.py:227: RuntimeWarning: All-NaN slice encountered
minNN = np.nanmin(NN, axis=1)
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1878: RuntimeWarning: Degrees of freedom <= 0 for slice.
var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
sleepecg/sleepecg/feature_extraction.py:232: RuntimeWarning: Mean of empty slice
RMSSD = np.sqrt(np.nanmean(SD**2, axis=1))
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1217: RuntimeWarning: All-NaN slice encountered
r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1583: RuntimeWarning: All-NaN slice encountered
result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: Mean of empty slice
cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: divide by zero encountered in divide
cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/.direnv/python-3.10.9/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1095: RuntimeWarning: All-NaN slice encountered | 12/1970 [00:06<15:03, 2.17it/s]
result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)
sleepecg/sleepecg/feature_extraction.py:252: RuntimeWarning: invalid value encountered in sqrt | 23/1970 [00:12<17:47, 1.82it/s]
SD2 = (2 * SDNN**2 - SD1**2) ** 0.5
sleepecg/sleepecg/feature_extraction.py:256: RuntimeWarning: divide by zero encountered in divide | 32/1970 [00:16<15:35, 2.07it/s]
CSI = SD2 / SD1
sleepecg/sleepecg/feature_extraction.py:257: RuntimeWarning: divide by zero encountered in log10
CVI = np.log10(SD1 * SD2 * 16)
sleepecg/sleepecg/feature_extraction.py:244: RuntimeWarning: invalid value encountered in divide | 1166/1970 [10:26<07:01, 1.91it/s]
cvSD = SDSD / np.nanmean(SD, axis=1)
sleepecg/sleepecg/feature_extraction.py:254: RuntimeWarning: invalid value encountered in divide
SD1_SD2_ratio = SD1 / SD2
sleepecg/sleepecg/feature_extraction.py:256: RuntimeWarning: invalid value encountered in divide
CSI = SD2 / SD1
Are these warnings problematic? Should we take a closer look and try to fix them? Or if they are OK, how do they influence the results (is a feature nan
e.g. when a division by zero occurs)?
Am I misunderstanding how our readers are supposed to work? I thought that datasets are only downloaded once if they are not available locally. However, it seems like this is not the case:
import sleepecg
list(sleepecg.read_gudb()) # download if not already available in ~/.sleepecg/datasets
list(sleepecg.read_gudb()) # this should not trigger another download, but it does
The docstring states:
Required files are downloaded if not present in '<data_dir>/gudb'.
It also seems like this only affects sleepecg.read_gudb()
.
All reader and downloader functions have a data_dir
parameter in the last spot. Only download_physionet()
has this parameter in the first spot. We should streamline this and consistently use data_dir
as the last parameter.
SleepECG currently includes three pre-trained classifiers trained on 1971 records (and tested on another 1000 records). However, when applying this classifier to unseen (new) data, would it not make sense to have it trained on all available records (i.e. 2971)? Of course we would not have a performance measure then, but we would have more information available for training.
What is the preferred way (best practice) to distribute pre-trained classifiers?
Hi,
I would like to try out architectures that don't require hrv calculation but directly use the ECG signal for example (but not limited to) over here (still requires r peak calculation) and add it to this library if they are working. And would also like to extend the task to sleep apnea detection as SHHS also comes with apnea labels. Need your comments on whether these approaches are feasible.
Given that Apple has started switching all of their products to ARM64, I think it would be useful if we also provided corresponding wheels. According to cibuildwheel's docs, this should work for Python 3.8+.
Building the conda-forge package in conda-forge/sleepecg-feedstock#19 (comment) revealed two warnings:
sleepecg.classifiers
and sleepecg.test
. These directories have been added automatically for now, but in the future this might not be the case. We could either add them manually in setup.cfg
or pyproject.toml
(https://setuptools.pypa.io/en/latest/userguide/package_discovery.html), or add empty __init__.py
files in these directories. sleepecg/_heartbeat_detection.c: In function '_thresholding':
sleepecg/_heartbeat_detection.c:475:45: warning: 'RR_high_limit' may be used uninitialized in this function [-Wmaybe-uninitialized]
475 | if (RR_low_limit < RR_n && RR_n < RR_high_limit)
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
sleepecg/_heartbeat_detection.c:475:24: warning: 'RR_low_limit' may be used uninitialized in this function [-Wmaybe-uninitialized]
475 | if (RR_low_limit < RR_n && RR_n < RR_high_limit)
| ^
sleepecg/_heartbeat_detection.c:330:33: warning: 'PEAKI' may be used uninitialized in this function [-Wmaybe-uninitialized]
330 | SPKI = 0.25 * PEAKI + 0.75 * SPKI;
| ~~~~~^~~~~~~
I noticed some changes in our benchmarks when I tried to reproduce it with the latest versions:
@hofaflo any idea what is going on?
Links on the documentation website are currently broken (at least the way they are rendered):
Google shows this URL when searching for "sleepecg": https://sleepecg.readthedocs.io/en/latest/generated/sleepecg.html
This page doesn't exist anymore. Any idea if this is a problem we can solve? The correct URL is https://sleepecg.readthedocs.io/en/stable/ BTW.
I turned on basic type checking for the first time (didn't know that you had to opt in), and boy there are a lot of errors. Since we are using type annotations, we should definitely fix all the errors (otherwise why are we using annotations in the first place).
This should be unnecesary once the issue is fixed upstream by holgern/pyedflib#151.
When we require Python 3.8 in the future, we can make use of the following updates:
io.utils._calculate_checksum
: replace while True
/ if not chunk
If you encounter situations for which Python 3.8 would allow a more elegant solution, please note them here.
(as part of JOSS Review)
The hypnogram in the example try_ws_gru_mesa.py
isn't shown when running the code because plt.show()
is missing in the end. Adding this line correctly displays the figure.
Currently, all detectors need to be installed even when running a benchmark with only one or two detectors. I think it would be better to soft-import detectors to avoid this.
I was wondering why the following classes are not exported (i.e. part of sleepecg/io/__init__.py
and therefore also in the API docs):
sleepecg.io.sleep_readers.Gender
sleepecg.io.sleep_readers.SubjectData
These are "public" classes, and should therefore be exported (or made "private" by appending an underscore).
Similarly, sleepecg.io.gudb.GUDB_MD5
is "public" (but not exported), I think here it would be consistent to rename to _GUDB_MD5
.
@hofaflo WDYT?
We should use ~/.sleepecg
to store various configuration values. For example, all downloaders should store their data in ~/.sleepecg/datasets
by default.
It would be nice if a given ECGRecord
could be plotted, e.g. with a .plot()
method. If the record contains an annotation
field, these should also be plotted. In addition, it should be possible to plot custom markers (e.g. detected heartbeats), resulting in something like this:
(green = annotations, red = custom markers, i.e. detected heartbeats)
Maybe it's just me, but when I run python benchmark_detectors.py runtime
in examples/benchmark
, I do not get any records, i.e. the line records = list(reader_dispatch(db_slug, data_dir))
yields an empty list. Is this caused by our recent changes in v0.4.0?
Our API documentation looks very nice except for the following issues (left over from #126):
These issues can only be addressed upstream (in griffe and python-handler repos) by adding support for "See Also", "Warnings", "Notes", and "References", see e.g. here.
Hi,
This is an issue I have encountered a few times already — when the ECG data is very noisy for either part of or most of the recording, the detect_heartbeats
function just gets stuck forever. This is problematic when you run peak detection across hundreds of participants and cannot visually check each recording.
To reproduce the error:
Download the EDF file with 8 hours ECG: https://drive.google.com/file/d/1ReYlFPAd3-dYk0C2WF7nDRQS3MxE2lyE/view?usp=share_link
import mne
from sleepecg import detect_heartbeats
# Load ECG
raw = mne.io.read_raw_edf("original_ecg.EDF", preload=True, verbose=False)
ecg = raw.get_data(units="uV")[0]
sf = raw.info["sfreq"]# Load ECG
# Peak detection
detect_heartbeats(ecg, sf) # gets stuck
In that case, the ECG data is so bad that honestly it would be better not to attempt the peak detection at all...!
Potential solutions:
method="zhao2018"
in https://neuropsychology.github.io/NeuroKit/functions/ecg.html#ecg-quality)Thanks,
Raphael
I am a bit unhappy that Numba is blocking our support for Python 3.11 (see #124). Since Numba is not a required dependency for us, would it be possible to still add Python 3.11 support but without Numba? What would we need to do to implement this (I'm sure we need to at least adapt our tests). Any suggestions @hofaflo?
When loading a classifier that ships with SleepECG, a bunch of warnings are printed. I wonder if these warnings indicate real problems (in which case we should fix them) or if they can be ignored (in which case we should silence them).
WDYT @hofaflo?
import sleepecg
clf = sleepecg.load_classifier("wrn-gru-mesa", "SleepECG")
I tried changing the logger level after we import tensorflow
, but that didn't seem to do anything. Setting the environment variable here (before the import) does get rid of all warnings except one message:
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
The remaining warning is an error:
2023-05-02 12:06:33.352056: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
So setting the level to "3"
gets rid of that one too.
Now that we have a MANIFEST.in
, we should add all missing files. I think these at least include everything in doc
and img
, but probably also .github
. Anything else? I'd try to include every file under source control.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.