
sssd-ecg's People

Contributors

juanlopezcode


sssd-ecg's Issues

How to run the forecasting task

Hi, I'm trying to run the forecasting task on the PTB-XL (248) dataset. Can you clarify what changes need to be made to run the forecasting task? I have tried all the solutions mentioned in closed issues but am still getting errors. Also, during inference, what is the purpose of the "number of utterance to be generated" argument?

Failure to train the model on CPU

I adjusted the code's runtime environment from CUDA to CPU and found that the model does not run properly.

The gradient of the computed loss was None, and after optimizer.step() the model parameters became NaN, so training could not continue.

Can you help me with this, or could you share what kind of environment you use, including Python and PyTorch versions, hardware information, etc.?
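For anyone hitting the same problem, below is a generic PyTorch debugging sketch (not specific to this repository) that may help locate where the NaNs first appear; it assumes a standard training loop with a model, loss and optimizer already defined in your code:

import torch

def check_finite(model, loss):
    # Call right after loss.backward() to see whether the loss or any gradient is already NaN/Inf.
    if not torch.isfinite(loss):
        print('non-finite loss:', loss.item())
    for name, p in model.named_parameters():
        if p.grad is None:
            print('no gradient for', name)
        elif not torch.isfinite(p.grad).all():
            print('non-finite gradient in', name)

# Optionally, make autograd raise at the operation that produces the first NaN:
# torch.autograd.set_detect_anomaly(True)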

Label Mapping for Generated Data

Hi,

I have been interested in generating synthetic ECGs from real data for some time now. I came across your project, the SSSD-ECG model seemed interesting, and I wanted to test it myself. I have trained the model on the available PTB-XL dataset with the default configuration and also managed to perform inference on the test dataset.
This is where the problem arises for me. I generated the ECGs by providing the number of samples required and by using the test labels that are already available, i.e. 'ptbxl_test_labels.npy'. Now that I have the ECGs generated from the latest checkpoint, how do I map the right diagnosis label name to each ECG?

Maybe this is a naive question, but it would be really helpful if you could provide some information about my problem, @juanlopezcode.
Thanks in advance!
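One possible way to recover the diagnosis names, assuming the label columns follow the ordering of the PTB-XL benchmark's MultiLabelBinarizer and that the binarizer pickled during preprocessing (called mlb.pkl here; the path is a placeholder) is still available:

import pickle
import numpy as np

labels = np.load('ptbxl_test_labels.npy')    # (n_samples, n_classes), multi-hot
with open('path/to/mlb.pkl', 'rb') as f:     # MultiLabelBinarizer saved during preprocessing (assumption)
    mlb = pickle.load(f)

# mlb.classes_[j] is the SCP code for column j; print the codes of the first few samples
for i, row in enumerate(labels[:5]):
    print(i, [mlb.classes_[j] for j in np.flatnonzero(row)])

The human-readable description of each SCP code can then be looked up in the scp_statements.csv file that ships with PTB-XL.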

How to train using any data?

Hello,
I've tried to make your code work in my environment by simply running train.py, but I immediately stumbled into an error:

SSSD_ECG.py tries to import from S4Model.py:

from models.S4Model import S4Layer

but there is no class or function named "S4Layer". I thought it might be a mistake and that it should be the class S4, so I changed the calls from "S4Layer" to "S4", which I assumed was the class to be fetched, but then I get:

TypeError: __init__() missing 1 required positional argument: 'd_model'

Would it be possible to add instructions to the README?

Thanks in advance! And great work.
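For context, a hypothetical reconstruction of the missing wrapper is sketched below. This is not the authors' implementation: it assumes the S4 class in S4Model.py follows the public S4 reference signature (d_model as its first argument, forward returning an (output, state) tuple), and every name and default here is an assumption:

import torch.nn as nn
from models.S4Model import S4  # assumes the S4 block itself is present in S4Model.py

class S4Layer(nn.Module):
    # Thin wrapper: S4 block + residual connection + layer norm (hypothetical reconstruction).
    def __init__(self, features, lmax, N=64, dropout=0.0, bidirectional=True, layer_norm=True):
        super().__init__()
        # assumption: S4(d_model=..., d_state=..., l_max=..., bidirectional=..., dropout=...)
        self.s4 = S4(d_model=features, d_state=N, l_max=lmax,
                     bidirectional=bidirectional, dropout=dropout)
        self.norm = nn.LayerNorm(features) if layer_norm else nn.Identity()

    def forward(self, x):            # x: (batch, features, length)
        y, _ = self.s4(x)            # assumption: forward returns (output, state)
        y = y + x                    # residual connection
        return self.norm(y.transpose(-1, -2)).transpose(-1, -2)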

Get 'ptbxl_train_labels.npy' file

Hello, I tried to get the 'ptbxl_train_labels.npy' file by running ecg_data_preprocessing.ipynb. At the end of the notebook, I got dataloaders such as 'ds_test' and dataframes such as 'df_test'. How should I get the 'ptbxl_train_labels.npy' file from those objects?
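A minimal sketch of one way to do this, assuming the notebook's dataframes carry one multi-hot label vector per row (the column name 'label' is an assumption; adjust it to whatever the notebook actually produces):

import numpy as np

def export_labels(df, out_path, label_col='label'):
    # Stack the per-row multi-hot label vectors into one (n_samples, n_classes) array and save it.
    y = np.stack(df[label_col].to_numpy()).astype(np.float32)
    np.save(out_path, y)
    return y.shape

# e.g., with the dataframes produced at the end of the notebook:
# export_labels(df_train, 'ptbxl_train_labels.npy')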

No module named 'clinical_ts.stratify'

When I run the Jupyter notebook 'ecg_data_preprocessing.ipynb', there is an error at ptb_xl/clinical_ts/ecg_utils.py:31, which is the line 'from .stratify import stratify, stratify_batched'.

Clarification on Dataset Published

I came across your work on SSSD-ECG and found it very useful. However, I have some questions regarding the datasets you published at https://figshare.com/s/43df16e4a50e4dd0a0c5 .
The train, test and validation sets account for 21837 ECG signals, but PTB-XL has only 21799 signals, a mismatch of 38 signals. Assuming that the data published at https://mega.nz/folder/UfUDFYjS#YYUJ3CCUGb6ZNmJdCZLseg is the preprocessed PTB-XL data, that dataset has the same issue. In this case, how do I get the corresponding patient ID for each ECG signal in the synthetic dataset?
Your help would be much appreciated.

Thanks :)

Provide model training on synthetic data

Hello! I was really inspired by your article and want to reproduce the results of Table 1 for the "capacity to replace real data" metric. So I'm interested in training on synthetic data and testing on real data, but I didn't get comparably good results (macro_auc=0.586).

For this purpose, I downloaded the synthetic dataset you provided. After unzipping, it has the following structure:

- data
    - ptbxl_test_data.npy
    - ptbxl_train_data.npy
    - ptbxl_validation_data.npy
- labels
    - ptbxl_test_labels.npy
    - ptbxl_train_labels.npy
    - ptbxl_validation_labels.npy      

The next step was to train the neural network. For this I used the recommended benchmark https://github.com/helme/ecg_ptbxl_benchmarking/tree/master.
To run it on the synthetic dataset I needed to change the prepare method of the code.experiments.scp_experiments.SCP_Experiment class. To do this, I removed the code related to building the train/val split (up to this line) and replaced it with loading the synthetic data as follows:

synth_path = 'path/to/Dataset/'

self.y_train = np.load(os.path.join(synth_path, 'labels', 'ptbxl_train_labels.npy'))
self.y_val = np.load(os.path.join(synth_path, 'labels', 'ptbxl_validation_labels.npy'))
self.y_test = np.load(os.path.join(synth_path, 'labels', 'ptbxl_test_labels.npy'))
self.X_train = np.load(os.path.join(synth_path, 'data', 'ptbxl_train_data.npy'))
self.X_val = np.load(os.path.join(synth_path, 'data', 'ptbxl_validation_data.npy'))
self.X_test = np.load(os.path.join(synth_path, 'data', 'ptbxl_test_data.npy'))

# the benchmark expects (samples, timesteps, channels), so transpose from (samples, channels, timesteps)
self.X_train = np.transpose(self.X_train, (0, 2, 1))
self.X_val = np.transpose(self.X_val, (0, 2, 1))
self.X_test = np.transpose(self.X_test, (0, 2, 1))

self.input_shape = self.X_train[0].shape

Next I ran code.reproduce_results.py only for xresnet1d50 and the ptbxl_all task, as in the paper:

def main():
    
    datafolder = '../data/ptbxl/'
    datafolder_icbeb = '../data/ICBEB/'
    outputfolder = '../output/'

    models = [
        conf_fastai_xresnet1d50,
        # conf_fastai_resnet1d_wang,
        # conf_fastai_lstm,
        # conf_fastai_lstm_bidir,
        # conf_fastai_fcn_wang,
        # conf_fastai_inception1d,
        # conf_wavelet_standard_nn,
        ]

    ##########################################
    # STANDARD SCP EXPERIMENTS ON PTBXL
    ##########################################

    experiments = [
        ('exp0', 'all'),
        # ('exp1', 'diagnostic'),
        # ('exp1.1', 'subdiagnostic'),
        # ('exp1.1.1', 'superdiagnostic'),
        # ('exp2', 'form'),
        # ('exp3', 'rhythm')
       ]

    for name, task in experiments:
        e = SCP_Experiment(name, task, datafolder, outputfolder, models)
        e.prepare()
        e.perform()
        e.evaluate()

    # generate summary table
    utils.generate_ptbxl_summary_table()

When testing on synthetic data I got good results (from te_results.csv in the outputs directory):

,macro_auc
point,0.988766041489155
mean,0.988766041489155
lower,0.988766041489155
upper,0.988766041489155

But when I run the trained model on the real test data, I get these metrics:

,macro_auc
point,0.5860400175807828
mean,0.5860400175807828
lower,0.5860400175807828
upper,0.5860400175807828

which are significantly lower than those stated in the article and differ little from the baselines.
I also include code for inference of the trained model on real data for complete reproducibility of the experiment.

from utils import utils
# model configs
from configs.fastai_configs import *
from configs.wavelet_configs import *
from models.fastai_model import fastai_model
import numpy as np
import multiprocessing
from itertools import repeat
import pandas as pd


def main():
    # Prepare data
    datafolder = '../data/ptbxl/'
    outputfolder = '../output/'
    experiment_name = 'exp0'
    sf = 100
    task = 'all'
    test_fold = 10
    n_jobs=20

    data, raw_labels = utils.load_dataset(datafolder, sf, name='ptbxl')
    labels = utils.compute_label_aggregations(raw_labels, datafolder, task)

    data, labels, Y, _ = utils.select_data(data, labels, task, min_samples=0, outputfolder='')
    input_shape = data[0].shape
    
    # 10th fold for testing (9th for now)
    X_test = data[labels.strat_fold == test_fold]
    y_test = Y[labels.strat_fold == test_fold]
    n_classes = y_test.shape[1]

    # Load model
    config = conf_fastai_xresnet1d50
    modelname = config['modelname']
    modeltype = config['modeltype']
    modelparams = config['parameters']
    mpath = outputfolder+experiment_name+'/models/'+modelname+'/'
    model = fastai_model(modelname, n_classes, sf, mpath, input_shape, **modelparams)

    # Predict
    y_test_pred = model.predict(X_test)

    # Get metrics
    test_samples = np.array([range(len(y_test))])
    rpath = mpath+'results/'
    thresholds = None
    pool = multiprocessing.Pool(n_jobs)

    te_df = pd.concat(pool.starmap(utils.generate_results, zip(test_samples, repeat(y_test), repeat(y_test_pred), repeat(thresholds))))
    te_df_point = utils.generate_results(range(len(y_test)), y_test, y_test_pred, thresholds)
    te_df_result = pd.DataFrame(
        np.array([
            te_df_point.mean().values, 
            te_df.mean().values,
            te_df.quantile(0.05).values,
            te_df.quantile(0.95).values]), 
        columns=te_df.columns, 
        index=['point', 'mean', 'lower', 'upper'])
    
    pool.close()

    te_df_result.to_csv(rpath+'real_te_results.csv')


if __name__ == '__main__':
    main()

XResNet50

Hello, can you provide the source code for ECG classification using XResNet50?
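For reference, the classifier appears to come from the PTB-XL benchmark repository used in the issue above (https://github.com/helme/ecg_ptbxl_benchmarking); a minimal sketch of instantiating it there, with placeholder paths and shapes:

# Sketch against the helme/ecg_ptbxl_benchmarking code base; paths and shapes are placeholders.
from configs.fastai_configs import conf_fastai_xresnet1d50
from models.fastai_model import fastai_model

config = conf_fastai_xresnet1d50
n_classes, sampling_freq, input_shape = 71, 100, (1000, 12)   # placeholders; set n_classes to your label dimensionality
model = fastai_model(config['modelname'], n_classes, sampling_freq,
                     '../output/exp0/models/' + config['modelname'] + '/',
                     input_shape, **config['parameters'])
# train/predict as in the benchmark's experiment scripts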

How to visualize the generated sample

Hello, I ran the inference.py script and successfully generated samples with the "generate_four_leads()" function. The shape of the samples is (400, 12, 1000) and the shape of the labels is (400, 44). Are these shapes correct? I am struggling a little to visualize the output. Could you please give me some advice on how to use matplotlib to visualize the generated samples? Thank you!
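A minimal matplotlib sketch for plotting one generated sample, assuming the layout described above, i.e. (n_samples, 12 leads, 1000 time steps) at 100 Hz; the lead names below are the standard 12-lead order and are only an assumption about the channel ordering:

import numpy as np
import matplotlib.pyplot as plt

samples = np.load('generated_samples.npy')   # placeholder path; expected shape (400, 12, 1000)
leads = ['I', 'II', 'III', 'aVR', 'aVL', 'aVF', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6']

sample = samples[0]                          # one ECG: (12, 1000)
t = np.arange(sample.shape[1]) / 100.0       # time axis in seconds, assuming 100 Hz sampling

fig, axes = plt.subplots(12, 1, figsize=(10, 14), sharex=True)
for ax, lead, signal in zip(axes, leads, sample):
    ax.plot(t, signal, linewidth=0.8)
    ax.set_ylabel(lead, rotation=0, labelpad=20)
axes[-1].set_xlabel('time (s)')
plt.tight_layout()
plt.show()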
