antonior92 / automatic-ecg-diagnosis

Scripts and modules for training and testing neural network for ECG automatic classification. Companion code to the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network".

Home Page: https://www.nature.com/articles/s41467-020-15432-4

License: MIT License

Python 100.00%

deep-learning convolutional-neural-networks ecg ecg-signal ecg-classification atrial-fibrillation atrial-fibrillation-detection

automatic-ecg-diagnosis's Issues

Got an error in running load model

Hello Antonio,

I downloaded the model, model.hdf5, and
the test data, ecg_tracings.hdf5.

Here are my Python library versions:

Package              Version
-------------------- ------------
absl-py              0.9.0
astor                0.8.1
cycler               0.10.0
gast                 0.2.2
google-pasta         0.2.0
grpcio               1.28.1
h5py                 2.10.0
joblib               0.14.1
Keras                2.1.6
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.0
kiwisolver           1.1.0
Markdown             3.2.1
matplotlib           3.0.3
numpy                1.18.3
opt-einsum           3.2.1
pandas               0.24.2
pip                  20.1
pkg-resources        0.0.0
protobuf             3.11.3
pyparsing            2.4.7
python-dateutil      2.8.1
pytz                 2020.1
PyYAML               5.3.1
scikit-learn         0.22.2.post1
scipy                1.4.1
seaborn              0.9.1
setuptools           46.1.3
six                  1.14.0
tensorboard          1.15.0
tensorflow           1.15.2
tensorflow-estimator 1.15.1
termcolor            1.1.0
tqdm                 4.45.0
Werkzeug             1.0.1
wheel                0.34.2
wrapt                1.12.1
xarray               0.12.3
xmljson              0.2.1

When I run predict.py

python3  predict.py --tracings ecg_tracings.hdf5 --model model.hdf5 --output_file output

It returns the following error:

Traceback (most recent call last):
  File "predict.py", line 32, in <module>
    model = load_model(args.model, compile=False)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/models.py", line 270, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/models.py", line 347, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 144, in deserialize_keras_object
    list(custom_objects.items())))
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/engine/topology.py", line 2525, in from_config
    process_layer(layer_data)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/engine/topology.py", line 2511, in process_layer
    custom_objects=custom_objects)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 146, in deserialize_keras_object
    return cls.from_config(config['config'])
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/engine/topology.py", line 1271, in from_config
    return cls(**config)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/media/cvc/e82f78f6-bf9e-4253-912b-799fdd6c7d15/automatic-ecg-diagnosis/auto-ecg/lib/python3.5/site-packages/keras/layers/convolutional.py", line 337, in __init__
    **kwargs)
TypeError: __init__() got multiple values for keyword argument 'data_format'

Have you run into a similar issue, or do you have any ideas?

Best wishes

Unable to predict

python predict.py model/model.hdf5 --ouput_file outputs/

Traceback (most recent call last):
  File "predict.py", line 28, in <module>
    seq = ECGSequence(args.path_to_hdf5, args.dataset_name, batch_size=args.bs)
  File "/media/raghav/Win11/ECG/datasets.py", line 25, in __init__
    self.x = self.f[hdf5_dset]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/raghav/.conda/envs/ecg/lib/python3.8/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'tracings' doesn't exist)"
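
The KeyError says that the file that was opened has no dataset named `tracings`. In the command above the only path given is model/model.hdf5, so one possible reading (a guess, not confirmed here) is that the model file was opened in place of the tracings file. A minimal sketch for listing what an HDF5 file actually contains with h5py; the demo file name and its contents are made up for illustration:

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset name: shape} for every dataset in an HDF5 file."""
    found = {}

    def visitor(name, obj):
        # visititems walks groups and datasets; keep only datasets.
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return found


# Demo on a throwaway file (hypothetical name and contents).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("tracings", data=np.zeros((2, 4096, 12), dtype=np.float32))

datasets = list_datasets(demo_path)
print(datasets)
```

Running `list_datasets` on the file passed to predict.py shows whether a `tracings` dataset is present, or whether a different dataset name has to be supplied.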

Does the order of `exam_id` correspond to the order of `tracings` in the full dataset?

Hi,
I want to use the full dataset for training, but I only need the data from the AF category.

So I want to use the following code to extract the AF data and convert it to .mat format. However, I'm not sure whether the order of `exam_id` matches the order of `tracings`.
Here is my code:

import h5py
import numpy as np
import pandas as pd

path_to_file = '../code15/exams_part0.hdf5'
df = pd.read_csv('../code15/exams.csv')
with h5py.File(path_to_file, 'r') as f:
    traces_ids = np.array(f['exam_id'])    # (20001,)
    traces = np.array(f['tracings'])   # (20001,4096,12)
for i, (trace_id, trace) in enumerate(zip(traces_ids, traces)):  # I'm not sure whether `exam_id` and `tracings` order correspond
    print(trace_id, trace)
    # use the trace_id to get the label (one row expected per exam)
    res = df[df['exam_id'] == trace_id]
    # compare a single value, not a whole Series; works for bool or 'TRUE' strings
    if not res.empty and str(res['AF'].iloc[0]).upper() == 'TRUE':
        pass  # save af file to mat file
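
If `exam_id[i]` does label `tracings[i]` (which is exactly the open question, so it is only assumed here), the per-row DataFrame lookup can be replaced by a single boolean mask. A minimal sketch on synthetic arrays; `select_exams` and `af_ids` are hypothetical names, not part of the repository:

```python
import numpy as np


def select_exams(exam_ids, tracings, wanted_ids):
    """Keep the rows whose exam_id is in wanted_ids.

    Assumes exam_ids[i] labels tracings[i] (the assumption in question).
    """
    exam_ids = np.asarray(exam_ids)
    mask = np.isin(exam_ids, wanted_ids)
    return exam_ids[mask], tracings[mask]


# Synthetic stand-ins for f['exam_id'] and f['tracings'].
exam_ids = np.array([11, 42, 7, 99])
tracings = np.arange(4 * 3 * 2, dtype=np.float32).reshape(4, 3, 2)
af_ids = [7, 42]  # hypothetical ids of exams labelled AF in the csv

kept_ids, kept_traces = select_exams(exam_ids, tracings, af_ids)
print(kept_ids, kept_traces.shape)
```

The mask keeps the file order, so the selected ids and traces stay paired row by row.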

Contradicting input description in Readme

Hi,

when trying to use your model on another dataset, I got confused with this sentence in the main Readme (and Zenodo descriptions of both test data and model):

"All signal are represented as 32 bits floating point numbers at the scale 1e-4V: so if the signal is in V it should be multiplied by 1000 before feeding it to the neural network model."

Did you mean 1e-3V? 1e-4 would be a factor of 10,000, but 1000 seemed to work on my data.
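
The arithmetic behind the question can be checked directly. This sketch only shows what each reading of the sentence implies for a signal given in volts; it does not settle which unit the README intends. `volts_to_unit` is a hypothetical helper:

```python
import numpy as np


def volts_to_unit(signal_volts, unit_volts):
    """Express a signal given in volts as multiples of unit_volts."""
    return np.asarray(signal_volts, dtype=np.float32) / np.float32(unit_volts)


one_millivolt = [0.001]  # a 1 mV deflection, written in volts
as_1e4 = volts_to_unit(one_millivolt, 1e-4)  # same as multiplying by 10000
as_1e3 = volts_to_unit(one_millivolt, 1e-3)  # same as multiplying by 1000
print(as_1e4, as_1e3)
```

A unit of 1e-4 V corresponds to a factor of 10,000 from volts and a unit of 1e-3 V to a factor of 1000, so the two halves of the quoted sentence cannot both hold for a signal in volts.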

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

Hello!
I am using the dataset mentioned in the article: CODE-15%: a large scale annotated dataset of 12-lead ECGs.
But I really can't solve this problem. The following is the error message:

  File "D:\NutnetDisk\CODE\PYCHARMProject\automatic-ecg-diagnosis-master\train.py", line 53, in <module>
    verbose=1)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\training.py", line 815, in fit
    model=self)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1112, in __init__
    model=model)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 908, in __init__
    **kwargs)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 775, in __init__
    peek = _process_tensorlike(peek)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1013, in _process_tensorlike
    inputs = nest.map_structure(_convert_numpy_and_scipy, inputs)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\util\nest.py", line 617, in map_structure
    structure[0], [func(*x) for x in entries],
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\util\nest.py", line 617, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1008, in _convert_numpy_and_scipy
    return ops.convert_to_tensor(x, dtype=dtype)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\framework\ops.py", line 1341, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\framework\tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\framework\constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\framework\constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "C:\Users\ty.conda\envs\mul-auto-ecg\lib\site-packages\tensorflow\python\framework\constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
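
This message is typically raised when the array handed to TensorFlow has `dtype=object` (for example, labels read from a CSV and kept as Python objects). Whether that is what happens in this training run is an assumption; the sketch below only shows how to spot and cast such an array. `to_float32` is a hypothetical helper:

```python
import numpy as np


def to_float32(array_like):
    """Cast to a dense float32 array; raises if rows are ragged or non-numeric."""
    return np.asarray(array_like, dtype=np.float32)


# An object-dtype array of Python ints, one way this error can arise.
labels = np.array([[0, 1], [1, 0]], dtype=object)
fixed = to_float32(labels)
print(labels.dtype, "->", fixed.dtype)
```

Printing the `dtype` of the inputs and labels before calling `fit` is a quick way to see whether a cast like this is needed.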

Problem with the signal amplitude.

I am confused by this sentence: "All signal are represented as 32 bits floating point numbers at the scale 1e-4V: so if the signal is in V it should be multiplied by 1000 before feeding it to the neural network model."
So is the given ECG stored in units of 0.1V because it has already been multiplied by 1000?

Question about the testing dataset

Hello, I am a little confused about the test dataset. I checked its contents and found that some data does not belong to any category, such as the data in the 6th row.
On the other hand, there are some data points that belong to multiple categories, like the data in the 17th row.
I'm wondering why the dataset contains data that is not single-labeled, like the data in the 3rd row.
What impact would it have if we use the data from the 6th and 17th rows for training?
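
Rows with no label and rows with several labels are what a multi-label (multi-hot) annotation looks like: each column is a separate yes/no answer. The loss named in a traceback elsewhere on this page, binary_crossentropy, is consistent with that reading, though that is an inference rather than a statement from the authors. A small sketch on synthetic annotations showing how to find both kinds of rows:

```python
import numpy as np

# Synthetic multi-hot annotations: one row per exam, one column per class.
y = np.array([
    [0, 0, 0, 0, 0, 0],  # no listed abnormality
    [0, 1, 0, 0, 0, 0],  # exactly one
    [1, 0, 0, 0, 1, 0],  # two at once
])

labels_per_row = y.sum(axis=1)
none_rows = np.flatnonzero(labels_per_row == 0)
multi_rows = np.flatnonzero(labels_per_row > 1)
print(none_rows, multi_rows)
```

With independent per-class outputs, an all-zero row is simply a negative example for every class, and a row with two ones is a positive example for two classes.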

Data preprocessing

Thank you for making the test data freely available for download. In reading your paper I couldn't find any reference to ECG data preprocessing. Is the released test data normalized? From a quick look, it appears to be standardized. If so, was it standardized over the complete training set, or per lead over the training set?
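
The two options in the question differ only in which axes the mean and standard deviation are taken over. The sketch below, on synthetic data, shows how per-lead statistics can be inspected to tell them apart; it says nothing about what was actually done to the released data. `lead_stats` is a hypothetical helper:

```python
import numpy as np


def lead_stats(x):
    """Mean and std per lead for an array shaped (exams, samples, leads)."""
    return x.mean(axis=(0, 1)), x.std(axis=(0, 1))


rng = np.random.default_rng(0)
# Synthetic data: 12 leads with different offsets and spreads.
x = rng.normal(loc=np.arange(12), scale=np.arange(1, 13), size=(8, 4096, 12))

mean, std = lead_stats(x)
x_per_lead = (x - mean) / std          # each lead standardized separately
x_global = (x - x.mean()) / x.std()    # one mean/std for everything

print(np.round(lead_stats(x_per_lead)[1], 3))
print(np.round(lead_stats(x_global)[1], 3))
```

After per-lead standardization every lead has unit standard deviation, whereas after global standardization the leads keep their relative spreads.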

Results from model not matching table

Hello All,

I really enjoyed your paper and wanted to see how I can reproduce the results from Table 2. I downloaded the model and test dataset. When I predict the classes using model.predict and use a threshold of 0.5 I get the following breakdown of precision, recall and f1-scores. What were the thresholds used to get the results in Table 2? Am I missing any steps taken?

Thank you!


classification report

              precision    recall  f1-score   support

           0       1.00      0.25      0.40        28
           1       0.94      0.85      0.89        34
           2       1.00      0.83      0.91        30
           3       0.86      0.75      0.80        16
           4       1.00      0.69      0.82        13
           5       0.93      0.76      0.84        37

   micro avg       0.95      0.70      0.80       158
   macro avg       0.95      0.69      0.78       158
weighted avg       0.96      0.70      0.78       158
 samples avg       0.13      0.12      0.13       158
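
In the report above recall is lower than precision for every class, which is what a cut-off that is too strict would produce. Another issue on this page points at a line in generate_figures_and_tables.py where threshold values are chosen, which suggests per-class thresholds rather than a flat 0.5; the values used are not given here. A sketch of the difference, with made-up scores and made-up thresholds:

```python
import numpy as np


def binarize(probs, thresholds):
    """Apply one threshold per class (broadcast across rows)."""
    return (np.asarray(probs) >= np.asarray(thresholds)).astype(int)


# Synthetic scores for 3 exams and 2 classes.
probs = np.array([[0.30, 0.90],
                  [0.10, 0.40],
                  [0.60, 0.45]])

fixed = binarize(probs, 0.5)                # same cut-off for every class
per_class = binarize(probs, [0.25, 0.45])   # hypothetical per-class cut-offs
print(fixed)
print(per_class)
```

Lowering a class's threshold turns borderline scores into positives, which raises recall at the cost of precision.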

Request for the full ECG dataset

Hi,
I used my Monash student email to request the full CODE dataset: [doi: 10.17044/scilifelab.15169716](https://doi.org/10.17044/scilifelab.15169716).

However, I ran into a problem when I logged in to the site to request the dataset:

Could you share the dataset some other way, such as Google Drive or Baidu Drive? Thank you!

Error about modifying the input shape

The model requires input data with shape (N, 4096, 12), but my training set is sampled at 500 Hz with a recording duration of 10 seconds, so I would like to change the input shape to (N, 5000, 12).

I updated the code in model.py as follows:

# signal = Input(shape=(4096, 12), dtype=np.float32, name='signal')

signal = Input(shape=(5000, 12), dtype=np.float32, name='signal')

but it caused some problems. The error output is as follows:

2021-12-09 14:51:34.523544: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-12-09 14:51:34.523562: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-09 14:51:34.523576: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (nmslab-004118.nms.kcl.ac.uk): /proc/driver/nvidia/version does not exist
2021-12-09 14:51:34.523874: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-12-09 14:51:34.663666: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3600000000 Hz
2021-12-09 14:51:34.664192: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7efcb4000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-12-09 14:51:34.664222: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-12-09 14:51:35.462783: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
Epoch 1/70
2021-12-09 14:51:43.805704: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
 1/25 [>.............................] - ETA: 0s - loss: 0.80672021-12-09 14:51:48.490054: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: ./logs/train/plugins/profile/2021_12_09_14_51_48
2021-12-09 14:51:48.504534: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to ./logs/train/plugins/profile/2021_12_09_14_51_48/nmslab-004118.nms.kcl.ac.uk.trace.json.gz
2021-12-09 14:51:48.544836: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0.323 ms

2021-12-09 14:51:48.572273: I tensorflow/python/profiler/internal/profiler_wrapper.cc:87] Creating directory: ./logs/train/plugins/profile/2021_12_09_14_51_48Dumped tool data for overview_page.pb to ./logs/train/plugins/profile/2021_12_09_14_51_48/nmslab-004118.nms.kcl.ac.uk.overview_page.pb
Dumped tool data for input_pipeline.pb to ./logs/train/plugins/profile/2021_12_09_14_51_48/nmslab-004118.nms.kcl.ac.uk.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to ./logs/train/plugins/profile/2021_12_09_14_51_48/nmslab-004118.nms.kcl.ac.uk.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to ./logs/train/plugins/profile/2021_12_09_14_51_48/nmslab-004118.nms.kcl.ac.uk.kernel_stats.pb
25/25 [==============================] - ETA: 0s - loss: 0.8029Traceback (most recent call last):
  File "train.py", line 60, in <module>
    history = model.fit(train_seq,
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 862, in fit
    val_logs = self.evaluate(
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1081, in evaluate
    tmp_logs = test_function(iterator)
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 650, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/k20020475/PycharmProjects/auto_ecg_dnn/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [31,1] vs. [32,1]
         [[node binary_crossentropy/logistic_loss/mul (defined at train.py:60) ]] [Op:__inference_test_function_5585]

Function call stack:
test_function

How can I solve this problem?
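
This does not diagnose the shape error above. It sketches an alternative that leaves the model's (N, 4096, 12) input untouched: resample each trace and zero-pad it to 4096 samples. The 400 Hz target rate is an assumption (this page does not state the model's sampling rate), linear interpolation stands in for a proper anti-aliased resampler, and `resample_and_pad` is a hypothetical helper:

```python
import numpy as np


def resample_and_pad(trace, fs_in=500.0, fs_out=400.0, target_len=4096):
    """Linearly resample a (samples, leads) trace and zero-pad to target_len."""
    n_in, n_leads = trace.shape
    n_out = int(round(n_in * fs_out / fs_in))
    t_in = np.arange(n_in) / fs_in
    t_out = np.arange(n_out) / fs_out
    resampled = np.stack(
        [np.interp(t_out, t_in, trace[:, k]) for k in range(n_leads)], axis=1
    )
    if n_out > target_len:
        raise ValueError("resampled trace is longer than target_len")
    pad_total = target_len - n_out
    pad_before = pad_total // 2
    # np.pad defaults to constant zeros; pad only along the time axis.
    padded = np.pad(resampled, ((pad_before, pad_total - pad_before), (0, 0)))
    return padded.astype(np.float32)


trace = np.random.default_rng(0).normal(size=(5000, 12))  # 10 s at 500 Hz
out = resample_and_pad(trace)
print(out.shape)
```

Under these assumptions a 10-second recording becomes 4000 samples, padded with 48 zeros on each side.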

How to convert my csv to hdf5

What conversion technique is used to convert CSV to HDF5 format? The conventional approach via a pandas DataFrame fails in several ways.
Could you share a snippet that converts raw values in CSV format to HDF5 format?

Suspicious metrics evaluation

Hi there!

I'm looking at your model evaluation code and have found a suspicious line of code:
https://github.com/antonior92/automatic-ecg-diagnosis/blob/master/generate_figures_and_tables.py#L120

Here you choose the optimal threshold values using the true labels. Is it fair to choose hyperparameters each time you evaluate your model? If you used constant threshold values for every evaluation, your final metrics would probably change.
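
The alternative the question hints at is to fix the threshold on one split and reuse it unchanged on another. A minimal sketch for a single class with synthetic scores; it is not taken from the repository, and `f1` and `pick_threshold` are hypothetical helpers:

```python
import numpy as np


def f1(y_true, y_pred):
    """F1 score for binary vectors; 0.0 when there are no true positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))


def pick_threshold(y_true, scores):
    """Return the candidate threshold with the best F1 on the given split."""
    candidates = np.unique(scores)
    return float(max(candidates, key=lambda t: f1(y_true, (scores >= t).astype(int))))


# Synthetic validation and test splits for one class.
y_val, s_val = np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8])
y_test, s_test = np.array([0, 1, 1]), np.array([0.2, 0.5, 0.3])

thr = pick_threshold(y_val, s_val)            # tuned without seeing test labels
test_f1 = f1(y_test, (s_test >= thr).astype(int))
print(thr, test_f1)
```

Because the threshold never sees the test labels, the test F1 is not inflated by the tuning step.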

Why use a convolution kernel of size 16 in this paper?

Hi, this is a very interesting paper documenting the ability of a deep neural network (DNN) to identify abnormalities in standard 12-lead ECG recordings. I have some questions after reading the paper.

Q1: The convolution kernel of size 3 is widely used in the field of CV. Why use a convolution kernel of size 16 in this paper?

Q2: Other sizes of convolution kernels are listed in the paper. Why are the results not attached?

"The convolutional layers have filter length 16, starting with 4096 samples and 64 filters for the first layer and residual block and increasing the number of filters by 64 every second residual block and subsampling by a factor of 4 every residual block. Max Pooling51 and convolutional layers with filter length 1 (1x1 Conv) are included in the skip connections to make the dimensions match those from the signals in the main branch."

"The hyperparameters were chosen among the following options: residual neural networks with {2, 4, 8, 16} residual blocks, kernel size {8, 16, 32}, batch size {16, 32, 64}, initial learning rate {0.01, 0.001, 0.0001}, optimization algorithms {SGD, ADAM}, activation functions {ReLU, ELU}, dropout rate {0, 0.5, 0.8}, number of epochs without improvement in plateaus between 5 and 10, that would result in a reduction in the learning rate between 0.1 and 0.5. "

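The numbers in the quoted passages can be turned into simple arithmetic. This sketch works out the sequence length after each residual block when every block subsamples by 4, and how much signal time one kernel covers. The choice of 4 blocks is just one of the listed options and the 400 Hz sampling rate is an assumption; neither is confirmed by the quoted text:

```python
def lengths_after_blocks(n_samples=4096, n_blocks=4, factor=4):
    """Sequence length after each residual block when each one subsamples by factor."""
    lengths = []
    for _ in range(n_blocks):
        n_samples //= factor
        lengths.append(n_samples)
    return lengths


def kernel_span_ms(kernel_size, fs_hz):
    """Time covered by one kernel at the input sampling rate, in milliseconds."""
    return 1000.0 * kernel_size / fs_hz


block_lengths = lengths_after_blocks()
span_16 = kernel_span_ms(16, 400.0)  # 400 Hz is an assumed sampling rate
span_3 = kernel_span_ms(3, 400.0)
print(block_lengths, span_16, span_3)
```

Under these assumptions a length-16 kernel spans 40 ms of signal at the input layer, against 7.5 ms for a length-3 kernel.
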
h5py Error when converting CSV data to hdf5

I have 12-lead ECG data recorded in CSV format. I converted this CSV to HDF5 as below:

df = pandas.read_csv(file, delimiter=',', usecols=[0,1,2,3,4,5,6,7,8,9,10,11], skiprows=1,header=None)
df.to_hdf('ecg.hdf5', key='tracings', mode='w')

When I pass the converted HDF5 file to predict.py, I get this error:

  File "/home/pi/.local/lib/python3.7/site-packages/h5py/_hl/base.py", line 137, in _e
    name = name.encode('ascii')
AttributeError: 'tuple' object has no attribute 'encode'

Am I going wrong anywhere?
It works fine when I run sampleData.hdf5, but I can't get my CSV data through.
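
A likely explanation, offered as an assumption: `DataFrame.to_hdf` writes a PyTables layout (a group holding several nodes) under the key, not a single array dataset, so code that expects a plain `tracings` array finds something else. A sketch that writes the array directly with h5py instead, using the (N, 4096, 12) shape mentioned in another issue on this page. `write_tracings` and the output name are hypothetical, and lead order, sampling rate and amplitude scale are not handled:

```python
import os
import tempfile

import h5py
import numpy as np


def write_tracings(path, traces, target_len=4096):
    """Zero-pad (n_exams, n_samples, 12) traces and store them as 'tracings'."""
    traces = np.asarray(traces, dtype=np.float32)
    n_exams, n_samples, n_leads = traces.shape
    if n_samples > target_len:
        raise ValueError("trace longer than target_len; crop or resample first")
    padded = np.zeros((n_exams, target_len, n_leads), dtype=np.float32)
    padded[:, :n_samples, :] = traces  # zeros are appended at the end here
    with h5py.File(path, "w") as f:
        f.create_dataset("tracings", data=padded)


# Stand-in for one CSV recording: rows are samples, columns are the 12 leads.
# With a real file this could come from np.loadtxt(csv_path, delimiter=",", skiprows=1).
one_exam = np.ones((4000, 12), dtype=np.float32)

out_path = os.path.join(tempfile.mkdtemp(), "ecg.hdf5")  # hypothetical output name
write_tracings(out_path, one_exam[np.newaxis])

with h5py.File(out_path, "r") as f:
    stored_shape = f["tracings"].shape
print(stored_shape)
```

Several recordings can be stacked along the first axis before calling `write_tracings`, so that each row of `tracings` is one exam.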

XML and restingecg input

Hi, I have XML and restingecg files. I don't know how to use them with this repo. Could you help me?

Question about training

Hi, I'm trying to run training.py on my own, but I get this error.
I thought the problem might be the type of train_seq, but I can't fix it. Has anyone had the same problem?
Thank you!

Database format

Dear Antonio,

First, I hope that you're doing well in these difficult times.

I have a dataset with functional data (float type), and I would like to know whether it's possible to run the training code (train.py) with that type of data. I was looking at your dataset and saw that it's binarized.

I assume your data did not originally have the binary form it has now. So I would like to know how you transformed your data to binary, and what the original type/form of your data was.

If you prefer to maintain the privacy of your dataset, you can send me an email with your answer at [email protected].

I appreciate your help for academic purposes. Best Regards.

Ignacio Pizarro G.