ratschlab / dpsom

Code associated with ACM-CHIL 21 paper 'T-DPSOM - An Interpretable Clustering Method for Unsupervised Learning of Patient Health States'

Home Page: https://arxiv.org/abs/1910.01590

License: MIT License

Python 54.01% Jupyter Notebook 45.99%

dpsom's Introduction

T-DPSOM - An Interpretable Clustering Method for Unsupervised Learning of Patient Health States

Reference

Laura Manduchi, Matthias Hüser, Martin Faltys, Julia Vogt, Gunnar Rätsch, and Vincent Fortuin. 2021. T-DPSOM - An Interpretable Clustering Method for Unsupervised Learning of Patient Health States. In ACM Conference on Health, Inference, and Learning (ACM CHIL ’21), April 8–10, 2021, Virtual Event, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3450439.3451872

Training and Evaluation

Deep Probabilistic SOM

The training script for the DPSOM model is dpsom/DPSOM.py; the model itself is defined in dpsom/DPSOM_model.py. To train and test the DPSOM model on the MNIST dataset using default parameters and feed-forward layers:

python DPSOM.py

This trains the model and then outputs the clustering performance on the test set.

To use convolutional layers:

python DPSOM.py with convolution=True

Other possible configurations:

validation: if True, the model is evaluated on the validation set (default False).
val_epochs: if True, the clustering results are saved every 10 epochs to the default output files (default False).
more_runs: if True, the model is run 10 times and the means of NMI and purity are output with their standard errors (default False).
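
For readers unfamiliar with the two clustering metrics named above, the following is a minimal standard-library sketch of purity, NMI, and a mean with standard error over several runs. It is not the repository's evaluation code: the function names are hypothetical, and normalizing the mutual information by the arithmetic mean of the two entropies is an assumption (other normalizations exist).

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of samples belonging to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean over repeated runs."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)
```

With more_runs=True, one would presumably collect one NMI and one purity value per run and pass each list to a helper like mean_and_standard_error.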

To train and test the DPSOM model on the Fashion MNIST dataset using default parameters and feed-forward layers:

python DPSOM.py with data_set="fMNIST" beta=0.4

To use convolutional layers:

python DPSOM.py with data_set="fMNIST" beta=0.4 convolution=True

To investigate the role of the weight of the SOM loss, use:

python DPSOM.py with beta=<new_value>

The default is beta=0.25.
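
To compare several beta values, the command lines can be generated programmatically. The sketch below only builds the command strings in the form shown above; the helper name sweep_commands is hypothetical, and whether you run them sequentially or on a cluster is up to you.

```python
def sweep_commands(betas, data_set=None, convolution=False):
    """Build one 'python DPSOM.py with ...' command line per beta value."""
    commands = []
    for beta in betas:
        args = ["python", "DPSOM.py", "with"]
        if data_set is not None:
            # Quoted in the same way as the Fashion MNIST example above.
            args.append(f'data_set="{data_set}"')
        args.append(f"beta={beta}")
        if convolution:
            args.append("convolution=True")
        commands.append(" ".join(args))
    return commands


if __name__ == "__main__":
    for command in sweep_commands([0.1, 0.25, 0.4]):
        print(command)
```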

To reconstruct the centroids of the learned 2D SOM grid in the input space, see the notebook notebooks/centroids_rec.ipynb.

Temporal DPSOM

eICU preprocessing pipeline

The major preprocessing steps, which have to be performed sequentially, starting from the raw eICU tables in CSV format, are listed below. The scripts expect the tables to be stored in data/csv. Intermediate data is stored in various sub-folders of data.

(a) Conversion of raw CSV tables, which can be downloaded from https://eicu-crd.mit.edu/ after access is granted, to HDF versions of the tables. (eicu_preproc/hdf_convert.py)

(b) Filtering of ICU stays based on inclusion criteria. (eicu_preproc/save_all_pids.py, eicu_preproc/filter_patients.py)

(d) Selection of variables to include in the multi-variate time series, from the vital signs and lab measurement tables. (eicu_preproc/filter_variables.py)

(e) Conversion of the eICU data to a regular time grid format using forward filling imputation, which can be processed by VarTPSOM. (eicu_preproc/timegrid_all_patients.py, eicu_preproc/timegrid_one_batch.py)

(f) Labeling of the time points in the time series with the current/future worse physiology scores as well as dynamic mortality, which are used in the enrichment analyses and data visualizations. (eicu_preproc/label_all_patients.py, eicu_preproc/label_one_batch.py)
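
As an illustration of step (e), the sketch below places irregularly timed observations of a single variable onto a regular time grid using forward filling. It is not the code in eicu_preproc/timegrid_one_batch.py; the function name, the integer time unit, and the choice to leave grid points before the first observation as None are all assumptions.

```python
def forward_fill_to_grid(observations, grid_start, grid_end, step):
    """Map (time, value) observations onto a regular grid with forward filling.

    Each grid point takes the most recent observation at or before it.
    Grid points before the first observation stay None (assumed behavior).
    """
    obs = sorted(observations)
    grid = []
    last_value = None
    i = 0
    t = grid_start
    while t <= grid_end:
        # Consume every observation that happened up to this grid point.
        while i < len(obs) and obs[i][0] <= t:
            last_value = obs[i][1]
            i += 1
        grid.append((t, last_value))
        t += step
    return grid


if __name__ == "__main__":
    # Hypothetical heart-rate measurements at minutes 3 and 12.
    print(forward_fill_to_grid([(3, 80.0), (12, 85.0)], 0, 20, 5))
```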

Saving the eICU data-set

Insert the paths of the obtained preprocessed data into the script eicu_preproc/save_model_inputs.py and run it.

The script selects the last 72 time steps of each time series and the following labels:

'full_score_1', 'full_score_6', 'full_score_12', 'full_score_24', 'hospital_discharge_expired_1', 'hospital_discharge_expired_6', 'hospital_discharge_expired_12', 'hospital_discharge_expired_24', 'unit_discharge_expired_1', 'unit_discharge_expired_6', 'unit_discharge_expired_12', 'unit_discharge_expired_24'

It then saves the dataset as a CSV table in data/eICU_data.csv.
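
The selection of the last 72 time steps per stay can be pictured with the following sketch. It is not taken from eicu_preproc/save_model_inputs.py: the row layout, the key names patientunitstayid and ts, and the handling of stays shorter than 72 steps (kept as they are here) are assumptions for illustration.

```python
from collections import defaultdict


def last_n_steps(rows, n=72, id_key="patientunitstayid", time_key="ts"):
    """Group rows by stay and keep the final n time steps of each stay."""
    by_stay = defaultdict(list)
    for row in rows:
        by_stay[row[id_key]].append(row)
    result = {}
    for stay_id, stay_rows in by_stay.items():
        stay_rows.sort(key=lambda r: r[time_key])
        # Stays shorter than n are kept unchanged in this sketch.
        result[stay_id] = stay_rows[-n:]
    return result


if __name__ == "__main__":
    rows = [{"patientunitstayid": "A", "ts": t, "hr": 70 + t % 5} for t in range(100)]
    selected = last_n_steps(rows)
    print(len(selected["A"]), selected["A"][0]["ts"])
```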

Training the model

Once the data is saved in data/eICU_data.csv, the entire model can be trained using:

python TempDPSOM.py

It will output NMI clustering results using APACHE scores as labels and save them in results_eICU.txt.

Better prediction performance can be obtained with:

python TempDPSOM.py with latent_dim=100

It will save the prediction performance in the file results_eICU_pred.txt.

To train the model without prediction, use:

python TempDPSOM.py with eta=0

To train the model without smoothness loss and without prediction, use:

python TempDPSOM.py with eta=0 kappa=0

Other experiments, such as computing heatmaps and trajectories, can be found in notebook/eICU_experiments.ipynb.

dpsom's Issues

data_process, no 'var_quantiles.json'

I'm facing an issue in the file timegrid_one_batch.py:

# Location of some meta-files
parser.add_argument("--selected_pid_list", default="../data/included_pid_stays.txt", help="Specify the lists of PIDs to use")
parser.add_argument("--pid_batch_file", default="../data/patient_batches.pickle", help="Specify the map from PIDs to batches")
parser.add_argument("--selected_lab_vars", default="../data/included_lab_variables.txt", help="Specify the file with the list of lab variables to use")
parser.add_argument("--quantile_dict", default="../data/var_quantiles.json", help="Precomputed data quantiles in the eICU data-set that can be used to remove outliers")

How can I get var_quantiles.json, which is used for removing outliers?
I checked the code and couldn't find how this file is generated.

Memory error with T-DPSOM

Hi,
I'm trying to use T-DPSOM code with another multivariate time series dataset, where each series has a length of 144 and 9 channels.

I have modified the methods "inputs" and "x" in "TempDPSOM_model.py" in the following way:

@lazy_scope
def inputs(self):
   x = tf.placeholder(tf.float32, shape=[None, self.input_size, self.input_channels], name="x")
   return x

@lazy_scope
def x(self):
   x = tf.reshape(self.inputs, [-1, self.input_channels])
   return x

I also changed the initialization of the model to pass the right values for "input_size" and "input_channels".

The code currently generates the following error before training starts:

2022-02-18 13:55:36.786045: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 11 Chunks of size 86400000 totalling 906.37MiB
2022-02-18 13:55:36.786143: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 156301824 totalling 149.06MiB
2022-02-18 13:55:36.786242: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 345600000 totalling 988.77MiB
2022-02-18 13:55:36.786339: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 2.04GiB
2022-02-18 13:55:36.786447: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 2258003456 memory_limit_: 2258003559 available bytes: 103 curr_region_allocation_bytes_: 4516007424
2022-02-18 13:55:36.786613: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats: 
Limit:                      2258003559
InUse:                      2188879360
MaxInUse:                   2188879616
NumAllocs:                         181
MaxAllocSize:                345600000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-02-18 13:55:36.799434: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ******************__*****************************************************************************xxx
2022-02-18 13:55:36.799597: W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory
ERROR - hyperopt - Failed after 0:00:38!
Traceback (most recent calls WITHOUT Sacred internals):
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1377, in _do_call
    return fn(*args)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 660, in main
    results = train_model(model, data_train, data_val, endpoints_total_val, lr_val, prior_val)
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 254, in train_model
    train_step_ae.run(feed_dict=f_dic)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2755, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 5804, in _run_using_default_session
    session.run(operation, feed_dict)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1396, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
      def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
      self.run_commandline()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
      return self.run(
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
      run()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
      self.result = self.main_function(*args)
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
      result = wrapped(*args, **kwargs)
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
      model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
      self.optimize
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
      setattr(self, attribute, function(self))
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
      train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
      def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
      self.run_commandline()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
      return self.run(
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
      run()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
      self.result = self.main_function(*args)
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
      result = wrapped(*args, **kwargs)
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
      model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
      self.optimize
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
      setattr(self, attribute, function(self))
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
      train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
2 root error(s) found.
  (0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'optimize/gradients_2/zeros_10':
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
    def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
    self.run_commandline()
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
    return self.run(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
    run()
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
    model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
    self.optimize
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
    setattr(self, attribute, function(self))
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
    train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 477, in minimize
    grads_and_vars = self.compute_gradients(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 603, in compute_gradients
    grads = gradients.gradients(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 165, in gradients
    return gradients_util._GradientsHelper(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 671, in _GradientsHelper
    out_grads[i] = control_flow_state.ZerosLike(op, i)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 835, in ZerosLike
    return _ZerosLikeV1(op, index)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 801, in _ZerosLikeV1
    return array_ops.zeros(zeros_shape, dtype=val.dtype)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2927, in wrapped
    tensor = fun(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2988, in zeros
    output = fill(shape, constant(zero, dtype=dtype), name=name)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 238, in fill
    result = gen_array_ops.fill(dims, value, name=name)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3508, in fill
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 740, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3776, in _create_op_internal
    ret = Operation(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2175, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)


  0%|          | 0/250 [00:31<?, ?it/s]

How can I solve it?
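
As a first diagnostic, the numbers in the log can be checked by hand. The sketch below computes the size of the float tensor of shape [43200, 2000] named in the OOM message and compares it with the memory_limit_ value in the allocator statistics. Reading 43200 rows as 300 series of length 144 assumes the modified reshape flattens batch and time into one axis; that is an inference from the post, not something the log states.

```python
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for a dense tensor of the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


# Values copied from the log above.
failed_shape = (43200, 2000)
memory_limit = 2258003559  # memory_limit_ reported by the BFC allocator
series_length = 144        # from the description of the dataset

failed_bytes = tensor_bytes(failed_shape)
series_per_batch = failed_shape[0] // series_length  # assumes rows = batch * time

if __name__ == "__main__":
    print(f"one tensor: {failed_bytes} bytes")
    print(f"share of GPU limit: {failed_bytes / memory_limit:.1%}")
    print(f"implied series per batch: {series_per_batch}")
```

The computed size equals the MaxAllocSize of 345600000 in the allocator statistics, so one such tensor takes roughly 15% of the roughly 2.1 GiB available to TensorFlow, and several of them are live at once. Under the assumption above, lowering the batch size shrinks the first dimension proportionally, which is usually the first thing to try on a GPU with this little free memory.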

normalization_values.h5 does not exist

I ran all the pre-processing steps in order, and then when I run eicu_preproc/save_model_inputs.py I get the error on line 75:

FileNotFoundError: File ../data/time_grid/normalization_values.h5 does not exist

I searched the repo and couldn't find the script that produces this file.

In my ../data/time_grid/ folder I have batch_*.h5
