
python-example-2023's Introduction

Python example code for the George B. Moody PhysioNet Challenge 2023

What's in this repository?

This repository contains a simple example that illustrates how to format a Python entry for the George B. Moody PhysioNet Challenge 2023. We recommend that you use this repository as a template for your entry. You can remove some of the code, reuse other code, and add new code to create your entry. You do not need to use the models, features, and/or libraries in this example for your approach. We encourage a diversity of approaches for the Challenge.

For this example, we implemented a random forest model with several features. This simple example is designed not to perform well, so you should not use it as a baseline for your model's performance. You can try it by running the following commands on the Challenge training set. These commands should take a few minutes or less to run from start to finish on a recent personal computer.

This code uses four main scripts, described below, to train and run a model for the Challenge.

How do I run these scripts?

You can install the dependencies for these scripts by creating a Docker image (see below) and running

pip install -r requirements.txt

You can train your model by running

python train_model.py training_data model

where

  • training_data (input; required) is a folder with the training data files and
  • model (output; required) is a folder for saving your model.

You can run your trained model by running

python run_model.py model test_data test_outputs

where

  • model (input; required) is a folder for loading your model,
  • test_data (input; required) is a folder with the validation or test data files (you can use the training data for debugging and cross-validation, but the validation and test data will not have labels and will have 12, 24, 48, or 72 hours of data), and
  • test_outputs is a folder for saving your model outputs.

The Challenge website provides a training database with a description of the contents and structure of the data files.

You can evaluate your model by pulling or downloading the evaluation code and running

python evaluate_model.py labels outputs scores.csv

where

  • labels is a folder with labels for the data, such as the training database on the PhysioNet webpage,
  • outputs is a folder containing files with your model's outputs for the data, and
  • scores.csv (optional) is a collection of scores for your model.

Which scripts can I edit?

Please edit the following script to add your code:

  • team_code.py is a script with functions for training and running your trained model.

Please do not edit the following scripts. We will use the unedited versions of these scripts when running your code:

  • train_model.py is a script for training your model.
  • run_model.py is a script for running your trained model.
  • helper_code.py is a script with helper functions that we used for our code. You are welcome to use them in your code.

These scripts must remain in the root path of your repository, but you can put other scripts and other files elsewhere in your repository.

How do I train, save, load, and run my model?

To train and save your models, please edit the train_challenge_model function in the team_code.py script. Please do not edit the input or output arguments of the train_challenge_model function.

To load and run your trained model, please edit the load_challenge_model and run_challenge_model functions in the team_code.py script. Please do not edit the input or output arguments of the load_challenge_model and run_challenge_model functions.
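The train, save, load, and run steps fit together roughly as follows. This is a minimal sketch, not the repository's implementation: the argument lists, the model.pkl file name, and the placeholder "model" are assumptions for illustration only, and you should keep whatever signatures team_code.py already defines.

```python
import os
import pickle


def train_challenge_model(data_folder, model_folder, verbose=0):
    """Train a placeholder model and save it to model_folder.

    Assumption: the argument list is illustrative, not the official one.
    """
    os.makedirs(model_folder, exist_ok=True)
    # Placeholder "training": count the subfolders in the data folder.
    num_patients = sum(
        os.path.isdir(os.path.join(data_folder, name))
        for name in os.listdir(data_folder)
    )
    model = {'num_training_patients': num_patients}
    # 'model.pkl' is a hypothetical file name.
    with open(os.path.join(model_folder, 'model.pkl'), 'wb') as f:
        pickle.dump(model, f)


def load_challenge_model(model_folder, verbose=0):
    """Load whatever train_challenge_model saved."""
    with open(os.path.join(model_folder, 'model.pkl'), 'rb') as f:
        return pickle.load(f)


def run_challenge_model(model, data_folder, patient_id, verbose=0):
    """Return a placeholder output for one patient."""
    return {
        'patient_id': patient_id,
        'num_training_patients': model['num_training_patients'],
    }
```

The point of the sketch is the contract between the functions: everything that run_challenge_model needs must be written to the model folder during training and read back by load_challenge_model, because training and inference run as separate commands.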

How do I run these scripts in Docker?

Docker and similar platforms allow you to containerize and package your code with specific dependencies so that your code can be reliably run in other computational environments.

To guarantee that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a small subset of the training data.
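One way to build such a small subset is to copy a handful of patient folders into a separate directory. The sketch below assumes that the training data folder contains one subfolder per patient; the function name copy_patient_subset and its parameters are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def copy_patient_subset(source_folder, destination_folder, num_patients=5):
    """Copy the first num_patients patient subfolders (sorted by name).

    Assumption: each patient has its own subfolder in source_folder.
    Returns the names of the copied folders.
    """
    source = Path(source_folder)
    destination = Path(destination_folder)
    destination.mkdir(parents=True, exist_ok=True)

    patient_folders = sorted(p for p in source.iterdir() if p.is_dir())
    copied = []
    for patient_folder in patient_folders[:num_patients]:
        # dirs_exist_ok requires Python 3.8 or newer.
        shutil.copytree(
            patient_folder,
            destination / patient_folder.name,
            dirs_exist_ok=True,
        )
        copied.append(patient_folder.name)
    return copied
```

For example, copy_patient_subset('training_data', 'training_subset', num_patients=10) would give you a folder that you can pass to train_model.py in place of the full training set.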

If you have trouble running your code, then please try the following steps to run the example code.

  1. Create a folder example in your home directory with several subfolders.

     user@computer:~$ cd ~/
     user@computer:~$ mkdir example
     user@computer:~$ cd example
     user@computer:~/example$ mkdir training_data test_data model test_outputs
    
  2. Download the training data from the Challenge website. Put some of the training data in training_data and test_data. You can use some of the training data to check your code (and you should perform cross-validation on the training data to evaluate your algorithm).

  3. Download or clone this repository in your terminal.

     user@computer:~/example$ git clone https://github.com/physionetchallenges/python-example-2023.git
    
  4. Build a Docker image and run the example code in your terminal.

     user@computer:~/example$ ls
     model  python-example-2023  test_data  test_outputs  training_data
    
     user@computer:~/example$ cd python-example-2023/
    
     user@computer:~/example/python-example-2023$ docker build -t image .
    
     Sending build context to Docker daemon  [...]kB
     [...]
     Successfully tagged image:latest
    
     user@computer:~/example/python-example-2023$ docker run -it -v ~/example/model:/challenge/model -v ~/example/test_data:/challenge/test_data -v ~/example/test_outputs:/challenge/test_outputs -v ~/example/training_data:/challenge/training_data image bash
    
     root@[...]:/challenge# ls
         Dockerfile             README.md         test_outputs
         evaluate_model.py      requirements.txt  training_data
         helper_code.py         team_code.py      train_model.py
         LICENSE                run_model.py
    
     root@[...]:/challenge# python train_model.py training_data model
    
     root@[...]:/challenge# python run_model.py model test_data test_outputs
    
     root@[...]:/challenge# python evaluate_model.py test_data test_outputs
     [...]
    
     root@[...]:/challenge# exit
     Exit
    

What else do I need?

This repository does not include code for evaluating your entry. Please see the evaluation code repository for code and instructions for evaluating your entry using the Challenge scoring metric.

This repository also includes code for preparing the validation and test sets. We will run your trained model on data without labels and with 12, 24, 48, and 72 hours of recording data to evaluate its performance with limited amounts of data. You can use this code to prepare the training data in the same way that we prepare the validation and test sets.

  • truncate_data.py: Truncate the EEG recordings. Usage: run python truncate_data.py -i input_folder -o output_folder -t 12 to truncate the EEG recordings to 12 hours. We will run your trained models on data with 12, 24, 48, and 72 hours of recording data.
  • remove_labels.py: Remove the labels. Usage: run python remove_labels.py -i input_folder -o output_folder to copy the data and metadata (but not the labels) from input_folder to output_folder.
  • remove_data.py: Remove the binary signal data, i.e., the EEG recordings. Usage: run python remove_data.py -i input_folder -o output_folder to copy the labels and metadata (but not the EEG recording data) from input_folder to output_folder.

How do I learn more?

Please see the Challenge website for more details. Please post questions and concerns on the Challenge discussion forum.

python-example-2023's People

Contributors

bharadwaj9674, matthewreyna, tasfik007, tejovk

python-example-2023's Issues

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'float64' in load_challenge_data

This problem happens when load_challenge_data is called on a patient with no EEG records.

load_challenge_data calls get_recording_ids, which returns a numpy array of the patient's recording ids.

If the patient has at least one record available, the entire array returned by get_recording_ids is an array of strings; if the patient has no records available, the returned array is an array of np.nan.

This happens because in

return np.asarray(variables)

if variables is a list containing only np.nan values, the resulting array is a float array of np.nan. If at least one value in the list is a string (i.e., a record of EEG data is available), the returned array is a string array instead. (At least, this is what happens for me with numpy==1.24.2.)

So when load_challenge_data is called on a patient with no records available, the array of recording ids contains np.nan (i.e., float values), and the condition of this statement is True:

if recording_id != 'nan':

The next line then tries to execute a join with a float argument, which raises the error in the title:

recording_location = os.path.join(data_folder, patient_id, recording_id)

I would suggest changing the condition of the if statement at line 33 to use your custom function is_nan(), so that it becomes if not is_nan(recording_id):
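The dtype behavior and the suggested guard can be reproduced with a short sketch. The helper is_missing below is a hypothetical stand-in for the repository's is_nan() (whose exact implementation is not shown here), and recording_locations and the sample recording id are made up for illustration.

```python
import math
import os

import numpy as np


def is_missing(value):
    """Hypothetical stand-in for is_nan(): True for NaN floats and 'nan' strings."""
    if isinstance(value, str):
        return value.strip().lower() == 'nan'
    try:
        return math.isnan(float(value))
    except (TypeError, ValueError):
        return False


def recording_locations(data_folder, patient_id, recording_ids):
    """Build one path per available recording, skipping missing ids."""
    locations = []
    for recording_id in recording_ids:
        if not is_missing(recording_id):
            locations.append(
                os.path.join(data_folder, patient_id, str(recording_id))
            )
    return locations


# One real id plus a missing one: numpy converts everything to strings.
mixed = np.asarray(['0001_001_004', np.nan])
# Only missing ids: numpy keeps a float array.
empty = np.asarray([np.nan, np.nan])

print(mixed.dtype.kind, empty.dtype.kind)
print(recording_locations('training_data', '0001', mixed))
print(recording_locations('training_data', '0001', empty))
```

Because the guard checks for both representations of a missing id, it does not depend on which dtype numpy happened to choose for the array.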

Import numpy as os in teamcode

The imports in team_code.py include import numpy as np, os, sys
For some reason, it works, but IMHO it is not a valid import.
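For reference, Python's import statement accepts a comma-separated list of modules, each with its own optional as alias, so a line of this shape binds three separate names rather than aliasing one module under several names. A minimal sketch, using the standard-library json module as a stand-in for numpy so that it runs without third-party packages:

```python
# One statement, three independent bindings: js, os, and sys.
import json as js, os, sys

bound_names = {
    'js': js.__name__,
    'os': os.__name__,
    'sys': sys.__name__,
}
print(bound_names)

# The more conventional spelling, one import per line as PEP 8 recommends,
# binds exactly the same names.
import json as js
import os
import sys
```

So the line is legal, but splitting it into one import per line is easier to read and avoids this kind of confusion.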

Integer overflow in helper_code.load_recording_data

The data type of loaded values using scipy.io.loadmat from the .mat data files is int16. Subtraction using baseline (or offset) might result in an integer overflow. An example is illustrated as follows:

[image: screenshot illustrating the overflow]

Perhaps a data type conversion (e.g., to int32 or float32) should be performed immediately after scipy.io.loadmat.
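The wraparound is easy to reproduce without any data files. In this sketch the sample values and the baseline of 1000 are made up for illustration; only the int16 dtype comes from the report above.

```python
import numpy as np

# Made-up samples near the int16 limits (-32768 to 32767).
signal = np.array([-32000, 0, 32000], dtype=np.int16)
# Made-up baseline, stored as int16 so that the arithmetic stays in int16.
baseline = np.array([1000], dtype=np.int16)

# int16 - int16 stays int16, so -33000 silently wraps around.
wrapped = signal - baseline
print(wrapped.dtype, wrapped.tolist())

# Converting to a wider type first keeps the true values.
safe = signal.astype(np.int32) - baseline
print(safe.dtype, safe.tolist())
```

The first subtraction produces a large positive value where a large negative one is expected, and numpy raises no error for array arithmetic of this kind, which is why converting right after loading is the safer default.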
