
hubmap's People

Contributors

thomashopkins32


hubmap's Issues

Handle the "unsure" annotations

We also receive annotations that the experts who annotated the data are unsure about.

We could try the following:

  • Treat them as full labels
  • Treat them as full labels but with less of a penalty in the loss for getting them wrong (tunable)
  • Smooth the labels to be 0.5 for these masks

Each of these options should be tested in isolation on various sizes of the training set (we will probably need cross-validation).
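Here is a minimal sketch of the second and third options, assuming per-pixel binary targets and a separate unsure-mask tensor (the function name, `unsure_mask` argument, and default weights are placeholders, not anything in the repo); in practice the smoothing and the down-weighting would be toggled separately so each can be tested on its own:

```python
import torch
import torch.nn.functional as F

def unsure_aware_bce(logits, targets, unsure_mask, unsure_weight=0.3, smooth_to=0.5):
    """BCE that label-smooths and/or down-weights pixels covered by 'unsure' annotations.

    logits, targets, unsure_mask: tensors of shape (B, 1, H, W);
    unsure_mask is 1 wherever the expert annotation was marked unsure.
    """
    unsure = unsure_mask.bool()
    # Option 3: smooth unsure targets toward 0.5 instead of hard 0/1 labels
    targets = torch.where(unsure, torch.full_like(targets, smooth_to), targets)
    # Option 2: reduce the penalty for getting unsure pixels wrong (tunable)
    weights = torch.where(unsure, torch.full_like(targets, unsure_weight),
                          torch.ones_like(targets))
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```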

Figure out optimal memory usage on Kaggle GPUs

Kaggle uses NVIDIA Tesla P100 GPUs, which have 16 (or 12?) GB of dedicated memory. Testing locally on my 3070, which has 8 GB, we can run a batch size of 4 at full precision and a batch size of 8 with mixed precision. We should test how many samples we can fit in a batch with 16 GB of memory.

My guess would be in the range of 16-20 samples for mixed precision but maybe more?
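One rough way to find the limit empirically is to keep doubling the batch size through a forward/backward pass under AMP until CUDA runs out of memory. This is only a sketch; the tile shape below is a placeholder for whatever size we actually feed the network:

```python
import torch

def find_max_batch_size(model, sample_shape=(3, 512, 512), start=4, device="cuda"):
    """Double the batch size until CUDA runs out of memory; return the last size that fit."""
    model = model.to(device)
    scaler = torch.cuda.amp.GradScaler()
    batch_size, last_ok = start, None
    while True:
        try:
            x = torch.randn(batch_size, *sample_shape, device=device)
            with torch.cuda.amp.autocast():
                loss = model(x).float().mean()  # dummy loss, just to exercise backward
            scaler.scale(loss).backward()
            model.zero_grad(set_to_none=True)
            last_ok, batch_size = batch_size, batch_size * 2
        except RuntimeError:  # CUDA OOM surfaces as a RuntimeError
            torch.cuda.empty_cache()
            return last_ok
```

A finer linear search between the last size that fit and the first that failed would narrow it down to (or rule out) the 16-20 range guessed above.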

Look into optimizing the dataset loading

Right now it takes a couple of minutes to load in all of the images and compute polygons for the various annotated masks. I should look into a vectorized version of the polygon function that will speed this computation up.

I should also see if ChatGPT can suggest some simple improvements that would speed up my code.
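One likely win (an assumption about where the time goes, not something I have profiled) is to rasterize each annotation's polygons with OpenCV in a single call instead of looping over pixels in Python; a sketch:

```python
import numpy as np
import cv2

def polygons_to_mask(polygons, height, width):
    """Rasterize a list of polygons (each an (N, 2) array of x/y vertices) into one binary mask.

    cv2.fillPoly fills every polygon in one native call, which is far faster
    than checking pixels against each polygon in Python.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    pts = [np.asarray(p, dtype=np.int32).reshape(-1, 2) for p in polygons]
    cv2.fillPoly(mask, pts, 1)
    return mask
```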

Inspect memory requirements of UNet architecture

The batch size that I can fit on my GPU (8 GB of memory) is only 4 at the moment.

Analysis of the GPU memory requirements will make it easier to determine the best method to train the model with. A good way to go about this would be to call get_model_gpu_memory after each layer in the network. I need to know how the memory requirements change throughout a forward pass.

I should also look into other methods (or packages) that can do this work for me.
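One package-free way to do this (a sketch, not the repo's get_model_gpu_memory helper) is to register a forward hook on every leaf module and record torch.cuda.memory_allocated() after each one runs:

```python
import torch

def memory_per_layer(model, x):
    """Record allocated CUDA memory (in MiB) after each leaf module's forward pass."""
    records = []

    def hook(module, inputs, output):
        records.append((module.__class__.__name__, torch.cuda.memory_allocated() / 2**20))

    leaves = [m for m in model.modules() if not list(m.children())]
    handles = [m.register_forward_hook(hook) for m in leaves]
    try:
        model(x)  # keep autograd on so activation storage shows up in the numbers
    finally:
        for h in handles:
            h.remove()
    return records
```

The PyTorch profiler (torch.profiler with profile_memory=True) is one of the packages that can do this work as well.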

Implement self-supervised learning

Train on all of the training images

  • Randomly mask out a portion of the image
  • Have the network try to fill in the missing part

This should allow the network to learn what structures exist in the images, which should make training for image segmentation much easier.
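Here is a minimal sketch of one training step along these lines, assuming the model maps an image to an output of the same shape; all names and the masking hyperparameters are placeholders:

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_step(model, images, optimizer, mask_frac=0.25, patch=32):
    """Hide random square patches and train the network to reconstruct them."""
    b, c, h, w = images.shape
    mask = torch.zeros(b, 1, h, w, device=images.device)
    n_patches = int(mask_frac * (h // patch) * (w // patch))
    for i in range(b):
        ys = torch.randint(0, h - patch, (n_patches,))
        xs = torch.randint(0, w - patch, (n_patches,))
        for y, x in zip(ys, xs):
            mask[i, :, y:y + patch, x:x + patch] = 1.0

    corrupted = images * (1 - mask)                  # zero out the hidden patches
    recon = model(corrupted)
    loss = F.mse_loss(recon * mask, images * mask)   # only score the hidden regions
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()
```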

Set up evaluation for Kaggle

Kaggle has its own way of evaluating models. I need to read the competition documentation and implement some of it so we can submit to the competition.

Data Transformations

Experiment with what different image transformations would do if we included them during training.

Create a new notebook to visualize some of the transformations and make sure the annotations are still available and correct.
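A sketch of what such a pipeline could look like using albumentations (an assumption; the repo may use something else), which applies the same spatial transform to the image and its masks so the annotations stay correct:

```python
import albumentations as A

# Spatial transforms are applied identically to the image and every mask;
# photometric transforms touch the image only.
train_tf = A.Compose([
    A.RandomCrop(512, 512),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ColorJitter(p=0.3),
])

# Usage in the visualization notebook:
#   out = train_tf(image=image, masks=[blood_vessel_mask, glomerulus_mask, unsure_mask])
#   then plot out["image"] with out["masks"] overlaid to confirm alignment.
```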

Figure out what to do with unlabeled data

There are ~6,000 images in the training set that have no labels. Examine a few and try to determine the best way to utilize this data.

Look into the following and report back:

  • self-supervised learning
  • unsupervised pretraining

Also look into how we might use this information in other ways.

Implement metrics for evaluation

Look into different metrics for evaluating image segmentation problems. The most common one I can think of (which will also be used by Kaggle for scoring the competition) is IoU (Intersection over Union).

We may need a custom implementation for this, but we might also be able to use the COCO package to do the work.
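A minimal sketch of IoU between two binary masks (for the instance-level scoring across IoU thresholds, pycocotools is presumably what the COCO package option refers to):

```python
import numpy as np

def mask_iou(pred, target, eps=1e-7):
    """Intersection over Union between two binary masks (arrays of 0/1)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))
```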

Handle masking of the glomerulus

We have 3 different types of annotated masks available:

  • blood vessels (our target structure)
  • glomerulus
  • unsure

For the competition, we will be receiving the glomerulus mask on the test set. This means that any prediction we make in the annotated region can be safely ignored, as it will not be counted in the scoring.

I need to see how that annotation is being passed for scoring on the hidden test set. Does Kaggle discard predictions in the region internally?

What should I do with predictions from my model that fall in the glomerulus during training? I should look into this more but here are some ideas:

  • Remove any model predictions and blood vessel masks that fall in the annotated region (there are some overlaps with the target structures); see the sketch after this list
  • Predict the glomerulus using the model (only needed if this structure is not available in the test set)
  • Add a penalty to the loss for predicting blood vessel structures in the glomerulus regions
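Here is a minimal sketch of the first option on the prediction side, assuming raw logits and a glomerulus mask tensor (names are placeholders): pushing the logits strongly negative makes the post-sigmoid probability effectively zero, so these pixels never survive thresholding.

```python
import torch

def drop_glomerulus_predictions(pred_logits, glomerulus_mask):
    """Suppress blood-vessel predictions inside annotated glomerulus regions.

    pred_logits: (B, 1, H, W) raw model outputs.
    glomerulus_mask: (B, 1, H, W), 1 inside glomeruli.
    """
    return pred_logits.masked_fill(glomerulus_mask.bool(), -1e4)
```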

Get some dumb baselines

We should see how our network performs on the following:

  • All inputs are 0
  • All fully annotated images

Every training run should beat the "All inputs are 0" baseline.

Port code to a single notebook for training on Kaggle

Kaggle only works with notebooks as far as I can tell. Submission to the competition also requires a notebook.

Also, since Kaggle has double the GPU memory available, we can run with a larger batch size.

Reflect on the results of the competition

So the competition ended and unfortunately I could not train a good model in time.

Here are the things I could have done to get a better leaderboard score:

  • Do smaller experiments to observe what hyperparameters and other options work best
    • Visualize predictions on validation set
  • Use automatic mixed precision training to allow for larger batch sizes
    • This would make the batch normalization layer more accurate and training easier
  • Start with pre-trained models as a base
  • Read up on discussions and what was working for other people
  • Use open source packages for instance segmentation
  • Use an ensemble of different models

Here are some things I wanted to try but am unsure would have helped:

  • More/less aggressive data augmentation
  • Self-supervised learning on all of the training images
  • Use a better single model (apparently UNet is for semantic segmentation and not instance segmentation)
    • Post-processing (which I did) allows for its use as an instance segmentation model, but maybe this wasn't the best choice

Here are some mistakes that I made:

  • Training for too long (each run was about 12 hours)
  • Eating up my GPU budget on Kaggle (limit is 30 hrs/week)

Here are some things I am confused about:

  • Training was fairly unstable in terms of training and validation loss
    • I am thinking maybe the low batch size of 9 is to blame?
    • I was limited by Kaggle's GPU memory on this but could try automatic mixed precision (AMP) training
  • Why didn't more people use UNet as their architecture?
  • Why did my leaderboard score improve so much after the competition closed?

And finally, here are some things I learned so far:

  • How the mean average precision (mAP) metric works for instance segmentation
  • How to architect, build, and release a deep learning project that is easy to maintain
  • How to build, train, and evaluate instance segmentation models on a small amount of data
  • Some anatomy of the kidney

In sum, this was a fun project to work on and I am eager to continue trying things with it. I would like to try all of the items from the first and second sections above. I will add them as Issues to this project and work on them over the next few weeks/months.

Try open source packages for segmentation

Maybe these are better/easier to use out of the box.

Now that I have implemented one myself, I should try these open source ones which should make my workflow much faster.

Verify training script functionality

Step through the training script in the debugger and make sure the data looks appropriate at every stage.

Make sure each step of the training script is reproducible since this is an important factor for submitting a notebook to Kaggle.

Initialize the model weights well. Look into the UNet paper for guidance on this. I think there was something mentioned about initialization in there.

Verify that the loss decreases to 0 (or close to it) when we train on a single image (with multiple annotations). If it does not, we need to investigate why.

Decrease and increase the model capacity: how does this affect the training outcome? Increased capacity should result in lower loss but potentially more overfitting.

Inspect the gradients of each layer's weights. Make sure that they look fairly regular.
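A sketch of the single-image overfitting check with a crude gradient-norm readout (the optimizer, learning rate, and step count are placeholders; criterion is whatever loss the training script already uses):

```python
import torch

def overfit_single_image(model, image, target, criterion, steps=200, lr=1e-3):
    """Sanity check: loss should drive toward ~0 on one sample; if not, investigate."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(image), target)
        loss.backward()
        if step % 50 == 0:
            # Total gradient norm; wildly spiking or vanishing values are a red flag
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf"))
            print(f"step {step}: loss={loss.item():.4f} grad_norm={float(grad_norm):.2f}")
        optimizer.step()
    return loss.item()
```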

Separate Training, Validation, and Testing data into different datasets

To make the transforms easier to work with, we should pre-split the data into training, validation, and testing sets.

The testing data is a single image and is already split off. The training data needs to be randomly split and this split needs to be saved somewhere.

This is required so that we can use no image transformations during validation and also get accurate class frequencies during training. If we use a single dataset and then do a split we run into the following issues:

  • Data leakage from computing the class frequencies using validation data
  • Validation data does not reflect real-world data (it has been randomly augmented)

We should implement TrainHuBMAP, ValidHuBMAP, and TestHuBMAP datasets instead of the single HuBMAP dataset.
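A sketch of how the split could be generated once and persisted so that every run (and the Kaggle notebook) reads the exact same partition; the file name and validation fraction are placeholders:

```python
import json
import random
from pathlib import Path

def make_split(image_ids, valid_frac=0.15, seed=42, out_path="splits.json"):
    """Randomly split image ids into train/valid once and save the result to disk."""
    ids = sorted(image_ids)          # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    n_valid = int(len(ids) * valid_frac)
    split = {"valid": ids[:n_valid], "train": ids[n_valid:]}
    Path(out_path).write_text(json.dumps(split, indent=2))
    return split
```

TrainHuBMAP and ValidHuBMAP would then each load their half of this file, with augmentations enabled only in the training dataset and class frequencies computed from the training ids alone.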

Follow UNet paper for training specifics

For the first couple of releases, I trained the model using things from my personal experience and intuition in training deep neural networks. I should look to see what worked for the authors and try to emulate that for this dataset.

This means that I should:

  • Use SGD with momentum
  • Use xavier normal weight initialization (check the math for this)
  • Use data augmentation similar to their choices
  • Use dropout regularization near the final layers of the network

Let's see if that will improve our performance.
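A sketch of the first two items, assuming PyTorch. Note that the UNet paper draws initial weights from a Gaussian with standard deviation sqrt(2/N), which corresponds to He/Kaiming initialization rather than Xavier, and uses SGD with a high momentum of 0.99; the learning rate below is a placeholder:

```python
import torch
import torch.nn as nn

def init_weights_unet(module):
    """Gaussian init with std sqrt(2/N), i.e. Kaiming-normal for ReLU conv layers."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def configure_training(model, lr=0.01, momentum=0.99):
    """Apply the paper-style initialization and return an SGD-with-momentum optimizer."""
    model.apply(init_weights_unet)
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
```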

Try a different single model

UNet apparently isn't meant for instance segmentation but semantic segmentation (I have to double check that this is accurate).

I should try a different model that was built for instance segmentation once I have squeezed out performance on UNet to the best of my ability.

Set up model checkpointing

We need to be able to save our model and train it further. Kaggle limits the GPU hours of notebooks to 30 hours per week and 9 hours per run.
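A minimal checkpointing sketch: saving the optimizer state alongside the model is what lets a later Kaggle session resume training rather than restart it (the dictionary key names are placeholders):

```python
import torch

def save_checkpoint(path, model, optimizer, epoch):
    """Save everything needed to resume training in a later session."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(path, model, optimizer, device="cuda"):
    """Restore model and optimizer state; return the epoch to resume from."""
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]
```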
