
cbrain-cam's People

Contributors

gaelreinaudi, gentine, raspstephan


cbrain-cam's Issues

Test convolutional layers

This is essentially @gentine's idea: instead of using fully connected networks, could we use convolutions in the vertical to improve generalizability?

I already have a basic framework for convolutions, but it should be tested more rigorously. To test generalizability, I would suggest starting offline: train the network on the reference climate, then test its predictions on a validation dataset with +1-4K.
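
To make the idea concrete, here is a minimal sketch of a vertical 1D-convolution network in Keras. The input shape, layer widths, and kernel size are illustrative assumptions, not the repository's actual configuration:

```python
# Illustrative sketch only: all dimensions and layer sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

n_lev, n_chan, n_out = 30, 4, 65  # assumed levels, profile variables, outputs

model = models.Sequential([
    # Convolving along the vertical: each filter only sees a local stencil
    # of levels, which is the property we hope improves generalization.
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu",
                  input_shape=(n_lev, n_chan)),
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(n_out),
])
model.compile(optimizer="adam", loss="mse")
```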

Extend to inequality constraints possible?

Hi!
In the paper, you show very nicely that your method can constrain NN outputs to satisfy equality constraints.
Do you think it would be possible to extend this method to inequality constraints?
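
Not an answer from the authors, but one common way to approximate an inequality constraint is a hinge-style penalty on violations, added to the loss. A minimal sketch, assuming for illustration that every output must be non-negative:

```python
import tensorflow as tf

def penalized_mse(alpha=1.0):
    """MSE plus a penalty on violations of an assumed inequality
    constraint (here, illustratively, y >= 0 for every output)."""
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # relu(-y) is zero wherever y >= 0 and grows linearly with violation.
        violation = tf.reduce_mean(tf.nn.relu(-y_pred))
        return mse + alpha * violation
    return loss
```

Unlike the equality-constraint layers in the paper, a penalty only enforces the constraint softly; a hard version would need something like a projection step or a constrained output activation (e.g. softplus for non-negativity).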

Improve learning rate annealing for faster training of neural networks

This is closely related to #7.

There are ways to train neural networks significantly faster than just using standard learning rate annealing. I heard about this from the fast.ai work on the DAWNBench competition: https://www.fast.ai/2018/08/10/fastai-diu-imagenet/

I am not yet firm on the background, but I think a large part of the training speedup comes from implementing Leslie Smith's work on super-convergence (https://arxiv.org/abs/1708.07120) and the 1cycle learning rate schedule (https://arxiv.org/abs/1803.09820).

I am not sure how easy this would be to implement, but there is a Keras repository: https://github.com/titu1994/keras-one-cycle

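For reference, a minimal sketch of a 1cycle-style schedule as a Keras callback. This is a simplified triangular version (the full policy also cycles momentum), and the rates and epoch counts are assumptions:

```python
import tensorflow as tf

def one_cycle(max_lr=1e-2, total_epochs=30, start_frac=0.1):
    base_lr = max_lr * start_frac
    peak = total_epochs // 2
    def schedule(epoch, lr):
        if epoch <= peak:  # first half: ramp up to max_lr
            return base_lr + (max_lr - base_lr) * epoch / peak
        # second half: anneal back down to the base rate
        return max_lr - (max_lr - base_lr) * (epoch - peak) / (total_epochs - peak)
    return tf.keras.callbacks.LearningRateScheduler(schedule)

# usage: model.fit(x, y, epochs=30, callbacks=[one_cycle()])
```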

Big Goal: Solve stability

Why do neural networks crash once they are implemented in CAM? Why do some not crash? How can we ensure that they don't crash?

Link up preprocessing ID and network training

Currently, the preprocessing script produces three files: train, valid, and normalization.

All of these have to be listed separately in the training config file, which is a lot of work.

On the other hand, I want to keep the flexibility.

In any case, I need to improve reproducibility, so that it is clear which preprocessing config file corresponds to which network training file.
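
One possible scheme, sketched below: derive a short ID from the preprocessing config and embed it in every output file name, so the training config only has to record a single ID. The file layout implied here is an assumption, not the current repo structure.

```python
import hashlib

def config_id(config_path, length=8):
    """Stable short hash of a config file's contents."""
    with open(config_path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()[:length]

# e.g. train_<id>.nc, valid_<id>.nc, and norm_<id>.nc would all share the
# same ID, making it obvious which preprocessing run they came from.
```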

Implement strongly constrained network in CAM

This means implementing the conservation layers used in NN 002. Again, let's assume this crashes because of spurious correlations.

To Do:

  • Implement the layers and test whether they are correct (a minimal sketch follows this list)
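
A minimal sketch of what such a layer could look like, assuming a single linear budget sum_i c_i * y_i = b: the network predicts all outputs but the last, and the layer computes the last one as the residual that closes the budget. The coefficients and target below are placeholders, not the actual NN 002 constraint.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ResidualConservation(layers.Layer):
    """Appends the output that closes an assumed linear budget c . y = b."""
    def __init__(self, c, b=0.0, **kwargs):
        super().__init__(**kwargs)
        self.c = tf.constant(c, dtype=tf.float32)  # placeholder coefficients
        self.b = b                                 # placeholder budget target

    def call(self, free_outputs):
        # Solve c[-1] * y_last = b - sum(c[:-1] * y_free) for y_last.
        partial = tf.reduce_sum(self.c[:-1] * free_outputs,
                                axis=-1, keepdims=True)
        y_last = (self.b - partial) / self.c[-1]
        return tf.concat([free_outputs, y_last], axis=-1)
```

Correctness can then be tested by checking that c . y = b holds to machine precision for random inputs.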

Implement different network architectures (resnet, deepnet) with batch norm

This is closely connected with #1.

I recently listened to this podcast with Jeremy Howard: https://twimlai.com/twiml-talk-214-trends-in-deep-learning-with-jeremy-howard/

In it, he talks about how people realized that batch norm and skip connections change the loss surface, making networks easier to train: https://arxiv.org/abs/1712.09913

I wonder if part of the stability problems we are experiencing is due to a very spiky minimum during training with poor generalization capabilities.

I tried batch norm offline at some point, but it didn't improve generalization to "new" climates offline. Maybe that is a separate issue, though.
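
For the record, a dense residual block with batch norm could look like the sketch below; the width and activation are assumptions, and this is the generic pattern rather than a tested configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_residual_block(x, units=256):
    """Two dense layers with batch norm and a skip connection."""
    h = layers.Dense(units)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("relu")(h)
    h = layers.Dense(units)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Add()([x, h])  # skip connection; assumes x has `units` features
    return layers.Activation("relu")(h)
```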

Analyze Jacobians and compare to Noah's

As in Noah's analysis, Jacobians can give clues about why the NN is unstable. Currently, my Jacobians and Tom's differ a little, so we need to figure out why and what's going on.

To Do:

  • Compare the methods for computing Jacobians in Tom's notebooks and my own (a reference sketch follows this list)
  • Compare Jacobians for the 8-column version (which hopefully turns out to be stable again) and the unstable 32-column version (test again with maxrs normalization)
  • Use Jacobians to figure out the worth of adiabatic (and other) inputs
  • Understand why the PNAS Jacobians look so bad
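
As a common reference point for the comparison, here is one way to compute a model Jacobian with TensorFlow's GradientTape. This is just a baseline implementation for cross-checking the notebooks, not necessarily how either of us computes it:

```python
import tensorflow as tf

def jacobian(model, x0):
    """Jacobian dy/dx of `model` at a single 1-D input profile `x0`."""
    x = tf.convert_to_tensor(x0[None, :], dtype=tf.float32)  # add batch dim
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x)
    J = tape.jacobian(y, x)            # shape (1, n_out, 1, n_in)
    return tf.squeeze(J, axis=(0, 2))  # shape (n_out, n_in)
```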

How (where) to get the raw SPCAM data

Hi,
Thank you for sharing your great work! I am new to the parameterization problem and have a great interest in your work. I am now confused about how/where to get the raw SPCAM data. I have read the 1.0-Entire-workflow-for-32-column-run.ipynb file, but did not find the files you mentioned.

Implement weakly constrained network in CAM

Change the variable inputs/outputs for the weakly constrained network 003. Let's assume it will crash right away, though, because of spurious dependencies.

To Do:

  • Check that network inputs/outputs agree between CAM and preprocessing (a minimal consistency check is sketched after this list)
  • Write down CAM variable flow to figure out what happens with DTVKE
  • Check how variables relate to energy fixer
  • Implement noadiab version
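
For the first item, the check itself can be as simple as the sketch below; the variable lists are placeholders, not the actual NN 003 configuration:

```python
# Placeholder variable lists; the real ones must come from the
# preprocessing config and the CAM interface, in order.
preproc_inputs = ["TBP", "QBP", "PS", "SOLIN"]
cam_inputs = ["TBP", "QBP", "PS", "SOLIN"]

assert preproc_inputs == cam_inputs, (
    f"Input mismatch: preprocessing={preproc_inputs} vs CAM={cam_inputs}; "
    "names and order must agree exactly."
)
```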

Debug crashed new 8 column run

NN experiment 001 crashed after around 10 days in CAM. Why? Previous experiments with 7 layers and these inputs and outputs worked; see the PNAS paper.

To Debug:

  • Check why the run crashes. Is it typical, or is there something obviously wrong?
  • Run a new NN (006) with exactly the same setup as the reference PNAS version (D025): same layers, same input normalization. Note, however, that I am testing with far fewer epochs.
  • If this crashes, plug in the weights and norm files from D025 to check whether the difference lies in the neural network or in CAM

Clean up data pre-processing scripts

Currently, the data preprocessing is handled by the preprocess_aqua.py script, followed by shuffle_ds.py to randomize the training dataset. Creating a new training and validation set requires three commands. Much of this is redundant and should be simpler, so that only one config script is needed for all of these steps.

Additionally, some of the functionality in preprocess_aqua is deprecated.
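
One way to get to a single command is a small driver that chains the existing scripts with one shared config; the -c flag below is an assumption about the scripts' command-line interfaces:

```python
import subprocess
import sys

def run_pipeline(config):
    """Run the full preprocessing pipeline from one config file."""
    # Step 1: create the train, valid, and normalization files.
    subprocess.run([sys.executable, "preprocess_aqua.py", "-c", config],
                   check=True)
    # Step 2: shuffle the training dataset.
    subprocess.run([sys.executable, "shuffle_ds.py", "-c", config],
                   check=True)

if __name__ == "__main__":
    run_pipeline(sys.argv[1])
```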

General discussion

Hi @gentine, @tbeucler, @mspritch, @mouatadid, @gmooers96,

Time for some updates now that I am actively working on the project again.

I documented the entire workflow from the raw SPCAM training data to the NNCAM simulations here: https://github.com/raspstephan/CBRAIN-CAM/blob/master/notebooks/stephans-devlog/1.0-Entire-workflow-for-32-column-run.ipynb

This should be a great starting point for anyone new to the project. It also serves to document the status quo and to identify workflow bottlenecks and reproducibility issues. I have already opened some issues (#1, #2, #3, #4) that address problems in the workflow.

Talking about issues: I, and anyone working with the code, can add well-defined problems that need to be fixed, or interesting science enhancements (see e.g. #5). Some of the issues I assigned to myself, but to some I also added the label help wanted. These issues would be great projects for collaborators to tackle (e.g. motivated CS students).

Talking about collaboration, I also started a Wiki for this project: https://github.com/raspstephan/CBRAIN-CAM/wiki
It is pretty sparse so far, but I already added a guide for collaborators: https://github.com/raspstephan/CBRAIN-CAM/wiki/A-guide-for-collaborators

Thanks!

Make converting to text file part of the training script

Currently, you need two commands: one to train the network and one to convert the saved model to text files for CAM.
There is a lot of redundancy. Conversion should be part of the training script, controlled by a simple True/False argument.
It should still be accessible separately, I guess. A sketch of the proposed flag is below.
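
In the sketch, train() and convert_to_txt() are hypothetical names standing in for the existing training and conversion code:

```python
import argparse

def train():
    """Stand-in for the existing training entry point (hypothetical)."""
    return "saved_model.h5"

def convert_to_txt(model_path):
    """Stand-in for the model-to-text converter; it stays a separate
    function so conversion can still be run on its own (hypothetical)."""
    print(f"converting {model_path} to CAM text files")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--convert", action="store_true",
                        help="also convert the trained model to CAM text files")
    args = parser.parse_args()
    model_path = train()
    if args.convert:  # one flag decides whether conversion runs after training
        convert_to_txt(model_path)

if __name__ == "__main__":
    main()
```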
