ldcast's People

Contributors

jleinonen


ldcast's Issues

question: adjusting target ldcast

Hi @jleinonen,

This is more of an approach question.

Say you are only interested in the rainfall at one particular spot (e.g. discharge at one point in a hydrological basin). Do you think it is worthwhile to change the target to the time series of future precipitation at that point (instead of the sequence of radar images)? Among other things, this implies skipping the encoder for the future precipitation.

Looking forward to your thoughts on this.

Regards,

Tomas

Diffusion initialization

Hello,
Thank you for sharing the code and models.

I think purely random initialization of the sampled image may not be optimal for the nowcasting problem. A better idea could be to initialize the image (see the x_T argument in models/diffusion/plms.py) from a short-range NWP forecast.
Have you tested this approach?
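
For reference, a minimal sketch of what such an initialization could look like, assuming the autoencoder exposes an encode method mapping frames into the diffusion latent space and that the PLMS sampler accepts the resulting tensor via the x_T argument mentioned above (both are assumptions about the repository's API, not verified):

    import torch

    def nwp_initialized_x_T(autoencoder, nwp_frames, noise_frac=0.8):
        # Blend an encoded NWP forecast with noise to seed the diffusion sampler.
        # `autoencoder.encode` is an assumption about the autoencoder's interface.
        with torch.no_grad():
            z_nwp = autoencoder.encode(nwp_frames)
        noise = torch.randn_like(z_nwp)
        # Keep mostly noise so the diffusion process can still correct NWP biases.
        return noise_frac * noise + (1.0 - noise_frac) * z_nwp

The returned tensor would then be passed as x_T in place of pure Gaussian noise.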

Error: LatentDiffusion.optimizer_step

Hello,

Thanks a lot for making this repository public.

I am trying to run the train_genforecast.py example. It throws the following error from the LatentDiffusion.optimizer_step() function.

File "/home/######/ldcast/scripts/train_genforecast.py", line 119, in train
trainer.fit(model, datamodule=datamodule, ckpt_path=ckpt_path)
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
call._call_and_handle_interrupt(
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1023, in _run_stage
self.fit_loop.run()
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
self.advance()
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py", line 355, in advance
self.epoch_loop.run(self._data_fetcher)
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
self.advance(data_fetcher)
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 219, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 188, in run
self._optimizer_step(kwargs.get("batch_idx", 0), closure)
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 266, in _optimizer_step
call._call_lightning_module_hook(
File "/home/######/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 145, in _call_lightning_module_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
TypeError: LatentDiffusion.optimizer_step() missing 1 required positional argument: 'optimizer_closure'

I have been able to run forecast_demo.py and train_autoenc.py successfully. Any suggestions on how to fix this?

OS: Ubuntu 22.04
Python: 3.11.4
Single gpu

Thanks,
Ajitabh.
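
A plausible cause (an assumption, not confirmed in this thread): PyTorch Lightning 2.x removed the optimizer_idx argument and the on_tpu/using_lbfgs flags from the optimizer_step hook, so an override written against the 1.x signature no longer matches what the trainer passes. A minimal sketch of a 2.x-compatible override; alternatively, pinning pytorch-lightning<2.0 sidesteps the change:

    import pytorch_lightning as pl

    class LatentDiffusion(pl.LightningModule):  # name as in the traceback
        # Sketch of the 2.x hook signature (assumed cause of the TypeError);
        # any EMA or warmup logic from the original override wraps this call.
        def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure):
            optimizer.step(closure=optimizer_closure)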

FileNotFoundError: [Errno 2] No such file or directory: '/ldcast/ldcast/scripts/../data/rate-tp-0/'

Upon running eval_pysteps.py I encountered the FileNotFoundError below.
Would you mind checking this for me?

Traceback (most recent call last):
  File "/ldcast/ldcast/scripts/eval_pysteps.py", line 50, in <module>
    Fire(main)
  File "/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/ldcast/ldcast/scripts/eval_pysteps.py", line 46, in main
    create_evaluation_ensemble(**config)
  File "/ldcast/ldcast/scripts/eval_pysteps.py", line 22, in create_evaluation_ensemble
    data_iter = eval_data.get_data_iter(
  File "/ldcast/ldcast/scripts/eval_data.py", line 50, in get_data_iter
    return func(**kwargs)
  File "/ldcast/ldcast/scripts/eval_data.py", line 27, in data_iter_testset
    return data_iter_mchrzc(**kwargs, split="test")
  File "/ldcast/scripts/eval_data.py", line 15, in data_iter_mchrzc
    datamodule = train_genforecast.setup_data(
  File "/ldcast/scripts/train_nowcaster.py", line 65, in setup_data
    raw = {
  File "/ldcast/scripts/train_nowcaster.py", line 66, in <dictcomp>
    var: patches.load_all_patches(
  File "/ldcast/features/patches.py", line 398, in load_all_patches
    files = os.listdir(patch_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/ldcast/ldcast/scripts/../data/rate-tp-0/'
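
For what it's worth, the rate-tp-0 directory appears to correspond to NWP-derived patches that are not part of the radar-only dataset (an assumption based on the variable name). A small preflight check, as a sketch, makes the missing-data case explicit before the evaluation starts (the path is taken from the traceback above):

    import os

    # Hedged preflight sketch for eval_pysteps.py
    patch_dir = os.path.join(os.path.dirname(__file__), "..", "data", "rate-tp-0")
    if not os.path.isdir(patch_dir):
        raise SystemExit(
            f"Missing patch directory {patch_dir!r}; download the matching "
            "dataset or point the evaluation config at radar-only data."
        )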

ModuleNotFoundError: No module named 'sdgf'

Hi!
I was running the eval_dgmr script and encountered the error below.
Would you mind helping with this?

  File "/ldcast/ldcast/scripts/eval_dgmr.py", line 8, in <module>
    import eval_data
  File "/ldcast/ldcast/scripts/eval_data.py", line 2, in <module>
    import train_genforecast
  File "/ldcast/ldcast/scripts/train_genforecast.py", line 10, in <module>
    from train_nowcaster import setup_data
  File "/ldcast/ldcast/scripts/train_nowcaster.py", line 8, in <module>
    from sdgf.features import batch, patches, split, transform
ModuleNotFoundError: No module named 'sdgf'
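
sdgf looks like a leftover internal package name: the same modules exist under ldcast.features (see the repository tree in a later issue), so a plausible fix is editing the import at the top of scripts/train_nowcaster.py:

    # scripts/train_nowcaster.py, line 8 - replace the stale package name
    from ldcast.features import batch, patches, split, transform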

NaN or Inf found in input tensor.

Hi,
I was running train_autoenc.py with default hyperparameters and encountered this error, after which training stopped.
Would you mind helping with this?

Epoch 11: : 1200it [21:02,  1.05s/it, loss=0.0638, v_num=0, val_loss=0.054, val_rec_loss=0.0454, val_kl_loss=0.819]                               Metric val_rec_loss improved by 0.010 >= min_delta = 0.0. New best score: 0.045                                                                     
Epoch 12: : 600it [11:53,  1.19s/it, loss=nan, v_num=0, val_loss=0.054, val_rec_loss=0.0454, val_kl_loss=0.819]NaN or Inf found in input tensor.
...
Epoch 12: : 1200it [20:38,  1.03s/it, loss=nan, v_num=0, val_loss=0.054, val_rec_loss=0.0454, val_kl_loss=0.819]NaN or Inf found in input tensor.
Epoch 12: : 1200it [20:40,  1.03s/it, loss=nan, v_num=0, val_loss=nan.0, val_rec_loss=nan.0, val_kl_loss=nan.0]                                   Monitored metric val_rec_loss = nan is not finite. Previous best value was 0.045. Signaling Trainer to stop.                                        
Epoch 12: : 1200it [20:40,  1.03s/it, loss=nan, v_num=0, val_loss=nan.0, val_rec_loss=nan.0, val_kl_loss=nan.0]
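
Two common mitigations for VAE training collapsing to NaN, shown as a hedged sketch (these are generic PyTorch Lightning options, not settings confirmed for this repository; lowering the learning rate or the KL weight are further options):

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        gradient_clip_val=1.0,  # clip exploding gradients before they turn into NaNs
        detect_anomaly=True,    # locate the op that first emits NaN/Inf (slow)
    )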

loading trained model

Hi @jleinonen ,

thank you for this nice work.

I am trying to retrain the model using the script

python train_genforecast.py --model_dir="../models/genforecast_train"

to see if I can reproduce the weights and obtain somewhat similar results. However, I am failing to load the model checkpoints into the Forecast class. Please note that loading the pretrained weights coming from the Zenodo data repository does work.

Below I provide the error message. I also observed that the on-disk model size is almost double for the checkpoints created by train_genforecast.py compared to genforecast-radaronly-256x256-20step.pt.

I guess I am missing an obvious step here?

Error message:

from ldcast.forecast import Forecast
fn_aut = 'models/autoenc/autoenc-32-0.01.pt'
fn_gen = 'models/genforecast_train/epoch=0-val_loss_ema=0.6150.ckpt'
fc = Forecast(ldm_weights_fn = fn_gen, autoenc_weights_fn=fn_aut)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/thirdparty/meteoswiss/ldcast/forecast.py", line 49, in __init__
    self.ldm = self._init_model()
  File "/workspace/thirdparty/meteoswiss/ldcast/forecast.py", line 99, in _init_model
    ldm.load_state_dict(torch.load(self.ldm_weights_fn))
  File "/workspace/virtualenv/venv_ldcast/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
Missing key(s) in state_dict: "betas", "alphas_cumprod", "alphas_cumprod_prev", "sqrt_alphas_cumprod", "sqrt_one_minus_alphas_cumprod", "model.time_embed.0.weight", "model.time_embed.0.bias", "model.time_embed.2.weight", "model.time_embed.2.bias", "model.input_blocks.0.0.weight", "model.input_blocks.0.0.bias", "model.input_blocks.1.0.in_layers.2.weight", "model.input_blocks.1.0.in_layers.2.bias", "model.input_blocks.1.0.emb_layers.1.weight", "model.input_blocks.1.0.emb_layers.1.bias", "model.input_blocks.1.0.out_layers.3.weight", "model.input_blocks.1.0.out_layers.3.bias", "model.input_blocks.1.1.pre_proj.weight", "model.input_blocks.1.1.pre_proj.bias", "model.input_blocks.1.1.filter.w1", "model.input_blocks.1.1.filter.b1", "model.input_blocks.1.1.filter.w2", "model.input_blocks.1.1.filter.b2", "model.input_blocks.1.1.mlp.fc1.weight", "model.input_blocks.1.1.mlp.fc1.bias", "model.input_blocks.1.1.mlp.fc2.weight", "model.input_blocks.1.1.mlp.fc2.bias", "model.input_blocks.2.0.in_layers.2.weight", "model.input_blocks.2.0.in_layers.2.bias", "model.input_blocks.2.0.emb_layers.1.weight", "model.input_blocks.2.0.emb_layers.1.bias", "model.input_blocks.2.0.out_layers.3.weight", "model.input_blocks.2.0.out_layers.3.bias", "model.input_blocks.2.1.pre_proj.weight", "model.input_blocks.2.1.pre_proj.bias", "model.input_blocks.2.1.filter.w1", "model.input_blocks.2.1.filter.b1", "model.input_blocks.2.1.filter.w2", "model.input_blocks.2.1.filter.b2", "model.input_blocks.2.1.mlp.fc1.weight", "model.input_blocks.2.1.mlp.fc1.bias", "model.input_blocks.2.1.mlp.fc2.weight", "model.input_blocks.2.1.mlp.fc2.bias", "model.input_blocks.3.0.op.weight", "model.input_blocks.3.0.op.bias", "model.input_blocks.4.0.in_layers.2.weight", "model.input_blocks.4.0.in_layers.2.bias", "model.input_blocks.4.0.emb_layers.1.weight", "model.input_blocks.4.0.emb_layers.1.bias", "model.input_blocks.4.0.out_layers.3.weight", "model.input_blocks.4.0.out_layers.3.bias", "model.input_blocks.4.0.skip_connection.weight", "model.input_blocks.4.0.skip_connection.bias", "model.input_blocks.4.1.pre_proj.weight", "model.input_blocks.4.1.pre_proj.bias", "model.input_blocks.4.1.filter.w1", "model.input_blocks.4.1.filter.b1", "model.input_blocks.4.1.filter.w2", "model.input_blocks.4.1.filter.b2", "model.input_blocks.4.1.mlp.fc1.weight", "model.input_blocks.4.1.mlp.fc1.bias", "model.input_blocks.4.1.mlp.fc2.weight", "model.input_blocks.4.1.mlp.fc2.bias", "model.input_blocks.5.0.in_layers.2.weight", "model.input_blocks.5.0.in_layers.2.bias", "model.input_blocks.5.0.emb_layers.1.weight", "model.input_blocks.5.0.emb_layers.1.bias", "model.input_blocks.5.0.out_layers.3.weight", "model.input_blocks.5.0.out_layers.3.bias", "model.input_blocks.5.1.pre_proj.weight", "model.input_blocks.5.1.pre_proj.bias", "model.input_blocks.5.1.filter.w1", "model.input_blocks.5.1.filter.b1", "model.input_blocks.5.1.filter.w2", "model.input_blocks.5.1.filter.b2", 
"model.input_blocks.5.1.mlp.fc1.weight", "model.input_blocks.5.1.mlp.fc1.bias", "model.input_blocks.5.1.mlp.fc2.weight", "model.input_blocks.5.1.mlp.fc2.bias", "model.input_blocks.6.0.op.weight", "model.input_blocks.6.0.op.bias", "model.input_blocks.7.0.in_layers.2.weight", "model.input_blocks.7.0.in_layers.2.bias", "model.input_blocks.7.0.emb_layers.1.weight", "model.input_blocks.7.0.emb_layers.1.bias", "model.input_blocks.7.0.out_layers.3.weight", "model.input_blocks.7.0.out_layers.3.bias", "model.input_blocks.7.0.skip_connection.weight", "model.input_blocks.7.0.skip_connection.bias", "model.input_blocks.8.0.in_layers.2.weight", "model.input_blocks.8.0.in_layers.2.bias", "model.input_blocks.8.0.emb_layers.1.weight", "model.input_blocks.8.0.emb_layers.1.bias", "model.input_blocks.8.0.out_layers.3.weight", "model.input_blocks.8.0.out_layers.3.bias", "model.middle_block.0.in_layers.2.weight", "model.middle_block.0.in_layers.2.bias", "model.middle_block.0.emb_layers.1.weight", "model.middle_block.0.emb_layers.1.bias", "model.middle_block.0.out_layers.3.weight", "model.middle_block.0.out_layers.3.bias", "model.middle_block.1.pre_proj.weight", "model.middle_block.1.pre_proj.bias", "model.middle_block.1.filter.w1", "model.middle_block.1.filter.b1", "model.middle_block.1.filter.w2", "model.middle_block.1.filter.b2", "model.middle_block.1.mlp.fc1.weight", "model.middle_block.1.mlp.fc1.bias", "model.middle_block.1.mlp.fc2.weight", "model.middle_block.1.mlp.fc2.bias", "model.middle_block.2.in_layers.2.weight", "model.middle_block.2.in_layers.2.bias", "model.middle_block.2.emb_layers.1.weight", "model.middle_block.2.emb_layers.1.bias", "model.middle_block.2.out_layers.3.weight", "model.middle_block.2.out_layers.3.bias", "model.output_blocks.0.0.in_layers.2.weight", "model.output_blocks.0.0.in_layers.2.bias", "model.output_blocks.0.0.emb_layers.1.weight", "model.output_blocks.0.0.emb_layers.1.bias", "model.output_blocks.0.0.out_layers.3.weight", "model.output_blocks.0.0.out_layers.3.bias", "model.output_blocks.0.0.skip_connection.weight", "model.output_blocks.0.0.skip_connection.bias", "model.output_blocks.1.0.in_layers.2.weight", "model.output_blocks.1.0.in_layers.2.bias", "model.output_blocks.1.0.emb_layers.1.weight", "model.output_blocks.1.0.emb_layers.1.bias", "model.output_blocks.1.0.out_layers.3.weight", "model.output_blocks.1.0.out_layers.3.bias", "model.output_blocks.1.0.skip_connection.weight", "model.output_blocks.1.0.skip_connection.bias", "model.output_blocks.2.0.in_layers.2.weight", "model.output_blocks.2.0.in_layers.2.bias", "model.output_blocks.2.0.emb_layers.1.weight", "model.output_blocks.2.0.emb_layers.1.bias", "model.output_blocks.2.0.out_layers.3.weight", "model.output_blocks.2.0.out_layers.3.bias", "model.output_blocks.2.0.skip_connection.weight", "model.output_blocks.2.0.skip_connection.bias", "model.output_blocks.2.1.conv.weight", "model.output_blocks.2.1.conv.bias", "model.output_blocks.3.0.in_layers.2.weight", "model.output_blocks.3.0.in_layers.2.bias", "model.output_blocks.3.0.emb_layers.1.weight", "model.output_blocks.3.0.emb_layers.1.bias", "model.output_blocks.3.0.out_layers.3.weight", "model.output_blocks.3.0.out_layers.3.bias", "model.output_blocks.3.0.skip_connection.weight", "model.output_blocks.3.0.skip_connection.bias", "model.output_blocks.3.1.pre_proj.weight", "model.output_blocks.3.1.pre_proj.bias", "model.output_blocks.3.1.filter.w1", "model.output_blocks.3.1.filter.b1", "model.output_blocks.3.1.filter.w2", "model.output_blocks.3.1.filter.b2", 
"model.output_blocks.3.1.mlp.fc1.weight", "model.output_blocks.3.1.mlp.fc1.bias", "model.output_blocks.3.1.mlp.fc2.weight", "model.output_blocks.3.1.mlp.fc2.bias", "model.output_blocks.4.0.in_layers.2.weight", "model.output_blocks.4.0.in_layers.2.bias", "model.output_blocks.4.0.emb_layers.1.weight", "model.output_blocks.4.0.emb_layers.1.bias", "model.output_blocks.4.0.out_layers.3.weight", "model.output_blocks.4.0.out_layers.3.bias", "model.output_blocks.4.0.skip_connection.weight", "model.output_blocks.4.0.skip_connection.bias", "model.output_blocks.4.1.pre_proj.weight", "model.output_blocks.4.1.pre_proj.bias", "model.output_blocks.4.1.filter.w1", "model.output_blocks.4.1.filter.b1", "model.output_blocks.4.1.filter.w2", "model.output_blocks.4.1.filter.b2", "model.output_blocks.4.1.mlp.fc1.weight", "model.output_blocks.4.1.mlp.fc1.bias", "model.output_blocks.4.1.mlp.fc2.weight", "model.output_blocks.4.1.mlp.fc2.bias", "model.output_blocks.5.0.in_layers.2.weight", "model.output_blocks.5.0.in_layers.2.bias", "model.output_blocks.5.0.emb_layers.1.weight", "model.output_blocks.5.0.emb_layers.1.bias", "model.output_blocks.5.0.out_layers.3.weight", "model.output_blocks.5.0.out_layers.3.bias", "model.output_blocks.5.0.skip_connection.weight", "model.output_blocks.5.0.skip_connection.bias", "model.output_blocks.5.1.pre_proj.weight", "model.output_blocks.5.1.pre_proj.bias", "model.output_blocks.5.1.filter.w1", "model.output_blocks.5.1.filter.b1", "model.output_blocks.5.1.filter.w2", "model.output_blocks.5.1.filter.b2", "model.output_blocks.5.1.mlp.fc1.weight", "model.output_blocks.5.1.mlp.fc1.bias", "model.output_blocks.5.1.mlp.fc2.weight", "model.output_blocks.5.1.mlp.fc2.bias", "model.output_blocks.5.2.conv.weight", "model.output_blocks.5.2.conv.bias", "model.output_blocks.6.0.in_layers.2.weight", "model.output_blocks.6.0.in_layers.2.bias", "model.output_blocks.6.0.emb_layers.1.weight", "model.output_blocks.6.0.emb_layers.1.bias", "model.output_blocks.6.0.out_layers.3.weight", "model.output_blocks.6.0.out_layers.3.bias", "model.output_blocks.6.0.skip_connection.weight", "model.output_blocks.6.0.skip_connection.bias", "model.output_blocks.6.1.pre_proj.weight", "model.output_blocks.6.1.pre_proj.bias", "model.output_blocks.6.1.filter.w1", "model.output_blocks.6.1.filter.b1", "model.output_blocks.6.1.filter.w2", "model.output_blocks.6.1.filter.b2", "model.output_blocks.6.1.mlp.fc1.weight", "model.output_blocks.6.1.mlp.fc1.bias", "model.output_blocks.6.1.mlp.fc2.weight", "model.output_blocks.6.1.mlp.fc2.bias", "model.output_blocks.7.0.in_layers.2.weight", "model.output_blocks.7.0.in_layers.2.bias", "model.output_blocks.7.0.emb_layers.1.weight", "model.output_blocks.7.0.emb_layers.1.bias", "model.output_blocks.7.0.out_layers.3.weight", "model.output_blocks.7.0.out_layers.3.bias", "model.output_blocks.7.0.skip_connection.weight", "model.output_blocks.7.0.skip_connection.bias", "model.output_blocks.7.1.pre_proj.weight", "model.output_blocks.7.1.pre_proj.bias", "model.output_blocks.7.1.filter.w1", "model.output_blocks.7.1.filter.b1", "model.output_blocks.7.1.filter.w2", "model.output_blocks.7.1.filter.b2", "model.output_blocks.7.1.mlp.fc1.weight", "model.output_blocks.7.1.mlp.fc1.bias", "model.output_blocks.7.1.mlp.fc2.weight", "model.output_blocks.7.1.mlp.fc2.bias", "model.output_blocks.8.0.in_layers.2.weight", "model.output_blocks.8.0.in_layers.2.bias", "model.output_blocks.8.0.emb_layers.1.weight", "model.output_blocks.8.0.emb_layers.1.bias", "model.output_blocks.8.0.out_layers.3.weight", 
"model.output_blocks.8.0.out_layers.3.bias", "model.output_blocks.8.0.skip_connection.weight", "model.output_blocks.8.0.skip_connection.bias", "model.output_blocks.8.1.pre_proj.weight", "model.output_blocks.8.1.pre_proj.bias", "model.output_blocks.8.1.filter.w1", "model.output_blocks.8.1.filter.b1", "model.output_blocks.8.1.filter.w2", "model.output_blocks.8.1.filter.b2", "model.output_blocks.8.1.mlp.fc1.weight", "model.output_blocks.8.1.mlp.fc1.bias", "model.output_blocks.8.1.mlp.fc2.weight", "model.output_blocks.8.1.mlp.fc2.bias", "model.out.2.weight", "model.out.2.bias", "autoencoder.log_var", "autoencoder.encoder.0.proj.weight", "autoencoder.encoder.0.proj.bias", "autoencoder.encoder.0.conv1.weight", "autoencoder.encoder.0.conv1.bias", "autoencoder.encoder.0.conv2.weight", "autoencoder.encoder.0.conv2.bias", "autoencoder.encoder.0.norm1.weight", "autoencoder.encoder.0.norm1.bias", "autoencoder.encoder.0.norm2.weight", "autoencoder.encoder.0.norm2.bias", "autoencoder.encoder.1.weight", "autoencoder.encoder.1.bias", "autoencoder.encoder.2.conv1.weight", "autoencoder.encoder.2.conv1.bias", "autoencoder.encoder.2.conv2.weight", "autoencoder.encoder.2.conv2.bias", "autoencoder.encoder.2.norm1.weight", "autoencoder.encoder.2.norm1.bias", "autoencoder.encoder.2.norm2.weight", "autoencoder.encoder.2.norm2.bias", "autoencoder.encoder.3.weight", "autoencoder.encoder.3.bias", "autoencoder.decoder.0.weight", "autoencoder.decoder.0.bias", "autoencoder.decoder.1.conv1.weight", "autoencoder.decoder.1.conv1.bias", "autoencoder.decoder.1.conv2.weight", "autoencoder.decoder.1.conv2.bias", "autoencoder.decoder.1.norm1.weight", "autoencoder.decoder.1.norm1.bias", "autoencoder.decoder.1.norm2.weight", "autoencoder.decoder.1.norm2.bias", "autoencoder.decoder.2.weight", "autoencoder.decoder.2.bias", "autoencoder.decoder.3.proj.weight", "autoencoder.decoder.3.proj.bias", "autoencoder.decoder.3.conv1.weight", "autoencoder.decoder.3.conv1.bias", "autoencoder.decoder.3.conv2.weight", "autoencoder.decoder.3.conv2.bias", "autoencoder.decoder.3.norm1.weight", "autoencoder.decoder.3.norm1.bias", "autoencoder.decoder.3.norm2.weight", "autoencoder.decoder.3.norm2.bias", "autoencoder.to_moments.weight", "autoencoder.to_moments.bias", "autoencoder.to_decoder.weight", "autoencoder.to_decoder.bias", "context_encoder.autoencoder.0.log_var", "context_encoder.autoencoder.0.encoder.0.proj.weight", "context_encoder.autoencoder.0.encoder.0.proj.bias", "context_encoder.autoencoder.0.encoder.0.conv1.weight", "context_encoder.autoencoder.0.encoder.0.conv1.bias", "context_encoder.autoencoder.0.encoder.0.conv2.weight", "context_encoder.autoencoder.0.encoder.0.conv2.bias", "context_encoder.autoencoder.0.encoder.0.norm1.weight", "context_encoder.autoencoder.0.encoder.0.norm1.bias", "context_encoder.autoencoder.0.encoder.0.norm2.weight", "context_encoder.autoencoder.0.encoder.0.norm2.bias", "context_encoder.autoencoder.0.encoder.1.weight", "context_encoder.autoencoder.0.encoder.1.bias", "context_encoder.autoencoder.0.encoder.2.conv1.weight", "context_encoder.autoencoder.0.encoder.2.conv1.bias", "context_encoder.autoencoder.0.encoder.2.conv2.weight", "context_encoder.autoencoder.0.encoder.2.conv2.bias", "context_encoder.autoencoder.0.encoder.2.norm1.weight", "context_encoder.autoencoder.0.encoder.2.norm1.bias", "context_encoder.autoencoder.0.encoder.2.norm2.weight", "context_encoder.autoencoder.0.encoder.2.norm2.bias", "context_encoder.autoencoder.0.encoder.3.weight", "context_encoder.autoencoder.0.encoder.3.bias", 
"context_encoder.autoencoder.0.decoder.0.weight", "context_encoder.autoencoder.0.decoder.0.bias", "context_encoder.autoencoder.0.decoder.1.conv1.weight", "context_encoder.autoencoder.0.decoder.1.conv1.bias", "context_encoder.autoencoder.0.decoder.1.conv2.weight", "context_encoder.autoencoder.0.decoder.1.conv2.bias", "context_encoder.autoencoder.0.decoder.1.norm1.weight", "context_encoder.autoencoder.0.decoder.1.norm1.bias", "context_encoder.autoencoder.0.decoder.1.norm2.weight", "context_encoder.autoencoder.0.decoder.1.norm2.bias", "context_encoder.autoencoder.0.decoder.2.weight", "context_encoder.autoencoder.0.decoder.2.bias", "context_encoder.autoencoder.0.decoder.3.proj.weight", "context_encoder.autoencoder.0.decoder.3.proj.bias", "context_encoder.autoencoder.0.decoder.3.conv1.weight", "context_encoder.autoencoder.0.decoder.3.conv1.bias", "context_encoder.autoencoder.0.decoder.3.conv2.weight", "context_encoder.autoencoder.0.decoder.3.conv2.bias", "context_encoder.autoencoder.0.decoder.3.norm1.weight", "context_encoder.autoencoder.0.decoder.3.norm1.bias", "context_encoder.autoencoder.0.decoder.3.norm2.weight", "context_encoder.autoencoder.0.decoder.3.norm2.bias", "context_encoder.autoencoder.0.to_moments.weight", "context_encoder.autoencoder.0.to_moments.bias", "context_encoder.autoencoder.0.to_decoder.weight", "context_encoder.autoencoder.0.to_decoder.bias", "context_encoder.proj.0.weight", "context_encoder.proj.0.bias", "context_encoder.analysis.0.0.norm1.weight", "context_encoder.analysis.0.0.norm1.bias", "context_encoder.analysis.0.0.filter.w1", "context_encoder.analysis.0.0.filter.b1", "context_encoder.analysis.0.0.filter.w2", "context_encoder.analysis.0.0.filter.b2", "context_encoder.analysis.0.0.norm2.weight", "context_encoder.analysis.0.0.norm2.bias", "context_encoder.analysis.0.0.mlp.fc1.weight", "context_encoder.analysis.0.0.mlp.fc1.bias", "context_encoder.analysis.0.0.mlp.fc2.weight", "context_encoder.analysis.0.0.mlp.fc2.bias", "context_encoder.analysis.0.1.norm1.weight", "context_encoder.analysis.0.1.norm1.bias", "context_encoder.analysis.0.1.filter.w1", "context_encoder.analysis.0.1.filter.b1", "context_encoder.analysis.0.1.filter.w2", "context_encoder.analysis.0.1.filter.b2", "context_encoder.analysis.0.1.norm2.weight", "context_encoder.analysis.0.1.norm2.bias", "context_encoder.analysis.0.1.mlp.fc1.weight", "context_encoder.analysis.0.1.mlp.fc1.bias", "context_encoder.analysis.0.1.mlp.fc2.weight", "context_encoder.analysis.0.1.mlp.fc2.bias", "context_encoder.analysis.0.2.norm1.weight", "context_encoder.analysis.0.2.norm1.bias", "context_encoder.analysis.0.2.filter.w1", "context_encoder.analysis.0.2.filter.b1", "context_encoder.analysis.0.2.filter.w2", "context_encoder.analysis.0.2.filter.b2", "context_encoder.analysis.0.2.norm2.weight", "context_encoder.analysis.0.2.norm2.bias", "context_encoder.analysis.0.2.mlp.fc1.weight", "context_encoder.analysis.0.2.mlp.fc1.bias", "context_encoder.analysis.0.2.mlp.fc2.weight", "context_encoder.analysis.0.2.mlp.fc2.bias", "context_encoder.analysis.0.3.norm1.weight", "context_encoder.analysis.0.3.norm1.bias", "context_encoder.analysis.0.3.filter.w1", "context_encoder.analysis.0.3.filter.b1", "context_encoder.analysis.0.3.filter.w2", "context_encoder.analysis.0.3.filter.b2", "context_encoder.analysis.0.3.norm2.weight", "context_encoder.analysis.0.3.norm2.bias", "context_encoder.analysis.0.3.mlp.fc1.weight", "context_encoder.analysis.0.3.mlp.fc1.bias", "context_encoder.analysis.0.3.mlp.fc2.weight", 
"context_encoder.analysis.0.3.mlp.fc2.bias", "context_encoder.temporal_transformer.0.attn1.KV.weight", "context_encoder.temporal_transformer.0.attn1.KV.bias", "context_encoder.temporal_transformer.0.attn1.Q.weight", "context_encoder.temporal_transformer.0.attn1.Q.bias", "context_encoder.temporal_transformer.0.attn1.proj.weight", "context_encoder.temporal_transformer.0.attn1.proj.bias", "context_encoder.temporal_transformer.0.attn2.KV.weight", "context_encoder.temporal_transformer.0.attn2.KV.bias", "context_encoder.temporal_transformer.0.attn2.Q.weight", "context_encoder.temporal_transformer.0.attn2.Q.bias", "context_encoder.temporal_transformer.0.attn2.proj.weight", "context_encoder.temporal_transformer.0.attn2.proj.bias", "context_encoder.temporal_transformer.0.norm1.weight", "context_encoder.temporal_transformer.0.norm1.bias", "context_encoder.temporal_transformer.0.norm2.weight", "context_encoder.temporal_transformer.0.norm2.bias", "context_encoder.temporal_transformer.0.norm3.weight", "context_encoder.temporal_transformer.0.norm3.bias", "context_encoder.temporal_transformer.0.mlp.0.weight", "context_encoder.temporal_transformer.0.mlp.0.bias", "context_encoder.temporal_transformer.0.mlp.2.weight", "context_encoder.temporal_transformer.0.mlp.2.bias", "context_encoder.forecast.0.norm1.weight", "context_encoder.forecast.0.norm1.bias", "context_encoder.forecast.0.filter.w1", "context_encoder.forecast.0.filter.b1", "context_encoder.forecast.0.filter.w2", "context_encoder.forecast.0.filter.b2", "context_encoder.forecast.0.norm2.weight", "context_encoder.forecast.0.norm2.bias", "context_encoder.forecast.0.mlp.fc1.weight", "context_encoder.forecast.0.mlp.fc1.bias", "context_encoder.forecast.0.mlp.fc2.weight", "context_encoder.forecast.0.mlp.fc2.bias", "context_encoder.forecast.1.norm1.weight", "context_encoder.forecast.1.norm1.bias", "context_encoder.forecast.1.filter.w1", "context_encoder.forecast.1.filter.b1", "context_encoder.forecast.1.filter.w2", "context_encoder.forecast.1.filter.b2", "context_encoder.forecast.1.norm2.weight", "context_encoder.forecast.1.norm2.bias", "context_encoder.forecast.1.mlp.fc1.weight", "context_encoder.forecast.1.mlp.fc1.bias", "context_encoder.forecast.1.mlp.fc2.weight", "context_encoder.forecast.1.mlp.fc2.bias", "context_encoder.forecast.2.norm1.weight", "context_encoder.forecast.2.norm1.bias", "context_encoder.forecast.2.filter.w1", "context_encoder.forecast.2.filter.b1", "context_encoder.forecast.2.filter.w2", "context_encoder.forecast.2.filter.b2", "context_encoder.forecast.2.norm2.weight", "context_encoder.forecast.2.norm2.bias", "context_encoder.forecast.2.mlp.fc1.weight", "context_encoder.forecast.2.mlp.fc1.bias", "context_encoder.forecast.2.mlp.fc2.weight", "context_encoder.forecast.2.mlp.fc2.bias", "context_encoder.forecast.3.norm1.weight", "context_encoder.forecast.3.norm1.bias", "context_encoder.forecast.3.filter.w1", "context_encoder.forecast.3.filter.b1", "context_encoder.forecast.3.filter.w2", "context_encoder.forecast.3.filter.b2", "context_encoder.forecast.3.norm2.weight", "context_encoder.forecast.3.norm2.bias", "context_encoder.forecast.3.mlp.fc1.weight", "context_encoder.forecast.3.mlp.fc1.bias", "context_encoder.forecast.3.mlp.fc2.weight", "context_encoder.forecast.3.mlp.fc2.bias", "context_encoder.resnet.0.proj.weight", "context_encoder.resnet.0.proj.bias", "context_encoder.resnet.0.conv1.weight", "context_encoder.resnet.0.conv1.bias", "context_encoder.resnet.0.conv2.weight", "context_encoder.resnet.0.conv2.bias", 
"context_encoder.resnet.1.proj.weight", "context_encoder.resnet.1.proj.bias", "context_encoder.resnet.1.conv1.weight", "context_encoder.resnet.1.conv1.bias", "context_encoder.resnet.1.conv2.weight", "context_encoder.resnet.1.conv2.bias", "model_ema.decay", "model_ema.num_updates", "model_ema.time_embed0weight", "model_ema.time_embed0bias", "model_ema.time_embed2weight", "model_ema.time_embed2bias", "model_ema.input_blocks00weight", "model_ema.input_blocks00bias", "model_ema.input_blocks10in_layers2weight", "model_ema.input_blocks10in_layers2bias", "model_ema.input_blocks10emb_layers1weight", "model_ema.input_blocks10emb_layers1bias", "model_ema.input_blocks10out_layers3weight", "model_ema.input_blocks10out_layers3bias", "model_ema.input_blocks11pre_projweight", "model_ema.input_blocks11pre_projbias", "model_ema.input_blocks11filterw1", "model_ema.input_blocks11filterb1", "model_ema.input_blocks11filterw2", "model_ema.input_blocks11filterb2", "model_ema.input_blocks11mlpfc1weight", "model_ema.input_blocks11mlpfc1bias", "model_ema.input_blocks11mlpfc2weight", "model_ema.input_blocks11mlpfc2bias", "model_ema.input_blocks20in_layers2weight", "model_ema.input_blocks20in_layers2bias", "model_ema.input_blocks20emb_layers1weight", "model_ema.input_blocks20emb_layers1bias", "model_ema.input_blocks20out_layers3weight", "model_ema.input_blocks20out_layers3bias", "model_ema.input_blocks21pre_projweight", "model_ema.input_blocks21pre_projbias", "model_ema.input_blocks21filterw1", "model_ema.input_blocks21filterb1", "model_ema.input_blocks21filterw2", "model_ema.input_blocks21filterb2", "model_ema.input_blocks21mlpfc1weight", "model_ema.input_blocks21mlpfc1bias", "model_ema.input_blocks21mlpfc2weight", "model_ema.input_blocks21mlpfc2bias", "model_ema.input_blocks30opweight", "model_ema.input_blocks30opbias", "model_ema.input_blocks40in_layers2weight", "model_ema.input_blocks40in_layers2bias", "model_ema.input_blocks40emb_layers1weight", "model_ema.input_blocks40emb_layers1bias", "model_ema.input_blocks40out_layers3weight", "model_ema.input_blocks40out_layers3bias", "model_ema.input_blocks40skip_connectionweight", "model_ema.input_blocks40skip_connectionbias", "model_ema.input_blocks41pre_projweight", "model_ema.input_blocks41pre_projbias", "model_ema.input_blocks41filterw1", "model_ema.input_blocks41filterb1", "model_ema.input_blocks41filterw2", "model_ema.input_blocks41filterb2", "model_ema.input_blocks41mlpfc1weight", "model_ema.input_blocks41mlpfc1bias", "model_ema.input_blocks41mlpfc2weight", "model_ema.input_blocks41mlpfc2bias", "model_ema.input_blocks50in_layers2weight", "model_ema.input_blocks50in_layers2bias", "model_ema.input_blocks50emb_layers1weight", "model_ema.input_blocks50emb_layers1bias", "model_ema.input_blocks50out_layers3weight", "model_ema.input_blocks50out_layers3bias", "model_ema.input_blocks51pre_projweight", "model_ema.input_blocks51pre_projbias", "model_ema.input_blocks51filterw1", "model_ema.input_blocks51filterb1", "model_ema.input_blocks51filterw2", "model_ema.input_blocks51filterb2", "model_ema.input_blocks51mlpfc1weight", "model_ema.input_blocks51mlpfc1bias", "model_ema.input_blocks51mlpfc2weight", "model_ema.input_blocks51mlpfc2bias", "model_ema.input_blocks60opweight", "model_ema.input_blocks60opbias", "model_ema.input_blocks70in_layers2weight", "model_ema.input_blocks70in_layers2bias", "model_ema.input_blocks70emb_layers1weight", "model_ema.input_blocks70emb_layers1bias", "model_ema.input_blocks70out_layers3weight", "model_ema.input_blocks70out_layers3bias", 
"model_ema.input_blocks70skip_connectionweight", "model_ema.input_blocks70skip_connectionbias", "model_ema.input_blocks80in_layers2weight", "model_ema.input_blocks80in_layers2bias", "model_ema.input_blocks80emb_layers1weight", "model_ema.input_blocks80emb_layers1bias", "model_ema.input_blocks80out_layers3weight", "model_ema.input_blocks80out_layers3bias", "model_ema.middle_block0in_layers2weight", "model_ema.middle_block0in_layers2bias", "model_ema.middle_block0emb_layers1weight", "model_ema.middle_block0emb_layers1bias", "model_ema.middle_block0out_layers3weight", "model_ema.middle_block0out_layers3bias", "model_ema.middle_block1pre_projweight", "model_ema.middle_block1pre_projbias", "model_ema.middle_block1filterw1", "model_ema.middle_block1filterb1", "model_ema.middle_block1filterw2", "model_ema.middle_block1filterb2", "model_ema.middle_block1mlpfc1weight", "model_ema.middle_block1mlpfc1bias", "model_ema.middle_block1mlpfc2weight", "model_ema.middle_block1mlpfc2bias", "model_ema.middle_block2in_layers2weight", "model_ema.middle_block2in_layers2bias", "model_ema.middle_block2emb_layers1weight", "model_ema.middle_block2emb_layers1bias", "model_ema.middle_block2out_layers3weight", "model_ema.middle_block2out_layers3bias", "model_ema.output_blocks00in_layers2weight", "model_ema.output_blocks00in_layers2bias", "model_ema.output_blocks00emb_layers1weight", "model_ema.output_blocks00emb_layers1bias", "model_ema.output_blocks00out_layers3weight", "model_ema.output_blocks00out_layers3bias", "model_ema.output_blocks00skip_connectionweight", "model_ema.output_blocks00skip_connectionbias", "model_ema.output_blocks10in_layers2weight", "model_ema.output_blocks10in_layers2bias", "model_ema.output_blocks10emb_layers1weight", "model_ema.output_blocks10emb_layers1bias", "model_ema.output_blocks10out_layers3weight", "model_ema.output_blocks10out_layers3bias", "model_ema.output_blocks10skip_connectionweight", "model_ema.output_blocks10skip_connectionbias", "model_ema.output_blocks20in_layers2weight", "model_ema.output_blocks20in_layers2bias", "model_ema.output_blocks20emb_layers1weight", "model_ema.output_blocks20emb_layers1bias", "model_ema.output_blocks20out_layers3weight", "model_ema.output_blocks20out_layers3bias", "model_ema.output_blocks20skip_connectionweight", "model_ema.output_blocks20skip_connectionbias", "model_ema.output_blocks21convweight", "model_ema.output_blocks21convbias", "model_ema.output_blocks30in_layers2weight", "model_ema.output_blocks30in_layers2bias", "model_ema.output_blocks30emb_layers1weight", "model_ema.output_blocks30emb_layers1bias", "model_ema.output_blocks30out_layers3weight", "model_ema.output_blocks30out_layers3bias", "model_ema.output_blocks30skip_connectionweight", "model_ema.output_blocks30skip_connectionbias", "model_ema.output_blocks31pre_projweight", "model_ema.output_blocks31pre_projbias", "model_ema.output_blocks31filterw1", "model_ema.output_blocks31filterb1", "model_ema.output_blocks31filterw2", "model_ema.output_blocks31filterb2", "model_ema.output_blocks31mlpfc1weight", "model_ema.output_blocks31mlpfc1bias", "model_ema.output_blocks31mlpfc2weight", "model_ema.output_blocks31mlpfc2bias", "model_ema.output_blocks40in_layers2weight", "model_ema.output_blocks40in_layers2bias", "model_ema.output_blocks40emb_layers1weight", "model_ema.output_blocks40emb_layers1bias", "model_ema.output_blocks40out_layers3weight", "model_ema.output_blocks40out_layers3bias", "model_ema.output_blocks40skip_connectionweight", "model_ema.output_blocks40skip_connectionbias", 
"model_ema.output_blocks41pre_projweight", "model_ema.output_blocks41pre_projbias", "model_ema.output_blocks41filterw1", "model_ema.output_blocks41filterb1", "model_ema.output_blocks41filterw2", "model_ema.output_blocks41filterb2", "model_ema.output_blocks41mlpfc1weight", "model_ema.output_blocks41mlpfc1bias", "model_ema.output_blocks41mlpfc2weight", "model_ema.output_blocks41mlpfc2bias", "model_ema.output_blocks50in_layers2weight", "model_ema.output_blocks50in_layers2bias", "model_ema.output_blocks50emb_layers1weight", "model_ema.output_blocks50emb_layers1bias", "model_ema.output_blocks50out_layers3weight", "model_ema.output_blocks50out_layers3bias", "model_ema.output_blocks50skip_connectionweight", "model_ema.output_blocks50skip_connectionbias", "model_ema.output_blocks51pre_projweight", "model_ema.output_blocks51pre_projbias", "model_ema.output_blocks51filterw1", "model_ema.output_blocks51filterb1", "model_ema.output_blocks51filterw2", "model_ema.output_blocks51filterb2", "model_ema.output_blocks51mlpfc1weight", "model_ema.output_blocks51mlpfc1bias", "model_ema.output_blocks51mlpfc2weight", "model_ema.output_blocks51mlpfc2bias", "model_ema.output_blocks52convweight", "model_ema.output_blocks52convbias", "model_ema.output_blocks60in_layers2weight", "model_ema.output_blocks60in_layers2bias", "model_ema.output_blocks60emb_layers1weight", "model_ema.output_blocks60emb_layers1bias", "model_ema.output_blocks60out_layers3weight", "model_ema.output_blocks60out_layers3bias", "model_ema.output_blocks60skip_connectionweight", "model_ema.output_blocks60skip_connectionbias", "model_ema.output_blocks61pre_projweight", "model_ema.output_blocks61pre_projbias", "model_ema.output_blocks61filterw1", "model_ema.output_blocks61filterb1", "model_ema.output_blocks61filterw2", "model_ema.output_blocks61filterb2", "model_ema.output_blocks61mlpfc1weight", "model_ema.output_blocks61mlpfc1bias", "model_ema.output_blocks61mlpfc2weight", "model_ema.output_blocks61mlpfc2bias", "model_ema.output_blocks70in_layers2weight", "model_ema.output_blocks70in_layers2bias", "model_ema.output_blocks70emb_layers1weight", "model_ema.output_blocks70emb_layers1bias", "model_ema.output_blocks70out_layers3weight", "model_ema.output_blocks70out_layers3bias", "model_ema.output_blocks70skip_connectionweight", "model_ema.output_blocks70skip_connectionbias", "model_ema.output_blocks71pre_projweight", "model_ema.output_blocks71pre_projbias", "model_ema.output_blocks71filterw1", "model_ema.output_blocks71filterb1", "model_ema.output_blocks71filterw2", "model_ema.output_blocks71filterb2", "model_ema.output_blocks71mlpfc1weight", "model_ema.output_blocks71mlpfc1bias", "model_ema.output_blocks71mlpfc2weight", "model_ema.output_blocks71mlpfc2bias", "model_ema.output_blocks80in_layers2weight", "model_ema.output_blocks80in_layers2bias", "model_ema.output_blocks80emb_layers1weight", "model_ema.output_blocks80emb_layers1bias", "model_ema.output_blocks80out_layers3weight", "model_ema.output_blocks80out_layers3bias", "model_ema.output_blocks80skip_connectionweight", "model_ema.output_blocks80skip_connectionbias", "model_ema.output_blocks81pre_projweight", "model_ema.output_blocks81pre_projbias", "model_ema.output_blocks81filterw1", "model_ema.output_blocks81filterb1", "model_ema.output_blocks81filterw2", "model_ema.output_blocks81filterb2", "model_ema.output_blocks81mlpfc1weight", "model_ema.output_blocks81mlpfc1bias", "model_ema.output_blocks81mlpfc2weight", "model_ema.output_blocks81mlpfc2bias", "model_ema.out2weight", "model_ema.out2bias". 
Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "loops", "callbacks", "optimizer_states", "lr_schedulers".
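
The "Unexpected key(s)" list (epoch, global_step, state_dict, optimizer_states, ...) indicates that a full Lightning checkpoint was passed to load_state_dict; the optimizer states it carries would also explain the roughly doubled file size. A sketch of extracting the bare weights first (the output filename is illustrative):

    import torch

    ckpt = torch.load(
        "models/genforecast_train/epoch=0-val_loss_ema=0.6150.ckpt",
        map_location="cpu",
    )
    # Lightning .ckpt files nest the weights under "state_dict"; the Zenodo
    # .pt files that Forecast loads successfully are bare state dicts.
    state_dict = ckpt.get("state_dict", ckpt)
    torch.save(state_dict, "models/genforecast_train/genforecast-retrained.pt")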

Can't cache sampler_nowcaster_test

Context

After successfully running
python forecast_demo.py and
python train_autoenc.py --model_dir="../models/autoenc_train",
I couldn't get the
python train_genforecast.py --model_dir="../models/genforecast_train"
command running, due to problems caching the samplers for the test and training sets.

When running python train_genforecast.py --model_dir="../models/genforecast_train":

Expected behaviour

  1. Creates the sampler files and saves them to the cache directory for the valid, test, and train datasets
  2. Trains the forecaster

Actual behaviour

  1. Creates the file ../cache/sampler_nowcaster_valid.pkl
  2. Throws an error while creating the next sampler (complete error message pasted below):
~/tmp/0606/ldcast/scripts$ python train_genforecast.py --model_dir="../models/genforecast_train"
Loading data...
/home/kucuk/tmp/0606/ldcast/ldcast/features/transform.py:80: RuntimeWarning: divide by zero encountered in log10
  log_scale = np.log10(scale).astype(np.float32)
Loading cached sampler from ../cache/sampler_nowcaster_valid.pkl.
No cached sampler found, creating a new one...
Traceback (most recent call last):
  File "train_genforecast.py", line 129, in <module>
    Fire(main)
  File "/home/kucuk/miniconda3/envs/ldcast_test/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/kucuk/miniconda3/envs/ldcast_test/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/kucuk/miniconda3/envs/ldcast_test/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "train_genforecast.py", line 125, in main
    train(**config)
  File "train_genforecast.py", line 94, in train
    datamodule = setup_data(
  File "/home/kucuk/tmp/0606/ldcast/scripts/train_nowcaster.py", line 124, in setup_data
    datamodule = split.DataModule(
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/split.py", line 127, in __init__
    self.batch_gen = {
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/split.py", line 128, in <dictcomp>
    split: batch.BatchGenerator(
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/batch.py", line 81, in __init__
    self.sampler = EqualFrequencySampler(
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/sampling.py", line 30, in __init__
    self.starting_ind = [
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/sampling.py", line 31, in <listcomp>
    starting_indices_for_centers(
  File "/home/kucuk/tmp/0606/ldcast/ldcast/features/sampling.py", line 210, in starting_indices_for_centers
    starting_ind = np.concatenate(
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
This seems like an issue with the indexing of patches in the sampler, though I'm not sure...

Additional information

  • I removed the sampler_nowcaster* files from the cache folder and retried the same command, python train_genforecast.py --model_dir="../models/genforecast_train", and received the same error
  • In case it helps, below is the complete directory structure (note the size difference among sampler_*_valid.pkl files)
~/tmp/0606/ldcast$ tree -h
.
├── [ 11K]  LICENSE
├── [5.5K]  README.md
├── [4.0K]  cache
│   ├── [501K]  sampler_autoenc_test.pkl
│   ├── [883M]  sampler_autoenc_train.pkl
│   ├── [1.9M]  sampler_autoenc_valid.pkl
│   └── [745K]  sampler_nowcaster_valid.pkl
├── [4.0K]  config
│   ├── [  65]  genforecast-radaronly-128x128-20step.yaml
│   └── [ 158]  genforecast-radaronly-256x256-20step.yaml
├── [4.0K]  data
│   ├── [ 48K]  Border_CH.dbf
│   ├── [130K]  Border_CH.shp
│   ├── [4.0K]  RV
│   │   ├── [197M]  patches_RV_202204.nc
│   │   ├── [156M]  patches_RV_202205.nc
│   │   ├── [154M]  patches_RV_202206.nc
│   │   ├── [151M]  patches_RV_202207.nc
│   │   ├── [ 84M]  patches_RV_202208.nc
│   │   └── [281M]  patches_RV_202209.nc
│   ├── [4.0K]  RZC
│   │   ├── [ 55M]  patches_RZC_201804.nc
│   │   ├── [102M]  patches_RZC_201805.nc
│   │   ├── [ 53M]  patches_RZC_201806.nc
│   │   ├── [ 53M]  patches_RZC_201807.nc
│   │   ├── [ 71M]  patches_RZC_201808.nc
│   │   ├── [ 38M]  patches_RZC_201809.nc
│   │   ├── [ 72M]  patches_RZC_201904.nc
│   │   ├── [117M]  patches_RZC_201905.nc
│   │   ├── [ 69M]  patches_RZC_201906.nc
│   │   ├── [ 57M]  patches_RZC_201907.nc
│   │   ├── [ 77M]  patches_RZC_201908.nc
│   │   ├── [ 55M]  patches_RZC_201909.nc
│   │   ├── [ 42M]  patches_RZC_202004.nc
│   │   ├── [ 67M]  patches_RZC_202005.nc
│   │   ├── [110M]  patches_RZC_202006.nc
│   │   ├── [ 51M]  patches_RZC_202007.nc
│   │   ├── [ 91M]  patches_RZC_202008.nc
│   │   ├── [ 61M]  patches_RZC_202009.nc
│   │   ├── [ 59M]  patches_RZC_202104.nc
│   │   ├── [140M]  patches_RZC_202105.nc
│   │   ├── [ 86M]  patches_RZC_202106.nc
│   │   ├── [120M]  patches_RZC_202107.nc
│   │   ├── [ 62M]  patches_RZC_202108.nc
│   │   └── [ 54M]  patches_RZC_202109.nc
│   ├── [4.0K]  demo
│   │   └── [4.0K]  20210622
│   │       ├── [214K]  RZC211731820VL.801.h5
│   │       ├── [214K]  RZC211731825VL.801.h5
│   │       ├── [216K]  RZC211731830VL.801.h5
│   │       └── [216K]  RZC211731835VL.801.h5
│   └── [4.1K]  split_chunks.pkl.gz
├── [4.0K]  figures
│   └── [4.0K]  demo
│       ├── [207K]  R_past-00.png
│       ├── [208K]  R_past-01.png
│       ├── [208K]  R_past-02.png
│       ├── [208K]  R_past-03.png
│       ├── [192K]  R_pred-00.png
│       ├── [191K]  R_pred-01.png
│       ├── [192K]  R_pred-02.png
│       ├── [194K]  R_pred-03.png
│       ├── [195K]  R_pred-04.png
│       ├── [195K]  R_pred-05.png
│       ├── [196K]  R_pred-06.png
│       ├── [196K]  R_pred-07.png
│       ├── [199K]  R_pred-08.png
│       ├── [202K]  R_pred-09.png
│       ├── [204K]  R_pred-10.png
│       ├── [204K]  R_pred-11.png
│       ├── [208K]  R_pred-12.png
│       ├── [207K]  R_pred-13.png
│       ├── [211K]  R_pred-14.png
│       ├── [211K]  R_pred-15.png
│       ├── [212K]  R_pred-16.png
│       ├── [209K]  R_pred-17.png
│       ├── [209K]  R_pred-18.png
│       └── [205K]  R_pred-19.png
├── [4.0K]  ldcast
│   ├── [4.0K]  analysis
│   │   ├── [4.8K]  crps.py
│   │   ├── [4.4K]  fss.py
│   │   ├── [3.2K]  histogram.py
│   │   └── [5.5K]  rank.py
│   ├── [4.0K]  features
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [ 11K]  batch.cpython-38.pyc
│   │   │   ├── [ 11K]  patches.cpython-38.pyc
│   │   │   ├── [7.0K]  sampling.cpython-38.pyc
│   │   │   ├── [4.8K]  split.cpython-38.pyc
│   │   │   ├── [8.2K]  transform.cpython-38.pyc
│   │   │   └── [3.1K]  utils.cpython-38.pyc
│   │   ├── [ 13K]  batch.py
│   │   ├── [3.9K]  io.py
│   │   ├── [ 13K]  patches.py
│   │   ├── [7.1K]  sampling.py
│   │   ├── [5.2K]  split.py
│   │   ├── [8.7K]  transform.py
│   │   └── [3.8K]  utils.py
│   ├── [8.5K]  forecast.py
│   ├── [4.0K]  models
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [1.1K]  distributions.cpython-38.pyc
│   │   │   └── [ 833]  utils.cpython-38.pyc
│   │   ├── [4.0K]  autoenc
│   │   │   ├── [4.0K]  __pycache__
│   │   │   │   ├── [3.4K]  autoenc.cpython-38.pyc
│   │   │   │   ├── [1.9K]  encoder.cpython-38.pyc
│   │   │   │   └── [ 960]  training.cpython-38.pyc
│   │   │   ├── [3.0K]  autoenc.py
│   │   │   ├── [1.9K]  encoder.py
│   │   │   └── [ 952]  training.py
│   │   ├── [4.0K]  benchmarks
│   │   │   ├── [2.4K]  dgmr.py
│   │   │   ├── [3.7K]  pysteps.py
│   │   │   └── [ 350]  transform.py
│   │   ├── [4.0K]  blocks
│   │   │   ├── [4.0K]  __pycache__
│   │   │   │   ├── [9.5K]  afno.cpython-38.pyc
│   │   │   │   ├── [3.2K]  attention.cpython-38.pyc
│   │   │   │   └── [2.2K]  resnet.cpython-38.pyc
│   │   │   ├── [ 13K]  afno.py
│   │   │   ├── [3.0K]  attention.py
│   │   │   └── [2.7K]  resnet.py
│   │   ├── [4.0K]  diffusion
│   │   │   ├── [4.0K]  __pycache__
│   │   │   │   ├── [6.6K]  diffusion.cpython-38.pyc
│   │   │   │   ├── [2.9K]  ema.cpython-38.pyc
│   │   │   │   └── [8.3K]  utils.cpython-38.pyc
│   │   │   ├── [7.4K]  diffusion.py
│   │   │   ├── [2.9K]  ema.py
│   │   │   ├── [ 12K]  plms.py
│   │   │   └── [8.7K]  utils.py
│   │   ├── [ 838]  distributions.py
│   │   ├── [4.0K]  genforecast
│   │   │   ├── [4.0K]  __pycache__
│   │   │   │   ├── [1.4K]  analysis.cpython-38.pyc
│   │   │   │   ├── [1.0K]  training.cpython-38.pyc
│   │   │   │   └── [ 11K]  unet.cpython-38.pyc
│   │   │   ├── [1.0K]  analysis.py
│   │   │   ├── [1.1K]  training.py
│   │   │   └── [ 17K]  unet.py
│   │   ├── [4.0K]  nowcast
│   │   │   ├── [4.0K]  __pycache__
│   │   │   │   └── [8.4K]  nowcast.cpython-38.pyc
│   │   │   └── [8.3K]  nowcast.py
│   │   └── [ 770]  utils.py
│   └── [4.0K]  visualization
│       ├── [1.2K]  cm.py
│       └── [ 11K]  plots.py
├── [4.0K]  ldcast.egg-info
│   ├── [5.9K]  PKG-INFO
│   ├── [ 175]  SOURCES.txt
│   ├── [   1]  dependency_links.txt
│   ├── [ 137]  requires.txt
│   └── [   1]  top_level.txt
├── [4.0K]  models
│   ├── [4.0K]  autoenc
│   │   └── [1.5M]  autoenc-32-0.01.pt
│   ├── [4.0K]  autoenc_train
│   │   ├── [4.6M]  epoch=0-val_rec_loss=0.2204.ckpt
│   │   ├── [4.6M]  epoch=0-val_rec_loss=nan.ckpt
│   │   └── [4.6M]  epoch=1-val_rec_loss=0.1653.ckpt
│   └── [4.0K]  genforecast
│       └── [5.0G]  genforecast-radaronly-256x256-20step.pt
├── [4.0K]  results
├── [4.0K]  scripts
│   ├── [4.0K]  __pycache__
│   │   └── [4.0K]  train_nowcaster.cpython-38.pyc
│   ├── [4.0K]  dwd_dataset.py
│   ├── [1.4K]  eval_data.py
│   ├── [1.5K]  eval_dgmr.py
│   ├── [4.4K]  eval_genforecast.py
│   ├── [1.4K]  eval_pysteps.py
│   ├── [3.8K]  forecast_demo.py
│   ├── [4.0K]  lightning_logs
│   │   └── [4.0K]  version_0
│   │       ├── [4.0K]  events.out.tfevents.1686053670.{VM_NAME}
│   │       └── [   3]  hparams.yaml
│   ├── [3.5K]  metrics.py
│   ├── [ 13K]  plots_genforecast.py
│   ├── [2.8K]  train_autoenc.py
│   ├── [3.4K]  train_genforecast.py
│   └── [4.1K]  train_nowcaster.py
├── [ 951]  setup.py
└── [   0]  tmp4_0607

37 directories, 149 files

Could it be something related to version compatibility of packages, e.g., dask or numba? Or perhaps I'm missing something in the data directory.
@jleinonen please let me know how I can provide further information - and thanks in advance!
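
One hedged diagnostic (an assumption: the ValueError suggests the sampler found no valid starting indices for the test split, e.g. because no patch files cover the test period defined in split_chunks.pkl.gz): list which months each variable actually provides before training:

    import glob
    import os

    # Paths follow the directory tree above; adjust if your layout differs.
    for var_dir in ("data/RZC", "data/RV"):
        files = sorted(glob.glob(os.path.join(var_dir, "patches_*.nc")))
        months = [os.path.basename(f) for f in files]
        print(f"{var_dir}: {len(months)} files", months[:3], "...")

Note that in the tree above the RV patches cover only 2022 while the RZC patches cover 2018-2021; if the test split requires months missing from one variable, the sampler would have nothing to concatenate.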

How does ldcast predict 20 time steps from 4 input time steps?

Hello,
I successfully ran the demo code and found that the model can predict 20 time steps from 4 input time steps, but I am a little confused about how this happens.

In your article, Section 4.2.1 (Forecaster), you say that "we train the model to predict Dout = 5 encoded output time steps simultaneously from Din = 1 encoded input time steps". In my understanding, this means you can use the state at time t to predict t+1, t+2, t+3, t+4 and t+5. Now, given the states at t-3, t-2, t-1 and t, why can the model predict t+1 through t+20?

Hope I have made my question clear.
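
If I understand the paper correctly, the apparent mismatch is resolved by the autoencoder's temporal compression: it encodes blocks of 4 frames into 1 latent time step, so the 4 input frames become Din = 1 encoded step and the Dout = 5 encoded output steps decode back to 20 frames. A sketch of the arithmetic (the 4x temporal compression factor is my reading of the paper, not a quote):

    frames_in, frames_out = 4, 20
    t_compress = 4                     # assumed temporal compression of the VAE
    D_in = frames_in // t_compress     # -> 1 encoded input time step
    D_out = frames_out // t_compress   # -> 5 encoded output time steps
    print(D_in, D_out)                 # prints: 1 5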

Some questions about VAE

Thank you for the very good work! I have a question about the VAE.
In autoenc.py, lines 48-52:

    def _loss(self, batch):
        (x,y) = batch
        while isinstance(x, list) or isinstance(x, tuple):
            x = x[0][0]
        (y_pred, mean, log_var) = self.forward(x)

        rec_loss = (y-y_pred).abs().mean()
        kl_loss = kl_from_standard_normal(mean, log_var)

        total_loss = rec_loss + self.kl_weight * kl_loss

        return (total_loss, rec_loss, kl_loss)

I'm a little confused by this line:

    (y_pred, mean, log_var) = self.forward(x)

i.e., whether it should be

    (x,y) = batch
    while isinstance(x, list) or isinstance(x, tuple):
        x = x[0][0]
    (y_pred, mean, log_var) = self.forward(x)

or instead

    (x,y) = batch
    while isinstance(x, list) or isinstance(x, tuple):
        x = x[0][0]
    (y_pred, mean, log_var) = self.forward(y)

Should it be self.forward(x) or self.forward(y)? Does x here represent the 4 conditioning frames? If so, then y would be the frames to be predicted. Which should be used here: self.forward(y)?
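
For what it's worth, a hedged reading (an assumption, not confirmed by the author): in autoencoder pretraining the input and the target are typically the same frames, so reconstructing forward(x) against y amounts to self-reconstruction and forward(x) is consistent; forward(y) would only matter if the data pipeline returned distinct targets. A minimal illustration of that pattern:

    import torch

    # Illustration only (not the repository's data pipeline): for an
    # autoencoder the target y equals the input x, so either tensor
    # reconstructs the other.
    x = torch.randn(2, 1, 4, 32, 32)        # assumed (batch, channel, time, h, w) layout
    y = x.clone()                           # target == input in VAE pretraining
    y_pred = y + 0.1 * torch.randn_like(y)  # stand-in for the decoder output
    rec_loss = (y - y_pred).abs().mean()
    print(rec_loss.item())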

DGMR Model Download and Crop Size

To test DeepMind's DGMR, I tried running the Colab notebook (https://github.com/deepmind/deepmind-research/blob/master/nowcasting/Open_sourced_dataset_and_model_snapshot_for_precipitation_nowcasting.ipynb) and extracting the model from TF Hub, but it appears to be served from a GCP Storage bucket (which I presume requires gsutil, though I can't get hold of the endpoint).

For your own evaluations, could you explain how you went about downloading the DGMR model, and which crop size you used (e.g., 256x256)?

``eval_genforecast``: No output after "Sending batch ... "

Hi!
Really sorry to keep piling on the bug reports.
I tried the other scripts and they ran fine. For the ldcast model (with 1 GPU), the output gets stuck after "Sending batch ...". I have checked nvidia-smi and htop, but there seems to be no activity. Could there be an issue with the processes failing to join?

https://github.com/MeteoSwiss/ldcast/blob/b2829aeec135d9bd8ac8dd59f4830ed009b90eb5/scripts/eval_genforecast.py#LL90C5-L90C31
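
A hedged workaround to try (an assumption about the cause: worker processes forked after CUDA has been initialized can deadlock): force the 'spawn' start method before the evaluation spins up its worker processes:

    import torch.multiprocessing as mp

    if __name__ == "__main__":
        # Forked workers can hang after CUDA init; 'spawn' avoids that failure mode.
        mp.set_start_method("spawn", force=True)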

question: validation loss of the trained model

Hi @jleinonen,

in order to reproduce the results, it would be nice to have an idea of the val_loss_ema that corresponds to the weights included in the Zenodo repository.

In particular, I tried making predictions with a model that reached a val_loss_ema of about 0.00753, and the output does not show any great detail.

Thanks and regards,

Tomas

The parameter sample_shape is not used when fine-tuning the generative model with 256x256 pixel samples

Hi, I was running train_genforecast.py to fine-tune the model using 256x256 pixel samples. The model was initialized with the weights obtained from pre-training with 128x128 pixel samples.

I found that the parameter sample_shape of the train function in train_genforecast.py was not used when fine-tuning the model.

That is, sample_shape is still (4, 4) when building the datamodule for fine-tuning:

    datamodule = setup_data(
        future_timesteps=future_timesteps, use_obs=use_obs, use_nwp=use_nwp,
        sampler_file=sampler_file, batch_size=batch_size
    )

Would you mind checking this for me?
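
For what it's worth, a plausible fix (a sketch assuming setup_data accepts a sample_shape keyword, which I have not verified) would be to forward the parameter explicitly:

    # Hedged sketch inside train() in train_genforecast.py: pass the
    # fine-tuning resolution through to the datamodule.
    datamodule = setup_data(
        future_timesteps=future_timesteps, use_obs=use_obs, use_nwp=use_nwp,
        sampler_file=sampler_file, batch_size=batch_size,
        sample_shape=sample_shape,  # assumed keyword; previously left at the default (4, 4)
    )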

About the index file of train/valid/test

It's great and interesting work! Thanks for sharing the nice code!

I have downloaded ldcast-dataset.zip and unzipped it to the data directory, but an error was raised when I ran train_genforecast.py:

FileNotFoundError: [Errno 2] No such file or directory: '/public/home/ldcast/scripts/../data/split_chunks.pkl.gz'

Could you provide this file? In addition, some files (e.g., cache/sampler_autoenc_train.pkl, cache/sampler_autoenc_valid.pkl, cache/sampler_autoenc_test.pkl) were not found. Could you provide these files?

I found that the code supports using NWP data as extra input variables (very interesting). Do you have plans to release the NWP data?

Thanks a lot!
