
clouds_dist's People

Contributors

krisrs1128, mustafaghali, vict0rsch


Forkers

mustafaghali

clouds_dist's Issues

Loss on validation set

Right now, we are just looking at the training losses. This is not so bad for the GAN term, but is risky with the matching loss.

Assignment:

  • Let EarthData read an index file as input, specifying which samples are for training and which are for validation
  • Create a function to evaluate on the validation set during training
  • Make sure these metrics are also written to the logs
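A minimal sketch of reading such an index file (the file name, JSON layout, and split keys are assumptions here, not an agreed format):

```python
import json

def load_split(index_path, split):
    """Return the sample ids assigned to `split` ("train" or "val").

    Assumes a JSON index like {"train": ["id1", ...], "val": ["id9", ...]};
    EarthData could then keep only the paths whose ids are in the split.
    """
    with open(index_path) as f:
        index = json.load(f)
    return index[split]
```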

Don't truncate in plots

We now generate images in the range [-1, 1]. This gets truncated when plotted in numpy -- everything less than 0 becomes black. This means the images we see in Comet don't look close to what they really look like (they still look like Gaussian noise).

The fix should be easy... just rescale to [0, 1]

x --> .5 * (x + 1)
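For instance (numpy sketch), applied just before plotting:

```python
import numpy as np

def to_plot_range(x):
    """Map generator output from [-1, 1] into [0, 1] before imshow/logging."""
    return 0.5 * (x + 1)
```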

continue training

we need to set up a procedure to continue training.

This goes with the issue about standardizing output dirs, so that in the end we can just say something like --continue=run-i and the code goes to the right place, loads latest.pt, and resumes.
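A rough sketch of what the resume step could look like (the paths, checkpoint keys, and run-dir layout are assumptions pending the output-dir issue):

```python
import os
import torch

def resume_from(output_root, run_name, model, optimizer):
    """Load the most recent checkpoint (latest.pt) for a run, if any.

    Assumes checkpoints are saved as dicts with "model", "optimizer",
    and "epoch" keys under output_root/run_name/latest.pt.
    Returns the epoch to resume at (0 if starting from scratch).
    """
    ckpt_path = os.path.join(output_root, run_name, "latest.pt")
    if not os.path.exists(ckpt_path):
        return 0  # no checkpoint: start from scratch
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"] + 1
```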

Why is there a data loader in rescale

self.data_loader = torch.utils.data.DataLoader(

I don't quite follow @mustafaghali because in train.py we have

        transfs = []
        if self.opts.data.preprocessed_data_path is None and self.opts.data.with_stats:
            transfs += [
                Rescale(
                    data_path=self.opts.data.path,
                    batch_size=self.opts.train.batch_size,
                    num_workers=self.opts.data.num_workers,
                    verbose=1,
                )
            ]

        self.trainset = EarthData(
            self.opts.data.path,
            preprocessed_data_path=self.opts.data.preprocessed_data_path,
            load_limit=self.opts.data.load_limit or -1,
            transform=transforms.Compose(transfs),
        )

so why does rescale have a data_loader attribute?

Also note I deleted batchsize=n_in_mem and switched to opts.train.batch_size

staged training

  • train with a large lambda_L (matching loss) in the beginning, then
    balance against lambda_gan
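One possible schedule, as a sketch (the warmup length and weight values are placeholders, not agreed numbers):

```python
def matching_weight(epoch, warmup_epochs=50, lambda_l_start=10.0, lambda_l_end=1.0):
    """Large matching-loss weight early on, decaying linearly toward the
    balanced value over the warmup, then held constant."""
    if epoch >= warmup_epochs:
        return lambda_l_end
    t = epoch / warmup_epochs
    return (1 - t) * lambda_l_start + t * lambda_l_end
```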

learn residuals

Train per-pixel linear regressors (42 metos => 3 RGB), run inference on the data, then at train time subtract these predictions from the input to remove the least meaningful variations.

@krisrs1128 is that it?

Who's doing it?
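If I understand the proposal, a numpy sketch would be (shapes follow the issue; function names are made up):

```python
import numpy as np

def fit_pixel_regressors(metos, rgb):
    """Fit an independent linear map (n_metos -> 3) at every pixel.

    metos: (N, C, H, W), rgb: (N, 3, H, W). Returns weights (H, W, C, 3).
    Purely illustrative: loops over pixels, least-squares per location.
    """
    N, C, H, W = metos.shape
    weights = np.zeros((H, W, C, rgb.shape[1]))
    for i in range(H):
        for j in range(W):
            X = metos[:, :, i, j]  # (N, C)
            Y = rgb[:, :, i, j]    # (N, 3)
            weights[i, j], *_ = np.linalg.lstsq(X, Y, rcond=None)
    return weights

def residual_target(metos, rgb, weights):
    """Subtract the per-pixel linear prediction, leaving residuals."""
    pred = np.einsum("ncij,ijck->nkij", metos, weights)
    return rgb - pred
```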

squash channels

[0:9]   => U (wind component)
[10:19] => T
[20:29] => V (wind component)
[30:39] => H
[40]    => SL (scattering level)
[41]    => TS (surface temperature)
[42:43] => Lon, Lat

to

0 => av(U)
1 => av(T)
2 => av(H)
3 => SL
4 => TS
5 => Lon
6 => Lat
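A possible implementation of the squashing (the grouping is my reading of the issue; note the V block [20:29] is absent from the output list above, so it is dropped here too, but it may deserve its own averaged channel):

```python
import numpy as np

# Output channel order follows the issue: av(U), av(T), av(H), SL, TS, Lon, Lat
GROUPS = [
    slice(0, 10),   # av(U)
    slice(10, 20),  # av(T)
    slice(30, 40),  # av(H)
    slice(40, 41),  # SL
    slice(41, 42),  # TS
    slice(42, 43),  # Lon
    slice(43, 44),  # Lat
]

def squash_channels(x):
    """x: (44, H, W) metos array -> (7, H, W) squashed version."""
    return np.stack([x[sl].mean(axis=0) for sl in GROUPS])
```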

Analyse experiment runs

config/explore_gan_hyps.json

{
    "experiment": {
        "name": "explore-gan",
        "exp_dir": "$tmpv/clouds_runs/",
        "repeat": 20
    },
    "runs": [
        {
            "sbatch": {
                "runtime": "24:00:00",
                "message": "gan exploration",
                "conf_name": "gan_exp"
            },
            "config": {
                "model": {
                    "disc_size": 64,
                    "dropout": {
                        "sample": "range",
                        "from": [
                            0, 0.45, 0.05
                        ]
                    }
                },
                "train": {
                    "datapath": "/network/tmp1/schmidtv/clouds500",
                    "batch_size": 8,
                    "num_D_accumulations": 1,
                    "n_epochs": 500,
                    "with_stats": false,
                    "lr_d": {
                        "sample": "list",
                        "from": [
                            0.00001,
                            0.00005,
                            0.0001,
                            0.0005,
                            0.001,
                            0.005,
                            0.01
                        ]
                    },
                    "lr_g": {
                        "sample": "list",
                        "from": [
                            0.00001,
                            0.00005,
                            0.0001,
                            0.0005,
                            0.001,
                            0.005,
                            0.01
                        ]
                    },
                    "lambda_gan": {
                        "sample": "list",
                        "from": [
                            0.1,
                            1,
                            5,
                            10
                        ]
                    },
                    "lambda_L": {
                        "sample": "list",
                        "from": [
                            0,
                            0.1,
                            1,
                            5,
                            10
                        ]
                    },
                    "matching_loss": {
                        "sample": "list",
                        "from": [
                            "l1",
                            "l2",
                            "weighted"
                        ]
                    }
                }
            }
        }
    ]
}
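For reference, a sketch of how such "sample"/"from" entries could be resolved into concrete values (the interpretation of "range" as (start, stop, step) is an assumption about the launcher's semantics):

```python
import random

def sample_value(spec):
    """Resolve one config entry: plain values pass through,
    {"sample": "list"} picks one of the candidates,
    {"sample": "range"} picks from a (start, stop, step) grid."""
    if not isinstance(spec, dict) or "sample" not in spec:
        return spec  # plain value, e.g. "batch_size": 8
    if spec["sample"] == "list":
        return random.choice(spec["from"])
    if spec["sample"] == "range":
        start, stop, step = spec["from"]
        n = int(round((stop - start) / step))
        return start + step * random.randrange(n + 1)
    raise ValueError(f"unknown sampler: {spec['sample']}")
```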

Imputation

  • Change the name: RemoveNans --> ReplaceNans

  • Implement the transformation

    • Imgs: Replace nans with -1
    • Metos: Replace the nans with mean - 3std
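A sketch of the transform (the sample dict keys "imgs"/"metos" are assumptions about the dataset's output format, and the real class belongs with the project's other transforms):

```python
import numpy as np

class ReplaceNans:
    """Replace nans with -1 in images and with (mean - 3*std) per
    channel in metos, as proposed above."""

    def __call__(self, sample):
        imgs, metos = sample["imgs"], sample["metos"]
        imgs = np.where(np.isnan(imgs), -1.0, imgs)
        # per-channel stats over spatial dims, ignoring nans
        mean = np.nanmean(metos, axis=(1, 2), keepdims=True)
        std = np.nanstd(metos, axis=(1, 2), keepdims=True)
        metos = np.where(np.isnan(metos), mean - 3 * std, metos)
        return {"imgs": imgs, "metos": metos}
```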

Standardize output dir

We need to create an output directory, unique per run, with conf, comet, checkpoints, saved images (if need be some day) and so on.

  • agree on file structure

Like:

$SCRATCH/clouds
    data/
        imgs/
        metos/
    logs/
    outputs/
        run-i/
            comet.zip
            network.pt
            final_images/
            conf.json

Thoughts?
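A sketch of creating such a run directory with a fresh index (the layout follows the proposal above; nothing here is final):

```python
from pathlib import Path

def make_run_dir(output_root, i=None):
    """Create outputs/run-i/ with the proposed sub-structure.

    If i is None, pick the first unused index. conf.json, comet.zip
    and network.pt would then be written into the returned directory.
    """
    root = Path(output_root)
    if i is None:
        i = 0
        while (root / f"run-{i}").exists():
            i += 1
    run_dir = root / f"run-{i}"
    (run_dir / "final_images").mkdir(parents=True)
    return run_dir
```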

Pull Request Etiquette

Hey, as I've seen in other projects, a good software engineering practice is to put [WIP] at the beginning of a PR's title while it's a work in progress. That prevents unwanted merges.

For instance, @mustafaghali created a Quantization PR. But from our discussion, it was not what we had in mind. So he changed it, and now I don't know whether it should be merged or not.

So if a PR's not ready to be merged, add [WIP] in its title :)

Quantizing metos

Instead of linearly rescaling, quantize values across the image, per variable: percentiles.
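A sketch of what per-variable percentile quantization could look like (the bin count is a placeholder):

```python
import numpy as np

def quantize_metos(metos, n_bins=10):
    """Replace each variable's values by their percentile bin within the
    image, instead of a linear rescale. metos: (C, H, W) -> values in [0, 1]."""
    out = np.empty_like(metos, dtype=float)
    for c in range(metos.shape[0]):
        edges = np.percentile(metos[c], np.linspace(0, 100, n_bins + 1))
        # digitize against the inner edges, then map bin index back to [0, 1]
        bins = np.clip(np.digitize(metos[c], edges[1:-1]), 0, n_bins - 1)
        out[c] = bins / (n_bins - 1)
    return out
```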

Discriminator not learning

I ran an experiment trying to overfit 100 samples, over 100 epochs.

Here's the link to the comet exp.

Problem: the discriminator loss is constantly 0.5.

There must be a bug in the code; something's not right. I'm investigating issues that may be related to backward() or detach().

getstats is slow

can we do something about this? I've only got 500 images and it's taking minutes...
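One option: compute per-channel mean/std in a single streaming pass over batches instead of holding everything in memory. A sketch (not the project's actual getstats):

```python
import numpy as np

def streaming_stats(batches):
    """One-pass per-channel mean/std over an iterable of (B, C, H, W)
    arrays, accumulating sums and squared sums in float64."""
    n = 0
    s = s2 = 0.0
    for x in batches:
        x = np.asarray(x, dtype=np.float64)
        n += x.shape[0] * x.shape[2] * x.shape[3]
        s = s + x.sum(axis=(0, 2, 3))
        s2 = s2 + (x ** 2).sum(axis=(0, 2, 3))
    mean = s / n
    std = np.sqrt(s2 / n - mean ** 2)
    return mean, std
```

Combined with a DataLoader using several workers to read the files, this should bring the minutes down considerably.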

UNet throws an error for cropped input 216

The concise UNet implementation throws an error when trying to concatenate the upsampled output with the down features, for input cropped from 256 to 216.
It works, though, if I change the number of blocks to 3.
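This is likely because 216 = 8 x 27 is divisible by 2 only three times, so with four or more down-blocks the halved-then-upsampled feature maps no longer match the skip connections, while with 3 blocks 216 / 8 = 27 is exact. A quick check (assuming each block halves the spatial size once):

```python
def check_unet_input(size, n_blocks):
    """True iff `size` survives `n_blocks` halvings without rounding,
    so upsampled maps line up with the skip connections."""
    return size % (2 ** n_blocks) == 0
```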

data loading issue

/home/vsch/cloudenv/lib/python3.6/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vsch/clouds/src/train.py", line 210, in <module>
    result = trainer.run_trail()
  File "/home/vsch/clouds/src/train.py", line 87, in run_trail
    lambda_L1=1,
  File "/home/vsch/clouds/src/train.py", line 111, in train
    for i, (coords, real_img, metos_data) in enumerate(self.trainloader):
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 568, in __next__
    return self._process_next_batch(batch)
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/vsch/clouds/src/data.py", line 58, in __getitem__
    path = [s for s in self.paths[key] if self.ids[j] in s][0]
  File "/home/vsch/clouds/src/data.py", line 58, in <listcomp>
    path = [s for s in self.paths[key] if self.ids[j] in s][0]
IndexError: list index out of range
