krisrs1128 / clouds_dist
Simulation of low clouds from weather measurements.
Let's see the effect of the NaNs on training.
Right now, we are only looking at the training losses. This is not so bad for the GAN term, but it is risky with the matching loss.
Assignment:
We now generate images in the range [-1, 1]. This gets clipped when plotted with numpy: everything below 0 becomes black. This means the images we see in Comet don't look close to what they really are (they still look like Gaussian noise).
The fix should be easy... just rescale to [0, 1] with x -> 0.5 * (x + 1).
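The rescale is a one-liner; a minimal sketch (function name is mine, not project code):

```python
import numpy as np

def to_unit_range(x):
    """Map images from [-1, 1] to [0, 1] so plotting no longer
    clips negative values to black."""
    return 0.5 * (x + 1)

print(to_unit_range(np.array([-1.0, 0.0, 1.0])))  # [0.  0.5 1. ]
```

This would be applied just before logging images to Comet, leaving the training tensors in [-1, 1].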
Check that torch.manual_seed makes the dataloader and the initialization reproducible.
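A quick check of the kind being asked for, assuming `torch.manual_seed` is the call in question (a minimal sketch, not the project's actual test):

```python
import torch

def set_seed(seed):
    # Seeds the CPU and CUDA RNGs; DataLoader shuffling and weight init
    # follow this seed unless workers reseed themselves.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(123)
a = torch.randn(3)
set_seed(123)
b = torch.randn(3)
print(torch.equal(a, b))  # True: same seed, same draws
```

For multi-worker dataloaders a `worker_init_fn` (or a seeded `generator=`) would also be needed; the sketch above only covers the main process.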
We need to set up a procedure to continue training.
This goes with the issue about standardizing output dirs, so that in the end we can just say something like --continue=run-i and the code goes to the right place, loads latest.pt, and boom.
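A minimal sketch of what that resume step could look like, assuming a run directory containing a latest.pt with "model", "optimizer", and "epoch" keys (the layout and key names are my assumptions, not existing project code):

```python
import os
import torch

def resume_from(run_dir, model, optimizer):
    """Load the latest checkpoint from a run directory and restore
    model/optimizer state; returns the epoch to resume from.
    The latest.pt layout here is hypothetical."""
    ckpt = torch.load(os.path.join(run_dir, "latest.pt"), map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt.get("epoch", 0)
```

With standardized output dirs, `--continue=run-i` would just resolve to `resume_from(f"{exp_dir}/run-i", ...)`.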
For a given batch size of 32, computing the stats (means, mins, maxes) tensors with the GAN model gives a CUDA out-of-memory error; it works fine when I reduce the batch size to 26...
clouds_dist/src/preprocessing.py
Line 16 in f937ecb
I don't quite follow @mustafaghali, because in train.py we have:

```python
transfs = []
if self.opts.data.preprocessed_data_path is None and self.opts.data.with_stats:
    transfs += [
        Rescale(
            data_path=self.opts.data.path,
            batch_size=self.opts.train.batch_size,
            num_workers=self.opts.data.num_workers,
            verbose=1,
        )
    ]
self.trainset = EarthData(
    self.opts.data.path,
    preprocessed_data_path=self.opts.data.preprocessed_data_path,
    load_limit=self.opts.data.load_limit or -1,
    transform=transforms.Compose(transfs),
)
```

So why does Rescale have a data_loader attribute?
Also note that I deleted batchsize=n_in_mem and switched to opts.train.batch_size.
Train per-pixel linear regressors (42 metos => 3 RGB), then run inference on the data, and at train time subtract these predictions from the input, to remove the least meaningful variation.
@krisrs1128 is that it?
Who's doing it?
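If I understand the proposal correctly, it could be sketched with per-pixel least squares like this (numpy sketch; the function names and shapes are my assumptions, not existing project code):

```python
import numpy as np

def fit_pixel_regressors(metos, rgb):
    """Fit an independent linear regressor at every pixel, mapping meto
    channels to RGB via least squares.
    metos: (N, C, H, W), rgb: (N, 3, H, W) -> weights (H, W, C+1, 3)."""
    N, C, H, W = metos.shape
    X = np.concatenate([metos, np.ones((N, 1, H, W))], axis=1)  # bias term
    weights = np.empty((H, W, C + 1, 3))
    for i in range(H):
        for j in range(W):
            # solve (N, C+1) @ (C+1, 3) ~= (N, 3) at pixel (i, j)
            weights[i, j], *_ = np.linalg.lstsq(
                X[:, :, i, j], rgb[:, :, i, j], rcond=None
            )
    return weights

def predict(metos, weights):
    """Per-pixel linear predictions, to be subtracted from targets at train time."""
    N, C, H, W = metos.shape
    X = np.concatenate([metos, np.ones((N, 1, H, W))], axis=1)
    return np.einsum("nchw,hwck->nkhw", X, weights)
```

The residual `rgb - predict(metos, weights)` is then what the generator would be trained against.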
[0:9]   => U (wind components)
[10:19] => T
[20:29] => V
[30:39] => H
[40]    => scattering level
[41]    => TS (surface temperature)
[42:43] => Lon, Lat

2->11

to

0 => av(U)
1 => av(T)
2 => av(h)
3 => SL
4 => TS
5 => Lon
6 => Lat
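The reduction above amounts to collapsing channel ranges into averaged summary channels; a small helper could look like this (a sketch under my assumptions; the exact grouping should follow the mapping listed above):

```python
import numpy as np

def average_groups(x, groups):
    """Collapse channel ranges into single averaged channels.
    x: (C, H, W); groups: list of (start, stop) pairs, stop exclusive.
    e.g. [(0, 10), (10, 20), ...] for the U and T blocks above."""
    return np.stack([x[a:b].mean(axis=0) for a, b in groups])
```

Singleton channels like SL or TS would just be passed as `(40, 41)`-style ranges.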
config/explore_gan_hyps.json

```json
{
  "experiment": {
    "name": "explore-gan",
    "exp_dir": "$tmpv/clouds_runs/",
    "repeat": 20
  },
  "runs": [
    {
      "sbatch": {
        "runtime": "24:00:00",
        "message": "gan exploration",
        "conf_name": "gan_exp"
      },
      "config": {
        "model": {
          "disc_size": 64,
          "dropout": {"sample": "range", "from": [0, 0.45, 0.05]}
        },
        "train": {
          "datapath": "/network/tmp1/schmidtv/clouds500",
          "batch_size": 8,
          "num_D_accumulations": 1,
          "n_epochs": 500,
          "with_stats": false,
          "lr_d": {"sample": "list", "from": [0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01]},
          "lr_g": {"sample": "list", "from": [0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01]},
          "lambda_gan": {"sample": "list", "from": [0.1, 1, 5, 10]},
          "lambda_L": {"sample": "list", "from": [0, 0.1, 1, 5, 10]},
          "matching_loss": {"sample": "list", "from": ["l1", "l2", "weighted"]}
        }
      }
    }
  ]
}
```
The range of values in the input should be narrow.
The range of target values should be [-1, 1].
Count model parameters.
Feature activations within the UNet.
I am not sure if this happens only on my side?
E.g. in explore.yaml, runtime: 24:00:00, while in parallel_run.py, spb["runtime"] = 86400.
Change the name: RemoveNans -> ReplaceNans.
Implement the transformation: replace NaNs with mean - 3 * std.
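A sketch of what the proposed ReplaceNans transform could do, assuming the mean and std are taken per channel over the valid (non-NaN) values:

```python
import numpy as np

def replace_nans(x):
    """Replace NaNs per channel with mean - 3*std of the valid values.
    x: (C, H, W) array; returns a modified copy."""
    out = x.copy()
    for c in range(out.shape[0]):
        channel = out[c]
        mask = np.isnan(channel)
        if mask.any():
            valid = channel[~mask]
            channel[mask] = valid.mean() - 3 * valid.std()
    return out
```

Using mean - 3*std pushes the filled pixels to the low end of the channel's distribution instead of its center, so they read as "missing" rather than "average".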
Investigating, I got this: `batch["real_imgs"].sum()`
We need to create an output directory, unique per run, with conf, comet, checkpoints, saved images (if need be some day) and so on.
Like:
$SCRATCH/clouds
    data/
        imgs/
        metos/
    logs/
    outputs/
        run-i/
            comet.zip
            network.pt
            final_images/
            conf.json
Thoughts?
Hey, as I've seen in other projects, a good software-engineering practice is to put [WIP] at the beginning of a PR's title when it's a work in progress. That prevents unwanted merges.
For instance, @mustafaghali created a Quantization PR. But from our discussion, it wasn't what we had in mind. So he changed it. And now I don't know whether it should be merged or not.
So if a PR's not ready to be merged, add [WIP] to its title :)
Initialize the early layers from a pretrained, overfitted model.
Instead of linearly rescaling, quantize the values across each image, per variable: percentiles.
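A sketch of that per-variable percentile quantization (the function and binning scheme are my assumptions):

```python
import numpy as np

def quantize_percentiles(x, n_bins=10):
    """Replace each value by its (binned) percentile rank within the image,
    per variable (channel), instead of linearly rescaling.
    x: (C, H, W) -> values in [0, 1]. Ties get distinct ranks here,
    which is fine for a sketch."""
    out = np.empty_like(x, dtype=float)
    for c in range(x.shape[0]):
        flat = x[c].ravel()
        # double argsort gives each pixel its rank; normalize to [0, 1]
        ranks = flat.argsort().argsort() / max(len(flat) - 1, 1)
        out[c] = (np.floor(ranks * n_bins) / n_bins).reshape(x[c].shape)
    return out
```

Unlike a linear rescale, this is insensitive to the heavy-tailed outliers in the meto channels: each channel ends up uniformly spread over [0, 1].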
There should be an option in the config file to increase the size of the generator.
Add them to process sample.
Remove the zeros at the borders of the images.
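Trimming the zero borders could look like this (a numpy sketch; assumes the border rows/columns are exactly zero across all channels):

```python
import numpy as np

def crop_zero_borders(img):
    """Trim all-zero rows and columns from the borders of a (C, H, W) image."""
    nonzero = img.any(axis=0)                     # (H, W) mask of non-zero pixels
    rows = np.flatnonzero(nonzero.any(axis=1))    # rows with any signal
    cols = np.flatnonzero(nonzero.any(axis=0))    # cols with any signal
    return img[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```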
I ran an experiment trying to overfit 100 samples over 100 epochs.
Here's the link to the comet exp.
Problem: the discriminator loss is constantly 0.5.
There must be a bug in the code; something's not right. I'm trying to investigate issues that may be related to backward() or detach().
Can we do something about this? I've only got 500 images and it's taking minutes...
Add EG optimizer
Write a script to generate the point cloud of generated value vs. target value, per channel.
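That script could be sketched like this, assuming (N, C, H, W) arrays of generated and target images (the function names and the matplotlib layout are my assumptions):

```python
import numpy as np

def channel_point_clouds(generated, target):
    """Flatten (N, C, H, W) arrays into per-channel (generated, target)
    value pairs, ready for scatter plotting."""
    n_channels = generated.shape[1]
    return [
        (generated[:, c].ravel(), target[:, c].ravel())
        for c in range(n_channels)
    ]

def plot_point_clouds(generated, target, path="gen_vs_target.png"):
    import matplotlib
    matplotlib.use("Agg")  # headless rendering for cluster jobs
    import matplotlib.pyplot as plt

    clouds = channel_point_clouds(generated, target)
    fig, axes = plt.subplots(1, len(clouds), squeeze=False,
                             figsize=(4 * len(clouds), 4))
    for c, (g, t) in enumerate(clouds):
        ax = axes[0][c]
        ax.scatter(t, g, s=1, alpha=0.3)
        ax.set_xlabel("target value")
        ax.set_ylabel("generated value")
        ax.set_title(f"channel {c}")
    fig.savefig(path)
```

A well-trained model would show the point clouds hugging the diagonal; a collapsed generator shows horizontal bands.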
The concise UNet implementation throws an error when trying to concatenate the upsampled output with the down features, for input cropped from 256 to 216.
It works, though, if I change the number of blocks to 3: 216 is divisible by 2 only three times (216 -> 108 -> 54 -> 27), so a deeper UNet produces odd intermediate sizes that the skip connections can't match.
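A tiny helper makes the constraint explicit (a sketch; each down block is assumed to halve the spatial size exactly):

```python
def max_unet_depth(size):
    """Number of times a spatial size can be halved exactly. A UNet with
    more down blocks than this hits shape mismatches at the skip concats."""
    depth = 0
    while size % 2 == 0:
        size //= 2
        depth += 1
    return depth

print(max_unet_depth(256))  # 8
print(max_unet_depth(216))  # 3 -> only 3 blocks work for 216x216 inputs
```

So either pad/crop inputs to a multiple of 2^depth, or cap the number of blocks at `max_unet_depth(input_size)`.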
```
/home/vsch/cloudenv/lib/python3.6/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/vsch/clouds/src/train.py", line 210, in <module>
    result = trainer.run_trail()
  File "/home/vsch/clouds/src/train.py", line 87, in run_trail
    lambda_L1=1,
  File "/home/vsch/clouds/src/train.py", line 111, in train
    for i, (coords, real_img, metos_data) in enumerate(self.trainloader):
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 568, in __next__
    return self._process_next_batch(batch)
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/vsch/cloudenv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/vsch/clouds/src/data.py", line 58, in __getitem__
    path = [s for s in self.paths[key] if self.ids[j] in s][0]
  File "/home/vsch/clouds/src/data.py", line 58, in <listcomp>
    path = [s for s in self.paths[key] if self.ids[j] in s][0]
IndexError: list index out of range
```
Continue the work started in session 10/17
see branch run-hyperparams-exploration