ssnl / dataset-distillation
Open-source code for the paper "Dataset Distillation"
Home Page: https://ssnl.github.io/dataset_distillation
License: MIT License
Hello!
We are trying to use dataset distillation with a .gz file (similar to those that can be downloaded from the MNIST dataset). We've been looking through your dataset distillation code but we've been unable to find out where we could edit the code to pull data from our .gz file instead of from the MNIST dataset.
Could you please let me know in which file/line we could edit your dataset distillation code to pull data from the .gz file?
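For reference, this is roughly how we are reading the file right now, outside of your code (a minimal sketch, assuming the .gz files use the same IDX layout as the raw MNIST downloads; the file names are placeholders):
import gzip
import numpy as np
import torch
from torch.utils.data import TensorDataset

def load_idx_gz(images_path, labels_path):
    # IDX format: 16-byte header for the image file, 8-byte header for the label file
    with gzip.open(images_path, 'rb') as f:
        pixels = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
        images = pixels.reshape(-1, 28, 28).astype(np.float32) / 255.0
    with gzip.open(labels_path, 'rb') as f:
        labels = np.frombuffer(f.read(), dtype=np.uint8, offset=8).copy()
    return TensorDataset(torch.from_numpy(images).unsqueeze(1),
                         torch.from_numpy(labels).long())

dataset = load_idx_gz('train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz')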
Thank you in advance!
Hello, Dr. Wang.
I have a question about dataset distillation. For a single image x^1 in the synthetic distilled training set, the loss function L can already be very small, or even equal to 0. If we only minimize the objective function, we would then obtain a distilled dataset containing only x^1; as a result, the distilled dataset would have only one image.
In other words, how do you control the size of the distilled dataset?
Thank you very much.
Hello!
I am reading the source code, specifically the class Trainer in train_distilled_image.py. I have two questions regarding your implementation of optimizing the distilled data:
When computing the gradient of the final L w.r.t. w in params, the paper and the code comments say that w is the PRE-GD weight. But you are actually using the POST-GD weights in lines 156-160 of train_distilled_image.py, since params stores the original model weights plus the model weights after the GD update at every step. In the loop (line 143), w therefore corresponds to the POST-GD model weights, not the PRE-GD ones. To verify my guess, I checked that len(params)=31 and len(gws)=30 while running
python main.py --mode distill_basic --dataset MNIST --arch LeNet
That means the loop first retrieves the model weights updated in the final step. I guess simply discarding the post-GD model weights of the final step would fix this.
In line 172, you use dw.add_(hvp_grad[0]) to update dw, which seems odd, because the gradient through different steps should not accumulate by simple addition. If dw denotes the gradient of the final L w.r.t. the updated w at each step, I wonder whether dw = hvp_grad[0] is the correct update. In my understanding, the un-updated model weights at the current step are the updated model weights from the previous step, which makes hvp_grad[0] itself the gradient of the final L w.r.t. the updated w at each step.
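To make sure I am describing the same computation, here is a toy standalone sketch (not your code, just my mental model) of backpropagating through one unrolled GD step; the gradient w.r.t. the distilled data goes through a Hessian-vector product, which is what I assume hvp_grad corresponds to:
import torch

torch.manual_seed(0)
w0 = torch.randn(5, requires_grad=True)    # initial weights of a toy linear model (PRE-GD)
x = torch.randn(3, 5, requires_grad=True)  # distilled inputs
y = torch.randn(3)                         # distilled targets
lr = 0.1

inner_loss = ((x @ w0 - y) ** 2).mean()    # training loss on the distilled data
g = torch.autograd.grad(inner_loss, w0, create_graph=True)[0]
w1 = w0 - lr * g                           # one unrolled GD step (POST-GD weights)

x_val, y_val = torch.randn(20, 5), torch.randn(20)
final_loss = ((x_val @ w1 - y_val) ** 2).mean()

# The gradient of the final loss w.r.t. the distilled data flows through w1,
# i.e. through the second-order term d/dx [d inner_loss / d w0], a Hessian-vector product.
dx = torch.autograd.grad(final_loss, x)[0]
print(dx.shape)  # torch.Size([3, 5])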
Anyway, many thanks for your interesting work!
Hi, I came across your paper a few weeks ago.
I have a dataset that constantly grows, like every couple of weeks. The growth is both in terms of more examples of a set of known classes as well as new classes being added.
Is it possible to use this method to keep a reduced dataset of the old images?
For example: I have 10k images that I want to distill into 100. Then I get a new batch of 200 images.
How would I retrain a model "from scratch" using this combination of distilled and raw images?
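Concretely, something like the sketch below is what I have in mind (made-up tensor shapes, just to illustrate combining the distilled and raw images into one training set):
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# 100 distilled images standing in for the 10k old images, plus 200 new raw images
distilled = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
new_raw = TensorDataset(torch.randn(200, 3, 32, 32), torch.randint(0, 12, (200,)))

combined = ConcatDataset([distilled, new_raw])
loader = DataLoader(combined, batch_size=32, shuffle=True)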
I'm a grad student focusing on HPC, so I'm sorry if these questions are silly. But I would greatly appreciate any feedback, thank you!
When I run the code
python main.py --mode distill_basic --dataset MNIST --arch LeNet
I get this error:
Traceback (most recent call last):
File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 5, in <module>
from torch.distributed import ReduceOp
ImportError: cannot import name 'ReduceOp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 11, in <module>
from base_options import options
File "F:\dataset_dis\dataset-distillation-master\base_options.py", line 6, in <module>
import utils
File "F:\dataset_dis\dataset-distillation-master\utils\__init__.py", line 3, in <module>
from . import distributed
File "F:\dataset_dis\dataset-distillation-master\utils\distributed.py", line 7, in <module>
from torch.distributed import reduce_op
ImportError: cannot import name 'reduce_op'
Is this because of the PyTorch version, or something else?
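In case it helps, a quick diagnostic I ran (just a sketch): on some Windows builds of older PyTorch the distributed package is not compiled in at all, in which case neither ReduceOp nor reduce_op can be imported.
import torch
import torch.distributed as dist

print(torch.__version__)
print(dist.is_available())  # False when the build has no distributed support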
Thanks for your great work.
I have a question regarding the code in the repo dataset-distillation.
If I understood correctly, after distilling the images, we can train AlexCifarNet with the distilled Cifar10 data and then test the trained model on the original Cifar10 test data.
I have gone over the code; however, I couldn't find the snippet that trains on the distilled images. If the training process is actually present in the code, could you please note down the command for training on the distilled Cifar10 data after the distillation?
Looking forward to your reply and hope you have an amazing day!
Thank you and best regards,
Dai
state.local_n_nets is used without being defined. What does it mean?
If train_nets_type=='known_init', why do we want to build several networks with different initializations?
Thanks a lot!
Sorry for bothering you, I have some questions after reading the paper.
I ran main.py and it works well; my questions are:
The first question:
Do the distillation model and the test model have to be the same?
For example, if I obtained the distilled images using LeNet, can I train AlexNet on these images?
The second question:
Does the number of distilled images have to be equal to the number of classes?
For example, if I want to distill the MNIST dataset, can I distill it into 20 images?
(I saw that distillation_lables equals num_class.)
Hi, thank you for your great work,
I want to try this on a custom dataset; do you have any suggestions?
Thanks
Thanks for your great work. I would like to know how to get the distilled images for the MNIST data.
It seems that you fix the label associated with each image when producing a weight update with it.
This, I presume, helps the images you learn be specialised to single classes. E.g. the MNIST digits you produce (for random initialisation) are clearly distinct digits.
However, the labels are also a vital part of the dataset.
Did you consider randomly initialising them and backpropagating the training signal to learn them too?
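A toy sketch of what I mean (standalone, not your code): treat the labels as learnable logits and optimise them jointly with the distilled images.
import torch
import torch.nn.functional as F

num_images, num_classes = 10, 10
distilled_x = torch.randn(num_images, 1, 28, 28, requires_grad=True)
label_logits = torch.zeros(num_images, num_classes, requires_grad=True)  # learnable labels
optimizer = torch.optim.Adam([distilled_x, label_logits], lr=1e-3)  # would update both tensors

def inner_loss(model):
    # cross-entropy against soft targets derived from the learnable logits
    soft_targets = F.softmax(label_logits, dim=1)
    log_probs = F.log_softmax(model(distilled_x), dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()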
Did you try comparing the results of the distilled data to those of models trained on randomly selected samples of the dataset?
For example, if I randomly select 10 images from the MNIST dataset (1 for each category) and train the network on them, how would the results be? I think it's a fundamental thing to compare with.
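Concretely, the baseline I have in mind is something like this sketch:
import random
import torch
from torchvision import datasets, transforms

train_set = datasets.MNIST('./data', train=True, download=True,
                           transform=transforms.ToTensor())

indices = list(range(len(train_set)))
random.shuffle(indices)
per_class = {}
for idx in indices:
    img, label = train_set[idx]
    if label not in per_class:  # keep the first randomly drawn image of each digit
        per_class[label] = img
    if len(per_class) == 10:
        break

baseline_x = torch.stack([per_class[c] for c in range(10)])  # 10 images, one per class
baseline_y = torch.arange(10)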
Very interesting work by the way!
The dataset is MNIST. Using random unknown initialization, I learned 100 synthetic images; then I used those 100 synthetic images to train a randomly initialized LeNet, and the accuracy only reaches 12%.
Maybe my code is wrong; could you release the code for this?
I am getting outputs that look completely random when I try to run distillation on a subset of ImageNet with an XResNet18 model.
I have only tried one set of command line args and was wondering whether you had any intuition for what I might obviously be doing wrong or had tried this before.
My command is:
python main.py --mode distill_basic --dataset Imagenette --arch DXResNet18 --batch_size 64 \
--distill_steps 3 --train_nets_type known_init --n_nets 1 \
--test_nets_type same_as_train
I made my own DXResNet18 class and Imagenette dataloader.
Thanks in advance!
After running the function to extract the k-means centroids as baselines, I saved some of these centroids as PNG images and noticed that some of them look like noise, which might suggest the presence of a bug; however, I haven't gone through the code myself.
Here are a few of the centroids generated for MNIST class 3:
Any idea of what might be the cause? If it is an actual bug I guess this would impact the values presented in the paper.
For reference, here is the code I used:
import torchvision
import dataset_distillation.utils.baselines

data = dataset_distillation.utils.baselines.kmeans_train(state, p=2)
imgs, labels = data[-1]  # use the last step
for i, img in enumerate(imgs):
    torchvision.utils.save_image(img, f"{i}.png", nrow=1, padding=0)
Thank you.
Hello!
I'm wondering what the correct way is to get distilled images and test performance on them, as well as to check performance on a normal dataset after training on the distilled images. I'm confused since there are many parameters, and I've already read the advanced docs. So, to get distilled data, for example on Cifar10, I need to run
python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet --distill_lr 0.001
Then the distilled images are in the file result.pth.
So, to train the network on the usual full dataset I need to set --mode train, and if I want to test network performance after training on the distilled data I need to set --mode train --phase test?
Or, in other words, how do I get results like those in your paper, where you report 80% when fully trained versus 54% with distilled data on CIFAR10?
Looking forward to your response! Thanks!
Thank you for your nice work.
Could you please provide an example of how to use it to distill a custom dataset with a user-defined network structure?
When I run the unknown initialization experiments on MNIST:
python main.py --mode distill_basic --dataset MNIST --arch LeNet
I get the following after a few dozen epochs:
Traceback (most recent call last):
File "main.py", line 402, in <module>
main(options.get_state())
File "main.py", line 130, in main
steps = train_distilled_image.distill(state, state.models)
File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 296, in distill
return Trainer(state, models).train()
File "/home/isucholu/original/dataset-distillation/train_distilled_image.py", line 283, in train
raise RuntimeError('loss became NaN')
RuntimeError: loss became NaN
Was the gradient fairly stable when you were running this for the paper? Do I just need to make a few more attempts?
Regarding some of the default arguments set in base_options.py:
Shouldn't the default value be one of the allowed options? An error is thrown when using the value charge.
As indicated by both the paper (section S-1) and the help argument, the default value should be 0.02, which is not the case. Were the experiments performed with 0.02 or 0.001?
Hello.
While I was reading your code, I encountered an unresolved reference at line 40 of base_options.py:
self.opt = UniqueNamespace()
How can I solve this? Thank you!
Thank you so much for sharing the code. I got this error when I run
python main.py --mode distill_basic --dataset Cifar10 --arch AlexCifarNet --distill_lr 0.001
torch version: 1.4.0
torchvision: 0.5.0
I am wondering whether you have any hints about what's wrong here. Thank you so much in advance.
2020-03-25 14:01:27 [ERROR] Fatal error:
2020-03-25 14:01:27 [ERROR] Traceback (most recent call last):
2020-03-25 14:01:27 [ERROR] File "main.py", line 402, in <module>
2020-03-25 14:01:27 [ERROR] main(options.get_state())
2020-03-25 14:01:27 [ERROR] File "main.py", line 131, in main
2020-03-25 14:01:27 [ERROR] steps = train_distilled_image.distill(state, state.models)
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 290, in distill
2020-03-25 14:01:27 [ERROR] return Trainer(state, models).train()
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/PycharmProjects/dataset-distillation/train_distilled_image.py", line 221, in train
2020-03-25 14:01:27 [ERROR] evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 288, in evaluate_steps
2020-03-25 14:01:27 [ERROR] res = _evaluate_steps(test_nets_desc, reset=(state.test_nets_type == 'unknown_init'))
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 276, in _evaluate_steps
2020-03-25 14:01:27 [ERROR] params = train_steps_inplace(state, models, steps, params, callback=test_callback)
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/PycharmProjects/dataset-distillation/basics.py", line 75, in train_steps_inplace
2020-03-25 14:01:27 [ERROR] loss.backward(lr)
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/tensor.py", line 195, in backward
2020-03-25 14:01:27 [ERROR] torch.autograd.backward(self, gradient, retain_graph, create_graph)
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 93, in backward
2020-03-25 14:01:27 [ERROR] grad_tensors = _make_grads(tensors, grad_tensors)
2020-03-25 14:01:27 [ERROR] File "/home/zhedamai/anaconda3/envs/torchnew/lib/python3.8/site-packages/torch/autograd/__init__.py", line 25, in _make_grads
2020-03-25 14:01:27 [ERROR] raise RuntimeError("Mismatch in shape: grad_output["
2020-03-25 14:01:27 [ERROR] RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).
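In case it helps pinpoint the issue, here is a minimal standalone repro of the shape rule that I believe is at play (my assumption, not the repo's code): on recent PyTorch the gradient passed to backward() must match the output's shape exactly, so a shape-[1] lr tensor against a 0-dim loss fails.
import torch

x = torch.ones(3, requires_grad=True)
loss = x.sum()               # 0-dim tensor
lr = torch.tensor([0.02])    # shape [1]

# loss.backward(lr)          # raises: Mismatch in shape: grad_output[0] ...
loss.backward(lr.squeeze())  # a 0-dim gradient matches the 0-dim output
print(x.grad)                # tensor([0.0200, 0.0200, 0.0200])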
Hi, thanks for your work.
Have you tested whether your method is effective for general structured data other than images?
Hi, I am trying to use VGG to distill the images, but the gradients are too large to run the program: it takes 38GB of GPU memory to distill 10 images for Cifar10. Note that I use just one model for the distillation, so the method in advanced.md doesn't work in this situation. Could you provide some solutions for that? Many thanks!
Best,
Yugeng
Hello, when I use ResNet18 with a pretrained model from PyTorch, it shows this warning:
[WARNING] BatchNorm2d contains buffer running_var. The buffer will be treated as a constant and assumed not to change during gradient steps. If this assumption is violated (e.g., BatchNorm*d's running_mean/var), the computation will be incorrect.
I am not sure if it will influence the results.
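One sanity check I tried (an assumption on my part, not a confirmed recommendation): put every BatchNorm layer in eval mode so the running statistics are used but never updated, which matches the "treated as a constant" assumption from the warning. Any later call to model.train() would undo this, so it has to be re-applied.
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(pretrained=True)

def freeze_batchnorm(module):
    # eval mode makes BatchNorm normalize with its running stats and stop updating them
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        module.eval()

model.apply(freeze_batchnorm)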
Many thanks
I see the visuals_step008.png type files being generated, but where are the individual distilled images?
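What I ended up trying is the sketch below; it assumes (possibly wrongly) that results.pth stores a list of per-step (images, labels, lr) entries, so please correct me if the layout is different.
import torch
import torchvision

steps = torch.load('results.pth', map_location='cpu')
for step_idx, (imgs, labels, lr) in enumerate(steps):
    for i, (img, label) in enumerate(zip(imgs, labels)):
        torchvision.utils.save_image(
            img, f'step{step_idx:03d}_class{int(label)}_{i:02d}.png')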
Thanks in Advance!
To compute the baseline for non-optimized random real images, I'm using the following command:
python main.py --dataset MNIST --arch LeNet \
--distilled_images_per_class_per_step 10 \
--phase test \
--test_nets_type unknown_init \
--test_distilled_images random_train \
--test_n_nets 200 \
--test_n_runs 10
however, I get the following error:
Traceback (most recent call last):
File "main.py", line 402, in <module>
main(options.get_state())
File "main.py", line 359, in main
test_runner = TestRunner(state)
UnboundLocalError: local variable 'TestRunner' referenced before assignment
After taking a look, this seems to happen because test_optimize_n_runs is not set even though it's supposed to be optional.
My guess is that https://github.com/SsnL/dataset-distillation/blob/master/main.py#L324-L355 should be outside of its containing else block. Is that it, or am I missing something?
Thank you.
Hey, I am very interested in this work and have some questions to ask.
I used 20 images per class for MNIST dataset distillation by running
python main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train
and achieved 96.54% testing accuracy.
But when I use these distilled images as training data to retrain, with minibatch SGD, a model with the same initialization as used in the distillation step, the testing accuracy drops to 62% and overfitting occurs. My questions are:
(1) Is it just because of the different way of optimization?
(2) Why does optimizing the network your way avoid overfitting even with only 1 sample per class in MNIST dataset distillation?
(3) How can I use the distilled images to retrain a good model with a normal training procedure such as minibatch SGD?
Hello!
I've noticed your warning
logging.warn(('{} contains buffer {}. The buffer will be treated as '
'a constant and assumed not to change during gradient '
'steps. If this assumption is violated (e.g., '
'BatchNorm*d\'s running_mean/var), the computation will '
'be incorrect.').format(m.__class__.__name__, n))
May I ask how you keep the buffers fixed during gradient steps (e.g. the running mean and running var in batch norm)? In this code there are only LeNet and AlexNet, so this isn't a problem here, but I wonder whether you have done experiments on networks with batch norm.
Thanks a lot!
I am very sad to learn that your paper was rejected by ICLR. I believe your research is very useful in many areas, especially security and privacy. Good luck with your next submission.
Hello!
Am I right that the maximum number of classes on which you tested the distillation algorithm is 200 (CUB200)? Which GPU did you use?
I'm trying to run the code for more than 10 classes, and my GPU runs out of memory even for 15 classes. But it's a Tesla V100, and I can't reproduce the results for CUB200. Or did you parallelize the algorithm somehow?
Hello!
I have a question about the back-gradient optimization technique. Your paper mentions this article, but reading the source code train_distilled_image.py, I've noticed that you couldn't use SGD with momentum (because of the influence of previous learning rates) and so had to save the network parameters at each forward step. So what is the advantage of your scheme over usual backpropagation?
Hi, thank you for sharing your interesting work.
I'm puzzled about how to train a network with the distilled data. Do I just set --mode to 'train' and keep the other options unchanged? For example:
python main.py --mode train --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train
I have downloaded your paper and read it. I think it is a very interesting idea and could help our current research. However, one question: can dataset distillation achieve accuracy on other datasets as high as it does on MNIST?
Hello!
Very interesting paper, thanks!
I have a question about results.pth. It's a file with the distilled images (tensors), but what is its structure? For example, for MNIST its len is 30; is that because we have 3 steps? And the labels are just a tensor from 0 to 9, so the order is important, e.g. the first 3 tensors are class 0, right?
And at the testing phase, do we use the pretrained model that we got after distill_basic?
Thanks!
Calling yaml.load(input) without a Loader was deprecated in PyYAML 5.1 and removed in 6.0+ because of a CVE about arbitrary code execution.
The code in base_options.py, in the functions get_dummy_state() and set_state(), uses yaml.load() this way and is broken with PyYAML versions 6.0+.
Instead, yaml.full_load(input) or yaml.load(input, Loader=yaml.FullLoader) can be used.
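For concreteness, a sketch of the change in base_options.py's get_dummy_state() and set_state() (the 'opt.yaml' path here is just a placeholder):
import yaml

with open('opt.yaml') as f:
    old_yaml = yaml.full_load(f)  # this is a dict
    # or equivalently: old_yaml = yaml.load(f, Loader=yaml.FullLoader)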
Hello,
when I run the following command:
python main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train
I get the following warnings:
/home/claudio.greco/dataset-distillation/base_options.py:423: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
old_yaml = yaml.load(f) # this is a dict
2019-09-12 16:18:31 [WARNING] ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/opt.yaml already exists, moved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/old_opts/opt_2019_09_12__16_13_40.yaml
2019-09-12 16:18:31 [INFO ] train dataset size: 60000
2019-09-12 16:18:31 [INFO ] test dataset size: 10000
2019-09-12 16:18:31 [INFO ] datasets built!
2019-09-12 16:18:31 [INFO ] mode: distill_basic, phase: train
2019-09-12 16:18:31 [INFO ] Build 1 LeNet network(s) with [xavier(1.0)] init
2019-09-12 16:18:37 [INFO ] Train 1 steps iterated for 3 epochs
/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:82: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
2019-09-12 16:18:37 [INFO ] Results saved to ./results/distill_basic/MNIST/arch(LeNet,xavier,1.0)_distillLR0.02_E(400,40,0.5)_lr0.01_B1x1x3_train(known_init)/checkpoints/epoch0000/results.pth
2019-09-12 16:18:37 [INFO ]
2019-09-12 16:18:37 [INFO ] Begin of epoch 0 :
Begin of epoch 0 (1 same_as_train nets): 100%|####################################################################################################| 2/2 [00:00<00:00, 3.36it/s]
--- Logging error ---
Traceback (most recent call last):
File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 15, in emit
tqdm.tqdm.write(msg)
File "/home/claudio.greco/dataset-distillation/.venv/lib/python3.6/site-packages/tqdm/_tqdm.py", line 555, in write
fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
File "main.py", line 402, in <module>
main(options.get_state())
File "main.py", line 130, in main
steps = train_distilled_image.distill(state, state.models)
File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
return Trainer(state, models).train()
File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
logging.info(format_stepwise_results(state, steps, result_title, res))
File "/usr/lib64/python3.6/logging/__init__.py", line 1902, in info
root.info(msg, *args, **kwargs)
File "/usr/lib64/python3.6/logging/__init__.py", line 1308, in info
self._log(INFO, msg, args, **kwargs)
File "/usr/lib64/python3.6/logging/__init__.py", line 1444, in _log
self.handle(record)
File "/usr/lib64/python3.6/logging/__init__.py", line 1454, in handle
self.callHandlers(record)
File "/usr/lib64/python3.6/logging/__init__.py", line 1516, in callHandlers
hdlr.handle(record)
File "/usr/lib64/python3.6/logging/__init__.py", line 865, in handle
self.emit(record)
File "/home/claudio.greco/dataset-distillation/utils/logging.py", line 20, in emit
self.handleError(record)
Message: 'Begin of epoch 0 (1 same_as_train nets) test results:\n\t STEP ACCURACY LOSS \n\t before steps 7.9102 \xb1 nan% 2.4235 \xb1 nan\n\t step 3 (lr=0.0200) 6.7383 \xb1 nan% 2.3925 \xb1 nan'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb1' in position 262: ordinal not in range(128)
Call stack:
File "main.py", line 402, in <module>
main(options.get_state())
File "main.py", line 130, in main
steps = train_distilled_image.distill(state, state.models)
File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 296, in distill
return Trainer(state, models).train()
File "/home/claudio.greco/dataset-distillation/train_distilled_image.py", line 228, in train
evaluate_steps(state, steps, 'Begin of epoch {}'.format(epoch))
File "/home/claudio.greco/dataset-distillation/basics.py", line 300, in evaluate_steps
logging.info(format_stepwise_results(state, steps, result_title, res))
Message: 'Begin of epoch 0 (1 same_as_train nets) test results:\n\t STEP ACCURACY LOSS \n\t before steps 7.9102 \xb1 nan% 2.4235 \xb1 nan\n\t step 3 (lr=0.0200) 6.7383 \xb1 nan% 2.3925 \xb1 nan'
Arguments: ()
2019-09-12 16:18:38 [INFO ]
2019-09-12 16:18:38 [INFO ] Epoch: 0 [ 0/ 60000 ( 0%)] Loss: 2.3755 Data Time: 0.44s Train Time: 0.07s
2019-09-12 16:18:40 [INFO ] Epoch: 1 [ 0/ 60000 ( 0%)] Loss: 2.2400 Data Time: 0.12s Train Time: 0.03s
2019-09-12 16:18:41 [INFO ] Epoch: 2 [ 0/ 60000 ( 0%)] Loss: 1.7438 Data Time: 0.13s Train Time: 0.03s
The logging error makes it impossible for me to use this script, because I cannot see accuracy, loss, etc. I also don't know whether that error is related to the warning about the order of calling `optimizer.step()` and `lr_scheduler.step()`. (Maybe NaN values are generated which cannot be properly encoded by the logger?)
Could you please help me to solve this issue? Could it be related to the versions of Python and PyTorch I am using? I am using Python 3.6.8 and PyTorch 1.2.0. What versions did you use exactly?
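One workaround that made the logging readable for me (an assumption, not a confirmed fix) is to force UTF-8 on stdout/stderr near the top of main.py, so the '\xb1' (the ± sign in the results table) can be encoded on an ASCII locale; setting PYTHONIOENCODING=utf-8 before launching has the same effect.
import io
import sys

# re-wrap the standard streams with a UTF-8 encoder so tqdm/logging output
# containing non-ASCII characters no longer raises UnicodeEncodeError
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8', errors='replace')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8', errors='replace')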
Thank you very much in advance.
Best,
Claudio
When I run the demo code without any changes, I get the following error:
File "main.py", line 167
nonlocal avg_images
^
SyntaxError: invalid syntax
PyTorch version: 1.0
Ubuntu: 16.04
Hi!
I wonder whether there is any way to distill much more data than fits within the GPU memory limit. For a large-scale dataset, or a typical 11G/12G GPU, that would be really useful. At first I thought state.distributed
in your code was intended for that, by putting the distilled data onto multiple GPUs, but then I found out I was wrong. It seems that this code only distills data that fits into a single GPU's memory. So, any advice on this matter?
Thanks a lot!
Hey, I am very interested in this work. Could you make my job easier and indicate which lines I need to customize to distill my own dataset with Xavier initialization (random initialization according to your paper) and a particular architecture not on your list?
TypeError: __init__() got an unexpected keyword argument 'padding_mode'.
How should I deal with it?
Hi,
I have the following error when using the GPU on my own dataset (2 classes) and my own model:
"RuntimeError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 11.17 GiB total capacity; 10.47 GiB already allocated; 107.25 MiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
Following what you explained at this link: #28, I tried different combinations of distill_steps, distill_epochs, distilled_images_per_class_per_step, and num_distill_classes. I realized that the GPU limit was reached when num_distill_classes * epochs * images_per_class_per_step * distill_steps > 4.
The problem is that with 2 epochs, 2 steps, 1 image and 1 distilled class, the results are not sufficient.
What can I do to improve them?
PS: I use a Tesla K80 (12 GB of dedicated memory) with 56 GB of RAM.
Thank you in advance
I noticed this problem when I wanted to test the distilled images (see basics.py); the reason is the condition just above it.
Indeed, for binary classification, where the output has 32 rows of 2 columns (so 64 values), doing (output > 0.5).to(target.dtype).view(-1) returns a 64-value tensor, but the target contains only 32 values, so this causes the problem.
So, to solve it, just apply output.argmax(-1) even for binary classification.
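A tiny standalone illustration of the mismatch and of the fix I mean:
import torch

output = torch.randn(32, 2)              # binary classifier with a 2-column output
target = torch.randint(0, 2, (32,))

bad = (output > 0.5).to(target.dtype).view(-1)  # shape [64]: twice as many values as targets
good = output.argmax(-1)                        # shape [32]: one prediction per example
print(bad.shape, good.shape)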
I think your paper is fascinating so I have been experimenting with it for a few weeks now.
I was wondering what hyperparams you used to get 10 images that achieve almost 94% accuracy on MNIST after 1 GD step and 3 epochs. I can't seem to hit this when I run the suggested code for 200 epochs. At most I managed to get around 91%.
python3 main.py --mode distill_basic --dataset MNIST --arch LeNet --distill_steps 1 --train_nets_type known_init --n_nets 1 --test_nets_type same_as_train
Hi SsnL
I'm trying to run this in Jupyter, but it returns this error:
Unexpected args: ['-f', '/home/user/.local/share/jupyter/runtime/kernel-d7f01d0e-54cb-461d-8d44-0d43cb505a17.json']
I searched for it; it seems that Jupyter can't pass arguments correctly to base_options.py.
Do you have any idea how I can fix it?
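The workaround I tried in the notebook (an assumption, not a documented approach) is to overwrite sys.argv before the options are parsed, so the kernel's own "-f ...kernel.json" argument never reaches base_options.py:
import sys

sys.argv = ['main.py', '--mode', 'distill_basic', '--dataset', 'MNIST', '--arch', 'LeNet']
# ...then import/run main.py as usual in the notebook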