gt-ripl / continual-learning-benchmark
Evaluate three types of task shifting with popular continual learning algorithms.
License: MIT License
Hi! Thanks for your awesome code! It really helps!
I have a problem with the implementation of EWC in this repo.
According to the DeepMind paper, EWC adopts the Fisher information matrix to approximate the posterior precision of the parameters learned on previous tasks.
I think the calculation of the Fisher in this repo deviates from the paper.
If adopting the maximum value of the predictions as the label when accumulating the squared gradients, the estimate is taken around the predicted labels rather than by sampling labels from the model's predictive distribution, as the paper's definition of the Fisher requires.
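For context, the standard formulation from the EWC paper (reproduced here for reference; this is not taken from the repo's documentation) defines the diagonal Fisher with labels sampled from the model's own predictive distribution, and penalizes drift from the previous task's optimum:

```latex
% Diagonal Fisher for parameter \theta_i: the label y is drawn from the
% model's predictive distribution, not fixed to the argmax prediction.
F_i = \mathbb{E}_{x \sim D}\, \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}
      \left[ \left( \frac{\partial \log p_\theta(y \mid x)}{\partial \theta_i} \right)^{2} \right]

% EWC objective on a new task B, anchored at the old-task optimum \theta^{*}_{A}:
\mathcal{L}(\theta) = \mathcal{L}_B(\theta)
      + \sum_i \frac{\lambda}{2} \, F_i \, \left(\theta_i - \theta^{*}_{A,i}\right)^{2}
```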
Hi, Thank you so much for this awesome repo. It's the clearest implementation I found out there :)
I have a question regarding the mini-batch sampling. A comment in the code says that it gives performance similar to sub-sampling with batch_size=1 (i.e., the mathematically correct way), but I'm worried that the two can be very different.
So I'm curious: are there papers that used this batched sampling and confirmed it performs similarly?
The reason for my doubt is that, in general, the expected value of the squared per-sample gradients of the log-likelihood (which is an estimator of the diagonal of the Fisher matrix) is not the same as the square of the gradient averaged over a mini-batch.
Thank you for your consideration,
Arash
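To make the concern above concrete, here is a minimal sketch (with a stand-in linear model and random data, not the repo's code) contrasting the two estimators: averaging per-sample squared gradients versus squaring the mini-batch-averaged gradient.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 5)          # stand-in network
x = torch.randn(32, 10)                 # one mini-batch of inputs
y = torch.randint(0, 5, (32,))          # labels (true or sampled)

# (a) Correct: average of per-sample squared gradients, E[(d log p / dw)^2].
fisher_correct = torch.zeros_like(model.weight)
for xi, yi in zip(x, y):
    model.zero_grad()
    loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
    loss.backward()
    fisher_correct += model.weight.grad ** 2
fisher_correct /= len(x)

# (b) Shortcut: square of the mini-batch-averaged gradient, (E[d log p / dw])^2.
model.zero_grad()
F.cross_entropy(model(x), y).backward()
fisher_batch = model.weight.grad ** 2

# The two generally differ; (b) drops the per-sample variance term.
print((fisher_correct - fisher_batch).abs().max())
```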
Hello,
Thanks a lot for your repository. I'm trying to reproduce your results to use as baselines for my experiments. However, I'm having trouble reproducing the exact numbers reported in the paper for MNIST with the MLP network; the values of several hyperparameters appear to be missing from both the paper and the code.
It would be great if you could share a config file for the best results you got. The paper presents some details about these settings, which is very helpful, but the complete configurations seem to be missing.
Thanks in advance
Hi,
I'm using pytorch=1.1.0 and torchvision=0.3.0. With these versions there is a small bug in the CIFAR-100 and CIFAR-10 data loaders.
It seems that the CIFAR dataset class has no attribute training_labels or test_labels, which dataloaders/base.py relies on; the torchvision documentation also lists no such attributes for CIFAR, although MNIST does have them.
Is this a bug, or did it work for you with previous versions? It's strange that torchvision lacks homogeneity among its datasets.
Let me know if it's a bug; I can raise a pull request.
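If the cause is the attribute rename in newer torchvision releases (labels moved to a unified targets attribute), a small compatibility shim like the following sketch could work across versions; the helper name is hypothetical, not from the repo:

```python
# Hypothetical compatibility shim: newer torchvision exposes labels via
# `targets`, while older releases used `train_labels` / `test_labels`.
from torchvision.datasets import CIFAR10

def get_labels(dataset):
    """Return the label list regardless of torchvision version."""
    for attr in ("targets", "train_labels", "test_labels", "labels"):
        if hasattr(dataset, attr):
            return getattr(dataset, attr)
    raise AttributeError("no label attribute found on %r" % dataset)

train_set = CIFAR10(root="data", train=True, download=True)
print(len(get_labels(train_set)))  # 50000 for the CIFAR-10 training split
```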
The GEM implementation here inherits from Naive_Rehearsal and calls super(GEM, self).learn_batch(train_loader, val_loader). It therefore uses the learn_batch method of Naive_Rehearsal, which mixes memory data with the new data when computing the original gradients, before checking for conflicts with any memory gradients. From my understanding, that is not the intent of the original paper and might affect the training results.
Additionally, Naive_Rehearsal's learn_batch method already updates the memory and task_count, but GEM does this a second time once the call returns.
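For reference, the GEM paper computes the gradient on the new data alone and only then checks it against the per-task memory gradients, projecting when a constraint is violated. A minimal sketch of the violation check (hypothetical helper with random stand-in gradients; the QP projection step is omitted):

```python
import torch

def violated_constraints(g, memory_grads):
    """Indices k with <g, g_k> < 0, i.e. tasks whose memory loss would rise."""
    return [k for k, g_k in enumerate(memory_grads) if torch.dot(g, g_k) < 0]

g = torch.randn(100)                                  # flattened gradient on new data only
memory_grads = [torch.randn(100) for _ in range(3)]   # one gradient per past task's memory
print(violated_constraints(g, memory_grads))          # e.g. [1] -> project g before stepping
```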
Thanks for your great work.
I have a question about the regularization methods; take EWC as an example.
In https://github.com/GT-RIPL/Continual-Learning-Benchmark/blob/master/agents/regularization.py#L43-L44, you calculate the importance weights in each batch, but I think this calculation is unnecessary during training: the importance weights only need to be computed after training on a task finishes.
What is the reason for that? Recomputing them at every batch seems time-consuming.
In https://github.com/srvCodes/continual_learning_with_vit/blob/main/src/approach/ewc.py#L117-L132, they calculate the importance weights only once, at the end of training on each task.
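A sketch of the once-per-task schedule the issue suggests (assumed helper names and an illustrative Fisher estimate; not the repo's code):

```python
import torch
import torch.nn.functional as F

def estimate_fisher(model, loader, n_batches=10):
    """Diagonal Fisher from averaged squared log-likelihood gradients (sketch)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad ** 2 / n_batches
    return fisher

def learn_tasks(model, task_loaders, train_one_task):
    """`train_one_task` is an assumed training loop that applies the EWC penalty."""
    fisher = old_params = None
    for loader in task_loaders:
        train_one_task(model, loader, fisher, old_params)  # penalty uses the *previous* Fisher
        fisher = estimate_fisher(model, loader)            # computed once, after the task ends
        old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
```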
Hello! Thank you for open-sourcing this beautiful code.
The benchmark results include DGR and RtF; however, there is no code for them.
Could you share the DGR and RtF code for this benchmark?
The problem is reproducible by running
python -u iBatchLearn.py --dataset CIFAR100 --train_aug --gpuid 0 --repeat 1 --incremental_class --optimizer SGD --force_out_dim 100 --no_class_remap --first_split_size 20 --other_split_size 20 --schedule 1 --batch_size 128 --model_name WideResNet_28_2_cifar --model_type resnet --agent_type customization --agent_name EWC_online --lr 0.001 --reg_coef 100
Hello,
Thanks for the CIFAR100 scripts. I am having some issues trying to reproduce the naive rehearsal results for the class-incremental learning task. I executed the Naive Rehearsal 5600 baseline with the following command:
python -u iBatchLearn.py --dataset CIFAR100 --train_aug --gpuid 0 --repeat 5 --incremental_class --optimizer Adam --force_out_dim 100 --no_class_remap --first_split_size 20 --other_split_size 20 --schedule 80 120 160 --batch_size 128 --model_name WideResNet_28_2_cifar --model_type resnet --agent_type customization --agent_name Naive_Rehearsal_5600 --lr 0.001
Comparing the result with Naive Rehearsal-C reported in https://github.com/GT-RIPL/Continual-Learning-Benchmark/blob/master/fig/results_split_cifar100.png, I obtained the following results:
With pytorch 1.0.0 and torchvision 0.2.2, the final result is around 40%, instead of 51.28.
With pytorch 1.4.0 and torchvision 0.5.0, the final result is around 20%, instead of 51.28.
This is consistently happening with various GPUs (RTX 2080Ti, Titan Xp and Tesla P4/K80).
Could you share the environment configuration you used to obtain reported results on CIFAR100?
Hello,
Thanks a lot for uploading the additional scripts for CIFAR-100. I'm trying to replicate the results for various methods using your code. First of all, I'm very thankful for the scripts, as they save me the burden of finding the best results through a hyperparameter sweep (although I may try one in the future).
One thing I want to ask: how did you perform the search for the regularization coefficients across the various datasets and architectures? Did you set intuitive limits for the grid search by looking at the train and test curves? I'm asking because I want to replicate this on other datasets, say CIFAR-10. I could add scripts for CIFAR-10 and ImageNet to your repository in the future once I finish experimenting.
It would be great if you could share your grid-search script.
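Absent the authors' script, a minimal sweep can be driven from Python by invoking iBatchLearn.py once per coefficient. This sketch reuses flags from the commands quoted above; the coefficient values are illustrative, not the paper's:

```python
# Hypothetical grid search over --reg_coef (values are illustrative).
import subprocess

for reg_coef in [0.1, 1, 10, 100, 1000]:
    subprocess.run([
        "python", "-u", "iBatchLearn.py",
        "--dataset", "CIFAR100", "--train_aug", "--gpuid", "0",
        "--repeat", "1", "--incremental_class",
        "--optimizer", "SGD", "--force_out_dim", "100", "--no_class_remap",
        "--first_split_size", "20", "--other_split_size", "20",
        "--schedule", "80", "120", "160", "--batch_size", "128",
        "--model_name", "WideResNet_28_2_cifar", "--model_type", "resnet",
        "--agent_type", "customization", "--agent_name", "EWC_online",
        "--lr", "0.001", "--reg_coef", str(reg_coef),
    ], check=True)
```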