
hypercl's Introduction

Continual Learning with Hypernetworks

A continual learning approach that learns a dedicated, fine-tuned set of parameters for every task, yet does not require the number of trainable weights to grow with the number of tasks and is robust against catastrophic forgetting.
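The core idea: instead of learning each task's weights directly, a single hypernetwork maps a small, learned task embedding to the full weight set of a target network, so every new task only adds one embedding vector. Below is a minimal PyTorch sketch of this pattern; the layer sizes and names are illustrative assumptions, not the architecture used in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskConditionedHypernet(nn.Module):
    """Maps a learned task embedding to the weights of a small target MLP."""

    def __init__(self, num_tasks, emb_dim=8,
                 target_shapes=((20, 1), (20,), (1, 20), (1,))):
        super().__init__()
        self.target_shapes = target_shapes
        n_out = sum(torch.Size(s).numel() for s in target_shapes)
        # One trainable embedding per task -- the only per-task parameters.
        self.task_embs = nn.Embedding(num_tasks, emb_dim)
        self.body = nn.Sequential(nn.Linear(emb_dim, 50), nn.ReLU(),
                                  nn.Linear(50, n_out))

    def forward(self, task_id):
        flat = self.body(self.task_embs(torch.tensor(task_id)))
        # Split the flat output into the target network's weight tensors.
        weights, offset = [], 0
        for s in self.target_shapes:
            n = torch.Size(s).numel()
            weights.append(flat[offset:offset + n].view(s))
            offset += n
        return weights  # [W1, b1, W2, b2] for a 1-20-1 target MLP

def target_forward(x, weights):
    """Functional forward pass of the target MLP with generated weights."""
    w1, b1, w2, b2 = weights
    return F.linear(torch.tanh(F.linear(x, w1, b1)), w2, b2)

hnet = TaskConditionedHypernet(num_tasks=5)
y = target_forward(torch.randn(4, 1), hnet(task_id=0))

During training, the paper additionally regularizes the hypernetwork so that its outputs for previously learned task embeddings stay approximately fixed; this output regularizer is what protects against catastrophic forgetting.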

For details on this approach please read our paper. You can find our spotlight presentation here and a more detailed introduction in this talk. Experiments on continual learning with hypernetworks using sequential data and recurrent networks can be found in this repository. Furthermore, this repository studies a probabilistic extension of the proposed CL algorithm.

If you are interested in working with hypernetworks in PyTorch, check out the package hypnettorch. The package also provides an example implementation of our method for task-incremental learning.
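For orientation, here is a small usage sketch along the lines of hypnettorch's documented examples (all sizes are arbitrary; please check the package documentation for the exact, current signatures):

import torch
from hypnettorch.mnets import MLP
from hypnettorch.hnets import HMLP

# Target ("main") network created without its own weights; they will be
# produced by the hypernetwork instead.
mnet = MLP(n_in=8, n_out=2, hidden_layers=(32, 32), no_weights=True)
# Hypernetwork with one conditional (task) embedding per task.
hnet = HMLP(mnet.param_shapes, uncond_in_size=0, cond_in_size=8,
            layers=(50, 50), num_cond_embs=5)

x = torch.randn(16, 8)
weights = hnet.forward(cond_id=0)     # generate the weights for task 0
y = mnet.forward(x, weights=weights)  # use them in the target network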

Toy Examples

Some toy regression problems can be explored in the folder toy_example. Please refer to the corresponding documentation. Example run:

$ python3 -m toy_example.train --no_cuda

MNIST Experiments

You can find instructions on how to reproduce our MNIST experiments and on how to use the corresponding code in the subfolder mnist.

CIFAR Experiments

Please check out the subfolder cifar. You may use the script cifar.train_zenke to run experiments with the same network as Zenke et al. and the script cifar.train_resnet to run experiments with a ResNet-32.

Testing

All unit tests for the implemented functionality are located in the subfolder tests and documented here. To run all unit tests, execute:

$ python3 -m unittest discover -s tests/ -t .
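Discovery picks up any module in tests/ matching test*.py. A generic sketch of the expected shape (the file name and test below are illustrative, not part of the repository):

# tests/test_example.py -- hypothetical module, shown for illustration only.
import unittest

import torch

class ExampleTest(unittest.TestCase):
    def test_tensor_shape(self):
        self.assertEqual(torch.randn(4, 2).shape, torch.Size([4, 2]))

if __name__ == '__main__':
    unittest.main()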

Documentation

Please refer to the README in the subfolder docs for instructions on how to compile and open the documentation.

Setup Python Environment

We use conda to manage Python environments. To create an environment that already fulfills all package requirements of this repository, simply execute

$ conda env create -f environment.yml
$ conda activate hypercl_env

Citation

Please cite our paper if you use this code in your research project.

@inproceedings{ohs2019hypercl,
  title={Continual learning with hypernetworks},
  author={Johannes von Oswald and Christian Henning and Benjamin F. Grewe and Jo{\~a}o Sacramento},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/abs/1906.00695}
}

hypercl's People

Contributors

chrhenning, johswald


hypercl's Issues

Context modulation

Thank you for providing this repository. I have a question about the use of context-modulated layers in the target network (mnets/mlp.py, class MLP). I could not find a mention of this in the paper. I also saw that in hypercl/utils/context_mod_layer.py, the class ContextModLayer is not implemented yet. Is there another place where you make this available? Is this layer used in the experiments reported in the paper?
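For context, a context-modulation layer typically applies a task-specific elementwise gain and shift to a layer's activations (in the spirit of FiLM conditioning). Here is a minimal sketch of what such a layer might compute; since the class in question is unimplemented, this is an assumption about the intended behavior, not the repository's code:

import torch
import torch.nn as nn

class ContextModLayer(nn.Module):
    """Elementwise task-specific modulation: y = gain[task] * x + shift[task]."""

    def __init__(self, num_features, num_tasks):
        super().__init__()
        # One gain and shift vector per task, initialized to the identity map.
        self.gain = nn.Parameter(torch.ones(num_tasks, num_features))
        self.shift = nn.Parameter(torch.zeros(num_tasks, num_features))

    def forward(self, x, task_id):
        return self.gain[task_id] * x + self.shift[task_id]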

requirements

First, thanks for sharing this repository. Reading the paper made me want to immediately try the code.
Not knowing which software versions you used, I created a Python virtual environment with:
Python 3.7.5
PyTorch 1.3.1
TensorBoard 1.14.0

When trying to train the cifar example via

$ python3 -m cifar.train_resnet --use_adam --custom_network_init --plateau_lr_scheduler --lambda_lr_scheduler

I get the following error:

AttributeError: 'SummaryWriter' object has no attribute 'add_hparams'

I suspect that I could avoid this TensorBoard error by using the same software versions as the authors. Could you share the exact package requirements of this repository?

Thanks for any help on this.
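A plausible workaround, assuming the failing object is a TensorBoard SummaryWriter whose version predates add_hparams, is to guard the call; the writer path and dictionaries below are placeholders:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('out/example')            # placeholder log directory
hparams = {'lr': 0.001, 'use_adam': True}        # placeholder hyperparameters
metrics = {'final/accuracy': 0.0}                # placeholder metrics

if hasattr(writer, 'add_hparams'):
    writer.add_hparams(hparams, metrics)
else:
    # Older writers (e.g. early tensorboardX versions) lack add_hparams;
    # skip the hyperparameter summary rather than crash.
    print('Skipping hparam summary: SummaryWriter has no add_hparams.')
writer.close()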

Confusing Experiment Result

Hi! Thanks for your great work!!!

I ran the split MNIST experiment with HNET+ENT in CL2 and CL3, using the command from the README: $ python train_splitMNIST.py --infer_task_id --infer_with_entropy --cl_scenario 2 (and likewise with --cl_scenario 3).
I got the following results (taken from performance_overview.txt).

My question is why the accuracy of the last task is only 6.3540% (even acc_during is 6.3540%); since each task is a binary classification problem, random guessing should already yield about 50%.
Another question is why the accuracy (acc_after) of task 1 is much higher than the others. If there were catastrophic forgetting, the accuracy of task 5 should be the highest, but in my result it is the lowest (6.3540%).
I also found that overall_task_infer_accuracy_list is identical to acc_after_list; is this a coincidence?
Finally, how did you get the acc_after_mean result reported in your paper? You report 69.48 ± 0.80, but I only got 50.99 (essentially random guessing).

In CL3:

acc_after_list: 99.7636, 62.9285, 54.0555, 31.8731, 6.3540
acc_during_list: 99.9527, 72.2821, 55.6030, 31.7221, 6.3540
acc_after_mean: 50.994938
acc_during_mean: 53.182766
overall_task_infer_accuracy_list: 99.7636, 62.9285, 54.0555, 31.8731, 6.3540
acc_task_infer_mean: 50.994938
num_train_iter: 2000
num_weights_class_net: 478410
num_weights_rp_net: 690684
num_weights_rp_hyper_net: 552712
num_weights_class_hyper_net: 465672
compression_ratio_class: 0.973374
compression_ratio_rp: 4.061550

In CL2 I got results similar to the paper's, but the accuracy of the last task is still the lowest and the accuracy of task 1 the highest. I have the same confusion as for CL3.
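For context, HNET+ENT infers the task identity at test time by evaluating the input under every task's generated model and selecting the task whose softmax output has the lowest predictive entropy. A minimal sketch of that selection rule (function and variable names are illustrative, not the repository's code):

import torch
import torch.nn.functional as F

def infer_task_by_entropy(x, models):
    """Pick the task whose model is most confident (lowest predictive entropy).

    `models` is one callable per task mapping inputs to logits, e.g. the
    target network evaluated with each task's hypernetwork-generated weights.
    """
    entropies = []
    for model in models:
        p = F.softmax(model(x), dim=1)
        entropies.append(-(p * p.clamp_min(1e-12).log()).sum(dim=1).mean())
    return int(torch.stack(entropies).argmin())

# Example with dummy per-task "models":
models = [torch.nn.Linear(10, 2) for _ in range(5)]
task_id = infer_task_by_entropy(torch.randn(4, 10), models)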
