
PyTorch-BigNAS

Unofficial PyTorch implementation of BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models.

Yu J, Jin P, Liu H, et al. BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models. In: European Conference on Computer Vision (ECCV). Springer, Cham, 2020: 702-717.

Paper link: arXiv

Requirements

  • Python >= 3.7
  • PyTorch 1.9

Other requirements are listed in requirements.txt.

Data Preparation

It is highly recommended to save or link datasets under the $pytorch-BigNAS/data folder, so that no additional configuration is required. Alternatively, you can set the dataset path manually by modifying the cfg.LOADER.DATAPATH attribute in the configuration file $pytorch-BigNAS/core/config.py.
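For example, a minimal sketch of pointing the loader at a custom dataset root; the attribute name cfg.LOADER.DATAPATH is taken from this README, while the import path and the yacs-style config object are assumptions based on the XNAS codebase this repository derives from:

# Sketch only: assumes core/config.py exposes a global yacs-style `cfg` object,
# as in XNAS-derived code. Adjust to the actual API if it differs.
from core.config import cfg

cfg.LOADER.DATAPATH = "/PATH/TO/DATASETS"  # instead of the default ./data location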

BigNAS uses ImageNet for training only, so you can simply link ImageNet to the data/imagenet folder like this:

ln -s /PATH/TO/IMAGENET $pytorch-BigNAS/data/imagenet

Run example

Adjust the batch size (and other parameters accordingly) if an out-of-memory (OOM) error occurs.

  • Train the supernet (multi-GPU only): python train_supernet.py --cfg configs/train.yaml
  • Search: python search.py --cfg configs/search.yaml
  • Evaluate: python eval.py --cfg configs/eval.yaml

Values in the configuration file can be overridden by appending additional parameters on the command line. For example, to run eval.py with a modified output directory:

python eval.py --cfg configs/eval.yaml OUT_DIR exp/220621/

Results

We performed supernet training on 8 NVIDIA Tesla V100 GPUs; the full run took about 16 days. The supernet training results are as follows:

Arch         Top-1 Acc. (%)   Top-5 Acc. (%)
-----------  ---------------  ---------------
MAX (final)  79.60            93.89
MAX (best)   79.89            94.52
MIN (final)  74.32            91.46
MIN (best)   74.72            91.84

We note that this is about 1% lower than the results reported in the original paper, which may be due to differences in the training setup. In practice, we follow most configurations from AttentiveNAS; however, since we train with fewer GPUs, the exact results may vary with the hyperparameters. The settings we modified for our runs are as follows:

Modified item        In paper                         This code
-------------------  -------------------------------  ---------------
#GPUs                64                               8
Batch size per GPU   64                               80
Learning rate        0.256                            0.128
Optimizer            RMSProp                          SGD
Weight decay         1e-5 for the biggest arch only   3e-6 for all
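Note that the change in GPU count and per-GPU batch size also changes the effective global batch size; the snippet below simply makes that arithmetic explicit, using the numbers from the table above:

# Effective global batch size = #GPUs x batch size per GPU.
paper_batch = 64 * 64  # 4096 images per optimizer step (paper setting)
our_batch = 8 * 80     # 640 images per optimizer step (this code)
print(paper_batch, our_batch)  # -> 4096 640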

The learning curves from TensorBoard are as follows:

[TensorBoard learning curves]

Note:

We observed that the supernet learning curves oscillate heavily under this setting, which may cause the performance drop and training instability. However, due to resource constraints (about 125 GPU-days per training run on Tesla V100s), we are unable to fine-tune the hyperparameters in detail at this time. If you run this code and obtain better results, we would appreciate a Pull Request.

Contributing

We welcome contributions to the library along with any potential issues or suggestions.

Also, if you find this code useful, please consider leaving a star🌟.

Reference

This implementation is mainly adapted from the source code of XNAS, AttentiveNAS, and OFA.

About XNAS

XNAS is an effective, modular, and flexible Neural Architecture Search (NAS) repository that aims to provide a common framework and baselines for the NAS community. It was originally designed to decouple the search space, search algorithm, and performance evaluation strategy so that they can be combined freely.

pytorch-bignas's Issues

Discuss the pretrained weights from BigNAS?

Hi,

I have trained a BigNAS supernet on ImageNet and used it to fine-tune on a downstream task. Here I use CUB for the experiment, with ResNet-50 as my supernet.

I found that the transfer performance is not as good as with pretrained weights from normal training (you can find resnet50 in torchvision); that is, using pretrained weights from BigNAS may not produce good results.
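(For reference, a minimal sketch of loading the torchvision baseline mentioned above; the 200-class head matches CUB-200-2011, and the weights enum requires torchvision >= 0.13, with older versions using pretrained=True instead:)

import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# ImageNet-pretrained ResNet-50 baseline from torchvision.
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
# Replace the classifier head for a downstream task such as CUB (200 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 200)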

I also noticed this in Table 2 of the paper:

[Table 2 from the paper]

It seems they have to fine-tune carefully, otherwise the performance drops, even though ImageNet itself does need pretrained weights.

Do you have any idea about this?

General question regarding resetting the running statistics of BN

Hi, thanks for your great work! I have a question regarding your implementation of reset_running_stats_for_calibration. I see that you set bn.momentum = None, which in PyTorch means the BN layers are calibrated with a cumulative moving average (a simple average over all calibration batches) rather than an exponential moving average. Why do it this way rather than leave bn.momentum unchanged?

I'm looking forward to hearing your feedback.
Best regards,
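For reference, a minimal sketch of this calibration pattern; the function name mirrors the one mentioned above, but the body here is our assumption of the common OFA/BigNAS-style implementation, not necessarily the exact code in this repository:

import torch
import torch.nn as nn

def reset_running_stats_for_calibration(model: nn.Module) -> None:
    # Prepare BN layers so a few forward passes re-estimate their statistics.
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.train()                 # update running stats on forward passes
            m.momentum = None         # None => cumulative (simple) average
            m.reset_running_stats()   # zero the running mean/var and counter

# Usage: run a handful of calibration batches with gradients disabled.
# `calib_loader` is a hypothetical DataLoader over calibration images.
# with torch.no_grad():
#     for images, _ in calib_loader:
#         model(images)

With momentum = None, every calibration batch contributes equally to the estimated statistics, so the result does not depend on batch order; this is one plausible reason for the choice.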
