
tenas's Introduction

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF]

Language grade: Python | MIT licensed

Wuyang Chen, Xinyu Gong, Zhangyang Wang

In ICLR 2021.

Overview

We present TE-NAS, the first published training-free neural architecture search method with extremely fast search speed (no gradient descent at all!) and high-quality performance.

Highlights:

  • Training-free and label-free NAS: we achieve extremely fast neural architecture search without a single gradient descent step.
  • Bridging the theory-application gap: we identified two training-free indicators for ranking the quality of deep networks: the condition number of their NTKs and the number of linear regions in their input space (a minimal sketch follows this list).
  • SOTA: TE-NAS achieves extremely fast search speed (one 1080Ti: 20 minutes on the NAS-Bench-201 space, four hours on the DARTS space on ImageNet) while maintaining competitive accuracy.
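Both indicators are computed at initialization, with no training at all. As a rough illustration of the first one (an assumption-laden sketch on a toy single-output network, not the repository's exact implementation), the NTK condition number can be estimated like this:

import torch

# Minimal sketch: empirical NTK condition number kappa = lambda_max / lambda_min
# for a toy single-output network at random initialization.
net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x = torch.randn(32, 8)  # a small batch of random inputs

rows = []
for i in range(x.size(0)):
    net.zero_grad()
    net(x[i:i + 1]).sum().backward()  # gradient of sample i's (single) output w.r.t. all weights
    rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
J = torch.stack(rows, 0)             # Jacobian: num_samples x num_weights
ntk = J @ J.t()                      # empirical NTK: num_samples x num_samples
eig = torch.linalg.eigvalsh(ntk)     # eigenvalues in ascending order
print((eig[-1] / eig[0]).item())     # condition number used to rank architectures

In the paper, a smaller NTK condition number correlates with better trainability and a larger linear-region count with better expressivity; TE-NAS ranks candidate architectures by combining the two.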

Prerequisites

  • Ubuntu 16.04
  • Python 3.6.9
  • CUDA 10.1 (lower versions may work but were not tested)
  • NVIDIA GPU + cuDNN v7.3

This repository has been tested on GTX 1080Ti. Configurations may need to be changed on different platforms.

Installation

  • Clone this repo:
git clone https://github.com/chenwydj/TENAS.git
cd TENAS
  • Install dependencies:
pip install -r requirements.txt

Usage

0. Prepare the dataset

  • Please follow the guideline here to prepare the CIFAR-10/100 and ImageNet datasets, as well as the NAS-Bench-201 database.
  • Remember to properly set TORCH_HOME and data_paths in prune_launch.py (see the sketch below).
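For reference, these settings might look roughly like the following inside prune_launch.py (a sketch only: the paths are placeholders you must replace with your own local locations):

# Illustrative placeholders only -- point these at your local copies.
TORCH_HOME = "/path/to/torch_home"            # folder holding the NAS-Bench-201 benchmark file
data_paths = {
    "cifar10":        "/path/to/cifar10",
    "cifar100":       "/path/to/cifar100",
    "ImageNet16-120": "/path/to/ImageNet16",
    "imagenet-1k":    "/path/to/imagenet-1k",
}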

1. Search

python prune_launch.py --space nas-bench-201 --dataset cifar10 --gpu 0
python prune_launch.py --space nas-bench-201 --dataset cifar100 --gpu 0
python prune_launch.py --space nas-bench-201 --dataset ImageNet16-120 --gpu 0
python prune_launch.py --space darts --dataset cifar10 --gpu 0
python prune_launch.py --space darts --dataset imagenet-1k --gpu 0

2. Evaluation

  • For architectures searched on nas-bench-201, the accuracies are immediately available at the end of search (from the console output).
  • For architectures searched on darts, please use DARTS_evaluation for training the searched architecture from scratch and evaluation.

Citation

@inproceedings{chen2020tenas,
  title={Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective},
  author={Chen, Wuyang and Gong, Xinyu and Wang, Zhangyang},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Acknowledgement

tenas's People

Contributors

chenwydj


tenas's Issues

NTK calculation incorrect for networks with multiple outputs?

Howdy!

In: https://github.com/VITA-Group/TENAS/blob/main/lib/procedures/ntk.py

on line 45:

logit[_idx:_idx+1].backward(torch.ones_like(logit[_idx:_idx+1]), retain_graph=True)

I am confused about your calculation of the NTK, and believe that you may be misusing the first argument of the torch.Tensor.backward() function.

E.g., when playing with the codebase using a very small 8-parameter network with 2 outputs:

class small(torch.nn.Module):
    def __init__(self,):
        super(small, self).__init__() 
        self.d1 = torch.nn.Linear(2,2,bias=False)
        self.d2 = torch.nn.Linear(2,2,bias=False)
    def forward(self, x):
        x = self.d1(x)
        x = self.d2(x)
        return x

For this explanation, I have modified it to:

gradient = torch.ones_like(logit[_idx:_idx+1])
gradient[0,0] = a
gradient[0,1] = b
logit[_idx:_idx+1].backward(gradient, retain_graph=True)

where by J I mean your 'grads' list for a single network:

e.g.: lines 45 & 46:

grads = [torch.stack(_grads, 0) for _grads in grads]
ntks = [torch.einsum('nc,mc->nm', [_grads, _grads]) for _grads in grads]
print('J: ',grads)

for

gradient[0,0] = 0
gradient[0,1] = 1

J: [tensor([[-0.6255, -0.5019, 0.1758, 0.1411, 0.0000, 0.0000, -0.0727, -0.4643],
[ 0.9368, -0.0947, -0.2633, 0.0266, 0.0000, 0.0000, 0.0955, -0.0812]])]

=======

for

gradient[0,0] = 1
gradient[0,1] = 0

J: [tensor([[ 0.1540, 0.1236, -0.6473, -0.5194, -0.0727, -0.4643, 0.0000, 0.0000],
[-0.2307, 0.0233, 0.9694, -0.0980, 0.0955, -0.0812, 0.0000, 0.0000]])]

=======

for

gradient[0,0] = 1
gradient[0,1] = 1

J: [tensor([[-0.4715, -0.3783, -0.4715, -0.3783, -0.0727, -0.4643, -0.0727, -0.4643],
[ 0.7061, -0.0714, 0.7062, -0.0714, 0.0955, -0.0812, 0.0955, -0.0812]])]

"""

And so you can verify that your code is adding the two components together to get the last result.

The problem is that your Jacobian should have size number_samples x (number_outputs x number_weights); see your own paper, page 2, where you show that the Jacobian's components are defined on the subscript i, the i-th output of the model.

If I am right, then any network with multiple outputs would have its NTK values incorrectly calculated, and would also have a time and memory footprint that is systematically reduced by the fact that these gradients are being pooled together.
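For context, a per-output Jacobian of the kind described above would back-propagate each output coordinate separately instead of summing them; a minimal sketch of that idea (an illustration of the point being raised, not the repository's code):

import torch

def per_output_jacobian(net, x):
    # Sketch: Jacobian of shape (num_samples * num_outputs) x num_weights, obtained by
    # back-propagating one output coordinate at a time so gradients are never pooled.
    rows = []
    logits = net(x)
    for n in range(logits.size(0)):
        for i in range(logits.size(1)):
            net.zero_grad()
            grad_out = torch.zeros_like(logits)
            grad_out[n, i] = 1.0                        # select only output i of sample n
            logits.backward(grad_out, retain_graph=True)
            rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
    return torch.stack(rows, 0)

# e.g. with the `small` network above: per_output_jacobian(small(), torch.randn(1, 2)) -> shape 2 x 8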

Cannot get the same architecture with the same random seed and settings

Hi. I ran the commands for the DARTS benchmark search on cifar10, but I got a different result. The only difference is that, because my GPU is not a 1080Ti, I had to use a newer PyTorch version, which changed its eigenvalue methods, so I changed

eigenvalues, _ = torch.symeig(ntk) # ascending

to torch.linalg.eigh, but the resulting genotype is different:

Genotype(normal=[('sep_conv_5x5', 0), ('avg_pool_3x3', 1), ('dil_conv_5x5', 0), ('sep_conv_3x3', 2), ('dil_conv_3x3', 0), ('avg_pool_3x3', 1), ('dil_conv_5x5', 1), ('sep_conv_5x5', 2)], normal_concat=[2, 3, 4, 5], reduce=[('sep_conv_3x3', 0), ('dil_conv_5x5', 1), ('dil_conv_3x3', 0), ('max_pool_3x3', 2), ('dil_conv_3x3', 0), ('sep_conv_5x5', 2), ('sep_conv_3x3', 2), ('dil_conv_5x5', 3)], reduce_concat=[2, 3, 4, 5])

which in DARTS_evaluation gives me 96.79% test accuracy. Do you know the reason behind it?
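For reference, torch.linalg.eigh is PyTorch's documented replacement for the deprecated torch.symeig, and both return eigenvalues in ascending order, so the substitution itself should be equivalent up to numerical precision:

# Old (deprecated):
# eigenvalues, _ = torch.symeig(ntk)        # ascending eigenvalues
# Documented replacements, eigenvalues also in ascending order:
eigenvalues, _ = torch.linalg.eigh(ntk)     # eigenvalues and eigenvectors
eigenvalues = torch.linalg.eigvalsh(ntk)    # eigenvalues only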

Linear_Region_Collector

Dear authors,

We're using the Linear_Region_Collector class to count the number of linear regions in our network. However, it always returns the same result (batch size * sample_batch). Could you please give us some information about the suitable way of using it?

TYPO in code?

[screenshot]

Sorry to bother you guys again. I am trying to reproduce your work. I think "self.op_name" in the screenshot is a typo.

Question about training time?

I'm training on the CIFAR-10 dataset on a machine with a single GPU (2080Ti).

According to the paper,

TE-NAS achieves a test error of 2.63%, ranking among the top of recent NAS results, but meanwhile largely reduces the search cost to only 30 minutes.

However, it takes me about 1hr to execute 80% of the process.
[screenshot]

The command I use is python prune_launch.py --space darts --dataset cifar10 --gpu 1, following the instructions. Am I missing anything during the training?

Running NTK/LRC multiple times gives very inconsistent results

Hi there,

I am trying to reproduce the NTK and LRC functions that you have in your code, and when I run the NTK for 3 (or 5) repeated runs with the same settings and input model, I get vastly different results, e.g.:

ntk_original, ntk
896.1322631835938 828.4542236328125
1274.0692138671875 1108.636962890625
890.8836059570312 1008.2345581054688

I would love to get a better sense of what the NTK actually does and how we can get consistent results.

Also, do we need to initialize with Kaiming? What is the point of this initialization, and is there an alternative (e.g., Xavier, zero, none)?
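For what it's worth, the NTK measured at initialization depends on both the random weight initialization and the sampled input batch, so repeated runs with fresh randomness will naturally differ. A minimal sketch of two common mitigations, fixing the seed and averaging over several re-initializations (an illustration on a toy network, not the repository's exact procedure):

import torch

def kappa(net, x):
    # NTK condition number at initialization (same computation as the sketch in the Overview)
    rows = []
    for i in range(x.size(0)):
        net.zero_grad()
        net(x[i:i + 1]).sum().backward()
        rows.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
    J = torch.stack(rows, 0)
    eig = torch.linalg.eigvalsh(J @ J.t())
    return (eig[-1] / eig[0]).item()

torch.manual_seed(0)   # fixing the seed makes an individual run reproducible
x = torch.randn(32, 8)
values = [kappa(torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                                    torch.nn.Linear(16, 1)), x)
          for _ in range(3)]          # fresh random init on each repeat
print(values, sum(values) / len(values))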

Calculating number of linear regions

Dear authors,

I have a question about calculating the number of linear regions. It seems that in TE-NAS, the inputs are set to be of size (1000, 1, 3, 3):
lrc_model = Linear_Region_Collector(input_size=(1000, 1, 3, 3), sample_batch=3, dataset=xargs.dataset, data_path=xargs.data_path, seed=xargs.rand_seed)

Could you explain what the reason is behind this?
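For context, the general idea behind counting linear regions of a ReLU network is to forward a batch of inputs and count how many distinct ReLU on/off activation patterns appear. A minimal, generic sketch of that idea (assuming a plain fully-connected ReLU net; this is not the Linear_Region_Collector implementation):

import torch

def count_activation_patterns(net, x):
    # Forward a batch and count the distinct ReLU activation-sign patterns it hits.
    patterns = []
    def hook(_module, _inp, out):
        patterns.append((out > 0).flatten(1).to(torch.int8))
    handles = [m.register_forward_hook(hook)
               for m in net.modules() if isinstance(m, torch.nn.ReLU)]
    with torch.no_grad():
        net(x)
    for h in handles:
        h.remove()
    signatures = torch.cat(patterns, dim=1)         # one activation-sign row per input
    return torch.unique(signatures, dim=0).size(0)  # number of distinct regions hit by the batch

net = torch.nn.Sequential(torch.nn.Linear(9, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 32), torch.nn.ReLU())
x = torch.randn(1000, 9)  # analogous to 1000 tiny (1, 3, 3) inputs, flattened
print(count_activation_patterns(net, x))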

How to evaluate the DARTS architecture?

I followed the instructions and successfully obtained "arch_parameter.npy". It took me around 2 hrs to search for a network on the cifar10 dataset. How do I reuse your code to evaluate the performance of the network?

