canyonwind / single-path-one-shot-nas-mxnet

Single Path One-Shot NAS MXNet implementation with full training and searching pipeline. Support both Block and Channel Selection. Searched models better than the original paper are provided.

Python 98.14% Shell 1.86%
neural-architecture-search neural-network shufflenet mxnet gluon single-path-one-shot

single-path-one-shot-nas-mxnet's Introduction

This repository contains a Single Path One-Shot NAS implementation on MXNet (Gluon). It can finish the whole training and searching pipeline on ImageNet within 60 GPU hours (on 4 V100 GPUs, including supernet training, supernet searching and training of the best searched subnet) in an exploration space of about 32^20 choices. Using this implementation, a new state-of-the-art NAS searched model has been found which outperforms other NAS models such as FBNet, MnasNet, DARTS, NASNET, PNASNET and the original SinglePathOneShot by a good margin in FLOPs, parameter count and top-1/top-5 accuracy. In terms of Google's MicroNet Challenge Σ Normalized Scores, before any quantization, it also outperforms popular handcrafted efficient models such as MobileNet V1, V2, V3 and ShuffleNet V1, V2.


11/12/2019 Update:

The Single Path One Shot NAS authors have released their Supernet Training and Searching code. The comparison table between the official version and this repo's implementation has been updated. Check here for more details.

10/09/2019 Update:

A searched model Oneshot-S+, with the block choices and channel choices searched by this repo's implementation, ShuffleNetV2+ style SE and MobileNetV3 last convolution block design, reaches the new highest top-1 & top-5 accuracies with the new lowest Google MicroNet Challenge Σ Normalized Scores among other NAS searched or popular handcrafted efficient models. Check here for comparison.

09/30/2019 Update:

A customized model Oneshot+, with the block choices and channel choices provided from paper, ShuffleNetV2+ style SE and MobileNetV3 last convolution block design, reaches the highest top-1 & top-5 accuracies with the lowest Google MicroNet Challenge Σ Normalized Scores among other NAS searched or popular handcrafted efficient models. Check here for comparison.

| NAS Model | FLOPs | # of Params | Top-1 | Top-5 | Σ Normalized Scores | Scripts | Logs |
|---|---|---|---|---|---|---|---|
| OneShot+ Supernet | 841.9M | 15.4M | 62.90 | 84.49 | 7.09 | script | log |
| OneShot-S+ (ours) | 291M | 3.5M | 75.75 | 92.77 | 1.9166 | script | log |
| OneShot+ (ours) | 297M | 3.7M | 75.24 | 92.58 | 1.9937 | script | log |
| OneShot (ours) | 328M | 3.4M | 74.02* | 91.60 | 2 | script | log |
| OneShot (official) | 328M | 3.4M | 74.9* | 92.0 | 2 | - | - |
| FBNet-B | 295M | 4.5M | 74.1 | - | 2.19 | - | - |
| MnasNet | 317M | 4.2M | 74.0 | 91.8 | 2.20 | - | - |
| DARTS | 574M | 4.7M | 73.3 | 91.3 | 3.13 | - | - |
| NASNET-A | 564M | 5.3M | 74.0 | 91.6 | 3.28 | - | - |
| PNASNET | 588M | 5.1M | 74.2 | 91.9 | 3.29 | - | - |

*According to this issue, the official released model has been trained multiple times with the reported top-1 accuracy ranging [74.1 ~ 74.9]. Others using the official pytorch release have obtained accuracies ranging [73.7 ~ 73.9]. All models in this repo's implementation have only been trained once.

| Model | FLOPs | # of Params | Top-1 | Top-5 | Σ Normalized Scores | Scripts | Logs |
|---|---|---|---|---|---|---|---|
| OneShot-S+ (ours) | 291M | 3.5M | 75.75 | 92.77 | 1.9166 | script | log |
| OneShot+ (ours) | 297M | 3.7M | 75.24 | 92.58 | 1.9937 | script | log |
| OneShot (ours) | 328M | 3.4M | 74.02 | 91.60 | 2 | script | log |
| MobileNetV3 Large | 217M | 5.4M | 75.2 | - | 2.25 | - | - |
| MobileNetV2 (1.4) | 585M | 6.9M | 74.7 | - | 3.81 | - | - |
| MobileNetV1 | 569M | 4.2M | 70.6 | - | 2.97 | - | - |
| ShuffleNetV2 2.0x | 591M | 7.4M | 75.0 | 92.4 | 3.98 | - | - |
| ShuffleNetV1 2.0x | 524M | 5.4M | 74.1 | 91.4 | 3.19 | - | - |

Comparison to the official release

Single Path One Shot NAS provides an elegant idea for effortlessly searching optimized subnet structures under different model size/latency constraints, with a single supernet training run followed by multiple low-cost searching procedures. The flexibility and efficiency of this approach can benefit many practical scenarios where a neural network model needs to be deployed across platforms. With the aid of this approach, manually tuning the structures to meet different hardware constraints can be avoided. When this repo was first written, the authors had not yet released the full Supernet Training and Searching parts, and this repo was built to fill that gap.

Model Official This repo
Subnet Training
Block Selection
Channel Selection ×
Supernet Training - With Block Choices
Supernet Training - With Channel Choices ×
Supernet Training - With FLOP/Param Constraints ×
Supernet Training - With Strolling Evolution Constraints -
General FLOPs & Parameters Counting Tool
Fast Counting Tool with pre-calculated lookup table ×
BN Stat Update for Val Acc ×
BN Stat Update for Supernet Searching ×
Random Search ×
Genetic Search - On Block Choices
Genetic Search - On Channel Choices ×
Genetic Search - Jointly ×
SE -
Efficient Last Conv Block -
Op to Op Profiling Tool -
Merge BN -
Int8 Quantization -

Usage

Download the ImageNet dataset, reorganize the raw data and create MXNet RecordIO files (or just put the validation images in their corresponding class folders) by following this script.

Set up the environments.

python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv
python3 -m venv env

source env/bin/activate
pip install -r requirements.txt

Train & search

# Train supernet
sh ./scripts/train_supernet.sh

# Search supernet
sh ./scripts/search_supernet.sh

# Train best searched model
sh ./scripts/train_oneshot.sh

Detailed usage for training and searching can be found here.

Approach breakdown

Our approach is mainly based on Single Path One Shot NAS, combined with Squeeze and Excitation (SE), ShuffleNet V2+ and MobileNet V3. Like the original paper, we searched for the choice blocks and block channels under multiple FLOPs and parameter count constraints. In this section, we elaborate on the modifications from the original paper.

Supernet Structure Design

For each ShuffleNasBlock, four choice blocks were explored: ShuffleNetBlock-3x3 (SNB-3), SNB-5, SNB-7 and ShuffleXceptionBlock-3x3 (SXB-3). Within each block, eight channel choices are available: [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0] * (BlockOutputChannel / 2). So each ShuffleNasBlock explores 4 x 8 = 32 possible choices, and there are 20 blocks in this implementation, for a total of 32^20 design choices.
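The size of the search space follows directly from these counts. A minimal sketch of the arithmetic (the variable names are illustrative and not taken from the repo):

```python
# Counting the design choices described above.
block_choices = ["SNB-3", "SNB-5", "SNB-7", "SXB-3"]
channel_scales = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
num_blocks = 20

# Each ShuffleNasBlock picks one block type and one channel scale.
choices_per_block = len(block_choices) * len(channel_scales)
search_space_size = choices_per_block ** num_blocks

print(choices_per_block)                # 32
print(search_space_size == 2 ** 100)    # True, since 32 = 2^5
print(f"{search_space_size:.3e}")       # 1.268e+30
```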

We also applied the SE, ShuffleNet V2+ SE layout and the MobileNet V3 last convolution block design in the supernet. Finally, the supernet contains 15.4 Million trainable parameters and the possible subnet FLOPs range from 168M to 841M.

Supernet Training

Unlike the original paper, we did not sample from a uniform distribution from the beginning of training. We train the supernet for 120 epochs in total. The first 60 epochs use Block Selection only; for the following 60 epochs, we use Channel Selection Warm-up, which gradually allows the supernet to be trained with a larger range of channel choices.

   # Supernet sampling schedule: during channel selection warm-up
   1 - 60 epochs:          Only block selection (BS), Channels are set to maximum (here [2.0])
   61 epoch:               [1.8, 2.0] + BS
   62 epoch:               [1.6, 1.8, 2.0] + BS
   63 epoch:               [1.4, 1.6, 1.8, 2.0] + BS
   64 epoch:               [1.2, 1.4, 1.6, 1.8, 2.0] + BS
   65 - 66 epochs:         [1.0, 1.2, 1.4, 1.6, 1.8, 2.0] + BS
   67 - 69 epochs:         [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0] + BS
   70 - 73 epochs:         [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0] + BS 
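
The schedule above can be expressed as a small lookup. This is only a sketch of the table, not the repo's actual code: the function and constant names are hypothetical, epochs are 1-based as in the table, and it assumes the full range stays available after epoch 73.

```python
# Sketch of the channel selection warm-up table (1-based epochs).
CHANNEL_SCALES = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

# (first_epoch, number_of_scales_unlocked); the largest scales unlock first.
WARM_UP_STEPS = [(61, 2), (62, 3), (63, 4), (64, 5), (65, 6), (67, 7), (70, 8)]


def allowed_channel_scales(epoch):
    """Return the channel scales that may be sampled at the given epoch."""
    num_allowed = 1  # epochs 1-60: block selection only, channels fixed at 2.0
    for first_epoch, count in WARM_UP_STEPS:
        if epoch >= first_epoch:
            num_allowed = count
    return CHANNEL_SCALES[-num_allowed:]


for epoch in (60, 61, 64, 66, 69, 70):
    print(epoch, allowed_channel_scales(epoch))
```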

We did this because, during our experiments, we found that for a supernet without SE, doing Block Selection from the beginning works well, whereas doing Channel Selection from the beginning prevents the network from converging at all. The Channel Selection range needs to be enlarged gradually, otherwise training crashes with a free-fall drop in accuracy. The range is also limited to (0.6 ~ 2.0); smaller channel scales make the network crash too. For a supernet with SE, Channel Selection with the full choices (0.2 ~ 2.0) can be used from the beginning and it converges. However, doing this seems to harm accuracy: compared to the same SE-supernet trained with Channel Selection warm-up, the model trained with Channel Selection from scratch stayed about 10% behind in training accuracy throughout the whole procedure.

Train with Constraints

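[Figure: distributions of 5000 sampled subnets under the three training settings (left, middle, right plots)]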

Another thing we need to consider is that the subnet sample space during training might not be aligned with the space being searched during the searching stage. For instance, if we are searching for the top performing models within (190M ~ 330M) FLOPs and (2.8M ~ 5.0M) parameters, the target subnet FLOPs & Params should lie within the blue dotted rectangle shown above. The distributions of 5000 sampled subnets are drawn here to compare three training settings: 1) no constraints, 2) constraints met with random sampling and 3) constraints met with strolling evolution.

If the supernet is trained without constraints (the left plot), about 2/3 of the trained subnets are not located in the target space, and these samples might undermine the final supernet's credibility. The performance of a subnet sampled from the supernet might not accurately indicate that subnet's real performance.

If the supernet is trained with constraints and these constraints are met by random sampling (the middle plot), all 5000 subnets are located in the target area, but the distribution is not even. Subnets in the top left corner might never be trained even once. Since that part hasn't been trained, the performance sampled from the supernet for those subnets can be misleadingly inaccurate as well.

If the supernet is trained with constraints and these constraints are met by a controllable evolution method (the right plot), named strolling evolution since it walks around in the target area, all 5000 subnets are evenly distributed within the target space. Because most of the cases have been uniformly considered during training, the supernet trained with this strolling evolution method could be more trustworthy when using it to sample the subnet's performance.
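
To make the second setting concrete, here is a minimal sketch of meeting the constraints by random sampling, implemented as plain rejection sampling. It does not implement strolling evolution, whose details are not described in this section. All names are hypothetical, and `toy_cost` is a made-up stand-in for a real FLOPs/parameter lookup (values in millions), used only so the example runs.

```python
import random

NUM_BLOCKS = 20
NUM_BLOCK_CHOICES = 4      # SNB-3, SNB-5, SNB-7, SXB-3
NUM_CHANNEL_CHOICES = 8    # scales 0.6 ... 2.0


def sample_subnet(rng):
    """Draw one subnet uniformly: a block and a channel choice per ShuffleNasBlock."""
    blocks = [rng.randrange(NUM_BLOCK_CHOICES) for _ in range(NUM_BLOCKS)]
    channels = [rng.randrange(NUM_CHANNEL_CHOICES) for _ in range(NUM_BLOCKS)]
    return blocks, channels


def sample_with_constraints(cost_fn, lower, upper, rng, max_tries=10000):
    """Rejection sampling: redraw until (flops, params) falls inside the target box."""
    for _ in range(max_tries):
        blocks, channels = sample_subnet(rng)
        flops, params = cost_fn(blocks, channels)
        if lower[0] <= flops <= upper[0] and lower[1] <= params <= upper[1]:
            return blocks, channels
    raise RuntimeError("no subnet met the constraints within max_tries")


def toy_cost(blocks, channels):
    """Hypothetical cost model; the coefficients are invented for illustration."""
    flops = sum(8.0 + 1.5 * b + 1.0 * c for b, c in zip(blocks, channels))
    params = sum(0.1 + 0.03 * b + 0.01 * c for b, c in zip(blocks, channels))
    return flops, params


rng = random.Random(0)
blocks, channels = sample_with_constraints(
    toy_cost, lower=(190.0, 2.8), upper=(330.0, 5.0), rng=rng)
print(toy_cost(blocks, channels))
```

Rejection sampling guarantees that every accepted subnet lies in the target box, but it inherits the shape of the unconstrained distribution inside that box, which is the unevenness described for the middle plot.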

Subnet Searching

Different from the paper, we jointly searched for the Block choices and Channel choices in the supernet at the same time. This means that each individual in the population of our genetic algorithm contains 20 Block choice genes and 20 Channel choice genes. We aimed to find a combination of the two that are optimized for each other and complementary.
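
A joint genome of this kind can be sketched as a dictionary with one list of block genes and one list of channel genes. The crossover and mutation operators below are generic genetic algorithm operators chosen for illustration; they are assumptions, not the operators used in search_supernet.py.

```python
import random

NUM_BLOCKS = 20
NUM_BLOCK_CHOICES = 4
NUM_CHANNEL_CHOICES = 8
GENE_LIMITS = {"block": NUM_BLOCK_CHOICES, "channel": NUM_CHANNEL_CHOICES}


def random_individual(rng):
    """One individual: 20 block choice genes plus 20 channel choice genes."""
    return {key: [rng.randrange(limit) for _ in range(NUM_BLOCKS)]
            for key, limit in GENE_LIMITS.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover, applied gene by gene to both chromosomes."""
    return {key: [rng.choice(pair) for pair in zip(parent_a[key], parent_b[key])]
            for key in GENE_LIMITS}


def mutate(individual, rng, prob=0.1):
    """Resample each gene independently with probability `prob`."""
    return {key: [rng.randrange(GENE_LIMITS[key]) if rng.random() < prob else gene
                  for gene in individual[key]]
            for key in GENE_LIMITS}


rng = random.Random(0)
child = mutate(crossover(random_individual(rng), random_individual(rng), rng), rng)
print(child["block"])
print(child["channel"])
```

Because both chromosomes live in the same individual, a single fitness evaluation scores a block layout together with its channel layout, which is what allows the two to be optimized for each other.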

For each qualified subnet structure (one with a lower Σ Normalized Score than the baseline OneShot searched model), as most weight sharing NAS approaches do, we first updated the BN statistics with 20,000 fixed training set images and then evaluated the subnet's ImageNet validation accuracy as the indicator of its performance.
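
The Σ Normalized Score is not defined explicitly in this README. The sketch below assumes it is the sum of FLOPs and parameter count, each divided by the OneShot baseline values (328M FLOPs, 3.4M parameters). That assumption is inferred from the tables, where the baseline scores exactly 2 and, for example, OneShot-S+ and OneShot+ reproduce the listed values.

```python
# Assumed definition, inferred from the result tables (not stated in the text).
BASELINE_FLOPS_M = 328.0    # OneShot baseline FLOPs, in millions
BASELINE_PARAMS_M = 3.4     # OneShot baseline parameters, in millions


def normalized_score(flops_m, params_m):
    """Sum of FLOPs and params, each normalized by the OneShot baseline."""
    return flops_m / BASELINE_FLOPS_M + params_m / BASELINE_PARAMS_M


def is_qualified(flops_m, params_m):
    """A candidate qualifies if it scores lower than the baseline."""
    baseline = normalized_score(BASELINE_FLOPS_M, BASELINE_PARAMS_M)
    return normalized_score(flops_m, params_m) < baseline


print(round(normalized_score(291, 3.5), 4))   # 1.9166 (OneShot-S+)
print(round(normalized_score(297, 3.7), 4))   # 1.9937 (OneShot+)
print(is_qualified(291, 3.5))                 # True
```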

Subnet Training

For the final searched model, we build and train it from scratch. No previous supernet weights are reused in the subnet.

As for the hyperparameters, we modified the GluonCV official ImageNet training script to support both supernet training and subnet training. We trained both models with an initial learning rate of 1.3, weight decay of 0.00003, a cosine learning rate scheduler, 4 GPUs each with batch size 256, label smoothing and no weight decay for BN beta and gamma. The supernet was trained for 120 epochs and the subnet for 360 epochs.
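
For reference, a cosine learning rate schedule with an optional linear warm-up can be sketched as below. This is a generic formulation, not the GluonCV scheduler itself; the warm-up phase and the `final_lr` parameter are assumptions added for illustration, and only the initial learning rate of 1.3 comes from the text above.

```python
import math


def cosine_lr(step, total_steps, base_lr=1.3, warmup_steps=0, final_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay from base_lr to final_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few points of a 120 epoch run with 5 warm-up epochs.
for epoch in (0, 5, 60, 120):
    print(epoch, round(cosine_lr(epoch, 120, warmup_steps=5), 4))
```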

Results

Supernet Training

| Model | FLOPs | # of Params | Top-1 | Top-5 | Σ Normalized Scores | Scripts | Logs |
|---|---|---|---|---|---|---|---|
| OneShot+ Supernet | 1684M | 15.4M | 62.9 | 84.5 | 3.67 | script | log |

Supernet Searching

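[Figure: searching results on the two supernets (left: trained with Block Selection only; right: trained with Block and Channel Selection)]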

Two identical supernets have been trained and searched to verify whether we can directly search for the Block and Channel choices on a supernet trained without Channel Selection. On the left, the supernet is trained with Block Selection only and no Channel Selection. On the right, it is trained with both Block and Channel Selection. The two supernets are searched with the same Block & Channel joint evolution method. The evolution method is indeed able to gradually find good candidates from the left supernet, trained with Block Selection alone. But the subnets sampled from the right supernet, trained with Block & Channel Selection, clearly reach a higher accuracy range (0.6 ~ 0.63) than those from the left one (0.2 ~ 0.6).

Searched Models Performance

| Model | FLOPs | # of Params | Top-1 | Top-5 | Σ Normalized Scores | Scripts | Logs |
|---|---|---|---|---|---|---|---|
| OneShot+ Supernet | 841.9M | 15.4M | 62.90 | 84.49 | 7.09 | script | log |
| OneShot-S+ | 291M | 3.5M | 75.75 | 92.77 | 1.9166 | script | log |
| OneShot+ | 297M | 3.7M | 75.24 | 92.58 | 1.9937 | script | log |
| OneShot (ours) | 328M | 3.4M | 74.02 | 91.60 | 2 | script | log |
| OneShot (official) | 328M | 3.4M | 74.9 | 92.0 | 2 | - | - |
| FBNet-B | 295M | 4.5M | 74.1 | - | 2.19 | - | - |
| MnasNet | 317M | 4.2M | 74.0 | 91.8 | 2.20 | - | - |
| DARTS | 574M | 4.7M | 73.3 | 91.3 | 3.13 | - | - |
| NASNET-A | 564M | 5.3M | 74.0 | 91.6 | 3.28 | - | - |
| PNASNET | 588M | 5.1M | 74.2 | 91.9 | 3.29 | - | - |
| MobileNetV3 Large | 217M | 5.4M | 75.2 | - | 2.25 | - | - |
| MobileNetV2 (1.4) | 585M | 6.9M | 74.7 | - | 3.81 | - | - |
| MobileNetV1 | 569M | 4.2M | 70.6 | - | 2.97 | - | - |
| ShuffleNetV2 2.0x | 591M | 7.4M | 75.0 | 92.4 | 3.98 | - | - |
| ShuffleNetV1 2.0x | 524M | 5.4M | 74.1 | 91.4 | 3.19 | - | - |

OneShot-S+ Profiling

A detailed op to op profiling can be found here. The calculation follows the MicroNet Challenge convention, which is slightly different from how most papers report FLOPs.

Roadmap

  • Implement the fixed architecture model from the official pytorch release.
  • Implement the random block selection and channel selection.
  • Verify conv kernel gradients would be updated according to ChannelSelector
  • Make the fixed architecture model hybridizable.
  • Train a tiny model on Imagenet to verify the feasibility.
  • Modify the open source MXNet FLOP calculator to support BN
  • Verify that this repo's implementation shares the same # parameters and # FLOPs with the official one.
  • Add SE and hard swish in the model (on/off can be controlled by --use-se)
  • Add MobileNetV3 style last conv (on/off can be controlled by --last-conv-after-pooling)
  • Train the official fixed architecture model on Imagenet
  • Train the official uniform selection supernet model on Imagenet
    • Add --use-all-blocks, --use-all-channels and --epoch-start-cs options for the supernet training.
    • Add channel selection warm up: after epoch_start_cs, the channel selection range will be gradually increased.
    • Train the supernet with --use-se and --last-conv-after-pooling --cs-warm-up
  • Build the evolution algorithm to search within the pretrained supernet model.
    • Build random search
    • update BN before calculating the validation accuracy for each choice
      • Build and do unit test on the customized BN for updating moving mean & variance during inference
      • Replace nn.batchnorm with the customized BN
    • Evolution algorithm
    • Evolution algorithm with flop and # parameters constraint(s)
  • Quantization
    • To eliminate the possibility that BN may cause quantization problem, add merge BN tool
    • To eliminate the possibility that reshape may cause quantization problem, add ShuffleChannelByConv option
    • Follow up on this issue
  • Search a model having both less FLOPs and # of parameters than MobileNet V3
    • Add a searching mode which can specify hard FLOP and # of parameter constraints, not just the Σ scores.
    • Search within the OneShot supernet with provided stage channels, se and MobileNet V3 style conv
      • This supernet setting cannot (quickly) find enough qualified candidates for population
    • In progress: Train ShuffleNetV2+ channels layout supernet with se and MobileNet V3 style last convolution block.
    • Train the best searched subnet model
  • Two stage searching
    • Do Block search firstly
    • Based on the best searched blocks, do channel search
  • Estimate each (block, # channel) combination cpu & gpu latency
    • Build a tool to generate repeating blocks
    • Estimate speeds for 4 choice blocks with different input/mid/output channels
  • More upcoming features/plans are moved into the project section

Summary

In this work, we provide a state-of-the-art open-source weight sharing Neural Architecture Search (NAS) pipeline, which can be trained and searched on ImageNet within 60 GPU hours in total (on 4 V100 GPUs), with an exploration space of about 32^20. The model searched by this implementation outperforms other NAS searched models, such as Single Path One Shot, FBNet, MnasNet, DARTS, NASNET and PNASNET, by a good margin in FLOPs, # of parameters and Top-1 accuracy. In terms of the MicroNet Challenge Σ score, without any quantization, it also outperforms MobileNet V2, V3 and ShuffleNet V1, V2. This implementation can benefit many practical scenarios where a neural network model needs to be deployed across platforms. With the aid of this approach, manually tuning the model structures to meet different hardware constraints can be avoided.

Citation

If you use these models in your research, please cite the original paper.

@article{guo2019single,
        title={Single path one-shot neural architecture search with uniform sampling},
        author={Guo, Zichao and Zhang, Xiangyu and Mu, Haoyuan and Heng, Wen and Liu, Zechun and Wei, Yichen and Sun, Jian},
        journal={arXiv preprint arXiv:1904.00420},
        year={2019}
}

And references to the following BibTex entry would be appreciated too.

@misc{yan2019sposmxnet,
      title={single-path-one-shot-mxnet},
      author={Kang, Yan},
      howpublished={\url{https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet}},
      year={2019}
}

single-path-one-shot-nas-mxnet's People

Contributors

canyonwind, celia-xy, yaooxii, zhennanqin


single-path-one-shot-nas-mxnet's Issues

Channel selection is disabled after resuming

https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/e1928e5bbf071ce76ddf7d9774ca11d07d8ab269/train_imagenet.py#L533-L534

It should be:

if epoch >= opt.epoch_start_cs: 
     opt.use_all_channels = False

Because of this, this supernet, which was trained from 0 - 70 and resumed from 70 to the end, was actually mainly trained with Block Selection. Only epochs between 60 - 70 are trained with Block Selection + Channel Selection. And during this period, the validation accuracy is found dropping to 1/1000. The same phenomenon was found in #4 (comment)

On the contrary, this Block Selection only supernet works well with the random/genetic search which was exploring Blocks as well as Channels (even though the channels were not randomly selected and trained as what the original paper claims). This may raise the possibility that randomly sampling (decoupling) these Channels may not be strongly related to a representative supernet.

Further experiments are required to make a more concrete conclusion.

supernet training with resource constrain

Thanks for your excellent work!

the supernet trained with this strolling evolution method could be more trustworthy when using it to sample the subnet's performance.

Have you compared the searched architectures obtained with strolling evolution against those obtained with random sampling?

Official Review

Hello! Thanks so much for your submission!

When we try to run your model checkpoints we're getting a segmentation fault. We're using the mxnet/python:1.5.0_gpu_cu101_mkl_py3 docker image and running on a V100. We downloaded the dataset from your link. Could you confirm what environment you're evaluating in?

Thanks!
Trevor

the loss of supernet can't converge

Hi, thanks for your excellent work!
I am trying to reproduce your work, but when training the supernet the loss does not converge and the validation top-1 error does not decline. My training script is:

python train_imagenet.py \
    --rec-train ~/facedata.mxnet.hot/rec2/train.rec --rec-train-idx ~/facedata.mxnet.hot/rec2/train.idx \
    --rec-val ~/facedata.mxnet.hot/rec2/val.rec --rec-val-idx ~/facedata.mxnet.hot/rec2/val.idx \
    --mode imperative --lr 0.65 --wd 0.00004 --lr-mode cosine --dtype float16 \
    --num-epochs 120 --batch-size 64 --num-gpus 1 -j 16 \
    --label-smoothing --no-wd --warmup-epochs 5 --use-rec \
    --model ShuffleNas \
    --epoch-start-cs 60 --cs-warm-up --use-se --last-conv-after-pooling --channels-layout OneShot \
    --save-dir params_shufflenas_supernet+ --logging-file ./logs/shufflenas_supernet+.log \
    --train-upper-constraints flops-330-params-5.0 --train-bottom-constraints flops-190-params-2.8 \
    --train-constraint-method evolution

When the test runs it reports an error, so I set select_all_channels=True on lines 435 and 440 of train_imagenet.py.

Context switching causes multi GPU idling

x = F.broadcast_mul(x, block_channel_mask.as_in_context(x.context))

running_mean = F.add(F.multiply(self.running_mean.data(), self.momentum.as_in_context(x.context)),
F.multiply(mean, self.momentum_rest.as_in_context(x.context)))
running_var = F.add(F.multiply(self.running_var.data(), self.momentum.as_in_context(x.context)),
F.multiply(var, self.momentum_rest.as_in_context(x.context)))

Supernet model weights

Thanks a lot for the work
I am preparing to reproduce your experimental results
Can you release the model weights of supernet + and supernet + S?
thank you very much

Supernet Training with Constraints

Thanks for your excellent work!
When I train the supernet with constraints using the following script, I get an error during validation.

export MXNET_SAFE_ACCUMULATION=1

python train_imagenet.py \
    --rec-train /data3/wangzhaoming/mxnet_imagenet/rec/train.rec --rec-train-idx /data3/wangzhaoming/mxnet_imagenet/rec/train.idx \
    --rec-val /data3/wangzhaoming/mxnet_imagenet/rec/val.rec --rec-val-idx /data3/wangzhaoming/mxnet_imagenet/rec/val.idx \
    --mode imperative --lr 1.3 --wd 0.00004 --lr-mode cosine --dtype float16 \
    --num-epochs 120 --batch-size 128 --num-gpus 8 -j 48 \
    --label-smoothing --no-wd --warmup-epochs 5 --use-rec \
    --model ShuffleNas \
    --epoch-start-cs 60 --cs-warm-up --channels-layout OneShot \
    --save-dir params_shufflenas_supernet --logging-file ./logs/shufflenas_supernet.log \
    --train-upper-constraints flops-160-params-2.5 --train-bottom-constraints flops-90-params-1.4 \
    --train-constraint-method evolution

Epoch[0] Batch [49] Speed: 322.095226 samples/sec accuracy=0.000605 lr=0.010393
Epoch[0] Batch [99] Speed: 492.513575 samples/sec accuracy=0.000791 lr=0.020787
Epoch[0] Batch [149] Speed: 457.981573 samples/sec accuracy=0.000937 lr=0.031180
Epoch[0] Batch [199] Speed: 688.650089 samples/sec accuracy=0.000903 lr=0.041573
Epoch[0] Batch [249] Speed: 465.918790 samples/sec accuracy=0.000957 lr=0.051967
Epoch[0] Batch [299] Speed: 490.846376 samples/sec accuracy=0.000957 lr=0.062360
Epoch[0] Batch [349] Speed: 606.910845 samples/sec accuracy=0.000977 lr=0.072753
Epoch[0] Batch [399] Speed: 567.445527 samples/sec accuracy=0.000986 lr=0.083147
Epoch[0] Batch [449] Speed: 618.184875 samples/sec accuracy=0.000990 lr=0.093540
Epoch[0] Batch [499] Speed: 593.677446 samples/sec accuracy=0.000982 lr=0.103933
Epoch[0] Batch [549] Speed: 631.991306 samples/sec accuracy=0.000978 lr=0.114327
Epoch[0] Batch [599] Speed: 614.757373 samples/sec accuracy=0.000985 lr=0.124720
Epoch[0] Batch [649] Speed: 568.749700 samples/sec accuracy=0.000975 lr=0.135114
Epoch[0] Batch [699] Speed: 610.768222 samples/sec accuracy=0.000961 lr=0.145507
Epoch[0] Batch [749] Speed: 659.102106 samples/sec accuracy=0.000961 lr=0.155900
Epoch[0] Batch [799] Speed: 563.044769 samples/sec accuracy=0.000964 lr=0.166294
Epoch[0] Batch [849] Speed: 572.482835 samples/sec accuracy=0.000959 lr=0.176687
Epoch[0] Batch [899] Speed: 611.510812 samples/sec accuracy=0.000969 lr=0.187080
Epoch[0] Batch [949] Speed: 585.310555 samples/sec accuracy=0.000970 lr=0.197474
Epoch[0] Batch [999] Speed: 586.269362 samples/sec accuracy=0.000970 lr=0.207867
Epoch[0] Batch [1049] Speed: 584.871140 samples/sec accuracy=0.000973 lr=0.218260
Epoch[0] Batch [1099] Speed: 580.345403 samples/sec accuracy=0.000976 lr=0.228654
Epoch[0] Batch [1149] Speed: 604.746532 samples/sec accuracy=0.000979 lr=0.239047
Epoch[0] Batch [1199] Speed: 425.625182 samples/sec accuracy=0.000976 lr=0.249440
Epoch[0] Batch [1249] Speed: 673.577257 samples/sec accuracy=0.000977 lr=0.259834
Traceback (most recent call last):
  File "train_imagenet.py", line 738, in <module>
    main()
  File "train_imagenet.py", line 734, in main
    train(context)
  File "train_imagenet.py", line 710, in train
    err_top1_val, err_top5_val = test(ctx, val_data, epoch)
  File "train_imagenet.py", line 439, in test
    ignore_first_two_cs=opt.ignore_first_two_cs)
  File "/data3/wangzhaoming/Single-Path-One-Shot-NAS-MXNet/oneshot_nas_network.py", line 248, in random_channel_mask
    channel_choice = random.randint(channel_scale_start, len(self.candidate_scales) - 1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "/data3/wangzhaoming/anconda3/lib/python3.7/random.py", line 200, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (68,10, -58)

evolution search

Hi!
When running evolution search, I encountered the following problems:
1. The parameters are not passed into Evolver there, causing the later search to run on the CPU. It should be evolver = Evolver(net, train_data, val_data, batch_fn, param_dict, num_gpus=num_gpus). In addition, there seems to be some redundancy.
2. To quickly check the effect of the evolutionary algorithm, I set count to 20, but the following error occurred:

Traceback (most recent call last):
  File "search_supernet.py", line 527, in <module>
    main(num_gpus=1, batch_size=128, search_mode='genetic', dtype='float16', comparison_model='SinglePathOneShot')
  File "search_supernet.py", line 521, in main
    genetic_search(net, num_gpus=len(context), batch_size=batch_size, logger=logger, ctx=context)
  File "search_supernet.py", line 467, in genetic_search
    population, local_topk = evolver.evolve(population)
  File "search_supernet.py", line 300, in evolve
    selected2 = [(self.fitness_2nd_stage(person['block'], person['channel']), person) for person in parents]
  File "search_supernet.py", line 300, in <listcomp>
    selected2 = [(self.fitness_2nd_stage(person['block'], person['channel']), person) for person in parents]
  File "search_supernet.py", line 225, in fitness_2nd_stage
    batch_size=self.batch_size, update_bn_images=self.update_bn_images)
  File "search_supernet.py", line 113, in update_bn
    data, _ = batch_fn(batch, ctx)
TypeError: 'list' object is not callable

Bug in random_channel_mask

If I set epoch_start_cs=30, then at epoch=0 epoch_after_cs will be -30.

epoch_delay_early = {0: 0,  # 8
                     1: 1, 2: 1,  # 7
                     3: 2, 4: 2, 5: 2,  # 6
                     6: 3, 7: 3, 8: 3, 9: 3,  # 5
                     10: 4, 11: 4, 12: 4, 13: 4, 14: 4,
                     15: 5, 16: 5, 17: 5, 18: 5, 19: 5, 20: 5,
                     21: 6, 22: 6, 23: 6, 24: 6, 25: 6, 27: 6, 28: 6,
                     29: 6, 30: 6, 31: 6, 32: 6, 33: 6, 34: 6, 35: 6, 36: 7,
                     }
epoch_delay_late = {0: 0,
                    1: 1,
                    2: 2,
                    3: 3,
                    4: 4, 5: 4,  # warm up epoch: 2 [1.0, 1.2, ... 1.8, 2.0]
                    6: 5, 7: 5, 8: 5,  # warm up epoch: 3 ...
                    9: 6, 10: 6, 11: 6, 12: 6,  # warm up epoch: 4 ...
                    13: 7, 14: 7, 15: 7, 16: 7, 17: 7,  # warm up epoch: 5 [0.4, 0.6, ... 1.8, 2.0]
                    18: 8, 19: 8, 20: 8, 21: 8, 22: 8, 23: 8}  # warm up epoch: 6, after 17, use all scales

if 0 <= epoch_after_cs <= 23 and self.stage_out_channels[0] >= 64:
    delayed_epoch_after_cs = epoch_delay_late[epoch_after_cs]
elif 0 <= epoch_after_cs <= 36 and self.stage_out_channels[0] < 64:
    delayed_epoch_after_cs = epoch_delay_early[epoch_after_cs]
else:
    delayed_epoch_after_cs = epoch_after_cs  # delayed_epoch_after_cs = -30

channel_scale_start = max(2, 10 - (-30) - 2)  # channel_scale_start = 38
channel_choice = random.randint(channel_scale_start, len(self.candidate_scales) - 1)  # random.randint(38, 9) raises ValueError: empty range for randrange() (38,10, -28)
                        

Also, why doesn't epoch_delay_early have the key 26?
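
One possible guard for the failure reported above is to clamp the epoch offset and the resulting start index before calling random.randint. This is only a sketch of the idea, not a patch taken from the repo: the helper names are hypothetical, and the count of 10 candidate scales is inferred from the randint(38, 9) call quoted in the report.

```python
import random

NUM_SCALES = 10  # len(self.candidate_scales), inferred from the report above


def channel_scale_start_unclamped(delayed_epoch_after_cs):
    """Mirrors the arithmetic quoted in the issue."""
    return max(2, NUM_SCALES - delayed_epoch_after_cs - 2)


def channel_scale_start_clamped(delayed_epoch_after_cs):
    """Treat epochs before channel selection starts as offset 0 and keep
    the start index within the valid range of scale indices."""
    delayed = max(0, delayed_epoch_after_cs)
    return min(max(2, NUM_SCALES - delayed - 2), NUM_SCALES - 1)


print(channel_scale_start_unclamped(-30))   # 38, which makes randint(38, 9) fail
start = channel_scale_start_clamped(-30)
print(start)                                # 8
print(random.randint(start, NUM_SCALES - 1) in (8, 9))   # True
```

Whether the start index should be 8 or simply the maximum scale before channel selection begins depends on the intended behaviour, which only the maintainers can confirm.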

supernet training

Training an updated version of the supernet, resulting in the following error:

File "train_imagenet.py", line 512, in <module>
    main()
  File "train_imagenet.py", line 508, in main
    train(context)
  File "train_imagenet.py", line 393, in train
    trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)
  File "/usr/local/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/trainer.py", line 100, in __init__
    self._contexts = self._check_contexts()
  File "/usr/local/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/trainer.py", line 113, in _check_contexts
    ctx = param.list_ctx()
  File "/usr/local/lib/python3.6/site-packages/mxnet-1.5.0-py3.6.egg/mxnet/gluon/parameter.py", line 539, in list_ctx
    raise RuntimeError("Parameter '%s' has not been initialized"%self.name)
RuntimeError: Parameter 'shufflenasoneshot0_features_fc_weight' has not been initialized

the loss of supernet doesn't seem to converge

Thanks for your amazing work!
I ran sh ./train_supernet.sh, but the loss does not seem to converge.

Namespace(batch_norm=False, batch_size=128, crop_ratio=0.875, data_dir='~/.mxnet/datasets/imagenet', dtype='float16', hard_weight=0.5, input_size=224, label_smoothing=True, last_gamma=False, log_interval=50, logging_file='shufflenas_supernet.log', lr=0.5, lr_decay=0.1, lr_decay_epoch='40,60', lr_decay_period=0, lr_mode='cosine', mixup=False, mixup_alpha=0.2, mixup_off_epoch=0, mode='imperative', model='ShuffleNas', momentum=0.9, no_wd=True, num_epochs=120, num_gpus=4, num_workers=8, rec_train='/ImageNet/ILSVRC2012_img_train_rec/_train.rec', rec_train_idx='/ImageNet/ILSVRC2012_img_train_rec/_train.idx', rec_val='/ImageNet/ILSVRC2012_img_val_rec/_val.rec', rec_val_idx='/ImageNet/ILSVRC2012_img_val_rec/_val.idx', resume_epoch=0, resume_params='', resume_states='', save_dir='params_shufflenas_supernet', save_frequency=10, teacher=None, temperature=20, use_gn=False, use_pretrained=False, use_rec=True, use_se=False, warmup_epochs=10, warmup_lr=0.0, wd=4e-05)
[13:25:44] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /ImageNet/ILSVRC2012_img_train_rec/_train.rec, use 8 threads for decoding..
[13:25:51] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /ImageNet/ILSVRC2012_img_val_rec/_val.rec, use 8 threads for decoding..
Epoch[0] Batch [49] Speed: 535.322791 samples/sec accuracy=0.001016 lr=0.000999
Epoch[0] Batch [99] Speed: 1081.946811 samples/sec accuracy=0.001133 lr=0.001998
Epoch[0] Batch [149] Speed: 954.276519 samples/sec accuracy=0.001055 lr=0.002998
Epoch[0] Batch [199] Speed: 838.546627 samples/sec accuracy=0.001055 lr=0.003997
Epoch[0] Batch [249] Speed: 871.972491 samples/sec accuracy=0.001133 lr=0.004996
Epoch[0] Batch [299] Speed: 887.530725 samples/sec accuracy=0.001152 lr=0.005995
Epoch[0] Batch [349] Speed: 843.583828 samples/sec accuracy=0.001122 lr=0.006995
Epoch[0] Batch [399] Speed: 849.005440 samples/sec accuracy=0.001108 lr=0.007994
Epoch[0] Batch [449] Speed: 893.221403 samples/sec accuracy=0.001111 lr=0.008993
Epoch[0] Batch [499] Speed: 852.374898 samples/sec accuracy=0.001070 lr=0.009992
Epoch[0] Batch [549] Speed: 887.384738 samples/sec accuracy=0.001048 lr=0.010992
Epoch[0] Batch [599] Speed: 847.726991 samples/sec accuracy=0.001025 lr=0.011991
Epoch[0] Batch [649] Speed: 872.797215 samples/sec accuracy=0.001043 lr=0.012990
Epoch[0] Batch [699] Speed: 876.450059 samples/sec accuracy=0.001024 lr=0.013989
Epoch[0] Batch [749] Speed: 846.336330 samples/sec accuracy=0.001021 lr=0.014989
Epoch[0] Batch [799] Speed: 867.812489 samples/sec accuracy=0.001033 lr=0.015988
Epoch[0] Batch [849] Speed: 873.677731 samples/sec accuracy=0.001023 lr=0.016987
Epoch[0] Batch [899] Speed: 874.266688 samples/sec accuracy=0.001039 lr=0.017986
Epoch[0] Batch [949] Speed: 850.073132 samples/sec accuracy=0.001044 lr=0.018986
Epoch[0] Batch [999] Speed: 869.275106 samples/sec accuracy=0.001055 lr=0.019985
Epoch[0] Batch [1049] Speed: 896.434749 samples/sec accuracy=0.001038 lr=0.020984
Epoch[0] Batch [1099] Speed: 848.964063 samples/sec accuracy=0.001046 lr=0.021983
Epoch[0] Batch [1149] Speed: 865.988757 samples/sec accuracy=0.001039 lr=0.022983
Epoch[0] Batch [1199] Speed: 885.459825 samples/sec accuracy=0.001047 lr=0.023982
Epoch[0] Batch [1249] Speed: 862.004503 samples/sec accuracy=0.001044 lr=0.024981
Epoch[0] Batch [1299] Speed: 853.095485 samples/sec accuracy=0.001043 lr=0.025980
Epoch[0] Batch [1349] Speed: 871.394519 samples/sec accuracy=0.001039 lr=0.026979
Epoch[0] Batch [1399] Speed: 860.783110 samples/sec accuracy=0.001048 lr=0.027979
Epoch[0] Batch [1449] Speed: 890.852874 samples/sec accuracy=0.001026 lr=0.028978
Epoch[0] Batch [1499] Speed: 860.505098 samples/sec accuracy=0.001012 lr=0.029977
Epoch[0] Batch [1549] Speed: 854.555221 samples/sec accuracy=0.001006 lr=0.030976
Epoch[0] Batch [1599] Speed: 885.776952 samples/sec accuracy=0.001003 lr=0.031976
Epoch[0] Batch [1649] Speed: 879.426519 samples/sec accuracy=0.001006 lr=0.032975
Epoch[0] Batch [1699] Speed: 906.833596 samples/sec accuracy=0.001006 lr=0.033974
Epoch[0] Batch [1749] Speed: 888.854880 samples/sec accuracy=0.001010 lr=0.034973
Epoch[0] Batch [1799] Speed: 927.614338 samples/sec accuracy=0.001007 lr=0.035973
Epoch[0] Batch [1849] Speed: 862.451794 samples/sec accuracy=0.001011 lr=0.036972
Epoch[0] Batch [1899] Speed: 871.394123 samples/sec accuracy=0.001008 lr=0.037971
Epoch[0] Batch [1949] Speed: 877.935443 samples/sec accuracy=0.001009 lr=0.038970
Epoch[0] Batch [1999] Speed: 855.234498 samples/sec accuracy=0.001013 lr=0.039970
Epoch[0] Batch [2049] Speed: 863.621976 samples/sec accuracy=0.001011 lr=0.040969
Epoch[0] Batch [2099] Speed: 879.784556 samples/sec accuracy=0.001004 lr=0.041968
Epoch[0] Batch [2149] Speed: 876.706042 samples/sec accuracy=0.001001 lr=0.042967
Epoch[0] Batch [2199] Speed: 857.379596 samples/sec accuracy=0.000998 lr=0.043967
Epoch[0] Batch [2249] Speed: 870.759792 samples/sec accuracy=0.001001 lr=0.044966
Epoch[0] Batch [2299] Speed: 931.048332 samples/sec accuracy=0.001001 lr=0.045965
Epoch[0] Batch [2349] Speed: 837.125432 samples/sec accuracy=0.001004 lr=0.046964
Epoch[0] Batch [2399] Speed: 890.299059 samples/sec accuracy=0.001003 lr=0.047964
Epoch[0] Batch [2449] Speed: 851.878090 samples/sec accuracy=0.001001 lr=0.048963
Epoch[0] Batch [2499] Speed: 843.473559 samples/sec accuracy=0.001005 lr=0.049962
[Epoch 0] training: accuracy=0.001005
[Epoch 0] speed: 863 samples/sec time cost: 1511.254638
[Epoch 0] validation: err-top1=0.999123 err-top5=0.995117
Epoch[1] Batch [49] Speed: 887.937809 samples/sec accuracy=0.000937 lr=0.051021
Epoch[1] Batch [99] Speed: 867.829884 samples/sec accuracy=0.000879 lr=0.052020
Epoch[1] Batch [149] Speed: 885.684899 samples/sec accuracy=0.000990 lr=0.053020
Epoch[1] Batch [199] Speed: 859.224721 samples/sec accuracy=0.000986 lr=0.054019
Epoch[1] Batch [249] Speed: 931.187842 samples/sec accuracy=0.000930 lr=0.055018
Epoch[1] Batch [299] Speed: 873.437715 samples/sec accuracy=0.001048 lr=0.056017
Epoch[1] Batch [349] Speed: 907.056282 samples/sec accuracy=0.001004 lr=0.057017
Epoch[1] Batch [399] Speed: 883.788834 samples/sec accuracy=0.001001 lr=0.058016
Epoch[1] Batch [449] Speed: 872.911729 samples/sec accuracy=0.001059 lr=0.059015
Epoch[1] Batch [499] Speed: 856.328974 samples/sec accuracy=0.001066 lr=0.060014
Epoch[1] Batch [549] Speed: 880.310697 samples/sec accuracy=0.001044 lr=0.061014
Epoch[1] Batch [599] Speed: 854.885046 samples/sec accuracy=0.001051 lr=0.062013
Epoch[1] Batch [649] Speed: 873.367026 samples/sec accuracy=0.001025 lr=0.063012
Epoch[1] Batch [699] Speed: 888.783690 samples/sec accuracy=0.001032 lr=0.064011
Epoch[1] Batch [749] Speed: 883.911068 samples/sec accuracy=0.001026 lr=0.065011
Epoch[1] Batch [799] Speed: 849.367160 samples/sec accuracy=0.001023 lr=0.066010
Epoch[1] Batch [849] Speed: 893.493904 samples/sec accuracy=0.001032 lr=0.067009
Epoch[1] Batch [899] Speed: 869.416983 samples/sec accuracy=0.001031 lr=0.068008
Epoch[1] Batch [949] Speed: 889.061814 samples/sec accuracy=0.001028 lr=0.069008
Epoch[1] Batch [999] Speed: 886.601807 samples/sec accuracy=0.001016 lr=0.070007
Epoch[1] Batch [1049] Speed: 861.480139 samples/sec accuracy=0.001017 lr=0.071006
Epoch[1] Batch [1099] Speed: 847.957797 samples/sec
accuracy=0.001019 lr=0.072005 Epoch[1] Batch [1149] Speed: 871.888190 samples/sec accuracy=0.001009 lr=0.073005 Epoch[1] Batch [1199] Speed: 854.231719 samples/sec accuracy=0.001011 lr=0.074004 Epoch[1] Batch [1249] Speed: 860.546974 samples/sec accuracy=0.001020 lr=0.075003 Epoch[1] Batch [1299] Speed: 892.008358 samples/sec accuracy=0.001022 lr=0.076002 Epoch[1] Batch [1349] Speed: 860.516429 samples/sec accuracy=0.001021 lr=0.077001 Epoch[1] Batch [1399] Speed: 870.058043 samples/sec accuracy=0.001014 lr=0.078001 Epoch[1] Batch [1449] Speed: 874.015535 samples/sec accuracy=0.001009 lr=0.079000 Epoch[1] Batch [1499] Speed: 877.646925 samples/sec accuracy=0.001009 lr=0.079999 Epoch[1] Batch [1549] Speed: 881.612005 samples/sec accuracy=0.001014 lr=0.080998 Epoch[1] Batch [1599] Speed: 923.026051 samples/sec accuracy=0.001017 lr=0.081998 Epoch[1] Batch [1649] Speed: 922.794219 samples/sec accuracy=0.001016 lr=0.082997 Epoch[1] Batch [1699] Speed: 840.872535 samples/sec accuracy=0.001014 lr=0.083996 Epoch[1] Batch [1749] Speed: 882.178377 samples/sec accuracy=0.001013 lr=0.084995 Epoch[1] Batch [1799] Speed: 851.254716 samples/sec accuracy=0.001011 lr=0.085995 Epoch[1] Batch [1849] Speed: 854.998162 samples/sec accuracy=0.001005 lr=0.086994 Epoch[1] Batch [1899] Speed: 869.947032 samples/sec accuracy=0.001006 lr=0.087993 Epoch[1] Batch [1949] Speed: 878.984794 samples/sec accuracy=0.001003 lr=0.088992 Epoch[1] Batch [1999] Speed: 894.164795 samples/sec accuracy=0.001009 lr=0.089992 Epoch[1] Batch [2049] Speed: 858.502574 samples/sec accuracy=0.000998 lr=0.090991 Epoch[1] Batch [2099] Speed: 873.440842 samples/sec accuracy=0.000996 lr=0.091990 Epoch[1] Batch [2149] Speed: 867.112840 samples/sec accuracy=0.000997 lr=0.092989 Epoch[1] Batch [2199] Speed: 862.130601 samples/sec accuracy=0.000995 lr=0.093989 Epoch[1] Batch [2249] Speed: 876.746889 samples/sec accuracy=0.000990 lr=0.094988 Epoch[1] Batch [2299] Speed: 922.624034 samples/sec accuracy=0.000988 lr=0.095987 
Epoch[1] Batch [2349] Speed: 922.009291 samples/sec accuracy=0.000986 lr=0.096986 Epoch[1] Batch [2399] Speed: 849.558231 samples/sec accuracy=0.000989 lr=0.097986 Epoch[1] Batch [2449] Speed: 876.022224 samples/sec accuracy=0.000996 lr=0.098985 Epoch[1] Batch [2499] Speed: 871.302722 samples/sec accuracy=0.000991 lr=0.099984 [Epoch 1] training: accuracy=0.000991 [Epoch 1] speed: 876 samples/sec time cost: 1490.117548 [Epoch 1] validation: err-top1=0.998864 err-top5=0.994858 Epoch[2] Batch [49] Speed: 875.446324 samples/sec accuracy=0.001289 lr=0.101023 Epoch[2] Batch [99] Speed: 880.258945 samples/sec accuracy=0.001270 lr=0.102022 Epoch[2] Batch [149] Speed: 890.495640 samples/sec accuracy=0.001185 lr=0.103022 Epoch[2] Batch [199] Speed: 886.474817 samples/sec accuracy=0.001182 lr=0.104021 Epoch[2] Batch [249] Speed: 844.365761 samples/sec accuracy=0.001219 lr=0.105020 Epoch[2] Batch [299] Speed: 860.068421 samples/sec accuracy=0.001198 lr=0.106019 Epoch[2] Batch [349] Speed: 886.609999 samples/sec accuracy=0.001166 lr=0.107019 Epoch[2] Batch [399] Speed: 856.511474 samples/sec accuracy=0.001138 lr=0.108018 Epoch[2] Batch [449] Speed: 889.596902 samples/sec accuracy=0.001159 lr=0.109017 Epoch[2] Batch [499] Speed: 857.013210 samples/sec accuracy=0.001133 lr=0.110016 Epoch[2] Batch [549] Speed: 846.386632 samples/sec accuracy=0.001108 lr=0.111016 Epoch[2] Batch [599] Speed: 867.927046 samples/sec accuracy=0.001117 lr=0.112015 Epoch[2] Batch [649] Speed: 873.530772 samples/sec accuracy=0.001106 lr=0.113014 Epoch[2] Batch [699] Speed: 899.242059 samples/sec accuracy=0.001113 lr=0.114013 Epoch[2] Batch [749] Speed: 897.646130 samples/sec accuracy=0.001102 lr=0.115013 Epoch[2] Batch [799] Speed: 863.833289 samples/sec accuracy=0.001079 lr=0.116012 Epoch[2] Batch [849] Speed: 850.293461 samples/sec accuracy=0.001071 lr=0.117011 Epoch[2] Batch [899] Speed: 877.293435 samples/sec accuracy=0.001059 lr=0.118010 Epoch[2] Batch [949] Speed: 853.094725 samples/sec 
accuracy=0.001055 lr=0.119010 Epoch[2] Batch [999] Speed: 877.785427 samples/sec accuracy=0.001043 lr=0.120009 Epoch[2] Batch [1049] Speed: 865.957000 samples/sec accuracy=0.001047 lr=0.121008 Epoch[2] Batch [1099] Speed: 881.378680 samples/sec accuracy=0.001060 lr=0.122007 Epoch[2] Batch [1149] Speed: 869.027128 samples/sec accuracy=0.001060 lr=0.123007 Epoch[2] Batch [1199] Speed: 885.561954 samples/sec accuracy=0.001048 lr=0.124006 Epoch[2] Batch [1249] Speed: 850.997669 samples/sec accuracy=0.001055 lr=0.125005 Epoch[2] Batch [1299] Speed: 901.019216 samples/sec accuracy=0.001050 lr=0.126004 Epoch[2] Batch [1349] Speed: 861.174421 samples/sec accuracy=0.001053 lr=0.127003 Epoch[2] Batch [1399] Speed: 866.571390 samples/sec accuracy=0.001059 lr=0.128003 Epoch[2] Batch [1449] Speed: 882.264165 samples/sec accuracy=0.001057 lr=0.129002 Epoch[2] Batch [1499] Speed: 826.516384 samples/sec accuracy=0.001046 lr=0.130001 Epoch[2] Batch [1549] Speed: 879.534336 samples/sec accuracy=0.001045 lr=0.131000 Epoch[2] Batch [1599] Speed: 877.408366 samples/sec accuracy=0.001038 lr=0.132000 Epoch[2] Batch [1649] Speed: 880.151845 samples/sec accuracy=0.001036 lr=0.132999 Epoch[2] Batch [1699] Speed: 869.908042 samples/sec accuracy=0.001040 lr=0.133998 Epoch[2] Batch [1749] Speed: 867.162827 samples/sec accuracy=0.001044 lr=0.134997 Epoch[2] Batch [1799] Speed: 885.801702 samples/sec accuracy=0.001041 lr=0.135997 Epoch[2] Batch [1849] Speed: 867.087289 samples/sec accuracy=0.001034 lr=0.136996 Epoch[2] Batch [1899] Speed: 890.086716 samples/sec accuracy=0.001022 lr=0.137995 Epoch[2] Batch [1949] Speed: 873.000224 samples/sec accuracy=0.001017 lr=0.138994 Epoch[2] Batch [1999] Speed: 853.781716 samples/sec accuracy=0.001013 lr=0.139994 Epoch[2] Batch [2049] Speed: 856.835816 samples/sec accuracy=0.001009 lr=0.140993 Epoch[2] Batch [2099] Speed: 860.044303 samples/sec accuracy=0.001009 lr=0.141992 Epoch[2] Batch [2149] Speed: 876.481051 samples/sec accuracy=0.001009 lr=0.142991 
Epoch[2] Batch [2199] Speed: 892.382938 samples/sec accuracy=0.001002 lr=0.143991 Epoch[2] Batch [2249] Speed: 855.905974 samples/sec accuracy=0.001003 lr=0.144990 Epoch[2] Batch [2299] Speed: 889.843663 samples/sec accuracy=0.001003 lr=0.145989 Epoch[2] Batch [2349] Speed: 863.953235 samples/sec accuracy=0.000997 lr=0.146988 Epoch[2] Batch [2399] Speed: 870.380783 samples/sec accuracy=0.000997 lr=0.147988 Epoch[2] Batch [2449] Speed: 903.148794 samples/sec accuracy=0.001000 lr=0.148987 Epoch[2] Batch [2499] Speed: 916.964349 samples/sec accuracy=0.000999 lr=0.149986 [Epoch 2] training: accuracy=0.000998 [Epoch 2] speed: 872 samples/sec time cost: 1495.993527 [Epoch 2] validation: err-top1=0.998953 err-top5=0.994644 Epoch[3] Batch [49] Speed: 929.914738 samples/sec accuracy=0.000937 lr=0.151025 Epoch[3] Batch [99] Speed: 906.610637 samples/sec accuracy=0.000937 lr=0.152024 Epoch[3] Batch [149] Speed: 919.249578 samples/sec accuracy=0.001042 lr=0.153024 Epoch[3] Batch [199] Speed: 879.526642 samples/sec accuracy=0.001104 lr=0.154023 Epoch[3] Batch [249] Speed: 854.386749 samples/sec accuracy=0.001047 lr=0.155022 Epoch[3] Batch [299] Speed: 924.763784 samples/sec accuracy=0.001100 lr=0.156021 Epoch[3] Batch [349] Speed: 873.171714 samples/sec accuracy=0.001027 lr=0.157021 Epoch[3] Batch [399] Speed: 895.651351 samples/sec accuracy=0.001050 lr=0.158020 Epoch[3] Batch [449] Speed: 900.585902 samples/sec accuracy=0.001042 lr=0.159019 Epoch[3] Batch [499] Speed: 860.875526 samples/sec accuracy=0.001027 lr=0.160018 Epoch[3] Batch [549] Speed: 887.030089 samples/sec accuracy=0.001023 lr=0.161018 Epoch[3] Batch [599] Speed: 843.265624 samples/sec accuracy=0.001012 lr=0.162017 Epoch[3] Batch [649] Speed: 860.336493 samples/sec accuracy=0.001004 lr=0.163016 Epoch[3] Batch [699] Speed: 862.636338 samples/sec accuracy=0.000974 lr=0.164015 Epoch[3] Batch [749] Speed: 893.532553 samples/sec accuracy=0.001000 lr=0.165015 Epoch[3] Batch [799] Speed: 851.940280 samples/sec 
accuracy=0.000999 lr=0.166014 Epoch[3] Batch [849] Speed: 882.564236 samples/sec accuracy=0.000988 lr=0.167013 Epoch[3] Batch [899] Speed: 868.853416 samples/sec accuracy=0.000994 lr=0.168012 Epoch[3] Batch [949] Speed: 874.946556 samples/sec accuracy=0.001003 lr=0.169012 Epoch[3] Batch [999] Speed: 879.573134 samples/sec accuracy=0.000986 lr=0.170011 Epoch[3] Batch [1049] Speed: 897.879537 samples/sec accuracy=0.000962 lr=0.171010 Epoch[3] Batch [1099] Speed: 871.219244 samples/sec accuracy=0.000977 lr=0.172009 Epoch[3] Batch [1149] Speed: 844.503256 samples/sec accuracy=0.000978 lr=0.173009 Epoch[3] Batch [1199] Speed: 864.961451 samples/sec accuracy=0.000981 lr=0.174008 Epoch[3] Batch [1249] Speed: 854.099408 samples/sec accuracy=0.000980 lr=0.175007 Epoch[3] Batch [1299] Speed: 883.432764 samples/sec accuracy=0.000969 lr=0.176006 Epoch[3] Batch [1349] Speed: 869.831730 samples/sec accuracy=0.000969 lr=0.177005 Epoch[3] Batch [1399] Speed: 862.704275 samples/sec accuracy=0.000975 lr=0.178005 Epoch[3] Batch [1449] Speed: 883.263738 samples/sec accuracy=0.000986 lr=0.179004 Epoch[3] Batch [1499] Speed: 854.746531 samples/sec accuracy=0.000975 lr=0.180003 Epoch[3] Batch [1549] Speed: 872.240603 samples/sec accuracy=0.000964 lr=0.181002 Epoch[3] Batch [1599] Speed: 894.921970 samples/sec accuracy=0.000973 lr=0.182002 Epoch[3] Batch [1649] Speed: 882.197715 samples/sec accuracy=0.000975 lr=0.183001 Epoch[3] Batch [1699] Speed: 852.241153 samples/sec accuracy=0.000977 lr=0.184000 Epoch[3] Batch [1749] Speed: 846.303891 samples/sec accuracy=0.000977 lr=0.184999 Epoch[3] Batch [1799] Speed: 869.288118 samples/sec accuracy=0.000967 lr=0.185999 Epoch[3] Batch [1849] Speed: 878.585063 samples/sec accuracy=0.000958 lr=0.186998 Epoch[3] Batch [1899] Speed: 912.308064 samples/sec accuracy=0.000959 lr=0.187997 Epoch[3] Batch [1949] Speed: 880.248265 samples/sec accuracy=0.000957 lr=0.188996 Epoch[3] Batch [1999] Speed: 869.212224 samples/sec accuracy=0.000963 lr=0.189996 
Epoch[3] Batch [2049] Speed: 895.356755 samples/sec accuracy=0.000976 lr=0.190995 Epoch[3] Batch [2099] Speed: 906.813738 samples/sec accuracy=0.000965 lr=0.191994 Epoch[3] Batch [2149] Speed: 878.990076 samples/sec accuracy=0.000966 lr=0.192993

Export ONNX model

I am trying to export the MXNet model to ONNX by adding the following at the end of the script Single-Path-One-Shot-NAS-MXNet/oneshot_nas_network.py:

import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# Symbol and parameter files exported from the fixed-architecture network
sym_file_name = "./symbols/ShuffleNas_fixArch-symbol.json"
param_file_name = "./symbols/ShuffleNas_fixArch-0000.params"
onnx_file_name = "./supernet_random.onnx"
input_shape = (1, 3, 224, 224)
converted_model_path = onnx_mxnet.export_model(sym_file_name, param_file_name, [input_shape], np.float32, onnx_file_name, verbose=True)


I am getting the following error:
AttributeError: ('Reshape: Shape value not supported in ONNX', -4)

The output_shape_list is [0, -4, 2, -1, -2], and the MXNet ONNX exporter does not support the following values:
not_supported_shape = [-2, -3, -4]

What am I doing wrong? Or is ONNX export not supported for this model?
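For reference, the special values in that shape list are MXNet reshape codes: 0 copies an input dimension, -1 infers one, -2 copies all remaining dimensions, -3 merges two consecutive dimensions, and -4 splits one dimension into the two values that follow. The sketch below is a simplified pure-Python reimplementation of those rules, not the exporter's code and not part of this repo. It only shows which concrete shape [0, -4, 2, -1, -2] denotes for a fixed input. The helper name resolve_mxnet_reshape and the 116-channel example input are hypothetical.

```python
from math import prod


def resolve_mxnet_reshape(in_shape, spec):
    """Resolve MXNet reshape codes (0, -1, -2, -3, -4) to a concrete shape.

    Simplified sketch for illustration; the helper name is hypothetical.
    """
    out = []
    src = 0          # index into in_shape
    infer_at = None  # position of a -1 to be inferred at the end
    i = 0            # index into spec
    while i < len(spec):
        code = spec[i]
        if code == 0:        # copy this dimension from the input
            out.append(in_shape[src])
            src += 1
        elif code == -1:     # infer this dimension from the remaining size
            if infer_at is not None:
                raise ValueError("only one dimension can be inferred")
            infer_at = len(out)
            out.append(1)    # placeholder, filled in below
            src += 1
        elif code == -2:     # copy all remaining input dimensions
            out.extend(in_shape[src:])
            src = len(in_shape)
        elif code == -3:     # merge two consecutive input dimensions
            out.append(in_shape[src] * in_shape[src + 1])
            src += 2
        elif code == -4:     # split one input dimension into two
            d0 = in_shape[src]
            src += 1
            d1, d2 = spec[i + 1], spec[i + 2]
            i += 2
            if d1 == -1 and d2 == -1:
                raise ValueError("split dims cannot both be -1")
            if d1 == -1:
                d1 = d0 // d2
            if d2 == -1:
                d2 = d0 // d1
            out.extend([d1, d2])
        else:                # explicit positive dimension
            out.append(code)
            src += 1
        i += 1
    if infer_at is not None:
        out[infer_at] = prod(in_shape) // prod(out)
    return tuple(out)


# Hypothetical feature map with 116 channels (assumed for illustration only).
print(resolve_mxnet_reshape((1, 116, 28, 28), (0, -4, 2, -1, -2)))
# (1, 2, 58, 28, 28)
```

If the input shape is fixed at export time, one possible workaround (an assumption, untested against this model) is to write the reshape with explicit dimensions like these instead of the -2 and -4 codes.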

Supernet training is too slow

Thanks for your MXNet implementation of SPOS ^_^. However, I found supernet training to be too slow when I trained my own network. I profiled the training procedure and found the following problems.

First, imperative mode is much slower than hybrid mode. I then tried training with more GPUs but got no speedup. Instead, GPU utilization dropped dramatically as the number of GPUs increased. My guess is that in imperative mode the computation on different GPUs runs serially rather than in parallel. Have you encountered these problems?

Furthermore, can anything be done to accelerate training? Could we use imperative mode when sampling a subnet and then switch to hybrid mode when training it?

Waiting for your reply!
