khanrc / pt.darts


PyTorch Implementation of DARTS: Differentiable Architecture Search

License: MIT License

Python 99.75% Dockerfile 0.25%


pt.darts's Issues

low test accuracy in FashionMNIST

Thanks for your work. I used your repo to train on the FashionMNIST dataset, with the Genotype you provided:

# FashionMNIST
Genotype(
    normal=[[('max_pool_3x3', 0), ('dil_conv_5x5', 1)], [('max_pool_3x3', 0), ('sep_conv_3x3', 1)], [('sep_conv_5x5', 1), ('sep_conv_3x3', 3)], [('sep_conv_5x5', 4), ('dil_conv_5x5', 3)]],
    normal_concat=range(2, 6),
    reduce=[[('sep_conv_3x3', 1), ('avg_pool_3x3', 0)], [('avg_pool_3x3', 0), ('skip_connect', 2)], [('skip_connect', 3), ('avg_pool_3x3', 0)], [('sep_conv_3x3', 2), ('skip_connect', 3)]],
    reduce_concat=range(2, 6)
)

After 300 epochs, it only reaches Final best Prec@1 = 95.9000%, which is much lower than what you reported.
Here are my config file and log file. Is anything wrong? Could you provide your FashionMNIST log file?

batch_size issue

I ran python search.py --name cifar10 --dataset cifar10 --batch_size 32, but got the following error in utils.py. Why?

Traceback (most recent call last):
  File "/home/phung/Downloads/darts/pt.darts/search.py", line 201, in <module>
    main()
  File "/home/phung/Downloads/darts/pt.darts/search.py", line 84, in main
    train(train_loader, valid_loader, model, architect, w_optim, alpha_optim, lr, epoch)
  File "/home/phung/Downloads/darts/pt.darts/search.py", line 144, in train
    prec1, prec5 = utils.accuracy(logits, trn_y, topk=(1, 5))
  File "/home/phung/Downloads/darts/pt.darts/utils.py", line 103, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
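
A likely cause, offered as an assumption rather than a confirmed diagnosis: recent PyTorch versions keep the transposed (non-contiguous) layout in `correct`, so `.view(-1)` on the top-5 slice fails regardless of the batch size. The sketch below is a self-contained top-k accuracy helper that uses `.reshape(-1)`, as the error message suggests. The function body and its fraction-valued return convention are illustrative, not the repository's exact code.

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Return the top-k precision (a fraction in [0, 1]) for each k in topk."""
    maxk = max(topk)
    batch_size = target.size(0)

    # Indices of the maxk largest logits per sample, shape (batch, maxk).
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    # Transposing gives shape (maxk, batch) and a non-contiguous layout.
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape copies when it has to, so it also accepts non-contiguous slices.
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.item() / batch_size)
    return res


# Tiny two-sample, three-class batch for illustration.
logits = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]])
labels = torch.tensor([1, 2])
top1, top2 = accuracy(logits, labels, topk=(1, 2))
```

Replacing only `.view(-1)` with `.reshape(-1)` on the failing line should be enough; the rest of the helper is shown for context.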

A parenthesis seems to be missing in the compute_hessian function: (p-n) / 2.*eps

The last line of the compute_hessian function:
hessian = [(p-n) / 2.*eps for p, n in zip(dalpha_pos, dalpha_neg)]
should be changed to:
hessian = [(p-n) / (2.*eps) for p, n in zip(dalpha_pos, dalpha_neg)]
according to equation 8 of the paper.
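
The difference between the two expressions comes down to operator precedence: in Python, `/` and `*` have equal precedence and associate left to right, so `x / 2.*eps` means `(x / 2.) * eps`. A minimal sketch with made-up scalar values (`p`, `n` and `eps` here are illustrative numbers, not values from a real run):

```python
eps = 0.01
p, n = 1.03, 0.97  # hypothetical gradient entries at w+ and w-

# Parsed as ((p - n) / 2.) * eps: the difference gets multiplied by eps.
without_parens = (p - n) / 2. * eps

# Central difference: the difference is divided by 2 * eps.
with_parens = (p - n) / (2. * eps)

# The two results differ by a factor of 1 / eps**2.
ratio = with_parens / without_parens
print(without_parens, with_parens, ratio)
```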

Where do I control the connections between nodes?

If I'm not interested in fully connected networks, which file do I modify to control the connections between nodes?

Missing Softmax for Genotype selection

Hi @khanrc,

First of all, thanks for the recent and readable implementation of DARTS. I really enjoyed using this code.

After some exploration and alteration of the code, I stumbled over the implementation of the parse(alpha, k, reduction=True) function, which selects the most heavily weighted operations. Wondering why the "none" operation is omitted, I compared the code to the original implementation. The function there compares the softmaxed weights rather than the raw ones.
https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L154

This seems to be quite crucial, since the "none" operation then gets more weight. (For further explanation see this comment in the OpenReview discussion of the paper.)

Maybe you need to compute the row-wise softmax of the alpha values before calling the function, or compute it inside the function before taking the top-k. Please correct me if I'm wrong.

Thanks in advance
Lucas

Feature map size reduction in the FactorizedReduce function (op.py)

Hi all,
Thank you very much for this repo. I have a question: in op.py line 178, in the FactorizedReduce function, why does the conv2 layer take an input that is cropped along dims 2 and 3? In my case this causes a dimension mismatch.
out = torch.cat([self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1)
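
One plausible explanation, stated as an assumption about the layer's settings: if both convolutions are 1x1 with stride 2 and no padding, conv1 samples the even pixel positions and conv2, applied to the input shifted by one pixel, samples the odd ones. The two outputs have the same spatial size only when the input height and width are even. The sketch below only does the size arithmetic; `conv_out_size` and `factorized_reduce_sizes` are hypothetical helper names.

```python
def conv_out_size(n, kernel=1, stride=2, padding=0):
    """Standard convolution output-size formula for one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def factorized_reduce_sizes(n):
    """Output sizes of conv1(x) and conv2(x[:, :, 1:, 1:]) for an n x n input."""
    return conv_out_size(n), conv_out_size(n - 1)


for n in (32, 16, 7):
    a, b = factorized_reduce_sizes(n)
    status = "sizes match" if a == b else "mismatch, torch.cat would fail"
    print(n, a, b, status)
```

If the mismatch comes from an odd input size, padding or resizing the input to an even size is one way around it.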

Search Not Working?

Hi @khanrc, Thanks for releasing the code. I am trying to search a cell using DARTS as mentioned in the README. However, the search is not working. It becomes stuck at the very beginning (training seems to be not working). There is no GPU utilization either. Am I missing something here? Can you please comment on this? Thanks.

Validation loss does not update

During training, loss and prec1 improve normally, but during validation the loss stays at a constant value and prec1 stays at zero. Have you met this problem?

AttributeError: 'CIFAR10' object has no attribute 'train_data'
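
This error typically appears with newer torchvision releases, where the CIFAR datasets expose `data` and `targets` in place of `train_data` and `train_labels`. A minimal compatibility sketch follows; `get_images_and_labels` and the two stand-in classes are hypothetical names used only to keep the example self-contained.

```python
def get_images_and_labels(dataset):
    """Return (images, labels) from a dataset object with either attribute naming."""
    if hasattr(dataset, "data"):
        # Newer attribute names.
        return dataset.data, dataset.targets
    # Older attribute names.
    return dataset.train_data, dataset.train_labels


class NewStyleDataset:
    """Stand-in for a dataset exposing data / targets."""
    data = [[0, 1], [2, 3]]
    targets = [7, 9]


class OldStyleDataset:
    """Stand-in for a dataset exposing train_data / train_labels."""
    train_data = [[0, 1], [2, 3]]
    train_labels = [7, 9]


new_images, new_labels = get_images_and_labels(NewStyleDataset())
old_images, old_labels = get_images_and_labels(OldStyleDataset())
```

In the repository itself, replacing the `train_data` access with `data` at the line named in the traceback is probably the smallest fix.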

Cannot obtain the reported cell architecture

Training with the default parameters, I cannot obtain the reported cell architecture.

Running via Google Colab

Hi, I ran into a problem while running on Google Colab.

How to run the Augment

Hi, after running search.py, how do I run augment.py? I mean, what should the genotype be?
Thanks for your help.
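
The README command quoted elsewhere on this page passes the genotype to augment.py as a quoted string in the form Genotype(normal=..., normal_concat=..., reduce=..., reduce_concat=...). The sketch below shows how such a string can be turned back into a structured object. The `Genotype` namedtuple and `genotype_from_str` are illustrative definitions, not necessarily the repository's own, and the string is shortened to one node per cell for brevity.

```python
from collections import namedtuple

# Field names follow the genotype strings shown on this page.
Genotype = namedtuple("Genotype", "normal normal_concat reduce reduce_concat")


def genotype_from_str(text):
    """Parse a genotype string; use eval only on strings you trust."""
    return eval(text, {"Genotype": Genotype, "range": range})


# Shortened example string, illustrative only.
text = (
    "Genotype(normal=[[('sep_conv_3x3', 0), ('dil_conv_5x5', 1)]], "
    "normal_concat=range(2, 6), "
    "reduce=[[('max_pool_3x3', 0), ('max_pool_3x3', 1)]], "
    "reduce_concat=range(2, 6))"
)
genotype = genotype_from_str(text)
print(genotype.normal[0])
```

In practice the string to pass is the genotype produced by the search run, wrapped in double quotes on the command line.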

Why using broadcast for edge weights?

Hi~ Thanks for providing this great implementation!
I'm quite interested in the multi-GPU part, which uses replicate for the network weights and broadcast for the edge weights. I'm not familiar with parallel programming in PyTorch, and I'm curious about the difference between broadcast and replicate. Could you explain why broadcast should be used for the edge weights?

what does the "auxiliary_head" mean?

I don't know what it means, and I don't see any decrease in any of the evaluation results.

how to use the visualize.py

Can I just use visualize.py to draw the cell?

Search runs only for 1 iteration

Hi!

I ran your code with
python search.py --name cifar10 --dataset cifar10 --batch_size 16 --gpu 2

and the progress has not gone past this line for several hours
09/10 11:26:50 AM | Train: [ 1/50] Step 000/781 Loss 2.312 Prec@(1,5) (15.6%, 56.2%)

Can you please check where the problem could be?

How to train on my own dataset

If I use this method to train on my own dataset, how do I change the code?

Hessian computation issue (seems like there is a bug)

The original implementation of DARTS divides (positive - negative) by (2 * eps):

return [(x-y).div_(2*R) for x, y in zip(grads_p, grads_n)]

In the pt.darts implementation, the parentheses around (2 * eps) seem to be omitted. This leads to wrong Hessian values, because the final expression is evaluated as eps * (positive - negative) / 2:

pt.darts/architect.py

Line 108 in 48e7137

hessian = [(p-n) / 2.*eps for p, n in zip(dalpha_pos, dalpha_neg)]

Shouldn't the expression be (2 * eps) instead of 2 * eps?
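
To see why the divisor has to be 2 * eps, here is a toy central-difference check. It assumes a scalar loss L(w, alpha) = alpha * w**2 chosen only because its derivatives are known in closed form; `grad_alpha` and `hessian_vector_product` are hypothetical names, not functions from the repository.

```python
def grad_alpha(w):
    """dL/dalpha for the toy loss L(w, alpha) = alpha * w**2."""
    return w ** 2


def hessian_vector_product(w, v, eps):
    """Central-difference estimate of d/dw(dL/dalpha) times v."""
    pos = grad_alpha(w + eps * v)
    neg = grad_alpha(w - eps * v)
    return (pos - neg) / (2.0 * eps)


w, v, eps = 3.0, 0.5, 0.01
approx = hessian_vector_product(w, v, eps)
exact = 2.0 * w * v  # analytic value: d/dw(w**2) * v
print(approx, exact)
```

Without the parentheses, the same difference would be multiplied by eps and would no longer approximate the analytic value.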

Has anyone else run into a memory leak? I trained this code on my own dataset and found that memory usage keeps rising during training, and I have no idea how to solve it.

Why is the search accuracy on the CIFAR-10 dataset only 89% accuracy?

Thank you for providing very good code. I ran search.py on multiple GPUs, and the final accuracy was only 89%. Why? Do I need to go on and run augment.py to get 97% accuracy? Thank you.

Run augment.py and out of memory after one epoch

Hello,
Thanks for your great work!
When I run augment.py, it runs out of memory after the first epoch. Is the batch size the problem?
The args are as follows:

09/11 03:17:31 AM |
09/11 03:17:31 AM | Parameters:
09/11 03:17:31 AM | AUX_WEIGHT=0.4
09/11 03:17:31 AM | BATCH_SIZE=48
09/11 03:17:31 AM | CUTOUT_LENGTH=16
09/11 03:17:31 AM | DATA_PATH=./data/
09/11 03:17:31 AM | DATASET=cifar10
09/11 03:17:31 AM | DROP_PATH_PROB=0.2
09/11 03:17:31 AM | EPOCHS=600
09/11 03:17:31 AM | GENOTYPE=Genotype(normal=[[('sep_conv_3x3', 1), ('sep_conv_3x3', 0)], [('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 1), ('sep_conv_3x3', 2)], [('sep_conv_3x3', 0), ('sep_conv_3x3', 2)]], normal_concat=range(2, 6), reduce=[[('max_pool_3x3', 0), ('max_pool_3x3', 1)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('max_pool_3x3', 0), ('skip_connect', 2)]], reduce_concat=range(2, 6))
09/11 03:17:31 AM | GPUS=[0]
09/11 03:17:31 AM | GRAD_CLIP=5.0
09/11 03:17:31 AM | INIT_CHANNELS=36
09/11 03:17:31 AM | LAYERS=20
09/11 03:17:31 AM | LR=0.025
09/11 03:17:31 AM | MOMENTUM=0.9
09/11 03:17:31 AM | NAME=cifar10
09/11 03:17:31 AM | PATH=augments/cifar10
09/11 03:17:31 AM | PRINT_FREQ=200
09/11 03:17:31 AM | SEED=2
09/11 03:17:31 AM | WEIGHT_DECAY=0.0003
09/11 03:17:31 AM | WORKERS=4
09/11 03:17:31 AM |
09/11 03:17:31 AM | Logger is set - training start

AssertionError: can only join a child process

Hi @khanrc, could you please take a look? Thank you!

####### ALPHA #######
# Alpha - normal
tensor([[0.1218, 0.0975, 0.1138, 0.1351, 0.1548, 0.1235, 0.1320, 0.1215],
        [0.1139, 0.1003, 0.1164, 0.1334, 0.1465, 0.1337, 0.1122, 0.1436]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1214, 0.0977, 0.1129, 0.1615, 0.1369, 0.1158, 0.1270, 0.1268],
        [0.1156, 0.1016, 0.1173, 0.1353, 0.1403, 0.1248, 0.1239, 0.1412],
        [0.1093, 0.0945, 0.1168, 0.1412, 0.1366, 0.1320, 0.1244, 0.1452]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1237, 0.0983, 0.1121, 0.1561, 0.1244, 0.1320, 0.1224, 0.1310],
        [0.1165, 0.1013, 0.1166, 0.1399, 0.1313, 0.1222, 0.1181, 0.1541],
        [0.1115, 0.0964, 0.1188, 0.1337, 0.1248, 0.1320, 0.1303, 0.1525],
        [0.1046, 0.0929, 0.1095, 0.1312, 0.1279, 0.1366, 0.1376, 0.1597]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1203, 0.0975, 0.1106, 0.1398, 0.1291, 0.1238, 0.1283, 0.1506],
        [0.1149, 0.1007, 0.1136, 0.1293, 0.1449, 0.1309, 0.1268, 0.1389],
        [0.1023, 0.0916, 0.1123, 0.1368, 0.1317, 0.1337, 0.1295, 0.1621],
        [0.0976, 0.0884, 0.1021, 0.1404, 0.1391, 0.1330, 0.1297, 0.1698],
        [0.0925, 0.0855, 0.0926, 0.1443, 0.1380, 0.1312, 0.1340, 0.1819]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)

# Alpha - reduce
tensor([[0.1393, 0.1220, 0.1175, 0.1296, 0.1360, 0.1224, 0.1201, 0.1131],
        [0.1237, 0.1151, 0.1221, 0.1237, 0.1376, 0.1152, 0.1246, 0.1379]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1412, 0.1234, 0.1178, 0.1322, 0.1343, 0.1160, 0.1214, 0.1138],
        [0.1200, 0.1119, 0.1241, 0.1250, 0.1303, 0.1242, 0.1288, 0.1358],
        [0.1238, 0.1044, 0.1233, 0.1229, 0.1382, 0.1205, 0.1391, 0.1277]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1366, 0.1207, 0.1217, 0.1282, 0.1400, 0.1193, 0.1187, 0.1148],
        [0.1253, 0.1168, 0.1290, 0.1238, 0.1278, 0.1233, 0.1187, 0.1354],
        [0.1167, 0.1024, 0.1208, 0.1316, 0.1365, 0.1304, 0.1352, 0.1264],
        [0.1172, 0.1006, 0.1191, 0.1271, 0.1343, 0.1302, 0.1401, 0.1314]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1443, 0.1285, 0.1273, 0.1274, 0.1273, 0.1195, 0.1124, 0.1133],
        [0.1248, 0.1172, 0.1232, 0.1251, 0.1308, 0.1228, 0.1199, 0.1360],
        [0.1244, 0.1066, 0.1240, 0.1254, 0.1320, 0.1290, 0.1335, 0.1251],
        [0.1219, 0.1057, 0.1230, 0.1340, 0.1291, 0.1253, 0.1327, 0.1284],
        [0.1173, 0.1051, 0.1216, 0.1347, 0.1382, 0.1252, 0.1289, 0.1289]],
       device='cuda:3', grad_fn=<SoftmaxBackward>)
#####################
04/05 09:25:34 AM | Train: [10/50] Step 000/390 Loss 0.460 Prec@(1,5) (87.5%, 98.4%)
04/05 09:30:35 AM | Train: [10/50] Step 050/390 Loss 0.436 Prec@(1,5) (85.2%, 99.3%)
04/05 09:35:18 AM | Train: [10/50] Step 100/390 Loss 0.445 Prec@(1,5) (84.7%, 99.1%)
04/05 09:40:06 AM | Train: [10/50] Step 150/390 Loss 0.449 Prec@(1,5) (84.4%, 99.2%)
04/05 09:44:48 AM | Train: [10/50] Step 200/390 Loss 0.460 Prec@(1,5) (84.1%, 99.2%)
04/05 09:49:28 AM | Train: [10/50] Step 250/390 Loss 0.464 Prec@(1,5) (83.8%, 99.2%)
04/05 09:54:09 AM | Train: [10/50] Step 300/390 Loss 0.463 Prec@(1,5) (84.0%, 99.3%)
04/05 09:58:48 AM | Train: [10/50] Step 350/390 Loss 0.465 Prec@(1,5) (83.9%, 99.3%)
04/05 10:02:31 AM | Train: [10/50] Step 390/390 Loss 0.469 Prec@(1,5) (83.8%, 99.3%)
04/05 10:02:31 AM | Train: [10/50] Final Prec@1 83.8000%
04/05 10:02:32 AM | Valid: [10/50] Step 000/390 Loss 0.688 Prec@(1,5) (75.0%, 96.9%)
^CTraceback (most recent call last):
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
Traceback (most recent call last):
  File "/home/wangtao/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_comm.py", line 365, in _on_run
    r = self.sock.recv(1024)
    globals = debugger.run(setup['file'], None, None, is_module)
KeyboardInterrupt
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/wangtao/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/wangtao/prj/pt.darts/search.py", line 201, in <module>
    main()
  File "/home/wangtao/prj/pt.darts/search.py", line 88, in main
    top1 = validate(valid_loader, model, epoch, cur_step)
  File "/home/wangtao/prj/pt.darts/search.py", line 172, in validate
    for step, (X, y) in enumerate(valid_loader):
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 819, in __iter__
    return _DataLoaderIter(self)
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 560, in __init__
    w.start()
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
  File "/home/wangtao/.pycharm_helpers/pydev/_pydev_bundle/pydev_monkey.py", line 464, in new_fork
    _on_forked_process()
  File "/home/wangtao/.pycharm_helpers/pydev/_pydev_bundle/pydev_monkey.py", line 50, in _on_forked_process
    pydevd.settrace_forked()
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1445, in settrace_forked
    patch_multiprocessing=True,
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1210, in settrace
    patch_multiprocessing,
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1254, in _locked_settrace
    debugger.connect(host, port)  # Note: connect can raise error.
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 328, in connect
    self.initialize_network(s)
  File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 320, in initialize_network
    time.sleep(0.1)  # give threads time to start
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process

Questions about DARTS

For the DARTS complexity analysis, does anyone know how to derive the (k+1)*k/2 expression? Why 2 input nodes? How would the calculated value change if graph isomorphism were considered? Why "2+3+4+5" learnable edges? If a connection is absent, shouldn't the paper leave out the extra 1, since it does not contribute to the learnable edge configurations at all?
Why do the weights for normal cells and reduction cells need to be trained separately, as shown in Figures 4 and 5?
How should the nodes be arranged so that the NAS search actually converges with minimum error? Note: not every node is connected to every other node.
Why is GDAS 10 times faster than DARTS?
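
On the first question, one reading of the count is the following. Intermediate node k (k = 1..4) can take input from the 2 cell inputs plus the k-1 earlier intermediate nodes, which gives k+1 candidate edges and explains the 2+3+4+5 learnable edges. The derived cell keeps 2 of those k+1 edges, which gives (k+1)*k/2 choices, and each kept edge takes one of 7 non-"none" operations. The sketch below just does this arithmetic, ignoring graph isomorphism; the variable names are illustrative.

```python
n_nodes = 4  # intermediate nodes per cell
n_ops = 7    # candidate operations per kept edge, "none" excluded

# Node k sees the 2 cell inputs plus the k-1 earlier nodes: k+1 candidate edges.
learnable_edges = sum(k + 1 for k in range(1, n_nodes + 1))

# Keep 2 of the k+1 candidate edges, then pick one operation for each kept edge.
cells = 1
for k in range(1, n_nodes + 1):
    edge_pairs = (k + 1) * k // 2
    cells *= edge_pairs * n_ops ** 2

# Normal and reduction cells are searched jointly, so the counts multiply.
architectures = cells ** 2
print(learnable_edges, cells, architectures)
```

The per-cell count comes out at roughly 10^9, and the joint count for a normal plus a reduction cell at roughly 10^18.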

what's the use of augment.py

To evaluate the result of the search?

How to install graphviz?

I have installed graphviz but still get a module-not-found error.
Can anyone share how they got it working?

Suggestion for torch.backends.cudnn.benchmark = True in search process

First, thanks for your code! It is much clearer than the original code and easy to read.

In your code, you set torch.backends.cudnn.benchmark = True in search.py, but the PyTorch documentation mentions that torch.backends.cudnn.benchmark = True gives a speedup only if the graph is not modified, and otherwise reduces performance. So would it be better to set it to False? Just a suggestion. Thanks again!

nothing happens after running search.py

Hi, thanks for sharing your code!

Could you please help me fix a problem? I ran search.py and got nothing except the information shown in the picture.

meaning of Normal and Reduce?

What's the meaning and purpose of the Normal and Reduce networks?
Thanks for sharing your code!

Is there anyone who can reproduce the result ?

Hi, everyone

I ran the command from Readme.md.
python augment.py --name cifar10 --dataset cifar10 --genotype "Genotype( normal=[[('sep_conv_3x3', 0), ('dil_conv_5x5', 1)], [('skip_connect', 0), ('dil_conv_3x3', 2)], [('sep_conv_3x3', 1), ('skip_connect', 0)], [('sep_conv_3x3', 1), ('skip_connect', 0)]], normal_concat=range(2, 6), reduce=[[('max_pool_3x3', 0), ('max_pool_3x3', 1)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('skip_connect', 3), ('max_pool_3x3', 0)], [('skip_connect', 2), ('max_pool_3x3', 0)]], reduce_concat=range(2, 6))"

My environment is PyTorch 1.2 on a single RTX 2080 Ti, but 600 epochs look like they will take about 20 hours, which is a little long for me. Is there any way to accelerate convergence, such as changing the SGD optimizer to Adam?

Error in deriving the final genotype

In genotype.py (parse function), you select the edges and operations using the raw alpha values, not the alpha probabilities. Sometimes this may derive the wrong genotype. For example, suppose the alpha parameters for the second node are as follows:
alpha = [[1, 1, 1, 1, 5, 1, 1, 1], [10, 10, 10, 11, 10, 10, 10, 10], [10, 15, 10, 10, 10, 10, 10, 10]]
In your parse function, the final genotype would select the second and third edges. But larger raw values do not always mean larger probabilities. Actually, we should select the first and third edges.
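
The example can be checked with a short pure-Python sketch. It assumes that each edge is scored by its strongest operation and that the last column is the "none" operation, which is excluded from the maximum but still included in the softmax normalization. `softmax` and `top2_edges` are hypothetical helpers written for this illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of alpha values."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top2_edges(alpha, use_softmax):
    """Indices of the two edges whose strongest operation scores highest."""
    rows = [softmax(r) for r in alpha] if use_softmax else alpha
    # Last column is assumed to be "none" and is left out of the maximum.
    scores = [max(r[:-1]) for r in rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:2])


alpha = [
    [1, 1, 1, 1, 5, 1, 1, 1],
    [10, 10, 10, 11, 10, 10, 10, 10],
    [10, 15, 10, 10, 10, 10, 10, 10],
]
raw_choice = top2_edges(alpha, use_softmax=False)      # 0-based edges 1 and 2
softmax_choice = top2_edges(alpha, use_softmax=True)   # 0-based edges 0 and 2
print(raw_choice, softmax_choice)
```

With the raw values the second edge wins on magnitude alone (11 against 5), while after the row-wise softmax its best operation holds only about 28% of the row's mass, against roughly 89% for the first edge.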

Also, as mentioned in another issue, the formula of hessian should be as follows:
hessian = [(p-n) / (2.*eps) for p, n in zip(dalpha_pos, dalpha_neg)]

Why does DARTS halve the image only twice in the main part?

Augment

What is Augment used for?

Low CIFAR-100 accuracy

Has anyone run this on CIFAR-100? I modified the code just enough to download and train on it, changing nothing else, and after running augment.py on two GPUs, I got 65% top-1 and 88% top-5 accuracy on the validation set. 65% top-1 is state of the art for 2014. Current SOTA for CIFAR-100 is 91.3 for top-1 accuracy. https://benchmarks.ai/cifar-100

Here's the search command:
python search.py --name cifar100 --dataset cifar100 --gpus all --batch_size 96 --workers 8 --print_freq 10 --w_lr 0.05 --w_lr_min 0.002 --alpha_lr 0.0006

Here's the augment command:
python augment.py --name cifar100_1 --dataset cifar100 --genotype "Genotype(normal=[[('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)]], normal_concat=range(2, 6), reduce=[[('avg_pool_3x3', 0), ('avg_pool_3x3', 1)], [('skip_connect', 2), ('max_pool_3x3', 0)], [('skip_connect', 2), ('avg_pool_3x3', 0)], [('skip_connect', 2), ('avg_pool_3x3', 0)]], reduce_concat=range(2, 6)

Has anyone else tried to train CIFAR-100?

Could you please tell me how you got the random search baseline

Hi,
In Table 1 of your paper, there is a random search baseline. Are these architectures obtained by random sampling? Could you please tell me how you implemented the random sampling in your paper? Thanks!

TypeError: __init__() got an unexpected keyword argument 'log_dir'

Running the code on Google Colab, I got:

Traceback (most recent call last):
  File "search.py", line 19, in <module>
    writer = SummaryWriter(log_dir=os.path.join(config.path, "tb"))
  File "/usr/local/lib/python3.6/dist-packages/tensorboardX/writer.py", line 254, in __init__
    self._get_file_writer()
  File "/usr/local/lib/python3.6/dist-packages/tensorboardX/writer.py", line 310, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir, **self.kwargs)
TypeError: __init__() got an unexpected keyword argument 'log_dir'
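
The traceback suggests that the installed tensorboardX names this argument `logdir` and forwards unknown keywords such as `log_dir` on to `FileWriter`, which rejects them. One hedged workaround is to pass the directory positionally, so the call does not depend on the keyword name. The sketch below demonstrates the idea with two stand-in classes; `OldStyleWriter`, `NewStyleWriter`, `make_writer` and the path are hypothetical and only mimic the two signatures.

```python
import os


class OldStyleWriter:
    """Stand-in for a writer whose first parameter is called log_dir."""

    def __init__(self, log_dir=None, **kwargs):
        self.path = log_dir


class NewStyleWriter:
    """Stand-in for a writer whose first parameter is called logdir."""

    def __init__(self, logdir=None, **kwargs):
        if kwargs:
            # Mimics unknown keywords being rejected further down the call chain.
            raise TypeError("unexpected keyword argument %r" % sorted(kwargs)[0])
        self.path = logdir


def make_writer(writer_cls, path):
    """Create a writer without relying on the keyword name of the directory."""
    return writer_cls(path)


tb_path = os.path.join("searchs", "cifar10", "tb")  # illustrative path
old_writer = make_writer(OldStyleWriter, tb_path)
new_writer = make_writer(NewStyleWriter, tb_path)
```

Applied to the failing line, this would mean calling SummaryWriter with the joined path as its first positional argument; pinning tensorboardX to an older release is another option.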

Is it proper to use valid_loader in both train and validation?

I found that valid_loader is used in both train(train_loader, valid_loader, ...) and validate(valid_loader, ...).
The gradients of the architecture parameters (denoted $\alpha$ in the original paper) are calculated with valid_loader in the train function, so validating with valid_loader is not fair, I think.

How to train on ImageNet? Thanks.

questions on v_grads = torch.autograd.grad(loss,v_alphas+v_weights)

In architect.py, I'm confused about the following 3 lines of code:
v_grads = torch.autograd.grad(loss, v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
dw = v_grads[len(v_alphas):]
Why is the gradient computed w.r.t. (v_alphas + v_weights), with dalpha then retrieved from v_grads[:len(v_alphas)]? I thought it should be computed w.r.t. v_alphas only, based on equation (7).
The other question is: why can you get dalpha and dw from v_grads directly instead of running autograd separately?
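
A sketch that may help with both questions. `torch.autograd.grad` returns one gradient per input tensor, in the order the inputs were given, so slicing the result by `len(v_alphas)` separates the two groups after a single backward pass. As for why the weight gradients are requested at all, one reading of the paper's second-order approximation is that the gradient of the validation loss with respect to the virtual weights serves as the direction of the finite-difference Hessian step. The tensors and the toy loss below are made up for illustration.

```python
import torch

# Hypothetical stand-ins for the architecture parameters and network weights.
alphas = (torch.tensor([0.5, -0.5], requires_grad=True),)
weights = (
    torch.tensor([1.0, 2.0], requires_grad=True),
    torch.tensor(3.0, requires_grad=True),
)

# Toy loss that depends on every parameter.
loss = (alphas[0] * weights[0]).sum() * weights[1]

# One call, one backward pass; gradients come back in the same order as the inputs.
grads = torch.autograd.grad(loss, alphas + weights)
dalpha = grads[:len(alphas)]
dw = grads[len(alphas):]

print(dalpha)
print(dw)
```

Calling autograd twice, once per group, would give the same values but would repeat the backward pass.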

Result not the same as README.md

python 3.5.6
pytorch 0.4.1
torchvision 0.2.1

Run
python search.py --name cifar10 --dataset cifar10

but my result graph is not the same as the one in README.md, for example at epoch 24

How to use multi-gpu

Hi, thanks for the nice implementation. I am trying to modify the code to support multi-GPU, but it didn't work out. I don't know how to parallelize the Architect. Do you have any suggestions, or are you going to add a multi-GPU feature? Thanks for your help.

Hello, thank you for your code. I want to ask about SearchCNN (line 60): C_cur_out = C_cur * n_nodes, what does this mean?
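
A likely reading: each cell concatenates the outputs of its n_nodes intermediate nodes along the channel dimension, and every node outputs C_cur channels, so the cell as a whole outputs C_cur * n_nodes channels. The sketch below traces the channel bookkeeping through a stack of cells under assumed settings (16 initial channels, 8 layers, 4 nodes, stem multiplier 3, reduction cells at one third and two thirds of the depth). `cell_channels` is a hypothetical helper and the defaults are assumptions, not values read from the repository.

```python
def cell_channels(C=16, n_layers=8, n_nodes=4, stem_multiplier=3):
    """List (layer, reduction, C_pp, C_p, C_cur, C_cur_out) for every cell."""
    stem_out = stem_multiplier * C
    # The first cell takes the stem output as both of its inputs.
    C_pp, C_p, C_cur = stem_out, stem_out, C
    plan = []
    for i in range(n_layers):
        reduction = i in (n_layers // 3, 2 * n_layers // 3)
        if reduction:
            C_cur *= 2  # channels double where the feature map is halved
        # Outputs of all intermediate nodes are concatenated channel-wise.
        C_cur_out = C_cur * n_nodes
        plan.append((i, reduction, C_pp, C_p, C_cur, C_cur_out))
        C_pp, C_p = C_p, C_cur_out
    return plan


for row in cell_channels():
    print(row)
```

Under these assumptions there are only two reduction cells in the stack, which would also explain why the feature map is halved just twice.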