khanrc / pt.darts
PyTorch Implementation of DARTS: Differentiable Architecture Search
License: MIT License
Thanks for your work. I used your repo to train on the FashionMNIST dataset, with the genotype you provided:
# FashionMNIST
Genotype(
normal=[[('max_pool_3x3', 0), ('dil_conv_5x5', 1)], [('max_pool_3x3', 0), ('sep_conv_3x3', 1)], [('sep_conv_5x5', 1), ('sep_conv_3x3', 3)], [('sep_conv_5x5', 4), ('dil_conv_5x5', 3)]],
normal_concat=range(2, 6),
reduce=[[('sep_conv_3x3', 1), ('avg_pool_3x3', 0)], [('avg_pool_3x3', 0), ('skip_connect', 2)], [('skip_connect', 3), ('avg_pool_3x3', 0)], [('sep_conv_3x3', 2), ('skip_connect', 3)]],
reduce_concat=range(2, 6)
)
After 300 epochs, it only achieves Final best Prec@1 = 95.9000%, which is much lower than what you reported.
Here are my config file and log file. Is anything wrong? Could you provide your FashionMNIST log file?
I ran python search.py --name cifar10 --dataset cifar10 --batch_size 32, but I get the following error in utils.py. Why?
Traceback (most recent call last):
File "/home/phung/Downloads/darts/pt.darts/search.py", line 201, in <module>
main()
File "/home/phung/Downloads/darts/pt.darts/search.py", line 84, in main
train(train_loader, valid_loader, model, architect, w_optim, alpha_optim, lr, epoch)
File "/home/phung/Downloads/darts/pt.darts/search.py", line 144, in train
prec1, prec5 = utils.accuracy(logits, trn_y, topk=(1, 5))
File "/home/phung/Downloads/darts/pt.darts/utils.py", line 103, in accuracy
correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
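As the error message suggests, the usual workaround is to replace .view(-1) with .reshape(-1). Depending on the PyTorch version, the transposed prediction tensor (and therefore correct) is non-contiguous, so flattening a slice of it with .view fails, while .reshape copies when it has to. A minimal sketch of a top-k accuracy helper with that one change; it is modelled on the line in the traceback, not copied from utils.py:

```python
import torch


def accuracy(output, target, topk=(1,)):
    """Precision@k for each k in topk (sketch, not the repo's exact code)."""
    maxk = max(topk)
    batch_size = target.size(0)

    # pred: (batch, maxk) -> transposed to (maxk, batch); .t() is non-contiguous
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # .reshape copies if needed, so it works where .view(-1) raises
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(1.0 / batch_size))
    return res


logits = torch.tensor([[0.1, 0.9, 0.0],
                       [0.8, 0.05, 0.15],
                       [0.2, 0.3, 0.5]])
labels = torch.tensor([1, 2, 2])
top1, top2 = accuracy(logits, labels, topk=(1, 2))
```

Here top1 is 2/3 (for the second sample the label is only the second guess) and top2 is 1.0.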
Last line of the compute_hessian function:
hessian = [(p-n) / 2.*eps for p, n in zip(dalpha_pos, dalpha_neg)] should be changed to:
hessian = [(p-n) / (2.*eps) for p, n in zip(dalpha_pos, dalpha_neg)] according to equation 8 of the paper.
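The precedence issue is easy to see in isolation: / and * have the same precedence and associate left to right in Python, so (p-n) / 2.*eps means ((p-n) / 2.) * eps. A one-dimensional sketch of the central difference behind equation 8, with math.sin standing in for the gradient being differentiated (the function and the eps value are illustrative only):

```python
import math


def central_difference(f, x, eps):
    """Estimate f'(x) as (f(x + eps) - f(x - eps)) / (2 * eps)."""
    p = f(x + eps)  # plays the role of dalpha_pos
    n = f(x - eps)  # plays the role of dalpha_neg
    correct = (p - n) / (2. * eps)
    # left-to-right evaluation: this is ((p - n) / 2.) * eps
    buggy = (p - n) / 2. * eps
    return correct, buggy


correct, buggy = central_difference(math.sin, 0.0, 0.01)
```

For small eps the corrected value approaches the true derivative cos(0) = 1, while the unparenthesized form is smaller by a factor of eps squared.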
If I'm not interested in fully connected networks, which file do I modify to control the connections between nodes?
Hi @khanrc,
first of all thanks for the recent and understandable implementation of DARTS. I really enjoyed using this code.
After some exploration and alteration of the code, I stumbled over the implementation of the parse(alpha, k, reduction=True) function, in which the most heavily weighted operations are selected. Wondering why the "none" operation is omitted, I compared the code to the original implementation. The function there does not compare the raw weights but the softmaxed ones.
https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L154
This seems quite crucial, since it puts more focus on the "none" operation. (For further explanation, see this comment in the OpenReview discussion of the paper.)
Maybe you need to calculate the row-wise softmax of the alpha values before calling the function, or compute it inside the function before selecting the top-k. Please correct me if I'm wrong.
Thanks in advance
Lucas
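A minimal sketch of the suggested preprocessing, assuming each alpha is a tensor of shape (n_edges, n_ops) like the printouts further down this page. The helper name softmax_alphas and the random alphas are hypothetical:

```python
import torch
import torch.nn.functional as F


def softmax_alphas(alphas):
    """Row-wise softmax of each node's alpha tensor (n_edges x n_ops).

    Turns raw architecture weights into per-edge op probabilities
    before they are handed to a parse step.
    """
    return [F.softmax(a, dim=-1) for a in alphas]


# hypothetical raw alphas for two nodes (2 and 3 incoming edges, 8 ops)
torch.manual_seed(0)
raw = [torch.randn(2, 8), torch.randn(3, 8)]
probs = softmax_alphas(raw)
```

Softmax is monotonic within a row, so the best op chosen for each edge does not change. What can change is the ranking of edges against each other, because each row is normalized separately; that is where parsing raw alphas and parsing probabilities can disagree.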
Hi all,
Thank you very much for this repo. I have a question. In op.py line 178, in the FactorizedReduce function, why does the conv2 layer need to reduce the size on dim=2 and 3? In my case this causes a dimension mismatch.
out = torch.cat([self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1)
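As far as I understand it, the slice shifts the second branch by one pixel, so the two stride-2 branches sample complementary positions of the input before their outputs are concatenated along the channel dimension. Assuming conv1 and conv2 are both 1x1 convolutions with stride 2 and no padding (as in the original DARTS), the two branch outputs only have the same spatial size for even inputs, which may be the cause of the mismatch. A small arithmetic sketch (the helper names are hypothetical):

```python
def conv_out(size, kernel=1, stride=2, padding=0):
    """Conv2d output size with dilation 1: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def factorized_reduce_sizes(h):
    """Output heights of the two branches in the quoted line.

    conv1 sees the full input; conv2 sees x[:, :, 1:, 1:], i.e. h - 1 rows
    shifted by one pixel, so it samples the positions conv1 skips.
    """
    return conv_out(h), conv_out(h - 1)


print(factorized_reduce_sizes(32))  # (16, 16): concatenation works
print(factorized_reduce_sizes(7))   # (4, 3): torch.cat(..., dim=1) fails
```

An odd feature-map size such as 7, which a 28x28 input reaches after two stride-2 reductions, is enough to trigger it.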
Hi @khanrc, thanks for releasing the code. I am trying to search for a cell using DARTS as described in the README. However, the search does not work: it gets stuck at the very beginning (training does not seem to progress), and there is no GPU utilization either. Am I missing something? Could you please comment on this? Thanks.
Training with the default parameters, I cannot obtain the reported cell architecture.
Hi, after running search.py, how do I run augment.py? I mean, what should the genotype be?
Thanks for your help.
Hi~ Thanks for providing this great implementation!
I'm quite interested in the multi-GPU part, which uses replicate for the network weights and broadcast for the edge weights. I'm not familiar with parallel programming in PyTorch, and I'm curious about the difference between broadcast and replicate. Could you explain why we should use broadcast for the edge weights?
I don't know what it means, and there isn't any decrease in any of the evaluation results.
Can I just use visualize.py to draw the cell?
Hi!
I run your code with
python search.py --name cifar10 --dataset cifar10 --batch_size 16 --gpu 2
and progress doesn't go past this line for several hours:
09/10 11:26:50 AM | Train: [ 1/50] Step 000/781 Loss 2.312 Prec@(1,5) (15.6%, 56.2%)
Can you please check where could be a problem?
If I use this method to train on my own dataset, how do I change the code?
The original implementation of DARTS seems to divide (positive - negative) by (2 * eps): Code
return [(x-y).div_(2*R) for x, y in zip(grads_p, grads_n)]
In the implementation of pt.darts, the parentheses around (2 * eps) seem to be omitted, which leads to wrong Hessian values: the final expression is evaluated as eps * (positive - negative) / 2:
Line 108 in 48e7137
Shouldn't the expression use (2 * eps) instead of 2 * eps?
Thank you for providing very good code. I executed search.py on multiple GPUs, and the final accuracy was only 89%. Why? Do I need to continue with augment.py to get 97% accuracy? Thank you
Hello,
Thanks for your great work!
When I run augment.py, it runs out of memory after the first epoch. Is the batch size the problem?
The args are as follows:
09/11 03:17:31 AM |
09/11 03:17:31 AM | Parameters:
09/11 03:17:31 AM | AUX_WEIGHT=0.4
09/11 03:17:31 AM | BATCH_SIZE=48
09/11 03:17:31 AM | CUTOUT_LENGTH=16
09/11 03:17:31 AM | DATA_PATH=./data/
09/11 03:17:31 AM | DATASET=cifar10
09/11 03:17:31 AM | DROP_PATH_PROB=0.2
09/11 03:17:31 AM | EPOCHS=600
09/11 03:17:31 AM | GENOTYPE=Genotype(normal=[[('sep_conv_3x3', 1), ('sep_conv_3x3', 0)], [('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 1), ('sep_conv_3x3', 2)], [('sep_conv_3x3', 0), ('sep_conv_3x3', 2)]], normal_concat=range(2, 6), reduce=[[('max_pool_3x3', 0), ('max_pool_3x3', 1)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('max_pool_3x3', 0), ('skip_connect', 2)]], reduce_concat=range(2, 6))
09/11 03:17:31 AM | GPUS=[0]
09/11 03:17:31 AM | GRAD_CLIP=5.0
09/11 03:17:31 AM | INIT_CHANNELS=36
09/11 03:17:31 AM | LAYERS=20
09/11 03:17:31 AM | LR=0.025
09/11 03:17:31 AM | MOMENTUM=0.9
09/11 03:17:31 AM | NAME=cifar10
09/11 03:17:31 AM | PATH=augments/cifar10
09/11 03:17:31 AM | PRINT_FREQ=200
09/11 03:17:31 AM | SEED=2
09/11 03:17:31 AM | WEIGHT_DECAY=0.0003
09/11 03:17:31 AM | WORKERS=4
09/11 03:17:31 AM |
09/11 03:17:31 AM | Logger is set - training start
Hi @khanrc, could you pls have a look, thank you!
####### ALPHA #######
# Alpha - normal
tensor([[0.1218, 0.0975, 0.1138, 0.1351, 0.1548, 0.1235, 0.1320, 0.1215],
[0.1139, 0.1003, 0.1164, 0.1334, 0.1465, 0.1337, 0.1122, 0.1436]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1214, 0.0977, 0.1129, 0.1615, 0.1369, 0.1158, 0.1270, 0.1268],
[0.1156, 0.1016, 0.1173, 0.1353, 0.1403, 0.1248, 0.1239, 0.1412],
[0.1093, 0.0945, 0.1168, 0.1412, 0.1366, 0.1320, 0.1244, 0.1452]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1237, 0.0983, 0.1121, 0.1561, 0.1244, 0.1320, 0.1224, 0.1310],
[0.1165, 0.1013, 0.1166, 0.1399, 0.1313, 0.1222, 0.1181, 0.1541],
[0.1115, 0.0964, 0.1188, 0.1337, 0.1248, 0.1320, 0.1303, 0.1525],
[0.1046, 0.0929, 0.1095, 0.1312, 0.1279, 0.1366, 0.1376, 0.1597]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1203, 0.0975, 0.1106, 0.1398, 0.1291, 0.1238, 0.1283, 0.1506],
[0.1149, 0.1007, 0.1136, 0.1293, 0.1449, 0.1309, 0.1268, 0.1389],
[0.1023, 0.0916, 0.1123, 0.1368, 0.1317, 0.1337, 0.1295, 0.1621],
[0.0976, 0.0884, 0.1021, 0.1404, 0.1391, 0.1330, 0.1297, 0.1698],
[0.0925, 0.0855, 0.0926, 0.1443, 0.1380, 0.1312, 0.1340, 0.1819]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
# Alpha - reduce
tensor([[0.1393, 0.1220, 0.1175, 0.1296, 0.1360, 0.1224, 0.1201, 0.1131],
[0.1237, 0.1151, 0.1221, 0.1237, 0.1376, 0.1152, 0.1246, 0.1379]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1412, 0.1234, 0.1178, 0.1322, 0.1343, 0.1160, 0.1214, 0.1138],
[0.1200, 0.1119, 0.1241, 0.1250, 0.1303, 0.1242, 0.1288, 0.1358],
[0.1238, 0.1044, 0.1233, 0.1229, 0.1382, 0.1205, 0.1391, 0.1277]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1366, 0.1207, 0.1217, 0.1282, 0.1400, 0.1193, 0.1187, 0.1148],
[0.1253, 0.1168, 0.1290, 0.1238, 0.1278, 0.1233, 0.1187, 0.1354],
[0.1167, 0.1024, 0.1208, 0.1316, 0.1365, 0.1304, 0.1352, 0.1264],
[0.1172, 0.1006, 0.1191, 0.1271, 0.1343, 0.1302, 0.1401, 0.1314]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
tensor([[0.1443, 0.1285, 0.1273, 0.1274, 0.1273, 0.1195, 0.1124, 0.1133],
[0.1248, 0.1172, 0.1232, 0.1251, 0.1308, 0.1228, 0.1199, 0.1360],
[0.1244, 0.1066, 0.1240, 0.1254, 0.1320, 0.1290, 0.1335, 0.1251],
[0.1219, 0.1057, 0.1230, 0.1340, 0.1291, 0.1253, 0.1327, 0.1284],
[0.1173, 0.1051, 0.1216, 0.1347, 0.1382, 0.1252, 0.1289, 0.1289]],
device='cuda:3', grad_fn=<SoftmaxBackward>)
#####################
04/05 09:25:34 AM | Train: [10/50] Step 000/390 Loss 0.460 Prec@(1,5) (87.5%, 98.4%)
04/05 09:30:35 AM | Train: [10/50] Step 050/390 Loss 0.436 Prec@(1,5) (85.2%, 99.3%)
04/05 09:35:18 AM | Train: [10/50] Step 100/390 Loss 0.445 Prec@(1,5) (84.7%, 99.1%)
04/05 09:40:06 AM | Train: [10/50] Step 150/390 Loss 0.449 Prec@(1,5) (84.4%, 99.2%)
04/05 09:44:48 AM | Train: [10/50] Step 200/390 Loss 0.460 Prec@(1,5) (84.1%, 99.2%)
04/05 09:49:28 AM | Train: [10/50] Step 250/390 Loss 0.464 Prec@(1,5) (83.8%, 99.2%)
04/05 09:54:09 AM | Train: [10/50] Step 300/390 Loss 0.463 Prec@(1,5) (84.0%, 99.3%)
04/05 09:58:48 AM | Train: [10/50] Step 350/390 Loss 0.465 Prec@(1,5) (83.9%, 99.3%)
04/05 10:02:31 AM | Train: [10/50] Step 390/390 Loss 0.469 Prec@(1,5) (83.8%, 99.3%)
04/05 10:02:31 AM | Train: [10/50] Final Prec@1 83.8000%
04/05 10:02:32 AM | Valid: [10/50] Step 000/390 Loss 0.688 Prec@(1,5) (75.0%, 96.9%)
^CTraceback (most recent call last):
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
Traceback (most recent call last):
File "/home/wangtao/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_comm.py", line 365, in _on_run
r = self.sock.recv(1024)
globals = debugger.run(setup['file'], None, None, is_module)
KeyboardInterrupt
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/wangtao/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/wangtao/prj/pt.darts/search.py", line 201, in <module>
main()
File "/home/wangtao/prj/pt.darts/search.py", line 88, in main
top1 = validate(valid_loader, model, epoch, cur_step)
File "/home/wangtao/prj/pt.darts/search.py", line 172, in validate
for step, (X, y) in enumerate(valid_loader):
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 819, in __iter__
return _DataLoaderIter(self)
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 560, in __init__
w.start()
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
self.pid = os.fork()
File "/home/wangtao/.pycharm_helpers/pydev/_pydev_bundle/pydev_monkey.py", line 464, in new_fork
_on_forked_process()
File "/home/wangtao/.pycharm_helpers/pydev/_pydev_bundle/pydev_monkey.py", line 50, in _on_forked_process
pydevd.settrace_forked()
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1445, in settrace_forked
patch_multiprocessing=True,
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1210, in settrace
patch_multiprocessing,
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 1254, in _locked_settrace
debugger.connect(host, port) # Note: connect can raise error.
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 328, in connect
self.initialize_network(s)
File "/home/wangtao/.pycharm_helpers/pydev/pydevd.py", line 320, in initialize_network
time.sleep(0.1) # give threads time to start
KeyboardInterrupt
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/util.py", line 319, in _exit_function
p.join()
File "/home/wangtao/anaconda2/envs/pytorch1.0py3.5/lib/python3.5/multiprocessing/process.py", line 122, in join
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
For the DARTS complexity analysis, does anyone have an idea how to derive the (k+1)*k/2 expression? Why 2 input nodes? How would the calculated value change if graph isomorphism were considered? Why "2+3+4+5" learnable edges? If a connection is missing, shouldn't the paper leave out the added 1, since it does not actually contribute to the learnable edge configurations at all?
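One way to reproduce the counting, as I read the paper's assumptions: each cell receives the outputs of the previous two cells, so intermediate node k (k = 1..4) has k + 1 candidate predecessors, which gives the 2 + 3 + 4 + 5 = 14 edges of the search cell. The derived cell keeps 2 of those predecessors per node, C(k+1, 2) = (k+1)k/2 choices, and one of the 7 non-'none' ops on each kept edge. This simple count treats isomorphic cells as distinct, and the function names are only for illustration:

```python
N_OPS = 7    # candidate ops excluding 'none'
N_NODES = 4  # intermediate nodes per cell


def n_edges(n_nodes=N_NODES):
    # node k (1-based) can read the 2 cell inputs and the k - 1 earlier nodes
    return sum(k + 1 for k in range(1, n_nodes + 1))


def n_cells(n_nodes=N_NODES, n_ops=N_OPS):
    total = 1
    for k in range(1, n_nodes + 1):
        # choose 2 of the k + 1 predecessors, then one op for each kept edge
        total *= (k + 1) * k // 2 * n_ops ** 2
    return total


print(n_edges())  # 14
print(n_cells())  # 1037664180
```

The product is about 10^9 per cell, and about 10^18 when the normal and reduction cells are chosen jointly.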
Why do the weights for normal cells and reduction cells need to be trained separately, as shown in Figures 4 and 5 below?
How should the nodes be arranged so that the NAS search actually converges with minimum error? Note: not all nodes are connected to every other node.
Why is GDAS 10 times faster than DARTS?
How do I evaluate the result of the search?
I have installed graphviz but still get a module-not-found error.
Can anyone share how they solved it?
First, thanks for your code! It is much clearer than the original code and easy to read.
In search.py you set torch.backends.cudnn.benchmark = True, but the PyTorch documentation mentions that torch.backends.cudnn.benchmark = True speeds things up if the graph will not be modified, and otherwise reduces performance. So would it be better to set it to False? Just a suggestion, hah. Thanks again!
What's the meaning and purpose of the Normal and Reduce networks?
Thanks for your sharing codes!
Hi, everyone.
I ran the command from README.md:
python augment.py --name cifar10 --dataset cifar10 --genotype "Genotype( normal=[[('sep_conv_3x3', 0), ('dil_conv_5x5', 1)], [('skip_connect', 0), ('dil_conv_3x3', 2)], [('sep_conv_3x3', 1), ('skip_connect', 0)], [('sep_conv_3x3', 1), ('skip_connect', 0)]], normal_concat=range(2, 6), reduce=[[('max_pool_3x3', 0), ('max_pool_3x3', 1)], [('max_pool_3x3', 0), ('skip_connect', 2)], [('skip_connect', 3), ('max_pool_3x3', 0)], [('skip_connect', 2), ('max_pool_3x3', 0)]], reduce_concat=range(2, 6))"
My environment is PyTorch 1.2 on a single RTX 2080 Ti, but 600 epochs seem to take about 20 hours, which is a bit long for me. Is there any way to accelerate convergence, such as changing the SGD optimizer to Adam?
In genotype.py (the parse function), you select the edges and operations using the raw alpha values, not the alpha probabilities. This can sometimes derive wrong genotypes. For example, suppose the alpha parameters for the second node are as follows:
alpha = [[1, 1, 1, 1, 5, 1, 1, 1], [10, 10, 10, 11, 10, 10, 10, 10], [10, 15, 10, 10, 10, 10, 10, 10]]
With your parse function, the final genotype would select the second and third edges. But larger values do not always reflect larger probabilities. Actually, we should select the first and third edges. LOL
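The example can be checked directly. A small sketch, assuming 'none' is the last column of each row (the op order is an assumption) and that, as I understand the original DARTS code linked earlier, the softmax is taken over all ops before 'none' is excluded from the ranking. top_edges is a hypothetical helper:

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_edges(alpha, k=2, use_softmax=True):
    """Indices of the k incoming edges whose best non-'none' op is strongest."""
    scores = []
    for row in alpha:
        vals = softmax(row) if use_softmax else row
        scores.append(max(vals[:-1]))  # ignore the assumed 'none' column
    ranked = sorted(range(len(alpha)), key=lambda i: -scores[i])
    return sorted(ranked[:k])


alpha = [[1, 1, 1, 1, 5, 1, 1, 1],
         [10, 10, 10, 11, 10, 10, 10, 10],
         [10, 15, 10, 10, 10, 10, 10, 10]]

print(top_edges(alpha, use_softmax=False))  # [1, 2]: second and third edge
print(top_edges(alpha, use_softmax=True))   # [0, 2]: first and third edge
```

With softmax, the first edge scores about 0.89 because its 5 stands far above its row, while the second edge's 11 is only slightly above a row of 10s (about 0.28).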
Also, as mentioned in another issue, the formula of hessian should be as follows:
hessian = [(p-n) / (2.*eps) for p, n in zip(dalpha_pos, dalpha_neg)]
May I ask what Augment is used for?
Has anyone run this on CIFAR-100? I modified the code just enough to download and train on it, changing nothing else, and after running augment.py on two GPUs, I got 65% top-1 and 88% top-5 accuracy on the validation set. 65% top-1 is state of the art for 2014. Current SOTA for CIFAR-100 is 91.3% top-1 accuracy. https://benchmarks.ai/cifar-100
Here's the search command:
python search.py --name cifar100 --dataset cifar100 --gpus all --batch_size 96 --workers 8 --print_freq 10 --w_lr 0.05 --w_lr_min 0.002 --alpha_lr 0.0006
Here's the augment command:
python augment.py --name cifar100_1 --dataset cifar100 --genotype "Genotype(normal=[[('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)], [('skip_connect', 0), ('skip_connect', 1)]], normal_concat=range(2, 6), reduce=[[('avg_pool_3x3', 0), ('avg_pool_3x3', 1)], [('skip_connect', 2), ('max_pool_3x3', 0)], [('skip_connect', 2), ('avg_pool_3x3', 0)], [('skip_connect', 2), ('avg_pool_3x3', 0)]], reduce_concat=range(2, 6))"
Has anyone else tried to train CIFAR-100?
Hi,
In Table 1 of your paper, there is a random search baseline. Were those architectures obtained by randomly sampling architectures? Could you please tell me how you implemented the random sampling in your paper? Thanks!
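I can't say how the paper's random baseline was implemented. For illustration only, here is one plausible way to sample uniformly from the same cell space; Genotype mirrors the field layout of the genotypes quoted in this thread, OPS lists the 7 non-'none' ops that appear in them, and random_cell / random_genotype are hypothetical helpers:

```python
import random
from collections import namedtuple

# same field layout as the genotypes quoted in this thread
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

OPS = ['max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3',
       'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5']


def random_cell(rng, n_nodes=4):
    cell = []
    for i in range(n_nodes):
        # node i (0-based) may read the 2 cell inputs and the i earlier nodes
        srcs = rng.sample(range(i + 2), 2)
        cell.append([(rng.choice(OPS), s) for s in srcs])
    return cell


def random_genotype(seed=None, n_nodes=4):
    rng = random.Random(seed)
    concat = range(2, 2 + n_nodes)
    return Genotype(normal=random_cell(rng, n_nodes), normal_concat=concat,
                    reduce=random_cell(rng, n_nodes), reduce_concat=concat)


print(random_genotype(seed=0))
```

Each node picks 2 distinct predecessors uniformly and an op uniformly for each, so every cell counted in the DARTS space is equally likely.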
Running the code on Google Colab, I got:
Traceback (most recent call last):
File "search.py", line 19, in <module>
writer = SummaryWriter(log_dir=os.path.join(config.path, "tb"))
File "/usr/local/lib/python3.6/dist-packages/tensorboardX/writer.py", line 254, in __init__
self._get_file_writer()
File "/usr/local/lib/python3.6/dist-packages/tensorboardX/writer.py", line 310, in _get_file_writer
self.file_writer = FileWriter(logdir=self.logdir, **self.kwargs)
TypeError: __init__() got an unexpected keyword argument 'log_dir'
I found that valid_loader is used both in train(train_loader, valid_loader, ...) and in validate(valid_loader, ...).
Since gradients are computed on valid_loader inside the train function, validating with valid_loader is not fair, I think.
How do I train on ImageNet? Thanks.
In architect.py, I'm confused about the following 3 lines of code:
v_grads = torch.autograd.grad(loss, v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
dw = v_grads[len(v_alphas):]
Why is the gradient computed w.r.t. (v_alphas + v_weights), with dalpha retrieved from v_grads[:len(v_alphas)]? I thought it should be computed w.r.t. v_alphas only, based on equation (7).
My other question: why can you get dalpha and dw from v_grads directly instead of calling autograd separately?
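torch.autograd.grad accepts a sequence of inputs and returns one gradient per input, in the same order, from a single backward pass, so slicing the result at len(v_alphas) separates the alpha gradients from the weight gradients. Both parts are needed: in equation (7), dalpha is the first term, and dw (the gradient of the validation loss w.r.t. the virtual weights w') is the vector that the second-order Hessian term multiplies. A toy sketch with hypothetical one-tensor stand-ins for v_alphas and v_weights:

```python
import torch

# hypothetical stand-ins for the virtual network's alphas and weights
torch.manual_seed(0)
alpha = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)


def val_loss():
    # a toy loss that depends on both alpha and w
    return (torch.softmax(alpha, dim=0) * w * x).sum() ** 2


v_alphas, v_weights = [alpha], [w]

# one call: gradients come back in the same order as the inputs
v_grads = torch.autograd.grad(val_loss(), v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
dw = v_grads[len(v_alphas):]

# two separate calls give the same values but need two backward passes
dalpha_sep = torch.autograd.grad(val_loss(), v_alphas)
dw_sep = torch.autograd.grad(val_loss(), v_weights)
```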
Hi, thanks for the nice implementation. I am trying to modify the code to support multi-GPU training, but it didn't work out. I don't know how to parallelize the Architect. Do you have any suggestions, or are you going to add multi-GPU support? Thanks for your help.
Could you please add a license, so your code can be used legally? :)
Environment requirements:
numpy==1.15.4
graphviz==0.8.4
torch==1.0.0
torchvision==0.2.1
tensorboard==1.13.0
tensorboardX==1.6
When I use 4 GPUs (GPU-Util about 4-21%), training is much slower than with only 1 GPU (GPU-Util about 30-90%). Why?
Hello, thank you for your code. I want to ask about SearchCNN, line 60: C_cur_out = C_cur * n_nodes. What does this mean?
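My understanding is that each cell concatenates the outputs of its n_nodes intermediate nodes (the normal_concat=range(2, 6) in the genotypes above) along the channel dimension, so if each node produces C_cur channels, the cell output, and therefore the input to the next cell, has C_cur * n_nodes channels. A toy shape check with hypothetical sizes:

```python
import torch

# hypothetical sizes for illustration
batch, C_cur, n_nodes, H, W = 2, 16, 4, 8, 8

# each intermediate node produces a C_cur-channel feature map
node_outputs = [torch.randn(batch, C_cur, H, W) for _ in range(n_nodes)]

# the cell output concatenates them along the channel dimension
cell_out = torch.cat(node_outputs, dim=1)

C_cur_out = C_cur * n_nodes
print(cell_out.shape)  # torch.Size([2, 64, 8, 8])
```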