Running:
quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)
val_program = fluid.default_main_program().clone(for_test=True)
When retraining quant_program, I found that fluid.ParallelExecutor cannot be used; it raises:
AttributeError: 'CompiledProgram' object has no attribute '_enable_dgc'
quant_program is of type CompiledProgram, so you can use CompiledProgram.with_data_parallel for multi-card parallelism. It is a different mechanism from ParallelExecutor, but both implement multi-card parallel training.
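The distinction can be sketched with hypothetical stand-ins (none of this is the real Paddle implementation): ParallelExecutor assumes a plain Program and reads Program-only attributes such as `_enable_dgc`, while a CompiledProgram carries `with_data_parallel` as a method on itself.

```python
# Toy stand-ins, NOT the real Paddle classes: they only model why wrapping a
# CompiledProgram in ParallelExecutor fails while with_data_parallel works.

class Program:
    _enable_dgc = False  # ordinary Programs carry this executor-related flag


class CompiledProgram:
    """Models what quant.quant_aware(..., for_test=False) hands back."""

    def with_data_parallel(self, loss_name, **strategies):
        # Multi-card execution is configured on the compiled program itself.
        self.loss_name = loss_name
        return self


def parallel_executor(main_program):
    # ParallelExecutor reads Program-only attributes, hence the
    # AttributeError when it is given a CompiledProgram instead.
    return main_program._enable_dgc


quant_program = CompiledProgram()

try:
    parallel_executor(quant_program)
except AttributeError as e:
    print("AttributeError:", e)   # '_enable_dgc' is missing on CompiledProgram

train_cp = quant_program.with_data_parallel(loss_name="sum_cost.name")
print(type(train_cp).__name__)    # still a CompiledProgram, now multi-card
```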
Training on multiple cards with CompiledProgram.with_data_parallel; the run part of the code is as follows:
train_cp = train_prog.with_data_parallel(
    loss_name=sum_cost,
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)
outs = train_exe.run(train_cp,
                     fetch_list=[sum_cost.name, token_num.name],
                     feed=feed_dict_list)
After running it I get the following error. What could be the cause?
Traceback (most recent call last):
  File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 547, in <module>
    train(args)
  File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 534, in train
    token_num, pyreader)
  File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 368, in train_loop
    feed=feed_dict_list)
  File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
    six.reraise(*sys.exc_info())
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
    use_program_cache=use_program_cache)
  File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 833, in _run_impl
    program._compile(scope, self.place)
  File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 424, in _compile
    places=self._places)
  File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 377, in _compile_data_parallel
    self._exec_strategy, self._build_strategy, self._graph)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. paddle.fluid.core_avx.ParallelExecutor(arg0: List[paddle.fluid.core_avx.Place], arg1: List[str], arg2: str, arg3: paddle.fluid.core_avx._Scope, arg4: List[paddle.fluid.core_avx._Scope], arg5: paddle.fluid.core_avx.ParallelExecutor.ExecutionStrategy, arg6: paddle.fluid.core_avx.ParallelExecutor.BuildStrategy, arg7: paddle::framework::ir::Graph)
Invoked with: [<paddle.fluid.core_avx.Place object at 0x7f2eab27ff10>, <paddle.fluid.core_avx.Place object at 0x7f2eab27fed8>, <paddle.fluid.core_avx.Place object at 0x7f2eab27fea0>], ['@LR_DECAY_COUNTER@', 'accum_0', 'accum_1', 'accum_10', 'accum_11', 'accum_12', 'accum_13', 'accum_14', 'accum_15', 'accum_16', 'accum_17', 'accum_18', 'accum_19', 'accum_2', 'accum_20', 'accum_21', 'accum_22', 'accum_23', 'accum_24', 'accum_25', 'accum_26', 'accum_27', 'accum_28', 'accum_29', 'accum_3', 'accum_30', 'accum_31', 'accum_32', 'accum_33', 'accum_34', 'accum_35', 'accum_36', 'accum_37', 'accum_38', 'accum_39', 'accum_4', 'accum_40', 'accum_41', 'accum_42', 'accum_43', 'accum_44', 'accum_45', 'accum_46', 'accum_47', 'accum_48', 'accum_49', 'accum_5', 'accum_50', 'accum_51', 'accum_52', 'accum_53', 'accum_54', 'accum_55', 'accum_56', 'accum_57', 'accum_58', 'accum_59', 'accum_6', 'accum_60', 'accum_61', 'accum_7', 'accum_8', 'accum_9', 'dropout_11.tmp_0.scale', 'dropout_15.tmp_0.scale', 'dropout_19.tmp_0.scale', 'dropout_23.tmp_0.scale', 'dropout_3.tmp_0.scale', 'dropout_30.tmp_0.scale', 'dropout_36.tmp_0.scale', 'dropout_42.tmp_0.scale', 'dropout_48.tmp_0.scale', 'dropout_54.tmp_0.scale', 'dropout_60.tmp_0.scale', 'dropout_7.tmp_0.scale', 'fc_0.w_0', 'fc_0.w_0_beta1_pow_acc_0', 'fc_0.w_0_beta2_pow_acc_0', 'fc_0.w_0_moment1_0', 'fc_0.w_0_moment2_0', 'fc_1.w_0', 'fc_1.w_0_beta1_pow_acc_0', 'fc_1.w_0_beta2_pow_acc_0', 'fc_1.w_0_moment1_0', 'fc_1.w_0_moment2_0', 'fc_10.b_0', 'fc_10.b_0_beta1_pow_acc_0', 'fc_10.b_0_beta2_pow_acc_0', 'fc_10.b_0_moment1_0', 'fc_10.b_0_moment2_0', 'fc_10.w_0', 'fc_10.w_0_beta1_pow_acc_0', 'fc_10.w_0_beta2_pow_acc_0', 'fc_10.w_0_moment1_0', 'fc_10.w_0_moment2_0', 'fc_11.b_0', 'fc_11.b_0_beta1_pow_acc_0', 'fc_11.b_0_beta2_pow_acc_0', 'fc_11.b_0_moment1_0', 'fc_11.b_0_moment2_0', 'fc_11.w_0', 'fc_11.w_0_beta1_pow_acc_0', 'fc_11.w_0_beta2_pow_acc_0', 'fc_11.w_0_moment1_0', 'fc_11.w_0_moment2_0', 'fc_12.w_0', 'fc_12.w_0_beta1_pow_acc_0', 
'fc_12.w_0_beta2_pow_acc_0', 'fc_12.w_0_moment1_0', 'fc_12.w_0_moment2_0', 'fc_13.w_0', 'fc_13.w_0_beta1_pow_acc_0', 'fc_13.w_0_beta2_pow_acc_0', 'fc_13.w_0_moment1_0', 'fc_13.w_0_moment2_0', 'fc_14.w_0', 'fc_14.w_0_beta1_pow_acc_0', 'fc_14.w_0_beta2_pow_acc_0', 'fc_14.w_0_moment1_0', 'fc_14.w_0_moment2_0', 'fc_15.w_0', 'fc_15.w_0_beta1_pow_acc_0', 'fc_15.w_0_beta2_pow_acc_0', 'fc_15.w_0_moment1_0', 'fc_15.w_0_moment2_0', 'fc_16.b_0', 'fc_16.b_0_beta1_pow_acc_0', 'fc_16.b_0_beta2_pow_acc_0', 'fc_16.b_0_moment1_0', 'fc_16.b_0_moment2_0', 'fc_16.w_0', 'fc_16.w_0_beta1_pow_acc_0', 'fc_16.w_0_beta2_pow_acc_0', 'fc_16.w_0_moment1_0', 'fc_16.w_0_moment2_0', 'fc_17.b_0', 'fc_17.b_0_beta1_pow_acc_0', 'fc_17.b_0_beta2_pow_acc_0', 'fc_17.b_0_moment1_0', 'fc_17.b_0_moment2_0', 'fc_17.w_0', 'fc_17.w_0_beta1_pow_acc_0', 'fc_17.w_0_beta2_pow_acc_0', 'fc_17.w_0_moment1_0', 'fc_17.w_0_moment2_0', 'fc_18.w_0', 'fc_18.w_0_beta1_pow_acc_0', 'fc_18.w_0_beta2_pow_acc_0', 'fc_18.w_0_moment1_0', 'fc_18.w_0_moment2_0', 'fc_19.w_0', 'fc_19.w_0_beta1_pow_acc_0', 'fc_19.w_0_beta2_pow_acc_0', 'fc_19.w_0_moment1_0', 'fc_19.w_0_moment2_0', 'fc_2.w_0', 'fc_2.w_0_beta1_pow_acc_0', 'fc_2.w_0_beta2_pow_acc_0', 'fc_2.w_0_moment1_0', 'fc_2.w_0_moment2_0', 'fc_20.w_0', 'fc_20.w_0_beta1_pow_acc_0', 'fc_20.w_0_beta2_pow_acc_0', 'fc_20.w_0_moment1_0', 'fc_20.w_0_moment2_0', 'fc_21.w_0', 'fc_21.w_0_beta1_pow_acc_0', 'fc_21.w_0_beta2_pow_acc_0', 'fc_21.w_0_moment1_0', 'fc_21.w_0_moment2_0', 'fc_22.b_0', 'fc_22.b_0_beta1_pow_acc_0', 'fc_22.b_0_beta2_pow_acc_0', 'fc_22.b_0_moment1_0', 'fc_22.b_0_moment2_0', 'fc_22.w_0', 'fc_22.w_0_beta1_pow_acc_0', 'fc_22.w_0_beta2_pow_acc_0', 'fc_22.w_0_moment1_0', 'fc_22.w_0_moment2_0', 'fc_23.b_0', 'fc_23.b_0_beta1_pow_acc_0', 'fc_23.b_0_beta2_pow_acc_0', 'fc_23.b_0_moment1_0', 'fc_23.b_0_moment2_0', 'fc_23.w_0', 'fc_23.w_0_beta1_pow_acc_0', 'fc_23.w_0_beta2_pow_acc_0', 'fc_23.w_0_moment1_0', 'fc_23.w_0_moment2_0', 'fc_24.w_0', 'fc_24.w_0_beta1_pow_acc_0', 
'fc_24.w_0_beta2_pow_acc_0', 'fc_24.w_0_moment1_0', 'fc_24.w_0_moment2_0', 'fc_25.w_0', 'fc_25.w_0_beta1_pow_acc_0', 'fc_25.w_0_beta2_pow_acc_0', 'fc_25.w_0_moment1_0', 'fc_25.w_0_moment2_0', 'fc_26.w_0', 'fc_26.w_0_beta1_pow_acc_0', 'fc_26.w_0_beta2_pow_acc_0', 'fc_26.w_0_moment1_0', 'fc_26.w_0_moment2_0', 'fc_27.w_0', 'fc_27.w_0_beta1_pow_acc_0', 'fc_27.w_0_beta2_pow_acc_0', 'fc_27.w_0_moment1_0', 'fc_27.w_0_moment2_0', 'fc_28.b_0', 'fc_28.b_0_beta1_pow_acc_0', 'fc_28.b_0_beta2_pow_acc_0', 'fc_28.b_0_moment1_0', 'fc_28.b_0_moment2_0', 'fc_28.w_0', 'fc_28.w_0_beta1_pow_acc_0', 'fc_28.w_0_beta2_pow_acc_0', 'fc_28.w_0_moment1_0', 'fc_28.w_0_moment2_0', 'fc_29.b_0', 'fc_29.b_0_beta1_pow_acc_0', 'fc_29.b_0_beta2_pow_acc_0', 'fc_29.b_0_moment1_0', 'fc_29.b_0_moment2_0', 'fc_29.w_0', 'fc_29.w_0_beta1_pow_acc_0', 'fc_29.w_0_beta2_pow_acc_0', 'fc_29.w_0_moment1_0', 'fc_29.w_0_moment2_0', 'fc_3.w_0', 'fc_3.w_0_beta1_pow_acc_0', 'fc_3.w_0_beta2_pow_acc_0', 'fc_3.w_0_moment1_0', 'fc_3.w_0_moment2_0', 'fc_30.w_0', 'fc_30.w_0_beta1_pow_acc_0', 'fc_30.w_0_beta2_pow_acc_0', 'fc_30.w_0_moment1_0', 'fc_30.w_0_moment2_0', 'fc_31.w_0', 'fc_31.w_0_beta1_pow_acc_0', 'fc_31.w_0_beta2_pow_acc_0', 'fc_31.w_0_moment1_0', 'fc_31.w_0_moment2_0', 'fc_32.w_0', 'fc_32.w_0_beta1_pow_acc_0', 'fc_32.w_0_beta2_pow_acc_0', 'fc_32.w_0_moment1_0', 'fc_32.w_0_moment2_0', 'fc_33.w_0', 'fc_33.w_0_beta1_pow_acc_0', 'fc_33.w_0_beta2_pow_acc_0', 'fc_33.w_0_moment1_0', 'fc_33.w_0_moment2_0', 'fc_34.b_0', 'fc_34.b_0_beta1_pow_acc_0', 'fc_34.b_0_beta2_pow_acc_0', 'fc_34.b_0_moment1_0', 'fc_34.b_0_moment2_0', 'fc_34.w_0', 'fc_34.w_0_beta1_pow_acc_0', 'fc_34.w_0_beta2_pow_acc_0', 'fc_34.w_0_moment1_0', 'fc_34.w_0_moment2_0', 'fc_35.b_0', 'fc_35.b_0_beta1_pow_acc_0', 'fc_35.b_0_beta2_pow_acc_0', 'fc_35.b_0_moment1_0', 'fc_35.b_0_moment2_0', 'fc_35.w_0', 'fc_35.w_0_beta1_pow_acc_0', 'fc_35.w_0_beta2_pow_acc_0', 'fc_35.w_0_moment1_0', 'fc_35.w_0_moment2_0', 'fc_36.w_0', 'fc_36.w_0_beta1_pow_acc_0', 
'fc_36.w_0_beta2_pow_acc_0', 'fc_36.w_0_moment1_0', 'fc_36.w_0_moment2_0', 'fc_37.w_0', 'fc_37.w_0_beta1_pow_acc_0', 'fc_37.w_0_beta2_pow_acc_0', 'fc_37.w_0_moment1_0', 'fc_37.w_0_moment2_0', 'fc_38.w_0', 'fc_38.w_0_beta1_pow_acc_0', 'fc_38.w_0_beta2_pow_acc_0', 'fc_38.w_0_moment1_0', 'fc_38.w_0_moment2_0', 'fc_39.w_0', 'fc_39.w_0_beta1_pow_acc_0', 'fc_39.w_0_beta2_pow_acc_0', 'fc_39.w_0_moment1_0', 'fc_39.w_0_moment2_0', 'fc_4.b_0', 'fc_4.b_0_beta1_pow_acc_0', 'fc_4.b_0_beta2_pow_acc_0', 'fc_4.b_0_moment1_0', 'fc_4.b_0_moment2_0', 'fc_4.w_0', 'fc_4.w_0_beta1_pow_acc_0', 'fc_4.w_0_beta2_pow_acc_0', 'fc_4.w_0_moment1_0', 'fc_4.w_0_moment2_0', 'fc_40.w_0', 'fc_40.w_0_beta1_pow_acc_0', 'fc_40.w_0_beta2_pow_acc_0', 'fc_40.w_0_moment1_0', 'fc_40.w_0_moment2_0', 'fc_41.w_0', 'fc_41.w_0_beta1_pow_acc_0', 'fc_41.w_0_beta2_pow_acc_0', 'fc_41.w_0_moment1_0', 'fc_41.w_0_moment2_0', 'fc_42.w_0', 'fc_42.w_0_beta1_pow_acc_0', 'fc_42.w_0_beta2_pow_acc_0', 'fc_42.w_0_moment1_0', 'fc_42.w_0_moment2_0', 'fc_43.w_0', 'fc_43.w_0_beta1_pow_acc_0', 'fc_43.w_0_beta2_pow_acc_0', 'fc_43.w_0_moment1_0', 'fc_43.w_0_moment2_0', 'fc_44.b_0', 'fc_44.b_0_beta1_pow_acc_0', 'fc_44.b_0_beta2_pow_acc_0', 'fc_44.b_0_moment1_0', 'fc_44.b_0_moment2_0', 'fc_44.w_0', 'fc_44.w_0_beta1_pow_acc_0', 'fc_44.w_0_beta2_pow_acc_0', 'fc_44.w_0_moment1_0', 'fc_44.w_0_moment2_0', 'fc_45.b_0', 'fc_45.b_0_beta1_pow_acc_0', 'fc_45.b_0_beta2_pow_acc_0', 'fc_45.b_0_moment1_0', 'fc_45.b_0_moment2_0', 'fc_45.w_0', 'fc_45.w_0_beta1_pow_acc_0', 'fc_45.w_0_beta2_pow_acc_0', 'fc_45.w_0_moment1_0', 'fc_45.w_0_moment2_0', 'fc_46.w_0', 'fc_46.w_0_beta1_pow_acc_0', 'fc_46.w_0_beta2_pow_acc_0', 'fc_46.w_0_moment1_0', 'fc_46.w_0_moment2_0', 'fc_47.w_0', 'fc_47.w_0_beta1_pow_acc_0', 'fc_47.w_0_beta2_pow_acc_0', 'fc_47.w_0_moment1_0', 'fc_47.w_0_moment2_0', 'fc_48.w_0', 'fc_48.w_0_beta1_pow_acc_0', 'fc_48.w_0_beta2_pow_acc_0', 'fc_48.w_0_moment1_0', 'fc_48.w_0_moment2_0', 'fc_49.w_0', 'fc_49.w_0_beta1_pow_acc_0', 
'fc_49.w_0_beta2_pow_acc_0', 'fc_49.w_0_moment1_0', 'fc_49.w_0_moment2_0', 'fc_5.b_0', 'fc_5.b_0_beta1_pow_acc_0', 'fc_5.b_0_beta2_pow_acc_0', 'fc_5.b_0_moment1_0', 'fc_5.b_0_moment2_0', 'fc_5.w_0', 'fc_5.w_0_beta1_pow_acc_0', 'fc_5.w_0_beta2_pow_acc_0', 'fc_5.w_0_moment1_0', 'fc_5.w_0_moment2_0', 'fc_50.w_0', 'fc_50.w_0_beta1_pow_acc_0', 'fc_50.w_0_beta2_pow_acc_0', 'fc_50.w_0_moment1_0', 'fc_50.w_0_moment2_0', 'fc_51.w_0', 'fc_51.w_0_beta1_pow_acc_0', 'fc_51.w_0_beta2_pow_acc_0', 'fc_51.w_0_moment1_0', 'fc_51.w_0_moment2_0', 'fc_52.w_0', 'fc_52.w_0_beta1_pow_acc_0', 'fc_52.w_0_beta2_pow_acc_0', 'fc_52.w_0_moment1_0', 'fc_52.w_0_moment2_0', 'fc_53.w_0', 'fc_53.w_0_beta1_pow_acc_0', 'fc_53.w_0_beta2_pow_acc_0', 'fc_53.w_0_moment1_0', 'fc_53.w_0_moment2_0', 'fc_54.b_0', 'fc_54.b_0_beta1_pow_acc_0', 'fc_54.b_0_beta2_pow_acc_0', 'fc_54.b_0_moment1_0', 'fc_54.b_0_moment2_0', 'fc_54.w_0', 'fc_54.w_0_beta1_pow_acc_0', 'fc_54.w_0_beta2_pow_acc_0', 'fc_54.w_0_moment1_0', 'fc_54.w_0_moment2_0', 'fc_55.b_0', 'fc_55.b_0_beta1_pow_acc_0', 'fc_55.b_0_beta2_pow_acc_0', 'fc_55.b_0_moment1_0', 'fc_55.b_0_moment2_0', 'fc_55.w_0', 'fc_55.w_0_beta1_pow_acc_0', 'fc_55.w_0_beta2_pow_acc_0', 'fc_55.w_0_moment1_0', 'fc_55.w_0_moment2_0', 'fc_56.w_0', 'fc_56.w_0_beta1_pow_acc_0', 'fc_56.w_0_beta2_pow_acc_0', 'fc_56.w_0_moment1_0', 'fc_56.w_0_moment2_0', 'fc_57.w_0', 'fc_57.w_0_beta1_pow_acc_0', 'fc_57.w_0_beta2_pow_acc_0', 'fc_57.w_0_moment1_0', 'fc_57.w_0_moment2_0', 'fc_58.w_0', 'fc_58.w_0_beta1_pow_acc_0', 'fc_58.w_0_beta2_pow_acc_0', 'fc_58.w_0_moment1_0', 'fc_58.w_0_moment2_0', 'fc_59.w_0', 'fc_59.w_0_beta1_pow_acc_0', 'fc_59.w_0_beta2_pow_acc_0', 'fc_59.w_0_moment1_0', 'fc_59.w_0_moment2_0', 'fc_6.w_0', 'fc_6.w_0_beta1_pow_acc_0', 'fc_6.w_0_beta2_pow_acc_0', 'fc_6.w_0_moment1_0', 'fc_6.w_0_moment2_0', 'fc_60.w_0', 'fc_60.w_0_beta1_pow_acc_0', 'fc_60.w_0_beta2_pow_acc_0', 'fc_60.w_0_moment1_0', 'fc_60.w_0_moment2_0', 'fc_61.w_0', 'fc_61.w_0_beta1_pow_acc_0', 
'fc_61.w_0_beta2_pow_acc_0', 'fc_61.w_0_moment1_0', 'fc_61.w_0_moment2_0', 'fc_62.w_0', 'fc_62.w_0_beta1_pow_acc_0', 'fc_62.w_0_beta2_pow_acc_0', 'fc_62.w_0_moment1_0', 'fc_62.w_0_moment2_0', 'fc_63.w_0', 'fc_63.w_0_beta1_pow_acc_0', 'fc_63.w_0_beta2_pow_acc_0', 'fc_63.w_0_moment1_0', 'fc_63.w_0_moment2_0', 'fc_64.b_0', 'fc_64.b_0_beta1_pow_acc_0', 'fc_64.b_0_beta2_pow_acc_0', 'fc_64.b_0_moment1_0', 'fc_64.b_0_moment2_0', 'fc_64.w_0', 'fc_64.w_0_beta1_pow_acc_0', 'fc_64.w_0_beta2_pow_acc_0', 'fc_64.w_0_moment1_0', 'fc_64.w_0_moment2_0', 'fc_65.b_0', 'fc_65.b_0_beta1_pow_acc_0', 'fc_65.b_0_beta2_pow_acc_0', 'fc_65.b_0_moment1_0', 'fc_65.b_0_moment2_0', 'fc_65.w_0', 'fc_65.w_0_beta1_pow_acc_0', 'fc_65.w_0_beta2_pow_acc_0', 'fc_65.w_0_moment1_0', 'fc_65.w_0_moment2_0', 'fc_66.w_0', 'fc_66.w_0_beta1_pow_acc_0', 'fc_66.w_0_beta2_pow_acc_0', 'fc_66.w_0_moment1_0', 'fc_66.w_0_moment2_0', 'fc_67.w_0', 'fc_67.w_0_beta1_pow_acc_0', 'fc_67.w_0_beta2_pow_acc_0', 'fc_67.w_0_moment1_0', 'fc_67.w_0_moment2_0', 'fc_68.w_0', 'fc_68.w_0_beta1_pow_acc_0', 'fc_68.w_0_beta2_pow_acc_0', 'fc_68.w_0_moment1_0', 'fc_68.w_0_moment2_0', 'fc_69.w_0', 'fc_69.w_0_beta1_pow_acc_0', 'fc_69.w_0_beta2_pow_acc_0', 'fc_69.w_0_moment1_0', 'fc_69.w_0_moment2_0', 'fc_7.w_0', 'fc_7.w_0_beta1_pow_acc_0', 'fc_7.w_0_beta2_pow_acc_0', 'fc_7.w_0_moment1_0', 'fc_7.w_0_moment2_0', 'fc_70.w_0', 'fc_70.w_0_beta1_pow_acc_0', 'fc_70.w_0_beta2_pow_acc_0', 'fc_70.w_0_moment1_0', 'fc_70.w_0_moment2_0', 'fc_71.w_0', 'fc_71.w_0_beta1_pow_acc_0', 'fc_71.w_0_beta2_pow_acc_0', 'fc_71.w_0_moment1_0', 'fc_71.w_0_moment2_0', 'fc_72.w_0', 'fc_72.w_0_beta1_pow_acc_0', 'fc_72.w_0_beta2_pow_acc_0', 'fc_72.w_0_moment1_0', 'fc_72.w_0_moment2_0', 'fc_73.w_0', 'fc_73.w_0_beta1_pow_acc_0', 'fc_73.w_0_beta2_pow_acc_0', 'fc_73.w_0_moment1_0', 'fc_73.w_0_moment2_0', 'fc_74.b_0', 'fc_74.b_0_beta1_pow_acc_0', 'fc_74.b_0_beta2_pow_acc_0', 'fc_74.b_0_moment1_0', 'fc_74.b_0_moment2_0', 'fc_74.w_0', 'fc_74.w_0_beta1_pow_acc_0', 
'fc_74.w_0_beta2_pow_acc_0', 'fc_74.w_0_moment1_0', 'fc_74.w_0_moment2_0', 'fc_75.b_0', 'fc_75.b_0_beta1_pow_acc_0', 'fc_75.b_0_beta2_pow_acc_0', 'fc_75.b_0_moment1_0', 'fc_75.b_0_moment2_0', 'fc_75.w_0', 'fc_75.w_0_beta1_pow_acc_0', 'fc_75.w_0_beta2_pow_acc_0', 'fc_75.w_0_moment1_0', 'fc_75.w_0_moment2_0', 'fc_76.w_0', 'fc_76.w_0_beta1_pow_acc_0', 'fc_76.w_0_beta2_pow_acc_0', 'fc_76.w_0_moment1_0', 'fc_76.w_0_moment2_0', 'fc_77.w_0', 'fc_77.w_0_beta1_pow_acc_0', 'fc_77.w_0_beta2_pow_acc_0', 'fc_77.w_0_moment1_0', 'fc_77.w_0_moment2_0', 'fc_78.w_0', 'fc_78.w_0_beta1_pow_acc_0', 'fc_78.w_0_beta2_pow_acc_0', 'fc_78.w_0_moment1_0', 'fc_78.w_0_moment2_0', 'fc_79.w_0', 'fc_79.w_0_beta1_pow_acc_0', 'fc_79.w_0_beta2_pow_acc_0', 'fc_79.w_0_moment1_0', 'fc_79.w_0_moment2_0', 'fc_8.w_0', 'fc_8.w_0_beta1_pow_acc_0', 'fc_8.w_0_beta2_pow_acc_0', 'fc_8.w_0_moment1_0', 'fc_8.w_0_moment2_0', 'fc_80.w_0', 'fc_80.w_0_beta1_pow_acc_0', 'fc_80.w_0_beta2_pow_acc_0', 'fc_80.w_0_moment1_0', 'fc_80.w_0_moment2_0', 'fc_81.w_0', 'fc_81.w_0_beta1_pow_acc_0', 'fc_81.w_0_beta2_pow_acc_0', 'fc_81.w_0_moment1_0', 'fc_81.w_0_moment2_0', 'fc_82.w_0', 'fc_82.w_0_beta1_pow_acc_0', 'fc_82.w_0_beta2_pow_acc_0', 'fc_82.w_0_moment1_0', 'fc_82.w_0_moment2_0', 'fc_83.w_0', 'fc_83.w_0_beta1_pow_acc_0', 'fc_83.w_0_beta2_pow_acc_0', 'fc_83.w_0_moment1_0', 'fc_83.w_0_moment2_0', 'fc_84.b_0', 'fc_84.b_0_beta1_pow_acc_0', 'fc_84.b_0_beta2_pow_acc_0', 'fc_84.b_0_moment1_0', 'fc_84.b_0_moment2_0', 'fc_84.w_0', 'fc_84.w_0_beta1_pow_acc_0', 'fc_84.w_0_beta2_pow_acc_0', 'fc_84.w_0_moment1_0', 'fc_84.w_0_moment2_0', 'fc_85.b_0', 'fc_85.b_0_beta1_pow_acc_0', 'fc_85.b_0_beta2_pow_acc_0', 'fc_85.b_0_moment1_0', 'fc_85.b_0_moment2_0', 'fc_85.w_0', 'fc_85.w_0_beta1_pow_acc_0', 'fc_85.w_0_beta2_pow_acc_0', 'fc_85.w_0_moment1_0', 'fc_85.w_0_moment2_0', 'fc_86.w_0', 'fc_86.w_0_beta1_pow_acc_0', 'fc_86.w_0_beta2_pow_acc_0', 'fc_86.w_0_moment1_0', 'fc_86.w_0_moment2_0', 'fc_87.w_0', 'fc_87.w_0_beta1_pow_acc_0', 
'fc_87.w_0_beta2_pow_acc_0', 'fc_87.w_0_moment1_0', 'fc_87.w_0_moment2_0', 'fc_88.w_0', 'fc_88.w_0_beta1_pow_acc_0', 'fc_88.w_0_beta2_pow_acc_0', 'fc_88.w_0_moment1_0', 'fc_88.w_0_moment2_0', 'fc_89.w_0', 'fc_89.w_0_beta1_pow_acc_0', 'fc_89.w_0_beta2_pow_acc_0', 'fc_89.w_0_moment1_0', 'fc_89.w_0_moment2_0', 'fc_9.w_0', 'fc_9.w_0_beta1_pow_acc_0', 'fc_9.w_0_beta2_pow_acc_0', 'fc_9.w_0_moment1_0', 'fc_9.w_0_moment2_0', 'fc_90.w_0', 'fc_90.w_0_beta1_pow_acc_0', 'fc_90.w_0_beta2_pow_acc_0', 'fc_90.w_0_moment1_0', 'fc_90.w_0_moment2_0', 'fc_91.w_0', 'fc_91.w_0_beta1_pow_acc_0', 'fc_91.w_0_beta2_pow_acc_0', 'fc_91.w_0_moment1_0', 'fc_91.w_0_moment2_0', 'fc_92.w_0', 'fc_92.w_0_beta1_pow_acc_0', 'fc_92.w_0_beta2_pow_acc_0', 'fc_92.w_0_moment1_0', 'fc_92.w_0_moment2_0', 'fc_93.w_0', 'fc_93.w_0_beta1_pow_acc_0', 'fc_93.w_0_beta2_pow_acc_0', 'fc_93.w_0_moment1_0', 'fc_93.w_0_moment2_0', 'fc_94.b_0', 'fc_94.b_0_beta1_pow_acc_0', 'fc_94.b_0_beta2_pow_acc_0', 'fc_94.b_0_moment1_0', 'fc_94.b_0_moment2_0', 'fc_94.w_0', 'fc_94.w_0_beta1_pow_acc_0', 'fc_94.w_0_beta2_pow_acc_0', 'fc_94.w_0_moment1_0', 'fc_94.w_0_moment2_0', 'fc_95.b_0', 'fc_95.b_0_beta1_pow_acc_0', 'fc_95.b_0_beta2_pow_acc_0', 'fc_95.b_0_moment1_0', 'fc_95.b_0_moment2_0', 'fc_95.w_0', 'fc_95.w_0_beta1_pow_acc_0', 'fc_95.w_0_beta2_pow_acc_0', 'fc_95.w_0_moment1_0', 'fc_95.w_0_moment2_0', 'fc_96.w_0', 'fc_96.w_0_beta1_pow_acc_0', 'fc_96.w_0_beta2_pow_acc_0', 'fc_96.w_0_moment1_0', 'fc_96.w_0_moment2_0', 'layer_norm_0.b_0', 'layer_norm_0.b_0_beta1_pow_acc_0', 'layer_norm_0.b_0_beta2_pow_acc_0', 'layer_norm_0.b_0_moment1_0', 'layer_norm_0.b_0_moment2_0', 'layer_norm_0.tmp_2.scale', 'layer_norm_0.w_0', 'layer_norm_0.w_0_beta1_pow_acc_0', 'layer_norm_0.w_0_beta2_pow_acc_0', 'layer_norm_0.w_0_moment1_0', 'layer_norm_0.w_0_moment2_0', 'layer_norm_1.b_0', 'layer_norm_1.b_0_beta1_pow_acc_0', 'layer_norm_1.b_0_beta2_pow_acc_0', 'layer_norm_1.b_0_moment1_0', 'layer_norm_1.b_0_moment2_0', 'layer_norm_1.tmp_2.scale', 
'layer_norm_1.w_0', 'layer_norm_1.w_0_beta1_pow_acc_0', 'layer_norm_1.w_0_beta2_pow_acc_0', 'layer_norm_1.w_0_moment1_0', 'layer_norm_1.w_0_moment2_0', 'layer_norm_10.b_0', 'layer_norm_10.b_0_beta1_pow_acc_0', 'layer_norm_10.b_0_beta2_pow_acc_0', 'layer_norm_10.b_0_moment1_0', 'layer_norm_10.b_0_moment2_0', 'layer_norm_10.tmp_2.scale', 'layer_norm_10.w_0', 'layer_norm_10.w_0_beta1_pow_acc_0', 'layer_norm_10.w_0_beta2_pow_acc_0', 'layer_norm_10.w_0_moment1_0', 'layer_norm_10.w_0_moment2_0', 'layer_norm_11.b_0', 'layer_norm_11.b_0_beta1_pow_acc_0', 'layer_norm_11.b_0_beta2_pow_acc_0', 'layer_norm_11.b_0_moment1_0', 'layer_norm_11.b_0_moment2_0', 'layer_norm_11.tmp_2.scale', 'layer_norm_11.w_0', 'layer_norm_11.w_0_beta1_pow_acc_0', 'layer_norm_11.w_0_beta2_pow_acc_0', 'layer_norm_11.w_0_moment1_0', 'layer_norm_11.w_0_moment2_0', 'layer_norm_12.b_0', 'layer_norm_12.b_0_beta1_pow_acc_0', 'layer_norm_12.b_0_beta2_pow_acc_0', 'layer_norm_12.b_0_moment1_0', 'layer_norm_12.b_0_moment2_0', 'layer_norm_12.tmp_2.scale', 'layer_norm_12.w_0', 'layer_norm_12.w_0_beta1_pow_acc_0', 'layer_norm_12.w_0_beta2_pow_acc_0', 'layer_norm_12.w_0_moment1_0', 'layer_norm_12.w_0_moment2_0', 'layer_norm_13.b_0', 'layer_norm_13.b_0_beta1_pow_acc_0', 'layer_norm_13.b_0_beta2_pow_acc_0', 'layer_norm_13.b_0_moment1_0', 'layer_norm_13.b_0_moment2_0', 'layer_norm_13.tmp_2.scale', 'layer_norm_13.w_0', 'layer_norm_13.w_0_beta1_pow_acc_0', 'layer_norm_13.w_0_beta2_pow_acc_0', 'layer_norm_13.w_0_moment1_0', 'layer_norm_13.w_0_moment2_0', 'layer_norm_14.b_0', 'layer_norm_14.b_0_beta1_pow_acc_0', 'layer_norm_14.b_0_beta2_pow_acc_0', 'layer_norm_14.b_0_moment1_0', 'layer_norm_14.b_0_moment2_0', 'layer_norm_14.tmp_2.scale', 'layer_norm_14.w_0', 'layer_norm_14.w_0_beta1_pow_acc_0', 'layer_norm_14.w_0_beta2_pow_acc_0', 'layer_norm_14.w_0_moment1_0', 'layer_norm_14.w_0_moment2_0', 'layer_norm_15.b_0', 'layer_norm_15.b_0_beta1_pow_acc_0', 'layer_norm_15.b_0_beta2_pow_acc_0', 'layer_norm_15.b_0_moment1_0', 
'layer_norm_15.b_0_moment2_0', 'layer_norm_15.tmp_2.scale', 'layer_norm_15.w_0', 'layer_norm_15.w_0_beta1_pow_acc_0', 'layer_norm_15.w_0_beta2_pow_acc_0', 'layer_norm_15.w_0_moment1_0', 'layer_norm_15.w_0_moment2_0', 'layer_norm_16.b_0', 'layer_norm_16.b_0_beta1_pow_acc_0', 'layer_norm_16.b_0_beta2_pow_acc_0', 'layer_norm_16.b_0_moment1_0', 'layer_norm_16.b_0_moment2_0', 'layer_norm_16.tmp_2.scale', 'layer_norm_16.w_0', 'layer_norm_16.w_0_beta1_pow_acc_0', 'layer_norm_16.w_0_beta2_pow_acc_0', 'layer_norm_16.w_0_moment1_0', 'layer_norm_16.w_0_moment2_0', 'layer_norm_17.b_0', 'layer_norm_17.b_0_beta1_pow_acc_0', 'layer_norm_17.b_0_beta2_pow_acc_0', 'layer_norm_17.b_0_moment1_0', 'layer_norm_17.b_0_moment2_0', 'layer_norm_17.tmp_2.scale', 'layer_norm_17.w_0', 'layer_norm_17.w_0_beta1_pow_acc_0', 'layer_norm_17.w_0_beta2_pow_acc_0', 'layer_norm_17.w_0_moment1_0', 'layer_norm_17.w_0_moment2_0', 'layer_norm_18.b_0', 'layer_norm_18.b_0_beta1_pow_acc_0', 'layer_norm_18.b_0_beta2_pow_acc_0', 'layer_norm_18.b_0_moment1_0', 'layer_norm_18.b_0_moment2_0', 'layer_norm_18.tmp_2.scale', 'layer_norm_18.w_0', 'layer_norm_18.w_0_beta1_pow_acc_0', 'layer_norm_18.w_0_beta2_pow_acc_0', 'layer_norm_18.w_0_moment1_0', 'layer_norm_18.w_0_moment2_0', 'layer_norm_19.b_0', 'layer_norm_19.b_0_beta1_pow_acc_0', 'layer_norm_19.b_0_beta2_pow_acc_0', 'layer_norm_19.b_0_moment1_0', 'layer_norm_19.b_0_moment2_0', 'layer_norm_19.tmp_2.scale', 'layer_norm_19.w_0', 'layer_norm_19.w_0_beta1_pow_acc_0', 'layer_norm_19.w_0_beta2_pow_acc_0', 'layer_norm_19.w_0_moment1_0', 'layer_norm_19.w_0_moment2_0', 'layer_norm_2.b_0', 'layer_norm_2.b_0_beta1_pow_acc_0', 'layer_norm_2.b_0_beta2_pow_acc_0', 'layer_norm_2.b_0_moment1_0', 'layer_norm_2.b_0_moment2_0', 'layer_norm_2.tmp_2.scale', 'layer_norm_2.w_0', 'layer_norm_2.w_0_beta1_pow_acc_0', 'layer_norm_2.w_0_beta2_pow_acc_0', 'layer_norm_2.w_0_moment1_0', 'layer_norm_2.w_0_moment2_0', 'layer_norm_20.b_0', 'layer_norm_20.b_0_beta1_pow_acc_0', 
'layer_norm_20.b_0_beta2_pow_acc_0', 'layer_norm_20.b_0_moment1_0', 'layer_norm_20.b_0_moment2_0', 'layer_norm_20.tmp_2.scale', 'layer_norm_20.w_0', 'layer_norm_20.w_0_beta1_pow_acc_0', 'layer_norm_20.w_0_beta2_pow_acc_0', 'layer_norm_20.w_0_moment1_0', 'layer_norm_20.w_0_moment2_0', 'layer_norm_21.b_0', 'layer_norm_21.b_0_beta1_pow_acc_0', 'layer_norm_21.b_0_beta2_pow_acc_0', 'layer_norm_21.b_0_moment1_0', 'layer_norm_21.b_0_moment2_0', 'layer_norm_21.tmp_2.scale', 'layer_norm_21.w_0', 'layer_norm_21.w_0_beta1_pow_acc_0', 'layer_norm_21.w_0_beta2_pow_acc_0', 'layer_norm_21.w_0_moment1_0', 'layer_norm_21.w_0_moment2_0', 'layer_norm_22.b_0', 'layer_norm_22.b_0_beta1_pow_acc_0', 'layer_norm_22.b_0_beta2_pow_acc_0', 'layer_norm_22.b_0_moment1_0', 'layer_norm_22.b_0_moment2_0', 'layer_norm_22.tmp_2.scale', 'layer_norm_22.w_0', 'layer_norm_22.w_0_beta1_pow_acc_0', 'layer_norm_22.w_0_beta2_pow_acc_0', 'layer_norm_22.w_0_moment1_0', 'layer_norm_22.w_0_moment2_0', 'layer_norm_23.b_0', 'layer_norm_23.b_0_beta1_pow_acc_0', 'layer_norm_23.b_0_beta2_pow_acc_0', 'layer_norm_23.b_0_moment1_0', 'layer_norm_23.b_0_moment2_0', 'layer_norm_23.tmp_2.scale', 'layer_norm_23.w_0', 'layer_norm_23.w_0_beta1_pow_acc_0', 'layer_norm_23.w_0_beta2_pow_acc_0', 'layer_norm_23.w_0_moment1_0', 'layer_norm_23.w_0_moment2_0', 'layer_norm_24.b_0', 'layer_norm_24.b_0_beta1_pow_acc_0', 'layer_norm_24.b_0_beta2_pow_acc_0', 'layer_norm_24.b_0_moment1_0', 'layer_norm_24.b_0_moment2_0', 'layer_norm_24.tmp_2.scale', 'layer_norm_24.w_0', 'layer_norm_24.w_0_beta1_pow_acc_0', 'layer_norm_24.w_0_beta2_pow_acc_0', 'layer_norm_24.w_0_moment1_0', 'layer_norm_24.w_0_moment2_0', 'layer_norm_25.b_0', 'layer_norm_25.b_0_beta1_pow_acc_0', 'layer_norm_25.b_0_beta2_pow_acc_0', 'layer_norm_25.b_0_moment1_0', 'layer_norm_25.b_0_moment2_0', 'layer_norm_25.tmp_2.scale', 'layer_norm_25.w_0', 'layer_norm_25.w_0_beta1_pow_acc_0', 'layer_norm_25.w_0_beta2_pow_acc_0', 'layer_norm_25.w_0_moment1_0', 'layer_norm_25.w_0_moment2_0', 
'layer_norm_26.b_0', 'layer_norm_26.b_0_beta1_pow_acc_0', 'layer_norm_26.b_0_beta2_pow_acc_0', 'layer_norm_26.b_0_moment1_0', 'layer_norm_26.b_0_moment2_0', 'layer_norm_26.tmp_2.scale', 'layer_norm_26.w_0', 'layer_norm_26.w_0_beta1_pow_acc_0', 'layer_norm_26.w_0_beta2_pow_acc_0', 'layer_norm_26.w_0_moment1_0', 'layer_norm_26.w_0_moment2_0', 'layer_norm_27.b_0', 'layer_norm_27.b_0_beta1_pow_acc_0', 'layer_norm_27.b_0_beta2_pow_acc_0', 'layer_norm_27.b_0_moment1_0', 'layer_norm_27.b_0_moment2_0', 'layer_norm_27.tmp_2.scale', 'layer_norm_27.w_0', 'layer_norm_27.w_0_beta1_pow_acc_0', 'layer_norm_27.w_0_beta2_pow_acc_0', 'layer_norm_27.w_0_moment1_0', 'layer_norm_27.w_0_moment2_0', 'layer_norm_28.b_0', 'layer_norm_28.b_0_beta1_pow_acc_0', 'layer_norm_28.b_0_beta2_pow_acc_0', 'layer_norm_28.b_0_moment1_0', 'layer_norm_28.b_0_moment2_0', 'layer_norm_28.tmp_2.scale', 'layer_norm_28.w_0', 'layer_norm_28.w_0_beta1_pow_acc_0', 'layer_norm_28.w_0_beta2_pow_acc_0', 'layer_norm_28.w_0_moment1_0', 'layer_norm_28.w_0_moment2_0', 'layer_norm_29.b_0', 'layer_norm_29.b_0_beta1_pow_acc_0', 'layer_norm_29.b_0_beta2_pow_acc_0', 'layer_norm_29.b_0_moment1_0', 'layer_norm_29.b_0_moment2_0', 'layer_norm_29.tmp_2.scale', 'layer_norm_29.w_0', 'layer_norm_29.w_0_beta1_pow_acc_0', 'layer_norm_29.w_0_beta2_pow_acc_0', 'layer_norm_29.w_0_moment1_0', 'layer_norm_29.w_0_moment2_0', 'layer_norm_3.b_0', 'layer_norm_3.b_0_beta1_pow_acc_0', 'layer_norm_3.b_0_beta2_pow_acc_0', 'layer_norm_3.b_0_moment1_0', 'layer_norm_3.b_0_moment2_0', 'layer_norm_3.tmp_2.scale', 'layer_norm_3.w_0', 'layer_norm_3.w_0_beta1_pow_acc_0', 'layer_norm_3.w_0_beta2_pow_acc_0', 'layer_norm_3.w_0_moment1_0', 'layer_norm_3.w_0_moment2_0', 'layer_norm_30.b_0', 'layer_norm_30.b_0_beta1_pow_acc_0', 'layer_norm_30.b_0_beta2_pow_acc_0', 'layer_norm_30.b_0_moment1_0', 'layer_norm_30.b_0_moment2_0', 'layer_norm_30.tmp_2.scale', 'layer_norm_30.w_0', 'layer_norm_30.w_0_beta1_pow_acc_0', 'layer_norm_30.w_0_beta2_pow_acc_0', 
'layer_norm_30.w_0_moment1_0', 'layer_norm_30.w_0_moment2_0', 'layer_norm_31.b_0', 'layer_norm_31.b_0_beta1_pow_acc_0', 'layer_norm_31.b_0_beta2_pow_acc_0', 'layer_norm_31.b_0_moment1_0', 'layer_norm_31.b_0_moment2_0', 'layer_norm_31.tmp_2.scale', 'layer_norm_31.w_0', 'layer_norm_31.w_0_beta1_pow_acc_0', 'layer_norm_31.w_0_beta2_pow_acc_0', 'layer_norm_31.w_0_moment1_0', 'layer_norm_31.w_0_moment2_0', 'layer_norm_4.b_0', 'layer_norm_4.b_0_beta1_pow_acc_0', 'layer_norm_4.b_0_beta2_pow_acc_0', 'layer_norm_4.b_0_moment1_0', 'layer_norm_4.b_0_moment2_0', 'layer_norm_4.tmp_2.scale', 'layer_norm_4.w_0', 'layer_norm_4.w_0_beta1_pow_acc_0', 'layer_norm_4.w_0_beta2_pow_acc_0', 'layer_norm_4.w_0_moment1_0', 'layer_norm_4.w_0_moment2_0', 'layer_norm_5.b_0', 'layer_norm_5.b_0_beta1_pow_acc_0', 'layer_norm_5.b_0_beta2_pow_acc_0', 'layer_norm_5.b_0_moment1_0', 'layer_norm_5.b_0_moment2_0', 'layer_norm_5.tmp_2.scale', 'layer_norm_5.w_0', 'layer_norm_5.w_0_beta1_pow_acc_0', 'layer_norm_5.w_0_beta2_pow_acc_0', 'layer_norm_5.w_0_moment1_0', 'layer_norm_5.w_0_moment2_0', 'layer_norm_6.b_0', 'layer_norm_6.b_0_beta1_pow_acc_0', 'layer_norm_6.b_0_beta2_pow_acc_0', 'layer_norm_6.b_0_moment1_0', 'layer_norm_6.b_0_moment2_0', 'layer_norm_6.tmp_2.scale', 'layer_norm_6.w_0', 'layer_norm_6.w_0_beta1_pow_acc_0', 'layer_norm_6.w_0_beta2_pow_acc_0', 'layer_norm_6.w_0_moment1_0', 'layer_norm_6.w_0_moment2_0', 'layer_norm_7.b_0', 'layer_norm_7.b_0_beta1_pow_acc_0', 'layer_norm_7.b_0_beta2_pow_acc_0', 'layer_norm_7.b_0_moment1_0', 'layer_norm_7.b_0_moment2_0', 'layer_norm_7.tmp_2.scale', 'layer_norm_7.w_0', 'layer_norm_7.w_0_beta1_pow_acc_0', 'layer_norm_7.w_0_beta2_pow_acc_0', 'layer_norm_7.w_0_moment1_0', 'layer_norm_7.w_0_moment2_0', 'layer_norm_8.b_0', 'layer_norm_8.b_0_beta1_pow_acc_0', 'layer_norm_8.b_0_beta2_pow_acc_0', 'layer_norm_8.b_0_moment1_0', 'layer_norm_8.b_0_moment2_0', 'layer_norm_8.tmp_2.scale', 'layer_norm_8.w_0', 'layer_norm_8.w_0_beta1_pow_acc_0', 
'layer_norm_8.w_0_beta2_pow_acc_0', 'layer_norm_8.w_0_moment1_0', 'layer_norm_8.w_0_moment2_0', 'layer_norm_9.b_0', 'layer_norm_9.b_0_beta1_pow_acc_0', 'layer_norm_9.b_0_beta2_pow_acc_0', 'layer_norm_9.b_0_moment1_0', 'layer_norm_9.b_0_moment2_0', 'layer_norm_9.tmp_2.scale', 'layer_norm_9.w_0', 'layer_norm_9.w_0_beta1_pow_acc_0', 'layer_norm_9.w_0_beta2_pow_acc_0', 'layer_norm_9.w_0_moment1_0', 'layer_norm_9.w_0_moment2_0', 'src_pos_enc_table', 'src_word_emb_table', 'src_word_emb_table_beta1_pow_acc_0', 'src_word_emb_table_beta2_pow_acc_0', 'src_word_emb_table_moment1_0', 'src_word_emb_table_moment2_0', 'state_0', 'state_1', 'state_10', 'state_11', 'state_12', 'state_13', 'state_14', 'state_15', 'state_16', 'state_17', 'state_18', 'state_19', 'state_2', 'state_20', 'state_21', 'state_22', 'state_23', 'state_24', 'state_25', 'state_26', 'state_27', 'state_28', 'state_29', 'state_3', 'state_30', 'state_31', 'state_32', 'state_33', 'state_34', 'state_35', 'state_36', 'state_37', 'state_38', 'state_39', 'state_4', 'state_40', 'state_41', 'state_42', 'state_43', 'state_44', 'state_45', 'state_46', 'state_47', 'state_48', 'state_49', 'state_5', 'state_50', 'state_51', 'state_52', 'state_53', 'state_54', 'state_55', 'state_56', 'state_57', 'state_58', 'state_59', 'state_6', 'state_60', 'state_61', 'state_7', 'state_8', 'state_9', 'transpose_11.tmp_0.scale', 'transpose_15.tmp_0.scale', 'transpose_19.tmp_0.scale', 'transpose_23.tmp_0.scale', 'transpose_27.tmp_0.scale', 'transpose_3.tmp_0.scale', 'transpose_31.tmp_0.scale', 'transpose_35.tmp_0.scale', 'transpose_39.tmp_0.scale', 'transpose_43.tmp_0.scale', 'transpose_47.tmp_0.scale', 'transpose_51.tmp_0.scale', 'transpose_55.tmp_0.scale', 'transpose_59.tmp_0.scale', 'transpose_63.tmp_0.scale', 'transpose_67.tmp_0.scale', 'transpose_7.tmp_0.scale', 'transpose_71.tmp_0.scale', 'trg_pos_enc_table', 'trg_word_emb_table', 'trg_word_emb_table_beta1_pow_acc_0', 'trg_word_emb_table_beta2_pow_acc_0', 'trg_word_emb_table_moment1_0', 
'trg_word_emb_table_moment2_0'], name: "reduce_sum_0.tmp_0"
type {
  type: LOD_TENSOR
  lod_tensor {
    tensor {
      data_type: FP32
      dims: 1
    }
  }
}
persistable: true
, <paddle.fluid.core_avx._Scope object at 0x7f2fbb6eaab0>, [], <paddle.fluid.core_avx.ParallelExecutor.ExecutionStrategy object at 0x7f2fa851a6f8>, <paddle.fluid.core_avx.ParallelExecutor.BuildStrategy object at 0x7f2fa81396c0>, <paddle.fluid.core_avx.Graph object at 0x7f2fa81bc9d0>
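Reading the `Invoked with:` dump against the expected signature, `arg1` must be a `List[str]`, but the list passed in contains a Variable (the entry printed as `name: "reduce_sum_0.tmp_0" ...`, i.e. `sum_cost` itself). A likely cause is `loss_name=sum_cost` in the `with_data_parallel` call above: `loss_name` expects the variable's name string, not the Variable object. A minimal framework-free sketch of the mismatch, using hypothetical stand-ins for the Fluid classes:

```python
# Hypothetical stand-ins for fluid's Variable and with_data_parallel, only to
# illustrate the str-vs-Variable mismatch; these are NOT the real internals.

class Variable:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'name: "%s" ...' % self.name


def with_data_parallel(loss_name):
    # The pybind11-bound ParallelExecutor constructor requires plain strings
    # (arg1: List[str]); a non-str entry raises the TypeError seen above.
    if not isinstance(loss_name, str):
        raise TypeError("__init__(): incompatible constructor arguments, "
                        "invoked with %r" % loss_name)
    return "compiled with loss_name=%s" % loss_name


sum_cost = Variable("reduce_sum_0.tmp_0")

try:
    with_data_parallel(loss_name=sum_cost)       # mirrors loss_name=sum_cost
except TypeError as e:
    print(e)                                     # reproduces the failure mode

print(with_data_parallel(loss_name=sum_cost.name))  # pass the name string
```

If this is indeed the cause, changing `loss_name=sum_cost` to `loss_name=sum_cost.name` should get past the TypeError; it is also worth hoisting the `with_data_parallel` call out of the training loop so the program is compiled only once.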
Could you paste a bit more of the code?
Sure, I've tidied it up and pasted it below:
import argparse
import ast
import copy
import logging
import multiprocessing
import subprocess
import os
import six
import sys
import time
import numpy as np
import paddle.fluid as fluid
import base_reader
from config import *
import data_rader
from model import transformer
import dist_utils
import shutil
import quant.quanter as quant
def train_loop(exe,
               train_prog,
               startup_prog,
               dev_count,
               sum_cost,
               avg_cost,
               token_num,
               pyreader):
    # Initialize the parameters.
    train_data, _ = data_rader.prepare_data_generator(
        args,
        is_test=False,
        count=dev_count,
        pyreader=pyreader,
        py_reader_provider_wrapper=data_rader.py_reader_provider_wrapper)
    # For a faster executor.
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_iteration_per_drop_scope = int(args.fetch_steps)
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = False
    build_strategy.enable_inplace = True
    sum_cost.persistable = True
    token_num.persistable = True
    # Since the token number differs among devices, customize the gradient
    # scale to use the token-average cost across devices, i.e. a gradient
    # scale of `1 / token_number` for the average cost.
    # build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized
    build_strategy.fuse_all_optimizer_ops = True
    build_strategy.fuse_all_reduce_ops = False
    build_strategy.sync_batch_norm = False
    if num_trainers > 1 and args.use_py_reader and TrainTaskConfig.use_gpu:
        dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
        exec_strategy.num_threads = 1
    train_exe = fluid.Executor(exe.place)
    if args.use_py_reader:
        pyreader.start()
        data_generator = None
    else:
        data_generator = train_data()
    batch_id = 0
    while True:
        try:
            feed_dict_list = data_rader.prepare_feed_dict_list(
                data_generator, init_flag, dev_count)
            train_cp = train_prog.with_data_parallel(
                loss_name=sum_cost,
                build_strategy=build_strategy,
                exec_strategy=exec_strategy)
            outs = train_exe.run(train_cp,
                                 fetch_list=[sum_cost.name, token_num.name],
                                 feed=feed_dict_list)
            batch_id += 1
            step_idx += 1
def train(args):
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id)
dev_count = get_device_num() #= 3
exe = fluid.Executor(place)
train_prog = fluid.Program()
startup_prog = fluid.Program()
if args.enable_ce:
train_prog.random_seed = 1000
startup_prog.random_seed = 1000
with fluid.program_guard(train_prog, startup_prog):
with fluid.unique_name.guard():
sum_cost, avg_cost, predict, token_num, pyreader = transformer( #网络
ModelHyperParams.src_vocab_size,
ModelHyperParams.trg_vocab_size,
ModelHyperParams.max_length + 1,
ModelHyperParams.n_layer,
ModelHyperParams.n_head,
ModelHyperParams.d_key,
ModelHyperParams.d_value,
ModelHyperParams.d_model,
ModelHyperParams.d_inner_hid,
ModelHyperParams.prepostprocess_dropout,
ModelHyperParams.attention_dropout,
ModelHyperParams.relu_dropout,
ModelHyperParams.preprocess_cmd,
ModelHyperParams.postprocess_cmd,
ModelHyperParams.weight_sharing,
TrainTaskConfig.label_smooth_eps,
ModelHyperParams.bos_idx,
use_py_reader=args.use_py_reader,
is_test=False)
if args.sync:
lr_decay = fluid.layers.learning_rate_scheduler.noam_decay(
ModelHyperParams.d_model, TrainTaskConfig.warmup_steps)
logging.info("before adam")
with fluid.default_main_program()._lr_schedule_guard():
learning_rate = lr_decay * TrainTaskConfig.learning_rate
optimizer = fluid.optimizer.Adam(
learning_rate=learning_rate,
beta1=TrainTaskConfig.beta1,
beta2=TrainTaskConfig.beta2,
epsilon=TrainTaskConfig.eps)
else:
optimizer = fluid.optimizer.SGD(0.003)
optimizer.minimize(avg_cost)
exe.run(startup_prog) # to init pyreader for training
#load 预训练模型参数
fluid.io.load_persistables(
exe, TrainTaskConfig.ckpt_path, main_program=train_prog)
else:
logging.info("init fluid.framework.default_startup_program")
exe.run(startup_prog)
#量化
quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)
startup_prog1 = quant_program
#量化后训练
train_loop(exe, quant_program, startup_prog1, dev_count, sum_cost, avg_cost,
token_num, pyreader)
if name == "main":
train(args)
Try moving the following code outside the while loop:
    train_cp = train_prog.with_data_parallel(
        loss_name=sum_cost,
        build_strategy=build_strategy,
        exec_strategy=exec_strategy)
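For reference, a minimal sketch of that restructuring, using the variable names from the code above. The `except` clause is an assumption on my part, as is `loss_name=sum_cost.name`: as far as I know, `with_data_parallel` expects the loss variable's name string rather than the Variable itself, which may be a second problem here.

```python
# Compile the quantized program for multi-GPU execution once, before the loop.
train_cp = train_prog.with_data_parallel(
    loss_name=sum_cost.name,  # pass the name string, not the Variable (assumption)
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)

while True:
    try:
        feed_dict_list = data_rader.prepare_feed_dict_list(
            data_generator, init_flag, dev_count)
        # Reuse the same CompiledProgram on every iteration.
        outs = train_exe.run(train_cp,
                             fetch_list=[sum_cost.name, token_num.name],
                             feed=feed_dict_list)
        batch_id += 1
    except fluid.core.EOFException:
        # data exhausted; reset the reader and leave the loop
        pyreader.reset()
        break
```

Recompiling inside the loop builds a fresh CompiledProgram every iteration, which is both slow and a likely source of the executor-side mismatch.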
I moved it outside the while loop, but I still get the same error.
Which version of Paddle are you using?
1.7.0