
Comments (8)

slf12 commented on May 11, 2024

After running
quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)
val_program = fluid.default_main_program().clone(for_test=True)
and then training quant_program, I found that fluid.ParallelExecutor cannot be used. It reports:
AttributeError: 'CompiledProgram' object has no attribute '_enable_dgc'

quant_program is of type CompiledProgram, so you can use CompiledProgram.with_data_parallel for multi-GPU parallel training. This is a different mechanism from ParallelExecutor, but both provide multi-GPU parallelism.

from paddleslim.

dlkht commented on May 11, 2024

I am using CompiledProgram.with_data_parallel for multi-GPU training. The run part of the code is as follows:

train_cp = train_prog.with_data_parallel(
    loss_name=sum_cost,
    build_strategy=build_strategy,
    exec_strategy=exec_strategy)

outs = train_exe.run(train_cp,
    fetch_list=[sum_cost.name, token_num.name],
    feed=feed_dict_list)

After running it, the following error appears. What could be the cause?

Traceback (most recent call last):
File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/qzy3/pycharm1/pycharm-community-2018.1.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 547, in <module>
train(args)
File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 534, in train
token_num, pyreader)
File "/home/qzy3/transformer/ver7-len80/transf-quant-1/train.py", line 368, in train_loop
feed=feed_dict_list)
File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
use_program_cache=use_program_cache)
File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/executor.py", line 833, in _run_impl
program._compile(scope, self.place)
File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 424, in _compile
places=self._places)
File "/home/qzy3/.local/lib/python3.7/site-packages/paddle/fluid/compiler.py", line 377, in _compile_data_parallel
self._exec_strategy, self._build_strategy, self._graph)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. paddle.fluid.core_avx.ParallelExecutor(arg0: List[paddle.fluid.core_avx.Place], arg1: List[str], arg2: str, arg3: paddle.fluid.core_avx._Scope, arg4: List[paddle.fluid.core_avx._Scope], arg5: paddle.fluid.core_avx.ParallelExecutor.ExecutionStrategy, arg6: paddle.fluid.core_avx.ParallelExecutor.BuildStrategy, arg7: paddle::framework::ir::Graph)

Invoked with: [<paddle.fluid.core_avx.Place object at 0x7f2eab27ff10>, <paddle.fluid.core_avx.Place object at 0x7f2eab27fed8>, <paddle.fluid.core_avx.Place object at 0x7f2eab27fea0>], ['@LR_DECAY_COUNTER@', 'accum_0', 'accum_1', 'accum_10', 'accum_11', 'accum_12', 'accum_13', 'accum_14', 'accum_15', 'accum_16', 'accum_17', 'accum_18', 'accum_19', 'accum_2', 'accum_20', 'accum_21', 'accum_22', 'accum_23', 'accum_24', 'accum_25', 'accum_26', 'accum_27', 'accum_28', 'accum_29', 'accum_3', 'accum_30', 'accum_31', 'accum_32', 'accum_33', 'accum_34', 'accum_35', 'accum_36', 'accum_37', 'accum_38', 'accum_39', 'accum_4', 'accum_40', 'accum_41', 'accum_42', 'accum_43', 'accum_44', 'accum_45', 'accum_46', 'accum_47', 'accum_48', 'accum_49', 'accum_5', 'accum_50', 'accum_51', 'accum_52', 'accum_53', 'accum_54', 'accum_55', 'accum_56', 'accum_57', 'accum_58', 'accum_59', 'accum_6', 'accum_60', 'accum_61', 'accum_7', 'accum_8', 'accum_9', 'dropout_11.tmp_0.scale', 'dropout_15.tmp_0.scale', 'dropout_19.tmp_0.scale', 'dropout_23.tmp_0.scale', 'dropout_3.tmp_0.scale', 'dropout_30.tmp_0.scale', 'dropout_36.tmp_0.scale', 'dropout_42.tmp_0.scale', 'dropout_48.tmp_0.scale', 'dropout_54.tmp_0.scale', 'dropout_60.tmp_0.scale', 'dropout_7.tmp_0.scale', 'fc_0.w_0', 'fc_0.w_0_beta1_pow_acc_0', 'fc_0.w_0_beta2_pow_acc_0', 'fc_0.w_0_moment1_0', 'fc_0.w_0_moment2_0', 'fc_1.w_0', 'fc_1.w_0_beta1_pow_acc_0', 'fc_1.w_0_beta2_pow_acc_0', 'fc_1.w_0_moment1_0', 'fc_1.w_0_moment2_0', 'fc_10.b_0', 'fc_10.b_0_beta1_pow_acc_0', 'fc_10.b_0_beta2_pow_acc_0', 'fc_10.b_0_moment1_0', 'fc_10.b_0_moment2_0', 'fc_10.w_0', 'fc_10.w_0_beta1_pow_acc_0', 'fc_10.w_0_beta2_pow_acc_0', 'fc_10.w_0_moment1_0', 'fc_10.w_0_moment2_0', 'fc_11.b_0', 'fc_11.b_0_beta1_pow_acc_0', 'fc_11.b_0_beta2_pow_acc_0', 'fc_11.b_0_moment1_0', 'fc_11.b_0_moment2_0', 'fc_11.w_0', 'fc_11.w_0_beta1_pow_acc_0', 'fc_11.w_0_beta2_pow_acc_0', 'fc_11.w_0_moment1_0', 'fc_11.w_0_moment2_0', 'fc_12.w_0', 'fc_12.w_0_beta1_pow_acc_0', 
'fc_12.w_0_beta2_pow_acc_0', 'fc_12.w_0_moment1_0', 'fc_12.w_0_moment2_0', 'fc_13.w_0', 'fc_13.w_0_beta1_pow_acc_0', 'fc_13.w_0_beta2_pow_acc_0', 'fc_13.w_0_moment1_0', 'fc_13.w_0_moment2_0', 'fc_14.w_0', 'fc_14.w_0_beta1_pow_acc_0', 'fc_14.w_0_beta2_pow_acc_0', 'fc_14.w_0_moment1_0', 'fc_14.w_0_moment2_0', 'fc_15.w_0', 'fc_15.w_0_beta1_pow_acc_0', 'fc_15.w_0_beta2_pow_acc_0', 'fc_15.w_0_moment1_0', 'fc_15.w_0_moment2_0', 'fc_16.b_0', 'fc_16.b_0_beta1_pow_acc_0', 'fc_16.b_0_beta2_pow_acc_0', 'fc_16.b_0_moment1_0', 'fc_16.b_0_moment2_0', 'fc_16.w_0', 'fc_16.w_0_beta1_pow_acc_0', 'fc_16.w_0_beta2_pow_acc_0', 'fc_16.w_0_moment1_0', 'fc_16.w_0_moment2_0', 'fc_17.b_0', 'fc_17.b_0_beta1_pow_acc_0', 'fc_17.b_0_beta2_pow_acc_0', 'fc_17.b_0_moment1_0', 'fc_17.b_0_moment2_0', 'fc_17.w_0', 'fc_17.w_0_beta1_pow_acc_0', 'fc_17.w_0_beta2_pow_acc_0', 'fc_17.w_0_moment1_0', 'fc_17.w_0_moment2_0', 'fc_18.w_0', 'fc_18.w_0_beta1_pow_acc_0', 'fc_18.w_0_beta2_pow_acc_0', 'fc_18.w_0_moment1_0', 'fc_18.w_0_moment2_0', 'fc_19.w_0', 'fc_19.w_0_beta1_pow_acc_0', 'fc_19.w_0_beta2_pow_acc_0', 'fc_19.w_0_moment1_0', 'fc_19.w_0_moment2_0', 'fc_2.w_0', 'fc_2.w_0_beta1_pow_acc_0', 'fc_2.w_0_beta2_pow_acc_0', 'fc_2.w_0_moment1_0', 'fc_2.w_0_moment2_0', 'fc_20.w_0', 'fc_20.w_0_beta1_pow_acc_0', 'fc_20.w_0_beta2_pow_acc_0', 'fc_20.w_0_moment1_0', 'fc_20.w_0_moment2_0', 'fc_21.w_0', 'fc_21.w_0_beta1_pow_acc_0', 'fc_21.w_0_beta2_pow_acc_0', 'fc_21.w_0_moment1_0', 'fc_21.w_0_moment2_0', 'fc_22.b_0', 'fc_22.b_0_beta1_pow_acc_0', 'fc_22.b_0_beta2_pow_acc_0', 'fc_22.b_0_moment1_0', 'fc_22.b_0_moment2_0', 'fc_22.w_0', 'fc_22.w_0_beta1_pow_acc_0', 'fc_22.w_0_beta2_pow_acc_0', 'fc_22.w_0_moment1_0', 'fc_22.w_0_moment2_0', 'fc_23.b_0', 'fc_23.b_0_beta1_pow_acc_0', 'fc_23.b_0_beta2_pow_acc_0', 'fc_23.b_0_moment1_0', 'fc_23.b_0_moment2_0', 'fc_23.w_0', 'fc_23.w_0_beta1_pow_acc_0', 'fc_23.w_0_beta2_pow_acc_0', 'fc_23.w_0_moment1_0', 'fc_23.w_0_moment2_0', 'fc_24.w_0', 'fc_24.w_0_beta1_pow_acc_0', 
'fc_24.w_0_beta2_pow_acc_0', 'fc_24.w_0_moment1_0', 'fc_24.w_0_moment2_0', 'fc_25.w_0', 'fc_25.w_0_beta1_pow_acc_0', 'fc_25.w_0_beta2_pow_acc_0', 'fc_25.w_0_moment1_0', 'fc_25.w_0_moment2_0', 'fc_26.w_0', 'fc_26.w_0_beta1_pow_acc_0', 'fc_26.w_0_beta2_pow_acc_0', 'fc_26.w_0_moment1_0', 'fc_26.w_0_moment2_0', 'fc_27.w_0', 'fc_27.w_0_beta1_pow_acc_0', 'fc_27.w_0_beta2_pow_acc_0', 'fc_27.w_0_moment1_0', 'fc_27.w_0_moment2_0', 'fc_28.b_0', 'fc_28.b_0_beta1_pow_acc_0', 'fc_28.b_0_beta2_pow_acc_0', 'fc_28.b_0_moment1_0', 'fc_28.b_0_moment2_0', 'fc_28.w_0', 'fc_28.w_0_beta1_pow_acc_0', 'fc_28.w_0_beta2_pow_acc_0', 'fc_28.w_0_moment1_0', 'fc_28.w_0_moment2_0', 'fc_29.b_0', 'fc_29.b_0_beta1_pow_acc_0', 'fc_29.b_0_beta2_pow_acc_0', 'fc_29.b_0_moment1_0', 'fc_29.b_0_moment2_0', 'fc_29.w_0', 'fc_29.w_0_beta1_pow_acc_0', 'fc_29.w_0_beta2_pow_acc_0', 'fc_29.w_0_moment1_0', 'fc_29.w_0_moment2_0', 'fc_3.w_0', 'fc_3.w_0_beta1_pow_acc_0', 'fc_3.w_0_beta2_pow_acc_0', 'fc_3.w_0_moment1_0', 'fc_3.w_0_moment2_0', 'fc_30.w_0', 'fc_30.w_0_beta1_pow_acc_0', 'fc_30.w_0_beta2_pow_acc_0', 'fc_30.w_0_moment1_0', 'fc_30.w_0_moment2_0', 'fc_31.w_0', 'fc_31.w_0_beta1_pow_acc_0', 'fc_31.w_0_beta2_pow_acc_0', 'fc_31.w_0_moment1_0', 'fc_31.w_0_moment2_0', 'fc_32.w_0', 'fc_32.w_0_beta1_pow_acc_0', 'fc_32.w_0_beta2_pow_acc_0', 'fc_32.w_0_moment1_0', 'fc_32.w_0_moment2_0', 'fc_33.w_0', 'fc_33.w_0_beta1_pow_acc_0', 'fc_33.w_0_beta2_pow_acc_0', 'fc_33.w_0_moment1_0', 'fc_33.w_0_moment2_0', 'fc_34.b_0', 'fc_34.b_0_beta1_pow_acc_0', 'fc_34.b_0_beta2_pow_acc_0', 'fc_34.b_0_moment1_0', 'fc_34.b_0_moment2_0', 'fc_34.w_0', 'fc_34.w_0_beta1_pow_acc_0', 'fc_34.w_0_beta2_pow_acc_0', 'fc_34.w_0_moment1_0', 'fc_34.w_0_moment2_0', 'fc_35.b_0', 'fc_35.b_0_beta1_pow_acc_0', 'fc_35.b_0_beta2_pow_acc_0', 'fc_35.b_0_moment1_0', 'fc_35.b_0_moment2_0', 'fc_35.w_0', 'fc_35.w_0_beta1_pow_acc_0', 'fc_35.w_0_beta2_pow_acc_0', 'fc_35.w_0_moment1_0', 'fc_35.w_0_moment2_0', 'fc_36.w_0', 'fc_36.w_0_beta1_pow_acc_0', 
'fc_36.w_0_beta2_pow_acc_0', 'fc_36.w_0_moment1_0', 'fc_36.w_0_moment2_0', 'fc_37.w_0', 'fc_37.w_0_beta1_pow_acc_0', 'fc_37.w_0_beta2_pow_acc_0', 'fc_37.w_0_moment1_0', 'fc_37.w_0_moment2_0', 'fc_38.w_0', 'fc_38.w_0_beta1_pow_acc_0', 'fc_38.w_0_beta2_pow_acc_0', 'fc_38.w_0_moment1_0', 'fc_38.w_0_moment2_0', 'fc_39.w_0', 'fc_39.w_0_beta1_pow_acc_0', 'fc_39.w_0_beta2_pow_acc_0', 'fc_39.w_0_moment1_0', 'fc_39.w_0_moment2_0', 'fc_4.b_0', 'fc_4.b_0_beta1_pow_acc_0', 'fc_4.b_0_beta2_pow_acc_0', 'fc_4.b_0_moment1_0', 'fc_4.b_0_moment2_0', 'fc_4.w_0', 'fc_4.w_0_beta1_pow_acc_0', 'fc_4.w_0_beta2_pow_acc_0', 'fc_4.w_0_moment1_0', 'fc_4.w_0_moment2_0', 'fc_40.w_0', 'fc_40.w_0_beta1_pow_acc_0', 'fc_40.w_0_beta2_pow_acc_0', 'fc_40.w_0_moment1_0', 'fc_40.w_0_moment2_0', 'fc_41.w_0', 'fc_41.w_0_beta1_pow_acc_0', 'fc_41.w_0_beta2_pow_acc_0', 'fc_41.w_0_moment1_0', 'fc_41.w_0_moment2_0', 'fc_42.w_0', 'fc_42.w_0_beta1_pow_acc_0', 'fc_42.w_0_beta2_pow_acc_0', 'fc_42.w_0_moment1_0', 'fc_42.w_0_moment2_0', 'fc_43.w_0', 'fc_43.w_0_beta1_pow_acc_0', 'fc_43.w_0_beta2_pow_acc_0', 'fc_43.w_0_moment1_0', 'fc_43.w_0_moment2_0', 'fc_44.b_0', 'fc_44.b_0_beta1_pow_acc_0', 'fc_44.b_0_beta2_pow_acc_0', 'fc_44.b_0_moment1_0', 'fc_44.b_0_moment2_0', 'fc_44.w_0', 'fc_44.w_0_beta1_pow_acc_0', 'fc_44.w_0_beta2_pow_acc_0', 'fc_44.w_0_moment1_0', 'fc_44.w_0_moment2_0', 'fc_45.b_0', 'fc_45.b_0_beta1_pow_acc_0', 'fc_45.b_0_beta2_pow_acc_0', 'fc_45.b_0_moment1_0', 'fc_45.b_0_moment2_0', 'fc_45.w_0', 'fc_45.w_0_beta1_pow_acc_0', 'fc_45.w_0_beta2_pow_acc_0', 'fc_45.w_0_moment1_0', 'fc_45.w_0_moment2_0', 'fc_46.w_0', 'fc_46.w_0_beta1_pow_acc_0', 'fc_46.w_0_beta2_pow_acc_0', 'fc_46.w_0_moment1_0', 'fc_46.w_0_moment2_0', 'fc_47.w_0', 'fc_47.w_0_beta1_pow_acc_0', 'fc_47.w_0_beta2_pow_acc_0', 'fc_47.w_0_moment1_0', 'fc_47.w_0_moment2_0', 'fc_48.w_0', 'fc_48.w_0_beta1_pow_acc_0', 'fc_48.w_0_beta2_pow_acc_0', 'fc_48.w_0_moment1_0', 'fc_48.w_0_moment2_0', 'fc_49.w_0', 'fc_49.w_0_beta1_pow_acc_0', 
'fc_49.w_0_beta2_pow_acc_0', 'fc_49.w_0_moment1_0', 'fc_49.w_0_moment2_0', 'fc_5.b_0', 'fc_5.b_0_beta1_pow_acc_0', 'fc_5.b_0_beta2_pow_acc_0', 'fc_5.b_0_moment1_0', 'fc_5.b_0_moment2_0', 'fc_5.w_0', 'fc_5.w_0_beta1_pow_acc_0', 'fc_5.w_0_beta2_pow_acc_0', 'fc_5.w_0_moment1_0', 'fc_5.w_0_moment2_0', 'fc_50.w_0', 'fc_50.w_0_beta1_pow_acc_0', 'fc_50.w_0_beta2_pow_acc_0', 'fc_50.w_0_moment1_0', 'fc_50.w_0_moment2_0', 'fc_51.w_0', 'fc_51.w_0_beta1_pow_acc_0', 'fc_51.w_0_beta2_pow_acc_0', 'fc_51.w_0_moment1_0', 'fc_51.w_0_moment2_0', 'fc_52.w_0', 'fc_52.w_0_beta1_pow_acc_0', 'fc_52.w_0_beta2_pow_acc_0', 'fc_52.w_0_moment1_0', 'fc_52.w_0_moment2_0', 'fc_53.w_0', 'fc_53.w_0_beta1_pow_acc_0', 'fc_53.w_0_beta2_pow_acc_0', 'fc_53.w_0_moment1_0', 'fc_53.w_0_moment2_0', 'fc_54.b_0', 'fc_54.b_0_beta1_pow_acc_0', 'fc_54.b_0_beta2_pow_acc_0', 'fc_54.b_0_moment1_0', 'fc_54.b_0_moment2_0', 'fc_54.w_0', 'fc_54.w_0_beta1_pow_acc_0', 'fc_54.w_0_beta2_pow_acc_0', 'fc_54.w_0_moment1_0', 'fc_54.w_0_moment2_0', 'fc_55.b_0', 'fc_55.b_0_beta1_pow_acc_0', 'fc_55.b_0_beta2_pow_acc_0', 'fc_55.b_0_moment1_0', 'fc_55.b_0_moment2_0', 'fc_55.w_0', 'fc_55.w_0_beta1_pow_acc_0', 'fc_55.w_0_beta2_pow_acc_0', 'fc_55.w_0_moment1_0', 'fc_55.w_0_moment2_0', 'fc_56.w_0', 'fc_56.w_0_beta1_pow_acc_0', 'fc_56.w_0_beta2_pow_acc_0', 'fc_56.w_0_moment1_0', 'fc_56.w_0_moment2_0', 'fc_57.w_0', 'fc_57.w_0_beta1_pow_acc_0', 'fc_57.w_0_beta2_pow_acc_0', 'fc_57.w_0_moment1_0', 'fc_57.w_0_moment2_0', 'fc_58.w_0', 'fc_58.w_0_beta1_pow_acc_0', 'fc_58.w_0_beta2_pow_acc_0', 'fc_58.w_0_moment1_0', 'fc_58.w_0_moment2_0', 'fc_59.w_0', 'fc_59.w_0_beta1_pow_acc_0', 'fc_59.w_0_beta2_pow_acc_0', 'fc_59.w_0_moment1_0', 'fc_59.w_0_moment2_0', 'fc_6.w_0', 'fc_6.w_0_beta1_pow_acc_0', 'fc_6.w_0_beta2_pow_acc_0', 'fc_6.w_0_moment1_0', 'fc_6.w_0_moment2_0', 'fc_60.w_0', 'fc_60.w_0_beta1_pow_acc_0', 'fc_60.w_0_beta2_pow_acc_0', 'fc_60.w_0_moment1_0', 'fc_60.w_0_moment2_0', 'fc_61.w_0', 'fc_61.w_0_beta1_pow_acc_0', 
'fc_61.w_0_beta2_pow_acc_0', 'fc_61.w_0_moment1_0', 'fc_61.w_0_moment2_0', 'fc_62.w_0', 'fc_62.w_0_beta1_pow_acc_0', 'fc_62.w_0_beta2_pow_acc_0', 'fc_62.w_0_moment1_0', 'fc_62.w_0_moment2_0', 'fc_63.w_0', 'fc_63.w_0_beta1_pow_acc_0', 'fc_63.w_0_beta2_pow_acc_0', 'fc_63.w_0_moment1_0', 'fc_63.w_0_moment2_0', 'fc_64.b_0', 'fc_64.b_0_beta1_pow_acc_0', 'fc_64.b_0_beta2_pow_acc_0', 'fc_64.b_0_moment1_0', 'fc_64.b_0_moment2_0', 'fc_64.w_0', 'fc_64.w_0_beta1_pow_acc_0', 'fc_64.w_0_beta2_pow_acc_0', 'fc_64.w_0_moment1_0', 'fc_64.w_0_moment2_0', 'fc_65.b_0', 'fc_65.b_0_beta1_pow_acc_0', 'fc_65.b_0_beta2_pow_acc_0', 'fc_65.b_0_moment1_0', 'fc_65.b_0_moment2_0', 'fc_65.w_0', 'fc_65.w_0_beta1_pow_acc_0', 'fc_65.w_0_beta2_pow_acc_0', 'fc_65.w_0_moment1_0', 'fc_65.w_0_moment2_0', 'fc_66.w_0', 'fc_66.w_0_beta1_pow_acc_0', 'fc_66.w_0_beta2_pow_acc_0', 'fc_66.w_0_moment1_0', 'fc_66.w_0_moment2_0', 'fc_67.w_0', 'fc_67.w_0_beta1_pow_acc_0', 'fc_67.w_0_beta2_pow_acc_0', 'fc_67.w_0_moment1_0', 'fc_67.w_0_moment2_0', 'fc_68.w_0', 'fc_68.w_0_beta1_pow_acc_0', 'fc_68.w_0_beta2_pow_acc_0', 'fc_68.w_0_moment1_0', 'fc_68.w_0_moment2_0', 'fc_69.w_0', 'fc_69.w_0_beta1_pow_acc_0', 'fc_69.w_0_beta2_pow_acc_0', 'fc_69.w_0_moment1_0', 'fc_69.w_0_moment2_0', 'fc_7.w_0', 'fc_7.w_0_beta1_pow_acc_0', 'fc_7.w_0_beta2_pow_acc_0', 'fc_7.w_0_moment1_0', 'fc_7.w_0_moment2_0', 'fc_70.w_0', 'fc_70.w_0_beta1_pow_acc_0', 'fc_70.w_0_beta2_pow_acc_0', 'fc_70.w_0_moment1_0', 'fc_70.w_0_moment2_0', 'fc_71.w_0', 'fc_71.w_0_beta1_pow_acc_0', 'fc_71.w_0_beta2_pow_acc_0', 'fc_71.w_0_moment1_0', 'fc_71.w_0_moment2_0', 'fc_72.w_0', 'fc_72.w_0_beta1_pow_acc_0', 'fc_72.w_0_beta2_pow_acc_0', 'fc_72.w_0_moment1_0', 'fc_72.w_0_moment2_0', 'fc_73.w_0', 'fc_73.w_0_beta1_pow_acc_0', 'fc_73.w_0_beta2_pow_acc_0', 'fc_73.w_0_moment1_0', 'fc_73.w_0_moment2_0', 'fc_74.b_0', 'fc_74.b_0_beta1_pow_acc_0', 'fc_74.b_0_beta2_pow_acc_0', 'fc_74.b_0_moment1_0', 'fc_74.b_0_moment2_0', 'fc_74.w_0', 'fc_74.w_0_beta1_pow_acc_0', 
'fc_74.w_0_beta2_pow_acc_0', 'fc_74.w_0_moment1_0', 'fc_74.w_0_moment2_0', 'fc_75.b_0', 'fc_75.b_0_beta1_pow_acc_0', 'fc_75.b_0_beta2_pow_acc_0', 'fc_75.b_0_moment1_0', 'fc_75.b_0_moment2_0', 'fc_75.w_0', 'fc_75.w_0_beta1_pow_acc_0', 'fc_75.w_0_beta2_pow_acc_0', 'fc_75.w_0_moment1_0', 'fc_75.w_0_moment2_0', 'fc_76.w_0', 'fc_76.w_0_beta1_pow_acc_0', 'fc_76.w_0_beta2_pow_acc_0', 'fc_76.w_0_moment1_0', 'fc_76.w_0_moment2_0', 'fc_77.w_0', 'fc_77.w_0_beta1_pow_acc_0', 'fc_77.w_0_beta2_pow_acc_0', 'fc_77.w_0_moment1_0', 'fc_77.w_0_moment2_0', 'fc_78.w_0', 'fc_78.w_0_beta1_pow_acc_0', 'fc_78.w_0_beta2_pow_acc_0', 'fc_78.w_0_moment1_0', 'fc_78.w_0_moment2_0', 'fc_79.w_0', 'fc_79.w_0_beta1_pow_acc_0', 'fc_79.w_0_beta2_pow_acc_0', 'fc_79.w_0_moment1_0', 'fc_79.w_0_moment2_0', 'fc_8.w_0', 'fc_8.w_0_beta1_pow_acc_0', 'fc_8.w_0_beta2_pow_acc_0', 'fc_8.w_0_moment1_0', 'fc_8.w_0_moment2_0', 'fc_80.w_0', 'fc_80.w_0_beta1_pow_acc_0', 'fc_80.w_0_beta2_pow_acc_0', 'fc_80.w_0_moment1_0', 'fc_80.w_0_moment2_0', 'fc_81.w_0', 'fc_81.w_0_beta1_pow_acc_0', 'fc_81.w_0_beta2_pow_acc_0', 'fc_81.w_0_moment1_0', 'fc_81.w_0_moment2_0', 'fc_82.w_0', 'fc_82.w_0_beta1_pow_acc_0', 'fc_82.w_0_beta2_pow_acc_0', 'fc_82.w_0_moment1_0', 'fc_82.w_0_moment2_0', 'fc_83.w_0', 'fc_83.w_0_beta1_pow_acc_0', 'fc_83.w_0_beta2_pow_acc_0', 'fc_83.w_0_moment1_0', 'fc_83.w_0_moment2_0', 'fc_84.b_0', 'fc_84.b_0_beta1_pow_acc_0', 'fc_84.b_0_beta2_pow_acc_0', 'fc_84.b_0_moment1_0', 'fc_84.b_0_moment2_0', 'fc_84.w_0', 'fc_84.w_0_beta1_pow_acc_0', 'fc_84.w_0_beta2_pow_acc_0', 'fc_84.w_0_moment1_0', 'fc_84.w_0_moment2_0', 'fc_85.b_0', 'fc_85.b_0_beta1_pow_acc_0', 'fc_85.b_0_beta2_pow_acc_0', 'fc_85.b_0_moment1_0', 'fc_85.b_0_moment2_0', 'fc_85.w_0', 'fc_85.w_0_beta1_pow_acc_0', 'fc_85.w_0_beta2_pow_acc_0', 'fc_85.w_0_moment1_0', 'fc_85.w_0_moment2_0', 'fc_86.w_0', 'fc_86.w_0_beta1_pow_acc_0', 'fc_86.w_0_beta2_pow_acc_0', 'fc_86.w_0_moment1_0', 'fc_86.w_0_moment2_0', 'fc_87.w_0', 'fc_87.w_0_beta1_pow_acc_0', 
'fc_87.w_0_beta2_pow_acc_0', 'fc_87.w_0_moment1_0', 'fc_87.w_0_moment2_0', 'fc_88.w_0', 'fc_88.w_0_beta1_pow_acc_0', 'fc_88.w_0_beta2_pow_acc_0', 'fc_88.w_0_moment1_0', 'fc_88.w_0_moment2_0', 'fc_89.w_0', 'fc_89.w_0_beta1_pow_acc_0', 'fc_89.w_0_beta2_pow_acc_0', 'fc_89.w_0_moment1_0', 'fc_89.w_0_moment2_0', 'fc_9.w_0', 'fc_9.w_0_beta1_pow_acc_0', 'fc_9.w_0_beta2_pow_acc_0', 'fc_9.w_0_moment1_0', 'fc_9.w_0_moment2_0', 'fc_90.w_0', 'fc_90.w_0_beta1_pow_acc_0', 'fc_90.w_0_beta2_pow_acc_0', 'fc_90.w_0_moment1_0', 'fc_90.w_0_moment2_0', 'fc_91.w_0', 'fc_91.w_0_beta1_pow_acc_0', 'fc_91.w_0_beta2_pow_acc_0', 'fc_91.w_0_moment1_0', 'fc_91.w_0_moment2_0', 'fc_92.w_0', 'fc_92.w_0_beta1_pow_acc_0', 'fc_92.w_0_beta2_pow_acc_0', 'fc_92.w_0_moment1_0', 'fc_92.w_0_moment2_0', 'fc_93.w_0', 'fc_93.w_0_beta1_pow_acc_0', 'fc_93.w_0_beta2_pow_acc_0', 'fc_93.w_0_moment1_0', 'fc_93.w_0_moment2_0', 'fc_94.b_0', 'fc_94.b_0_beta1_pow_acc_0', 'fc_94.b_0_beta2_pow_acc_0', 'fc_94.b_0_moment1_0', 'fc_94.b_0_moment2_0', 'fc_94.w_0', 'fc_94.w_0_beta1_pow_acc_0', 'fc_94.w_0_beta2_pow_acc_0', 'fc_94.w_0_moment1_0', 'fc_94.w_0_moment2_0', 'fc_95.b_0', 'fc_95.b_0_beta1_pow_acc_0', 'fc_95.b_0_beta2_pow_acc_0', 'fc_95.b_0_moment1_0', 'fc_95.b_0_moment2_0', 'fc_95.w_0', 'fc_95.w_0_beta1_pow_acc_0', 'fc_95.w_0_beta2_pow_acc_0', 'fc_95.w_0_moment1_0', 'fc_95.w_0_moment2_0', 'fc_96.w_0', 'fc_96.w_0_beta1_pow_acc_0', 'fc_96.w_0_beta2_pow_acc_0', 'fc_96.w_0_moment1_0', 'fc_96.w_0_moment2_0', 'layer_norm_0.b_0', 'layer_norm_0.b_0_beta1_pow_acc_0', 'layer_norm_0.b_0_beta2_pow_acc_0', 'layer_norm_0.b_0_moment1_0', 'layer_norm_0.b_0_moment2_0', 'layer_norm_0.tmp_2.scale', 'layer_norm_0.w_0', 'layer_norm_0.w_0_beta1_pow_acc_0', 'layer_norm_0.w_0_beta2_pow_acc_0', 'layer_norm_0.w_0_moment1_0', 'layer_norm_0.w_0_moment2_0', 'layer_norm_1.b_0', 'layer_norm_1.b_0_beta1_pow_acc_0', 'layer_norm_1.b_0_beta2_pow_acc_0', 'layer_norm_1.b_0_moment1_0', 'layer_norm_1.b_0_moment2_0', 'layer_norm_1.tmp_2.scale', 
'layer_norm_1.w_0', 'layer_norm_1.w_0_beta1_pow_acc_0', 'layer_norm_1.w_0_beta2_pow_acc_0', 'layer_norm_1.w_0_moment1_0', 'layer_norm_1.w_0_moment2_0', 'layer_norm_10.b_0', 'layer_norm_10.b_0_beta1_pow_acc_0', 'layer_norm_10.b_0_beta2_pow_acc_0', 'layer_norm_10.b_0_moment1_0', 'layer_norm_10.b_0_moment2_0', 'layer_norm_10.tmp_2.scale', 'layer_norm_10.w_0', 'layer_norm_10.w_0_beta1_pow_acc_0', 'layer_norm_10.w_0_beta2_pow_acc_0', 'layer_norm_10.w_0_moment1_0', 'layer_norm_10.w_0_moment2_0', 'layer_norm_11.b_0', 'layer_norm_11.b_0_beta1_pow_acc_0', 'layer_norm_11.b_0_beta2_pow_acc_0', 'layer_norm_11.b_0_moment1_0', 'layer_norm_11.b_0_moment2_0', 'layer_norm_11.tmp_2.scale', 'layer_norm_11.w_0', 'layer_norm_11.w_0_beta1_pow_acc_0', 'layer_norm_11.w_0_beta2_pow_acc_0', 'layer_norm_11.w_0_moment1_0', 'layer_norm_11.w_0_moment2_0', 'layer_norm_12.b_0', 'layer_norm_12.b_0_beta1_pow_acc_0', 'layer_norm_12.b_0_beta2_pow_acc_0', 'layer_norm_12.b_0_moment1_0', 'layer_norm_12.b_0_moment2_0', 'layer_norm_12.tmp_2.scale', 'layer_norm_12.w_0', 'layer_norm_12.w_0_beta1_pow_acc_0', 'layer_norm_12.w_0_beta2_pow_acc_0', 'layer_norm_12.w_0_moment1_0', 'layer_norm_12.w_0_moment2_0', 'layer_norm_13.b_0', 'layer_norm_13.b_0_beta1_pow_acc_0', 'layer_norm_13.b_0_beta2_pow_acc_0', 'layer_norm_13.b_0_moment1_0', 'layer_norm_13.b_0_moment2_0', 'layer_norm_13.tmp_2.scale', 'layer_norm_13.w_0', 'layer_norm_13.w_0_beta1_pow_acc_0', 'layer_norm_13.w_0_beta2_pow_acc_0', 'layer_norm_13.w_0_moment1_0', 'layer_norm_13.w_0_moment2_0', 'layer_norm_14.b_0', 'layer_norm_14.b_0_beta1_pow_acc_0', 'layer_norm_14.b_0_beta2_pow_acc_0', 'layer_norm_14.b_0_moment1_0', 'layer_norm_14.b_0_moment2_0', 'layer_norm_14.tmp_2.scale', 'layer_norm_14.w_0', 'layer_norm_14.w_0_beta1_pow_acc_0', 'layer_norm_14.w_0_beta2_pow_acc_0', 'layer_norm_14.w_0_moment1_0', 'layer_norm_14.w_0_moment2_0', 'layer_norm_15.b_0', 'layer_norm_15.b_0_beta1_pow_acc_0', 'layer_norm_15.b_0_beta2_pow_acc_0', 'layer_norm_15.b_0_moment1_0', 
'layer_norm_15.b_0_moment2_0', 'layer_norm_15.tmp_2.scale', 'layer_norm_15.w_0', 'layer_norm_15.w_0_beta1_pow_acc_0', 'layer_norm_15.w_0_beta2_pow_acc_0', 'layer_norm_15.w_0_moment1_0', 'layer_norm_15.w_0_moment2_0', 'layer_norm_16.b_0', 'layer_norm_16.b_0_beta1_pow_acc_0', 'layer_norm_16.b_0_beta2_pow_acc_0', 'layer_norm_16.b_0_moment1_0', 'layer_norm_16.b_0_moment2_0', 'layer_norm_16.tmp_2.scale', 'layer_norm_16.w_0', 'layer_norm_16.w_0_beta1_pow_acc_0', 'layer_norm_16.w_0_beta2_pow_acc_0', 'layer_norm_16.w_0_moment1_0', 'layer_norm_16.w_0_moment2_0', 'layer_norm_17.b_0', 'layer_norm_17.b_0_beta1_pow_acc_0', 'layer_norm_17.b_0_beta2_pow_acc_0', 'layer_norm_17.b_0_moment1_0', 'layer_norm_17.b_0_moment2_0', 'layer_norm_17.tmp_2.scale', 'layer_norm_17.w_0', 'layer_norm_17.w_0_beta1_pow_acc_0', 'layer_norm_17.w_0_beta2_pow_acc_0', 'layer_norm_17.w_0_moment1_0', 'layer_norm_17.w_0_moment2_0', 'layer_norm_18.b_0', 'layer_norm_18.b_0_beta1_pow_acc_0', 'layer_norm_18.b_0_beta2_pow_acc_0', 'layer_norm_18.b_0_moment1_0', 'layer_norm_18.b_0_moment2_0', 'layer_norm_18.tmp_2.scale', 'layer_norm_18.w_0', 'layer_norm_18.w_0_beta1_pow_acc_0', 'layer_norm_18.w_0_beta2_pow_acc_0', 'layer_norm_18.w_0_moment1_0', 'layer_norm_18.w_0_moment2_0', 'layer_norm_19.b_0', 'layer_norm_19.b_0_beta1_pow_acc_0', 'layer_norm_19.b_0_beta2_pow_acc_0', 'layer_norm_19.b_0_moment1_0', 'layer_norm_19.b_0_moment2_0', 'layer_norm_19.tmp_2.scale', 'layer_norm_19.w_0', 'layer_norm_19.w_0_beta1_pow_acc_0', 'layer_norm_19.w_0_beta2_pow_acc_0', 'layer_norm_19.w_0_moment1_0', 'layer_norm_19.w_0_moment2_0', 'layer_norm_2.b_0', 'layer_norm_2.b_0_beta1_pow_acc_0', 'layer_norm_2.b_0_beta2_pow_acc_0', 'layer_norm_2.b_0_moment1_0', 'layer_norm_2.b_0_moment2_0', 'layer_norm_2.tmp_2.scale', 'layer_norm_2.w_0', 'layer_norm_2.w_0_beta1_pow_acc_0', 'layer_norm_2.w_0_beta2_pow_acc_0', 'layer_norm_2.w_0_moment1_0', 'layer_norm_2.w_0_moment2_0', 'layer_norm_20.b_0', 'layer_norm_20.b_0_beta1_pow_acc_0', 
'layer_norm_20.b_0_beta2_pow_acc_0', 'layer_norm_20.b_0_moment1_0', 'layer_norm_20.b_0_moment2_0', 'layer_norm_20.tmp_2.scale', 'layer_norm_20.w_0', 'layer_norm_20.w_0_beta1_pow_acc_0', 'layer_norm_20.w_0_beta2_pow_acc_0', 'layer_norm_20.w_0_moment1_0', 'layer_norm_20.w_0_moment2_0', 'layer_norm_21.b_0', 'layer_norm_21.b_0_beta1_pow_acc_0', 'layer_norm_21.b_0_beta2_pow_acc_0', 'layer_norm_21.b_0_moment1_0', 'layer_norm_21.b_0_moment2_0', 'layer_norm_21.tmp_2.scale', 'layer_norm_21.w_0', 'layer_norm_21.w_0_beta1_pow_acc_0', 'layer_norm_21.w_0_beta2_pow_acc_0', 'layer_norm_21.w_0_moment1_0', 'layer_norm_21.w_0_moment2_0', 'layer_norm_22.b_0', 'layer_norm_22.b_0_beta1_pow_acc_0', 'layer_norm_22.b_0_beta2_pow_acc_0', 'layer_norm_22.b_0_moment1_0', 'layer_norm_22.b_0_moment2_0', 'layer_norm_22.tmp_2.scale', 'layer_norm_22.w_0', 'layer_norm_22.w_0_beta1_pow_acc_0', 'layer_norm_22.w_0_beta2_pow_acc_0', 'layer_norm_22.w_0_moment1_0', 'layer_norm_22.w_0_moment2_0', 'layer_norm_23.b_0', 'layer_norm_23.b_0_beta1_pow_acc_0', 'layer_norm_23.b_0_beta2_pow_acc_0', 'layer_norm_23.b_0_moment1_0', 'layer_norm_23.b_0_moment2_0', 'layer_norm_23.tmp_2.scale', 'layer_norm_23.w_0', 'layer_norm_23.w_0_beta1_pow_acc_0', 'layer_norm_23.w_0_beta2_pow_acc_0', 'layer_norm_23.w_0_moment1_0', 'layer_norm_23.w_0_moment2_0', 'layer_norm_24.b_0', 'layer_norm_24.b_0_beta1_pow_acc_0', 'layer_norm_24.b_0_beta2_pow_acc_0', 'layer_norm_24.b_0_moment1_0', 'layer_norm_24.b_0_moment2_0', 'layer_norm_24.tmp_2.scale', 'layer_norm_24.w_0', 'layer_norm_24.w_0_beta1_pow_acc_0', 'layer_norm_24.w_0_beta2_pow_acc_0', 'layer_norm_24.w_0_moment1_0', 'layer_norm_24.w_0_moment2_0', 'layer_norm_25.b_0', 'layer_norm_25.b_0_beta1_pow_acc_0', 'layer_norm_25.b_0_beta2_pow_acc_0', 'layer_norm_25.b_0_moment1_0', 'layer_norm_25.b_0_moment2_0', 'layer_norm_25.tmp_2.scale', 'layer_norm_25.w_0', 'layer_norm_25.w_0_beta1_pow_acc_0', 'layer_norm_25.w_0_beta2_pow_acc_0', 'layer_norm_25.w_0_moment1_0', 'layer_norm_25.w_0_moment2_0', 
'layer_norm_26.b_0', 'layer_norm_26.b_0_beta1_pow_acc_0', 'layer_norm_26.b_0_beta2_pow_acc_0', 'layer_norm_26.b_0_moment1_0', 'layer_norm_26.b_0_moment2_0', 'layer_norm_26.tmp_2.scale', 'layer_norm_26.w_0', 'layer_norm_26.w_0_beta1_pow_acc_0', 'layer_norm_26.w_0_beta2_pow_acc_0', 'layer_norm_26.w_0_moment1_0', 'layer_norm_26.w_0_moment2_0', 'layer_norm_27.b_0', 'layer_norm_27.b_0_beta1_pow_acc_0', 'layer_norm_27.b_0_beta2_pow_acc_0', 'layer_norm_27.b_0_moment1_0', 'layer_norm_27.b_0_moment2_0', 'layer_norm_27.tmp_2.scale', 'layer_norm_27.w_0', 'layer_norm_27.w_0_beta1_pow_acc_0', 'layer_norm_27.w_0_beta2_pow_acc_0', 'layer_norm_27.w_0_moment1_0', 'layer_norm_27.w_0_moment2_0', 'layer_norm_28.b_0', 'layer_norm_28.b_0_beta1_pow_acc_0', 'layer_norm_28.b_0_beta2_pow_acc_0', 'layer_norm_28.b_0_moment1_0', 'layer_norm_28.b_0_moment2_0', 'layer_norm_28.tmp_2.scale', 'layer_norm_28.w_0', 'layer_norm_28.w_0_beta1_pow_acc_0', 'layer_norm_28.w_0_beta2_pow_acc_0', 'layer_norm_28.w_0_moment1_0', 'layer_norm_28.w_0_moment2_0', 'layer_norm_29.b_0', 'layer_norm_29.b_0_beta1_pow_acc_0', 'layer_norm_29.b_0_beta2_pow_acc_0', 'layer_norm_29.b_0_moment1_0', 'layer_norm_29.b_0_moment2_0', 'layer_norm_29.tmp_2.scale', 'layer_norm_29.w_0', 'layer_norm_29.w_0_beta1_pow_acc_0', 'layer_norm_29.w_0_beta2_pow_acc_0', 'layer_norm_29.w_0_moment1_0', 'layer_norm_29.w_0_moment2_0', 'layer_norm_3.b_0', 'layer_norm_3.b_0_beta1_pow_acc_0', 'layer_norm_3.b_0_beta2_pow_acc_0', 'layer_norm_3.b_0_moment1_0', 'layer_norm_3.b_0_moment2_0', 'layer_norm_3.tmp_2.scale', 'layer_norm_3.w_0', 'layer_norm_3.w_0_beta1_pow_acc_0', 'layer_norm_3.w_0_beta2_pow_acc_0', 'layer_norm_3.w_0_moment1_0', 'layer_norm_3.w_0_moment2_0', 'layer_norm_30.b_0', 'layer_norm_30.b_0_beta1_pow_acc_0', 'layer_norm_30.b_0_beta2_pow_acc_0', 'layer_norm_30.b_0_moment1_0', 'layer_norm_30.b_0_moment2_0', 'layer_norm_30.tmp_2.scale', 'layer_norm_30.w_0', 'layer_norm_30.w_0_beta1_pow_acc_0', 'layer_norm_30.w_0_beta2_pow_acc_0', 
'layer_norm_30.w_0_moment1_0', 'layer_norm_30.w_0_moment2_0', 'layer_norm_31.b_0', 'layer_norm_31.b_0_beta1_pow_acc_0', 'layer_norm_31.b_0_beta2_pow_acc_0', 'layer_norm_31.b_0_moment1_0', 'layer_norm_31.b_0_moment2_0', 'layer_norm_31.tmp_2.scale', 'layer_norm_31.w_0', 'layer_norm_31.w_0_beta1_pow_acc_0', 'layer_norm_31.w_0_beta2_pow_acc_0', 'layer_norm_31.w_0_moment1_0', 'layer_norm_31.w_0_moment2_0', 'layer_norm_4.b_0', 'layer_norm_4.b_0_beta1_pow_acc_0', 'layer_norm_4.b_0_beta2_pow_acc_0', 'layer_norm_4.b_0_moment1_0', 'layer_norm_4.b_0_moment2_0', 'layer_norm_4.tmp_2.scale', 'layer_norm_4.w_0', 'layer_norm_4.w_0_beta1_pow_acc_0', 'layer_norm_4.w_0_beta2_pow_acc_0', 'layer_norm_4.w_0_moment1_0', 'layer_norm_4.w_0_moment2_0', 'layer_norm_5.b_0', 'layer_norm_5.b_0_beta1_pow_acc_0', 'layer_norm_5.b_0_beta2_pow_acc_0', 'layer_norm_5.b_0_moment1_0', 'layer_norm_5.b_0_moment2_0', 'layer_norm_5.tmp_2.scale', 'layer_norm_5.w_0', 'layer_norm_5.w_0_beta1_pow_acc_0', 'layer_norm_5.w_0_beta2_pow_acc_0', 'layer_norm_5.w_0_moment1_0', 'layer_norm_5.w_0_moment2_0', 'layer_norm_6.b_0', 'layer_norm_6.b_0_beta1_pow_acc_0', 'layer_norm_6.b_0_beta2_pow_acc_0', 'layer_norm_6.b_0_moment1_0', 'layer_norm_6.b_0_moment2_0', 'layer_norm_6.tmp_2.scale', 'layer_norm_6.w_0', 'layer_norm_6.w_0_beta1_pow_acc_0', 'layer_norm_6.w_0_beta2_pow_acc_0', 'layer_norm_6.w_0_moment1_0', 'layer_norm_6.w_0_moment2_0', 'layer_norm_7.b_0', 'layer_norm_7.b_0_beta1_pow_acc_0', 'layer_norm_7.b_0_beta2_pow_acc_0', 'layer_norm_7.b_0_moment1_0', 'layer_norm_7.b_0_moment2_0', 'layer_norm_7.tmp_2.scale', 'layer_norm_7.w_0', 'layer_norm_7.w_0_beta1_pow_acc_0', 'layer_norm_7.w_0_beta2_pow_acc_0', 'layer_norm_7.w_0_moment1_0', 'layer_norm_7.w_0_moment2_0', 'layer_norm_8.b_0', 'layer_norm_8.b_0_beta1_pow_acc_0', 'layer_norm_8.b_0_beta2_pow_acc_0', 'layer_norm_8.b_0_moment1_0', 'layer_norm_8.b_0_moment2_0', 'layer_norm_8.tmp_2.scale', 'layer_norm_8.w_0', 'layer_norm_8.w_0_beta1_pow_acc_0', 
'layer_norm_8.w_0_beta2_pow_acc_0', 'layer_norm_8.w_0_moment1_0', 'layer_norm_8.w_0_moment2_0', 'layer_norm_9.b_0', 'layer_norm_9.b_0_beta1_pow_acc_0', 'layer_norm_9.b_0_beta2_pow_acc_0', 'layer_norm_9.b_0_moment1_0', 'layer_norm_9.b_0_moment2_0', 'layer_norm_9.tmp_2.scale', 'layer_norm_9.w_0', 'layer_norm_9.w_0_beta1_pow_acc_0', 'layer_norm_9.w_0_beta2_pow_acc_0', 'layer_norm_9.w_0_moment1_0', 'layer_norm_9.w_0_moment2_0', 'src_pos_enc_table', 'src_word_emb_table', 'src_word_emb_table_beta1_pow_acc_0', 'src_word_emb_table_beta2_pow_acc_0', 'src_word_emb_table_moment1_0', 'src_word_emb_table_moment2_0', 'state_0', 'state_1', 'state_10', 'state_11', 'state_12', 'state_13', 'state_14', 'state_15', 'state_16', 'state_17', 'state_18', 'state_19', 'state_2', 'state_20', 'state_21', 'state_22', 'state_23', 'state_24', 'state_25', 'state_26', 'state_27', 'state_28', 'state_29', 'state_3', 'state_30', 'state_31', 'state_32', 'state_33', 'state_34', 'state_35', 'state_36', 'state_37', 'state_38', 'state_39', 'state_4', 'state_40', 'state_41', 'state_42', 'state_43', 'state_44', 'state_45', 'state_46', 'state_47', 'state_48', 'state_49', 'state_5', 'state_50', 'state_51', 'state_52', 'state_53', 'state_54', 'state_55', 'state_56', 'state_57', 'state_58', 'state_59', 'state_6', 'state_60', 'state_61', 'state_7', 'state_8', 'state_9', 'transpose_11.tmp_0.scale', 'transpose_15.tmp_0.scale', 'transpose_19.tmp_0.scale', 'transpose_23.tmp_0.scale', 'transpose_27.tmp_0.scale', 'transpose_3.tmp_0.scale', 'transpose_31.tmp_0.scale', 'transpose_35.tmp_0.scale', 'transpose_39.tmp_0.scale', 'transpose_43.tmp_0.scale', 'transpose_47.tmp_0.scale', 'transpose_51.tmp_0.scale', 'transpose_55.tmp_0.scale', 'transpose_59.tmp_0.scale', 'transpose_63.tmp_0.scale', 'transpose_67.tmp_0.scale', 'transpose_7.tmp_0.scale', 'transpose_71.tmp_0.scale', 'trg_pos_enc_table', 'trg_word_emb_table', 'trg_word_emb_table_beta1_pow_acc_0', 'trg_word_emb_table_beta2_pow_acc_0', 'trg_word_emb_table_moment1_0', 
'trg_word_emb_table_moment2_0'], name: "reduce_sum_0.tmp_0"
type {
type: LOD_TENSOR
lod_tensor {
tensor {
data_type: FP32
dims: 1
}
}
}
persistable: true
, <paddle.fluid.core_avx._Scope object at 0x7f2fbb6eaab0>, [], <paddle.fluid.core_avx.ParallelExecutor.ExecutionStrategy object at 0x7f2fa851a6f8>, <paddle.fluid.core_avx.ParallelExecutor.BuildStrategy object at 0x7f2fa81396c0>, <paddle.fluid.core_avx.Graph object at 0x7f2fa81bc9d0>
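
One possible reading of the traceback above: the binding lists `arg2: str`, while the value printed in that position under "Invoked with" is a variable description (`name: "reduce_sum_0.tmp_0" ...`) rather than a string. That would be consistent with `loss_name=sum_cost` passing the variable object instead of its name. The sketch below is not PaddlePaddle code; `FakeVariable` and `loss_name_of` are hypothetical stand-ins that only illustrate normalizing the argument to a plain string before such a call.

```python
class FakeVariable:
    """Hypothetical stand-in for a framework variable that carries a name."""

    def __init__(self, name):
        self.name = name


def loss_name_of(loss):
    """Return a plain string name for `loss`.

    Accepts either a string or any object with a string `name` attribute,
    and raises TypeError for anything else.
    """
    if isinstance(loss, str):
        return loss
    name = getattr(loss, "name", None)
    if not isinstance(name, str):
        raise TypeError("expected a str or an object with a str .name")
    return name


# Assumed example value, taken from the name shown in the traceback.
sum_cost = FakeVariable("reduce_sum_0.tmp_0")
print(loss_name_of(sum_cost))              # reduce_sum_0.tmp_0
print(loss_name_of("reduce_sum_0.tmp_0"))  # reduce_sum_0.tmp_0
```

Under that assumption, the call in the post would pass `loss_name=sum_cost.name`, in the same way that `fetch_list` already uses `sum_cost.name`.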


slf12 commented on May 11, 2024

Could you post more of the code?


dlkht commented on May 11, 2024

Sure, I tidied it up a bit. Here it is:

import argparse
import ast
import copy
import logging
import multiprocessing
import subprocess
import os
import six
import sys
import time
import numpy as np
import paddle.fluid as fluid
import base_reader
from config import *
import data_rader
from model import transformer
import dist_utils
import shutil
import quant.quanter as quant

def train_loop(exe,
               train_prog,
               startup_prog,
               dev_count,
               sum_cost,
               avg_cost,
               token_num,
               pyreader):
    # Initialize the parameters.

    train_data, _ = data_rader.prepare_data_generator(
        args,
        is_test=False,
        count=dev_count,
        pyreader=pyreader,
        py_reader_provider_wrapper=data_rader.py_reader_provider_wrapper)

    # For faster executor
    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_iteration_per_drop_scope = int(args.fetch_steps)
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = False
    build_strategy.enable_inplace = True

    sum_cost.persistable = True
    token_num.persistable = True

    # Since the token number differs among devices, customize gradient scale to
    # use token average cost among multi-devices. and the gradient scale is
    # `1 / token_number` for average cost.
    # build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized
    build_strategy.fuse_all_optimizer_ops = True

    build_strategy.fuse_all_reduce_ops = False
    build_strategy.sync_batch_norm = False

    if num_trainers > 1 and args.use_py_reader and TrainTaskConfig.use_gpu:
        dist_utils.prepare_for_multi_process(exe, build_strategy, train_prog)
        exec_strategy.num_threads = 1

    train_exe = fluid.Executor(exe.place)

    if args.use_py_reader:
        pyreader.start()
        data_generator = None
    else:
        data_generator = train_data()

    batch_id = 0
    while True:
        try:
            feed_dict_list = data_rader.prepare_feed_dict_list(data_generator,
                                                               init_flag, dev_count)
            train_cp = train_prog.with_data_parallel(
                loss_name=sum_cost,
                build_strategy=build_strategy,
                exec_strategy=exec_strategy)

            outs = train_exe.run(train_cp,
                fetch_list=[sum_cost.name, token_num.name],
                feed=feed_dict_list)

            batch_id += 1
            step_idx += 1

def train(args):

    gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
    place = fluid.CUDAPlace(gpu_id)
    dev_count = get_device_num()      #= 3

    exe = fluid.Executor(place)

    train_prog = fluid.Program()
    startup_prog = fluid.Program()

    if args.enable_ce:
        train_prog.random_seed = 1000
        startup_prog.random_seed = 1000

    with fluid.program_guard(train_prog, startup_prog):
        with fluid.unique_name.guard():
            sum_cost, avg_cost, predict, token_num, pyreader = transformer(          # network
                ModelHyperParams.src_vocab_size,
                ModelHyperParams.trg_vocab_size,
                ModelHyperParams.max_length + 1,
                ModelHyperParams.n_layer,
                ModelHyperParams.n_head,
                ModelHyperParams.d_key,
                ModelHyperParams.d_value,
                ModelHyperParams.d_model,
                ModelHyperParams.d_inner_hid,
                ModelHyperParams.prepostprocess_dropout,
                ModelHyperParams.attention_dropout,
                ModelHyperParams.relu_dropout,
                ModelHyperParams.preprocess_cmd,
                ModelHyperParams.postprocess_cmd,
                ModelHyperParams.weight_sharing,
                TrainTaskConfig.label_smooth_eps,
                ModelHyperParams.bos_idx,
                use_py_reader=args.use_py_reader,
                is_test=False)

            if args.sync:
                lr_decay = fluid.layers.learning_rate_scheduler.noam_decay(
                    ModelHyperParams.d_model, TrainTaskConfig.warmup_steps)
                logging.info("before adam")

                with fluid.default_main_program()._lr_schedule_guard():
                    learning_rate = lr_decay * TrainTaskConfig.learning_rate

                optimizer = fluid.optimizer.Adam(
                    learning_rate=learning_rate,
                    beta1=TrainTaskConfig.beta1,
                    beta2=TrainTaskConfig.beta2,
                    epsilon=TrainTaskConfig.eps)
            else:
                optimizer = fluid.optimizer.SGD(0.003)
            optimizer.minimize(avg_cost)


    exe.run(startup_prog)  # to init pyreader for training

    # load pretrained model parameters
    fluid.io.load_persistables(
            exe, TrainTaskConfig.ckpt_path, main_program=train_prog)
    else:
        logging.info("init fluid.framework.default_startup_program")
        exe.run(startup_prog)

    # quantization
    quant_program = quant.quant_aware(train_prog, exe.place, for_test=False)

    startup_prog1 = quant_program

    # training after quantization
    train_loop(exe, quant_program, startup_prog1, dev_count, sum_cost, avg_cost,
               token_num, pyreader)

if __name__ == "__main__":

    train(args)

from paddleslim.

slf12 avatar slf12 commented on May 11, 2024

Try moving the following code

train_cp = train_prog.with_data_parallel(
                loss_name=sum_cost,
                build_strategy=build_strategy,
                exec_strategy=exec_strategy)

outside the while loop.
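
The idea is to build the data-parallel program once and reuse it on every iteration, rather than rebuilding it at each step. A minimal, framework-free sketch of that hoisting pattern follows; `compile_program` and `run_step` are hypothetical stand-ins introduced for illustration, not Paddle APIs.

```python
# Framework-free sketch: build the expensive object once, reuse it per step.
# `compile_program` and `run_step` are hypothetical stand-ins, not Paddle APIs.

compile_calls = 0


def compile_program(program):
    """Stand-in for an expensive one-time compilation step."""
    global compile_calls
    compile_calls += 1
    return ("compiled", program)


def run_step(compiled, batch):
    """Stand-in for running one training step on a batch."""
    return len(batch)


def train(program, batches):
    compiled = compile_program(program)  # hoisted: runs once, before the loop
    outputs = []
    for batch in batches:
        outputs.append(run_step(compiled, batch))
    return outputs


results = train("quant_program", [[1, 2], [3], [4, 5, 6]])
```

Here the compilation counter stays at one no matter how many batches are processed, which is the property the change is meant to give the training loop.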

from paddleslim.

dlkht avatar dlkht commented on May 11, 2024

I moved it outside the while loop, but the same error still appears.

from paddleslim.

slf12 avatar slf12 commented on May 11, 2024

I moved it outside the while loop, but the same error still appears.

Which version of Paddle are you using?

from paddleslim.

dlkht avatar dlkht commented on May 11, 2024

1.7.0

from paddleslim.
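
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the posted snippet passes the variable itself as `loss_name=sum_cost`, while `fetch_list` uses `sum_cost.name`, and the error dump prints a full variable description (name: "reduce_sum_0.tmp_0" ...) among the arguments. The Paddle documentation describes the `loss_name` parameter of `with_data_parallel` as a string. The sketch below shows a small helper that normalizes either form to a plain name; `as_var_name` and `FakeVar` are hypothetical names introduced only for illustration.

```python
def as_var_name(var_or_name):
    """Return a variable name as a plain string.

    Accepts either a string or any object exposing a string `.name`
    attribute; raises TypeError otherwise.
    """
    if isinstance(var_or_name, str):
        return var_or_name
    name = getattr(var_or_name, "name", None)
    if isinstance(name, str):
        return name
    raise TypeError("expected a str or an object with a str .name")


class FakeVar:
    """Hypothetical stand-in for a framework variable, for illustration."""

    def __init__(self, name):
        self.name = name


loss = FakeVar("reduce_sum_0.tmp_0")
loss_name = as_var_name(loss)
```

In the training loop above, this would correspond to passing `loss_name=sum_cost.name` instead of `loss_name=sum_cost`. Whether that resolves the error in this thread is untested.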
