
modelzoo-pytorch's Introduction

Welcome to Ascend ModelZoo-PyTorch

To make Ascend ModelZoo easier for more developers to use, we will keep adding typical networks and their pretrained models. If you have any requests, please file an issue in the Gitee ModelZoo repository and we will handle it promptly.

How to Contribute

This repository is organized into the submodules listed below; you can clone an individual sub-repository directly, or clone the main repository. Before you start contributing, please read the notice first. Thank you!

Contents

Directory     Description
ACL_Pytorch   Reference inference models for Ascend chips
PyTorch       Reference training models for Ascend chips
AscendIE      Reference models for the Ascend inference engine

Disclaimer

Ascend ModelZoo only provides download and preprocessing scripts for public datasets. These datasets do not belong to ModelZoo, and ModelZoo is not responsible for their quality or maintenance. Please make sure you are licensed to use these datasets. Models trained on these datasets may only be used for non-commercial research and education.

To dataset owners:

If you do not want your dataset published on ModelZoo, or want a dataset of yours on ModelZoo updated, please file an issue on GitHub/Gitee; we will remove or update your dataset according to the issue. We sincerely appreciate your understanding of and contributions to our community.

Ascend ModelZoo is licensed under Apache 2.0. For details, see the LICENSE file.


modelzoo-pytorch's Issues

ChatGLM single-node multi-card inference on Ascend 910 reports an error

Running the example in PyTorch/built-in/foundation/ChatGLM-6B/utils.py raises:
Exception occurred: TypeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
new() received an invalid combination of arguments - got (Tensor, requires_grad=bool), but expected one of:

  • (*, torch.device device)
    didn't match because some of the keywords were incorrect: requires_grad
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, *, torch.device device)
  • (object data, *, torch.device device)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 368, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 253, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 350, in attach_execution_device_hook
    add_hook_to_module(module, AlignDevicesHook(execution_device, skip_keys=skip_keys))
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 357, in attach_execution_device_hook
    attach_execution_device_hook(child, execution_device)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 517, in attach_align_device_hook_on_blocks
    attach_execution_device_hook(module, execution_device[module_name])
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 546, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 546, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/hooks.py", line 546, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
    [Previous line repeated 2 more times]
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/accelerate/big_modeling.py", line 399, in dispatch_model
    attach_align_device_hook_on_blocks(
    File "/home/ma-user/work/test.py", line 44, in load_model_on_gpus
    model = dispatch_model(model, device_map=device_map)
    File "/home/ma-user/work/test.py", line 51, in
    model = load_model_on_gpus("/home/ma-user/work/chatglm3-6b", 2)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
    TypeError: new() received an invalid combination of arguments - got (Tensor, requires_grad=bool), but expected one of:
  • (*, torch.device device)
    didn't match because some of the keywords were incorrect: requires_grad
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, *, torch.device device)
  • (object data, *, torch.device device)
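For reference, the call chain implied by the traceback can be sketched as follows. This is a hedged reconstruction from the frames above, not the repository's utils.py; the device_map layout in particular is an illustrative assumption:

from accelerate import dispatch_model
from transformers import AutoModel

def load_model_on_gpus(checkpoint_path, num_gpus):
    # Load the ChatGLM checkpoint, then let accelerate place submodules.
    model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True).half()
    # Placeholder layout (assumption); the real script spreads the
    # transformer layers across all num_gpus devices.
    device_map = {"transformer.embedding": 0, "transformer.encoder": 0,
                  "transformer.output_layer": 1}
    # dispatch_model attaches an AlignDevicesHook to each submodule; the
    # hook's init_hook calls set_module_tensor_to_device, which is where
    # param_cls(new_value, requires_grad=...) raises the TypeError above.
    return dispatch_model(model, device_map=device_map)

model = load_model_on_gpus("/home/ma-user/work/chatglm3-6b", 2)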

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 1.0 and 2.0 branches of each repository:

Repository        OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine          -                      0.x
MMCV              1.x                    2.x
MMDetection       0.x, 1.x, 2.x          3.x
MMAction2         0.x                    1.x
MMClassification  0.x                    1.x
MMSegmentation    0.x                    1.x
MMDetection3D     0.x                    1.x
MMEditing         0.x                    1.x
MMPose            0.x                    1.x
MMDeploy          0.x                    1.x
MMTracking        0.x                    1.x
MMOCR             0.x                    1.x
MMRazor           0.x                    1.x
MMSelfSup         0.x                    1.x
MMRotate          1.x                    1.x
MMYOLO            -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

SiamFC head network export to ONNX?

Hi professor,
I am working on visual tracking. In this project I found that the SiamFC pretrained .pth model is converted to ONNX format, but only backbone_net is exported; the head net also needs to be exported. How can I export the head net to ONNX? Please help!
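One possible approach, sketched below under assumptions (the SiamFCHead module, its depth-wise cross-correlation forward, and the feature shapes are illustrative, not this repository's code): wrap the head in its own torch.nn.Module and export it with torch.onnx.export on dummy template and search features, mirroring the backbone export.

import torch
import torch.nn.functional as F

class SiamFCHead(torch.nn.Module):
    # Hypothetical SiamFC head: depth-wise cross-correlation of template
    # features (z_feat) over search-region features (x_feat) -> score map.
    def forward(self, z_feat, x_feat):
        b, c, h, w = x_feat.shape
        kernel = z_feat.reshape(b * c, 1, z_feat.size(2), z_feat.size(3))
        out = F.conv2d(x_feat.reshape(1, b * c, h, w), kernel, groups=b * c)
        out = out.reshape(b, c, out.size(2), out.size(3))
        return out.sum(dim=1, keepdim=True)  # collapse channels to one map

head = SiamFCHead().eval()
z_feat = torch.randn(1, 256, 6, 6)    # template features (assumed shape)
x_feat = torch.randn(1, 256, 22, 22)  # search features (assumed shape)
torch.onnx.export(head, (z_feat, x_feat), "siamfc_head.onnx",
                  input_names=["z_feat", "x_feat"],
                  output_names=["score_map"], opset_version=11)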

StableDiffusionXL UNet model fails to convert from ONNX to OM

Code: https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/foundation_models/stable_diffusionxl
Model weights: https://www.modelscope.cn/models/AI-ModelScope/stable-diffusion-xl-base-1.0/files
The Python environment was installed following (1) https://github.com/Ascend/msadvisor/blob/master/auto-optimizer/README.md and (2) https://gitee.com/ascend/ModelZoo-PyTorch/blob/master/ACL_PyTorch/built-in/foundation_models/stable_diffusionxl/requirements.txt; the final pip list output is as follows:

Package              Version
-------------------- ------------
absl-py              1.4.0
addict               2.4.0
attrs                23.1.0
auto-optimizer       0.1.0
auto-tune            0.1.0
blosc2               2.0.0
certifi              2023.7.22
cffi                 1.15.1
charset-normalizer   3.2.0
click                8.1.7
click-aliases        1.0.4
coloredlogs          15.0.1
cycler               0.11.0
Cython               3.0.0
dataflow             0.0.1
decorator            5.1.1
diffusers            0.21.0
easydict             1.9
filelock             3.13.1
flatbuffers          23.5.26
fsspec               2023.10.0
ftfy                 6.1.3
hccl                 0.1.0
hccl-parser          0.1
huggingface-hub      0.18.0
humanfriendly        10.0
idna                 3.4
importlib-metadata   6.8.0
joblib               1.3.2
kiwisolver           1.4.4
lxml                 4.5.2
markdown-it-py       3.0.0
matplotlib           3.2.2
mdurl                0.1.2
mmcv                 2.0.1
mmengine             0.8.4
mpmath               1.3.0
msadvisor            1.0.0
msgpack              1.0.5
numexpr              2.8.5
numpy                1.25.2
onnx                 1.15.0
onnxconverter-common 1.14.0
onnxruntime          1.16.3
op-compile-tool      0.1.0
op-gen               0.1
op-test-frame        0.1
opc-tool             0.1.0
open-clip-torch      2.20.0
opencv-python        4.5.5.64
packaging            23.1
pandas               2.0.3
pathlib2             2.3.7.post1
Pillow               10.0.0
pip                  23.3.2
platformdirs         3.10.0
protobuf             3.20.2
psutil               5.9.5
py-cpuinfo           9.0.0
pycocotools          2.0.7
pycparser            2.21
Pygments             2.16.1
pyparsing            3.1.1
python-dateutil      2.8.2
pytz                 2023.3
PyYAML               6.0.1
regex                2023.10.3
requests             2.31.0
rich                 13.5.2
safetensors          0.4.1
schedule-search      0.0.1
scikit-learn         1.3.0
scipy                1.11.1
sentencepiece        0.1.99
setuptools           69.0.3
six                  1.16.0
skl2onnx             1.16.0
sklearn              0.0
sympy                1.4
tables               3.8.0
te                   0.4.0
termcolor            2.3.0
threadpoolctl        3.2.0
timm                 0.9.12
tokenizers           0.13.3
tomli                2.0.1
torch                1.13.0
torch-npu            1.11.0.post4
torchvision          0.14.1
tqdm                 4.66.1
transformers         4.26.1
typing_extensions    4.8.0
tzdata               2023.3
urllib3              2.0.4
wcwidth              0.2.13
wheel                0.32.1
yapf                 0.40.1
zipp                 3.16.2

When I run the following command to convert the UNet model from ONNX to OM format:

atc --framework=5 \
--model=./unet_md.onnx \
--output=./unet \
--input_format=NCHW \
--log=error \
--optypelist_for_implmode="Gelu,Sigmoid" \
--op_select_implmode=high_performance \
--soc_version=Ascend${chip_name}

the conversion fails (the clip, vae, and ddim models all convert fine):

ATC start working now, please wait for a moment...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E19010: No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.0/tome1/FindMax, optype [ai.onnx::11::FindMax]].
Solution: Check the version of the installation package and reinstall the package. For details, see the operator specifications.
TraceBack (most recent call last):
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.0/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.0/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.0/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.1/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.1/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.1/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.1/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/down_blocks.1/attentions.0/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.0/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.0/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.0/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.0/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.1/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.1/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.1/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.1/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/down_blocks.1/attentions.1/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.0/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/down_blocks.2/attentions.1/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/mid_block/attentions.0/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.0/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.1/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.2/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.3/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.4/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.5/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.6/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.7/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.8/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.0/attentions.2/transformer_blocks.9/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.0/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.0/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.0/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.0/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.1/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.1/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.1/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.1/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.0/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.0/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.0/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.0/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.0/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.1/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.1/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.1/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.1/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.1/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.0/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.0/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.0/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.0/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.0/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.1/tome1/FindMax, optype [ai.onnx::11::FindMax]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.1/tome1/TomeMerged, optype [ai.onnx::11::TomeMerged]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.1/attn1/UnpadFlashAttentionMix, optype [ai.onnx::11::UnpadFlashAttentionMix]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.1/tome1/tome/TomeUnmerge, optype [ai.onnx::11::TomeUnmerge]].
No parser is registered for Op [/up_blocks.1/attentions.2/transformer_blocks.1/ff/net.0/SliceTransGeluMul, optype [ai.onnx::11::SliceTransGeluMul]].
Model parse to graph failed, graph name:unet.[FUNC:ModelParseToGraph][FILE:onnx_parser.cc][LINE:925]
ATC model parse ret fail.[FUNC:ParseGraph][FILE:omg.cc][LINE:780]
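The operators named in the log (FindMax, TomeMerged, UnpadFlashAttentionMix, TomeUnmerge, SliceTransGeluMul) appear to be custom fused operators written into the graph by the model-modification step; they are not standard ai.onnx operators, so ATC cannot parse them without matching custom-op support. A small diagnostic sketch (my own, not part of the repository's tooling) that lists every op type in the exported graph missing from the standard opset registry:

import onnx
from onnx import defs

model = onnx.load("unet_md.onnx")  # the file passed to atc above
custom_ops = sorted({node.op_type for node in model.graph.node
                     if not defs.has(node.op_type)})
print("Non-standard op types:", custom_ops)

Running this before atc shows exactly which operators need a registered parser or plugin on the target toolkit version.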

In the ModelZoo-PyTorch example MobileNetV3_large_100_for_PyTorch, single-card training works fine, but during 8-card training the main process crashed and the run hung: 3 NPUs stayed occupied, while the other 5 NPUs errored out because the main process had died and were then released

ModelZoo-PyTorch/PyTorch/contrib/cv/classification/MobileNetV3_large_100_for_PyTorch# bash ./test/train_full_8p.sh --data_path=./tiny-imagenet-200

Using NVIDIA APEX AMP. Training in mixed precision.

Using NVIDIA APEX DistributedDataParallel.

Scheduled epochs: 12

./tiny-imagenet-200/train

./tiny-imagenet-200/val

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 1 [ 23/24 (100%)] Loss: 317.145294 (162.0302) Time: 0.203s, 20132.34/s (0.204s, 20120.48/s) LR: 1.875e-01 Data: 0.000 (0.000) FPS: 20120.483 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 2 [ 23/24 (100%)] Loss: 12993.997070 (6542.2793) Time: 0.201s, 20416.47/s (0.201s, 20367.03/s) LR: 1.067e+00 Data: 0.000 (0.000) FPS: 20367.028 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 3 [ 23/24 (100%)] Loss: 13072.549805 (13072.2168) Time: 0.200s, 20458.09/s (0.201s, 20406.06/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20406.057 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 4 [ 23/24 (100%)] Loss: 13074.948242 (13074.1235) Time: 0.200s, 20464.24/s (0.201s, 20388.30/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20388.300 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-4.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 5 [ 23/24 (100%)] Loss: 13074.291016 (13074.4219) Time: 0.200s, 20436.34/s (0.200s, 20444.53/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20444.529 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-4.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-5.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 6 [ 23/24 (100%)] Loss: 13075.957031 (13075.1313) Time: 0.201s, 20424.17/s (0.201s, 20404.91/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20404.910 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-4.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-5.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-6.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 7 [ 23/24 (100%)] Loss: 13076.270508 (13076.5000) Time: 0.201s, 20398.41/s (0.200s, 20433.25/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20433.251 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-4.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-5.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-6.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-7.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

Train: 8 [ 23/24 (100%)] Loss: 13076.963867 (13077.1807) Time: 0.201s, 20409.10/s (0.201s, 20424.42/s) LR: 1.000e-05 Data: 0.000 (0.000) FPS: 20424.422 Batch_Size:512.0

Current checkpoints:

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-1.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-2.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-3.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-4.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-5.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-6.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-7.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-8.pth.tar', 100.0)

('./output/train/20230727-025210-mobilenetv3_large_100-224/checkpoint-0.pth.tar', 0.0)

----------------------------------- 8-card train_1.log has an error

Traceback (most recent call last):
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "", line 2, in get
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

----------------------------------- 8-card train_2.log has an error

Traceback (most recent call last):
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "", line 2, in get
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 91 leaked semaphores to clean up at shutdown
  len(cache))

----------------------------------- 8-card train_3.log still waiting

----------------------------------- 8-card train_4.log has an error

Traceback (most recent call last):
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "", line 2, in get
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 91 leaked semaphores to clean up at shutdown
  len(cache))

----------------------------------- 8-card train_5.log has an error

Traceback (most recent call last):
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "", line 2, in get
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

----------------------------------- 8-card train_6.log has an error

Traceback (most recent call last):
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "", line 2, in get
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

/root/miniconda3/envs/torch-1.11.0/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 91 leaked semaphores to clean up at shutdown
  len(cache))

----------------------------------- 8-card train_7.log waiting

more /root/ascend/log/debug/plog/plog-83032_20230727024927917.log

[TRACE] GE(83032,python3):2023-07-27-02:49:27.848.095 [status:INIT] [ge_api.cc:200]83032 GEInitializeImpl:GEInitialize start
[TRACE] GE(83032,python3):2023-07-27-02:49:28.073.557 [status:RUNNING] [ge_api.cc:266]83032 GEInitializeImpl:Initializing environment
[TRACE] GE(83032,python3):2023-07-27-02:49:36.094.724 [status:STOP] [ge_api.cc:309]83032 GEInitializeImpl:GEInitialize finished
[TRACE] GE(83032,python3):2023-07-27-02:49:36.095.523 [status:INIT] [ge_api.cc:200]83032 GEInitializeImpl:GEInitialize start
[TRACE] HCCL(83032,python3):2023-07-27-02:49:57.407.898 [status:init] [op_base.cc:267][hccl-83032-0-1690426197-hccl_world_group][7]HcclCommInitRootInfo success,take time [2890202]us, rankNum[8], rank[7],rootInfo identifier[10.0.48.200%enp61s0f3_60000_0_1690426193976808], server[10.0.48.200%enp61s0f3], device[7]

These earlier plogs all look normal.

The last three (by timestamp):

(base) root@hw:/media/sda/datastore/dataset/detect_dataset# more /root/ascend/log/debug/plog/plog-84469_20230727030241041.log

[ERROR] TBE(84469,python3):2023-07-27-03:02:41.035.597 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/common.py:100][repository_manager] The main process does not exist. We would kill multiprocess manager process: 84068.
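That TBE log line explains the EOFError in the worker tracebacks: each worker blocks on a queue owned by a multiprocessing manager tied to the main process, so once the main process dies and the manager is killed, the blocked conn.recv() hits end-of-file. A minimal standalone illustration of the mechanism (an assumption for explanation, not the repository's code):

import multiprocessing as mp
import time

def worker(q):
    try:
        q.get()  # blocks on the manager's connection, like task_distribute
    except EOFError:
        print("EOFError: the manager's side of the pipe was closed")

if __name__ == "__main__":
    with mp.Manager() as manager:
        q = manager.Queue()
        p = mp.Process(target=worker, args=(q,))
        p.start()
        time.sleep(2)  # let the worker connect and block in q.get()
    # Leaving the with-block shuts the manager down while the worker is
    # still blocked; the worker then raises EOFError, as in the logs.
    p.join()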
