
Comments (4)

LiChenyang-Github avatar LiChenyang-Github commented on August 20, 2024

Please share the launch command you used.

from damo-streamnet.

 avatar commented on August 20, 2024

  1. With the pretrained model, the launch command is:

python tools/train.py -f cfgs/streamnet_l_1200x1920 \
    -c ./models/coco_pretrained_models/yolox_l_drfpn.pth \
    --experiment-name streamnet_l_1200x1920 \
    -d 1 -b 2 --fp16

This produces the following error:

2023-07-09 18:17:44 | ERROR | yolox.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (45011), thread 'MainThread' (140231041484608):
Traceback (most recent call last):

File "tools/train.py", line 147, in <module>
args=(exp, args),
│ └ Namespace(batch_size=1, cache=False, ckpt='./models/coco_pretrained_models/yolox_l_drfpn.pth', del_history_ckpt=False, device...
└ ╒═══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...

File "/home/qtt/Test/DAMO-StreamNet/yolox/core/launch.py", line 98, in launch
main_func(*args)
│ └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f894bc7e4d0>

File "tools/train.py", line 123, in main
trainer.train()
│ └ <function Trainer.train at 0x7f8a10ff14d0>
└ <exps.train_utils.longshort_trainer.Trainer object at 0x7f8a10febe50>

File "/home/qtt/Test/DAMO-StreamNet/exps/train_utils/longshort_trainer.py", line 77, in train
self.before_train()
│ └ <function Trainer.before_train at 0x7f8a10fff170>
└ <exps.train_utils.longshort_trainer.Trainer object at 0x7f8a10febe50>

File "/home/qtt/Test/DAMO-StreamNet/exps/train_utils/longshort_trainer.py", line 157, in before_train
model = self.resume_train(model)
│ │ └ YOLOXLONGSHORTV3(
│ │ (long_backbone): DFPPAFPNLONGV3(
│ │ (group_0_jian2): BaseConv(
│ │ (conv): Conv2d(256, 42, kernel_size...
│ └ <function Trainer.resume_train at 0x7f8a10fff560>
└ <exps.train_utils.longshort_trainer.Trainer object at 0x7f8a10febe50>

File "/home/qtt/Test/DAMO-StreamNet/exps/train_utils/longshort_trainer.py", line 325, in resume_train
ckpt = torch.load(ckpt_file, map_location=self.device)["model"]
│ │ │ │ └ 'cuda:0'
│ │ │ └ <exps.train_utils.longshort_trainer.Trainer object at 0x7f8a10febe50>
│ │ └ './models/coco_pretrained_models/yolox_l_drfpn.pth'
│ └ <function load at 0x7f8a12db9200>
└ <module 'torch' from '/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/site-packages/torch/__init__.py'>

File "/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/site-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
│ │ │ │ └ {'encoding': 'utf-8'}
│ │ │ └ <module 'pickle' from '/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/pickle.py'>
│ │ └ 'cuda:0'
│ └ <torch._C.PyTorchFileReader object at 0x7f8971993830>
└ <function _load at 0x7f8a12db9560>
File "/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/site-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
│ └ <method 'load' of '_pickle.Unpickler' objects>
└ <_pickle.Unpickler object at 0x7f8970bf7ef0>
File "/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/site-packages/torch/serialization.py", line 845, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
│ │ │ │ │ └ 'cuda:0'
│ │ │ │ └ <function _maybe_decode_ascii at 0x7f8a12db9440>
│ │ │ └ '94471090091744'
│ │ └ 256
│ └ <class 'torch.FloatStorage'>
└ <function _load.<locals>.load_tensor at 0x7f896e992b90>
File "/home/qtt/Software/anaconda3/envs/torch171_py37_cu110/lib/python3.7/site-packages/torch/serialization.py", line 833, in load_tensor
storage = zip_file.get_storage_from_record(name, size, dtype).storage()
│ │ │ │ └ torch.float32
│ │ │ └ 256
│ │ └ 'data/94471090091744'
│ └ <instancemethod get_storage_from_record at 0x7f8a13909b50>
└ <torch._C.PyTorchFileReader object at 0x7f8971993830>

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/94471090091744: invalid header or archive is corrupted

  2. Without the pretrained model, the launch command is:
    python tools/train.py -f cfgs/streamnet_l_1200x1920 \
    --experiment-name streamnet_l_1200x1920 \
    -d 1 -b 2 --fp16


LiChenyang-Github avatar LiChenyang-Github commented on August 20, 2024

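At this point there appear to be two possible causes:

  1. The model file is corrupted. Re-download the yolox_l_drfpn.pth model, or share the md5sum of your local copy.
  2. A PyTorch version mismatch. Install PyTorch 1.8.1 (this model was trained with PyTorch 1.8.1+cu102).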
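
The first check can also be done from Python. The following is a minimal sketch, not part of the repository: the helper names md5sum and check_zip_checkpoint are hypothetical, and the archive test assumes the checkpoint was written in the zip-based format that torch.save uses by default since PyTorch 1.6. It prints the MD5 digest for comparison and reports whether the zip container reads back cleanly.

```python
import hashlib
import zipfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_zip_checkpoint(path):
    """Return None if the archive reads cleanly, else a short description.

    Assumption: the checkpoint uses PyTorch's zip-based container.
    A legacy (non-zip) checkpoint is reported as "not a zip archive".
    """
    if not zipfile.is_zipfile(path):
        return "not a zip archive (truncated download or legacy format)"
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()  # name of the first unreadable member, or None
    except zipfile.BadZipFile as exc:
        return f"unreadable archive: {exc}"
    return None if bad is None else f"corrupted member: {bad}"


if __name__ == "__main__":
    # Path taken from the launch command in this thread.
    ckpt = Path("./models/coco_pretrained_models/yolox_l_drfpn.pth")
    if ckpt.exists():
        print("md5:", md5sum(ckpt))
        print("archive check:", check_zip_checkpoint(ckpt) or "ok")
    else:
        print("checkpoint not found:", ckpt)
```

A clean archive check does not rule out the version mismatch in the second item; it only indicates that the download itself is intact.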


 avatar commented on August 20, 2024

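Thanks! The model file was indeed corrupted as well; after re-downloading it, the checkpoint loads correctly. I had been using PyTorch 1.7.1+cu110, and after switching to PyTorch 1.8.1 training runs normally!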
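
For anyone hitting the same error, the installed PyTorch version can be compared against 1.8.1 before launching training. This is a small sketch under assumptions: parse_torch_version and matches_expected are hypothetical helpers, not functions from PyTorch or this repository, and the comparison ignores the build tag (such as cu102 or cu110).

```python
import re


def parse_torch_version(version):
    """Split a string such as '1.7.1+cu110' into ((1, 7, 1), 'cu110')."""
    release, _, local = version.partition("+")
    numbers = tuple(int(n) for n in re.findall(r"\d+", release)[:3])
    return numbers, local


def matches_expected(version, expected="1.8.1"):
    """True if the release numbers match `expected`; the build tag is ignored."""
    return parse_torch_version(version)[0] == parse_torch_version(expected)[0]


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "ok" if matches_expected(torch.__version__) else "differs from 1.8.1"
        print(torch.__version__, status)
```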
