opengvlab / internvl-mmdetseg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
Home Page: https://arxiv.org/abs/2312.14238
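Hello author, I set up the environment by following the installation steps, but when reproducing InternViT-6B-Adapter on ADE20K I ran into a deepspeed error. It looks like this version is incompatible: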
/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
- 'allow_population_by_field_name' has been renamed to 'populate_by_name'
- 'validate_all' has been renamed to 'validate_default'
warnings.warn(message, UserWarning)
/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/pydantic/_internal/fields.py:151: UserWarning: Field "model_persistence_threshold" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
warnings.warn(
/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/pydantic/_internal/_config.py:322: UserWarning: Valid config keys have changed in V2:
- 'validate_all' has been renamed to 'validate_default'
warnings.warn(message, UserWarning)
Traceback (most recent call last):
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/tools/train.py", line 12, in <module>
from mmcv.cnn.utils import revert_sync_batchnorm
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/cnn/__init__.py", line 14, in <module>
from .builder import MODELS, build_model_from_cfg
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/cnn/builder.py", line 2, in <module>
from ..runner import Sequential
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/runner/__init__.py", line 3, in <module>
from .base_runner import BaseRunner
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/runner/base_runner.py", line 14, in <module>
import deepspeed
File "/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/deepspeed/__init__.py", line 17, in <module>
from .runtime.engine import DeepSpeedEngine, DeepSpeedOptimizerCallable, DeepSpeedSchedulerCallable
File "/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 93, in <module>
from deepspeed.inference.config import DtypeEnum
File "/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/deepspeed/inference/config.py", line 88, in <module>
class BaseQuantConfig(DeepSpeedConfigModel):
File "/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/pydantic/_internal/_model_construction.py", line 92, in __new__
private_attributes = inspect_namespace(
File "/root/anaconda3/envs/internvl-mmdetseg/lib/python3.9/site-packages/pydantic/_internal/_model_construction.py", line 384, in inspect_namespace
raise PydanticUserError(
pydantic.errors.PydanticUserError: A non-annotated attribute was detected: `enabled = True`. All model fields require a type annotation; if `enabled` is not meant to be a field, you may be able to resolve this error by annotating it as a `ClassVar` or updating `model_config['ignored_types']`.
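The warnings above mention "V2" and the traceback passes through `pydantic/_internal/`, which suggests that pydantic 2.x is installed while the installed DeepSpeed release still declares its config models in a way pydantic 2.x rejects. A minimal sketch for checking this, using only the standard library, is shown below. The idea that the two versions are mismatched is an assumption drawn from the log, not something the repository confirms, and which versions to pin is left open.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.6.4'."""
    return int(version.split(".")[0])


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("deepspeed", "pydantic")) -> dict:
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    versions = report()
    print(versions)
    pyd = versions["pydantic"]
    # Assumption: a DeepSpeed release written against pydantic 1.x fails
    # at import time when pydantic 2.x is present, as in the log above.
    if pyd is not None and major_version(pyd) >= 2:
        print("pydantic 2.x detected; check that the installed deepspeed supports it")
```

Running this inside the `internvl-mmdetseg` environment prints both versions, which makes it easier to decide whether to change the pydantic version or the DeepSpeed version.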
I then switched deepspeed to the latest version. That error went away, but a new problem appeared: MSDeformAttn is reported as not installed:
[2024-03-20 08:33:25,013] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Please install MSDeformAttn if you want to use ViT-Adapter
Please install MSDeformAttn if you want to use ViT-Adapter
Please install MSDeformAttn if you want to use ViT-Adapter
2024-03-20 08:33:29,409 - mmseg - INFO - Multi-processing start method is `None`
2024-03-20 08:33:29,410 - mmseg - INFO - OpenCV num_threads is `128`
2024-03-20 08:33:29,463 - mmseg - INFO - Environment info:
...
2024-03-20 08:34:52,731 - mmseg - INFO - _IncompatibleKeys(missing_keys=[], unexpected_keys=['clip_projector.norm1_q.weight', 'clip_projector.norm1_q.bias', 'clip_projector.norm1_k.weight', 'clip_projector.norm1_k.bias', 'clip_projector.norm1_v.weight', 'clip_projector.norm1_v.bias', 'clip_projector.cross_attn.q_bias', 'clip_projector.cross_attn.k_bias', 'clip_projector.cross_attn.v_bias', 'clip_projector.cross_attn.q.weight', 'clip_projector.cross_attn.k.weight', 'clip_projector.cross_attn.v.weight', 'clip_projector.cross_attn.proj.weight', 'clip_projector.cross_attn.proj.bias'])
INFO:mmseg:_IncompatibleKeys(missing_keys=[], unexpected_keys=['clip_projector.norm1_q.weight', 'clip_projector.norm1_q.bias', 'clip_projector.norm1_k.weight', 'clip_projector.norm1_k.bias', 'clip_projector.norm1_v.weight', 'clip_projector.norm1_v.bias', 'clip_projector.cross_attn.q_bias', 'clip_projector.cross_attn.k_bias', 'clip_projector.cross_attn.v_bias', 'clip_projector.cross_attn.q.weight', 'clip_projector.cross_attn.k.weight', 'clip_projector.cross_attn.v.weight', 'clip_projector.cross_attn.proj.weight', 'clip_projector.cross_attn.proj.bias'])
Traceback (most recent call last):
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 69, in build_from_cfg
return obj_cls(**args)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/backbones/vit_adapter.py", line 49, in __init__
self.interactions = nn.Sequential(*[
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/backbones/vit_adapter.py", line 50, in <listcomp>
InteractionBlock(dim=embed_dim, num_heads=deform_num_heads, n_points=n_points,
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/backbones/adapter_modules.py", line 165, in __init__
self.injector = Injector(dim=dim, n_levels=3, num_heads=num_heads, init_values=init_values,
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/backbones/adapter_modules.py", line 138, in __init__
self.attn = MSDeformAttn(d_model=dim, n_levels=n_levels, n_heads=num_heads,
NameError: name 'MSDeformAttn' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 69, in build_from_cfg
return obj_cls(**args)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 36, in __init__
self.backbone = builder.build_backbone(backbone)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/builder.py", line 23, in build_backbone
return BACKBONES.build(cfg)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 237, in build
return self.build_func(*args, **kwargs, registry=self)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 72, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
NameError: InternViTAdapter: name 'MSDeformAttn' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/tools/train.py", line 246, in <module>
main()
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/tools/train.py", line 199, in main
model = build_segmentor(
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmsegmentation/mmseg/models/builder.py", line 51, in build_segmentor
return SEGMENTORS.build(
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 237, in build
return self.build_func(*args, **kwargs, registry=self)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/mnt/gengxz/projects/InternVL_MMDetSeg/mmcv/mmcv/utils/registry.py", line 72, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
NameError: EncoderDecoder: InternViTAdapter: name 'MSDeformAttn' is not defined
How can I solve this? Is it related to my changing the deepspeed version?
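The log prints "Please install MSDeformAttn" and only later fails with a `NameError`, which suggests the import is attempted inside a guarded block that hides the underlying exception. A small sketch for surfacing the real import error is given below. The extension name `MultiScaleDeformableAttention` is the one used by the Deformable-DETR and ViT-Adapter ops; that this repository builds the same extension is an assumption, so substitute the module your build actually produces.

```python
import importlib
import traceback


def diagnose_import(module_name: str):
    """Try to import `module_name`; return (ok, details).

    `details` holds the full traceback text when the import fails,
    so a missing shared library or ABI mismatch becomes visible.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # ImportError, or OSError from a broken .so file
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    # torch is imported first because compiled extensions usually link
    # against its shared libraries.
    # Assumption: "MultiScaleDeformableAttention" is the compiled op name.
    for name in ("torch", "MultiScaleDeformableAttention"):
        ok, details = diagnose_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            print(details)
```

If the second import fails, the printed traceback shows whether the extension was never compiled or was compiled against a different PyTorch or CUDA build, which helps separate this problem from the DeepSpeed version change.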
Dear developer,
First off, I would like to extend my compliments on your remarkable work. It's quite fascinating to see the capabilities of the InternViT-6B model, and we are excited about the possibility of leveraging it as a base visual model to explore its generalization capabilities within our research domain.
However, we've encountered some compatibility challenges due to version dependencies. The InternVL-MMDetSeg code repository specifies a dependency on mmcv<2.0.0, while our project currently operates on mmdet==3.3.0 and torch>=2.0.0. Additionally, the need to compile the deformable attention operator and integrate the DeepSpeed library presents further complexity. As we lack experience with DeepSpeed and have custom code tailored for newer versions of mmcv and mmdet, aligning with the repository’s dependencies could potentially disrupt our current workflow.
Versioning is an intricate and often cumbersome matter, and we aspire to navigate around it where feasible. One consideration is to forgo DeepSpeed and initiate the backbone implementation using InternViT in a manner akin to how backbones are managed within the MMPretrain repository. Before we forge ahead with this approach, we seek confirmation on its practicality.
Any adjustments that would enable this with a minimal overhead are within our scope of acceptance.
To provide a clear overview, here is our current environment setup:
Your insights or suggestions would be highly valued to ensure the smoothest integration possible. Thank you in advance for your time and consideration.
Best regards,
I would like to express my gratitude for your excellent work.
First, I confirmed that training succeeds with the InternViT-6B backbone and MMSegmentation.
However, I have run into a problem when training with the InternViT-6B backbone and MMDetection.
During training, the loss values become NaN.
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
2024-04-17 08:37:31,145 - mmdet - INFO - Epoch [1][10/39089] lr: 3.751e-08, eta: 16 days, 22:58:00, time: 3.123, data_time: 0.366, memory: 24685, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 27.3672, loss_bbox: nan, loss: nan, grad_norm: nan
INFO:mmdet:Epoch [1][10/39089] lr: 3.751e-08, eta: 16 days, 22:58:00, time: 3.123, data_time: 0.366, memory: 24685, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 27.3672, loss_bbox: nan, loss: nan, grad_norm: nan
Additionally, after tracing through the code, I confirmed that the features produced by the ViT backbone are computed correctly.
However, during the parameter update of the first iteration, the weights of the up1, up2, up3, and up4 layers in the neck (FPN) become Inf, and as a result the loss values become NaN.
Although I followed the MMDetection guide on the "Loss goes NaN" issue, the problem persists.
(https://mmdetection.readthedocs.io/en/v2.16.0/faq.html)
I look forward to your solutions. Thank you.
2024-04-17 08:34:17,594 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA A40
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.13.1+cu117
PyTorch compiling details: PyTorch built with:
TorchVision: 0.14.1+cu117
OpenCV: 4.9.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.7
MMDetection: 2.25.3+7df6b87
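To narrow down which neck layer first goes non-finite, a small scan after each optimizer step can help. The sketch below is framework-agnostic and uses only the standard library; the function name `first_non_finite` and the layer names in the demo are hypothetical and do not come from the repository.

```python
import math


def first_non_finite(named_values):
    """Return (name, index, value) of the first NaN/Inf entry, or None.

    `named_values` is an iterable of (name, sequence_of_floats) pairs.
    """
    for name, values in named_values:
        for index, value in enumerate(values):
            if not math.isfinite(value):
                return name, index, value
    return None


if __name__ == "__main__":
    # Hypothetical layer names and values, for illustration only.
    snapshot = [
        ("neck.up1.weight", [0.01, -0.02, 0.03]),
        ("neck.up2.weight", [0.5, float("inf"), 0.1]),
    ]
    print(first_non_finite(snapshot))
```

With PyTorch, the pairs could be built from `model.named_parameters()`, for example `(n, p.detach().float().flatten().tolist())` restricted to the few neck layers of interest, since converting every parameter of a 6B backbone this way would be very slow. Running the same scan on `p.grad` before the update would show whether the gradients are already Inf or whether the update itself produces the overflow.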