Comments (7)
That's the same problem as mine.
In the early training steps there are many positive boxes, which leads to very large CUDA memory usage.
from se-ssd.
Please use a pre-trained model to initialize both teacher and student SSD.
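A minimal sketch of that advice, assuming a plain PyTorch checkpoint that stores a bare state dict. The tiny `nn.Linear` model stands in for the detector, and `init_teacher_and_student` is a hypothetical helper, not a function from the SE-SSD code base. Freezing the teacher assumes it is updated outside backprop (for example by EMA); the real checkpoint layout may also differ.

```python
import copy
import io

import torch
from torch import nn


def init_teacher_and_student(student, checkpoint):
    """Load pre-trained weights into the student, then clone it as the teacher."""
    state = torch.load(checkpoint, map_location="cpu")
    student.load_state_dict(state)
    teacher = copy.deepcopy(student)
    # Assumption: the teacher is not trained by backprop, so freeze it.
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher, student


# Stand-in for a pre-trained checkpoint file (an in-memory buffer here).
pretrained = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(pretrained.state_dict(), buffer)
buffer.seek(0)

teacher, student = init_teacher_and_student(nn.Linear(4, 2), buffer)
```

Both networks then start from the same weights instead of a random initialization.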
from se-ssd.
That's the same problem as mine.
In the early training steps there are many positive boxes, which leads to very large CUDA memory usage.
Has the problem been solved?
from se-ssd.
done
from se-ssd.
Traceback (most recent call last):
  File "train.py", line 124, in <module>
    main()
  File "train.py", line 119, in main
    train_detector(model, datasets, cfg, distributed=distributed, validate=args.validate,logger=logger,)
  File "/media/ubuntu-502/pan1/tony_data/SESSD/SE-SSD-code/det3d/torchie/apis/train_sessd.py", line 327, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/media/ubuntu-502/pan1/tony_data/SESSD/SE-SSD-code/det3d/torchie/trainer/trainer_sessd.py", line 472, in run
    epoch_runner(data_loaders[0], data_loaders[1], self.epoch, **kwargs)
  File "/media/ubuntu-502/pan1/tony_data/SESSD/SE-SSD-code/det3d/torchie/trainer/trainer_sessd.py", line 355, in train
    self.call_hook("after_train_iter") # optim_hook: backprop;
  File "/media/ubuntu-502/pan1/tony_data/SESSD/SE-SSD-code/det3d/torchie/trainer/trainer_sessd.py", line 210, in call_hook
    getattr(hook, fn_name)(self) # self is the param (trainer/runner) of func hook.fn_name
  File "/media/ubuntu-502/pan1/tony_data/SESSD/SE-SSD-code/det3d/core/utils/dist_utils.py", line 53, in after_train_iter
    runner.outputs["loss"].backward()
  File "/media/ubuntu-502/pan1/tony_data/SESSD/sessd/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/media/ubuntu-502/pan1/tony_data/SESSD/sessd/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
I downloaded the pre-trained model, but the error above still occurs. Setting batch_size=1 did not fix it either. Does anyone have a good idea?
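The error text itself points at non-contiguous inputs. Whether that is the actual cause here is not confirmed in this thread, but a quick check can be sketched as follows. `ensure_contiguous` is a hypothetical helper, not part of SE-SSD.

```python
import torch


def ensure_contiguous(t):
    """Return t unchanged if it is contiguous, else a contiguous copy."""
    return t if t.is_contiguous() else t.contiguous()


x = torch.arange(6.0).reshape(2, 3)
xt = x.t()                      # transposing yields a non-contiguous view
print(xt.is_contiguous())       # False
fixed = ensure_contiguous(xt)
print(fixed.is_contiguous())    # True
```

Applying such a check to the tensors fed into convolution layers would at least rule this cause in or out.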
from se-ssd.
Hi @zyxcambridge, how did you solve the problem? I set "load from = se-ssd-model.pth", but it still crashes at epoch 50.
from se-ssd.
I encountered the same problem. I solved it by adding the following line to train_sessd.py at line 297:
"torch.backends.cudnn.enabled = False"
The result looks like this:
if distributed:
    model = apex.parallel.convert_syncbn_model(model)
    model = DistributedDataParallel(
        model.cuda(cfg.local_rank),
        device_ids=[cfg.local_rank],
        output_device=cfg.local_rank,
        # broadcast_buffers=False,
        find_unused_parameters=True,
    )
else:
    model = model.cuda()
torch.backends.cudnn.enabled = False
logger.info(f"model structure: {model}")
model_ema = copy.deepcopy(model)
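Setting the flag globally turns cuDNN off for the whole process, which may slow training. A narrower variant, sketched here as an assumption rather than a tested fix for SE-SSD, wraps the forward and backward pass of a step in PyTorch's `torch.backends.cudnn.flags` context manager. The toy `nn.Conv2d` model and the loss are placeholders.

```python
import torch
from torch import nn

model = nn.Conv2d(1, 2, kernel_size=3)   # placeholder for the detector
x = torch.randn(1, 1, 8, 8)

was_enabled = torch.backends.cudnn.enabled

# cuDNN is disabled only inside this block; the previous setting is
# restored on exit.
with torch.backends.cudnn.flags(enabled=False):
    inside = torch.backends.cudnn.enabled
    loss = model(x).sum()
    loss.backward()
```

The forward pass is kept inside the block as well, since the backward kernels may be tied to the backend chosen during the forward pass.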
from se-ssd.