jittor / jdet

JDet is an object detection benchmark based on Jittor. It mainly focuses on aerial image object detection (oriented object detection).

License: Apache License 2.0

Python 98.16% Shell 0.02% Cuda 1.83%

aerial-image-detection deep-learning jittor object-detection oriented-object-detection

jdet's Introduction

Jittor: a Just-in-time(JIT) deep learning framework

Quickstart | Install | Tutorial | 简体中文

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. The whole framework and the meta-operators are compiled just-in-time. A powerful op compiler and tuner are integrated into Jittor, which allows it to generate high-performance code specialized for your model. Jittor also contains a wealth of high-performance model libraries, including image recognition, detection, segmentation, generation, differentiable rendering, geometric learning, and reinforcement learning.

The front-end language is Python. The front end uses module design and dynamic graph execution, the most popular interface design for deep learning frameworks. The back end is implemented in high-performance languages such as CUDA and C++.

Quickstart

We provide some Jupyter notebooks to help you get started quickly with Jittor.

Install

Jittor environment requirements:

OS	CPU	Python	Compiler	(Optional) GPU platform
Linux (Ubuntu, CentOS, Arch, UOS, KylinOS, ...)	x86, x86_64, ARM, Loongson	>= 3.7	g++ >= 5.4	Nvidia CUDA >= 10.0 and cuDNN, or AMD ROCm >= 4.0, or Hygon DCU DTK >= 22.04
macOS (>= 10.14 Mojave)	Intel, Apple Silicon	>= 3.7	clang >= 8.0	-
Windows 10 & 11	x86_64	>= 3.8	-	Nvidia CUDA >= 10.2 and cuDNN
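
Before installing, the Python column of the table can be checked with a few lines of standard-library code. This is only a sketch: the function name python_is_supported and the MIN_PYTHON mapping are illustrative names, not part of Jittor, and the check covers the Python version only, not the compiler or GPU columns.

```python
import platform
import sys

# Minimum Python versions copied from the requirements table.
# Keys are the values returned by platform.system().
MIN_PYTHON = {"Linux": (3, 7), "Darwin": (3, 7), "Windows": (3, 8)}


def python_is_supported(system, version):
    """Return True if version (major, minor, ...) meets the minimum for system."""
    minimum = MIN_PYTHON.get(system)
    if minimum is None:
        return False  # operating system not listed in the table
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    ok = python_is_supported(platform.system(), sys.version_info)
    print("Python version OK for Jittor:", ok)
```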

Jittor offers three ways to install: pip, docker, or manual.

Pip install

sudo apt install python3.7-dev libomp-dev
python3.7 -m pip install jittor
# or install from github(latest version)
# python3.7 -m pip install git+https://github.com/Jittor/jittor.git
python3.7 -m jittor.test.test_example

macOS install

Please first install additional dependencies with homebrew.

brew install libomp

Then you can install jittor through pip and run the example.

python3.7 -m pip install jittor
python3.7 -m jittor.test.test_example

Currently jittor only supports CPU on macOS.

Windows install

# check your python version(>=3.8)
python --version
python -m pip install jittor
# if conda is used
conda install pywin32

On Windows, Jittor automatically detects and installs CUDA. Please make sure your NVIDIA driver supports CUDA 10.2 or above. Alternatively, you can have Jittor install CUDA for you manually:

python -m jittor_utils.install_cuda

Docker Install

We provide Docker images to save you from configuring the environment. They can be run as follows:

# CPU only(Linux)
docker run -it --network host jittor/jittor
# CPU and CUDA(Linux)
docker run -it --network host --gpus all jittor/jittor-cuda
# CPU only(Mac and Windows)
docker run -it -p 8888:8888 jittor/jittor

Manual install

We will show how to install Jittor on Ubuntu 16.04 step by step. Other Linux distributions may use similar commands.

Step 1: Choose your back-end compiler

# g++
sudo apt install g++ build-essential libomp-dev

# OR clang++-8
wget -O - https://raw.githubusercontent.com/Jittor/jittor/master/script/install_llvm.sh > /tmp/llvm.sh
bash /tmp/llvm.sh 8

Step 2: Install Python and python-dev

Jittor needs Python version >= 3.7.

sudo apt install python3.7 python3.7-dev

Step 3: Run Jittor

The whole framework is compiled just-in-time. Let's install Jittor via pip:

git clone https://github.com/Jittor/jittor.git
sudo pip3.7 install ./jittor
export cc_path="clang++-8"
# if other compiler is used, change cc_path
# export cc_path="g++"
# export cc_path="icc"

# run a simple test
python3.7 -m jittor.test.test_example

If the test passes, Jittor is ready.

Optional Step 4: Enable CUDA

Using CUDA in Jittor is simple: just set the environment variable nvcc_path.

# replace this var with your nvcc location 
export nvcc_path="/usr/local/cuda/bin/nvcc" 
# run a simple cuda test
python3.7 -m jittor.test.test_cuda

If the test passes, you can use Jittor with CUDA by setting the use_cuda flag.

import jittor as jt
jt.flags.use_cuda = 1
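
The nvcc_path variable from Step 4 can also be set from Python instead of the shell. The sketch below is an assumption-laden convenience, not a Jittor API: configure_nvcc_path is a hypothetical helper, and it assumes that setting the variable in the current process before importing jittor has the same effect as the export shown in Step 4.

```python
import os
import shutil


def configure_nvcc_path(default="/usr/local/cuda/bin/nvcc"):
    """Set the nvcc_path environment variable if it is not already set.

    Looks for nvcc on PATH first, then falls back to default if that file
    exists. Returns the chosen path, or None if no nvcc was found.
    """
    if "nvcc_path" in os.environ:
        return os.environ["nvcc_path"]
    candidate = shutil.which("nvcc")
    if candidate is None and os.path.isfile(default):
        candidate = default
    if candidate is not None:
        os.environ["nvcc_path"] = candidate
    return candidate


if __name__ == "__main__":
    print("nvcc_path:", configure_nvcc_path())
    # Import jittor only after this point so that it can see the variable.
```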

Optional Step 5: Test Resnet18 training

To check the integrity of Jittor, you can run the Resnet18 training test. Note: this test requires 6 GB of GPU RAM.

python3.7 -m jittor.test.test_resnet

If any of these tests fail, please report the bug to us, and feel free to contribute ^_^

Tutorial

In the tutorial section, we will briefly explain the basic concepts of Jittor.

To train your model with Jittor, there are only two main concepts you need to know:

Var: the basic data type of Jittor
Operations: Jittor's ops are similar to NumPy's

Var

First, let's get started with Var. Var is the basic data type of Jittor. Computation in Jittor is asynchronous for optimization. If you want to access the data, use Var.data for synchronous access.

import jittor as jt
a = jt.float32([1,2,3])
print (a)
print (a.data)
# Output: float32[3,]
# Output: [ 1. 2. 3.]

And we can give the variable a name.

a.name('a')
print(a.name())
# Output: a

Operations

Jittor's ops are similar to NumPy's. Let's try some operations. We create Var a and b via the operation jt.float32 and multiply them. Printing these variables shows that they have the same shape and dtype.

import jittor as jt
a = jt.float32([1,2,3])
b = jt.float32([4,5,6])
c = a*b
print(a,b,c)
print(type(a), type(b), type(c))
# Output: float32[3,] float32[3,] float32[3,]
# Output: <class 'jittor_core.Var'> <class 'jittor_core.Var'> <class 'jittor_core.Var'>
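
For comparison, here is the same element-wise product written with NumPy. It is a small sketch to illustrate the similarity the text mentions and is not part of Jittor; unlike Jittor, NumPy evaluates each operation immediately.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float32)
b = np.array([4, 5, 6], dtype=np.float32)
c = a * b  # element-wise product, as in the Jittor example

print(c)                 # the products 4, 10 and 18
print(c.shape, c.dtype)  # same shape and dtype as a and b
```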

Besides, every operator used as jt.xxx(Var, ...) has an alias Var.xxx(...). For example:

c.max() # alias of jt.max(c)
c.add(a) # alias of jt.add(c, a)
c.min(keepdims=True) # alias of jt.min(c, keepdims=True)

If you want to see all the operations Jittor supports, try help(jt.ops). Every operation found in jt.ops.xxx can also be used via the alias jt.xxx.

help(jt.ops)
# Output:
#   abs(x: core.Var) -> core.Var
#   add(x: core.Var, y: core.Var) -> core.Var
#   array(data: array) -> core.Var
#   binary(x: core.Var, y: core.Var, op: str) -> core.Var
#   ......

If you want to know more about Jittor, please check out the notebooks below:

Quickstart
Advanced
- Custom Op: write your operator with C++ and CUDA and JIT compile it
- Profiler: Profiling your model
- Jtune: Tool for performance tuning

These notebooks can be started on your own computer with python3.7 -m jittor.notebook

Contributing

Jittor is still young and may contain bugs and issues. Please report them in our bug tracking system. Contributions are welcome. If you have any ideas about Jittor, please let us know.

You can help Jittor in the following ways:

Citing Jittor in your paper
Recommending Jittor to your friends
Contributing code
Contributing tutorials and documentation
Filing an issue
Answering Jittor-related questions
Lighting up the stars
Keeping an eye on Jittor
......

Contact Us

Website: http://cg.cs.tsinghua.edu.cn/jittor/

Email: [email protected]

File an issue: https://github.com/Jittor/jittor/issues

QQ Group: 836860279

The Team

Jittor is currently maintained by the Tsinghua CSCG Group. If you are interested in Jittor and want to improve it, please join us!

Citation

@article{hu2020jittor,
  title={Jittor: a novel deep learning framework with meta-operators and unified graph execution},
  author={Hu, Shi-Min and Liang, Dun and Yang, Guo-Ye and Yang, Guo-Wei and Zhou, Wen-Yang},
  journal={Science China Information Sciences},
  volume={63},
  number={222103},
  pages={1--21},
  year={2020}
}

License

Jittor is Apache 2.0 licensed, as found in the LICENSE.txt file.

jdet's Issues

How to set max_epoch

The default number of epochs is 30. How do I set max_epoch?
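
A config file posted in another issue further down this page sets max_epoch as a top-level variable, next to the evaluation, checkpoint and logging intervals. The excerpt below is a minimal sketch assuming your config file follows the same layout; the values are simply the ones from that post, not recommended settings.

```python
# Hypothetical excerpt from a JDet config file (the configs are plain Python).
max_epoch = 12  # total number of training epochs
eval_interval = 100
checkpoint_interval = 1
log_interval = 50
```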

SyntaxError: invalid syntax

Extracting jdet-0.2.0.0-py3.7.egg to /environment/miniconda3/lib/python3.7/site-packages
File "/environment/miniconda3/lib/python3.7/site-packages/jdet-0.2.0.0-py3.7.egg/jdet/models/roi_heads/rretina_head.py", line 964
outs =

def execute(self,features):
outs =

Tesla v100 for training error

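Hello, the server has two GPUs: one 3090 and one V100. When running the same program, the V100 raises an error but the 3090 does not. What could be the cause? Reducing the batch size did not help, and training with other frameworks works fine on this card.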

Exception occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
[f 0919 16:27:01.178698 16 executor.cc:665]
Execute fused operator(41/49) failed.
[JIT Source]: /home/lwb/.cache/jittor/jt1.3.5/g++7.5.0/py3.7.13/Linux-5.15.0-4xe4/IntelRXeonRGolxda/default/cu11.1.74_sm_70_86/jit/cutt_transpose__T_1__JIT_1__JIT_cuda_1__index_t_int32_hash_e6e42f8f6f3e9195_op.cc
[OP TYPE]: cutt_transpose
[Input]: float32[10,238950,],
[Output]: float32[238950,10,],
[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3
[Reason]: cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file /home/lwb/.cache/jittor/cutt/cutt-1.2/src/calls.h:2, function cuttKernelSetSharedMemConfig
Error message: invalid device function

Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:

export JT_SYNC=1
export trace_py_var=3
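
The two variables named in the message above can also be set from Python before the training script imports jittor. This is a sketch under the assumption that setting them in the current process is equivalent to the two export lines; enable_sync_debug is a hypothetical helper name, not a Jittor function.

```python
import os

# Debug settings named in the Jittor error message.
DEBUG_ENV = {"JT_SYNC": "1", "trace_py_var": "3"}


def enable_sync_debug(env=os.environ):
    """Store the two debug variables in env and return the values now set."""
    env.update(DEBUG_ENV)
    return {key: env[key] for key in DEBUG_ENV}


if __name__ == "__main__":
    print(enable_sync_debug())
    # Import jittor only after this call, then rerun the failing code.
```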
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/contrib.py", line 183, in getitem
return getitem(x, slices.where())
File "/home/lwb/work/code/jdet/python/jdet/models/boxes/assigner.py", line 148, in assign_wrt_overlaps
assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1
File "/home/lwb/work/code/jdet/python/jdet/models/boxes/assigner.py", line 108, in assign
assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
File "/home/lwb/work/code/jdet/python/jdet/models/roi_heads/oriented_rpn_head.py", line 314, in _get_targets_single
assign_result = self.assigner.assign(anchors, target_bboxes, target_bboxes_ignore, None if self.sampling else gt_labels)
File "/home/lwb/work/code/jdet/python/jdet/utils/general.py", line 53, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/lwb/work/code/jdet/python/jdet/models/roi_heads/oriented_rpn_head.py", line 383, in get_targets
all_bbox_weights, pos_inds_list, neg_inds_list, sampling_results_list) = multi_apply(self._get_targets_single, anchor_list, valid_flag_list, targets)
File "/home/lwb/work/code/jdet/python/jdet/models/roi_heads/oriented_rpn_head.py", line 464, in loss
labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,num_total_pos, num_total_neg = self.get_targets(anchor_list, valid_flag_list, targets)
File "/home/lwb/work/code/jdet/python/jdet/models/roi_heads/oriented_rpn_head.py", line 494, in execute
losses = self.loss(*outs,targets)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 951, in __call__
return self.execute(*args, **kw)
File "/home/lwb/work/code/jdet/python/jdet/models/networks/cascade_orcnn.py", line 47, in execute
proposals_list, rpn_losses = self.rpn(features,targets)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 951, in __call__
return self.execute(*args, **kw)
File "/home/lwb/work/code/jdet/python/jdet/runner/runner.py", line 126, in train
losses = self.model(images,targets)
File "/home/lwb/work/code/jdet/python/jdet/runner/runner.py", line 84, in run
self.train()
File "/home/lwb/work/code/jdet/tools/run_net.py", line 47, in main
runner.run()
File "/home/lwb/work/code/jdet/tools/run_net.py", line 56, in <module>
main()
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lwb/anaconda3/envs/jdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main (Current frame)
"__main__", mod_spec)

This is the config file. It should not be a problem with the code, because the 3090 runs it without errors.

# model settings

model = dict(
type='CascadeORCNN',
backbone=dict(
type='Resnet50',
frozen_stages=1,
return_stages=["layer1","layer2","layer3","layer4"],
pretrained= True),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn = dict(
type = "OrientedRPNHead",
in_channels=256,
num_classes=1,
min_bbox_size=0,
nms_thresh=0.8,
nms_pre=2000,
nms_post=2000,
feat_channels=256,
bbox_type='obb',
reg_dim=6,
background_label=0,
reg_decoded_bbox=False,
pos_weight=-1,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='MidpointOffsetCoder',
target_means=[.0, .0, .0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
loss_cls=dict(type='CrossEntropyLossForRcnn', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
ignore_iof_thr=-1,
match_low_quality=True,
assigned_labels_filled=-1,
),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False)
),
roi_head=dict(
type='OBBCascadeRoIHead',
num_stages=3,
stage_loss_weights=[1,0.5,0.25],
bbox_roi_extractor=dict(
type='OrientedSingleRoIExtractor',
roi_layer=dict(type='ROIAlignRotated_v1', output_size=7, sampling_ratio=2),
out_channels=256,
extend_factor=(1.4, 1.2),
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='SharedFCBBoxHeadRbbox',
start_bbox_type='obb',
end_bbox_type='obb',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=11,
bbox_coder=dict(
type='OrientedDeltaXYWHTCoder',
target_means=[0., 0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss'),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='SharedFCBBoxHeadRbbox',
start_bbox_type='obb',
end_bbox_type='obb',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=11,
bbox_coder=dict(
type='OrientedDeltaXYWHTCoder',
target_means=[0., 0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss'),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='SharedFCBBoxHeadRbbox',
start_bbox_type='obb',
end_bbox_type='obb',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=11,
bbox_coder=dict(
type='OrientedDeltaXYWHTCoder',
target_means=[0., 0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss'),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0))]
),
train_cfg = dict(
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1,
iou_calculator=dict(type='BboxOverlaps2D_rotated_v1')),
sampler=dict(
type='RandomSamplerRotated',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
match_low_quality=False,
ignore_iof_thr=-1,
iou_calculator=dict(type='BboxOverlaps2D_rotated_v1')),
sampler=dict(
type='RandomSamplerRotated',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
match_low_quality=False,
ignore_iof_thr=-1,
iou_calculator=dict(type='BboxOverlaps2D_rotated_v1')),
sampler=dict(
type='RandomSamplerRotated',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)
])
)

dataset = dict(
train=dict(
type="FAIR1M_1_5_Dataset",
dataset_dir='/home/lwb/work/code/jdet/data/Fair1m1_5/split_aug_ss/train_1024_200_1.0',
transforms=[
dict(
type="RotatedResize",
min_size=1024,
max_size=1024
),
dict(
type='RotatedRandomFlip',
direction="horizontal",
prob=0.5),
dict(
type='RotatedRandomFlip',
direction="vertical",
prob=0.5),
# dict(
# type="RandomRotateAug",
# random_rotate_on=True,
# ),
dict(
type = "Pad",
size_divisor=32),
dict(
type = "Normalize",
mean = [123.675, 116.28, 103.53],
std = [58.395, 57.12, 57.375],
to_bgr=False,)

    ],
    batch_size=1,
    num_workers=4,
    shuffle=True,
    filter_empty_gt=False,
    balance_category=False
),
val=dict(
    type="FAIR1M_1_5_Dataset",
    dataset_dir='/home/lwb/work/code/jdet/data/Fair1m1_5/split_aug_ss/train_1024_200_1.0',
    transforms=[
        dict(
            type="RotatedResize",
            min_size=1024,
            max_size=1024
        ),
        dict(
            type = "Pad",
            size_divisor=32),
        dict(
            type = "Normalize",
            mean =  [123.675, 116.28, 103.53],
            std = [58.395, 57.12, 57.375],
            to_bgr=False,),
    ],
    batch_size=2,
    num_workers=4,
    shuffle=False
),
test=dict(
    type="ImageDataset",
    images_dir='/home/lwb/work/code/jdet/data/Fair1m1_5/split_aug_ss/test_1024_200_1.0/images',
    transforms=[
        dict(
            type="RotatedResize",
            min_size=1024,
            max_size=1024
        ),
        dict(   
            type = "Pad",
            size_divisor=32),
        dict(
            type = "Normalize",
            mean =  [123.675, 116.28, 103.53],
            std = [58.395, 57.12, 57.375],
            to_bgr=False,),
    ],
    dataset_type="FAIR1M_1_5",
    num_workers=4,
    batch_size=1,
)

)

optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=0.0001, grad_clip=dict(max_norm=35, norm_type=2))

scheduler = dict(
type='StepLR',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
milestones=[7, 10])

logger = dict(
type="RunLogger")

# when we use the trained model from cshuan, image is rgb

max_epoch = 12
eval_interval = 100
checkpoint_interval = 1
log_interval = 50

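YOLO data preparation

When using the YOLO network, I got this error: train: Scanning '/root/autodl-tmp/data/train/labels.cache' for images and labels... 0 found, 4000 missing, 0 empty, 0 corrupted: 100%|███████████████| 4000/4000 [00:00<?, ?it/s]. I could not find the detailed data preparation steps in the tutorial. What causes this? My directory structure is already prepared according to the YOLO README.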
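
A message of the form "0 found, 4000 missing" indicates that no label file was matched to any of the 4000 images. One way to check this locally is to list the images that have no same-named .txt label. The sketch below assumes the common YOLO convention of an images directory and a labels directory with matching file stems; that layout is an assumption and is not confirmed for JDet's YOLO project. find_images_without_labels is a hypothetical helper.

```python
from pathlib import Path

# Image extensions considered by this sketch (an assumption, adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def find_images_without_labels(images_dir, labels_dir):
    """Return the image paths that have no same-named .txt file in labels_dir."""
    labels_dir = Path(labels_dir)
    missing = []
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files such as caches
        if not (labels_dir / (image.stem + ".txt")).is_file():
            missing.append(image)
    return missing
```

If every image is reported as missing, the labels are probably in a different directory or use different file names than the loader expects.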

fair1m_1_5 baseline fails to run

Test environment

Windows 10 21H2
WSL2 Ubuntu 22.04 LTS 4.19.128-microsoft-standard
miniconda3 + Python 3.7
CUDA 11.7, GPU model 1060

Errors

With CUDA

Loading config from: configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py
[w 0809 11:21:49.947748 32 __init__.py:1344] load parameter fc.weight failed ...
[w 0809 11:21:49.947908 32 __init__.py:1344] load parameter fc.bias failed ...
[w 0809 11:21:50.017176 32 __init__.py:1363] load total 267 params, 2 failed
Tue Aug 9 11:21:50 2022 Start running
Traceback (most recent call last):
File "tools/run_net.py", line 56, in <module>
main()
File "tools/run_net.py", line 47, in main
runner.run()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 84, in run
self.train()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 126, in train
losses = self.model(images,targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/networks/s2anet.py", line 35, in execute
outputs = self.bbox_head(features, targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 627, in execute
return self.loss(*outs,*self.parse_targets(targets))
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 360, in loss
sampling=self.sampling)
File "/home/la/JT/JDet/python/jdet/models/boxes/anchor_target.py", line 74, in anchor_target
unmap_outputs=unmap_outputs)
File "/home/la/JT/JDet/python/jdet/utils/general.py", line 53, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/la/JT/JDet/python/jdet/models/boxes/anchor_target.py", line 127, in anchor_target_single
if not inside_flags.any(0):
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 1735, in to_bool
return ori_bool(v.item())
RuntimeError: [f 0809 11:21:58.030555 32 executor.cc:665]
Execute fused operator(26/2009) failed.
[JIT Source]: /home/la/.cache/jittor/jt1.3.5/g++11.2.0/py3.7.13/Linux-4.19.128xc4/IntelRCoreTMi5xbf/default/cu11.7.99/jit/__opkey0_broadcast_to__Tx_float32__DIM_7__BCAST_19__opkey1_reindex__Tx_float32__XDIM_4__YD___hash_8f91e55bdd99985a_op.cc
[OP TYPE]: fused_op:( broadcast_to, reindex, binary.multiply, reduce.add,)
[Input]: float32[64,64,3,3,]backbone.layer1.0.conv2.weight, float32[2,64,256,256,],
[Output]: float32[2,64,256,256,],
[Async Backtrace]: ---
tools/run_net.py:56 <>
tools/run_net.py:47

/home/la/JT/JDet/python/jdet/runner/runner.py:84
/home/la/JT/JDet/python/jdet/runner/runner.py:126
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py:950 <__call__>
/home/la/JT/JDet/python/jdet/models/networks/s2anet.py:30
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py:950 <__call__>
/home/la/JT/JDet/python/jdet/models/backbones/resnet.py:166
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py:950 <__call__>
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/nn.py:2054
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py:950 <__call__>
/home/la/JT/JDet/python/jdet/models/backbones/resnet.py:84
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py:950 <__call__>
/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/nn.py:847
[Reason]: [f 0809 11:21:58.030132 32 helper_cuda.h:128] CUDA error at /home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/src/mem/allocator/cuda_managed_allocator.cc:23 code=2( cudaErrorMemoryAllocation ) cudaMallocManaged(&ptr, size)

With the --no_cuda argument added

Tue Aug 9 11:15:30 2022 Start running
Traceback (most recent call last):
File "tools/run_net.py", line 56, in <module>
main()
File "tools/run_net.py", line 47, in main
runner.run()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 84, in run
self.train()
File "/home/la/JT/JDet/python/jdet/runner/runner.py", line 126, in train
losses = self.model(images,targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/networks/s2anet.py", line 35, in execute
outputs = self.bbox_head(features, targets)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 625, in execute
outs = multi_apply(self.forward_single, feats, self.anchor_strides)
File "/home/la/JT/JDet/python/jdet/utils/general.py", line 53, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 236, in forward_single
align_feat = self.align_conv(x, refine_anchor.clone(), stride)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 722, in execute
x = self.relu(self.deform_conv(x, offset_tensor))
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
return self.execute(*args, **kw)
File "/home/la/JT/JDet/python/jdet/ops/dcn_v1.py", line 696, in execute
self.dilation, self.groups, self.deformable_groups)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 1603, in apply
return func(*args, **kw)
File "/home/la/miniconda3/envs/JT/lib/python3.7/site-packages/jittor/__init__.py", line 1559, in __call__
ori_res = self.execute(*args)
File "/home/la/JT/JDet/python/jdet/ops/dcn_v1.py", line 589, in execute
raise NotImplementedError
NotImplementedError

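What I have tried

After some searching, code=2( cudaErrorMemoryAllocation ) appears to be memory-related: the error is raised when the program cannot get the memory it needs. Searching this repository, I found an issue with a similar error code, but I do not know how to solve it.
I also do not really understand the error raised after adding the no-CUDA argument. I only know that a subclass did not implement an interface that its parent class requires.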
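
The --no_cuda traceback ends at a bare raise NotImplementedError inside the execute method in dcn_v1.py. One common pattern that produces exactly this is an operator that has a GPU code path only. The class below is a hypothetical illustration of that pattern, not JDet's actual code; DeformConvSketch and _forward_cuda are made-up names.

```python
class DeformConvSketch:
    """Hypothetical stand-in for an operator that only has a CUDA kernel."""

    def __init__(self, use_cuda):
        self.use_cuda = use_cuda

    def execute(self, x):
        if self.use_cuda:
            return self._forward_cuda(x)
        # No CPU implementation exists in this sketch, so the CPU path raises.
        raise NotImplementedError("CPU path is not implemented")

    def _forward_cuda(self, x):
        # Placeholder for the real GPU kernel call.
        return x


if __name__ == "__main__":
    print(DeformConvSketch(use_cuda=True).execute(3))
    try:
        DeformConvSketch(use_cuda=False).execute(3)
    except NotImplementedError as err:
        print("raised:", err)
```

If the operator follows this pattern, disabling CUDA cannot work for models that use it, and the memory error on the GPU path has to be addressed instead.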

CUDA problem

Hello, my CUDA version is 10.2. After running python tools/run_net.py --config-file=configs/s2anet/s2anet_r18_fpn_1x_kaggle_rotate_ms.py --task=train, the following problem occurs:

Traceback (most recent call last):
  File "tools/run_net.py", line 56, in <module>
    main()
  File "tools/run_net.py", line 46, in main
    runner.run()
  File "/home/xx/xx/s2a/JDet-s2a/python/jdet/runner/runner.py", line 87, in run
    self.train()
  File "/home/xx/xx/s2a/JDet-s2a/python/jdet/runner/runner.py", line 133, in train
    if all_loss > 10:
  File "/home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py", line 1427, in to_bool
    return ori_bool(v.item())
RuntimeError: [f 1214 20:38:47.244717 36 executor.cc:584] 
Execute fused operator(1199/1444) failed. 
[JIT Source]: /home/xx/.cache/jittor/jt1.3.1/g++7.5.0/py3.7.11/Linux-4.15.0-1x19/IntelRXeonRSilxb7/default/cu11.2.152_sm_75/jit/_opkey0_broadcast_to_Tx_float32__DIM_3__BCAST_1__JIT_1__JIT_cuda_1__index_t_int32___opkey1___hash_7c8decc46b39eb60_op.cc 
[OP TYPE]: fused_op:( broadcast_to, broadcast_to, binary.multiply, reduce.add,)
[Input]: float32[2304,128,], float32[256,2304,], 
[Output]: float32[256,128,], 
[Async Backtrace]: --- 
     tools/run_net.py:56 <<module>> 
     tools/run_net.py:46 <main> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/runner/runner.py:87 <run> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/runner/runner.py:131 <train> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:776 <__call__> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/models/networks/s2anet.py:35 <execute> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:776 <__call__> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/models/roi_heads/s2anet_head.py:625 <execute> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/utils/general.py:52 <multi_apply> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/models/roi_heads/s2anet_head.py:236 <forward_single> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:776 <__call__> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/models/roi_heads/s2anet_head.py:722 <execute> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:776 <__call__> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/ops/dcn_v1.py:696 <execute> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:1296 <apply> 
     /home/xx/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/__init__.py:1252 <__call__> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/ops/dcn_v1.py:598 <execute> 
     /home/xx/xx/s2a/JDet-s2a/python/jdet/ops/dcn_v1.py:447 <deform_conv_forward_cuda> 
     /home/x/anaconda3/envs/jdet2/lib/python3.7/site-packages/jittor/nn.py:151 <matmul> 
[Reason]: [f 1214 20:38:47.244082 36 helper_cuda.h:126] CUDA error at /home/xx/.cache/jittor/jt1.3.1/g++7.5.0/py3.7.11/Linux-4.15.0-1x19/IntelRXeonRSilxb7/default/cu11.2.152_sm_75/jit/cublas_matmul_T_float32__Trans_a_N__Trans_b_N__op_S__JIT_1__JIT_cuda_1__index_t_int32__hash_bde4b1ed93632c6c_op.cc:51  code=7( CUBLAS_STATUS_INVALID_VALUE ) cublasSgemm(handle_, CUBLAS_OP_N, CUBLAS_OP_N, k, n, m, &alpha, b->ptr<T>(), 'N' == 'N' ? k : m, a->ptr<T>(), 'N' == 'N' ? m : n, &beta, c->ptr<T>(), k)

How can I solve this? Thanks.

BrokenPipeError: [Errno 32] Broken pipe

Following the link https://cloud.tsinghua.edu.cn/f/f12bb566d4be43bfbdc7/ in JDet/projects/retinanet/README.md, I downloaded ckpt_30.pkl, copied it to the directory JDet/projects/retinanet/work_dirs/retinanet_gaofen/checkpoints, and ran the command
python run_net.py --config-file=configs/retinanet_gaofen.py --task=train. Because the checkpoint file already exists, the testing phase ran normally, but the final packaging step failed, as shown below:
Sat Sep 4 23:39:41 2021 Loading model parameters from work_dirs/retinanet_gaofen/checkpoints/ckpt_30.pkl
Sat Sep 4 23:39:41 2021 Start running
Sat Sep 4 23:39:41 2021 Testing...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1126/1126 [34:38<00:00, 1.85s/it]
Merge results...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36020/36020 [18:55<00:00, 31.72it/s]
Killed
(base) wenzhao@wenzhao-MS-7C82:~/SoftwareFactory/detection/JDet/JDet/projects/retinanet$ Process ForkPoolWorker-16:
Traceback (most recent call last):
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/pool.py", line 131, in worker
put((job, i, result))
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
self._writer.send_bytes(obj)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/pool.py", line 136, in worker
put((job, i, (False, wrapped)))
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
self._writer.send_bytes(obj)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-23:
Traceback (most recent call last):
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/pool.py", line 131, in worker
put((job, i, result))
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/queues.py", line 368, in put
self._writer.send_bytes(obj)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/home/wenzhao/anaconda3/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
...
Thanks.

Why does poly2obb negate the angle returned by cv2.minAreaRect?

Why does JDet's poly2obb function negate the angle returned by cv2.minAreaRect (angle = -angle or angle = -90 - angle)?

    for poly in polys_np:
        (x, y), (w, h), angle = cv2.minAreaRect(poly)
        if w >= h:
            angle = -angle
        else:
            w, h = h, w
            angle = -90 - angle

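I worked through it myself and feel that the negation is unnecessary. As supporting evidence, here is a snippet from MMRotate's poly2obb_np_le90 function, which serves the same purpose but does not negate the angle.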

    rbbox = cv2.minAreaRect(bboxps)
    x, y, w, h, a = rbbox[0][0], rbbox[0][1], rbbox[1][0], rbbox[1][1], rbbox[2]
    if w < h:
        w, h = h, w
        a += np.pi / 2

Any guidance would be much appreciated.
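
To make the JDet branch logic quoted in this question easier to reason about, it can be isolated into a pure function that takes one (w, h, angle) result, with the angle in degrees as cv2.minAreaRect returns it. This is only a repackaging of the quoted lines under a hypothetical name, normalize_min_area_rect; it does not explain why the negation is there.

```python
def normalize_min_area_rect(w, h, angle):
    """Apply the quoted JDet branch logic to one cv2.minAreaRect result.

    Returns (w, h, angle) with w >= h. The angle is in degrees.
    """
    if w >= h:
        angle = -angle
    else:
        # Swap the sides so that w is the longer one, then adjust the angle.
        w, h = h, w
        angle = -90 - angle
    return w, h, angle


if __name__ == "__main__":
    print(normalize_min_area_rect(4.0, 2.0, -30.0))  # (4.0, 2.0, 30.0)
    print(normalize_min_area_rect(2.0, 4.0, -30.0))  # (4.0, 2.0, -60.0)
```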

CUDA error

Problem description

Running python3.7 run_net.py --config-file=configs/retinanet_gaofen.py --task=train fails with a CUDA error.

Full log

XXX@DESKTOP-8B01LP5:/mnt/e/cpt/JDet-master/projects/retinanet$ python3.7 run_net.py --config-file=configs/retinanet_gaofen.py --task=train

[i 0914 20:35:15.018271 64 compiler.py:869] Jittor(1.2.3.101) src: /home/llc/.local/lib/python3.7/site-packages/jittor
[i 0914 20:35:15.024461 64 compiler.py:870] g++ at /usr/bin/g++(7.5.0)
[i 0914 20:35:15.024553 64 compiler.py:871] cache_path: /home/llc/.cache/jittor/default/g++
[i 0914 20:35:15.319920 64 install_cuda.py:37] cuda_driver_version: [11, 6]
[i 0914 20:35:15.337710 64 __init__.py:286] Found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc(11.2.152) at /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc.
[i 0914 20:35:15.403338 64 __init__.py:286] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0914 20:35:15.491815 64 compiler.py:959] py_include: -I/usr/include/python3.7m -I/usr/include/python3.7m
[i 0914 20:35:15.579729 64 compiler.py:961] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0914 20:35:15.719783 64 __init__.py:178] Total mem: 7.75GB, using 2 procs for compiling.
[i 0914 20:35:16.493494 64 jit_compiler.cc:22] Load cc_path: /usr/bin/g++
[i 0914 20:35:16.493646 64 init.cc:57] Found cuda archs: [75,]
[i 0914 20:35:16.641731 64 compile_extern.py:451] mpicc not found, distribution disabled.
[i 0914 20:35:16.717446 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/cublas.h
[i 0914 20:35:16.739669 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcublas.so
[i 0914 20:35:16.739794 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcublasLt.so.11
[i 0914 20:35:17.317255 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/cudnn.h
[i 0914 20:35:17.341903 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn.so.8
[i 0914 20:35:17.341998 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_ops_infer.so.8
[i 0914 20:35:17.349224 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_ops_train.so.8
[i 0914 20:35:17.350055 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_cnn_infer.so.8
[i 0914 20:35:17.395974 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_cnn_train.so.8
[i 0914 20:35:17.411565 64 compiler.py:667] handle pyjt_include/home/llc/.local/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0914 20:35:17.923592 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/curand.h
[i 0914 20:35:17.950855 64 compile_extern.py:20] found /home/llc/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcurand.so
[i 0914 20:35:18.847675 64 cuda_flags.cc:26] CUDA enabled.
Loading config from: configs/retinanet_gaofen.py
[e 0914 20:35:22.246316 64 __init__.py:996] load parameter rpn_net.retina_cls.weight failed: expect the shape of rpn_net.retina_cls.weight to be [777,256,3,3,], but got [315,256,3,3,]
[e 0914 20:35:22.246449 64 __init__.py:996] load parameter rpn_net.retina_cls.bias failed: expect the shape of rpn_net.retina_cls.bias to be [777,], but got [315,]
[w 0914 20:35:22.246808 64 __init__.py:998] load total 311 params, 2 failed
Tue Sep 14 20:35:22 2021 Loading model parameters from weights/yx_init_pretrained.pk_jt.pk
Tue Sep 14 20:35:22 2021 Loading model parameters from work_dirs/retinanet_gaofen/checkpoints/ckpt_30.pkl
Tue Sep 14 20:35:22 2021 Start running
Tue Sep 14 20:35:22 2021 Testing...
0%| | 0/1126 [00:00<?, ?it/s]
[e 0914 20:35:28.524608 64 executor.cc:527]
=== display_memory_info ===
total_cpu_ram: 7.75GB total_cuda_ram: 24GB
hold_vars: 587 lived_vars: 3579 lived_ops: 3546
update queue: 311/311
name: sfrl is_cuda: 1 used: 210.1MB(94.6%) unused: 11.94MB(5.38%) total: 222MB
name: sfrl is_cuda: 1 used: 367.1MB(92%) unused: 31.85MB(7.98%) total: 399MB
name: sfrl is_cuda: 0 used: 367.1MB(92%) unused: 31.85MB(7.98%) total: 399MB
name: sfrl is_cuda: 0 used: 180.5KB(17.6%) unused: 843.5KB(82.4%) total: 1MB
name: temp is_cuda: 0 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
name: temp is_cuda: 1 used: 0 B(-nan%) unused: 0 B(-nan%) total: 0 B
cpu&gpu: 1021MB gpu: 621MB cpu: 400MB
free: cpu(922.4MB) gpu(22.09GB)

[e 0914 20:35:28.525250 64 executor.cc:531] [Error] source file location: /home/llc/.cache/jittor/default/g++/jit/_opkey0:broadcast_to_Tx:float32__DIM=7__BCAST=19__JIT:1__JIT_cuda:1__index_t:int32___opkey...hash:7e74aa6468b00eb_op.cc
0%| | 0/1126 [00:05<?, ?it/s]
Traceback (most recent call last):
File "run_net.py", line 54, in <module>
main()
File "run_net.py", line 45, in main
runner.run()
File "/usr/local/lib/python3.7/dist-packages/jdet-0.1.0.0-py3.7.egg/jdet/runner/runner.py", line 89, in run
self.test()
File "/home/llc/.local/lib/python3.7/site-packages/jittor/__init__.py", line 89, in inner
ret = func(*args, **kw)
File "/home/llc/.local/lib/python3.7/site-packages/jittor/__init__.py", line 257, in inner
ret = func(*args, **kw)
File "/usr/local/lib/python3.7/dist-packages/jdet-0.1.0.0-py3.7.egg/jdet/runner/runner.py", line 197, in test
result = self.model(images,targets)
File "/home/llc/.local/lib/python3.7/site-packages/jittor/__init__.py", line 737, in __call__
return self.execute(*args, **kw)
File "/usr/local/lib/python3.7/dist-packages/jdet-0.1.0.0-py3.7.egg/jdet/models/networks/retinanet.py", line 64, in execute
results,losses = self.rpn_net(features, targets)
File "/home/llc/.local/lib/python3.7/site-packages/jittor/__init__.py", line 737, in __call__
return self.execute(*args, **kw)
File "/usr/local/lib/python3.7/dist-packages/jdet-0.1.0.0-py3.7.egg/jdet/models/roi_heads/retina_head.py", line 351, in execute
results = self.get_bboxes(all_proposals,all_bbox_pred,all_cls_score,targets)
File "/usr/local/lib/python3.7/dist-packages/jdet-0.1.0.0-py3.7.egg/jdet/models/roi_heads/retina_head.py", line 231, in get_bboxes
jt.sync([bbox_j, score_j])
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.sync)).

Types of your inputs are:
self = module,
args = (list, ),

The function declarations are:
void sync(const vector<VarHolder*>& vh=vector<VarHolder*>(), bool device_sync=false)

Failed reason:[f 0914 20:35:28.525344 64 executor.cc:533] Execute fused operator(116/574) failed: [Op(0x2d46dcb0:0:0:1:i1:o1:s0,broadcast_to->0x2de7ec90),Op(0x2d32f690:0:0:1:i1:o1:s0,reindex->0x2d433bc0),Op(0x2e5f0e30:0:0:1:i2:o1:s0,binary.multiply->0x2de764c0),Op(0x2e5f3e30:0:0:1:i1:o1:s0,reduce.add->0x2de7a8c0),]

Reason: [f 0914 20:35:28.524532 64 helper_cuda.h:126] CUDA error at /home/llc/.local/lib/python3.7/site-packages/jittor/src/mem/allocator/cuda_managed_allocator.cc:23 code=2( cudaErrorMemoryAllocation ) cudaMallocManaged(&ptr, size)
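
Separately from the allocation failure, the two "load parameter ... failed" lines near the top of the log are worth a quick arithmetic check. The sketch below is one possible reading, not a diagnosis: it assumes the first dimension of rpn_net.retina_cls.weight is anchors per location times num_classes, which the log does not state. Under that assumption both sizes share the factor 21, leaving 37 and 15 as the candidate class counts of the two sides.

```python
from math import gcd

expected_channels = 777  # first dim the log says was expected
found_channels = 315     # first dim the log says it got

# Assumption: channels = anchors_per_location * num_classes on both sides.
shared = gcd(expected_channels, found_channels)
expected_classes = expected_channels // shared
found_classes = found_channels // shared

print(shared, expected_classes, found_classes)  # 21 37 15
```

If that reading holds, the model and the loaded weights were built with different num_classes values.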

Are pretrained weights available within your framework? Where can I get them?

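Oriented R-CNN does not converge well when trained on the FAIR dataset

When I train Oriented R-CNN on the FAIR dataset with a config extended from projects/oriented_rcnn/configs/oriented_rcnn_r50_fpn_1x_dota_with_flip_rotate_balance_cate.py, it does not converge well. After 12 epochs the mAP is around 0.02. Adjusting the learning rate did not help. Is there anything in the config besides the dataset and num_classes that needs to be changed?

The config I am currently using is as follows: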

_base_ = "oriented_rcnn_r50_fpn_1x_dota_with_flip_rotate_balance_cate.py"

model = dict(
    bbox_head=dict(
        num_classes=37,
    )
)

dataset = dict(
    train=dict(
        type="FAIRDataset",
        dataset_dir="/scratch/xxx/FAIR1M2/processed_1024/trainval_1024_200_1.0",
        batch_size=8,
        num_workers=18,
    ),
    val=dict(
        type="FAIRDataset",
        dataset_dir="/scratch/xxx/FAIR1M2/processed_1024/trainval_1024_200_1.0",
        batch_size=8,
        num_workers=18,
    ),
    test=dict(
        dataset_type="FAIR",
        images_dir="/scratch/xxx/FAIR1M2/processed_1024/test_1024_200_1.0/images/",
        batch_size=8,
        num_workers=18,
    ),
)

max_epoch = 12

Thanks.

zip error: Nothing to do! (try: zip -rj -q submit_zips/s2anet_r50_fpn_1x_dota.zip . -i work_dirs/s2anet_r50_fpn_1x_dota_bs2_test/test/submit_12/after_nms/*)

Loading config from: configs/s2anet_r50_fpn_1x_dota.py
[w 0828 08:21:51.503818 84 __init__.py:980] load parameter fc.weight failed ...
[w 0828 08:21:51.503950 84 __init__.py:980] load parameter fc.bias failed ...
[w 0828 08:21:51.504010 84 __init__.py:998] load total 267 params, 2 failed
Sat Aug 28 08:21:55 2021 Loading model parameters from work_dirs/s2anet_r50_fpn_1x_dota_bs2_test/checkpoints/ckpt_12.pkl
Sat Aug 28 08:21:55 2021 Testing...
0it [00:02, ?it/s]
Merge results...
0it [00:00, ?it/s]
zip..

zip error: Nothing to do! (try: zip -rj -q submit_zips/s2anet_r50_fpn_1x_dota.zip . -i work_dirs/s2anet_r50_fpn_1x_dota_bs2_test/test/submit_12/after_nms/*)
This problem occurred during testing.

Where is the code for COBB?

list index out of range when running the Oriented_RCNN model on FAIR1M_1_5 data via run_net.py with task='test'

The error is as follows:
Traceback (most recent call last):
File "/home/hexf/data/jdet_anno/tools/run_net.py", line 57, in <module>
main()
File "/home/hexf/data/jdet_anno/tools/run_net.py", line 52, in main
runner.test()
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jittor/__init__.py", line 112, in inner
ret = func(*args, **kw)
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jittor/__init__.py", line 280, in inner
ret = func(*args, **kw)
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jdet-0.2.0.0-py3.8.egg/jdet/runner/runner.py", line 227, in test
data_merge_result(save_file,self.work_dir,self.epoch,self.cfg.name,dataset_type,self.cfg.dataset.test.images_dir)
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jdet-0.2.0.0-py3.8.egg/jdet/data/devkits/data_merge.py", line 68, in data_merge_result
data_merge(result_pkl, save_path, final_path,dataset_type)
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jdet-0.2.0.0-py3.8.egg/jdet/data/devkits/data_merge.py", line 53, in data_merge
prepare_data(result_pkl,save_path, classes)
File "/home/hexf/.conda/envs/jdet/lib/python3.8/site-packages/jdet-0.2.0.0-py3.8.egg/jdet/data/devkits/data_merge.py", line 37, in prepare_data
classname = classes[label]
IndexError: list index out of range
The problem does not occur with task='train', but it does when running test on its own.
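
The last frame of the traceback, classname = classes[label], fails when a predicted label is not a valid index into the class list. Below is a minimal sketch of how such a mismatch could be surfaced before merging. The helper find_bad_labels and the class names are hypothetical and for illustration only; this is not the code in data_merge.py.

```python
def find_bad_labels(labels, classes):
    """Return the labels outside the range 0 .. len(classes) - 1."""
    return sorted({lab for lab in labels if not 0 <= lab < len(classes)})

# Hypothetical data: 3 class names, but predictions that use label ids up to 4.
classes = ["plane", "ship", "bridge"]
labels = [0, 2, 4, 1, 4, 3]

bad = find_bad_labels(labels, classes)
print(bad)  # [3, 4]
```

A non-empty result would suggest that the class list selected for merging is shorter than the number of classes the model predicts. Whether that is the cause here is not confirmed by the log.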

NotImplementedError from jdet/ops/dcn_v1.py

Hi,

I ran into the following problem when running the s2anet baseline from https://github.com/Jittor/JDet/blob/master/docs/fair1m_1_5.md.
I have already preprocessed the FAIR1m1.5 data as instructed.
I changed the data paths in configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py and left everything else untouched.

Running python tools/run_net.py --config-file configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py --no_cuda gives the following output:

[i 0816 14:20:39.752772 76 compiler.py:955] Jittor(1.3.5.3) src: /home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor
[i 0816 14:20:39.757749 76 compiler.py:956] g++ at /usr/bin/g++(12.1.1)
[i 0816 14:20:39.757837 76 compiler.py:957] cache_path: /home/guangzhi/.cache/jittor/jt1.3.5/g++12.1.1/py3.7.13/Linux-5.10.136xc2/AMDRyzen73700Xx25/default
[i 0816 14:20:39.761451 76 __init__.py:411] Found nvcc(11.7.99) at /opt/cuda/bin/nvcc.
[i 0816 14:20:39.768425 76 __init__.py:411] Found addr2line(2.39) at /usr/bin/addr2line.
[i 0816 14:20:39.877018 76 compiler.py:1010] cuda key:cu11.7.99_sm_
[i 0816 14:20:40.065266 76 __init__.py:227] Total mem: 62.78GB, using 16 procs for compiling.
[i 0816 14:20:40.115608 76 jit_compiler.cc:28] Load cc_path: /usr/bin/g++
[i 0816 14:20:40.117752 76 init.cc:62] Found cuda archs: []
[i 0816 14:20:40.179477 76 __init__.py:411] Found mpicc(4.1.4) at /usr/bin/mpicc.
[i 0816 14:20:40.255035 76 compile_extern.py:30] found /opt/cuda/include/cublas.h
[i 0816 14:20:40.259419 76 compile_extern.py:30] found /opt/cuda/lib64/libcublas.so
[i 0816 14:20:40.259539 76 compile_extern.py:30] found /opt/cuda/lib64/libcublasLt.so.11
[i 0816 14:20:40.273131 76 compile_extern.py:30] found /usr/include/cudnn.h
[i 0816 14:20:40.280202 76 compile_extern.py:30] found /usr/lib/libcudnn.so.8
[i 0816 14:20:40.280335 76 compile_extern.py:30] found /usr/lib/libcudnn_ops_infer.so.8
[i 0816 14:20:40.281289 76 compile_extern.py:30] found /usr/lib/libcudnn_ops_train.so.8
[i 0816 14:20:40.282017 76 compile_extern.py:30] found /usr/lib/libcudnn_cnn_infer.so.8
[i 0816 14:20:40.309235 76 compile_extern.py:30] found /usr/lib/libcudnn_cnn_train.so.8
[i 0816 14:20:40.330474 76 compile_extern.py:30] found /opt/cuda/include/curand.h
[i 0816 14:20:40.335617 76 compile_extern.py:30] found /opt/cuda/lib64/libcurand.so
[i 0816 14:20:40.342631 76 compile_extern.py:30] found /opt/cuda/include/cufft.h
[i 0816 14:20:40.350033 76 compile_extern.py:30] found /opt/cuda/lib64/libcufft.so
Loading config from:  configs/s2anet/s2anet_r50_fpn_1x_fair1m_1_5.py
[w 0816 14:20:41.049850 76 __init__.py:1344] load parameter fc.weight failed ...
[w 0816 14:20:41.049903 76 __init__.py:1344] load parameter fc.bias failed ...
[w 0816 14:20:41.050578 76 __init__.py:1363] load total 267 params, 2 failed
Tue Aug 16 14:20:41 2022 Start running
Traceback (most recent call last):
  File "tools/run_net.py", line 56, in <module>
    main()
  File "tools/run_net.py", line 47, in main
    runner.run()
  File "/home/guangzhi/codes/JDet/python/jdet/runner/runner.py", line 84, in run
    self.train()
  File "/home/guangzhi/codes/JDet/python/jdet/runner/runner.py", line 126, in train
    losses = self.model(images,targets)
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
    return self.execute(*args, **kw)
  File "/home/guangzhi/codes/JDet/python/jdet/models/networks/s2anet.py", line 35, in execute
    outputs = self.bbox_head(features, targets)
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
    return self.execute(*args, **kw)
  File "/home/guangzhi/codes/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 625, in execute
    outs = multi_apply(self.forward_single, feats, self.anchor_strides)
  File "/home/guangzhi/codes/JDet/python/jdet/utils/general.py", line 53, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/guangzhi/codes/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 236, in forward_single
    align_feat = self.align_conv(x, refine_anchor.clone(), stride)
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
    return self.execute(*args, **kw)
  File "/home/guangzhi/codes/JDet/python/jdet/models/roi_heads/s2anet_head.py", line 722, in execute
    x = self.relu(self.deform_conv(x, offset_tensor))
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 950, in __call__
    return self.execute(*args, **kw)
  File "/home/guangzhi/codes/JDet/python/jdet/ops/dcn_v1.py", line 696, in execute
    self.dilation, self.groups, self.deformable_groups)
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 1603, in apply
    return func(*args, **kw)
  File "/home/guangzhi/.local/anaconda3/envs/jdet/lib/python3.7/site-packages/jittor/__init__.py", line 1559, in __call__
    ori_res = self.execute(*args)
  File "/home/guangzhi/codes/JDet/python/jdet/ops/dcn_v1.py", line 589, in execute
    raise NotImplementedError
NotImplementedError

In addition, the following tests all ran without reporting errors:

python -m jittor.test.test_example
python -m jittor.test.test_resnet
python -m jittor.test.test_array
python -m jittor.test.test_cudnn_op

Environment:

OS: Manjaro Linux
python: anaconda 3.7.13
jittor: 1.3.5.3
jdet: 0.2.0.0
GCC: 12.1.1
nvcc: 11.7.99

Any help would be appreciated!

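Training hangs with multiple GPUs

Multi-GPU training of the model hangs; I waited for a long time.

Hello, I have fixed it. Updating jdet made it work. Thanks!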

DOTA data processing fails with munmap_chunk(): invalid pointer

Hello, I ran into some problems and would like to ask for help.
My config file has type='DOTA1_5'
source_dataset_path='/media/asus/299D817A2D97AD94/ok_PROJs/JDet/datasets/DOTA1_5/'
target_dataset_path='/media/asus/299D817A2D97AD94/ok_PROJs/JDet/datasets/processed_DOTA1_5/'
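The paths are correct. DOTA1_5 contains a hand-picked subset of the data, which I have tested with other programs without problems.

Running the data processing step reports an error: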
(sky) asus@asus-System-Product-Name:/media/asus/299D817A2D97AD94/ok_PROJs/JDet$ python tools/preprocess.py --config-file configs/preprocess/dota1_5_preprocess_config.py
[i 1118 17:31:28.681917 08 compiler.py:944] Jittor(1.3.1.18) src: /home/asus/anaconda3/envs/sky/lib/python3.7/site-packages/jittor
[i 1118 17:31:28.689649 08 compiler.py:945] g++ at /usr/bin/g++(7.5.0)
[i 1118 17:31:28.689716 08 compiler.py:946] cache_path: /home/asus/.cache/jittor/jt1.3.1/g++7.5.0/py3.7.9/Linux-5.4.0-89xd6/AMDRyzen93950Xxea/default
[i 1118 17:31:28.697931 08 __init__.py:372] Found nvcc(10.1.105) at /usr/local/cuda-10.1/bin/nvcc.
[i 1118 17:31:28.754584 08 __init__.py:372] Found gdb(8.1.0) at /usr/bin/gdb.
[i 1118 17:31:28.763441 08 __init__.py:372] Found addr2line(2.30) at /usr/bin/addr2line.
[i 1118 17:31:28.866922 08 compiler.py:997] cuda key:cu10.1.105_sm_75
[i 1118 17:31:29.054781 08 __init__.py:187] Total mem: 62.79GB, using 16 procs for compiling.
[i 1118 17:31:29.114659 08 jit_compiler.cc:27] Load cc_path: /usr/bin/g++
[i 1118 17:31:29.215793 08 init.cc:61] Found cuda archs: [75,]
[i 1118 17:31:29.230639 08 compile_extern.py:497] mpicc not found, distribution disabled.
[i 1118 17:31:29.271848 08 compile_extern.py:29] found /usr/local/cuda-10.1/include/cublas.h
[i 1118 17:31:29.283580 08 compile_extern.py:29] found /usr/lib/x86_64-linux-gnu/libcublas.so
[i 1118 17:31:29.283738 08 compile_extern.py:29] found /usr/lib/x86_64-linux-gnu/libcublasLt.so.10
[i 1118 17:31:29.420154 08 compile_extern.py:29] found /usr/local/cuda-10.1/include/cudnn.h
[i 1118 17:31:29.437553 08 compile_extern.py:29] found /usr/local/cuda-10.1/lib64/libcudnn.so
[i 1118 17:31:30.539454 08 compile_extern.py:29] found /usr/local/cuda-10.1/include/curand.h
[i 1118 17:31:30.572073 08 compile_extern.py:29] found /usr/local/cuda-10.1/lib64/libcurand.so
munmap_chunk(): invalid pointer
Aborted (core dumped)
(sky) asus@asus-System-Product-Name:/media/asus/299D817A2D97AD94/ok_PROJs/JDet$ python -V
Python 3.7.9

1

Hello, the following problem occurred during training:

[Reason]: [f 0306 21:21:39.946686 04 cudnn_conv__Tx_float32__Ty_float32__Tw_float16__XFORMAT_abcd__WFORMAT_oihw__YFORMAT_abcd_____hash_2ece8b8f815db190_op.cc:407] Check failed: best_algo_idx!=-1 Something wrong... Could you please report this issue?

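model.load fails to load pretrained weights

Why does model.load fail to load the pretrained weights?

Can the projects/retinanet model be trained on multiple GPUs? (Specifically with the retinanet_gofen.py config file.)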

ZeroDivisionError: float division by zero

Traceback (most recent call last):
File "/root/anaconda3/envs/Jit/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/root/anaconda3/envs/Jit/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/root/ZY/JDet-master/python/jdet/data/devkits/result_merge.py", line 230, in mergesingle
nameboxnmsdict = nmsbynamedict(nameboxdict, nms, nms_threshold_0)
File "/root/ZY/JDet-master/python/jdet/data/devkits/result_merge.py", line 171, in nmsbynamedict
keep = nms(np.array(nameboxdict[imgname]), thresh)
File "/root/ZY/JDet-master/python/jdet/data/devkits/result_merge.py", line 105, in py_cpu_nms_poly_fast
iou = iou_poly(polys[i], polys[tmp_order[j]])
File "/root/ZY/JDet-master/python/jdet/ops/nms_poly.py", line 310, in iou_poly
iou = inter_area/(poly1.area+poly2.area-inter_area)
ZeroDivisionError: float division by zero

This error occurred during testing.
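
The failing line divides by poly1.area + poly2.area - inter_area, which can be zero, for example when both polygons have zero area. Below is a minimal sketch of a guarded version, assuming the areas are already computed as floats. safe_iou is a hypothetical helper, not the function in nms_poly.py.

```python
def safe_iou(area1, area2, inter_area, eps=1e-9):
    """IoU from precomputed areas; returns 0.0 when the union is (near) zero."""
    union = area1 + area2 - inter_area
    if union <= eps:
        return 0.0
    return inter_area / union

print(safe_iou(4.0, 4.0, 2.0))  # 2 / 6
print(safe_iou(0.0, 0.0, 0.0))  # 0.0 instead of ZeroDivisionError
```

Filtering zero-area boxes out of the detections before NMS would be an alternative to guarding the division.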

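Question about where the archive is stored

When running process.py to process the dataset, how can I change the path where the downloaded archive is saved?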

AttributeError: 'jittor_core.Var' object has no attribute 'sort'

Wed Aug 25 15:01:57 2021 Validating....
0%| | 33/41429 [01:43<36:05:37, 3.14s/it]
Traceback (most recent call last):
File "tools/run_net.py", line 47, in <module>
main()
File "tools/run_net.py", line 40, in main
runner.run()
File "/root/ZY/JDet-master/python/jdet/runner/runner.py", line 85, in run
self.val()
File "/root/anaconda3/envs/Jit/lib/python3.7/site-packages/jittor/__init__.py", line 89, in inner
ret = func(*args, **kw)
File "/root/anaconda3/envs/Jit/lib/python3.7/site-packages/jittor/__init__.py", line 257, in inner
ret = func(*args, **kw)
File "/root/ZY/JDet-master/python/jdet/runner/runner.py", line 153, in val
result = self.model(images,targets)
File "/root/anaconda3/envs/Jit/lib/python3.7/site-packages/jittor/__init__.py", line 737, in __call__
return self.execute(*args, **kw)
File "/root/ZY/JDet-master/python/jdet/models/networks/s2anet.py", line 35, in execute
outputs = self.bbox_head(features, targets)
File "/root/anaconda3/envs/Jit/lib/python3.7/site-packages/jittor/__init__.py", line 737, in __call__
return self.execute(*args, **kw)
File "/root/ZY/JDet-master/python/jdet/models/roi_heads/s2anet_head.py", line 627, in execute
return self.get_bboxes(*outs,self.parse_targets(targets,is_train=False))
File "/root/ZY/JDet-master/python/jdet/models/roi_heads/s2anet_head.py", line 538, in get_bboxes
scale_factor, cfg, rescale)
File "/root/ZY/JDet-master/python/jdet/models/roi_heads/s2anet_head.py", line 597, in get_bboxes_single
cfg.max_per_img)
File "/root/ZY/JDet-master/python/jdet/ops/nms_rotated.py", line 589, in multiclass_nms_rotated
_, inds = scores.sort(descending=True)
AttributeError: 'jittor_core.Var' object has no attribute 'sort'

This error appeared after about 6 hours of training.

AttributeError: module 'jittor.nn' has no attribute 'Conv3d'

I installed jittor via docker; the version is 1.2.2.59. Not only is nn.Conv3d missing, but nn.BatchNorm3d and nn.InstanceNorm3d are missing as well.

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1091)

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1091)
I hit the error above while running the Data Preprocessing step. What could be the cause?

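How do I compute mAP?

Does ms in the config files mean that the initial data preprocessing is different? Can ms be done during training instead?

Error during testing

Hello, when I run python run_net.py --config-file=configs/retinanet_config.py --task=test for testing, I get the following error: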

Caught segfault at address 0x2b95dd8ffe00, thread_name: '', flush log...
[i 0611 14:45:27.094591 36 tracer.cc:149] stack trace for pid= 78086
[New LWP 78219]
[New LWP 78218]
[New LWP 78217]
[New LWP 78216]
[New LWP 78215]
[New LWP 78214]
[New LWP 78213]
[New LWP 78212]
[New LWP 78211]
[New LWP 78210]
[New LWP 78209]
[New LWP 78208]
[New LWP 78207]
[New LWP 78206]
[New LWP 78205]
[New LWP 78204]
[New LWP 78203]
[New LWP 78202]
[New LWP 78201]
[New LWP 78173]
[New LWP 78172]
[New LWP 78171]
[New LWP 78170]
[New LWP 78169]
[New LWP 78168]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /home/xxx/anaconda3/envs/jdet/bin/../lib/libstdc++.so.6]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /home/xxx/anaconda3/envs/jdet/bin/../lib/libgcc_s.so.1]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /home/xxx/anaconda3/envs/jdet/lib/libgomp.so.1]
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/numpy/core/../../numpy.libs/libgfortran-040039e1.so.5.0.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5b/be74eb6855e0a2c043c0bec2f484bf3e9f14c0.debug
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/numpy/core/../../numpy.libs/libquadmath-96973f99.so.0.0.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/54/9b4c82347785459571c79239872ad31509dcf4.debug
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/PIL/../Pillow.libs/libXau-00ec42fe.so.6.0.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/31/af390be616b7cb3e805b5fb014bd097ee08dfd.debug
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/cv2/../opencv_python.libs/libopenblas-r0-f650aae0.3.3.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/7c/23b5dcd1a14cd7d5dbfa81e8a02ac76b4e4450.debug
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/cv2/../opencv_python.libs/libbz2-a273e504.so.1.0.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/0c/85c0386f0cf41ea39969cf7f58a558d1ad3235.debug
Missing separate debuginfo for /home/xxx/anaconda3/envs/jdet/lib/python3.9/site-packages/cv2/../opencv_python.libs/libgfortran-91cc3cb1.so.3.0.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/53/827b36771c70cc3514402ac6e419f0533ad60c.debug
0x00002b948bc3d1c9 in waitpid () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x2b948b074880 (LWP 78086))]
#0  0x00002b948bc3d1c9 in waitpid () from /lib64/libc.so.6
#1  0x00002b9493098f28 in jittor::print_trace() () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/jit_utils_core.cpython-39-x86_64-linux-gnu.so
#2  0x00002b9493094f65 in jittor::segfault_sigaction(int, siginfo_t*, void*) () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/jit_utils_core.cpython-39-x86_64-linux-gnu.so
#3  <signal handler called>
#4  0x00002b9770aa4ad8 in jittor::CodeOp::jit_run() () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jit/code__IN_SIZE_2__in0_dim_1__in0_type_int32__in1_dim_2__in1_type_float32__OUT_SIZE_1__out0____hash_37c6b9432a4203e_op.so
#5  0x00002b949c57a8ac in jittor::Profiler::record_and_run(void (*)(jittor::Op*), jittor::Op*, char const*) () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jittor_core.cpython-39-x86_64-linux-gnu.so
#6  0x00002b949c420ddb in jittor::Op::jit_run(jittor::JitKey&) () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jittor_core.cpython-39-x86_64-linux-gnu.so
#7  0x00002b949c5826c1 in jittor::Executor::run_sync(std::vector<jittor::Var*, std::allocator<jittor::Var*> >, bool, bool) () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jittor_core.cpython-39-x86_64-linux-gnu.so
#8  0x00002b949c41e710 in jittor::Op::init() () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jittor_core.cpython-39-x86_64-linux-gnu.so
#9  0x00002b949c35e0f0 in jittor::jit_op_maker::make_where(jittor::Var*, jittor::NanoString) () from /home/xxx/.cache/jittor/jt1.3.4/g++9.3.0/py3.9.12/Linux-3.10.0-8x74/IntelRXeonRGolx16/default/cu11.2.152_sm_70/jittor_core.cpython-39-x86_64-linux-gnu.so
Undefined command: "py-bt".  Try "help".
Segfault, exit
Aborted

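How should I fix this error?
Thanks.

FP16 mixed-precision training

Does jittor support FP16 mixed-precision training? Is there any example code or documentation for it? Thanks.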

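Data preprocessing uses so much memory that it fails

When running python tools/preprocess.py --config-file configs/preprocess/dota1_5_preprocess_config.py, memory usage starts to surge a few seconds after the command starts and keeps growing until all system memory is used up.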

System environment:

Windows Pro 21H1 19043.1706
Python 3.10.7
RAM: 32 GB
GPU: 3070

Start of the log:

(The dataset directory is named dota-1.0, but the data in it is version 1.5.)

D:\project-jittor\jdet>python tools\preprocess.py --config-file configs/preprocess/dota1_5_preprocess_config.py
[i 0930 15:16:14.694000 84 compiler.py:955] Jittor(1.3.5.16) src: c:\users\pc\appdata\local\programs\python\python310\lib\site-packages\jittor-1.3.5.16-py3.10.egg\jittor
[i 0930 15:16:14.740000 84 compiler.py:956] cl at C:\Users\pc\.cache\jittor\msvc\VC\_\_\_\_\_\bin\cl.exe(19.29.30133)
[i 0930 15:16:14.741000 84 compiler.py:957] cache_path: C:\Users\pc\.cache\jittor\jt1.3.5\cl\py3.10.7\Windows-10-10.xb1\IntelRCoreTMi7x95\default
[i 0930 15:16:14.745000 84 install_cuda.py:88] cuda_driver_version: [11, 7, 0]
[i 0930 15:16:14.796000 84 __init__.py:411] Found C:\Users\pc\.cache\jittor\jtcuda\cuda11.2_cudnn8_win\bin\nvcc.exe(11.2.67) at C:\Users\pc\.cache\jittor\jtcuda\cuda11.2_cudnn8_win\bin\nvcc.exe.
[i 0930 15:16:14.911000 84 compiler.py:1010] cuda key:cu11.2.67
[i 0930 15:16:14.913000 84 __init__.py:227] Total mem: 31.91GB, using 10 procs for compiling.
[i 0930 15:16:15.796000 84 jit_compiler.cc:28] Load cc_path: C:\Users\pc\.cache\jittor\msvc\VC\_\_\_\_\_\bin\cl.exe
[i 0930 15:16:15.797000 84 init.cc:62] Found cuda archs: [86,]
[i 0930 15:16:15.892000 84 compile_extern.py:517] mpicc not found, distribution disabled.
[w 0930 15:16:15.941000 84 compile_extern.py:200] CUDA related path found in LD_LIBRARY_PATH or PATH(['', 'C', '\\Users\\pc\\.cache\\jittor\\jtcuda\\cuda11.2_cudnn8_win\\lib64', '', 'C', '\\Users\\pc\\.cache\\jittor\\mkl\\dnnl_win_2.2.0_cpu_vcomp\\bin', '', 'C', '\\Users\\pc\\.cache\\jittor\\mkl\\dnnl_win_2.2.0_cpu_vcomp\\lib', '', 'C', '\\Users\\pc\\.cache\\jittor\\jt1.3.5\\cl\\py3.10.7\\Windows-10-10.xb1\\IntelRCoreTMi7x95\\default', '', 'C', '\\Users\\pc\\.cache\\jittor\\jt1.3.5\\cl\\py3.10.7\\Windows-10-10.xb1\\IntelRCoreTMi7x95\\default\\cu11.2.67', '', 'C', '\\Users\\pc\\.cache\\jittor\\jtcuda\\cuda11.2_cudnn8_win\\bin', '', 'C', '\\Users\\pc\\.cache\\jittor\\jtcuda\\cuda11.2_cudnn8_win\\lib\\x64', '', 'C', '\\Users\\pc\\.cache\\jittor\\msvc\\win10_kits\\lib\\ucrt\\x64', '', 'C', '\\Users\\pc\\.cache\\jittor\\msvc\\win10_kits\\lib\\um\\x64', '', 'C', '\\Users\\pc\\.cache\\jittor\\msvc\\VC\\lib', '', 'c', '\\users\\pc\\appdata\\local\\programs\\python\\python310\\libs', 'C', '\\Users\\pc\\.cache\\jittor\\msvc\\VC\\_\\_\\_\\_\\_\\bin', 'C', '\\Users\\pc\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\opencv_python-4.6.0.66-py3.10-win-amd64.egg\\cv2\\../../x64/vc14/bin', 'C', '\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\bin', 'C', '\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\libnvvp', 'D', '\\Release', 'D', '\\ffmpeg\\bin', 'C', '\\ProgramData\\Oracle\\Java\\javapath', 'C', '\\Program Files\\Java\\jdk1.8.0_131\\bin', 'C', '\\Program Files\\Java\\jdk1.8.0_131\\jre\\bin', 'C', '\\Windows\\system32', 'C', '\\Windows', 'C', '\\Windows\\System32\\Wbem', 'C', '\\Windows\\System32\\WindowsPowerShell\\v1.0\\', 'C', '\\Windows\\System32\\OpenSSH\\', 'C', '\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common', 'C', '\\Program Files\\NVIDIA Corporation\\NVIDIA NvDLISR', 'C', '\\Program Files\\NVIDIA Corporation\\Nsight Compute 2022.2.1\\', 'C', '\\Users\\pc\\AppData\\Local\\Programs\\Python\\Python310\\Scripts\\', 'C', 
'\\Users\\pc\\AppData\\Local\\Programs\\Python\\Python310\\', 'C', '\\Program Files\\MySQL\\MySQL Shell 8.0\\bin\\', 'C', '\\Users\\pc\\AppData\\Local\\Microsoft\\WindowsApps', '', 'C', '\\Users\\pc\\AppData\\Local\\Programs\\Microsoft VS Code\\bin']), This path may cause jittor found the wrong libs, please unset LD_LIBRARY_PATH and remove cuda lib path in Path.
Or you can let jittor install cuda for you: `python3.x -m jittor_utils.install_cuda`
Loading config from:  configs/preprocess/dota1_5_preprocess_config.py
{'type': 'DOTA1_5', 'source_dataset_path': 'D:\\project-jittor\\dataset\\dota-1.0', 'target_dataset_path': 'D:\\project-jittor\\dataset\\dota-1.0-processed', 'tasks': [{'label': 'trainval', 'config': {'subimage_size': 600, 'overlap_size': 150, 'multi_scale': [1.0], 'horizontal_flip': False, 'vertical_flip': False, 'rotation_angles': [0.0]}}, {'label': 'test', 'config': {'subimage_size': 600, 'overlap_size': 150, 'multi_scale': [1.0], 'horizontal_flip': False, 'vertical_flip': False, 'rotation_angles': [0.0]}}], 'name': 'dota1_5_preprocess_config', 'work_dir': 'work_dirs/dota1_5_preprocess_config'}
==============
processing trainval
fatal   fatal   : : Memory allocation failureMemory allocation failure

fatal   : Memory allocation failure
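The system cannot execute the specified program.
Not enough memory resources are available to process this command.
Not enough memory resources are available to process this command.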
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\project-jittor\jdet\tools\preprocess.py", line 5, in <module>
    from jdet.config import init_cfg, get_cfg
  File "d:\project-jittor\jdet\python\jdet\__init__.py", line 1, in <module>
    from . import models
  File "d:\project-jittor\jdet\python\jdet\models\__init__.py", line 1, in <module>
    from .networks import *
  File "d:\project-jittor\jdet\python\jdet\models\networks\__init__.py", line 1, in <module>
    from .rcnn import RCNN
  File "d:\project-jittor\jdet\python\jdet\models\networks\rcnn.py", line 2, in <module>
    import jittor as jt
  File "C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\jittor-1.3.5.16-py3.10.egg\jittor\__init__.py", line 32, in <module>
    from typing import List, Tuple
  File "C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\typing.py", line 2178, in <module>
    class SupportsInt(Protocol):
  File "c:\users\pc\appdata\local\programs\python\python310\lib\abc.py", line 106, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\typing.py", line 1554, in __init_subclass__
    cls._is_protocol = any(b is Protocol for b in cls.__bases__)
MemoryError: Out of memory interning an attribute name
1 / 10
/ 0
1 / 0
C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\shapely-2.0a1-py3.10-win-amd64.egg\shapely\set_operations.py:132: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\shapely-2.0a1-py3.10-win-amd64.egg\shapely\set_operations.py:132: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\shapely-2.0a1-py3.10-win-amd64.egg\shapely\set_operations.py:132: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
2 / 0
2 / 0
3 / 0
4 / 0

After this there are many similar "number / 0" log lines, along with logs like the following:

C:\Users\pc\AppData\Local\Programs\Python\Python310\lib\site-packages\shapely-2.0a1-py3.10-win-amd64.egg\shapely\set_operations.py:132: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)

While the command runs, a system error dialog pops up: nvcc.exe - The application was unable to start correctly (0xc0000142). Click OK to close the application.

Further on there are a large number of similar error logs:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 236, in prepare
Traceback (most recent call last):
      File "<string>", line 1, in <module>
_fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 289, in run_path
  File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    return _run_module_code(code, init_globals, run_name,
exitcode = _main(fd, parent_sentinel)  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 96, in _run_module_code

      File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 125, in _main
_run_code(code, mod_globals, init_globals,
prepare(preparation_data)  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 86, in _run_code

      File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 236, in prepare
exec(code, run_globals)
_fixup_main_from_path(data['init_main_from_path'])  File "D:\project-jittor\jdet\tools\preprocess.py", line 5, in <module>

      File "c:\users\pc\appdata\local\programs\python\python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
from jdet.config import init_cfg, get_cfg
      File "d:\project-jittor\jdet\python\jdet\__init__.py", line 1, in <module>
main_content = runpy.run_path(main_path,
from . import models  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 289, in run_path

  File "d:\project-jittor\jdet\python\jdet\models\__init__.py", line 1, in <module>
    return _run_module_code(code, init_globals, run_name,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\users\pc\appdata\local\programs\python\python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\project-jittor\jdet\tools\preprocess.py", line 5, in <module>
    from jdet.config import init_cfg, get_cfg
  File "d:\project-jittor\jdet\python\jdet\__init__.py", line 1, in <module>
    from . import models
  File "d:\project-jittor\jdet\python\jdet\models\__init__.py", line 1, in <module>
        from .networks import *from .networks import *

  File "d:\project-jittor\jdet\python\jdet\models\networks\__init__.py", line 1, in <module>
  File "d:\project-jittor\jdet\python\jdet\models\networks\__init__.py", line 1, in <module>

MemoryError: Out of memory interning an attribute name

ImportError: DLL load failed while importing cv2: The paging file is too small for this operation to complete.

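Is memory usage this high expected?
Is there a configuration option to limit memory usage?
Or is the only option for now to run the preprocessing command on a machine with more memory?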
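
The tracebacks above show every spawned worker re-importing tools/preprocess.py, and with it jdet and jittor, so total memory grows with the number of worker processes. One general mitigation is to cap the pool size by the memory that is available. The sketch below illustrates that idea only. `choose_worker_count`, `per_worker_mb`, `crop_tile`, and `run_capped` are hypothetical names, not JDet options, and this thread does not confirm whether JDet's preprocessing exposes such a setting.

```python
import multiprocessing as mp
import os


def choose_worker_count(available_mb, per_worker_mb, cpu_count=None):
    """Return how many workers fit in memory: at least 1, at most the CPU count."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    fits = int(available_mb // per_worker_mb)
    return max(1, min(cpus, fits))


def crop_tile(index):
    """Hypothetical stand-in for the real per-image preprocessing work."""
    return index * index


def run_capped(indices, available_mb, per_worker_mb):
    """Process the items with a pool no larger than memory allows."""
    workers = choose_worker_count(available_mb, per_worker_mb)
    with mp.Pool(processes=workers) as pool:
        return pool.map(crop_tile, indices)


# Usage: call run_capped only under the main guard, because with the Windows
# "spawn" start method every worker re-imports the main module.
# if __name__ == "__main__":
#     print(run_capped(range(8), available_mb=4096, per_worker_mb=1500))
```

The per-worker memory estimate has to be measured on the target machine; the numbers in the usage comment are placeholders.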

Is RotationInvariantPooling(256, 8) hard-coded?

Hi,

While studying the s2anet implementation, I noticed that the definition of S2ANetHead in models/roi_heads/s2anet_head.py contains (https://github.com/Jittor/JDet/blob/2fd9ef0833f2edc79ee8a30aadca503ce5054a36/python/jdet/models/roi_heads/s2anet_head.py#L160):

self.or_pool = RotationInvariantPooling(256, 8)

My guess is that the 256 here is hard-coded and should be replaced with self.feat_channels?
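
The suggested change can be sketched without Jittor, as a plain-Python illustration. `RotationInvariantPoolingStub` and `HeadSketch` are hypothetical stand-ins that only record their arguments; they are not the JDet classes, and whether the other layers in S2ANetHead accept a channel count other than 256 is not verified here.

```python
class RotationInvariantPoolingStub:
    """Stand-in that only records its arguments (not the JDet class)."""

    def __init__(self, n_input_plane, n_orientation=8):
        self.n_input_plane = n_input_plane
        self.n_orientation = n_orientation


class HeadSketch:
    """Hypothetical head showing the parameterised construction."""

    def __init__(self, feat_channels=256, n_orientation=8):
        self.feat_channels = feat_channels
        # Derive the pooling width from feat_channels instead of a literal 256,
        # so a non-default channel count propagates to the pooling layer.
        self.or_pool = RotationInvariantPoolingStub(
            self.feat_channels, n_orientation
        )
```

With the default `feat_channels=256` the behaviour matches the hard-coded version, so the change would only matter for configs that override the channel count.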

Unable to start training

Running the training code produces the following error:

Traceback (most recent call last):
  File "tools/run_net.py", line 47, in <module>
    main()
  File "tools/run_net.py", line 40, in main
    runner.run()
  File "/workspace/JDet/python/jdet/runner/runner.py", line 81, in run
    self.train()
  File "/workspace/JDet/python/jdet/runner/runner.py", line 101, in train
    batch_size = len(targets)**jt.mpi.world_size()
AttributeError: 'NoneType' object has no attribute 'world_size'

According to the Jittor website, jt.mpi is None when there is no MPI environment, so how can JDet be run on a single GPU?
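
A common way to tolerate a missing MPI module is to treat it as a world size of 1. The sketch below is a minimal illustration of that guard and not the fix the JDet authors applied. `get_world_size`, `effective_batch_size`, and `FakeMpi` are hypothetical names, and it assumes the intent of the failing line is the per-process batch multiplied by the number of processes.

```python
def get_world_size(mpi):
    """Return the number of processes, treating a missing MPI module as 1."""
    if mpi is None:
        return 1
    return mpi.world_size()


def effective_batch_size(num_targets, mpi):
    """Assumed intent: per-process batch times the number of processes."""
    return num_targets * get_world_size(mpi)


class FakeMpi:
    """Hypothetical test double exposing the world_size() call seen in the traceback."""

    def __init__(self, size):
        self._size = size

    def world_size(self):
        return self._size
```

In the runner, the object passed as `mpi` would be `jt.mpi`, which the question reports as None on a machine without an MPI environment.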

Resolved

Resolved.

python preprocess.py --config-file dota_preprocess_config.py fails with an error

The error is as follows:
Traceback (most recent call last):
  File "preprocess.py", line 4, in <module>
    from jdet.config import init_cfg, get_cfg
  File "/home/vr1/anaconda3/envs/env_ybgu/lib/python3.7/site-packages/jdet-0.1-py3.7.egg/jdet/__init__.py", line 1, in <module>
ImportError: cannot import name 'models' from 'jdet' (/home/vr1/anaconda3/envs/env_ybgu/lib/python3.7/site-packages/jdet-0.1-py3.7.egg/jdet/__init__.py)
I searched but could not find a solution, and would appreciate guidance from the authors.
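
When an import fails like this, a useful first step is to check which installed copy of the package Python picks up and whether the subpackage is visible from it. The helper below is a diagnostic sketch using only the standard library. `locate` is a hypothetical name, and the sketch does not establish the cause of this particular error.

```python
import importlib.util


def locate(package, submodule):
    """Report where a package is loaded from and whether a submodule is findable."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"package_origin": None, "submodule_found": False}
    try:
        # Looking up a dotted name imports the parent package, so an error
        # raised by the package's own __init__ is caught here as well.
        sub = importlib.util.find_spec(f"{package}.{submodule}")
    except ImportError:
        sub = None
    return {"package_origin": spec.origin, "submodule_found": sub is not None}
```

Calling `locate("jdet", "models")` in the failing environment would show whether the egg under site-packages is the copy being imported and whether it contains a `models` subpackage.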

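Asking for advice

Could I ask what you would recommend for rotated-box detection with good real-time performance?
Thank you very much!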