
dota_yolov2's Introduction

DOTA_YOLOv2 provides the data conversion code and the parameter files used to train YOLOv2 on DOTA, together with the trained model, so that you can use them directly.

Our code was tested on the official darknet (commit f6d8617) with cuda-8.0 and cudnn-6.0 on Ubuntu 16.04.1 LTS.

Installation

  • Install darknet
    See Installing Darknet for instructions.
  • Development kit
    The Development kit provides the following functions. You can install it by following its instructions.
    • Load and visualize the data.
    • Evaluate the results.
    • Split and merge the images and labels.

Training YOLO on DOTA

  • Get the DOTA Dataset

  • Convert the Label Format
    In DOTA, the annotation format is:

        x1 y1 x2 y2 x3 y3 x4 y4 category difficult
    

    Darknet, in contrast, expects a .txt file for each image, with one line per ground-truth object in the following form:

        category-id x y width height
    

    Here x, y, width, and height are relative to the image's width and height. You can refer to data_transform/YOLO_Transform.py to convert the format.

    Note that this code assumes images of size 1024*1024. If your images have a different size, modify the code according to your image size. For DOTA, you can refer to DOTA_devkit/ImgSplit.py to split the images and labels.

  • Modify Cfg for Your Data
    You have to change the cfg/dota.data config file to point to your data:

        classes=15
        train  = /home/yh/dota/dota_data/YOLO/train/train.txt
        valid  = /home/yh/dota/dota_data/YOLO/test/test.txt
        names = data/dota.names
        backup = /home/yh/dota/darknet/dota-backup
    

    Replace the paths here with the locations of your own files. Text files such as train.txt and test.txt list the image files used for training or testing. Note that each entry is the full path of an image, not just its file name.

  • Train the Model

        wget https://pjreddie.com/media/files/darknet19_448.conv.23
        sh train-dota.sh 
    
  • Evaluate the Results
    You can download the pre-trained model on DOTA from Baidu Drive or Google Drive, and use it to test all the test images.

        sh valid-dota.sh 
    

    You will then obtain 15 files in the results/ subdirectory, each containing all the results for one category. Each file has the following format:

        imgname score xmin ymin xmax ymax 
    

    If you have split the images before, please first use DOTA_devkit/ResultMerge.py to merge the results.

    For DOTA, you can submit your results to the Evaluation Server for evaluation. See the official website of DOTA for details.
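
To work with the per-category result files described under "Evaluate the Results", a minimal Python sketch such as the following could regroup detections by image. Only the line format `imgname score xmin ymin xmax ymax` comes from the text above; the function name `parse_result_lines`, the `min_score` parameter, the returned dictionary layout, and the sample image names are illustrative assumptions and are not part of this repository or DOTA_devkit.

```python
from collections import defaultdict

def parse_result_lines(lines, category, min_score=0.0):
    """Group detections from one per-category result file by image name.

    Each line is assumed to follow the format stated above:
    imgname score xmin ymin xmax ymax
    """
    by_image = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        imgname = parts[0]
        score, xmin, ymin, xmax, ymax = map(float, parts[1:])
        if score >= min_score:
            by_image[imgname].append((category, score, xmin, ymin, xmax, ymax))
    return dict(by_image)

# Hypothetical lines standing in for the contents of one result file.
sample = [
    "P0006__1__0___0 0.91 10.0 20.0 110.0 80.0",
    "P0006__1__0___0 0.05 300.0 300.0 340.0 330.0",
    "P0007__1__0___0 0.77 5.0 5.0 50.0 60.0",
]
detections = parse_result_lines(sample, category="plane", min_score=0.1)
```

In practice the lines would be read from each file in results/, with the category taken from the file it belongs to. Grouping by image is convenient when drawing boxes, because all detections for one picture are then available together.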

dota_yolov2's People

Contributors

ringringyi

dota_yolov2's Issues

mAP of YOLO V2

I tried YOLO V2 on DOTA but the mAP is much lower than that in the paper. I was wondering if anyone can achieve the result and how to set the configuration for training. Thanks a lot.

How to visualize the results

I have run the pretrained model on test images from the DOTA dataset and found a few text files in the results folder. Could someone tell me how to visualize the results on the test images?

puzzled by partial sum calculation in gemm

Hello,
In forward_convolutional_layer, the call is "gemm(0,0,m,n,k,1,a,k,b,n,1,c,n);". As I understand it, 'a' points to the weights, 'b' points to the input data, and 'c' is the output. In an ordinary convolution, each weight in a kernel is multiplied by its corresponding input value and the products are accumulated into a partial sum; that is, each weight is multiplied by a different value. In gemm_nn, however, the same weight value 'A[i*lda+k]' is multiplied by many different 'B' values and accumulated. I am puzzled by this. Could you help me?

void gemm_nn(int M, int N, int K, float ALPHA, 
        float *A, int lda, 
        float *B, int ldb,
        float *C, int ldc)
{
    int i,j,k;
    for(i = 0; i < M; ++i){
        for(k = 0; k < K; ++k){
            register float A_PART = ALPHA*A[i*lda+k];
            for(j = 0; j < N; ++j){
                C[i*ldc+j] += A_PART*B[k*ldb+j];
            }
        }
    }
}
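
One possible way to look at the question above, sketched in Python rather than C: it assumes that the input is first unrolled with im2col before gemm is called, which is how darknet's convolutional layer prepares 'b' as far as its source shows. Row k of B then holds the input pixel that sits under kernel position k for every output location j, so reusing one weight across a whole row of B is the same as sliding the kernel. The helpers `im2col` and `direct_conv` below are simplified single-channel illustrations (stride 1, no padding), not darknet's functions.

```python
def im2col(image, size):
    """Unroll every size x size patch (stride 1, no padding) of a 2-D image.

    Row k of the result holds, for each output position, the pixel that
    kernel weight k is multiplied with.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - size + 1, w - size + 1
    cols = []
    for ky in range(size):
        for kx in range(size):
            row = [image[y + ky][x + kx]
                   for y in range(out_h) for x in range(out_w)]
            cols.append(row)
    return cols  # K rows (K = size*size), N columns (N = out_h*out_w)

def gemm_nn(M, N, K, A, B):
    """Same loop order as the C function above, with ALPHA = 1."""
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):
            a_part = A[i][k]
            for j in range(N):
                C[i][j] += a_part * B[k][j]
    return C

def direct_conv(image, kernel):
    """Textbook sliding-window convolution (cross-correlation)."""
    size = len(kernel)
    out_h, out_w = len(image) - size + 1, len(image[0]) - size + 1
    out = []
    for y in range(out_h):
        for x in range(out_w):
            out.append(sum(kernel[ky][kx] * image[y + ky][x + kx]
                           for ky in range(size) for kx in range(size)))
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]

B = im2col(image, 2)                      # 4 x 4 matrix of unrolled patches
A = [[w for row in kernel for w in row]]  # 1 x 4 matrix: one flattened filter
C = gemm_nn(1, 4, 4, A, B)                # C[0] matches direct_conv(image, kernel)
```

Each output C[i][j] still accumulates K different weight-times-input products as k runs over the kernel positions; the loop order only changes which of those products is computed first.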

Parameter location

Hello, where can I find the positions of the drawn boxes (that is, the coordinates of the four vertices of each box)?

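Setting the anchors

Hello!
I would like to ask how to change the anchors to suit a dataset.
For example, the anchors currently given in yolo-dota.cfg were obtained for 1024x1024 images. If I want to train yolov3, how should its anchors be set?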
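
As background for this question, the YOLOv2 paper derives anchors by k-means clustering of ground-truth box sizes with the distance d = 1 - IoU. Whether the anchors in yolo-dota.cfg were produced this way is not stated in this repository, so the following is only a sketch of that general technique. The function names, the deterministic initialization, and the toy box sizes are assumptions made for illustration.

```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (w, h), both anchored at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs using the distance d = 1 - IoU."""
    # Deterministic init: k boxes spread evenly through the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda c: iou_wh(box, centroids[c]))
            clusters[best].append(box)
        updated = []
        for c in range(k):
            members = clusters[c]
            if not members:
                updated.append(centroids[c])  # keep an empty cluster's centroid
                continue
            updated.append((sum(b[0] for b in members) / len(members),
                            sum(b[1] for b in members) / len(members)))
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])

# Toy widths/heights in pixels (made-up values, not DOTA statistics).
boxes = [(10, 12), (12, 10), (11, 11), (100, 40), (96, 44), (104, 36)]
anchors_px = kmeans_anchors(boxes, k=2)

# YOLOv2 [region] anchors are expressed in cells of the final feature map;
# with a stride of 32 that means dividing pixel sizes by 32.
anchors_cells = [(w / 32.0, h / 32.0) for w, h in anchors_px]
```

For YOLOv3, darknet's `[yolo]` layers take anchors in pixels of the network input, so the pixel values would be used without dividing by the stride. The box sizes fed to the clustering should be measured at the resolution the network is trained on.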

Error when running valid

Couldn't open file: /home/yh/dota/dota_data/YOLO/test/test.txt
Which file does this error come from?

anchor size

I have a question about your anchor sizes. How did you modify the anchor sizes in your cfg files? I found that not all training images are 1024*1024 after splitting, so the images may need to be resized when training YOLO. Will this affect the anchor sizes?

Anyone tried with YOLOv3 or YOLOv4?

Has anyone tried DOTA v1.0 with YOLOv3 or YOLOv4?

I am currently trying to train my model with YOLOv4-tiny, but the mAP values are much lower than those reported on the website. Could there be an incompatibility or some other problem with the later YOLO versions?

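About how to make YOLO labels

I looked at some of the annotation examples in DOTA, and some of the bboxes are tilted. A YOLO label only contains x, y, w, h, with no information about the tilt angle of the bbox, so the bbox is presumably assumed to be upright. I also looked at the code under data_transform and could not find anywhere that handles this angle information. So I would like to ask: does this need attention, and how does it affect the training results?
Thanks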
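
For context on the question above: a common way to obtain Darknet labels from DOTA's four-vertex annotations is to take the axis-aligned rectangle that encloses the four points, which discards the orientation. The sketch below illustrates that idea under the annotation formats given in the README. It is an assumption about the general approach, not a copy of data_transform/YOLO_Transform.py, and `dota_to_yolo`, its parameters, and the two-class name list are hypothetical.

```python
def dota_to_yolo(line, class_names, img_w=1024, img_h=1024):
    """Convert one DOTA annotation line to a Darknet label line.

    Input:  x1 y1 x2 y2 x3 y3 x4 y4 category difficult
    Output: category-id x y width height (center and size, relative to image)

    The oriented quadrilateral is replaced by its axis-aligned bounding
    box, so the rotation angle is lost.
    """
    parts = line.split()
    coords = [float(v) for v in parts[:8]]
    category = parts[8]
    xs, ys = coords[0::2], coords[1::2]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    class_id = class_names.index(category)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, x_center, y_center, width, height)

names = ["plane", "ship"]  # hypothetical subset of the 15 DOTA categories
# A made-up tilted box: its enclosing rectangle spans x 312..612, y 100..400.
label = dota_to_yolo("412 100 612 300 512 400 312 200 ship 0", names)
```

For a strongly tilted, elongated object the enclosing rectangle contains a lot of background, which is the trade-off of training a horizontal-box detector on oriented annotations.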

Why does the training process hang after the weights have been loaded?

(ykw) MacdeMacBook-Air:darknet mac$ ./darknet detector train /Users/mac/Desktop/DOTA_YOLOv2-master/cfg/dota.data /Users/mac/Desktop/DOTA_YOLOv2-master/cfg/yolo-dota.cfg /Users/mac/Desktop/DOTA_YOLOv2-master/darknet19_448.conv.23 | tee bod-dota.txt
layer filters size input output
0 conv 32 3 x 3 / 1 1024 x1024 x 3 -> 1024 x1024 x 32 1.812 BFLOPs
1 max 2 x 2 / 2 1024 x1024 x 32 -> 512 x 512 x 32
2 conv 64 3 x 3 / 1 512 x 512 x 32 -> 512 x 512 x 64 9.664 BFLOPs
3 max 2 x 2 / 2 512 x 512 x 64 -> 256 x 256 x 64
4 conv 128 3 x 3 / 1 256 x 256 x 64 -> 256 x 256 x 128 9.664 BFLOPs
5 conv 64 1 x 1 / 1 256 x 256 x 128 -> 256 x 256 x 64 1.074 BFLOPs
6 conv 128 3 x 3 / 1 256 x 256 x 64 -> 256 x 256 x 128 9.664 BFLOPs
7 max 2 x 2 / 2 256 x 256 x 128 -> 128 x 128 x 128
8 conv 256 3 x 3 / 1 128 x 128 x 128 -> 128 x 128 x 256 9.664 BFLOPs
9 conv 128 1 x 1 / 1 128 x 128 x 256 -> 128 x 128 x 128 1.074 BFLOPs
10 conv 256 3 x 3 / 1 128 x 128 x 128 -> 128 x 128 x 256 9.664 BFLOPs
11 max 2 x 2 / 2 128 x 128 x 256 -> 64 x 64 x 256
12 conv 512 3 x 3 / 1 64 x 64 x 256 -> 64 x 64 x 512 9.664 BFLOPs
13 conv 256 1 x 1 / 1 64 x 64 x 512 -> 64 x 64 x 256 1.074 BFLOPs
14 conv 512 3 x 3 / 1 64 x 64 x 256 -> 64 x 64 x 512 9.664 BFLOPs
15 conv 256 1 x 1 / 1 64 x 64 x 512 -> 64 x 64 x 256 1.074 BFLOPs
16 conv 512 3 x 3 / 1 64 x 64 x 256 -> 64 x 64 x 512 9.664 BFLOPs
17 max 2 x 2 / 2 64 x 64 x 512 -> 32 x 32 x 512
18 conv 1024 3 x 3 / 1 32 x 32 x 512 -> 32 x 32 x1024 9.664 BFLOPs
19 conv 512 1 x 1 / 1 32 x 32 x1024 -> 32 x 32 x 512 1.074 BFLOPs
20 conv 1024 3 x 3 / 1 32 x 32 x 512 -> 32 x 32 x1024 9.664 BFLOPs
21 conv 512 1 x 1 / 1 32 x 32 x1024 -> 32 x 32 x 512 1.074 BFLOPs
22 conv 1024 3 x 3 / 1 32 x 32 x 512 -> 32 x 32 x1024 9.664 BFLOPs
23 conv 1024 3 x 3 / 1 32 x 32 x1024 -> 32 x 32 x1024 19.327 BFLOPs
24 conv 1024 3 x 3 / 1 32 x 32 x1024 -> 32 x 32 x1024 19.327 BFLOPs
25 route 16
26 reorg / 2 64 x 64 x 512 -> 32 x 32 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 32 x 32 x3072 -> 32 x 32 x1024 57.982 BFLOPs
29 conv 100 1 x 1 / 1 32 x 32 x1024 -> 32 x 32 x 100 0.210 BFLOPs
30 detection
mask_scale: Using default '1.000000'
Loading weights from /Users/mac/Desktop/DOTA_YOLOv2-master/darknet19_448.conv.23...yolo-dota
Done!

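Poor test results & a question about adjusting lr

@ringringyi Hello! Thank you very much for your code. I ran into some problems during training:
1. I loaded yolo-dota.cfg as described. I only have a single GPU, an NVIDIA GT TITAN BLACK (compute capability 3.5), and I am training on the 1024-size DOTA train set, so I set:
batch=1
subdivisions=1
The learning rate settings in the network cfg are:
learning_rate=0.00005
max_batches = 2000000
policy=constant
darknet19_448.conv.23 was loaded for training.
After about 12 h of training, covering 109000 images, I stopped because the training output looked like this: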
Region Avg IOU: 0.618628, Class: 0.671129, Obj: 0.088273, No Obj: 0.003471, Avg Recall: 0.666667, count: 3
Region Avg IOU: 0.350919, Class: 0.315243, Obj: 0.000529, No Obj: 0.002533, Avg Recall: 0.500000, count: 2
Region Avg IOU: 0.507692, Class: 0.390687, Obj: 0.003199, No Obj: 0.001610, Avg Recall: 0.600000, count: 5
Region Avg IOU: 0.586214, Class: 0.206978, Obj: 0.030189, No Obj: 0.002246, Avg Recall: 1.000000, count: 3
Region Avg IOU: 0.744804, Class: 0.428495, Obj: 0.018022, No Obj: 0.003115, Avg Recall: 1.000000, count: 2
Region Avg IOU: 0.219129, Class: 0.057150, Obj: 0.000013, No Obj: 0.002348, Avg Recall: 0.000000, count: 2

109280: 169.920197, 45.703640 avg, 0.000050 rate, 0.414891 seconds, 109280 images
109281: 26.782028, 43.811478 avg, 0.000050 rate, 0.408204 seconds, 109281 images
109282: 20.342304, 41.464561 avg, 0.000050 rate, 0.419111 seconds, 109282 images
109283: 24.117226, 39.729828 avg, 0.000050 rate, 0.408410 seconds, 109283 images
109284: 26.526783, 38.409523 avg, 0.000050 rate, 0.413556 seconds, 109284 images
109285: 15.803896, 36.148960 avg, 0.000050 rate, 0.419731 seconds, 109285 images
109286: 50.919559, 37.626019 avg, 0.000050 rate, 0.411063 seconds, 109286 images
109287: 0.020407, 33.865456 avg, 0.000050 rate, 0.420056 seconds, 109287 images
109288: 29.913115, 33.470222 avg, 0.000050 rate, 0.413893 seconds, 109288 images
109289: 28.218138, 32.945015 avg, 0.000050 rate, 0.425859 seconds, 109289 images
109290: 0.749020, 29.725416 avg, 0.000050 rate, 0.407215 seconds, 109290 images
109291: 0.018226, 26.754698 avg, 0.000050 rate, 0.415694 seconds, 109291 images
109292: 12.341995, 25.313427 avg, 0.000050 rate, 0.411180 seconds, 109292 images

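I split the output into two parts. In the first part I removed the lines reporting nan (about 30% of the lines report nan); Class and Obj are still low after training on 100k images. In the second part, the loss and avg loss values are above 30.
2. After stopping, I used the weights from the 100k training iterations to test one 416-size image, and the result was poor. The image below shows the output with thresh=0.1.
[image]
With thresh>0.1 there is no output.
Is this situation normal, and how can I adjust it?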

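How to merge and display the final results

I obtained the result files for the split images, in the format imgname score xmin ymin xmax ymax.
Question 1: Which function in DOTA_devkit/ResultMerge.py should I use to perform the merge? Is it mergebyrec?
Question 2: How can I draw the final detections on the corresponding images? Is restored.showAnns meant for task1 (oriented bounding boxes)? How can I draw horizontal bounding box results?
Thanks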

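Typo in the anchors in the cfg file

In cfg/yolo-dota.cfg, the anchors setting appears to contain a typo:
anchors = 1.36,0.89, 1.99,2.14, 1.13,1.56, 2.66,2.92, 3.71, 4,31
The last value should be 4.31, not 4,31.
This problem causes an error during detection.
[image: image-20210921163426092]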
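
A typo of this kind can be caught before training with a small check like the one below. It is a sketch only: `parse_anchors` is a hypothetical helper, not part of darknet. The value `num=5` is an assumption inferred from the 100 filters of the last convolutional layer in the layer listing above, which fits 5 anchors times (15 classes + 5).

```python
def parse_anchors(line, num):
    """Parse an `anchors = ...` cfg line and check it holds num (w, h) pairs."""
    values = [v.strip() for v in line.split("=", 1)[1].split(",")]
    if len(values) != 2 * num:
        raise ValueError("expected %d values for %d anchors, found %d"
                         % (2 * num, num, len(values)))
    numbers = [float(v) for v in values]
    return list(zip(numbers[0::2], numbers[1::2]))

# The line as reported in this issue, and the corrected version.
bad = "anchors = 1.36,0.89, 1.99,2.14, 1.13,1.56, 2.66,2.92, 3.71, 4,31"
good = "anchors = 1.36,0.89, 1.99,2.14, 1.13,1.56, 2.66,2.92, 3.71, 4.31"

try:
    parse_anchors(bad, num=5)
    bad_detected = False
except ValueError:
    bad_detected = True  # "4,31" splits into two values, giving 11 instead of 10

anchors = parse_anchors(good, num=5)
```

Counting the values is enough here because a comma in place of the decimal point always adds one extra entry to the list.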
