
pmg-progressive-multi-granularity-training's Introduction

Progressive Multi-Granularity Training

Code release for Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches (ECCV2020)

Requirement

python 3.6

PyTorch >= 1.3.1

torchvision >= 0.4.2

Training

  1. Download datasets for FGVC (e.g. CUB-200-2011, Stanford Cars, FGVC-Aircraft, etc.) and organize the structure as follows (a CUB-specific preparation sketch follows these steps):
dataset
├── train
│   ├── class_001
|   |      ├── 1.jpg
|   |      ├── 2.jpg
|   |      └── ...
│   ├── class_002
|   |      ├── 1.jpg
|   |      ├── 2.jpg
|   |      └── ...
│   └── ...
└── test
    ├── class_001
    |      ├── 1.jpg
    |      ├── 2.jpg
    |      └── ...
    ├── class_002
    |      ├── 1.jpg
    |      ├── 2.jpg
    |      └── ...
    └── ...
  2. Train from scratch with train.py.
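
For CUB-200-2011 in particular, the official archive ships with images.txt and train_test_split.txt rather than pre-split folders. A minimal sketch of converting it into the layout above (the output paths are assumptions, not part of this repo):

import os
import shutil

root = 'CUB_200_2011'   # unpacked official archive
out = 'dataset'         # target layout: dataset/{train,test}/<class>/

# images.txt: "<image_id> <relative_path>"; train_test_split.txt: "<image_id> <is_training>"
with open(os.path.join(root, 'images.txt')) as f:
    id_to_path = dict(line.split() for line in f if line.strip())
with open(os.path.join(root, 'train_test_split.txt')) as f:
    id_to_train = dict(line.split() for line in f if line.strip())

for img_id, rel_path in id_to_path.items():
    split = 'train' if id_to_train[img_id] == '1' else 'test'
    cls = rel_path.split('/')[0]   # e.g. 001.Black_footed_Albatross
    dst = os.path.join(out, split, cls)
    os.makedirs(dst, exist_ok=True)
    shutil.copy(os.path.join(root, 'images', rel_path), dst)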

Citation

Please cite our paper if you use PMG in your work.

@InProceedings{du2020fine,
  title={Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches},
  author={Du, Ruoyi and Chang, Dongliang and Bhunia, Ayan Kumar and Xie, Jiyang and Song, Yi-Zhe and Ma, Zhanyu and Guo, Jun},
  booktitle={European Conference on Computer Vision},
  year={2020}
}

Contact

Thanks for your attention! If you have any suggestions or questions, you can leave a message here or contact us directly:

pmg-progressive-multi-granularity-training's People

Contributors

ameenali, dongliangchang, ruoyidu


pmg-progressive-multi-granularity-training's Issues

Why should the size of patch be smaller than that of receptive field?

Hi,

Thanks for the impressive research.

I am a little confused about the choice of n in Sec. 3.3: "the size of the patches should be smaller than the receptive field of the corresponding stage, otherwise, the performance of the jigsaw puzzle generator will be reduced". Could you please explain in detail why this condition is required? Thanks a lot.

How do I use the trained model to classify a dataset? I ran into some problems during testing.

# -- coding: utf-8 --
# @time: 2019.03.15
# @ide: pycharm
# @author: lxztju
# @github: https://github.com/lxztju

import os

import torch
from PIL import Image
from tqdm import tqdm
from torchvision import transforms


def predict(trained_model, test_image_path):
    # Load the trained model
    model = torch.load(trained_model)
    model.eval()
    print('..... Finished loading model! ......')

    # Run the model on the GPU if one is available
    if torch.cuda.is_available():
        model.cuda()

    transform_test = transforms.Compose([
        transforms.Resize((550, 550)),  # transforms.Scale is deprecated
        transforms.CenterCrop(448),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    for image in tqdm(os.listdir(test_image_path)):
        img_path = os.path.join(test_image_path, image)
        img = Image.open(img_path).convert('RGB')
        img = transform_test(img).unsqueeze(0)  # transform regardless of CUDA
        if torch.cuda.is_available():
            img = img.cuda()
        with torch.no_grad():
            _, _, _, output_concat = model(img)
        _, predicted = torch.max(output_concat.data, 1)
        print(img_path, predicted.item())  # a str and a tensor cannot be concatenated


if __name__ == "__main__":
    trained_model = "./weight/model.pth"
    test_image_path = "/data/server77_data_b/fangzheng_crop"
    predict(trained_model, test_image_path)

RuntimeError: running_mean should contain 16384 elements not 1024

help me

Excuse me, how do I run testing?

Convergence time

Hello!
How many epochs does training need before it converges on the Stanford Cars dataset?

Question about accuracy


Thanks for your code. I have some questions about your experiments.
#########################

In train.py:

optimizer = optim.SGD([
    {'params': net.classifier_concat.parameters(), 'lr': 0.002},
    {'params': net.conv_block1.parameters(), 'lr': 0.002},
    {'params': net.classifier1.parameters(), 'lr': 0.002},
    {'params': net.conv_block2.parameters(), 'lr': 0.002},
    {'params': net.classifier2.parameters(), 'lr': 0.002},
    {'params': net.conv_block3.parameters(), 'lr': 0.002},
    {'params': net.classifier3.parameters(), 'lr': 0.002},
    # {'params': net.features.parameters(), 'lr': 0.0002}
], momentum=0.9, weight_decay=5e-4)


If I do not comment out {'params': net.features.parameters(), 'lr': 0.0002}, the loss stays at 5.3 and never drops.

If I comment it out, I cannot reach the accuracy reported in your paper; I only get 88.17742.
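
A note on that 5.3 figure (my own reasoning, not from the repo): for CUB's 200 classes, the cross-entropy of a uniform prediction is ln(200) ≈ 5.3, so a loss pinned there means the network is predicting at chance. A quick check:

import math
print(math.log(200))  # 5.2983... : the loss of a uniform guess over 200 classes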

#################################################################
True
==> Preparing data..
/home/wjz/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py:208: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
==> Building model..

Epoch: 0
Step: 0 | Loss1: 5.377 | Loss2: 5.23951 | Loss3: 5.25872 | Loss_concat: 10.77344 | Loss: 26.648 | Acc: 0.000% (0/16)
Step: 100 | Loss1: 5.210 | Loss2: 5.24563 | Loss3: 5.01386 | Loss_concat: 9.87350 | Loss: 25.343 | Acc: 5.384% (87/1616)
Step: 200 | Loss1: 5.092 | Loss2: 5.11479 | Loss3: 4.72915 | Loss_concat: 9.01229 | Loss: 23.948 | Acc: 12.687% (408/3216)
Step: 300 | Loss1: 4.982 | Loss2: 4.95620 | Loss3: 4.48933 | Loss_concat: 8.25784 | Loss: 22.685 | Acc: 19.788% (953/4816)
/home/wjz/code-lee/PMG-Progressive-Multi-Granularity-Training-master/utils.py:89: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
inputs, targets = Variable(inputs, volatile=True), Variable(targets)
Step: 1931 | Loss: 2.399 | Acc: 54.815% (3176/5794) |Combined Acc: 50.932% (2951/5794)
best acc is = 50.93200

Epoch: 1
Step: 0 | Loss1: 4.278 | Loss2: 3.97236 | Loss3: 3.05924 | Loss_concat: 4.15896 | Loss: 15.468 | Acc: 68.750% (11/16)
Step: 100 | Loss1: 4.303 | Loss2: 3.97899 | Loss3: 3.27661 | Loss_concat: 4.43286 | Loss: 15.992 | Acc: 62.933% (1017/1616)
Step: 200 | Loss1: 4.248 | Loss2: 3.86615 | Loss3: 3.17743 | Loss_concat: 4.16216 | Loss: 15.454 | Acc: 64.272% (2067/3216)
Step: 300 | Loss1: 4.163 | Loss2: 3.73736 | Loss3: 3.05000 | Loss_concat: 3.86442 | Loss: 14.815 | Acc: 66.549% (3205/4816)
Step: 1931 | Loss: 1.305 | Acc: 72.541% (4203/5794) |Combined Acc: 70.849% (4105/5794)
best acc is = 70.84915

Epoch: 2
Step: 0 | Loss1: 4.154 | Loss2: 3.51961 | Loss3: 2.77444 | Loss_concat: 2.85724 | Loss: 13.305 | Acc: 81.250% (13/16)
Step: 100 | Loss1: 3.637 | Loss2: 2.99386 | Loss3: 2.37375 | Loss_concat: 2.24752 | Loss: 11.252 | Acc: 81.869% (1323/1616)
Step: 200 | Loss1: 3.597 | Loss2: 2.90487 | Loss3: 2.29093 | Loss_concat: 2.14175 | Loss: 10.935 | Acc: 82.027% (2638/3216)
Step: 300 | Loss1: 3.556 | Loss2: 2.83164 | Loss3: 2.22867 | Loss_concat: 2.09540 | Loss: 10.712 | Acc: 81.561% (3928/4816)
Step: 1931 | Loss: 0.941 | Acc: 78.719% (4561/5794) |Combined Acc: 77.977% (4518/5794)
best acc is = 77.97722

Epoch: 3
Step: 0 | Loss1: 3.134 | Loss2: 2.36535 | Loss3: 1.68741 | Loss_concat: 1.25920 | Loss: 8.446 | Acc: 87.500% (14/16)
Step: 100 | Loss1: 3.093 | Loss2: 2.23144 | Loss3: 1.74485 | Loss_concat: 1.31555 | Loss: 8.385 | Acc: 89.975% (1454/1616)
Step: 200 | Loss1: 3.063 | Loss2: 2.18541 | Loss3: 1.72284 | Loss_concat: 1.31887 | Loss: 8.290 | Acc: 88.806% (2856/3216)
Step: 300 | Loss1: 3.015 | Loss2: 2.13142 | Loss3: 1.69159 | Loss_concat: 1.31001 | Loss: 8.148 | Acc: 88.538% (4264/4816)
Step: 1931 | Loss: 0.830 | Acc: 80.238% (4649/5794) |Combined Acc: 80.014% (4636/5794)
best acc is = 80.01381

Epoch: 4
Step: 0 | Loss1: 2.775 | Loss2: 1.91765 | Loss3: 1.20096 | Loss_concat: 0.71040 | Loss: 6.604 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 2.620 | Loss2: 1.68805 | Loss3: 1.32646 | Loss_concat: 0.83745 | Loss: 6.472 | Acc: 93.255% (1507/1616)
Step: 200 | Loss1: 2.601 | Loss2: 1.68000 | Loss3: 1.36257 | Loss_concat: 0.88198 | Loss: 6.525 | Acc: 92.382% (2971/3216)
Step: 300 | Loss1: 2.585 | Loss2: 1.66417 | Loss3: 1.35662 | Loss_concat: 0.88429 | Loss: 6.490 | Acc: 92.587% (4459/4816)
Step: 1931 | Loss: 0.772 | Acc: 80.980% (4692/5794) |Combined Acc: 82.361% (4772/5794)
best acc is = 82.36106

Epoch: 5
Step: 0 | Loss1: 2.307 | Loss2: 1.32481 | Loss3: 1.04585 | Loss_concat: 0.54067 | Loss: 5.219 | Acc: 93.750% (15/16)
Step: 100 | Loss1: 2.272 | Loss2: 1.38220 | Loss3: 1.13268 | Loss_concat: 0.57141 | Loss: 5.358 | Acc: 96.782% (1564/1616)
Step: 200 | Loss1: 2.240 | Loss2: 1.36626 | Loss3: 1.13537 | Loss_concat: 0.58879 | Loss: 5.331 | Acc: 95.771% (3080/3216)
Step: 300 | Loss1: 2.196 | Loss2: 1.33614 | Loss3: 1.10822 | Loss_concat: 0.58265 | Loss: 5.223 | Acc: 95.909% (4619/4816)
Step: 1931 | Loss: 0.710 | Acc: 82.741% (4794/5794) |Combined Acc: 83.155% (4818/5794)
best acc is = 83.15499

Epoch: 6
Step: 0 | Loss1: 1.868 | Loss2: 1.36466 | Loss3: 1.12972 | Loss_concat: 0.45856 | Loss: 4.821 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 1.914 | Loss2: 1.08908 | Loss3: 0.92351 | Loss_concat: 0.38538 | Loss: 4.312 | Acc: 98.391% (1590/1616)
Step: 200 | Loss1: 1.866 | Loss2: 1.08991 | Loss3: 0.93069 | Loss_concat: 0.39588 | Loss: 4.283 | Acc: 98.290% (3161/3216)
Step: 300 | Loss1: 1.858 | Loss2: 1.07807 | Loss3: 0.93373 | Loss_concat: 0.40798 | Loss: 4.278 | Acc: 98.131% (4726/4816)
Step: 1931 | Loss: 0.673 | Acc: 83.051% (4812/5794) |Combined Acc: 84.225% (4880/5794)
best acc is = 84.22506

Epoch: 7
Step: 0 | Loss1: 1.646 | Loss2: 0.73913 | Loss3: 0.72481 | Loss_concat: 0.13935 | Loss: 3.249 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 1.667 | Loss2: 0.96090 | Loss3: 0.82415 | Loss_concat: 0.29849 | Loss: 3.751 | Acc: 98.886% (1598/1616)
Step: 200 | Loss1: 1.647 | Loss2: 0.94296 | Loss3: 0.82795 | Loss_concat: 0.30366 | Loss: 3.721 | Acc: 98.974% (3183/3216)
Step: 300 | Loss1: 1.647 | Loss2: 0.93813 | Loss3: 0.82634 | Loss_concat: 0.31724 | Loss: 3.728 | Acc: 98.816% (4759/4816)
Step: 1931 | Loss: 0.686 | Acc: 82.775% (4796/5794) |Combined Acc: 84.605% (4902/5794)
best acc is = 84.60476

Epoch: 8
Step: 0 | Loss1: 1.551 | Loss2: 0.71522 | Loss3: 0.66501 | Loss_concat: 0.37301 | Loss: 3.304 | Acc: 93.750% (15/16)
Step: 100 | Loss1: 1.434 | Loss2: 0.82763 | Loss3: 0.73410 | Loss_concat: 0.23517 | Loss: 3.231 | Acc: 99.134% (1602/1616)
Step: 200 | Loss1: 1.449 | Loss2: 0.82253 | Loss3: 0.75179 | Loss_concat: 0.23583 | Loss: 3.259 | Acc: 99.223% (3191/3216)
Step: 300 | Loss1: 1.455 | Loss2: 0.82359 | Loss3: 0.75617 | Loss_concat: 0.24143 | Loss: 3.276 | Acc: 99.232% (4779/4816)
Step: 1931 | Loss: 0.641 | Acc: 83.621% (4845/5794) |Combined Acc: 85.261% (4940/5794)
best acc is = 85.26061

Epoch: 9
Step: 0 | Loss1: 1.078 | Loss2: 0.53219 | Loss3: 0.55669 | Loss_concat: 0.15284 | Loss: 2.319 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 1.267 | Loss2: 0.68631 | Loss3: 0.62334 | Loss_concat: 0.16206 | Loss: 2.739 | Acc: 99.505% (1608/1616)
Step: 200 | Loss1: 1.259 | Loss2: 0.68253 | Loss3: 0.63172 | Loss_concat: 0.17208 | Loss: 2.745 | Acc: 99.440% (3198/3216)
Step: 300 | Loss1: 1.269 | Loss2: 0.69679 | Loss3: 0.64298 | Loss_concat: 0.17706 | Loss: 2.786 | Acc: 99.522% (4793/4816)
Step: 1931 | Loss: 0.654 | Acc: 83.707% (4850/5794) |Combined Acc: 84.484% (4895/5794)
best acc is = 85.26061

Epoch: 10
Step: 0 | Loss1: 0.786 | Loss2: 0.55738 | Loss3: 0.49358 | Loss_concat: 0.12876 | Loss: 1.966 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 1.149 | Loss2: 0.64301 | Loss3: 0.59554 | Loss_concat: 0.15040 | Loss: 2.538 | Acc: 99.691% (1611/1616)
Step: 200 | Loss1: 1.144 | Loss2: 0.63301 | Loss3: 0.59002 | Loss_concat: 0.15469 | Loss: 2.522 | Acc: 99.720% (3207/3216)
Step: 300 | Loss1: 1.146 | Loss2: 0.63196 | Loss3: 0.59039 | Loss_concat: 0.15597 | Loss: 2.524 | Acc: 99.751% (4804/4816)
Step: 1931 | Loss: 0.625 | Acc: 84.225% (4880/5794) |Combined Acc: 85.727% (4967/5794)
best acc is = 85.72661
........................
Epoch: 198
Step: 0 | Loss1: 0.088 | Loss2: 0.05407 | Loss3: 0.09423 | Loss_concat: 0.05326 | Loss: 0.289 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 0.060 | Loss2: 0.05452 | Loss3: 0.05455 | Loss_concat: 0.04034 | Loss: 0.209 | Acc: 100.000% (1616/1616)
Step: 200 | Loss1: 0.060 | Loss2: 0.05510 | Loss3: 0.05459 | Loss_concat: 0.04000 | Loss: 0.210 | Acc: 100.000% (3216/3216)
Step: 300 | Loss1: 0.059 | Loss2: 0.05373 | Loss3: 0.05290 | Loss_concat: 0.03933 | Loss: 0.205 | Acc: 100.000% (4816/4816)
Step: 1931 | Loss: 0.707 | Acc: 87.677% (5080/5794) |Combined Acc: 88.091% (5104/5794)
best acc is = 88.17742

Epoch: 199
Step: 0 | Loss1: 0.056 | Loss2: 0.04339 | Loss3: 0.04779 | Loss_concat: 0.03247 | Loss: 0.180 | Acc: 100.000% (16/16)
Step: 100 | Loss1: 0.059 | Loss2: 0.05347 | Loss3: 0.05479 | Loss_concat: 0.04027 | Loss: 0.207 | Acc: 100.000% (1616/1616)
Step: 200 | Loss1: 0.059 | Loss2: 0.05428 | Loss3: 0.05340 | Loss_concat: 0.04029 | Loss: 0.207 | Acc: 100.000% (3216/3216)
Step: 300 | Loss1: 0.058 | Loss2: 0.05420 | Loss3: 0.05406 | Loss_concat: 0.04013 | Loss: 0.207 | Acc: 100.000% (4816/4816)
Step: 1931 | Loss: 0.709 | Acc: 87.780% (5086/5794) |Combined Acc: 87.936% (5095/5794)
best acc is = 88.17742

About"./bird/train"

The original CUB-200-2011 dataset does not come pre-split into /bird/train; how did you split it into train and test?

worse scores

Dear author:
I got worse scores on my machine with a pretrained ResNet-50 backbone on CUB (84.06) after 20 epochs; another run was even worse (81.42). At the final stage the training loss is near zero and the training accuracy is 100%.

Can you help me solve this problem? Thanks.

About Visualization

Thanks for your meaningful work! I ran into some trouble when trying to visualize the last stage's convolution layer with Grad-CAM (screenshot omitted). If possible, could you share a copy of the visualization code with me?

email: [email protected]

about "progressively add new stages"

(model.py, line 60)

def forward(self, x):
    xf1, xf2, xf3, xf4, xf5 = self.features(x)

    xl1 = self.conv_block1(xf3)
    xl2 = self.conv_block2(xf4)
    xl3 = self.conv_block3(xf5)
    ...

In this code, the model builds and trains all stages of the baseline model simultaneously, not, as the paper says, "train the low stage first and then progressively add new stages for training".
Did I get it wrong? Please help me understand better.
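
For context, a paraphrased sketch of one training iteration as train.py appears to implement it (jigsaw_generator, netp, and CELoss are names from the repo; the exact loss weights and ordering are assumptions). All stage modules exist from the start; "progressive" refers to optimizing the stages one after another within each batch, from the finest jigsaw granularity up to the concatenated output:

# One batch: a separate forward/backward/step per stage, shallow to deep
for n, granularity in ((1, 8), (2, 4), (3, 2)):
    inputs_n = jigsaw_generator(inputs, granularity)  # shuffle n x n patches
    optimizer.zero_grad()
    outputs = netp(inputs_n)  # returns (out1, out2, out3, out_concat)
    loss = CELoss(outputs[n - 1], targets)
    loss.backward()
    optimizer.step()

# Final step on the original (unshuffled) image, supervising the
# concatenated classifier
optimizer.zero_grad()
_, _, _, output_concat = netp(inputs)
loss_concat = CELoss(output_concat, targets) * 2
loss_concat.backward()
optimizer.step()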

About t-SNE visualization

Thank you for your excellent work!
After reading your paper, I have a few questions about the t-SNE visualization of the CUB bird features in your paper (figure omitted):

1. I also tried a t-SNE visualization of the features output by the last convolutional layer on CUB, but when each point corresponds to one test sample's embedding, the points look rather sparse. In your visualization, does each point likewise correspond to the last-convolutional-layer feature of one test sample passed through the backbone?
2. I look forward to your sharing the t-SNE visualization code! Could you also share some t-SNE plotting tips, for example how to obtain cluster distributions like those in your paper?

Thank you very much! Best wishes for your research!
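
For anyone attempting the same plot, a minimal t-SNE sketch (features and labels are hypothetical arrays of pooled last-conv features and class ids; this is not the authors' code):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# features: (N, D) array, one pooled feature vector per test image
# labels:   (N,) array of class ids
emb = TSNE(n_components=2, perplexity=30, init='pca').fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap='tab20')
plt.savefig('tsne.png', dpi=200)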

Displaying prediction results

How can I display the recognition results, for example drawing a box around the object and writing the label on it, as in object detection?

transforms.Normalize

Why use (x - 0.5) / 0.5 instead of the ImageNet statistics [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]?
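
For reference, the two normalizations being compared here (which one works better with this model is an empirical question):

from torchvision import transforms

# Normalization used in this repo: maps inputs from [0, 1] to [-1, 1]
repo_norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

# Common alternative when fine-tuning ImageNet-pretrained backbones:
# the ImageNet per-channel mean and std
imagenet_norm = transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])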

advisory

Hello, author. Have you tried other networks, such as Inception-v3, as the backbone?

A question about GPUs

GPU

device = torch.device("cuda:0,1")
net.to(device)

cudnn.benchmark = True

Are you sure this should be net here, rather than netp? And why does the "cuda:0,1" syntax not work in PyTorch 1.7?
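
For what it's worth, torch.device takes a single device, which is why "cuda:0,1" is rejected; multi-GPU training goes through DataParallel, which is presumably where the repo's netp comes from. A minimal sketch:

import torch

device = torch.device("cuda:0")  # one device per torch.device, not a list
net.to(device)
# Replicate the model across GPUs 0 and 1 for data-parallel training
netp = torch.nn.DataParallel(net, device_ids=[0, 1])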

accuracy

Hello:
While reproducing your results, I did not achieve the numbers in your paper. Are there any training tricks?

net setting

How can I get the same scores as in your paper? Can you provide the hyperparameters and network settings? Thanks.

About back propagation

I found training the model very time-consuming. Why not add all the losses together and then do gradient descent only once per iteration? Is there any difference? Thank you.
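
A sketch of the variant this issue proposes, for concreteness (jigsaw_generator, netp, and CELoss are repo names, details assumed; whether the summed-loss schedule matches the per-stage one in accuracy is exactly the open question). The behavioural difference is that the repo updates the weights after each stage, so later stages see already-updated parameters, while this variant applies one update per batch:

optimizer.zero_grad()
loss_total = 0
for head, granularity in ((0, 8), (1, 4), (2, 2)):
    outputs = netp(jigsaw_generator(inputs, granularity))
    loss_total = loss_total + CELoss(outputs[head], targets)
_, _, _, output_concat = netp(inputs)
loss_total = loss_total + CELoss(output_concat, targets) * 2
loss_total.backward()  # a single backward pass for the summed loss
optimizer.step()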

Result of inference confidence

Thank you for the awesome repo.
I want to get the confidence value of the output, so I printed output_concat.data:

output_concat.data tensor([[ 0.6667,  2.4874, -0.2616, -0.8905, -1.0161, -1.3324,  0.9716, -3.1546,  2.6772],
                           [-1.0338, -2.5260,  0.4580,  0.6618,  1.2473,  1.4716, -0.4507,  2.6250, -2.7728]], device='cuda:0')

Some of the confidence values are negative, and their sum is not 1.
Awaiting your reply.
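
A possible resolution, sketched: output_concat holds raw logits rather than probabilities, so applying softmax yields non-negative values that sum to 1 per sample:

import torch.nn.functional as F

probs = F.softmax(output_concat, dim=1)   # per-sample probability distribution
confidence, predicted = probs.max(dim=1)  # top-1 confidence and class index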

about parameters

How many parameters does this trained model have, and how large is it in megabytes?
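
One way to measure this directly on an instantiated model (net as in train.py; a minimal sketch):

# Count parameters and estimate the weight size in megabytes
num_params = sum(p.numel() for p in net.parameters())
size_mb = sum(p.numel() * p.element_size() for p in net.parameters()) / 1024 ** 2
print(f'{num_params / 1e6:.1f}M parameters, ~{size_mb:.1f} MB of weights')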
