For your questions:
Q1: What is the initial image resolution you crop from?
The resolutions of the original images in our training dataset range from about 1000x1000 to 2000x2000. When generating cropped samples, we first resize each image so that its short side is 512, and then we randomly crop a 512x512 patch along the long side. In this way, most of the portrait is preserved for learning human semantics.
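A minimal sketch of this cropping scheme (the function name and the nearest-neighbour resize are our own assumptions to keep the example dependency-light; it is not the authors' actual code):

```python
import random
import numpy as np

def random_crop_512(img: np.ndarray, crop: int = 512) -> np.ndarray:
    """Resize so the short side equals `crop`, then randomly crop a square patch.

    `img` is an H x W x C uint8 array. Nearest-neighbour resizing is used here
    only to keep the sketch dependency-free; bilinear would be used in practice.
    """
    h, w = img.shape[:2]
    scale = crop / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resize via index mapping.
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # Slide the crop window to a random position along the long side.
    top = random.randint(0, new_h - crop)
    left = random.randint(0, new_w - crop)
    return resized[top:top + crop, left:left + crop]
```

Because the short side is resized to exactly 512, the random offset only varies along the long side, so every crop still contains most of the portrait.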
Q2: Are you cropping randomly 512x512 during training or is there a strategy to generate 5 crops?
We generate all training samples (both cropped and background-replaced) before training, because generating them online would make training very slow. As mentioned in Q1, we crop the samples randomly.
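The background-replaced samples can be produced with standard alpha compositing over backgrounds from the OpenImages dataset. A hedged sketch (the helper name and the assumption of float arrays in [0, 1] are ours):

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend a foreground over a new background using the alpha matte.

    fg, bg: H x W x 3 float arrays in [0, 1]; alpha: H x W float array in [0, 1].
    Standard alpha compositing: out = alpha * fg + (1 - alpha) * bg.
    """
    a = alpha[..., None]  # broadcast the matte over the RGB channels
    return a * fg + (1.0 - a) * bg
```

Pixels where the matte is 1 keep the foreground, pixels where it is 0 take the new background, and fractional matte values (hair, motion blur) blend the two.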
Q3: And during inference you resize image to 512x512 instead of cropping, correct?
During inference, we resize the image so that the short side is 512. The length of the long side is arbitrary, but we slightly adjust both sides so that they are divisible by 32 (thanks to @Vozf for the correction). After inference, we resize the output back to the original size to calculate metrics.
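The inference-time size computation (short side to 512, both sides rounded to a multiple of 32) can be sketched as follows; the helper name and the round-to-nearest choice are our own assumptions:

```python
def inference_size(w: int, h: int, short: int = 512, stride: int = 32) -> tuple[int, int]:
    """Return (new_w, new_h) for inference: the short side becomes `short`,
    and both sides are rounded to the nearest multiple of `stride`, so the
    encoder's five downsampling stages (2**5 = 32) see compatible shapes."""
    scale = short / min(w, h)
    new_w = max(stride, round(w * scale / stride) * stride)
    new_h = max(stride, round(h * scale / stride) * stride)
    return new_w, new_h
```

For example, a 1920x1080 input maps to 896x512: the short side lands exactly on 512, and the long side is rounded from 910 to the nearest multiple of 32.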
Q4: Maybe you can share the toolkit you used for manual annotating?
We use Photoshop to annotate the labels. There are many tutorials on YouTube about how to do it. However, it takes a certain amount of practice to produce precise labels.
Q5: Also is there any intuition on annotating exactly 3k images?
No... Our training dataset is smaller than those of previous works only because the cost of annotation exceeded our expectations. Sorry.
I hope these explanations are helpful to you. :)
from modnet.
Thanks for quick response. The license issues are understandable. Looking forward to the release.
Hi! Thanks for your attention!
It is our private training dataset of human matting.
I am sorry that we cannot publish this training dataset due to permission issues.
However, we will soon release an online demo, pre-trained model, validation benchmark, and training code.
Sorry again that we cannot make our training dataset public.
I've dug into the training setup and have some more questions, if you don't mind.
Regarding the image preprocessing, in the paper you mentioned:
For each foreground, we generate 5 samples
by random cropping and 10 samples by compositing the
backgrounds from the OpenImage dataset
What is the initial image resolution you crop from?
Are you cropping a random 512x512 patch during training, or is there a strategy to generate the 5 crops? And during inference you resize the image to 512x512 instead of cropping, correct?
Also some questions about annotations.
Maybe you can share the toolkit you used for manual annotation? Matte annotation is tricky to do with classic segmentation tools.
Also, is there any intuition behind annotating exactly 3k images? Similar training datasets usually consist of considerably more images; for example, DUTS consists of 15k images and Supervisely contains about 6.5k.
Yeah, this is very useful, thanks again.
Regarding the 3rd question and the arbitrary size: doesn't MobileNet need the image size to be divisible by 2**5 = 32 (e.g., 256, 320, 512)? I believe it will throw a shape-mismatch exception if a non-divisible shape is fed as input (e.g., 257 or 319). Is this situation handled somehow, or am I wrong there?
@Vozf
Ohh... Yes, you are correct. The side lengths should be divisible by 32 (we will guarantee this when resizing). I forgot that just now.
Great, thanks for the clarification. Great paper. Looking forward to trying it myself.
You are welcome.