
Comments (8)

ZHKKKe commented on June 8, 2024

For your questions:

Q1: What is the initial image resolution you crop from?
The original images in our training dataset range in resolution from about 1000x1000 to 2000x2000. When generating cropped samples, we first resize the image so that the short side is 512, and then randomly crop along the long side to 512, giving a 512x512 sample. In this way, most of each portrait is included, which helps the model learn human semantics.
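
A minimal sketch of the resize-then-crop geometry described in this answer, using only the standard library. The function names `resize_short_side` and `random_square_crop_box`, and the rounding of the long side to the nearest pixel, are assumptions for illustration, not the authors' code. The returned size and box would be passed to whatever image library does the actual resizing and cropping.

```python
import random


def resize_short_side(width, height, target=512):
    """Return (new_width, new_height) with the short side equal to target.

    The long side is scaled by the same factor and rounded to the nearest
    pixel (the rounding rule is an assumption).
    """
    scale = target / min(width, height)
    if width <= height:
        return target, max(target, round(height * scale))
    return max(target, round(width * scale)), target


def random_square_crop_box(width, height, size=512, rng=random):
    """Return (left, top, right, bottom) of a random size x size crop."""
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - size)  # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


# Example: a 1500x2000 portrait image.
new_w, new_h = resize_short_side(1500, 2000)
box = random_square_crop_box(new_w, new_h, rng=random.Random(0))
```

Because the short side already equals 512 after resizing, the crop offset only varies along the long side.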

Q2: Are you cropping randomly 512x512 during training or is there a strategy to generate 5 crops?
We generate all training samples (both cropped and background-replaced) before training, because generating them online would make training very slow. As mentioned in Q1, we crop the samples randomly.
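
A rough sketch of how background replacement could be done offline. It assumes the standard alpha-compositing rule used in matting, `alpha * foreground + (1 - alpha) * background`; the thread does not confirm the exact procedure the authors used. The helper names and the nested-list image representation are hypothetical and chosen only to keep the example dependency-free.

```python
def composite_pixel(fg, bg, alpha):
    """Blend one RGB pixel: alpha * fg + (1 - alpha) * bg, alpha in [0, 1]."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite(fg_img, bg_img, matte):
    """Composite row-major images given as lists of rows of RGB tuples.

    `matte` has the same height and width and holds one alpha per pixel.
    """
    return [
        [composite_pixel(f, b, a) for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(fg_img, bg_img, matte)
    ]


# Tiny 1x2 example: the first pixel is pure foreground, the second is half blended.
fg = [[(255, 0, 0), (255, 0, 0)]]
bg = [[(0, 0, 255), (0, 0, 255)]]
matte = [[1.0, 0.5]]
sample = composite(fg, bg, matte)
```

In practice the same operation would be done on whole arrays with an image or array library, and the results written to disk once so that training only has to read them.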

Q3: And during inference you resize the image to 512x512 instead of cropping, correct?
During inference, we resize the input image so that the short side is 512. The long side is otherwise arbitrary, but we slightly adjust it so that it is divisible by 32 (thanks to @Vozf for the correction). After inference, we resize the output back to the original image size to calculate metrics.
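
A small sketch of the inference-time size computation described in this answer. The answer only says the long side is "slightly adjusted" to be divisible by 32; rounding to the nearest multiple is an assumption here, as is the function name `inference_size`.

```python
def inference_size(width, height, ref=512, multiple=32):
    """Return (new_width, new_height) for inference.

    The short side becomes `ref`; the long side is scaled proportionally and
    then rounded to the nearest multiple of `multiple` (assumed rule).
    """
    if ref % multiple != 0:
        raise ValueError("ref must itself be a multiple of `multiple`")
    scale = ref / min(width, height)
    long_side = max(width, height) * scale
    long_side = max(ref, int(long_side / multiple + 0.5) * multiple)
    return (ref, long_side) if width <= height else (long_side, ref)


original = (1500, 2000)
resized = inference_size(*original)
# After inference, the predicted matte would be resized back to `original`
# before computing metrics.
```

Rounding down instead of to the nearest multiple would work equally well for the shape constraint; it only changes the aspect-ratio distortion by a few pixels.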

Q4: Maybe you can share the toolkit you used for manual annotating?
We use Photoshop to annotate labels. There are many tutorials on YouTube showing how to do it. However, precise labeling takes a certain amount of practice.

Q5: Also is there any intuition on annotating exactly 3k images?
No... Our training dataset is smaller than those used in previous work only because the cost of annotation exceeded our expectations. Sorry.

I hope these explanations are helpful to you. :)

from modnet.

Vozf commented on June 8, 2024

Thanks for the quick response. The license issues are understandable. Looking forward to the release.


ZHKKKe commented on June 8, 2024

Hi! Thanks for your attention!

It is our private training dataset for human matting.
I am sorry that we cannot publish this training dataset due to some permissions issues.
However, we will soon release an online demo, pre-trained model, validation benchmark, and training code.
Sorry again that we cannot make our training dataset public.


Vozf commented on June 8, 2024

I've dug into the training setup and have some more questions, if you don't mind.
Regarding the image preprocessing, in the paper you mention:

"For each foreground, we generate 5 samples by random cropping and 10 samples by compositing the backgrounds from the OpenImage dataset"

What is the initial image resolution you crop from?
Are you cropping randomly 512x512 during training or is there a strategy to generate 5 crops? And during inference you resize the image to 512x512 instead of cropping, correct?

Also some questions about annotations.
Maybe you can share the toolkit you used for manual annotating? Matte annotation is a bit tricky to do with classic segmentation tools.
Also is there any intuition on annotating exactly 3k images? Similar training datasets usually contain considerably more images; for example, DUTS consists of 15k images and Supervisely contains about 6.5k.


Vozf commented on June 8, 2024

Yeah, this is very useful, thanks again.
Regarding the 3rd question and the arbitrary size: doesn't MobileNet need the image size to be divisible by 2**5 (256, 320, 512, etc.)? I believe it will throw a shape mismatch exception if a non-divisible size (257, 319, etc.) is fed as input. Is this situation handled somehow, or am I wrong there?
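
To illustrate the concern, here is a toy size calculation rather than the actual network. It assumes five 3x3, stride-2, padding-1 downsampling convolutions (typical of a MobileNet-style encoder) and a decoder that upsamples by exactly 2x before merging with the matching encoder feature. The real MODNet layers may differ in detail; the function names are hypothetical.

```python
def conv_stride2_out(n):
    """Output length of a 3x3, stride-2, padding-1 convolution on length n."""
    return (n + 2 * 1 - 3) // 2 + 1


def skip_shapes_match(n, num_downsamples=5):
    """Check whether 2x upsampling restores every encoder resolution."""
    sizes = [n]
    for _ in range(num_downsamples):
        sizes.append(conv_stride2_out(sizes[-1]))
    # Walk back up: upsampling the deeper feature by 2 must equal the
    # size of the shallower feature it is merged with.
    return all(2 * deep == shallow for shallow, deep in zip(sizes, sizes[1:]))


for n in (256, 320, 512, 257, 319):
    print(n, skip_shapes_match(n))
```

Under these assumptions the sizes 256, 320 and 512 pass, while 257 and 319 fail at the first skip connection, because an odd length rounds up when halved and cannot be recovered by doubling.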


ZHKKKe commented on June 8, 2024

@Vozf
Ohh... Yes, you are correct. The side lengths should be divisible by 32 (we guarantee this when resizing). I forgot about it just now.


Vozf commented on June 8, 2024

Great, thanks for the clarification. Great paper. Looking forward to trying it myself.


ZHKKKe commented on June 8, 2024

You are welcome.

