For your questions:
Q1: What is the initial image resolution you crop from?
The resolutions of the original images in our training dataset range from about 1000x1000 to 2000x2000. When generating cropped samples, we first resize each image so that its short side is 512, and then we randomly crop a 512x512 patch along the long side. In this way, most of the portrait is preserved for learning human semantics.
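A minimal sketch of this cropping scheme (the function name and the nearest-neighbour resize are our own assumptions to keep the example dependency-light; it is not the authors' actual code):

```python
import random
import numpy as np

def random_crop_512(img: np.ndarray, crop: int = 512) -> np.ndarray:
    """Resize so the short side equals `crop`, then randomly crop a square patch.

    `img` is an H x W x C uint8 array. Nearest-neighbour resizing is used here
    only to keep the sketch dependency-free; bilinear would be used in practice.
    """
    h, w = img.shape[:2]
    scale = crop / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resize via index mapping.
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # Slide the crop window to a random position along the long side.
    top = random.randint(0, new_h - crop)
    left = random.randint(0, new_w - crop)
    return resized[top:top + crop, left:left + crop]
```

Because the short side is resized to exactly 512, the random offset only varies along the long side, so every crop still contains most of the portrait.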
Q2: Are you cropping randomly 512x512 during training or is there a strategy to generate 5 crops?
We generate all training samples (both cropped and background-replaced) before training, because generating them online would make training very slow. As mentioned in Q1, we crop the samples randomly.
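The background-replaced samples can be produced with standard alpha compositing over backgrounds from the OpenImages dataset. A hedged sketch (the helper name and the assumption of float arrays in [0, 1] are ours):

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend a foreground over a new background using the alpha matte.

    fg, bg: H x W x 3 float arrays in [0, 1]; alpha: H x W float array in [0, 1].
    Standard alpha compositing: out = alpha * fg + (1 - alpha) * bg.
    """
    a = alpha[..., None]  # broadcast the matte over the RGB channels
    return a * fg + (1.0 - a) * bg
```

Pixels where the matte is 1 keep the foreground, pixels where it is 0 take the new background, and fractional matte values (hair, motion blur) blend the two.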
Q3: And during inference you resize image to 512x512 instead of cropping, correct?
During inference, we resize the image so that the short side is 512. The length of the long side is arbitrary, but we slightly adjust both sides so that they are divisible by 32 (thanks to @Vozf for the correction). After inference, we resize the output back to the original size to calculate metrics.
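The inference-time size computation (short side to 512, both sides rounded to a multiple of 32) can be sketched as follows; the helper name and the round-to-nearest choice are our own assumptions:

```python
def inference_size(w: int, h: int, short: int = 512, stride: int = 32) -> tuple[int, int]:
    """Return (new_w, new_h) for inference: the short side becomes `short`,
    and both sides are rounded to the nearest multiple of `stride`, so the
    encoder's five downsampling stages (2**5 = 32) see compatible shapes."""
    scale = short / min(w, h)
    new_w = max(stride, round(w * scale / stride) * stride)
    new_h = max(stride, round(h * scale / stride) * stride)
    return new_w, new_h
```

For example, a 1920x1080 input maps to 896x512: the short side lands exactly on 512, and the long side is rounded from 910 to the nearest multiple of 32.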
Q4: Maybe you can share the toolkit you used for manual annotating?
We use Photoshop to annotate the labels. There are many tutorials on YouTube about how to do it. However, it takes a certain amount of practice to produce precise labels.
Q5: Also is there any intuition on annotating exactly 3k images?
No... Our training dataset is smaller than those of previous works only because the cost of annotation exceeded our expectations. Sorry.
I hope these explanations are helpful to you. :)
from modnet.
Thanks for quick response. The license issues are understandable. Looking forward to the release.
Hi! Thanks for your attention!
It is our private training dataset of human matting.
I am sorry that we cannot publish this training dataset due to permission issues.
However, we will soon release an online demo, pre-trained model, validation benchmark, and training code.
Sorry again that we cannot make our training dataset public.
I've dug into the training setup and have some more questions, if you don't mind.
Regarding the image preprocessing, in the paper you mentioned:
For each foreground, we generate 5 samples
by random cropping and 10 samples by compositing the
backgrounds from the OpenImage dataset
What is the initial image resolution you crop from?
Are you cropping a random 512x512 patch during training, or is there a strategy to generate the 5 crops? And during inference you resize the image to 512x512 instead of cropping, correct?
Also some questions about annotations.
Maybe you can share the toolkit you used for manual annotation? Matte annotation is tricky to do with classic segmentation tools.
Also, is there any intuition behind annotating exactly 3k images? Similar training datasets usually consist of considerably more images; for example, DUTS consists of 15k images and Supervisely contains about 6.5k.
Yeah, this is very useful, thanks again.
Regarding the 3rd question and the arbitrary size: doesn't MobileNet need the image size to be divisible by 2**5 = 32 (e.g., 256, 320, 512)? I believe it will throw a shape-mismatch exception if a non-divisible shape is fed as input (e.g., 257 or 319). Is this situation handled somehow, or am I wrong there?
@Vozf
Ohh... Yes, you are correct. The side lengths should be divisible by 32 (we will guarantee this when resizing). I forgot that just now.
Great, thanks for the clarification. Great paper. Looking forward to trying it myself.
You are welcome.