ZHKKKe / MODNet
A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]
License: Apache License 2.0
I'm talking about this website: https://gradio.app/g/modnet
It would be much better if users could download a PNG file with the background removed entirely.
Is this possible in the future? Will you update it later?
The Google Drive link for the pre-trained models is currently hard to open without a VPN. Could you mirror it on Baidu Netdisk or host the files directly on GitHub?
I want to know how you generate the label for the detail branch.
Based on Figure 2 in your paper, my code is as follows:

import cv2
import numpy as np

image = cv2.imread("/mnt/SegmentData/labels/00000.jpg")
kernel = np.ones((80, 80), np.uint8)
dil = cv2.dilate(image, kernel, iterations=1)
ero = cv2.erode(image, kernel, iterations=1)
dil_ero = dil - ero
out = dil_ero * image

But I cannot get a result similar to the details_dp you show in Figure 2.
It would be helpful if you could correct my code. Thanks.
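For what it is worth, here is how I would sketch the detail target from Figure 2 (my guess at the pipeline, not the authors' code; cv2 is replaced by a naive NumPy dilation so the snippet is self-contained): the unknown band is dilate(matte) - erode(matte), and the matte values are kept only inside that band.

```python
import numpy as np

def binary_dilate(m, k):
    # naive square-kernel dilation: max over all shifts within the kernel
    out = m.copy()
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out = np.maximum(out, np.roll(np.roll(m, dy, axis=0), dx, axis=1))
    return out

def detail_label(matte, k=2):
    # hypothetical reconstruction of the Fig. 2 detail target:
    # band = dilate - erode around the matte boundary, then mask the matte with it
    fg = (matte > 0.5).astype(np.float32)
    ero = 1.0 - binary_dilate(1.0 - fg, k)   # erosion via the complement
    band = binary_dilate(fg, k) - ero
    return band * matte

matte = np.zeros((20, 20), dtype=np.float32)
matte[5:15, 5:15] = 1.0
out = detail_label(matte)
print(out[10, 10], out[5, 5], out[0, 0])  # 0.0 1.0 0.0
```

Note that, unlike the snippet above, this binarizes the matte first and multiplies by the band mask, so the interior of the foreground is zeroed out and only the boundary detail survives.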
Hello, I have a question about the compositional loss. Your paper says: "It measures the absolute difference between the input image I and the composited image obtained from αp, the ground truth foreground, and the ground truth background."
What do "ground truth foreground" and "ground truth background" refer to here? Do you use the trimap when computing this loss?
I'm new to matting, so I may have a lot of questions.
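For context, my understanding of that sentence (an illustration of the formula, not the authors' code) is that the predicted alpha is used to re-composite the image from the ground-truth foreground and background, and the loss is the L1 distance to the input:

```python
import numpy as np

def compositional_loss(image, alpha_p, fg_gt, bg_gt):
    # re-composite with the predicted alpha, then compare to the input image
    comp = alpha_p * fg_gt + (1.0 - alpha_p) * bg_gt
    return np.abs(image - comp).mean()

# with a perfect alpha, the composite reproduces the input and the loss is 0
fg = np.full((4, 4, 3), 0.9)
bg = np.full((4, 4, 3), 0.1)
alpha = np.zeros((4, 4, 1)); alpha[:2] = 1.0
image = alpha * fg + (1.0 - alpha) * bg
print(compositional_loss(image, alpha, fg, bg))  # 0.0
```

Under this reading, no trimap is needed for the loss itself, only the ground-truth foreground and background layers.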
The Gradio demo app you created does not appear to be working. It says "Launching Your Gradio Interface". "You will receive an email notification when the process is complete. You may close this page."
I don't know what this means. It also wasn't saying this yesterday.
When I tried to compile image_matting, here is the error that I got. Does anyone know how to fix it? I am kind of new to programming. Thanks.
Hello,
I just tried the open-source model. Hair is matted very precisely, but other regions often pick up extra content, especially when the background is slightly complex. Would adding human segmentation help? I happened to see mobilenetv2_human_seg.ckpt in the Google Drive link you shared. Is it for this purpose, and how is mobilenetv2_human_seg.ckpt used?
Thanks!
MODNet is an impressive work on image and video matting.
This work has a non-commercial license. Could you please provide a more permissive license, such as MIT or Apache, that allows commercial use? This would make it easier for the work to be used in open-source projects; non-commercially licensed work is not compatible with them. You can read more about this incompatibility here. I am unable to use this work in my open-source project because of the non-commercial license.
Thanks for sharing. While studying the project, I noticed that the pre-trained MODNet file has not been provided yet. Where can it be downloaded, or when will it be available? Thanks.
The README file says that this project is released under the CC-BY-NC-SA 4.0 license.
Does this affect the images produced with the network? Usually, the license of a tool does not affect the license of the output of the tool, except if specified in the license. For example, the license for Adobe's Deep Image Matting dataset explicitly states that models trained on their images can only be used for non-commercial purposes, so I just wanted to ask to make sure this is not the case here.
Hi, thanks for your great work!
I am confused: what is the difference between video inference and single-image inference, apart from where the pictures come from? Also, what is the difference between the two ckpt files, modnet_webcam_portrait_matting.ckpt and modnet_photographic_portrait_matting.ckpt? Thank you!
Thank you very much for your work. When will the code and pre-trained models be released? Looking forward to them!
Hello,
I'd like to ask: were the 400 video clips sampled as consecutive frames? Thanks.
I saw that you added trimap-based prediction results. How is the trimap fed into the prediction?
The Colab notebook uses the user's webcam video. How can we provide a pre-recorded video file for inference?
Sorry to bother you again.
During training, hd_loss stays around 0.06. Following your answer in another issue, I changed the loss function to:

num_mask = torch.sum(mask)
hd_loss = torch.sum((torch.abs(pred_detail - matte) + torch.abs(pred_detail.detach() - matte)) * mask) / num_mask

In subsequent training it still does not converge. Are there any other solutions?
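For reference, the normalization in that snippet, averaging the absolute error only over the region selected by the mask, can be written in NumPy as follows; this is just a paraphrase of the formula above, not the training code:

```python
import numpy as np

def masked_l1(pred, target, mask, eps=1e-8):
    # sum of absolute errors inside the mask, divided by the mask area
    return np.sum(np.abs(pred - target) * mask) / (np.sum(mask) + eps)

pred = np.array([0.0, 0.5, 1.0, 1.0])
target = np.array([0.0, 1.0, 1.0, 0.0])
mask = np.array([0.0, 1.0, 1.0, 0.0])  # only the middle two entries count
print(masked_l1(pred, target, mask))  # ~0.25
```

Because the denominator is the mask area rather than the image area, the loss value is insensitive to how large the boundary band is, which is worth keeping in mind when comparing loss magnitudes across datasets.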
There was an error loading this notebook. Ensure that the file is accessible and try again.
Error loading https://apis.google.com/js/client.js
https://drive.google.com/drive/?action=locate&id=1GANpbKT06aEFiW-Ssx0DQnnEADcXwQG6&authuser=0
Error loading https://apis.google.com/js/client.js
Error: Error loading https://apis.google.com/js/client.js
at HTMLScriptElement.h.onerror (https://colab.research.google.com/v2/external/gapi_loader.js:19:330)
Hi, I noticed that you use IBNorm in your code (combining IN and BN into one layer). What are the benefits of this? Thanks!
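In case it helps other readers, my reading of IBNorm (a NumPy sketch of the statistics only, without the learnable affine parameters; not a drop-in for the real layer) is that the first half of the channels is normalized with batch statistics and the second half with per-instance statistics:

```python
import numpy as np

def ib_norm(x, eps=1e-5):
    # x: (N, C, H, W); first C//2 channels use BatchNorm stats (over N, H, W),
    # the remaining channels use InstanceNorm stats (over H, W per sample)
    half = x.shape[1] // 2
    bn, inn = x[:, :half], x[:, half:]
    bn = (bn - bn.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(bn.var(axis=(0, 2, 3), keepdims=True) + eps)
    inn = (inn - inn.mean(axis=(2, 3), keepdims=True)) / np.sqrt(inn.var(axis=(2, 3), keepdims=True) + eps)
    return np.concatenate([bn, inn], axis=1)

x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
y = ib_norm(x)
print(y.shape)  # (2, 8, 4, 4)
```

The usual motivation for such IN/BN mixes is that the IN half discards per-image appearance (lighting, color cast) while the BN half keeps content statistics, but I cannot speak for the authors' exact reasoning here.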
Hi!
I'm not very experienced with ML yet and was wondering if you could point me toward a way to run inference with the trained model using TensorFlow / TF.js. What would I need to do?
Thanks so much for the paper and the model!
In the paper you said:
"For a fair comparison, we train all models on the same dataset, which contains nearly 3000 annotated foregrounds."
What is this dataset?
Thanks for the paper.
I'd like to ask: I could not find the pre-trained weight file mentioned above in Human-Segmentation-PyTorch. Is it UNet_MobileNetV2, or something else? Thanks.
How about creating a WeChat or QQ group?
Hi, thank you for open-sourcing such great work.
I tried the model; it works well at detecting people in almost every case, but there is always some noise on real input images. I tried it on an image with no person, and the model still masks a significant segment.
Is it possible to raise a confidence threshold, or apply some pre- or post-processing to improve the results?
Thank you.
Hello, your work is excellent!
But I want to know how your pre-trained model was trained, and how your loss functions are represented in your code.
Could you publish more details about the training process?
Can you contact me through email?
[email protected]
Hello, for image matting, was training done using the BGR or RGB format?
I got better results when testing images in BGR format (OpenCV's format), but when I checked your inference code it seems you use RGB (PIL's format), so I'm confused; maybe you used RGB in training.
Thanks.
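If it is useful to anyone hitting the same confusion, flipping the channel order before inference is a one-liner; this assumes the model was trained on RGB, which the inference code suggests but I cannot confirm:

```python
import numpy as np

# an OpenCV-style image is laid out as BGR along the last axis
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # blue channel

# reverse the channel axis to get RGB (equivalent to cv2.cvtColor(..., cv2.COLOR_BGR2RGB))
rgb = bgr[..., ::-1]
print(rgb[0, 0])  # [  0   0 255]
```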
How can I synthesize a new background?
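A standard way to do this (generic alpha compositing, not anything specific to MODNet) is to blend the original frame onto the new background using the predicted matte:

```python
import numpy as np

def composite(image, matte, background):
    # matte in [0, 1], shape (H, W, 1); broadcasts over the color axis
    return matte * image + (1.0 - matte) * background

img = np.full((2, 2, 3), 0.8)
bg = np.zeros((2, 2, 3))
matte = np.array([[[1.0], [0.0]], [[0.5], [1.0]]])
out = composite(img, matte, bg)
print(out[0, 0], out[0, 1])  # [0.8 0.8 0.8] [0. 0. 0.]
```

Pixels with matte 1 keep the original image, pixels with matte 0 take the new background, and fractional values (hair, boundaries) blend the two.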
This is great work and has some advantages compared to other models.
Is there a TorchScript-format model, as the BackgroundMatting project provides? That would make it much easier to run checks and performance benchmarks on other devices such as mobile; I think this project has wide applicability and huge potential.
@ZHKKKe, thanks for your work; waiting for future progress.
I tried torch.jit.save, but it is not working:
import torch
import torch.nn as nn
from src.models.modnet import MODNet

modnet = MODNet(backbone_pretrained=False)
modnet = nn.DataParallel(modnet)
modnet.load_state_dict(torch.load(args.ckpt_path, map_location=torch.device('cpu')))
modnet.eval()
scriptmod = torch.jit.script(modnet)
torch.jit.save(scriptmod, "modnet.pt")
Can you release your training code?
Hi,
Do you plan to release transfer-learning options to specify which layers should be trained?
Also, have you tried this network on objects other than portraits? If yes, did you manage to get comparable results?
Thanks for your great work. I'm very interested in the dataset you built. Can you tell me how you created it? Could you please open the dataset to us? Thanks a lot.
Hello, I have a few more questions about SOC training.
How many epochs are needed for SOC?
When reproducing it, I fine-tuned with the Adam optimizer at a learning rate of 1e-4 as in the paper, for 30 epochs. I found that the loss rises quickly during the first epoch and then gradually decreases, but not by much, while the IoU on the validation set keeps dropping.
I don't know which step went wrong and hope you can advise.
Thank you very much!
Hi,
Cool project. We would like to test whether we can port it to an ARM64 system with an embedded low-power 32 GB GPU (NVIDIA AGX Xavier 32GB), which can be used for both training and inference. A first test with the pretrained model from Dec. 2020 would give us an initial impression of how its performance compares with a desktop GPU.
Thanks for your sharing. Nice work~
Here is a question about the SOC in your paper.
The self-supervised stage is applied to datasets from a new domain, so are the new/target datasets the ones we will test on later?
Another question: when I try to train MODNet, the prediction of dp is just the boundary, which is not the same as in your paper.
How can I export the model to the ONNX format?
I tried the following with the Colab demo code, but it raised an error:
TypeError: forward() missing 1 required positional argument: 'inference'
import torch
import torch.nn as nn
from torch.autograd import Variable

model = MODNet(backbone_pretrained=False)
model = nn.DataParallel(model).cuda()
state_dict = torch.load(pretrained_ckpt)
model.load_state_dict(state_dict)
model.eval()
dummy_input = Variable(torch.randn(1, 3, 512, 512))
torch.onnx.export(model.module, dummy_input, '/content/MODNet/modnet.onnx', export_params=True)
BTW the test results looks amazing !!!
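A possible workaround (my guess from the error message, not a confirmed fix): forward() here takes a second inference flag, while the exporter only feeds the dummy input, so binding the extra argument before export, e.g. with functools.partial or a small wrapper module, sidesteps the TypeError. The binding pattern itself is plain Python:

```python
import functools

# stand-in for a forward() that needs an extra positional flag,
# like forward(img, inference) in the traceback above
def forward(x, inference):
    return ("matte_for", x, inference)

# bind the missing argument so callers (like the exporter) pass only the input
forward_inference = functools.partial(forward, inference=True)
print(forward_inference("img"))  # ('matte_for', 'img', True)
```

In PyTorch terms, the equivalent would be a tiny nn.Module whose forward(x) calls the wrapped model with inference fixed, and exporting that wrapper instead; I have not verified this against this repo's exact signature.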
I think the version of torch you're using (1.0.0) requires Python 3.6. However, since most people are on 3.8 now, would it be possible to update the script to use a later version of torch?
Hi,
Thanks for this great project!
Are you planning to make a new release that supports compositing two videos? That would be great.
For the pretrained webcam/custom-video model, is it recommended to always resize to 672x512? Or can I use the same preprocessing as for the photographic model (https://github.com/ZHKKKe/MODNet/blob/master/demo/image_matting/colab/inference.py)? For example, if my video is 16:9 instead of 4:3, or, god forbid, vertical, would it be better to use the latter preprocessing, which more closely preserves the aspect ratio?
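For reference, this is my paraphrase of the resize rule in the linked inference.py (re-typed from memory, so treat the exact thresholds as assumptions): the short side is scaled toward 512 when the image is not already around that size, and both sides are then snapped down to multiples of 32:

```python
def matting_dims(im_h, im_w, ref_size=512):
    # scale the short side toward ref_size unless the image already straddles it
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        else:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        im_rh, im_rw = im_h, im_w
    # snap both sides down to multiples of 32 (the network's stride)
    return im_rh - im_rh % 32, im_rw - im_rw % 32

print(matting_dims(1080, 1920))  # (512, 896) for a 16:9 frame
print(matting_dims(1920, 1080))  # (896, 512) for a vertical frame
```

This rule preserves the aspect ratio (up to the multiple-of-32 snap) for 16:9 and vertical inputs alike, unlike a fixed 672x512 resize.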
Hi,
Thank you for this great work; I really appreciate the accuracy. I wanted to segment objects at 30 FPS, but the detection speed is relatively slow. Is there any way to use this as a real-time application? This repo will rock in the future!
Hello! I used your trained model to test a large number of pictures and found a significant problem: when there is something in the mid-ground, such as a seat, or other people's heads on either side of the head and shoulders, the matting result is poor. Can I take it that the portrait semantic segmentation in the network is inaccurate? Or perhaps there are no such cases in your improved portrait dataset, which leads to poor segmentation; if so, could you add some such images to it?
You mentioned that MODNet can use a trimap input. How do you do that? If the trimap is not very accurate, can the network adapt to it to some extent?
There is also a portrait semantic-segmentation model among the models you provide. How was it trained? Was it trained on the first stage of the network?
When will you share the training code? I am eager to try it!
Line 27 in bece9b4
Should this line use "self.bnorm_channels" instead of inorm_channels?
Hi,
Thanks for this great project. I tried your Colab for image matting; it looks like the boundary is not clear enough for some inputs (including the one in the GitHub README).
Is there any way to improve the matting accuracy?
@ZHKKKe
Hello,
I think this is a really impressive project!
Of course, some unusual images are matted poorly, but normal images are matted very finely.
Naturally, I thought of training with better data, such as the Adobe Deep Image Matting dataset.
Could you share the detailed training procedure? Thank you very much!
I'd like to ask which version of OpenCV you are using. I always get an error when creating a video-writer object with OpenCV:
video_writer = cv2.VideoWriter(result, fourcc, fps, (frame_width, frame_height))