ZHKKKe / MODNet
A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]
License: Apache License 2.0
I'm talking about this website: https://gradio.app/g/modnet
It would be much better if users could download a PNG file with the background removed entirely.
Is this possible in the future? Will you update it later?
The Google Drive link for the pre-trained models is currently hard to open without a VPN. Could you mirror it on Baidu Netdisk or host the files directly on GitHub?
I want to know how you generate the label for the detail branch.
Based on Figure 2 in your paper, my code is as follows:

import cv2
import numpy as np

image = cv2.imread("/mnt/SegmentData/labels/00000.jpg")
kernel = np.ones((80, 80), np.uint8)
dil = cv2.dilate(image, kernel, iterations=1)
ero = cv2.erode(image, kernel, iterations=1)
dil_ero = dil - ero
out = dil_ero * image

But I cannot get a result similar to the details_dp you show in Figure 2.
It would be helpful if you could correct my code. Thanks.
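For what it is worth, here is how I would sketch the detail target from Figure 2 (my guess at the pipeline, not the authors' code; cv2 is replaced by a naive NumPy dilation so the snippet is self-contained): the unknown band is dilate(matte) - erode(matte), and the matte values are kept only inside that band.

```python
import numpy as np

def binary_dilate(m, k):
    # naive square-kernel dilation: max over all shifts within the kernel
    out = m.copy()
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out = np.maximum(out, np.roll(np.roll(m, dy, axis=0), dx, axis=1))
    return out

def detail_label(matte, k=2):
    # hypothetical reconstruction of the Fig. 2 detail target:
    # band = dilate - erode around the matte boundary, then mask the matte with it
    fg = (matte > 0.5).astype(np.float32)
    ero = 1.0 - binary_dilate(1.0 - fg, k)   # erosion via the complement
    band = binary_dilate(fg, k) - ero
    return band * matte

matte = np.zeros((20, 20), dtype=np.float32)
matte[5:15, 5:15] = 1.0
out = detail_label(matte)
print(out[10, 10], out[5, 5], out[0, 0])  # 0.0 1.0 0.0
```

Note that, unlike the snippet above, this binarizes the matte first and multiplies by the band mask, so the interior of the foreground is zeroed out and only the boundary detail survives.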
Hello, I have a question about the compositional loss. Your paper says: "It measures the absolute difference between the input image I and the composited image obtained from αp, the ground truth foreground, and the ground truth background."
What do "ground truth foreground" and "ground truth background" refer to here? Do you use the trimap when computing this loss?
I'm new to matting, so I may have a lot of questions.
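For context, my understanding of that sentence (an illustration of the formula, not the authors' code) is that the predicted alpha is used to re-composite the image from the ground-truth foreground and background, and the loss is the L1 distance to the input:

```python
import numpy as np

def compositional_loss(image, alpha_p, fg_gt, bg_gt):
    # re-composite with the predicted alpha, then compare to the input image
    comp = alpha_p * fg_gt + (1.0 - alpha_p) * bg_gt
    return np.abs(image - comp).mean()

# with a perfect alpha, the composite reproduces the input and the loss is 0
fg = np.full((4, 4, 3), 0.9)
bg = np.full((4, 4, 3), 0.1)
alpha = np.zeros((4, 4, 1)); alpha[:2] = 1.0
image = alpha * fg + (1.0 - alpha) * bg
print(compositional_loss(image, alpha, fg, bg))  # 0.0
```

Under this reading, no trimap is needed for the loss itself, only the ground-truth foreground and background layers.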
The Gradio demo app you created does not appear to be working. It says "Launching Your Gradio Interface". "You will receive an email notification when the process is complete. You may close this page."
I don't know what this means. It also wasn't saying this yesterday.
When I tried to compile image_matting, here is the error that I got. Does anyone know how to fix it? I am kind of new to programming. Thanks.
Hello,
I just tried the open-source model. Hair is matted very precisely, but other regions often pick up extra content, especially when the background is slightly complex. Would adding human segmentation help? I happened to see mobilenetv2_human_seg.ckpt in the Google Drive link you shared. Is it for this purpose, and how is mobilenetv2_human_seg.ckpt used?
Thanks!
MODNet is an impressive work on image and video matting.
This work has a non-commercial license. Could you please provide a more permissive license, such as MIT or Apache, that allows commercial use? This would make it easier for the work to be used in open-source projects; non-commercially licensed work is not compatible with them. You can read more about this incompatibility here. I am unable to use this work in my open-source project because of the non-commercial license.
Thanks for sharing. While studying the project, I noticed that the pre-trained MODNet file has not been provided yet. Where can it be downloaded, or when will it be available? Thanks.
The README file says that this project is released under the CC-BY-NC-SA 4.0 license.
Does this affect the images produced with the network? Usually, the license of a tool does not affect the license of the output of the tool, except if specified in the license. For example, the license for Adobe's Deep Image Matting dataset explicitly states that models trained on their images can only be used for non-commercial purposes, so I just wanted to ask to make sure this is not the case here.
Hi, thanks for your great work!
I am confused: what is the difference between video inference and single-image inference, apart from where the pictures come from? Also, what is the difference between the two ckpt files, modnet_webcam_portrait_matting.ckpt and modnet_photographic_portrait_matting.ckpt? Thank you!
Thank you very much for your work. When will the code and pre-trained models be released? Looking forward to them!
Hello,
I'd like to ask: were the 400 video clips sampled as consecutive frames? Thanks.
I saw that you added trimap-based prediction results. How is the trimap fed into the prediction?
The Colab notebook uses the user's webcam video. How can we provide a pre-recorded video file for inference?
Sorry to bother you again.
During training, hd_loss stays around 0.06. Following your answer in another issue, I changed the loss function to:

num_mask = torch.sum(mask)
hd_loss = torch.sum((torch.abs(pred_detail - matte) + torch.abs(pred_detail.detach() - matte)) * mask) / num_mask

In subsequent training it still does not converge. Are there any other solutions?
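For reference, the normalization in that snippet, averaging the absolute error only over the region selected by the mask, can be written in NumPy as follows; this is just a paraphrase of the formula above, not the training code:

```python
import numpy as np

def masked_l1(pred, target, mask, eps=1e-8):
    # sum of absolute errors inside the mask, divided by the mask area
    return np.sum(np.abs(pred - target) * mask) / (np.sum(mask) + eps)

pred = np.array([0.0, 0.5, 1.0, 1.0])
target = np.array([0.0, 1.0, 1.0, 0.0])
mask = np.array([0.0, 1.0, 1.0, 0.0])  # only the middle two entries count
print(masked_l1(pred, target, mask))  # ~0.25
```

Because the denominator is the mask area rather than the image area, the loss value is insensitive to how large the boundary band is, which is worth keeping in mind when comparing loss magnitudes across datasets.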
There was an error loading this notebook. Ensure that the file is accessible and try again.
Error loading https://apis.google.com/js/client.js
https://drive.google.com/drive/?action=locate&id=1GANpbKT06aEFiW-Ssx0DQnnEADcXwQG6&authuser=0
Error loading https://apis.google.com/js/client.js
Error: Error loading https://apis.google.com/js/client.js
at HTMLScriptElement.h.onerror (https://colab.research.google.com/v2/external/gapi_loader.js:19:330)
Hi, I noticed that you use IBNorm in your code (combining IN and BN into one layer). What are the benefits of this? Thanks!
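In case it helps other readers, my reading of IBNorm (a NumPy sketch of the statistics only, without the learnable affine parameters; not a drop-in for the real layer) is that the first half of the channels is normalized with batch statistics and the second half with per-instance statistics:

```python
import numpy as np

def ib_norm(x, eps=1e-5):
    # x: (N, C, H, W); first C//2 channels use BatchNorm stats (over N, H, W),
    # the remaining channels use InstanceNorm stats (over H, W per sample)
    half = x.shape[1] // 2
    bn, inn = x[:, :half], x[:, half:]
    bn = (bn - bn.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(bn.var(axis=(0, 2, 3), keepdims=True) + eps)
    inn = (inn - inn.mean(axis=(2, 3), keepdims=True)) / np.sqrt(inn.var(axis=(2, 3), keepdims=True) + eps)
    return np.concatenate([bn, inn], axis=1)

x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
y = ib_norm(x)
print(y.shape)  # (2, 8, 4, 4)
```

The usual motivation for such IN/BN mixes is that the IN half discards per-image appearance (lighting, color cast) while the BN half keeps content statistics, but I cannot speak for the authors' exact reasoning here.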
Hi!
I'm not very experienced with ML yet and was wondering if you could point me toward a way to run inference with the trained model using TensorFlow / TF.js. What would I need to do?
Thanks so much for the paper and the model!
In the paper you said:
"For a fair comparison, we train all models on the same dataset, which contains nearly 3000 annotated foregrounds."
What is this dataset?
Thanks for the paper.
I'd like to ask: I could not find the pre-trained weight file mentioned above in Human-Segmentation-PyTorch. Is it UNet_MobileNetV2, or something else? Thanks.
How about creating a WeChat or QQ group?
Hi, thank you for open-sourcing such great work.
I tried the model; it works well at detecting people in almost every case, but there is always some noise on real input images. I tried it on an image with no person, and the model still masks a significant segment.
Is it possible to raise a confidence threshold, or apply some pre- or post-processing to improve the results?
Thank you.
Hello, your work is excellent!
But I want to know how your pre-trained model was trained, and how your loss functions are represented in your code.
Could you publish more details about the training process?
Can you contact me through email?
[email protected]
Hello, for image matting, was training done using the BGR or RGB format?
I got better results when testing images in BGR format (OpenCV's format), but when I checked your inference code it seems you use RGB (PIL's format), so I'm confused; maybe you used RGB in training.
Thanks.
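If it is useful to anyone hitting the same confusion, flipping the channel order before inference is a one-liner; this assumes the model was trained on RGB, which the inference code suggests but I cannot confirm:

```python
import numpy as np

# an OpenCV-style image is laid out as BGR along the last axis
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # blue channel

# reverse the channel axis to get RGB (equivalent to cv2.cvtColor(..., cv2.COLOR_BGR2RGB))
rgb = bgr[..., ::-1]
print(rgb[0, 0])  # [  0   0 255]
```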
How can I synthesize a new background?
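A standard way to do this (generic alpha compositing, not anything specific to MODNet) is to blend the original frame onto the new background using the predicted matte:

```python
import numpy as np

def composite(image, matte, background):
    # matte in [0, 1], shape (H, W, 1); broadcasts over the color axis
    return matte * image + (1.0 - matte) * background

img = np.full((2, 2, 3), 0.8)
bg = np.zeros((2, 2, 3))
matte = np.array([[[1.0], [0.0]], [[0.5], [1.0]]])
out = composite(img, matte, bg)
print(out[0, 0], out[0, 1])  # [0.8 0.8 0.8] [0. 0. 0.]
```

Pixels with matte 1 keep the original image, pixels with matte 0 take the new background, and fractional values (hair, boundaries) blend the two.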
This is great work and has some advantages compared to other models.
Is there a TorchScript-format model, as the BackgroundMatting project provides? That would make it much easier to run checks and performance benchmarks on other devices such as mobile; I think this project has wide applicability and huge potential.
@ZHKKKe, thanks for your work; waiting for future progress.
I tried torch.jit.save, but it is not working:
import torch
import torch.nn as nn
from src.models.modnet import MODNet

modnet = MODNet(backbone_pretrained=False)
modnet = nn.DataParallel(modnet)
modnet.load_state_dict(torch.load(args.ckpt_path, map_location=torch.device('cpu')))
modnet.eval()
scriptmod = torch.jit.script(modnet)
torch.jit.save(scriptmod, "modnet.pt")
Can you release your training code?
Hi,
Do you plan to release transfer-learning options to specify which layers should be trained?
Also, have you tried this network on objects other than portraits? If yes, did you manage to get comparable results?
Thanks for your great work. I'm very interested in the dataset you built. Can you tell me how you created it? Could you please open the dataset to us? Thanks a lot.
Hello, I have a few more questions about SOC training.
How many epochs are needed for SOC?
When reproducing it, I fine-tuned with the Adam optimizer at a learning rate of 1e-4 as in the paper, for 30 epochs. I found that the loss rises quickly during the first epoch and then gradually decreases, but not by much, while the IoU on the validation set keeps dropping.
I don't know which step went wrong and hope you can advise.
Thank you very much!
Hi,
Cool project. We would like to test whether we can port it to an ARM64 system with an embedded low-power 32 GB GPU (NVIDIA AGX Xavier 32GB), which can be used for both training and inference. A first test with the pretrained model from Dec. 2020 would give us an initial impression of how its performance compares with a desktop GPU.
Thanks for your sharing. Nice work~
Here is a question about the SOC in your paper.
The self-supervised stage is applied to datasets from a new domain, so are the new/target datasets the ones we will test on later?
Another question: when I try to train MODNet, the prediction of dp is just the boundary, which is not the same as in your paper.
How can I export the model to the ONNX format?
I tried the following with the Colab demo code, but it raised an error:
TypeError: forward() missing 1 required positional argument: 'inference'
import torch
import torch.nn as nn
from torch.autograd import Variable

model = MODNet(backbone_pretrained=False)
model = nn.DataParallel(model).cuda()
state_dict = torch.load(pretrained_ckpt)
model.load_state_dict(state_dict)
model.eval()
dummy_input = Variable(torch.randn(1, 3, 512, 512))
torch.onnx.export(model.module, dummy_input, '/content/MODNet/modnet.onnx', export_params=True)
BTW the test results looks amazing !!!
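A possible workaround (my guess from the error message, not a confirmed fix): forward() here takes a second inference flag, while the exporter only feeds the dummy input, so binding the extra argument before export, e.g. with functools.partial or a small wrapper module, sidesteps the TypeError. The binding pattern itself is plain Python:

```python
import functools

# stand-in for a forward() that needs an extra positional flag,
# like forward(img, inference) in the traceback above
def forward(x, inference):
    return ("matte_for", x, inference)

# bind the missing argument so callers (like the exporter) pass only the input
forward_inference = functools.partial(forward, inference=True)
print(forward_inference("img"))  # ('matte_for', 'img', True)
```

In PyTorch terms, the equivalent would be a tiny nn.Module whose forward(x) calls the wrapped model with inference fixed, and exporting that wrapper instead; I have not verified this against this repo's exact signature.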
I think the version of torch you're using (1.0.0) requires Python 3.6. However, since most people are on 3.8 now, would it be possible to update the script to use a later version of torch?
Hi,
Thanks for this great project!
Are you planning to make a new release that supports compositing two videos? That would be great.
For the pretrained webcam/custom-video model, is it recommended to always resize to 672x512? Or can I use the same preprocessing as for the photographic model (https://github.com/ZHKKKe/MODNet/blob/master/demo/image_matting/colab/inference.py)? For example, if my video is 16:9 instead of 4:3, or, god forbid, vertical, would it be better to use the latter preprocessing, which more closely preserves the aspect ratio?
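For reference, this is my paraphrase of the resize rule in the linked inference.py (re-typed from memory, so treat the exact thresholds as assumptions): the short side is scaled toward 512 when the image is not already around that size, and both sides are then snapped down to multiples of 32:

```python
def matting_dims(im_h, im_w, ref_size=512):
    # scale the short side toward ref_size unless the image already straddles it
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        else:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        im_rh, im_rw = im_h, im_w
    # snap both sides down to multiples of 32 (the network's stride)
    return im_rh - im_rh % 32, im_rw - im_rw % 32

print(matting_dims(1080, 1920))  # (512, 896) for a 16:9 frame
print(matting_dims(1920, 1080))  # (896, 512) for a vertical frame
```

This rule preserves the aspect ratio (up to the multiple-of-32 snap) for 16:9 and vertical inputs alike, unlike a fixed 672x512 resize.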
Hi,
Thank you for this great work; I really appreciate the accuracy. I wanted to segment objects at 30 FPS, but the detection speed is relatively slow. Is there any way to use this as a real-time application? This repo will rock in the future!
Hello! I used your trained model to test a large number of pictures and found a significant problem: when there is something in the mid-ground, such as a seat, or other people's heads on either side of the head and shoulders, the matting result is poor. Can I take it that the portrait semantic segmentation in the network is inaccurate? Or perhaps there are no such cases in your improved portrait dataset, which leads to poor segmentation; if so, could you add some such images to it?
You mentioned that MODNet can use a trimap input. How do you do that? If the trimap is not very accurate, can the network adapt to it to some extent?
There is also a portrait semantic-segmentation model among the models you provide. How was it trained? Was it trained on the first stage of the network?
When will you share the training code? I am eager to try it!
Line 27 in bece9b4
Should this line use "self.bnorm_channels" instead of inorm_channels?
Hi,
Thanks for this great project. I tried your Colab for image matting; it looks like the boundary is not clear enough for some inputs (including the one in the GitHub README).
Is there any way to improve the matting accuracy?
@ZHKKKe
Hello,
I think this is a really impressive project!
Of course, some unusual images are matted poorly, but normal images are matted very finely.
Naturally, I thought of training with better data, such as the Adobe Deep Image Matting dataset.
Could you share the detailed training procedure? Thank you very much!
I'd like to ask which version of OpenCV you are using. I always get an error when creating a video-writer object with OpenCV:
video_writer = cv2.VideoWriter(result, fourcc, fps, (frame_width, frame_height))