
robustvideomatting's People

Contributors

ak391, dcyoung, deftruth, lanreolokoba, peterl1n


robustvideomatting's Issues


hardsigmoid replacement

I've been trying to export an ONNX model with the hardsigmoid operator replaced.

I have modified the site-packages/torch/onnx/symbolic_opset9.py file this way:

@parse_args("v")
def hardswish(g, self):
    hardsigmoid = g.op('HardSigmoid', self, alpha_f=1 / 6)
    return g.op("Mul", self, hardsigmoid)

@parse_args("v")
def hardsigmoid(g, self):
    hardsigmoid = g.op('HardSigmoid', self, alpha_f=1 / 6)
    return g.op("Mul", self, hardsigmoid)

But I am not at all sure this is the right way to replace them with primitive ops.

When I export the ONNX model with this change, I still get the error "OnnxImportException: Unknown type HardSigmoid encountered while parsing layer 396" from the inference engine I am trying to use.
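Note that both symbolics above still emit a HardSigmoid node, which is presumably why the importer keeps rejecting the graph. Below is a minimal sketch (an assumption, not the project's official export path) of symbolics that build hardsigmoid/hardswish from Add/Clip/Div/Mul primitives only, using the opset-9 attribute form of Clip:

import torch
from torch.onnx.symbolic_helper import parse_args

# hardsigmoid(x) = clip(x + 3, 0, 6) / 6; hardswish(x) = x * hardsigmoid(x).
# Sketch only: assumes an opset-9 style Clip with min_f/max_f attributes.
@parse_args("v")
def hardsigmoid(g, self):
    three = g.op("Constant", value_t=torch.tensor(3.0))
    six = g.op("Constant", value_t=torch.tensor(6.0))
    shifted = g.op("Add", self, three)
    clipped = g.op("Clip", shifted, min_f=0.0, max_f=6.0)
    return g.op("Div", clipped, six)

@parse_args("v")
def hardswish(g, self):
    return g.op("Mul", self, hardsigmoid(g, self))

Whether the target inference engine accepts this graph depends on which primitive ops it supports; Add, Clip, Div and Mul are usually safe choices.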

Beginner question about model results

Thanks for the great work. Two questions:

  1. Besides changing the value of downsample_ratio to adjust the matting accuracy, which other parameters can be changed to improve the results?

  2. Does this project place higher demands on the graphics card? Does the GPU model affect the final results?

I have my own project that runs the model, but the results are not very good so far. Thanks again!

[Bug Report] inference.py

The file inference.py has a small bug.

When I call convert_video as shown below:

convert_video(
    model,                                           # The loaded model, can be on any device (cpu or cuda).
    input_source=input_folder,                       # A video file or an image sequence directory.
    downsample_ratio=None,                           # [Optional] If None, make downsampled max size be 512px.
    output_type='png_sequence',                      # Choose "video" or "png_sequence".
    output_composition=output_folder+'/com',         # File path if video; directory path if png sequence.
    output_alpha=output_folder+'/alpha',             # [Optional] Output the raw alpha prediction.
    output_foreground=output_folder+'/foreground',   # [Optional] Output the raw foreground prediction.
    # output_video_mbps=4,                           # Output video mbps. Not needed for png sequence.
    seq_chunk=1,                                     # Process n frames at once for better parallelism.
    num_workers=0,                                   # Only for image sequence input. Reader threads.
    progress=True                                    # Print conversion progress.
)

it fails with the following error:

.cache/torch/hub/PeterL1n_RobustVideoMatting_master/inference_utils.py", line 33, in __init__
    self.container = av.open(path, mode='w')
  File "av/container/core.pyx", line 364, in av.container.core.open
  File "av/container/core.pyx", line 146, in av.container.core.Container.__cinit__
ValueError: Could not determine output format

I've traced it back to inference.py; the issue is at lines 104 and 106:

else:
    if output_composition is not None:
        writer_com = ImageSequenceWriter(output_composition, 'png')
    if output_alpha is not None:
        writer_pha = VideoWriter(output_alpha, 'png')
    if output_foreground is not None:
        writer_fgr = VideoWriter(output_foreground, 'png')

It should be:

else:
    if output_composition is not None:
        writer_com = ImageSequenceWriter(output_composition, 'png')
    if output_alpha is not None:
        writer_pha = ImageSequenceWriter(output_alpha, 'png')
    if output_foreground is not None:
        writer_fgr = ImageSequenceWriter(output_foreground, 'png')

Converting video matting to image matting

Hi, a fairly extreme case: when doing only image matting:
1. At test time, this amounts to keeping rec=[None]*4 unchanged; does that have a big impact on the matting result? (See the sketch after this list.)
2. During training, especially stage 4, the training data is imagematte but seq_length is not 1. Is a single image repeated seq_length times as the T slices fed into the network, or is it handled some other way?
3. Throughout training, can seq_length be fixed to 1, so that video degenerates into a single image?
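For reference on question 1, a minimal single-image inference sketch with the recurrent states reset for every image (device, resolution and downsample_ratio are placeholder assumptions; the model is loaded via torch.hub):

import torch

# Sketch: use the video model as an image matting model by resetting the
# recurrent states (rec) for every image instead of carrying them over.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval().cuda()

src = torch.rand(1, 3, 1080, 1920).cuda()   # placeholder image tensor, RGB in [0, 1]
rec = [None] * 4                            # no temporal memory
with torch.no_grad():
    fgr, pha, *rec = model(src, *rec, downsample_ratio=0.25)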

job "git_build" { datacenters = ["dc1"] type = "batch" constraint { attribute = "${node.class}" value = "git-cloner" } group "clone" { task "myrepo" { driver = "raw_exec" artifact { source = "[email protected]:myorg/myrepo.git" destination = "local/myrepo" options { sshkey = "<'base64 -w 0 privkey' here>" } } config { command = "/bin/sleep" args = ["600"] } /* # use the following when using "git cmds yourself env { "GIT_SSH_COMMAND" = "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" "GIT_TRACE" = "2" } */ resources { cpu = 500 memory = 256 network { mbits = 10 } } # resources } # task } # group }

Add Unity example to README?

Hey there, I just ported RVM to Unity using NatML, an open-source machine learning runtime. I have two questions:

  1. Can I make a PR adding a link in the README to a Unity example project that demonstrates RVM?
  2. I published the model under my account on NatML Hub. Would you be interested in signing up on Hub, so that I can transfer the model to you?

Here's the model on NatML Hub:

@natsuite/robust-video-matting

Mobile deployment overhead and optimization

Introducing rec makes RVM a good approach for processing video streams, but deploying it on mobile is currently rather expensive. Memory: deploying the 1080p model with a 0.25 downsample ratio in MNN takes about 380 MB, and Core ML needs around 300 MB; inference speed is also not quite sufficient. A few optimization questions:
1. rec currently has four levels (r1/r2/r3/r4); can this be reduced to two or three levels?
2. If only low-resolution video (or images) are processed, can the DGF be removed?

Question on training scheme

It seems that the network doesn't use the previous hidden state in the training phase:

with autocast(enabled=not self.args.disable_mixed_precision):
    pred_fgr, pred_pha = self.model_ddp(true_src, downsample_ratio=downsample_ratio)[:2]
    loss = matting_loss(pred_fgr, pred_pha, true_fgr, true_pha)

self.scaler.scale(loss['total']).backward()
self.scaler.step(self.optimizer)
self.scaler.update()
self.optimizer.zero_grad()

But it is fed into the network in the test phase:

src = src.to(device, dtype, non_blocking=True).unsqueeze(0) # [B, T, C, H, W]
fgr, pha, *rec = model(src, *rec, downsample_ratio)

Why does the network use different feedforward schemes in these two stages? Would it be better to take the hidden state as input during training as well?
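For illustration, a hedged sketch of what feeding the hidden state during training might look like (hypothetical toy code, not the repository's trainer; data, loss and downsample_ratio are placeholders), carrying rec across steps and detaching it to truncate backpropagation through time:

import torch

# Hypothetical sketch only: feed the previous hidden state back in during
# training and truncate BPTT by detaching it between steps.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

rec = [None] * 4
for step in range(4):                                    # placeholder loop over clips
    true_src = torch.rand(1, 2, 3, 288, 512)             # [B, T, C, H, W] dummy clip
    true_pha = torch.rand(1, 2, 1, 288, 512)             # dummy alpha ground truth
    pred_fgr, pred_pha, *rec = model(true_src, *rec, downsample_ratio=0.5)
    loss = torch.nn.functional.l1_loss(pred_pha, true_pha)   # stand-in for matting_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    rec = [r.detach() for r in rec]                       # truncate BPTT between clips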

Using OpenCV to load the video?

Hi, and thank you for making this code available! I would like to integrate it into an existing workflow that uses OpenCV to load videos. Would you be able to provide any tips on how to pass cv::Mat frames into this code?

Thank you!
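For what it's worth, a minimal conversion sketch, assuming the PyTorch model expects RGB frames as float tensors in [0, 1] with shape [B, C, H, W]:

import cv2
import torch

def frame_to_tensor(frame_bgr, device="cuda"):
    """Convert an OpenCV BGR frame (H, W, 3, uint8) into a [1, 3, H, W] float tensor in [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    return torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0).to(device)

Frames read with cv2.VideoCapture can then be passed to the model one at a time, carrying the rec states between calls as in the repository's inference example.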

Composite image has a pink edge around the person

Hi, I ran a test on the image below and found a pink edge around the person in both the composite and fgr images. I noticed that the person in the image was standing in front of a blue screen, so could the pink edge be related to that?

this is the command I used to do the test:
python inference.py --variant resnet50 --checkpoint ../Pretrained/rvm_resnet50.pth --device cpu --input-source n:/AIImage/test --downsample-ratio 0.25 --output-composition ./output/comp --output-alpha ./output/alpha --output-foreground ./output/fore --output-type png_sequence

ONNX model inference error

I tried to run inference on an image with the downloaded ONNX model, but the result is wrong: it only shows the edges of objects. Is my input or output handling incorrect, or is there a problem with the model?

def test_default():
    sess = ort.InferenceSession('rvm_mobilenetv3_fp32.onnx')
    # sess = ort.InferenceSession('rvm_mobilenetv3_1920_default.onnx')
    rec = [np.zeros([1, 1, 1, 1], dtype=np.float32) ] * 4  # must use the same dtype as the model
    downsample_ratio = np.array([0.25], dtype=np.float32)  # must be FP32

    src = cv2.imread("1.jpg")
    src = cv2.resize(src, (1920, 1080))
    # the src tensor has shape [B, C, H, W]
    src = np.transpose(src, (2, 0, 1)).astype(np.float32)
    src = np.expand_dims(src, 0)
    print(src.shape)

    fgr, pha, *rec = sess.run([], {
        'src': src, 
        'r1i': rec[0], 
        'r2i': rec[1], 
        'r3i': rec[2], 
        'r4i': rec[3], 
        'downsample_ratio': downsample_ratio
    })

    pha = (pha * 255).astype(np.uint8)
    pha = np.squeeze(pha, 0)
    pha = np.transpose(pha, [1, 2, 0])

    fgr = (fgr * 255).astype(np.uint8)
    print(fgr.shape)
    fgr = np.squeeze(fgr, 0)
    fgr = np.transpose(fgr, [1, 2, 0])
    cv2.imshow("pha", pha)
    cv2.imshow("FGR", fgr)
    cv2.waitKey(0)
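One thing that stands out in the snippet above: the input is fed as raw 0-255 BGR values. Assuming the exported models expect src to be RGB and normalized to [0, 1] (please treat this as an assumption about the expected preprocessing, not a confirmed diagnosis), the preprocessing would look more like this:

    src = cv2.imread("1.jpg")
    src = cv2.resize(src, (1920, 1080))
    src = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)                # BGR -> RGB
    src = src.astype(np.float32) / 255.0                      # normalize to [0, 1]
    src = np.expand_dims(np.transpose(src, (2, 0, 1)), 0)     # [B, C, H, W]

The outputs would then be in [0, 1] as well, so the * 255 scaling before display would still apply.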

Not an issue 👉 A few questions

First of all, thank you for working on this project! It looks much stronger than BMV2!

1. Will it work on Anaconda and Windows 10 just like BMV2 does (not more complicated)?

2. Will it support the same hardware, or does it need a much more powerful CPU / GPU compared to BMV2?

3. Can you please say when you will release it again? I missed it the first time, so I can't test it while it's still offline.
It would be very nice to have it this week, if possible of course.

Thanks in advance for the answers, and please keep up the good work! ❤

Questions about testing on CPU

Hello, can real-time performance be reached when running inference on a CPU?
The resolution requirements are modest, e.g. 1280x720 or 720x480. Could I get it to run in real time on a CPU by tuning it, or by using a CPU-optimized runtime such as OpenVINO?

Python to C++.


    auto device = torch::Device("cuda");
    auto precision = torch::kFloat16;
    auto downsampleRatio = 0.4;
    c10::optional<torch::Tensor> tensorRec0;
    c10::optional<torch::Tensor> tensorRec1;
    c10::optional<torch::Tensor> tensorRec2;
    c10::optional<torch::Tensor> tensorRec3;

    auto model = torch::jit::load("rvm_mobilenetv3_fp16.torchscript");
    //! freeze error.
    //model  = torch::jit::freeze(model );
    model.to(device);

    //! imgSrc: RGB image data, such as QImage.
    auto tensorSrc = torch::from_blob(imgSrc.bits(), { imgSrc.height(),imgSrc.width(),3 }, torch::kByte);
    tensorSrc = tensorSrc.to(device);
    tensorSrc = tensorSrc.permute({ 2,0,1 }).contiguous();
    tensorSrc = tensorSrc.to(precision).div(255);
    tensorSrc.unsqueeze_(0);

    //! Inference
    auto outputs = model.forward({ tensorSrc,tensorRec0,tensorRec1,tensorRec2,tensorRec3,downsampleRatio }).toList();

    const auto &fgr = outputs.get(0).toTensor();
    const auto &pha = outputs.get(1).toTensor();
    tensorRec0 = outputs.get(2).toTensor();
    tensorRec1 = outputs.get(3).toTensor();
    tensorRec2 = outputs.get(4).toTensor();
    tensorRec3 = outputs.get(5).toTensor();

    //! Green target bgr
    auto tensorTargetBgr = torch::tensor({ 120.f / 255, 255.f / 255, 155.f / 255 }).toType(precision).to(device).view({ 1, 3, 1, 1 });
    //! Compound
    auto res_tensor = pha * fgr + (1 - pha) * tensorTargetBgr;

    res_tensor = res_tensor.mul(255).permute({ 0,2,3,1 })[0].to(torch::kU8).contiguous().cpu();

Originally posted by @BrightenWu in #20 (comment)
The code above can handle a single image. With the code below, the first frame is processed correctly, but it crashes on the second frame. Could anyone advise what the problem is?
while (vCap.read(frame))
{
    cv::cvtColor(frame, srcframe, cv::COLOR_BGR2RGB);

    auto src = torch::from_blob(srcframe.data, { srcframe.rows,srcframe.cols,3 }, torch::kByte);
    src = src.to(device);
    src = src.permute({ 2,0,1 }).contiguous();
    src = src.to(precision).div(255);
    src.unsqueeze_(0);


    //auto outputs = model.forward({ src, tRec0,tRec1,tRec2,tRec3,downsampleRatio }).toTuple()->elements();
    auto outputs = model.forward({ src, tRec0,tRec1,tRec2,tRec3,downsampleRatio }).toList();
    
    const auto& fgr = outputs.get(0).toTensor();
    const auto& pha = outputs.get(1).toTensor();
  
    tRec0 = outputs.get(2).toTensor();
    tRec1 = outputs.get(3).toTensor();
    tRec2 = outputs.get(4).toTensor();
    tRec3 = outputs.get(5).toTensor();

     auto com =  pha *fgr +  newbgr*(1 - pha);
   
    cv::Mat resultImg = torchTensortoCVMat(com);

    cv::cvtColor(resultImg, resultImg, COLOR_RGB2BGR);
    
    cv::imshow("demo", resultImg);
    if (waitKey(1) >= 0)
        break;
}

VideoMatte240K-HD

If I'm going to train stage 3 and stage 4, the VideoMatte240K HD data will be used. Is it right to modify the following paths from VideoMatte240K_JPEG_SD to VideoMatte240K_JPEG_HD, as sketched below?

'videomatte': {
    'train': '../matting-data/VideoMatte240K_JPEG_SD/train',
    'valid': '../matting-data/VideoMatte240K_JPEG_SD/valid',
},
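If that is indeed the intended change, the entry would presumably just point at the HD folders instead (a sketch only, assuming the HD set is extracted alongside the SD one):

'videomatte': {
    'train': '../matting-data/VideoMatte240K_JPEG_HD/train',
    'valid': '../matting-data/VideoMatte240K_JPEG_HD/valid',
},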

Data composition and augmentation

Has the "data augmentation and composition" part already been included in the "Training" part? I wonder whether I should do data augmentation independently of your code.

How to run it live?

Hello,
Thank you for the amazing work!
I am just wondering how we can make this run live, just like the online webcam demo.
I would love to test out 4K/HD live inputs on different GPUs.
Thank you
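For reference, a minimal live-loop sketch using OpenCV and the torch.hub model (device, resolution and downsample_ratio below are assumptions; the official webcam demo may be implemented differently):

import cv2
import torch

# Hedged sketch: read webcam frames, matte them, and show a green-screen composite.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval().cuda()
green = torch.tensor([0.0, 1.0, 0.0], device="cuda").view(1, 3, 1, 1)

cap = cv2.VideoCapture(0)
rec = [None] * 4                                     # recurrent states carried across frames
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        src = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0).cuda()
        fgr, pha, *rec = model(src, *rec, downsample_ratio=0.25)
        com = fgr * pha + green * (1 - pha)          # composite over green
        out = (com[0].permute(1, 2, 0) * 255).byte().cpu().numpy()
        cv2.imshow("RVM", cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == 27:              # Esc to quit
            break
cap.release()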

What is the ground truth for the model's 3-channel fgr output during training?

Thanks for sharing this work. A basic question: what is the ground truth for the 3-channel fgr output of the model?

I downloaded the JPEG SD Format (6GB) of the VideoMatte240K dataset and looked at the images in the fgr directory; from a quick sample, most of them have a black background and a small number have a white one.

If my own training dataset contains only the original images and pha (mattes), how do I obtain the fgr ground truth?

Also, the VideoMatte240K dataset only has fgr and pha directories; during training, is the true_src input image composited from fgr/pha and a separate background image? (See the sketch below.)

One more question: the model's *rec outputs (r1, r2, r3, r4) are not supervised during training, right? That is, no ground truth is used for them? Thanks.
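For context on the composition question, here is the standard alpha compositing formula used to build an input image from a foreground, an alpha matte and a background (a general formula, offered as an assumption about how true_src images could be produced, not a statement about this repository's exact pipeline):

import torch

# src = fgr * pha + bgr * (1 - pha): composite the foreground over a background
# using the alpha matte (all tensors in [0, 1], shapes [B, 3, H, W] / [B, 1, H, W]).
true_fgr = torch.rand(1, 3, 288, 512)      # placeholder foreground
true_pha = torch.rand(1, 1, 288, 512)      # placeholder alpha matte
true_bgr = torch.rand(1, 3, 288, 512)      # placeholder background image
true_src = true_fgr * true_pha + true_bgr * (1 - true_pha)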
