
robustvideomatting's People

Contributors

ak391, dcyoung, deftruth, lanreolokoba, peterl1n


robustvideomatting's Issues


hardsigmoid replacement

I've been trying to export an ONNX model with the hardsigmoid operator replaced.

I have modified the site-packages/torch/onnx/symbolic_opset9.py file this way:

@parse_args("v")
def hardswish(g, self):
    hardsigmoid = g.op('HardSigmoid', self, alpha_f=1 / 6)
    return g.op("Mul", self, hardsigmoid)

@parse_args("v")
def hardsigmoid(g, self):
    hardsigmoid = g.op('HardSigmoid', self, alpha_f=1 / 6)
    return g.op("Mul", self, hardsigmoid)

But I am not at all sure this is the right way to replace them with primitive ops.

When I export the ONNX model with this change, I still get the error "OnnxImportException: Unknown type HardSigmoid encountered while parsing layer 396" from the inference engine I am trying to use.
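Note that both symbolics above still emit a HardSigmoid node, which is presumably why the importer keeps rejecting the graph. Below is a minimal sketch (an assumption, not the project's official export path) of symbolics that build hardsigmoid/hardswish from Add/Clip/Div/Mul primitives only, using the opset-9 attribute form of Clip:

import torch
from torch.onnx.symbolic_helper import parse_args

# hardsigmoid(x) = clip(x + 3, 0, 6) / 6; hardswish(x) = x * hardsigmoid(x).
# Sketch only: assumes an opset-9 style Clip with min_f/max_f attributes.
@parse_args("v")
def hardsigmoid(g, self):
    three = g.op("Constant", value_t=torch.tensor(3.0))
    six = g.op("Constant", value_t=torch.tensor(6.0))
    shifted = g.op("Add", self, three)
    clipped = g.op("Clip", shifted, min_f=0.0, max_f=6.0)
    return g.op("Div", clipped, six)

@parse_args("v")
def hardswish(g, self):
    return g.op("Mul", self, hardsigmoid(g, self))

Whether the target inference engine accepts this graph depends on which primitive ops it supports; Add, Clip, Div and Mul are usually safe choices.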

Beginner question about model results

Thanks for the great work. Two questions:

  1. Besides changing the value of downsample_ratio to adjust the matting accuracy, which other parameters can be changed to improve the results?

  2. Does this project place higher demands on the graphics card? Does the GPU model affect the final results?

I have my own project that runs the model, but the results are not very good so far. Thanks again!

[Bug Report] inference.py

The file inference.py has a small bug.

When I call convert_video as shown below:

convert_video(
    model,                                           # The loaded model, can be on any device (cpu or cuda).
    input_source=input_folder,                       # A video file or an image sequence directory.
    downsample_ratio=None,                           # [Optional] If None, make downsampled max size be 512px.
    output_type='png_sequence',                      # Choose "video" or "png_sequence".
    output_composition=output_folder+'/com',         # File path if video; directory path if png sequence.
    output_alpha=output_folder+'/alpha',             # [Optional] Output the raw alpha prediction.
    output_foreground=output_folder+'/foreground',   # [Optional] Output the raw foreground prediction.
    # output_video_mbps=4,                           # Output video mbps. Not needed for png sequence.
    seq_chunk=1,                                     # Process n frames at once for better parallelism.
    num_workers=0,                                   # Only for image sequence input. Reader threads.
    progress=True                                    # Print conversion progress.
)

it fails with the following error:

.cache/torch/hub/PeterL1n_RobustVideoMatting_master/inference_utils.py", line 33, in __init__
    self.container = av.open(path, mode='w')
  File "av/container/core.pyx", line 364, in av.container.core.open
  File "av/container/core.pyx", line 146, in av.container.core.Container.__cinit__
ValueError: Could not determine output format

I've traced it back to inference.py; the issue is at lines 104 and 106:

else:
    if output_composition is not None:
        writer_com = ImageSequenceWriter(output_composition, 'png')
    if output_alpha is not None:
        writer_pha = VideoWriter(output_alpha, 'png')
    if output_foreground is not None:
        writer_fgr = VideoWriter(output_foreground, 'png')

It should be:

else:
    if output_composition is not None:
        writer_com = ImageSequenceWriter(output_composition, 'png')
    if output_alpha is not None:
        writer_pha = ImageSequenceWriter(output_alpha, 'png')
    if output_foreground is not None:
        writer_fgr = ImageSequenceWriter(output_foreground, 'png')

Converting video matting to image matting

Hi, a fairly extreme case: when doing only image matting:
1. At test time, this amounts to keeping rec=[None]*4 unchanged; does that have a big impact on the matting result? (See the sketch after this list.)
2. During training, especially stage 4, the training data is imagematte but seq_length is not 1. Is a single image repeated seq_length times as the T slices fed into the network, or is it handled some other way?
3. Throughout training, can seq_length be fixed to 1, so that video degenerates into a single image?
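For reference on question 1, a minimal single-image inference sketch with the recurrent states reset for every image (device, resolution and downsample_ratio are placeholder assumptions; the model is loaded via torch.hub):

import torch

# Sketch: use the video model as an image matting model by resetting the
# recurrent states (rec) for every image instead of carrying them over.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval().cuda()

src = torch.rand(1, 3, 1080, 1920).cuda()   # placeholder image tensor, RGB in [0, 1]
rec = [None] * 4                            # no temporal memory
with torch.no_grad():
    fgr, pha, *rec = model(src, *rec, downsample_ratio=0.25)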

job "git_build" { datacenters = ["dc1"] type = "batch" constraint { attribute = "${node.class}" value = "git-cloner" } group "clone" { task "myrepo" { driver = "raw_exec" artifact { source = "[email protected]:myorg/myrepo.git" destination = "local/myrepo" options { sshkey = "<'base64 -w 0 privkey' here>" } } config { command = "/bin/sleep" args = ["600"] } /* # use the following when using "git cmds yourself env { "GIT_SSH_COMMAND" = "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" "GIT_TRACE" = "2" } */ resources { cpu = 500 memory = 256 network { mbits = 10 } } # resources } # task } # group }

Add Unity example to README?

Hey there, I just ported RVM to Unity using NatML, an open-source machine learning runtime. I have two questions:

  1. Can I make a PR adding a link in the README to a Unity example project that demonstrates RVM?
  2. I published the model under my account on NatML Hub. Would you be interested in signing up on Hub, so that I can transfer the model to you?

Here's the model on NatML Hub:

@natsuite/robust-video-matting

Mobile deployment overhead and optimization

Introducing rec makes RVM a good approach for processing video streams, but deploying it on mobile is currently rather expensive. Memory: deploying the 1080p model with a 0.25 downsample ratio in MNN takes about 380 MB, and Core ML needs around 300 MB; inference speed is also not quite sufficient. A few optimization questions:
1. rec currently has four levels (r1/r2/r3/r4); can this be reduced to two or three levels?
2. If only low-resolution video (or images) are processed, can the DGF be removed?

Question on training scheme

It seems that the network doesn't use the previous hidden state in the training phase:

with autocast(enabled=not self.args.disable_mixed_precision):
    pred_fgr, pred_pha = self.model_ddp(true_src, downsample_ratio=downsample_ratio)[:2]
    loss = matting_loss(pred_fgr, pred_pha, true_fgr, true_pha)

self.scaler.scale(loss['total']).backward()
self.scaler.step(self.optimizer)
self.scaler.update()
self.optimizer.zero_grad()

But it is fed into the network in the test phase:

src = src.to(device, dtype, non_blocking=True).unsqueeze(0) # [B, T, C, H, W]
fgr, pha, *rec = model(src, *rec, downsample_ratio)

Why does the network use different feedforward schemes in these two stages? Would it be better to take the hidden state as input during training as well?
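For illustration, a hedged sketch of what feeding the hidden state during training might look like (hypothetical toy code, not the repository's trainer; data, loss and downsample_ratio are placeholders), carrying rec across steps and detaching it to truncate backpropagation through time:

import torch

# Hypothetical sketch only: feed the previous hidden state back in during
# training and truncate BPTT by detaching it between steps.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

rec = [None] * 4
for step in range(4):                                    # placeholder loop over clips
    true_src = torch.rand(1, 2, 3, 288, 512)             # [B, T, C, H, W] dummy clip
    true_pha = torch.rand(1, 2, 1, 288, 512)             # dummy alpha ground truth
    pred_fgr, pred_pha, *rec = model(true_src, *rec, downsample_ratio=0.5)
    loss = torch.nn.functional.l1_loss(pred_pha, true_pha)   # stand-in for matting_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    rec = [r.detach() for r in rec]                       # truncate BPTT between clips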

Using OpenCV to load the video?

Hi, and thank you for making this code available! I would like to integrate it into an existing workflow that uses OpenCV to load videos. Would you be able to provide any tips on how to pass cv::Mat frames into this code?

Thank you!
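For what it's worth, a minimal conversion sketch, assuming the PyTorch model expects RGB frames as float tensors in [0, 1] with shape [B, C, H, W]:

import cv2
import torch

def frame_to_tensor(frame_bgr, device="cuda"):
    """Convert an OpenCV BGR frame (H, W, 3, uint8) into a [1, 3, H, W] float tensor in [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    return torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0).to(device)

Frames read with cv2.VideoCapture can then be passed to the model one at a time, carrying the rec states between calls as in the repository's inference example.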

Composite image has a pink edge around the person

Hi, I ran a test on the image below and found a pink edge around the person in both the composite and fgr images. I noticed that the person in the image was standing in front of a blue screen, so could the pink edge be related to that?

this is the command I used to do the test:
python inference.py --variant resnet50 --checkpoint ../Pretrained/rvm_resnet50.pth --device cpu --input-source n:/AIImage/test --downsample-ratio 0.25 --output-composition ./output/comp --output-alpha ./output/alpha --output-foreground ./output/fore --output-type png_sequence

ONNX model inference error

I tried to run inference on an image with the downloaded ONNX model, but the result is wrong: it only shows the edges of objects. Is my input or output handling incorrect, or is there a problem with the model?

def test_default():
    sess = ort.InferenceSession('rvm_mobilenetv3_fp32.onnx')
    # sess = ort.InferenceSession('rvm_mobilenetv3_1920_default.onnx')
    rec = [np.zeros([1, 1, 1, 1], dtype=np.float32) ] * 4  # must use the same dtype as the model
    downsample_ratio = np.array([0.25], dtype=np.float32)  # must be FP32

    src = cv2.imread("1.jpg")
    src = cv2.resize(src, (1920, 1080))
    # the src tensor has shape [B, C, H, W]
    src = np.transpose(src, (2, 0, 1)).astype(np.float32)
    src = np.expand_dims(src, 0)
    print(src.shape)

    fgr, pha, *rec = sess.run([], {
        'src': src, 
        'r1i': rec[0], 
        'r2i': rec[1], 
        'r3i': rec[2], 
        'r4i': rec[3], 
        'downsample_ratio': downsample_ratio
    })

    pha = (pha * 255).astype(np.uint8)
    pha = np.squeeze(pha, 0)
    pha = np.transpose(pha, [1, 2, 0])

    fgr = (fgr * 255).astype(np.uint8)
    print(fgr.shape)
    fgr = np.squeeze(fgr, 0)
    fgr = np.transpose(fgr, [1, 2, 0])
    cv2.imshow("pha", pha)
    cv2.imshow("FGR", fgr)
    cv2.waitKey(0)
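One thing that stands out in the snippet above: the input is fed as raw 0-255 BGR values. Assuming the exported models expect src to be RGB and normalized to [0, 1] (please treat this as an assumption about the expected preprocessing, not a confirmed diagnosis), the preprocessing would look more like this:

    src = cv2.imread("1.jpg")
    src = cv2.resize(src, (1920, 1080))
    src = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)                # BGR -> RGB
    src = src.astype(np.float32) / 255.0                      # normalize to [0, 1]
    src = np.expand_dims(np.transpose(src, (2, 0, 1)), 0)     # [B, C, H, W]

The outputs would then be in [0, 1] as well, so the * 255 scaling before display would still apply.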

Not an issue 👉 A few questions

First of all, thank you for working on this project! It looks much stronger than BMV2!

1. Will it work on Anaconda and Windows 10 just like BMV2 does (not more complicated)?

2. Will it support the same hardware, or does it need a much more powerful CPU / GPU compared to BMV2?

3. Can you please say when you will release it again? I missed it the first time, so I can't test it while it's still offline.
It would be very nice to have it this week, if possible of course.

Thanks in advance for the answers, and please keep up the good work! ❤

Questions about testing on CPU

Hello, can real-time performance be reached when running inference on a CPU?
The resolution requirements are modest, e.g. 1280x720 or 720x480. Could I get it to run in real time on a CPU by tuning it, or by using a CPU-optimized runtime such as OpenVINO?

Python to C++.


    auto device = torch::Device("cuda");
    auto precision = torch::kFloat16;
    auto downsampleRatio = 0.4;
    c10::optional<torch::Tensor> tensorRec0;
    c10::optional<torch::Tensor> tensorRec1;
    c10::optional<torch::Tensor> tensorRec2;
    c10::optional<torch::Tensor> tensorRec3;

    auto model = torch::jit::load("rvm_mobilenetv3_fp16.torchscript");
    //! freeze error.
    //model  = torch::jit::freeze(model );
    model.to(device);

    //! imgSrc: RGB image data, such as QImage.
    auto tensorSrc = torch::from_blob(imgSrc.bits(), { imgSrc.height(),imgSrc.width(),3 }, torch::kByte);
    tensorSrc = tensorSrc.to(device);
    tensorSrc = tensorSrc.permute({ 2,0,1 }).contiguous();
    tensorSrc = tensorSrc.to(precision).div(255);
    tensorSrc.unsqueeze_(0);

    //! Inference
    auto outputs = model.forward({ tensorSrc,tensorRec0,tensorRec1,tensorRec2,tensorRec3,downsampleRatio }).toList();

    const auto &fgr = outputs.get(0).toTensor();
    const auto &pha = outputs.get(1).toTensor();
    tensorRec0 = outputs.get(2).toTensor();
    tensorRec1 = outputs.get(3).toTensor();
    tensorRec2 = outputs.get(4).toTensor();
    tensorRec3 = outputs.get(5).toTensor();

    //! Green target bgr
    auto tensorTargetBgr = torch::tensor({ 120.f / 255, 255.f / 255, 155.f / 255 }).toType(precision).to(device).view({ 1, 3, 1, 1 });
    //! Compound
    auto res_tensor = pha * fgr + (1 - pha) * tensorTargetBgr;

    res_tensor = res_tensor.mul(255).permute({ 0,2,3,1 })[0].to(torch::kU8).contiguous().cpu();

Originally posted by @BrightenWu in #20 (comment)
The code above can handle a single image. With the code below, the first frame is processed correctly, but it crashes on the second frame. Could anyone advise what the problem is?
while (vCap.read(frame))
{
    cv::cvtColor(frame, srcframe, cv::COLOR_BGR2RGB);

    auto src = torch::from_blob(srcframe.data, { srcframe.rows,srcframe.cols,3 }, torch::kByte);
    src = src.to(device);
    src = src.permute({ 2,0,1 }).contiguous();
    src = src.to(precision).div(255);
    src.unsqueeze_(0);


    //auto outputs = model.forward({ src, tRec0,tRec1,tRec2,tRec3,downsampleRatio }).toTuple()->elements();
    auto outputs = model.forward({ src, tRec0,tRec1,tRec2,tRec3,downsampleRatio }).toList();
    
    const auto& fgr = outputs.get(0).toTensor();
    const auto& pha = outputs.get(1).toTensor();
  
    tRec0 = outputs.get(2).toTensor();
    tRec1 = outputs.get(3).toTensor();
    tRec2 = outputs.get(4).toTensor();
    tRec3 = outputs.get(5).toTensor();

     auto com =  pha *fgr +  newbgr*(1 - pha);
   
    cv::Mat resultImg = torchTensortoCVMat(com);

    cv::cvtColor(resultImg, resultImg, COLOR_RGB2BGR);
    
    cv::imshow("demo", resultImg);
    if (waitKey(1) >= 0)
        break;
}

VideoMatte240K-HD

If I'm going to train stage 3 and stage 4, the VideoMatte240K HD data will be used. Is it right to modify the following paths from VideoMatte240K_JPEG_SD to VideoMatte240K_JPEG_HD, as sketched below?

'videomatte': {
    'train': '../matting-data/VideoMatte240K_JPEG_SD/train',
    'valid': '../matting-data/VideoMatte240K_JPEG_SD/valid',
},
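If that is indeed the intended change, the entry would presumably just point at the HD folders instead (a sketch only, assuming the HD set is extracted alongside the SD one):

'videomatte': {
    'train': '../matting-data/VideoMatte240K_JPEG_HD/train',
    'valid': '../matting-data/VideoMatte240K_JPEG_HD/valid',
},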

Data composition and augmentation

Has the "data augmentation and composition" part already been included in the "Training" part? I wonder whether I should do data augmentation independently of your code.

How to run it live?

Hello,
Thank you for the amazing work!
I am just wondering how we can make this run live, just like the online webcam demo.
I would love to test out 4K/HD live inputs on different GPUs.
Thank you
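For reference, a minimal live-loop sketch using OpenCV and the torch.hub model (device, resolution and downsample_ratio below are assumptions; the official webcam demo may be implemented differently):

import cv2
import torch

# Hedged sketch: read webcam frames, matte them, and show a green-screen composite.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval().cuda()
green = torch.tensor([0.0, 1.0, 0.0], device="cuda").view(1, 3, 1, 1)

cap = cv2.VideoCapture(0)
rec = [None] * 4                                     # recurrent states carried across frames
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        src = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255).unsqueeze(0).cuda()
        fgr, pha, *rec = model(src, *rec, downsample_ratio=0.25)
        com = fgr * pha + green * (1 - pha)          # composite over green
        out = (com[0].permute(1, 2, 0) * 255).byte().cpu().numpy()
        cv2.imshow("RVM", cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == 27:              # Esc to quit
            break
cap.release()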

What is the ground truth for the model's 3-channel fgr output during training?

Thanks for sharing this work. A basic question: what is the ground truth for the 3-channel fgr output of the model?

I downloaded the JPEG SD Format (6GB) of the VideoMatte240K dataset and looked at the images in the fgr directory; from a quick sample, most of them have a black background and a small number have a white one.

If my own training dataset contains only the original images and pha (mattes), how do I obtain the fgr ground truth?

Also, the VideoMatte240K dataset only has fgr and pha directories; during training, is the true_src input image composited from fgr/pha and a separate background image? (See the sketch below.)

One more question: the model's *rec outputs (r1, r2, r3, r4) are not supervised during training, right? That is, no ground truth is used for them? Thanks.
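For context on the composition question, here is the standard alpha compositing formula used to build an input image from a foreground, an alpha matte and a background (a general formula, offered as an assumption about how true_src images could be produced, not a statement about this repository's exact pipeline):

import torch

# src = fgr * pha + bgr * (1 - pha): composite the foreground over a background
# using the alpha matte (all tensors in [0, 1], shapes [B, 3, H, W] / [B, 1, H, W]).
true_fgr = torch.rand(1, 3, 288, 512)      # placeholder foreground
true_pha = torch.rand(1, 1, 288, 512)      # placeholder alpha matte
true_bgr = torch.rand(1, 3, 288, 512)      # placeholder background image
true_src = true_fgr * true_pha + true_bgr * (1 - true_pha)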
