
FRTM-VOS

This repository contains an implementation of the video object segmentation method FRTM. A detailed description of the method is found in the CVPR 2020 paper "Learning Fast and Robust Target Models for Video Object Segmentation"

CVF: [paper] [supplement]
arXiv: [paper]

If you find the code useful, please cite using:

@InProceedings{Robinson_2020_CVPR,
    author = {Robinson, Andreas and Lawin, Felix Jaremo and Danelljan, Martin and Khan, Fahad Shahbaz and Felsberg, Michael},
    title = {Learning Fast and Robust Target Models for Video Object Segmentation},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

Installation

Clone the repository: git clone https://github.com/andr345/frtm-vos.git

Create a conda environment and install the following dependencies:

sudo apt install ninja-build  # For Debian/Ubuntu
conda install -y cython pip scipy scikit-image tqdm
conda install -y pytorch torchvision cudatoolkit=10.1 -c pytorch
pip install opencv-python easydict

PyTorch 1.0.1 is slightly faster. If you wish to try this version, replace the conda install pytorch line above with the following:

conda install pytorch==1.0.1 torchvision==0.2.2 -c pytorch
pip install "pillow<7"

Datasets

DAVIS

To test the DAVIS validation split, download and unzip the 2017 480p trainval images and annotations here: https://davischallenge.org/davis2017/code.html.

Or, more precisely, this file.

YouTubeVOS

To test our validation split and the YouTubeVOS challenge 'valid' split, download YouTubeVOS 2018 and place it in this directory structure:

/path/to/ytvos2018
|-- train/
|-- train_all_frames/
|-- valid/
`-- valid_all_frames/

Only 300 sequences of train/ and train_all_frames/ are actually needed; they are listed in lib/ytvos_jjvalid.txt. Thanks to Joakim Johnander for providing this split.
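As a quick check that those sequences are present, a sketch under two assumptions (the split file lists one sequence name per line, and frames live in the standard YouTubeVOS layout train/JPEGImages/<sequence>/):

    from pathlib import Path

    # Report which of the ~300 jjvalid training sequences exist locally.
    root = Path("/path/to/ytvos2018/train/JPEGImages")
    names = Path("lib/ytvos_jjvalid.txt").read_text().split()
    missing = [n for n in names if not (root / n).is_dir()]
    print(f"{len(names) - len(missing)} sequences present, {len(missing)} missing")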

Models

These pretrained models are available for download:

Name             Backbone   Training set        Weights
rn18_ytvos.pth   ResNet18   YouTubeVOS          download
rn18_all.pth     ResNet18   YouTubeVOS + DAVIS  download
rn101_ytvos.pth  ResNet101  YouTubeVOS          download
rn101_all.pth    ResNet101  YouTubeVOS + DAVIS  download
rn101_dv.pth     ResNet101  DAVIS               download

The script weights/download_weights.sh will download all models and put them in the folder weights/.

Running evaluations

Open evaluate.py and adjust the paths dict to your dataset locations and where you want the output. The dictionary is found near line 110, and looks approximately like this:

    paths = dict(
        models=Path(__file__).parent / "weights",  # The .pth files should be here
        davis="/path/to/DAVIS",  # DAVIS dataset root
        yt2018="/path/to/ytvos2018",  # YouTubeVOS 2018 root
        output="/path/to/results",  # Output path
    )

Then try one of the evaluations below. The first run will pause for a few seconds while compiling a PyTorch C++ extension.

Scripts for generating the results in the paper:

python evaluate.py --model rn101_ytvos.pth --dset yt2018val       # Ours YouTubeVOS 2018
python evaluate.py --model rn101_all.pth --dset dv2016val         # Ours DAVIS 2016
python evaluate.py --model rn101_all.pth --dset dv2017val         # Ours DAVIS 2017

python evaluate.py --model rn18_ytvos.pth --fast --dset yt2018val # Ours fast YouTubeVOS 2018
python evaluate.py --model rn18_all.pth --fast --dset dv2016val   # Ours fast DAVIS 2016
python evaluate.py --model rn18_all.pth --fast --dset dv2017val   # Ours fast DAVIS 2017

--model is the name of the checkpoint to use in the weights directory.

--fast reduces the number of optimizer iterations to match "Ours fast" in the paper.

--dset is one of

Name         Description
dv2016val    DAVIS 2016 validation set
dv2017val    DAVIS 2017 validation set
yt2018jjval  Our validation split of YouTubeVOS 2018 "train_all_frames"
yt2018val    YouTubeVOS 2018 official "valid_all_frames" set

Training

Running the trainer

Training is set up similarly to evaluation.

Open train.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.
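By analogy with the paths dict in evaluate.py, it looks roughly like this (a sketch; the exact key names in train.py may differ from those assumed below):

    paths = dict(
        davis="/path/to/DAVIS",              # DAVIS dataset root
        yt2018="/path/to/ytvos2018",         # YouTubeVOS 2018 root
        checkpoints="/path/to/checkpoints",  # Checkpoint output directory (assumed key name)
        tensorboard="/path/to/tensorboard",  # TensorBoard log directory (assumed key name)
        tmcache="/path/to/tmcache",          # Target model weight cache (assumed key name)
    )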

To train a network, run

python train.py <session-name> --ftext resnet101 --dset all --dev cuda:0

--ftext is the name of the feature extractor, either resnet18 or resnet101.

--dset is one of dv2017, ytvos2018 or all ("all" really means "both").

--dev is the name of the device to train on.

Replace "session-name" with whatever you like. Subdirectories with this name will be created under your checkpoint and tensorboard paths.

Target model cache

Training target models from scratch and filling the cache takes approximately 5 days. Once the cache is mostly full, the next training session should take less than 24 hours. The cache requires 17 GB of disk space when training with ResNet-101 features and 32 intermediate channels (as in the paper), and 5 GB with ResNet-18 and the same number of channels.

Our own cache (20 GB) is available here. The link is not permanent and will change eventually, so make sure to check this readme in the GitHub repository if you find that it has expired.

Contact

Andreas Robinson

email: [email protected]

Felix Järemo Lawin

email: [email protected]


frtm-vos's Issues

resnet18 training

Hi,

I was trying to train your model with ResNet18 as the backbone; nonetheless I got the following error:

Traceback (most recent call last):
  File "train.py", line 133, in <module>
    trainer.train()
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/lib/training.py", line 130, in train
    stats = self.model(*batch)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/training_model.py", line 95, in forward
    cache_hits = self._initialize(images[0], labels[0], specs)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/training_model.py", line 137, in _initialize
    self.tmodels[i].initialize(ft, lb)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/training_model.py", line 20, in initialize
    self.discriminator.init(ft[self.discriminator.layer], mask)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/discriminator.py", line 175, in init
    optimizer.run(self.init_iters)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/optimizer.py", line 70, in run
    self.run_GN_iter(cg_iter)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/optimizer.py", line 81, in run_GN_iter
    self.f0 = self.problem(self.x)
  File "/media/user_home1/gjeanneret/SOFTWARE/frtm-vos/model/discriminator.py", line 47, in __call__
    s = self.net(self.x)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 353, in forward
    return self._conv_forward(input, self.weight)
  File "/home/gjeanneret/anaconda3/envs/frtm/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 349, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 1024, 1, 1], expected input[15, 256, 30, 54] to have 1024 channels, but got 256 channels instead

I fixed it easily by changing the line

    layer="layer4", in_channels=1024, c_channels=32, out_channels=1,

to:

    layer="layer4", in_channels=256 if '18' in feature_extractor else 1024, c_channels=32, out_channels=1,

Multi-GPU problem

Hello!
Thanks for your wonderful code.
I just learned about this field, and I want to run this code on multiple GPUs. I modified the --dev parameter to cuda:0,1, but it has no effect. Should I use nn.DataParallel() somewhere in the code, or take some other approach?
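For reference, a minimal sketch of how nn.DataParallel is typically applied (an assumption for illustration; the FRTM trainer does not wire this up, and `net` below is a placeholder for the model built in train.py):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())  # placeholder for the FRTM model
    if torch.cuda.device_count() > 1:
        # Replicates the module on GPUs 0 and 1 and splits each batch between them.
        net = nn.DataParallel(net, device_ids=[0, 1])
    net = net.to("cuda:0")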

How to add conv3_x features in the target model (discriminator)?

Sorry to disturb you again. I observed the output of your code and found that the segmentation of fine and small objects is relatively poor. I wonder if we can add conv3_x or conv2_x features to the discriminator, output scores for them as well, and splice those ResNet layer features into the TSE to improve the segmentation of small targets.
I tried it, but I really don't know how to modify it. If you can provide some ideas about which parts need to be modified, and some important details, I would be very grateful!

No module named 'nppig_cpp'

Hi, when I run the code, I get ImportError: No module named 'nppig_cpp'.

Compiling npp extension
Traceback (most recent call last):
  File "evaluate.py", line 18, in <module>
    from lib.datasets import DAVISDataset, YouTubeVOSDataset
  File "/home/rtm/zc/frtm-vos/lib/datasets.py", line 6, in <module>
    from lib.image import imread
  File "/home/rtm/zc/frtm-vos/lib/image.py", line 6, in <module>
    from ._npp import nppig_cpp
  File "/home/rtm/zc/frtm-vos/lib/_npp/__init__.py", line 16, in <module>
    with_cuda=True, build_directory=_build_dir)
  File "/home/rtm/Envs/py36env/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/home/rtm/Envs/py36env/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 824, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/rtm/Envs/py36env/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 967, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/rtm/Envs/py36env/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'nppig_cpp'

My environment:
torch 1.1.0
Python 3.6
Do you have any suggestions?

speed of model

Hello, the model runs very slowly when I test on an Nvidia RTX 3090, but is still relatively fast on a 1080 Ti. Do you know why?
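One plausible cause (an assumption, not confirmed by the authors): the cudatoolkit 10.x builds suggested in the installation section ship no native sm_86 kernels for the RTX 3090, so kernels are JIT-compiled from PTX or fall back to slower paths. A quick way to inspect what your build supports:

    import torch

    # The RTX 3090 reports compute capability (8, 6); if 'sm_86' (or 'compute_86')
    # is absent from the arch list, this torch build has no native Ampere kernels.
    print(torch.cuda.get_device_capability(0))
    print(torch.cuda.get_arch_list())  # available in recent PyTorch versions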

code

Excuse me, I want to know when you will release your code. Sorry to disturb you.

A problem about discriminator.py and seg_network.py

In discriminator.py:

    class Discriminator(nn.Module):
        ...
        def apply(self, ft):
            self.frame_num += 1
            cft = self.project(ft)
            self.current_sample = cft
            scores = self.filter(cft)
            return scores

and in seg_network.py:

    class SegNetwork(nn.Module):
        ...
        def forward(self, scores, features, image_size):
            num_targets = scores.shape[0]
            num_fmaps = features[next(iter(self.ft_channels))].shape[0]
            if num_targets > num_fmaps:
                multi_targets = True
            else:
                multi_targets = False

            x = None
            for i, L in enumerate(self.ft_channels):
                ft = features[L]
                s = interpolate(scores, ft.shape[-2:])  # Resample scores to match feature size

                if multi_targets:
                    h, hpool = self.TSE[L](ft.repeat(num_targets, 1, 1, 1), s, x)
                else:
                    h, hpool = self.TSE[L](ft, s, x)

                h = self.RRB1[L](h)
                h = self.CAB[L](hpool, h)
                x = self.RRB2[L](h)

            x = self.project(x, image_size)
            return x

Hello, I printed cft and scores and found that, during evaluation, each frame of a sequence with several targets prints cft and scores once per target. The size of scores is [1,1,m,n], and the parameters differ between targets. I want to know why this is and where it is set up.
Another question: in seg_network.py, I found that num_targets and num_fmaps are always 1 (printed once per target), so multi_targets is always False, yet the segmentation result is multi-target. Why? I am really confused and need your answer.
Looking forward to your reply!

Questions about the Target Model

Although this question may not be suitable for mentioning here, I have been confused for a long time and hope to get a little hint.
The first question is: why does the 'target model' use an L2 loss instead of other losses common in segmentation (Dice, cross-entropy, etc.)? The second question is: why can only two convolutional layers, without non-linear activation layers, achieve such a brilliant effect? (I noticed that one layer is named 'project'; the reasoning behind this 'projection layer' also puzzles me.)
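For context, the target model in question amounts to two linear convolution layers. A minimal sketch with the channel sizes quoted in the resnet18 issue above (illustrative only; the kernel sizes and lack of bias are assumptions, not the repository's exact code):

    import torch.nn as nn

    # "project": 1x1 conv reducing backbone features (e.g. 1024 channels from
    # ResNet101 layer4) to a small c_channels=32 embedding; "filter": conv
    # producing a single-channel target score map. Both layers are linear
    # (no activations in between).
    target_model = nn.Sequential(
        nn.Conv2d(1024, 32, kernel_size=1, bias=False),
        nn.Conv2d(32, 1, kernel_size=3, padding=1, bias=False),
    )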

evaluate problem on DAVIS2017

Hello, when I run "python evaluate.py --model rn18_all.pth --fast --dset dv2017val", it always prints:

    Computing J-scores
    1/30: bike-packing: 2 objects
    Traceback (most recent call last):
      File "evaluate.py", line 164, in <module>
        evaluate_dataset(dset, out_path, measure='J')
      File "/new/personal/limoran/frtm-vos/lib/evaluation.py", line 66, in evaluate_dataset
        _print("joint {obj}: acc {score:.3f} \u250a{apf}\u250a".format(obj=obj_id, score=s, apf=text_bargraph(score)))
      File "/new/personal/limoran/frtm-vos/lib/evaluation.py", line 20, in _print
        print(msg)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-89: ordinal not in range(128)

I don't know why, or how to modify the code. Could you help me?
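A possible workaround (an assumption, not an official fix): the bar graph uses Unicode characters, so forcing UTF-8 output before evaluation prints anything may help on Python 3.7+:

    import sys

    # Reconfigure stdout to UTF-8 so the Unicode bar-graph characters can be printed
    # even when the locale defaults to ASCII.
    sys.stdout.reconfigure(encoding="utf-8")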

RuntimeError: Error building extension 'nppig_cpp'

RuntimeError: Error building extension 'nppig_cpp':

    [1/2] c++ -MMD -MF nppig.o.d -DTORCH_EXTENSION_NAME=nppig_cpp -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/hp/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/include -isystem /home/hp/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/hp/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/include/TH -isystem /home/hp/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/hp/anaconda3/envs/open-mmlab/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /media/hp/01a64147-0526-48e6-803a-383ca12a7cad/WH/wh2020/frtm-vos-master/lib/_npp/nppig.cpp -o nppig.o
    FAILED: nppig.o
    (same c++ command as above)
    /media/hp/01a64147-0526-48e6-803a-383ca12a7cad/WH/wh2020/frtm-vos-master/lib/_npp/nppig.cpp:9:37: fatal error: ATen/cuda/CUDAGuard.h: No such file or directory
    compilation terminated.
    ninja: build stopped: subcommand failed.

How to understand the discriminative model in the paper

Hi, thanks for sharing!
The discriminative method used in the paper is called the target model, and it corresponds to the "Discriminator" class in the code. A "discriminator" is commonly found in generative adversarial networks, so how should I understand it here, and is it related to GANs? From the perspective of the loss function and the model itself, I did not find any relation to GANs, so why is it called "Discriminator"? Looking forward to your response, thank you!

How to use 'Target model cache'

Thank you for your wonderful code. I am a beginner, and I want to know:

  1. I can't find the place where the 'tmcache' path is used in training.py.

  2. How can we use this cache (20 GB) in training?

Best wishes! ^_^

When running train.py: 'TargetObject' object has no attribute 'get_state_dict'

Hi, I can run evaluate.py, but running train.py fails.

(p36) rtm@rtm:~/zc/f/frtm-vos$ python train.py dv17_res101 --ftext resnet101 --dset dv2017 --dev cuda:0
Compiling npp extension
done
Traceback (most recent call last):
  File "train.py", line 133, in <module>
    trainer.train()
  File "/home/rtm/zc/f/frtm-vos/lib/training.py", line 131, in train
    stats = self.model(*batch)
  File "/home/rtm/Envs/p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rtm/zc/f/frtm-vos/model/training_model.py", line 92, in forward
    cache_hits = self._initialize(images[0], labels[0], specs)
  File "/home/rtm/zc/f/frtm-vos/model/training_model.py", line 137, in _initialize
    self.save_target_model(specs[i], L, self.tmodels[i].get_state_dict())
AttributeError: 'TargetObject' object has no attribute 'get_state_dict'

My env:
Python 3.6
torch 1.0.1
Do you have any suggestions?
Thanks!

Can't reach correct performance on YouTubeVOS 2018

Hi, thanks for sharing.
When I use the pretrained rn101-ytvos model to evaluate on the YouTubeVOS 2018 dataset with nothing else changed, I only get 0.695, which is far from 0.721. What should I do to get the correct value?

Looking forward to your response, thank you!

How to solve the problem of 'Permission denied: '/outputs'' in evaluate.py?

Code:

    paths = dict(
        models=Path(__file__).parent / "checkpoints/try",  # The .pth files should be here
        davis="~/disk/MATNet/data/DAVIS2017",  # DAVIS dataset root
        yt2018="/path/to/ytvos2018",  # YouTubeVOS 2018 root
        output="/outputs",  # Output path
    )

Traceback:

    Traceback (most recent call last):
      File "/home/fg/disk/frtm-vos/evaluate.py", line 152, in <module>
        os.makedirs(out_path)
      File "/home/fg/anaconda3/lib/python3.8/os.py", line 213, in makedirs
        makedirs(head, exist_ok=exist_ok)
      File "/home/fg/anaconda3/lib/python3.8/os.py", line 223, in makedirs
        mkdir(name, mode)
    PermissionError: [Errno 13] Permission denied: '/outputs'
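A likely cause (an assumption based on the traceback): "/outputs" is a directory at the filesystem root, which ordinarily requires root permissions to create. Pointing output at a user-writable location avoids the error, e.g.:

    from pathlib import Path

    # Illustrative writable output location; any directory you own works.
    output = str(Path.home() / "frtm-results")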

Got stuck at the beginning

Thanks for your nice work. I got stuck here; it takes a long time with no response. (screenshot omitted)
Here are my envs:
pytorch=1.0.1
cudatoolkit=10.0.130
torchvision=0.2.2
I would appreciate it if you could give me some advice.

About the learning objective for target model

Hi, thanks for your excellent work.
While reading your paper and code, I have a question:
an L2 loss is adopted in Equation (2) of your paper,
but in discriminator.py, the residual is computed as residuals = self.w * (s - self.y), which doesn't seem to be an L2 loss.
Could you explain why the residual is not computed as in your paper? Or are they equivalent to each other?
Thanks a lot.
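For what it's worth, the two forms are consistent under a standard Gauss-Newton least-squares formulation (a sketch, not the authors' confirmed answer): the optimizer works on residuals whose squared norm is the loss,

    L(x) = \lVert w\,(s(x) - y) \rVert^2 = \sum_i r_i(x)^2, \qquad r(x) = w\,(s(x) - y)

so residuals = self.w * (s - self.y) is exactly the residual of the weighted L2 objective; minimizing the sum of squared residuals and minimizing the L2 loss of Eq. (2) are the same problem.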

ImportError: libnppc.so.9.2: cannot open shared object file: No such file or directory

When I run evaluate.py, this error appears; can you provide some suggestions?

Compiling npp extension
Traceback (most recent call last):
  File "/data2/jaffeProj/frtm/evaluate.py", line 18, in <module>
    from lib.datasets import DAVISDataset, YouTubeVOSDataset
  File "/data2/jaffeProj/frtm/lib/datasets.py", line 6, in <module>
    from lib.image import imread
  File "/data2/jaffeProj/frtm/lib/image.py", line 6, in <module>
    from ._npp import nppig_cpp
  File "/data2/jaffeProj/frtm/lib/_npp/__init__.py", line 17, in <module>
    with_cuda=True, build_directory='/home/jaffe/tmp')  # _build_dir
  File "/home/jaffe/miniconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 680, in load
    is_python_module)
  File "/home/jaffe/miniconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 877, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/jaffe/miniconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1088, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/home/jaffe/miniconda3/envs/pytracking/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/jaffe/miniconda3/envs/pytracking/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libnppc.so.9.2: cannot open shared object file: No such file or directory
