suhwan-cho / tmo
[WACV 2023] Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation
License: MIT License
.
Hi,
Your work is really great, but how can I train on my own dataset? My dataset is similar to DUTS, with binary masks. I noticed that DAVIS is only used as the validation set. Could you tell me where I can modify the color palette used for the segmentation masks?
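For what it's worth, here is a minimal sketch of writing a binary mask as a palettized PNG, assuming DAVIS-style 'P'-mode mask files; the function name and palette choice are my own, not the repo's code:

```python
import io
from PIL import Image

# Hypothetical sketch: a two-color palette with index 0 -> black (background)
# and index 1 -> white (object). Adapt the palette wherever the repo writes
# its prediction PNGs.
def save_binary_mask(mask_rows, fp):
    h, w = len(mask_rows), len(mask_rows[0])
    img = Image.new('P', (w, h))
    img.putdata([v for row in mask_rows for v in row])
    palette = [0, 0, 0, 255, 255, 255] + [0] * (254 * 3)  # 256 RGB entries
    img.putpalette(palette)
    img.save(fp, format='PNG')

buf = io.BytesIO()
save_binary_mask([[0, 1], [1, 0]], buf)
buf.seek(0)
reloaded = Image.open(buf)
print(reloaded.mode, list(reloaded.getdata()))
```

Since the masks are binary, only the first two palette entries matter; multi-object datasets would need more colors.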
.
Hello, I'm sorry to bother you again.
In this project, max_epoch is set to 4000. Although that setting looks large, training converges quickly. In my past experience, though, the number of epochs is usually set between 100 and 300; I have trained other models for 200 epochs, and they train more slowly than yours while reaching similar accuracy.
So what does an "epoch" mean here? Is it actually an iteration, or some other unit? I am quite confused. Could you give me some help?
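To illustrate the question, here is a toy sketch (my own assumption, not the repo's code) of how an "epoch" can mean a fixed number of randomly sampled batches rather than a full pass over the dataset, in which case 4000 "epochs" is far cheaper than 4000 full passes:

```python
import random

# Toy sketch (an assumption, not the repo's code): each "epoch" draws a
# fixed number of random samples instead of iterating the whole dataset.
dataset_size = 10000        # e.g. number of saliency training images
samples_per_epoch = 8       # assumed: one small sampled batch per "epoch"
max_epoch = 4000

random.seed(0)
total_samples = 0
for _ in range(max_epoch):
    batch = random.sample(range(dataset_size), samples_per_epoch)
    total_samples += len(batch)

full_passes = total_samples / dataset_size
print(total_samples, full_passes)  # 32000 samples, i.e. only 3.2 full passes
```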
.
Hello,
I'm coming back to this. I have two questions about output selection:
I trained and tested the new TMO without output selection, and the metrics come out the same as the previous version of TMO, but the visualized binary maps show significant edge jaggedness. I'm curious why the visualizations differ even though I'm computing the same metrics; edge jaggedness should result in a lower mIoU. The binary map predicted by the new TMO is below. (Is it because of the parameter B? Is the output a soft score?)
On my own dataset, performance drops significantly when output selection is used. Looking at the code, the purpose of output selection seems to be to compare the proportion of well-defined (non-blurred) pixels in the saliency maps. Isn't this somewhat unsuitable as a confidence measure? For example, it does not take structural similarity into account. Also, the parameter B here is meant to be binary, so should output selection be applied when computing the final score?
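To make the concern concrete, here is a toy sharpness-based confidence score of the kind described above (pixels near 0 or 1 count as "defined", pixels near 0.5 as ambiguous); this is an illustration of the idea, not the repo's exact formula:

```python
# Toy illustration (not the repo's formula): confidence as the average
# distance of soft-saliency values from the ambiguous midpoint 0.5,
# rescaled to [0, 1]. Note it ignores structure entirely.
def confidence(soft_map):
    flat = [p for row in soft_map for p in row]
    return sum(abs(p - 0.5) * 2 for p in flat) / len(flat)

sharp = [[0.95, 0.05], [0.90, 0.10]]   # well-defined prediction
blurry = [[0.60, 0.40], [0.55, 0.45]]  # ambiguous prediction
print(confidence(sharp), confidence(blurry))
```

A map that is confidently wrong would still score high here, which may relate to the drop seen on domain-shifted data.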
Looking forward to your reply~
Hi, is it possible to generate a single-channel output with TMO? Since TMO currently uses standard cross-entropy loss, it produces a 2-channel output. Can we just change the last layer of TMO to produce a single-channel output? If so, do we also need to change the IoU code after the loss, as well as the code for the J and F values inside mode='val'?
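As a rough sketch of what such a change involves (my own assumption, not the repo's code): a 1-channel head would pair with BCE-with-logits instead of cross-entropy, and the IoU would threshold a sigmoid instead of taking an argmax over two channels:

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the repo's code): single-channel head with BCE,
# plus the matching IoU on thresholded sigmoid outputs.
torch.manual_seed(0)
logits = torch.randn(2, 1, 8, 8)                    # 1-channel head output
target = torch.randint(0, 2, (2, 1, 8, 8)).float()  # binary ground truth

loss = nn.BCEWithLogitsLoss()(logits, target)       # replaces CrossEntropyLoss

pred = (torch.sigmoid(logits) > 0.5).float()        # threshold instead of argmax
inter = (pred * target).sum()
union = ((pred + target) > 0).float().sum()
iou = (inter / union.clamp(min=1)).item()
print(loss.item(), iou)
```

Any downstream code that assumes a 2-channel softmax (IoU, J&F evaluation in the validation path) would need the same sigmoid/threshold treatment.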
.
Hi, sorry, I got a bit confused while using your updated repo. Previously there was only code for TMO, with no option for choosing the encoder or output selection, so downloading the code simply meant getting TMO's code.
Now that there are many options, I have a few questions:
TMO++ with mitb1 encoder:
# set device
torch.cuda.set_device(0)
# define model
ver = 'mitb1'
aos = True
model = TMO(ver, aos).eval()
# training stage
if options.train:
    model = torch.nn.DataParallel(model)
    train_duts_davis(model, ver)
TMO++ with ResNet-101 encoder:
# set device
torch.cuda.set_device(0)
# define model
ver = 'rn101'
aos = True
model = TMO(ver, aos).eval()
# training stage
if options.train:
    model = torch.nn.DataParallel(model)
    train_duts_davis(model, ver)
TMO with mitb1 encoder:
# set device
torch.cuda.set_device(1)
# define model
ver = 'mitb1'
aos = False
model = TMO(ver, aos).eval()
# training stage
if options.train:
    # model = torch.nn.DataParallel(model)
    train_duts_davis(model, ver)
TMO with ResNet-101 encoder:
# set device
torch.cuda.set_device(1)
# define model
ver = 'rn101'
aos = False
model = TMO(ver, aos).eval()
# training stage
if options.train:
    # model = torch.nn.DataParallel(model)
    train_duts_davis(model, ver)
Hi,
my dataset consists of binary segmentation masks, and I modified util.py accordingly. Training runs and the loss keeps decreasing, but the output IoU does not look normal. I have two datasets: on one, the IoU stabilizes at 1 within fewer than 20 rounds; on the other, the IoU stays at 0. Could the model initialization be causing any problems?
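One common cause worth ruling out (a guess, not a diagnosis of this repo): an IoU pinned at exactly 0 or 1 often means the loaded ground-truth masks are not in {0, 1}, e.g. they were stored as {0, 255} and never rescaled, so the comparison degenerates. A minimal normalizer sketch:

```python
# Hypothetical sanity check (an assumption, not the repo's code): force
# loaded mask values into {0, 1} before computing loss/IoU, so masks
# stored as {0, 255} do not break binary comparisons.
def normalize_mask(mask_rows):
    return [[1 if v > 0 else 0 for v in row] for row in mask_rows]

raw = [[0, 255], [255, 0]]
print(normalize_mask(raw))
```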
.
Hello, I would like to ask: do you have a specific script for generating the optical flow for each dataset?
Hi, sorry for disturbing you so much; it's because I really like your approach. However, I have a few questions:
Is there any difference between TMO and TMO++ in the training stage? As far as I can tell from the papers and the code, there is no difference between them during training: both randomly use RGB images and optical flow as input to the motion encoder.
The major difference I see is the output-selection algorithm, which does not affect the training process. Am I right?
I trained TMO (with the rn101 encoder) and TMO++ (with the rn101 encoder) on ultrasound data, using ultrasound images in place of DUTS and ultrasound videos in place of DAVIS 2016. However, TMO performs better than TMO++ on the same data: for example, TMO reaches 66.4 in terms of the mean of J and F, while TMO++ reaches 62.3. The gap is large and does not seem reasonable; it would only make sense if TMO and TMO++ differed in the training stage. Could you please help me understand this?
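The "motion as option" training trick mentioned above can be sketched as follows (a toy illustration of the idea, not the repo's implementation):

```python
import random

# Toy sketch of "treating motion as option" at training time: the motion
# stream randomly receives either optical flow or the RGB frame, so the
# model cannot become fully flow-dependent. Under this scheme the training
# recipe is shared, and output selection only changes inference.
def motion_input(rgb, flow, p_flow=0.5):
    return flow if random.random() < p_flow else rgb

random.seed(0)
picks = [motion_input('rgb', 'flow') for _ in range(1000)]
print(picks.count('flow'))  # roughly half of the draws use flow
```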
.
Thanks for your interesting work. I want to ask about the inference time. I ran your code on my 2080 Ti with the same environment, but the inference speed reported by your print code is only about 20 FPS, much lower than the 43.2 FPS claimed in your paper. Could you provide more details about inference, or some insight into this difference?
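When comparing FPS numbers like this, the usual suspects are missing warm-up iterations, missing `torch.cuda.synchronize()`, and data loading being counted inside the timed region. A generic timing sketch (the model here is a stand-in, not TMO):

```python
import time
import torch

# Generic timing sketch (an assumption about methodology, not the repo's
# benchmark): warm up first, synchronize around the timed region, and time
# only the forward passes.
def measure_fps(model, inp, n_warmup=3, n_iters=10):
    with torch.no_grad():
        for _ in range(n_warmup):        # warm-up: exclude CUDA init, autotuning
            model(inp)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            model(inp)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return n_iters / (time.time() - start)

model = torch.nn.Conv2d(3, 3, 3, padding=1).eval()  # stand-in for the real model
fps = measure_fps(model, torch.randn(1, 3, 64, 64))
print(fps > 0)
```

GPU generation alone can also explain a large part of the gap if the paper's number was measured on faster hardware.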