Comments (11)
@kaylode It's in the appendix, Section F.3.2.
I agree with everything else. Thank you for explaining.
from stcn.
At inference time, we process each object independently as a binary mask and merge the results at the end. The one exception is that the sum of all "other" object masks is also fed in as a separate input channel to suppress responses in regions belonging to other objects. This implementation detail follows STM, and I think it generally helps.
At training time, we pick at most two random objects per video snippet, which lets that extra channel be learned. While this could technically be extended to any number of objects at training time, unfortunately the current code is written in a way that makes this extension non-trivial (and I take the blame for that).
STM uses three objects at training time, while we use two. This is mainly due to compute and memory constraints. I do think the benefit of going from one to two objects in training is much larger than that of going from two to three, i.e., there are diminishing returns.
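To make the merging step above concrete, here is a minimal sketch (not the repository's actual code) of STM-style soft aggregation: each object's binary probability map is converted to logit space, a background channel is derived from the product of the per-object complements, and a softmax renormalizes everything into one multi-object distribution. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def soft_aggregate(probs: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Merge per-object binary probabilities of shape (K, H, W) into a
    single (K+1, H, W) distribution, with channel 0 as background.

    Sketch of the soft-aggregation idea used by STM-style methods:
    - background = product of per-object complements (1 - p_i)
    - convert all channels to logits (inverse sigmoid)
    - softmax over objects plus background
    """
    probs = np.clip(probs, eps, 1 - eps)               # avoid log(0)
    bg = np.prod(1.0 - probs, axis=0, keepdims=True)   # (1, H, W)
    stacked = np.concatenate([bg, probs], axis=0)      # (K+1, H, W)
    logits = np.log(stacked / (1.0 - stacked))         # inverse sigmoid
    logits -= logits.max(axis=0, keepdims=True)        # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)
```

With this layout, the "other objects" channel mentioned above would be, for object `i`, the sum of the merged probabilities of every object except `i`.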
Thank you very much, I understand much more clearly now.
Hello @hkchengrex, first of all, thanks for your dedicated projects; they really inspire me a lot. This is not a bug report, just a small question.
I have been running and modifying most of your code base to adapt it to my problem, and it works acceptably. However, one part of the code is still unclear to me.
As I understand it (from the code and the paper), STCN trains as a binary segmentation task, but I also notice that when training on VOS, the code trains with a second additional class as well (chosen randomly from the list of classes, if I understand correctly). May I ask what the purpose of this is? I cannot find where it is mentioned in the paper. Is it possible to use more classes? I have tried training on masks with two objects on a different dataset, and the result seems far better than when training with a single mask for each object.
Thanks in advance.
> I have tried training on masks with 2 objects on a different dataset, and it seems that the result is far better than when training with a single mask for each object.

@kaylode Hi, I'm particularly interested in your question. What's the difference between training on masks with two objects and training with a single mask for each object?
@hkchengrex Hi, what does going from one to two objects in training mean? Do you mean that, given enough memory, training STCN with three targets would perform better?
Looking forward to your reply!
- It means training with two objects instead of one. We currently train with two.
- Not sure about that. The gain, if any, would not be huge, I'd guess.
@hkchengrex So you mean it's better to use two targets when training?
@longmalongma Well, to my understanding, the authors first pretrain the model on static images, where each mask contains a single label. After that, they continue training on their synthesized multi-class BL dataset and the VOS datasets (stages 1 and 2 in the source code), and at this point they randomly sample only TWO of the classes that appear in the masks for training. As I recall, this wasn't mentioned in the paper, though.
What I did was try splitting my multi-class masks into binary masks (a single object against background) and letting the model learn from those. But the result was worse than sampling TWO objects.
In my opinion, using two classes at a time helps enhance the model's ability to distinguish between different objects; it is like introducing an extra constraint for the model to capture.
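The sampling strategy described above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual data-loading code: given an integer label mask, pick at most two object ids at random and turn each into a binary target channel.

```python
import numpy as np

def sample_two_objects(mask: np.ndarray, rng: np.random.Generator):
    """Given an integer label mask (H, W) with 0 as background, pick at
    most two object ids at random and return a (2, H, W) stack of
    binary targets plus the chosen ids. Illustrative sketch of the
    'sample two classes per snippet' behaviour; names and layout are
    assumptions, not the actual STCN code.
    """
    ids = np.unique(mask)
    ids = ids[ids != 0]                                  # drop background
    picked = rng.choice(ids, size=min(2, len(ids)), replace=False)
    targets = np.zeros((2,) + mask.shape, dtype=np.float32)
    for ch, obj_id in enumerate(picked):
        targets[ch] = (mask == obj_id)                   # binary mask per object
    return targets, picked
```

If the mask contains only one object, the second channel simply stays empty, which matches the "at most two" behaviour described in the thread.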
@kaylode Ok, thank you very much!