chinayi / asformer
Official repo for BMVC2021 paper ASFormer: Transformer for action segmentation
License: MIT License
Hi, thank you for your code
Line 370 in 3940443
Hi,
Thanks for sharing the code. I noticed that the bg_class in the evaluation code is not set properly. The default name of the background class is background, which is correct for GTEA, but it needs to be changed to SIL for Breakfast, and to action_start and action_end for 50Salads. It seems these were not changed for the results in the paper.
With the correct class names and the released models, I obtained lower results:
|  | F1@10 | F1@25 | F1@50 |
| --- | --- | --- | --- |
| Breakfast | 70.9 | 67.5 | 56.7 |
| 50salads | 83.7 | 81.8 | 73.7 |
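To illustrate the point about background classes, here is a minimal, hypothetical sketch (the helper name and dictionary are mine, not the repo's actual evaluation code) of how per-dataset background labels could be excluded from a frame-wise metric:

```python
# Hypothetical mapping, based on the class names discussed above.
BG_CLASSES = {
    "gtea": ["background"],
    "breakfast": ["SIL"],
    "50salads": ["action_start", "action_end"],
}

def frame_accuracy(pred, gt, dataset):
    """Frame-wise accuracy that ignores frames whose ground truth is background."""
    bg = set(BG_CLASSES[dataset])
    kept = [(p, g) for p, g in zip(pred, gt) if g not in bg]
    if not kept:
        return 0.0
    return sum(p == g for p, g in kept) / len(kept)
```

If the evaluation script keeps the default `background` name on Breakfast or 50Salads, no frame is ever excluded, which would explain the inflated numbers.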
Hello,
I am adapting your code for my own dataset, which usually trains relatively fast when using only ASRF, but with your transformer model it takes approximately 10x longer. Do you observe similar behaviour with the 50Salads/Breakfast/GTEA datasets?
Thank you :)
Hello author, thank you for this series of contributions. Could you please share the code used for Table 9 of your paper, where you compute the params, FLOPs, and GPU memory? I am not sure how to obtain the FLOPs. Thank you!
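For the parameter count at least, a hedged sketch (my own helper, not the authors' Table 9 code; FLOPs typically require a separate profiler pass, e.g. with a library such as thop or fvcore):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Counts trainable parameters only; this does NOT give FLOPs,
    # which depend on the input length and need a profiling tool.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

For example, `count_params(nn.Linear(10, 5))` returns 55 (10*5 weights + 5 biases).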
I tried to run the pretrained models but i keep getting the following error:
(myenv) E:\ASN\ASFormer>python main.py --action=predict --dataset=50salads --split=1
Model Size: 1134476
Traceback (most recent call last):
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 189, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'ld_tenso'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 2297, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 1093, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 1035, in frombuf
chksum = nti(buf[148:156])
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 191, in nti
raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\talks\miniconda3\envs\myenv\lib\site-packages\torch\serialization.py", line 556, in _load
return legacy_load(f)
File "C:\Users\talks\miniconda3\envs\myenv\lib\site-packages\torch\serialization.py", line 467, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 1589, in open
return func(name, filemode, fileobj, **kwargs)
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 1619, in taropen
return cls(name, mode, fileobj, **kwargs)
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 1482, in __init__
self.firstmember = self.next()
File "C:\Users\talks\miniconda3\envs\myenv\lib\tarfile.py", line 2309, in next
raise ReadError(str(e))
tarfile.ReadError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 97, in <module>
trainer.predict(model_dir, results_dir, features_path, batch_gen_tst, num_epochs, actions_dict, sample_rate)
File "E:\ASN\ASFormer\model.py", line 399, in predict
self.model.load_state_dict(torch.load(model_dir + "/epoch-" + str(epoch) + ".model"))
File "C:\Users\talks\miniconda3\envs\myenv\lib\site-packages\torch\serialization.py", line 387, in load
return _load(f, map_location, pickle_module, pickle_load_args)
File "C:\Users\talks\miniconda3\envs\myenv\lib\site-packages\torch\serialization.py", line 560, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: ./models/50salads/split_1/epoch-120.model is a zip archive (did you mean to use torch.jit.load()?)
I tried changing torch.load to torch.jit.load, but then I get another error saying that my PyTorch version is too old to run this. I am using Python 3.6.10, PyTorch 1.1.0, and torchvision 0.3.0, and for now I am just trying to run on CPU, not GPU. I kindly need your assistance on this matter. Thank you.
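For anyone hitting the same error: the traceback suggests the checkpoint was saved with the zipfile-based format introduced in PyTorch 1.6, which PyTorch 1.1.0 cannot read. A possible workaround (a sketch, assuming you can temporarily use an environment with PyTorch >= 1.6) is to re-save the state dict in the legacy format and then load the converted file from the old environment:

```python
import torch

def convert_to_legacy(src_path: str, dst_path: str) -> None:
    # Run this once with PyTorch >= 1.6; the resulting file should be
    # readable by older PyTorch versions such as 1.1.0.
    state = torch.load(src_path, map_location="cpu")
    torch.save(state, dst_path, _use_new_zipfile_serialization=False)
```

For example: `convert_to_legacy("./models/50salads/split_1/epoch-120.model", "./models/50salads/split_1/epoch-120_legacy.model")`, then point the predict script at the converted file.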
Hi, thanks for your work.
I was able to train and test the model and achieve performance similar to the paper when I use both the encoder and decoder. However, when I don't use the decoder, the results are much worse than those in Table 5 (first row).
I was wondering if I need to change any settings to get the same performance (especially for Acc)?
I notice that without the decoder, the accuracy drops below 80.
Hello,
Thank you for your amazing work !
I was wondering if there is any particular reason for imposing a batch size of 1 in model.py:
Line 138 in 89e72d8
In my testing, ASFormer learns fine with bigger batch sizes.
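If it helps others trying larger batches, here is a hypothetical collate function (my own sketch, not the repo's code) showing how batches larger than 1 can be formed by padding variable-length feature sequences to a common length and carrying a mask so padded frames are ignored:

```python
import torch

def collate(features_list, pad_value=0.0):
    # features_list: list of (C, T_i) tensors with varying lengths T_i.
    C = features_list[0].shape[0]
    T = max(f.shape[1] for f in features_list)
    batch = torch.full((len(features_list), C, T), pad_value)
    mask = torch.zeros(len(features_list), 1, T)
    for i, f in enumerate(features_list):
        batch[i, :, : f.shape[1]] = f   # copy real frames
        mask[i, :, : f.shape[1]] = 1.0  # 1 = valid frame, 0 = padding
    return batch, mask
```

The loss and any attention normalization would then need to be multiplied by (or restricted to) this mask, so padded frames do not contribute to gradients.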
Hello, thank you for sharing your amazing work!
I have a question when analysing the results.
For images generated like the one below:
What does each row mean? And what do the stages 0, 1, 2, 3 mean?
Also, since the method uses each frame's action label for evaluation, you might compare the model with action recognition models too. Is there any specific reason for not comparing against action recognition results?
Thank you in advance!
Hi, can you provide more information about the feature extraction? I would like to use this fantastic model on my own dataset, but I don't know how to extract the features to feed to the encoder.
Hi !
Firstly, thanks for sharing this repo! I'm struggling to download the model (3. Download the pre-trained models at https://pan.baidu.com/s/1zf-d-7eYqK-IxroBKTxDfg). Indeed, the site says you need to create an account to download the file. The thing is, I cannot create an account with a French phone number 😅 Is there any other way to download the pretrained models?
Many thanks !
I installed the environment as specified: PyTorch == 1.1.0, torchvision == 0.3.0, Python == 3.6, CUDA == 10.1.
It is certain that the model is loaded, because the model size is printed:
Model Size: 1130860
But the problem is:
Traceback (most recent call last):
File "main.py", line 99, in <module>
trainer.predict(model_dir, results_dir, features_path, batch_gen_tst, num_epochs, actions_dict, sample_rate)
File "/home/cpslabrtx3090/zjb/projects/ASFormer/model.py", line 399, in predict
self.model.load_state_dict(torch.load(model_dir + "/epoch-" + str(epoch) + ".model"))
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 560, in _load
raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: ./models/gtea/split_1/epoch-120.model is a zip archive (did you mean to use torch.jit.load()?)
Traceback (most recent call last):
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 189, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'ld_tenso'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 2299, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1093, in fromtarfile
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1035, in frombuf
chksum = nti(buf[148:156])
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 191, in nti
raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 556, in _load
return legacy_load(f)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/site-packages/torch/serialization.py", line 467, in legacy_load
with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar,
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1591, in open
return func(name, filemode, fileobj, **kwargs)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1621, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 1484, in __init__
self.firstmember = self.next()
File "/home/cpslabrtx3090/anaconda3/envs/ASFormer/lib/python3.6/tarfile.py", line 2311, in next
raise ReadError(str(e))
tarfile.ReadError: invalid header
Hello, I have recently been working on a similar task, found your paper very interesting, and plan to build on it. May I ask: does the pre-extracted feature mentioned in the paper refer to the feature map extracted from each frame? Could you elaborate? Thanks.
Thanks for your nice work. Meanwhile, may I confirm one thing? Using your features and pre-trained models (epoch=120), the scores I obtain are lower than in your BMVC paper on all three datasets. For instance, the edit and F1@10 scores on GTEA only reach 84.0 and 88.9, lower than the 84.6 and 90.1 in your paper. The same holds for the other two datasets.
50Salads: edit=75.7, F1@10=83.4.
Hello, in the decoder code you provide, V comes from the features of the previous decoder or the encoder, while Q and K use x1, i.e., the output of the previous layer. Isn't this different from what is described in the paper?
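As a schematic of the attention pattern described in this question (my own simplification, not the repo's exact code: single head, no masking), with Q and K taken from the previous layer's output x1 and V from the encoder / previous decoder feature:

```python
import torch

def cross_attention(x1, feature):
    # x1:      (B, T, D) output of the previous decoder layer -> Q, K
    # feature: (B, T, D) encoder / previous decoder feature   -> V
    q, k, v = x1, x1, feature
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```

Whether this matches or diverges from the paper's Figure depends on where the paper says Q/K/V originate, which is exactly what the question asks the authors to clarify.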
Hi,
Thank you for your work.
When I try to increase the batch size, the metrics drop a lot. What do you think the possible reasons are?
The GPU is an A100 40G.
Trained with the default settings, split 1 only:
(s1) [83.40807175 81.16591928 72.19730942] 75.934108 83.2241
After just changing the batch size to 8 and the lr to 0.001, then training:
(s1) [68.94977169 67.57990868 55.25114155] 63.931922 72.0049
Hi, I was wondering how it is possible to extract the self-attention weights, as you have done in Figure 2? I am interested in the hierarchical case.
Hello, is the hierarchical attention you mention essentially band attention (as shown below), except that the window size grows exponentially with the layer depth? If so, shouldn't the body of the for loop in this function in model.py be changed to window_mask[:, i, i:i+self.bl] = 1?
```python
def construct_window_mask(self):
    window_mask = torch.zeros((1, self.bl, self.bl + 2 * (self.bl // 2)))
    for i in range(self.bl):
        window_mask[:, :, i:i+self.bl] = 1
    return window_mask.to(device)
```
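For reference, a small self-contained sketch contrasting the two mask variants discussed in this question, using an illustrative bl = 4 (standalone code, with the class attributes replaced by a local variable):

```python
import torch

bl = 4  # local window size (illustrative value)
width = bl + 2 * (bl // 2)  # window plus half-window padding on each side

# Variant as quoted above: every query row gets the same union of bands,
# so almost the whole mask ends up filled.
mask_a = torch.zeros((1, bl, width))
for i in range(bl):
    mask_a[:, :, i:i + bl] = 1

# Variant proposed in the question: row i attends only to its own band
# of bl positions starting at offset i (a true band / sliding window).
mask_b = torch.zeros((1, bl, width))
for i in range(bl):
    mask_b[:, i, i:i + bl] = 1
```

Printing the two masks makes the difference visible: mask_a is dense (each row sees 2*bl - 1 positions), while mask_b restricts each query to exactly bl positions.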