
action-detection's People

Contributors

kaidic, lyy-zz, tangshixiang, yjxiong, zhaoyue-zephyrus


action-detection's Issues

Binary actionness classifier training in TAG

Hi, @yjxiong

Would you release your actionness classifier network architecture?
When I train the binary actionness classifier, it is difficult to get it to converge on a big dataset (just over 200k extracted frames) with some simple CNN architectures. The RGB modality and the flow modality suffer from similar problems.

The training dataset is organized as follows (see the sketch below):
Positive: frames inside all ActivityNet 1.3 action instances, with the sampling interval set to 6, then randomly choosing some frames.
Negative: frames inside all ActivityNet 1.3 background regions, with the sampling interval set to 6, then randomly choosing some frames.

Is there any problem with this dataset organization? Should I use all positive/negative frames?

Thanks a lot.
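(A minimal sketch of the balanced sampling described above; pos_frames and neg_frames are hypothetical lists of frame paths taken from inside action instances and background regions, respectively. This is illustration, not the repository's code.)

    import random

    SAMPLE_INTERVAL = 6  # keep every 6th frame, as described above

    def subsample(frames, n_samples):
        # stride through the frame list, then draw a random subset
        strided = frames[::SAMPLE_INTERVAL]
        return random.sample(strided, min(n_samples, len(strided)))

    # pos_frames / neg_frames: hypothetical lists of frame paths
    positives = subsample(pos_frames, 50000)
    negatives = subsample(neg_frames, 50000)  # equal sizes keep classes balanced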

IndexError: The advanced indexing objects could not be broadcast

When I train the model on ActivityNet 1.2, the call

train(train_loader, model, activity_criterion, completeness_criterion, regression_criterion, optimizer, epoch)

reaches

activity_out, activity_target, \
completeness_out, completeness_target, \
regression_out, regression_labels, regression_target = model(input_var, scaling_var, target_var,
                                                             reg_target_var, prop_type_var)

in ssn_train.py, which raises: IndexError: The advanced indexing objects could not be broadcast.
How can I fix this problem?

Coverage Threshold

During dataset creation, the bg_coverage of the whole video is checked in:

if tag[i] == 0 and \
        self.proposals[i].best_iou < bg_iou_thresh and \
        self.proposals[i].coverage > bg_coverage_thresh:

Why is this done? Can I remove this condition?

My videos are very long, with lots of activity. The backgrounds are therefore quite short, and most of the proposals fail this check.
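(For reference, the quoted condition can be read as a standalone predicate, sketched here with the thresholds left as parameters: a proposal counts as background only if it barely overlaps any ground-truth instance and still spans a large enough fraction of the video. Removing the coverage clause should simply admit shorter background proposals.)

    def is_background(prop, bg_iou_thresh, bg_coverage_thresh):
        # best_iou: highest temporal IoU with any ground-truth instance
        # coverage: proposal duration divided by the whole video's duration
        return (prop.best_iou < bg_iou_thresh and
                prop.coverage > bg_coverage_thresh)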

Interval of snippets ?

Hi
I am reading your CVPR 2017 paper on action detection, and it is really wonderful work.
I have a question about the paper. You say: "Given a video, a sequence of snippets will be extracted with a regular interval in between." I want to know the size of the interval you used on both THUMOS14 and ActivityNet.
Thank you so much.

HDD training speed

Hi, I'm trying to train the model.
My data is stored on an HDD, and the script takes a long time to load the data into memory. How can I speed it up?

No module named model_zoo

Hello, @yjxiong
I trained the network with 'ssn_train.py' and then came across an error: 'import model_zoo' fails with "no module named 'model_zoo'" in 'ssn_models.py'.

average recall rates of ActivityNet v1.3

Hi, I have read your paper "Temporal Action Detection with Structured Segment Networks" and I have a question about the experiments. Can you give the average recall rate of your TAG method on ActivityNet v1.3? Thanks.

Questions about dataloader

First of all, I want to thank you.
I have read the main code of SSN and have a question about the training code:
the model and criterion are moved to the GPU with '.cuda()', but I found no '.cuda()' call for the model's input data, which means the data stays in CPU memory. Why?
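(A hedged guess at the answer, with a toy example for a recent PyTorch: nn.DataParallel scatters its inputs across the GPUs itself, so an explicit .cuda() on the batch is unnecessary when the model is wrapped this way.)

    import torch
    import torch.nn as nn

    model = nn.DataParallel(nn.Linear(16, 4).cuda())  # replicas on all GPUs
    x = torch.randn(8, 16)   # batch created on the CPU, no .cuda() call
    out = model(x)           # DataParallel's scatter moves chunks to each GPU
    print(out.device)        # cuda:0 -- results are gathered on device 0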

How to run ssn on custom videos

Hi @yjxiong ,
Thanks for your cool work, it's really useful. I wonder how to run SSN action detection on custom videos (not included in ActivityNet or THUMOS) using a pretrained model. Can you give me some hints? Thank you in advance.

Could you please release TSN features?

Could you please release a set of TSN features for a public dataset (e.g. THUMOS'14)?
Extracting these features is very time- and compute-consuming.
This would be very helpful to the community.
Thank you in advance!

problem of proposal_list

Hi, @yjxiong, can you release your proposal_list? When I use my own generated proposal_list file, testing always fails with a math domain error, maybe because gt_size < prop_size in

def compute_regression_targets(self, gt_list, fg_thresh):

in ssn_dataset.py.
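(A hedged sketch of the usual location/duration regression targets, not necessarily the exact code in ssn_dataset.py. A "math domain error" would come from math.log receiving a non-positive ratio, e.g. a ground-truth or proposal interval with zero or negative length; a merely smaller gt_size only yields a negative, still valid, log.)

    import math

    def regression_targets(prop_start, prop_end, gt_start, gt_end):
        prop_center = (prop_start + prop_end) / 2.0
        prop_size = prop_end - prop_start
        gt_center = (gt_start + gt_end) / 2.0
        gt_size = gt_end - gt_start
        loc_reg = (gt_center - prop_center) / prop_size
        # math.log raises ValueError("math domain error") if the ratio <= 0
        dur_reg = math.log(gt_size / prop_size)
        return loc_reg, dur_reg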

Flow + RGB result

Do the results reported for 'Flow + RGB' come from an ensemble of two models?

Proposal list

Hi, in your proposal_list.txt, what does each line stand for?
Something like the following:

1

video_validation_0000354
1
1
0
226
0 0.0000 0.0000 0.7439 0.7713
0 0.0000 0.0000 0.7533 0.7698
0 0.0000 0.0000 0.7259 0.7682
0 0.0000 0.0000 0.7125 0.7698
0 0.0000 0.0000 0.7408 0.7737
0 0.0000 0.0000 0.7408 0.7792

How can I get a proposal list like this? Can you release the code that produces the proposal list? Thank you so much! Looking forward to your reply!
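(Not an authoritative answer, but the blocks in this thread suggest a per-video layout: an index line, the video id, two header numbers whose meaning is assumed here to be metadata, the ground-truth count with its "label start end" lines, then the proposal count with its "label best_iou overlap_self start end" lines. A parser sketch under that assumption:)

    def parse_proposal_file(path):
        videos = []
        with open(path) as f:
            lines = [l.strip() for l in f if l.strip()]
        i = 0
        while i < len(lines):
            i += 1                        # index line, e.g. '# 380'
            vid = lines[i]; i += 1        # video id
            i += 2                        # two header fields (assumed metadata)
            n_gt = int(lines[i]); i += 1
            gts = [lines[i + k].split() for k in range(n_gt)]; i += n_gt
            n_prop = int(lines[i]); i += 1
            props = [lines[i + k].split() for k in range(n_prop)]; i += n_prop
            videos.append((vid, gts, props))
        return videos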

Results for single stream network

Hi, I am implementing your model on my own. Could you please release the results for a single stream (e.g. RGB or optical flow) on THUMOS14? It would be very helpful for checking intermediate results.

By the way, regarding the two-stream networks, is the fusion applied to the frame-wise scores at test time?

Thanks a lot!

problem of detection result

Hi, @yjxiong, when I use your pretrained model for testing, my result is worse than the paper's, as follows:

Detection performance on THUMOS14:
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP    | 0.5606 | 0.5084 | 0.4379 | 0.3422 | 0.2466 | 0.1589 | 0.0926 | 0.0399 | 0.0052 | 0.2658  |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+

Do you know why, please?

AN <classid, category name> mapping

Dear all,

I am working on AN v1.2 now and trying to map the class ids used in SSN (indexed from 0 to 99) to their original category names. Does anyone have such a mapping file for AN v1.2 and v1.3? Thanks very much!

Best,
Zheng

confused about ssn_op.py

import torch
import torch.nn as nn

# OHEMHingeLoss is defined earlier in ssn_op.py

class CompletenessLoss(torch.nn.Module):
    def __init__(self, ohem_ratio=0.17):
        super(CompletenessLoss, self).__init__()
        self.ohem_ratio = ohem_ratio

        self.sigmoid = nn.Sigmoid()

    def forward(self, pred, labels, sample_split, sample_group_size):
        pred_dim = pred.size()[1]
        # regroup the flat batch into proposal groups:
        # (videos, sample_group_size, classes)
        pred = pred.view(-1, sample_group_size, pred_dim)
        labels = labels.view(-1, sample_group_size)

        # the first sample_split entries of each group are positives,
        # the remainder are negatives
        pos_group_size = sample_split
        neg_group_size = sample_group_size - sample_split
        pos_prob = pred[:, :sample_split, :].contiguous().view(-1, pred_dim)
        neg_prob = pred[:, sample_split:, :].contiguous().view(-1, pred_dim)
        # hinge losses on the raw scores; negatives are mined with OHEM,
        # keeping only the hardest ohem_ratio fraction
        pos_ls = OHEMHingeLoss.apply(pos_prob, labels[:, :sample_split].contiguous().view(-1), 1,
                                     1.0, pos_group_size)
        neg_ls = OHEMHingeLoss.apply(neg_prob, labels[:, sample_split:].contiguous().view(-1), -1,
                                     self.ohem_ratio, neg_group_size)
        pos_cnt = pos_prob.size(0)
        neg_cnt = int(neg_prob.size()[0] * self.ohem_ratio)

        return pos_ls / float(pos_cnt + neg_cnt) + neg_ls / float(pos_cnt + neg_cnt)

Why is the sigmoid function not applied to pred?
pred is the output of an fc layer, not normalized to [0, 1].

Or have I got something wrong?
Thank you.
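(One reading, sketched below: a hinge loss operates on raw scores directly, penalizing max(0, margin - sign * score), so no sigmoid is required. A minimal online-hard-example-mining hinge loss in that spirit, not the repository's exact OHEMHingeLoss:)

    import torch

    def ohem_hinge(scores, sign, keep_ratio, margin=1.0):
        # scores: raw fc outputs (no sigmoid); sign: +1 for positive
        # samples, -1 for negative ones
        losses = torch.clamp(margin - sign * scores, min=0)
        k = max(1, int(losses.numel() * keep_ratio))
        hard, _ = losses.view(-1).topk(k)  # hardest = largest losses
        return hard.sum()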

Length and content of videos

Is there a guideline for how long the videos should be and how much activity they should contain?
e.g.

  • If I have a long video which is mostly background, should I cut the background parts out?
  • If I have a long video with lots of activity, should I split it into several short ones? I noticed that the number of proposals per video is specified in the dataset config:
    prop_per_video: 8

    Does that mean that only 8 proposals are used per video?

How to train using RGB and flow modality?

Hi!
Thanks for your great work!
I read your README file and noticed that I can train your model with
python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45
where MODALITY can be RGB or Flow.
But how can I train the model using both RGB and flow? Or should I train them separately and combine them only at test time?
Thanks for your help!
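(Not from the README, but a common late-fusion recipe: train the streams separately, then average their score arrays at test time. The file names, npz key, and the 1:1.5 weighting below are all assumptions:)

    import numpy as np

    rgb = np.load('score_rgb.npz')['scores']    # hypothetical key; shape
    flow = np.load('score_flow.npz')['scores']  # (num_proposals, num_classes)

    fused = 1.0 * rgb + 1.5 * flow  # assumed RGB:Flow weights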

RuntimeError: out of memory when batch size is bigger than 2

Hi, when I train SSN on THUMOS14 with the command
python ssn_train.py thumos14 MODALITY -b 16 --lr_steps 20 40 --epochs 45
(more specifically, sudo python3 ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45),
I get: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58.
Only when I reduced the batch size to 2 did the error disappear. The memory usage is 9256 MiB / 12207 MiB on one GPU when the batch size is 2, but then I get NaN losses most of the time. I didn't modify the released code except for some small changes to names and paths, so I think it's unreasonable that the batch size has to be so small. Does anyone know why?
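(A hedged back-of-the-envelope using the numbers this very log prints: each "video" in the batch expands into several proposals, and each proposal into several snippets, so the backbone sees far more crops per step than the nominal batch size suggests.)

    batch_size = 16        # -b 16
    props_per_video = 8    # FG/BG/INC = 1/1/6 in the sampling config below
    num_segments = 9       # printed in the SSN configuration below

    crops = batch_size * props_per_video * num_segments
    print(crops)           # 1152 frame crops through BNInception per step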

My environment info is as follows:
Python 3.5, CUDA 7.5, nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:04:00.0 Off |                  N/A |
| 22%   58C    P2    72W / 250W |    325MiB / 12207MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:05:00.0 Off |                  N/A |
| 26%   40C    P8    13W / 250W |     11MiB /  6081MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I installed PyTorch by the following commands:

pip3 install http://download.pytorch.org/whl/cu75/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl 
pip3 install torchvision

The full log is as follows.

administrator@xxxx:xxxx/action-detection$ sudo python3 ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45

    Initializing SSN with base model: BNInception.
    SSN Configurations:
        input_modality:     RGB
        starting_segments:  2
        course_segments:    5
        ending_segments:    2
        num_segments:       9
        new_length:         1
        dropout_ratio:      0.8
        loc. regression:    ON
        bn_mode:            frozen

        stpp_configs:       (1, 1, 1)

/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py:482: UserWarning: src is not broadcastable to dst, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
  own_state[name].copy_(param)
Freezing all BatchNorm2D layers
computing regression target normalizing constants


            SSNDataset: Proposal file data/thumos14_tag_val_proposal_list.txt parsed.

            There are 28231 usable proposals from 200 videos.
            6676 foreground proposals
            17950 incomplete_proposals
            3605 background_proposals

            Sampling config:
            FG/BG/INC: 1/1/6
            Video Centric: True

            Epoch size multiplier: 10

            Regression Stats:
            Location: mean -0.02322 std 0.08391
            Duration: mean -0.00504 std 0.19560

/usr/local/lib/python3.5/dist-packages/torchvision/transforms/transforms.py:156: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")


            SSNDataset: Proposal file data/thumos14_tag_test_proposal_list.txt parsed.

            There are 33634 usable proposals from 210 videos.
            7298 foreground proposals
            21316 incomplete_proposals
            5020 background_proposals

            Sampling config:
            FG/BG/INC: 1/1/6
            Video Centric: True

            Epoch size multiplier: 1

            Regression Stats:
            Location: mean -0.02322 std 0.08391
            Duration: mean -0.00504 std 0.19560

group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 71 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 71 params, lr_mult: 2, decay_mult: 0
group: BN scale/shift has 0 params, lr_mult: 1, decay_mult: 0
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
inception_3a_1x1_bn torch.Size([576, 64, 28, 28])
inception_3a_3x3_bn torch.Size([576, 64, 28, 28])
inception_3a_double_3x3_2_bn torch.Size([576, 96, 28, 28])
inception_3a_pool_proj_bn torch.Size([576, 32, 28, 28])
Traceback (most recent call last):
  File "ssn_train.py", line 418, in <module>
    main()
  File "ssn_train.py", line 154, in main
    train(train_loader, model, activity_criterion, completeness_criterion, regression_criterion, optimizer, epoch)
  File "ssn_train.py", line 208, in train
    reg_target_var, prop_type_var)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/share/v-yuphu/action-detection/ssn_models.py", line 255, in forward
    return self.train_forward(input, aug_scaling, target, reg_target, prop_type)
  File "/mnt/share/v-yuphu/action-detection/ssn_models.py", line 266, in train_forward
    base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
  File "/home/administrator/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/share/v-yuphu/action-detection/model_zoo/bninception/pytorch_load.py", line 56, in forward
    data_dict[op[2]] = torch.cat(tuple(data_dict[x] for x in op[-1]), 1)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
THCudaCheckWarn FAIL file=/pytorch/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down
THCudaCheckWarn FAIL file=/pytorch/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down

train and test video numbers in THUMOS 2014 Dataset

@yjxiong hello, thanks again for your excellent work, but now I am confused about your thumos14_tag_test_normalized_proposal_list.txt file.

As far as I know, THUMOS 2014's validation and test sets contain 1010 and 1574 untrimmed videos respectively, but only 20 action categories are annotated temporally, so 200 validation videos and 213 test videos are used for the temporal action detection task. However, your thumos14_tag_test_normalized_proposal_list.txt contains proposal start and end times for all 1574 videos. Since it is impossible to train on videos that have no annotation information, did you really train and test on all videos (1010 and 1574), or did you train only on the 200 + 213 and leave the other videos in thumos14_tag_test_normalized_proposal_list.txt unused?

details about the temporal actionness grouping?

Hi @yjxiong ,
Could you please tell me the details of the temporal actionness grouping, if possible? What is the duration of each snippet, and how many RGB frames and optical flow frames are sampled from each snippet? Thanks.

problem of overlapped and unseen THUMOS14 categories

Hi Yuanjun,

I also want to evaluate the generalizability of our new proposal method, as in Table 2 of "A Pursuit of Temporal Accuracy in General Activity Detection". But I am not sure which 10 classes overlap with THUMOS14. Can you list these classes? Thanks.

Performance on THUMOS'14 at IoU = 0.6, 0.7

Hi Yuanjun,
We want to cite your paper in our work, but we need the mAP on THUMOS14 at IoU = 0.6 and 0.7, which is not provided in your paper.
Do you have the results at hand for this setting?
Thanks!

What's the size of extracted frames and optical flow images?

The authors suggest using the tools provided in the TSN repo to extract frames and optical flow images. Should I resize the images the same way as in the TSN repo? What image size did the authors use?
One dataset used in the TSN repo is UCF101, whose original videos are 320x256. The command in the TSN repo resizes images to 340x256, as follows:

python tools/build_of.py ${SRC_FOLDER} ${OUT_FOLDER} --num_worker ${NUM_WORKER} --new_width 340 --new_height 256 2>local/errors.log

And in the *_train_val.prototxt the crop_size is 224.
THUMOS14 videos are 320x180; in the ResNet/VGG/BNInception models input_size = 224, and in the InceptionV3/Inception models input_size = 299.

I got worse results when I used the reference models to evaluate my extracted images, which had not been resized (320x180). (The image count matches the number of images extracted by denseflow.) See the resize sketch after the tables below.

action-detection$ python3 ssn_test.py thumos14 RGB none score_thumos14_rgb_reference.npz --use_reference
action-detection$ python3 eval_detection_results.py thumos14 score_thumos14_rgb_reference.npz

rgb reference model
Detection performance on THUMOS14:
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP    | 0.4375 | 0.3839 | 0.3266 | 0.2430 | 0.1639 | 0.1051 | 0.0588 | 0.0244 | 0.0059 | 0.1943  |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+


action-detection$ python3 ssn_test.py thumos14 Flow none score_thumos14_flow_reference.npz --use_reference
action-detection$ python3 eval_detection_results.py thumos14 score_thumos14_flow_reference.npz

Detection performance on THUMOS14 (flow reference model):
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| IoU thresh | 0.10   | 0.20   | 0.30   | 0.40   | 0.50   | 0.60   | 0.70   | 0.80   | 0.90   | Average |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
| mean AP    | 0.4166 | 0.3769 | 0.3183 | 0.2538 | 0.1907 | 0.1208 | 0.0698 | 0.0250 | 0.0045 | 0.1974  |
+------------+--------+--------+--------+--------+--------+--------+--------+--------+--------+---------+
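(In case the missing resize explains the gap, a minimal sketch of resizing already-extracted frames to the 340x256 layout the TSN tool produces, using Pillow; the paths are placeholders:)

    import glob, os
    from PIL import Image

    SRC, DST = 'frames_320x180', 'frames_340x256'  # placeholder paths

    for path in glob.glob(os.path.join(SRC, '*', '*.jpg')):
        out = os.path.join(DST, os.path.relpath(path, SRC))
        os.makedirs(os.path.dirname(out), exist_ok=True)
        Image.open(path).resize((340, 256), Image.BILINEAR).save(out)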

own trained checkpoint file does not contain 'arch', 'best_loss', 'epoch', 'reg_stats'

I trained the weights using ssn_train.py and got the weight .pth.tar file.
But when I try to test with this weight file, an error occurs.
The weight file saved by ssn_train.py is not configured correctly for ssn_test.py, which requires 'arch', 'best_loss', 'epoch', 'reg_stats', and 'state_dict' as keys in the dictionary.
Do I need to manually change the configuration of the weight file?
And what is 'reg_stats'?
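(If the file really holds only raw weights, one workaround, sketched under that assumption, is to wrap them into a dict with the keys ssn_test.py expects. 'reg_stats' should be the regression-target means/stds printed as "Regression Stats" during training; the shape and values below are placeholders taken from the log earlier in this thread:)

    import torch

    weights = torch.load('model.pth.tar')   # whatever ssn_train.py saved

    # hypothetical placeholder: [[loc_mean, loc_std], [dur_mean, dur_std]]
    reg_stats = torch.tensor([[-0.02322, 0.08391], [-0.00504, 0.19560]])

    checkpoint = {
        'arch': 'BNInception',   # backbone name (assumed)
        'best_loss': 0.0,        # placeholder
        'epoch': 45,             # placeholder
        'reg_stats': reg_stats,
        'state_dict': weights,
    }
    torch.save(checkpoint, 'model_wrapped.pth.tar')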

how to get the best_iou and overlap_self values in thumos14_tag_val_normalized_proposal_list.txt

Hello @yjxiong, I am trying to reimplement your TAG method. So far I can compute the actionness scores, run the grouping scheme, and after NMS I get 82 proposals for video_validation_0000201.mp4.
I want to replace the proposals in your thumos14_tag_val_normalized_proposal_list.txt with mine to check whether my reimplementation is correct. But your txt file contains not only the proposals themselves but also the best_iou and overlap_self items used to determine the incomplete and background proposals. How are these two items calculated? There is no description of this in your paper.

e.g: thumos14_tag_val_normalized_proposal_list.txt

59

video_validation_0000201
1
1
1
8 0.8631 0.8926
32
8 0.7603 0.8995 0.8684 0.8956
8 0.6497 0.6497 0.8593 0.9047
0 0.0000 0.0000 0.1354 0.1626
## format: (label, best_iou, overlap_self, start_frame_norm, end_frame_norm)

Thank you.
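(An educated guess at the definitions, sketched below rather than taken from the authors' code: best_iou is the largest temporal IoU between the proposal and any ground-truth instance, and overlap_self is the largest fraction of the proposal itself covered by a ground-truth instance.)

    def temporal_iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = max(a[1], b[1]) - min(a[0], b[0])
        return inter / union if union > 0 else 0.0

    def coverage_of_proposal(prop, gt):
        inter = max(0.0, min(prop[1], gt[1]) - max(prop[0], gt[0]))
        length = prop[1] - prop[0]
        return inter / length if length > 0 else 0.0

    def annotate(prop, gt_list):
        best_iou = max((temporal_iou(prop, g) for g in gt_list), default=0.0)
        overlap_self = max((coverage_of_proposal(prop, g) for g in gt_list),
                           default=0.0)
        return best_iou, overlap_self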

How does DataParallel work?

Sorry to bother you.
I want to ask how the DataLoader and DataParallel work together. I find that no matter what the batch size is, every GPU processes at least (1 + 6 + 1) = 8 proposals. Even if I set batchsize=2, only 2 of the 8 GPUs are used, and each of them processes 8 proposals. Why?

Problem about recall performance of TAG

Hi Yuanjun,

I generated the proposal list for THUMOS14 using "gen_proposal_list.py" and got "thumos14_tag_test_proposal_list.txt". Then I used the evaluation code from https://github.com/escorciav/daps/wiki/FAQs , where random scores are used for proposal retrieval.

However, I only got an AR of 39.14% (AR is calculated over [0.5:0.05:1.0]), which is lower than the 48.9% in the SSN paper.
Using this evaluation code, the performance of the following methods is (with 200 proposals):
DAPs: 33.96%
TAP (sparseprop): 23.13%
SCNN-prop: 37.01% (for the performance of scnn-prop, I guess you need to read this: https://github.com/escorciav/daps/wiki )

So how did you evaluate these models? And how can I reproduce the TAG performance reported in the SSN paper? Thanks!

model size does not match

Hello,
I ran the code and came across a problem where the model size does not match the parameter size:
RuntimeError: While copying the parameter named inception_4d_pool_proj_bn.running_var, whose dimensions in the model are torch.Size([128]) and whose dimensions in the checkpoint are torch.Size([1, 128]).
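(This usually means the checkpoint stores some tensors with an extra leading dimension, e.g. [1, 128] vs [128]. A hedged workaround is to reshape mismatched tensors to the model's shapes before loading; `model` here is assumed to be the already-constructed SSN network:)

    import torch

    checkpoint = torch.load('model.pth')             # placeholder path
    state = checkpoint.get('state_dict', checkpoint)
    model_state = model.state_dict()

    for name, param in state.items():
        if name in model_state and param.size() != model_state[name].size():
            state[name] = param.view(model_state[name].size())

    model.load_state_dict(state)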

About proposal list (no CliffDiving class in val list, missing video_test_0001496 in test list)

  1. Why are all the validation proposals (used for training) that you generated for CliffDiving videos labelled with 8 (Diving) and not 5 (CliffDiving)? Wouldn't this deteriorate the performance of the CliffDiving classifier?
    For example, in thumos14_tag_val_normalized_proposal_list.txt:
# 380
video_validation_0000161
1
1
8
8 0.1826 0.2280
8 0.3393 0.3863
8 0.4268 0.4754
8 0.8228 0.8957
5 0.1826 0.2280
5 0.3393 0.3863
5 0.4268 0.4754
5 0.8228 0.8957
86
8 0.3272 0.3438 0.3458 0.4689
8 0.1725 0.1725 0.1972 0.4689
8 0.2917 0.3254 0.3522 0.4625
8 0.2999 0.2999 0.3139 0.4754
8 0.1530 0.1530 0.1777 0.4949
8 0.2458 0.2458 0.2977 0.4949
8 0.7752 0.8887 0.3458 0.3911
8 0.7883 0.8401 0.3425 0.3944
8 0.8001 0.8023 0.3393 0.3976
8 0.7602 0.9535 0.3490 0.3879
......
  2. thumos14_tag_test_normalized_proposal_list.txt has 200 videos, while there are 213 videos in TH14_Temporal_Annotations_Test\xgtf_renamed. Two reasonably missing videos are video_test_0000270 (its annotations say HammerThrow, but the actual ground truth in the video is HairCut, which doesn't belong to the 20 classes) and video_test_0001292 (it only has ambiguous annotations).
    It seems that another missing video, video_test_0001496, could be included in the test list after correcting its annotations (the annotations say CricketShot while the ground truth is FrisbeeCatch).

proposal generation code

Would you please release the code for training the binary classifier based on TAG and for generating proposals with the watershed algorithm?

issue about the thumos14_test_normalized_proposal_list.txt

Hello, I am trying to reproduce your amazing work, and for convenience (computational cost) I only use the 213 videos that are later used for testing in the THUMOS14 dataset evaluation toolkit. But I find that there might be something wrong with the ground-truth annotations in your thumos14_tag_test_normalized_proposal_list.txt file. For example, check the ground-truth annotations of the following three videos: video_test_0001292, video_test_0000270, video_test_0001496.
In your .txt file these three videos are negative, with 0 ground-truth instances; however, in the THUMOS14 test annotations all of them include several ground-truth action instances. So when I run the SSN test script, my video count decreases from 213 to 210, and my final reproduced results are lower than those listed in the paper (about a 1.5% difference). Waiting for your reply, thanks so much!

How to get temporal region proposals by myself?

As the proposal file is provided, I don't need it to reproduce the results. However, I want to know how to train the actionness classifier that evaluates the actionness of snippets.
BTW, how is a snippet defined, i.e. how many frames are in a snippet? Thanks!

Window scales for sliding window

Hi, I just read your papers and noticed that the method can also achieve good performance with proposals generated by sliding windows. In your paper, "we generate windows in 20 exponential scales starting from 0.3 second long". Could you provide the details of the 20 scales? Thanks a lot!
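(The paper only fixes the starting point; a hedged reading is a geometric progression whose growth ratio is not stated. A sketch with the ratio left as a parameter — with ratio = 2 ** 0.5, the 20 windows would span roughly 0.3 s to about 217 s:)

    def window_scales(n=20, base=0.3, ratio=2 ** 0.5):
        # geometric progression: base, base*ratio, base*ratio**2, ...
        return [base * ratio ** i for i in range(n)]

    print(window_scales())  # the ratio is an assumption, not from the paper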

ssn_train.py

When I run: python ssn_train.py thumos14 RGB -b 16 --lr_steps 20 40 --epochs 45
something goes wrong and I don't know why. Has nobody else met this issue?
File "ssn_train.py", line 103
modality=args.modality, exclude_empty=True, **sampling_configs,
^

SyntaxError: invalid syntax

pytorch weight file

I'm trying to run 'ssn_test.py'.

I get the arguments from the example invocation below, but where can I get TRAINING_CHECKPOINT?

python ssn_test.py DATASET MODALITY TRAINING_CHECKPOINT RESULT_PICKLE

There seems to be no way to download the trained PyTorch weight files from the authors.

val_proposal_list about THUMOS14

Hi, I ran the code gen_proposal_list.py and got data like the following:

0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
13 0.4760 0.4823 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0
0 0.0000 0.0000 0 0

In the above proposal data there are many invalid entries. Is that correct?

issue about the annotation (start frame and end frame) calculation

Hello, another question please...
I wonder how you calculate the start frame and end frame of each ground-truth instance, given the start and end times in the THUMOS14 dataset.
For example, in video_test_0000004 the start and end times are 0.2 and 1.1, etc. I multiplied them by 30 (the frame rate), but my result (
6 | 33
342 | 366
558 | 624
849 | 891
30 | 45
624 | 669
909 | 951
)
is different from yours (
4 32
340 364
555 621
845 887
29 44
621 666
905 947
), so I want to know how you calculate it. Thanks very much!
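(A hedged guess at the discrepancy: the offsets above grow with the frame index, which is consistent with multiplying by an effective extracted-frame rate, i.e. the extracted frame count divided by the video duration, rather than a nominal 30 fps, and then truncating. A sketch of that computation:)

    def time_to_frames(start_sec, end_sec, frame_count, duration_sec):
        fps = frame_count / duration_sec  # effective rate, not a flat 30
        return int(start_sec * fps), int(end_sec * fps)

    # e.g. an effective rate near 29.86 fps yields ~947 for the last row
    # above, where a flat 30 fps gives 951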

Question about sample balance

In your paper, the ratio of positive, background, and incomplete proposals is 1:1:6. So when you train the action classifier, the background : foreground proposal ratio is 1:1, but there are 100 classes (in ActivityNet 1.2), which means the number of foreground samples per class is much smaller than the number of background samples. Can this problem be ignored? Does it cause any issues?

I tried training the SSN model without any modification to your code, and found that after several backward and update steps the model tends to predict all proposals as background. How come?
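(Regarding the collapse to background, one generic mitigation, shown as a sketch of a standard technique rather than anything from this repository, is to down-weight the background class in the activity cross-entropy loss; the background index and the 0.5 weight are assumptions:)

    import torch
    import torch.nn as nn

    num_classes = 101              # 100 activities + background
    weights = torch.ones(num_classes)
    weights[0] = 0.5               # assumed: background is class 0

    activity_criterion = nn.CrossEntropyLoss(weight=weights)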

Would you share the detection result json file on ActivityNet?

Dear Author:

I am reading your papers:

  1. A Pursuit of Temporal Accuracy in General Activity Detection
  2. Temporal Action Detection with Structured Segment Networks

Thanks for your great work!

Currently, I am evaluating my detection results on one of the five subsets of the ActivityNet 1.2 validation set. However, no one has ever released results on these five subsets separately.

I am wondering whether you would share the detection result JSON files of these two works on the ActivityNet 1.2 or 1.3 validation set with me?

If yes, can you email the file to [email protected] or just paste it below? That would mean a lot!

Thanks a lot!
