bryanyzhu / two-stream-pytorch
PyTorch implementation of two-stream networks for video action recognition
License: MIT License
Hi, may I know whether there is a demo to detect human actions in a short video? Not reporting the accuracy of the network, just the predictions, like jump, run, fall, dance. Thanks.
Hello @bryanyzhu ,
Thank you for the work.
In the paper you mentioned that you rescaled the optical flow components to [0, 255] before feeding them to the temporal network. I want to make sure whether this is also the case for the pretrained flow resnet152 model.
Thank you
Hi. The code works well with RGB frames after I made some changes. However, it runs into problems when training with flow images: the loss and Prec@1 seem to stay unchanged.
I ran this code on 4 GPUs with a batch size of 224. I set the learning rate to the initial LR (0.005), decayed by 10 every 30 epochs. new_length was set to 5 and in_channels was changed to 10 (5*2). The flow images were computed with OpenCV and saved as '.jpg':
flow = cv2.calcOpticalFlowFarneback(prevGray, nextGray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
flow_x = cv2.normalize(flow[..., 0], None, 0, 255, cv2.NORM_MINMAX)
flow_y = cv2.normalize(flow[..., 1], None, 0, 255, cv2.NORM_MINMAX)
One figure (not shown here) gave the results at the beginning of training, and another the results after 30 epochs.
I tried vgg16 and inception_v3 (both with pretrained models and training from scratch), and different initial LRs from 0.005 down to 0.0001. Same issue every time; it's weird. Does anyone have comments on this?
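For context, here is a hedged sketch of the usual way such flow images are saved (TSN-style encoding with a fixed motion bound); this is an assumption about the pipeline, not the repo's own extraction code. One easy-to-miss detail: cv2.normalize on a float input returns floats, and the cast to uint8 has to happen before writing the JPEG.
# Hypothetical sketch, assuming TSN-style flow encoding (zero motion -> 128).
import cv2
import numpy as np

def save_flow(prev_gray, next_gray, out_x, out_y, bound=20.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Clip to [-bound, bound], map linearly to [0, 255], and cast to uint8
    # BEFORE writing; imwrite on a float array quantizes unpredictably.
    for comp, path in ((flow[..., 0], out_x), (flow[..., 1], out_y)):
        comp = np.clip(comp, -bound, bound)
        img = np.round((comp + bound) * (255.0 / (2 * bound))).astype(np.uint8)
        cv2.imwrite(path, img)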
Thanks for your open-source code! I have a problem with flow_vgg16. I train it with
python main_single_gpu.py /data/ltj/tsn -m flow -a flow_vgg16 --new_length=10 --epochs 350 --lr 0.001 --lr_steps 200 300
but the result of the last training epoch is only
* Prec@1 48.744 Prec@3 69.469
and I don't think I can reach 80% even when running temporal_demo.py. Can you give me some advice?
I changed only one line of your code, in the function change_key_name of flow_vgg16.py:
rgb_weight_mean = torch.mean(rgb_weight, dim=1, keepdim=True)
I just added keepdim=True, because without it the mean squeezes dim 1.
And I have another question about these lines in main_single_gpu.py:
clip_mean = [0.5, 0.5] * args.new_length
clip_std = [0.226, 0.226] * args.new_length
I want to know why clip_mean and clip_std need to be multiplied by new_length. Shouldn't the mean value be constant even though there are 10 samples?
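For reference, the multiplication is plausibly just tiling per-channel statistics: normalization is applied per channel, and a stacked flow clip has 2 * new_length channels, so the two-element mean/std lists are repeated to match. A minimal sketch (the values come from main_single_gpu.py; the rest is assumed):
# Why the mean/std lists are tiled: Normalize wants one value per channel.
import torch
import torchvision.transforms as transforms

new_length = 10
clip_mean = [0.5, 0.5] * new_length          # 20 entries, one per (x, y) channel
clip_std = [0.226, 0.226] * new_length

normalize = transforms.Normalize(mean=clip_mean, std=clip_std)
clip = torch.rand(2 * new_length, 224, 224)  # one stacked flow clip, not a batch
print(normalize(clip).shape)                 # torch.Size([20, 224, 224])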
Hi, I'm trying to fine-tune VGG16 as the motion model on the UCF101 dataset: I first stack 10 x-axis and 10 y-axis optical flow images into a 20-channel clip, then feed it to a VGG16 pretrained on ImageNet (changing in_channels from 3 to 20 and the last classifier layer to 101 outputs) for classification. I hit the same problem as #1.
But when I use a ResNet architecture, everything goes well. Could it be related to the preprocessing of the optical flow images? I don't normalize them.
My training acc@1 and test acc@1 are around 1% and don't go up.
I am running the cifar10_cnn.py file using this command and getting the following exception:
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python cifar10_cnn.py
Using TensorFlow backend.
Traceback (most recent call last):
File "cifar10_cnn.py", line 37, in
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
File "/home/taimoor/anaconda3/envs/my_env/lib/python3.6/site-packages/keras/datasets/cifar10.py", line 26, in load_data
data, labels = load_batch(fpath)
File "/home/taimoor/anaconda3/envs/my_env/lib/python3.6/site-packages/keras/datasets/cifar.py", line 18, in load_batch
f = open(fpath, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/taimoor/.keras/datasets/cifar-10-batches-py/data_batch_1'
The file is present in that directory. I tried giving the absolute path, but it still doesn't run; I get the same error.
I tried to get help from Google and tried many fixes found by searching the issue description, but it's not resolved.
I am stuck.
Hi, I was wondering whether the models can be trained together with one single loss instead of being trained separately.
I have seen other two-stream UCF code, but all of it seems to train the streams separately and then fuse by adding the output logits or something like that. Do you think it's possible to train both at once? Has this been done?
And if it has, are there any recommended hyperparameters? They seem to differ for each modality.
Thanks in advance!
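For what it's worth, a hedged sketch of one way to train both streams jointly with a single loss (late fusion of logits); the per-stream learning rates are assumptions, not values from this repo:
# A sketch, assuming rgb_model and flow_model are this repo's two networks.
import torch
import torch.nn as nn

class TwoStreamJoint(nn.Module):
    def __init__(self, rgb_model, flow_model):
        super().__init__()
        self.rgb = rgb_model
        self.flow = flow_model

    def forward(self, rgb_clip, flow_clip):
        # Late fusion: sum the logits, then one cross-entropy loss on the sum.
        return self.rgb(rgb_clip) + self.flow(flow_clip)

# model = TwoStreamJoint(rgb_model, flow_model).cuda()
# criterion = nn.CrossEntropyLoss()
# loss = criterion(model(rgb_batch, flow_batch), labels)
# Per-stream learning rates are common since the modalities differ, e.g.:
# optimizer = torch.optim.SGD([
#     {"params": model.rgb.parameters(), "lr": 1e-3},
#     {"params": model.flow.parameters(), "lr": 5e-3},
# ], momentum=0.9)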
Hello,
I noticed in the paper (https://arxiv.org/abs/1406.2199) that they ultimately use an SVM to fuse the two softmax scores computed by the two streams. Do you know how to use an SVM in this scenario?
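One plausible recipe with scikit-learn, hedged since the paper does not publish fusion code: treat each stream's softmax scores as features and train a linear SVM on their concatenation.
# Hypothetical sketch of SVM score fusion.
import numpy as np
from sklearn.svm import LinearSVC

# spatial_scores, temporal_scores: (num_videos, num_classes) softmax outputs
# labels: (num_videos,) ground-truth class indices
def svm_fuse(spatial_scores, temporal_scores, labels,
             test_spatial, test_temporal):
    train_feat = np.hstack([spatial_scores, temporal_scores])
    test_feat = np.hstack([test_spatial, test_temporal])
    clf = LinearSVC(C=1.0)          # C is an assumed, untuned value
    clf.fit(train_feat, labels)
    return clf.predict(test_feat)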
Hi. For flow-vgg you obtain 80%, while the performance reported in the paper you reproduce is 85.7%. What is the back-propagation depth in your experiments? It may be a factor affecting performance. Also, you didn't mention the setting details of VGG-16.
Hi, I just want to know whether batch norm affects the performance of rgb_vgg16, because it looks like you didn't use batch norm there:
def make_layers(cfg, batch_norm=False):
model = VGG(make_layers(cfg['D']), **kwargs)
I saw there is an rgb_vgg16_bn function, but it has no pretrained weights available, so how does it perform?
Excuse me, I want to know the speed of your project and the PyTorch version, and, if you don't mind, the other requirements.
Hi @bryanyzhu ,
As the title says, how is this possible using your pre-trained model?
Thank you.
Hi, how are you extracting frames from the videos?
I set the parameters as in the README but obtained 81.76% for the spatial stream on split 1 of the UCF101 dataset using ResNet152. How can I achieve better performance? Thanks.
Thank you for your code! How do I use an SVM to fuse the two-stream features? I have searched a lot but still can't complete the SVM feature fusion. Can you provide some code or some suggestions? Thank you!
Unable to run the source code; getting an error. I've tried many ways to solve it but couldn't.
I am using the latest Anaconda version, running this command and getting the error below:
conda install -c peterjc123 pytorch
pytorch-0.2.1- 100% |###############################| Time: 0:12:11 727.92 kB/s
pytorch-0.2.1- 100% |###############################| Time: 0:04:07 2.15 MB/s
pytorch-0.2.1- 100% |###############################| Time: 0:01:14 7.19 MB/s
CondaError: CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/peterjc123/win-64/pytorch-0.2.1-py36he6bf560_0.2.1cu80.tar.bz2
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
(the same CondaHTTPError repeats on every retry)
.....
I also disabled SSL verification in .condarc and tried again, but the same issue occurs again and again. I've been googling for the last three days but am stuck :(
Hello. I trained the resnet152 model on my own dataset. The training accuracy is quite good: Prec@1 in the train phase is 86%. But in the validation phase the accuracy is very low, and the gap between Prec@1 and Prec@3 is relatively large: 54% and 76% respectively. What is the reason for this? And what exactly do Prec@1 and Prec@3 represent? Looking forward to your reply, thank you!
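For reference, Prec@k is top-k accuracy: a sample counts as correct when the true label is among the k highest-scoring classes, so Prec@3 is always at least Prec@1. A minimal sketch of the computation (not the repo's exact accuracy() helper):
# Top-k accuracy over a batch of logits.
import torch

def topk_accuracy(output, target, ks=(1, 3)):
    maxk = max(ks)
    _, pred = output.topk(maxk, dim=1)       # (batch, maxk) class indices
    correct = pred.eq(target.view(-1, 1))    # (batch, maxk) hit mask
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]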
(1) When I train the flow model, it often runs out of memory during validation (at epoch 23 or 47, ...). Can you give me some advice? Thanks!
(2) How do I run it on multiple GPUs?
Log:
Current learning rate is 0.001000:
Epoch: [23][120/383] Time 10.661 (12.373) Loss 1.7992 (1.8488) Prec@1 47.200 (51.433)
Epoch: [23][240/383] Time 10.836 (12.372) Loss 1.7938 (1.8701) Prec@1 50.400 (50.050)
Epoch: [23][360/383] Time 10.346 (12.612) Loss 1.9167 (1.8574) Prec@1 52.000 (50.722)
main_single_gpu.py:282: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
input_var = torch.autograd.Variable(input, volatile=True)
main_single_gpu.py:283: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
target_var = torch.autograd.Variable(target, volatile=True)
main_single_gpu.py:291: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], input.size(0))
main_single_gpu.py:292: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top1.update(prec1[0], input.size(0))
main_single_gpu.py:293: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top3.update(prec3[0], input.size(0))
Test: [0/152] Time 18.244 (18.244) Loss 1.9410 (1.9410) Prec@1 44.000 (44.000) Prec@3 76.000 (76.000)
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main_single_gpu.py", line 362, in
main()
File "main_single_gpu.py", line 192, in main
prec1 = validate(val_loader, model, criterion)
File "main_single_gpu.py", line 286, in validate
output = model(input_var)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/media/ml/G/panna/two-stream-pytorch-master/models/flow_resnet.py", line 154, in forward
x = self.layer3(x)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/media/ml/G/panna/two-stream-pytorch-master/models/flow_resnet.py", line 87, in forward
out = self.conv3(out)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
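The UserWarnings in the log above point at a likely cause: on PyTorch >= 0.4, volatile=True is a no-op, so the validation pass keeps autograd state alive and memory grows until it runs out. A hedged sketch of the modern form of the loop (names assumed from main_single_gpu.py):
# Wrap validation in torch.no_grad() instead of Variable(..., volatile=True).
import torch

def validate(val_loader, model, criterion):
    model.eval()
    with torch.no_grad():                 # no autograd buffers are retained
        for input, target in val_loader:
            input, target = input.cuda(), target.cuda()
            output = model(input)
            loss = criterion(output, target)
            # use loss.item() instead of loss.data[0] on modern PyTorch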
Hi Yi,
How did you decide the LR steps? Did you follow another work, or find them by experimenting yourself?
Thanks in advance!
Hello, I want to know whether this repository can run in a different environment, such as CUDA 10 and Python 3.7 or higher.
Thank you for your code! Can you tell me which PyTorch and CUDA versions you used? I ran the code with pytorch==0.3.0 and CUDA 10.1, and after training for about 50 epochs this error occurred:
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/torch/lib/THC/THCBlas.cu:246
I want to use my own fall-detection dataset. Thanks.
I downloaded the pre-trained model (ucf101_s1_rgb_resnet152.pth.tar) you provided, but I can't extract the file. Could you please give me some advice? Thanks!
I am facing this error while installing dense_flow:
.
.
[ 57%] Building CXX object CMakeFiles/extract_gpu.dir/tools/extract_flow_gpu.cpp.o
[ 64%] Building CXX object CMakeFiles/extract_warp_gpu.dir/tools/extract_warp_flow_gpu.cpp.o
[ 71%] Building CXX object CMakeFiles/pydenseflow.dir/src/py_denseflow.cpp.o
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/extract_warp_gpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_warp_gpu.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/extract_cpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_cpu.dir/all] Error 2
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/extract_gpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_gpu.dir/all] Error 2
CMakeFiles/Makefile2:215: recipe for target 'CMakeFiles/pydenseflow.dir/all' failed
make[1]: *** [CMakeFiles/pydenseflow.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
How to solve this issue?
two-stream-pytorch/models/flow_resnet.py
Line 167 in 6b0b4ec
Hello bryanyzhu,
I'm reading your code, but I can't understand the effect of this function. Could you please explain it briefly?
Thanks
I have seen that in the test code, for horizontal flipping of the x component of the optical flow, the image is mirrored and then subtracted from 255. Why don't we just take the mirrored image instead? This is line 71 of VideoTemporalPrediction:
255 - img_x[:, ::-1]
Thanks in advance.
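For context, one hedged reading of that line: the x component is stored in [0, 255] with roughly 128 meaning zero motion, and mirroring a frame reverses its horizontal motion, so each value must also be negated about the midpoint, which is what v -> 255 - v does. A tiny demonstration:
# Mirroring alone flips pixel positions; 255 - v also flips motion direction.
import numpy as np

img_x = np.array([[160, 128, 96]], dtype=np.uint8)   # right, still, left
flipped = 255 - img_x[:, ::-1]
print(flipped)   # [[159 127  95]]: leftward motion is now rightward, etc.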
Hello, author. I have reproduced your code, but I want to use it to classify actions such as shaking hands and hugging while a video is being read in. How can I achieve this?
Hello. When I run main_single_gpu.py with RGB frames, I get the following error:
Traceback (most recent call last):
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 357, in <module>
main()
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 182, in main
train(train_loader, model, criterion, optimizer, epoch)
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 229, in train
output = model(input_var)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/models/rgb_vgg16.py", line 30, in forward
x = self.features(x)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 254, in forward
self.padding, self.dilation, self.groups)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 52, in conv2d
return f(input, weight, bias)
RuntimeError: Need input.size[1] == 3 but got 30 instead.
The input size is (2, 30, 224, 224): 10 frames of each video are fed to VGG, and the network can't handle that input.
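One hedged note: the spatial stream expects 3-channel input, so for RGB new_length is normally 1. If stacked RGB frames are really wanted, the first convolution has to be rebuilt; a sketch assuming a torchvision-style VGG, not the repo's own code:
# Inflate the first conv to accept 3 * new_length channels.
import torch.nn as nn

def inflate_first_conv(model, new_length):
    old = model.features[0]               # first conv of a torchvision VGG
    new = nn.Conv2d(3 * new_length, old.out_channels,
                    kernel_size=old.kernel_size, stride=old.stride,
                    padding=old.padding)
    # Repeat the pretrained RGB kernels across time and rescale to keep
    # the activation magnitude roughly unchanged.
    new.weight.data = old.weight.data.repeat(1, new_length, 1, 1) / new_length
    new.bias.data = old.bias.data.clone()
    model.features[0] = new
    return model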
Hi bryanyzhu,
Thanks for your open-source code!
The spatial and temporal streams are implemented. I want to know whether the fusion of these two streams has been implemented as well; I didn't find the relevant source code in the project.
Hi,
I want to download these two pre-trained models, but I can't get the weights from Google because of the GFW in China:
Pre-trained RGB_ResNet152 Model, Pre-trained Flow_ResNet152 Model.
How can I get these weights?
My email is [email protected].
Thank you very much!
@bryanyzhu
When I run: python build_of.py --src_dir ./UCF-101 --out_dir ./ucf101_frames --df_path
I get an error: build_of.py: error: unrecognized arguments, followed by the usage message:
build_of.py [-h] [--src_dir SRC_DIR] [--out_dir OUT_DIR]
[--df_path DF_PATH] [--new_width NEW_WIDTH]
[--new_height NEW_HEIGHT] [--num_worker NUM_WORKER]
[--num_gpu NUM_GPU] [--out_format {dir,zip}]
[--ext {avi,mp4}]
Can the model directly take a video as input for testing? How do I do that?
Hi,
May I ask what is the expected best accuracy I might get? So far I have achieved 73.64% validation accuracy on RGB, and I'm currently running the flow experiments.
Thank you
Hello! It's very nice of you to update main_single_gpu.py. I found you changed the parameter new_length from 10 to 1. Does new_length represent the number of frames fed into the network? When I change the value of new_length, I get an error: RuntimeError: Need input.size[1] == 3 but got xx instead.
Hi yuanjun,
Your work is very useful; thank you so much for open-sourcing it. I'm trying to run your code, and I want to know the meaning of the ssn_test.py result (the variable rst). Can you help me point it out in this project?
Thanks a lot
How is the data organized in the folders?
Hello! Thanks for your code; I have learned a lot from it. But as a beginner, I am a little confused about the code below in VideoSpatialPrediction.py:
rgb_1 = rgb[:224, :224, :, :]        # top-left crop
rgb_2 = rgb[:224, -224:, :, :]       # top-right crop
rgb_3 = rgb[16:240, 60:284, :, :]    # center crop
rgb_4 = rgb[-224:, :224, :, :]       # bottom-left crop
rgb_5 = rgb[-224:, -224:, :, :]      # bottom-right crop
rgb_f_1 = rgb_flip[:224, :224, :, :]     # the same five crops taken
rgb_f_2 = rgb_flip[:224, -224:, :, :]    # from the horizontally
rgb_f_3 = rgb_flip[16:240, 60:284, :, :] # flipped frames
rgb_f_4 = rgb_flip[-224:, :224, :, :]
rgb_f_5 = rgb_flip[-224:, -224:, :, :]
# stack all ten 224x224 views along the last (sample) axis
rgb = np.concatenate((rgb_1, rgb_2, rgb_3, rgb_4, rgb_5,
                      rgb_f_1, rgb_f_2, rgb_f_3, rgb_f_4, rgb_f_5), axis=3)
_, _, _, c = rgb.shape
Why is the code written like this? I don't understand why we should do this or what it means.
Looking forward to your answer!
Hello, I wonder how I can train this code with several GPUs. When I ran main_single_gpu.py, only one GPU started working. How do I solve this problem?
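A minimal sketch of the usual nn.DataParallel route; build_model here is a hypothetical stand-in for constructing the repo's network, and main_single_gpu.py is, as the name says, single-GPU:
# Wrap the network so each batch is split across the visible GPUs.
import torch
import torch.nn as nn

model = build_model()                      # hypothetical: your rgb/flow network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)         # scatters inputs, gathers outputs
model = model.cuda()
# Scale the batch size (and often the LR) with the number of GPUs.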
Thanks for sharing this repo; it is very helpful. Just a quick question about the pretrained temporal resnet152 model you have for download:
Thanks for the info!
Hi @bryanyzhu, thanks for your nice share!
Wang et al. [1] propose a method called cross-modality pre-training, which may improve the flow model's performance.
[1] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
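For reference, a hedged sketch of what cross-modality pre-training amounts to, similar in spirit to change_key_name in flow_vgg16.py: average the ImageNet-pretrained first-layer kernels over the RGB channel dimension and replicate them across the 2 * new_length flow channels.
# Convert pretrained RGB conv1 weights into flow conv1 weights.
import torch

def rgb_to_flow_weight(rgb_weight, new_length=10):
    # rgb_weight: (out_channels, 3, kH, kW) from an ImageNet-pretrained model
    mean = torch.mean(rgb_weight, dim=1, keepdim=True)   # (out, 1, kH, kW)
    return mean.repeat(1, 2 * new_length, 1, 1)          # (out, 2L, kH, kW)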
2017-09-21 04:19:56,066 - INFO - Building model ...
/home/ytan/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py:360: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
own_state[name].copy_(param)
2017-09-21 04:20:14,050 - INFO - Model flow_vgg16 is loaded.
2017-09-21 04:20:14,051 - INFO - Saving everything to directory ./checkpoints.
2017-09-21 04:20:14,100 - INFO - 13320 samples found, 9537 train samples and 3783 test samples.
2017-09-21 04:20:14,100 - INFO - 0.005
Could not load file datasets/JumpingJack/v_JumpingJack_g25_c04/flow_x_00029.jpg or datasets/JumpingJack/v_JumpingJack_g25_c04/flow_y_00029.jpg
Could not load file datasets/CliffDiving/v_CliffDiving_g17_c01/flow_x_00060.jpg or datasets/CliffDiving/v_CliffDiving_g17_c01/flow_y_00060.jpg
Could not load file datasets/RopeClimbing/v_RopeClimbing_g17_c05/flow_x_00074.jpg or datasets/RopeClimbing/v_RopeClimbing_g17_c05/flow_y_00074.jpg
Could not load file datasets/PlayingSitar/v_PlayingSitar_g24_c07/flow_x_00186.jpg or datasets/PlayingSitar/v_PlayingSitar_g24_c07/flow_y_00186.jpg
It just stops here and doesn't make any progress.
Hi, thank you very much for providing the code!
Could you tell me what accuracy you get on UCF101, and with what parameters?
Thanks again!
How do I test it in real time, or with any video as input, using VideoCapture?
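A hedged sketch of per-frame inference over OpenCV's VideoCapture; the preprocessing values and the loaded model are assumptions, not the repo's exact pipeline:
# Run the spatial model frame by frame on a webcam or video file.
import cv2
import torch
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

cap = cv2.VideoCapture(0)          # or a path like "video.mp4"
model.eval()                       # hypothetical: a loaded rgb_resnet152
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = transform(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).unsqueeze(0)
        pred = model(x).argmax(1).item()   # per-frame class index
        # average predictions over a window of frames for a stabler label
cap.release()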
If I want to obtain a new pre-trained Flow_VGG16 model on other video data, what should I do?
Hi @bryanyzhu,
I recently wrote a similar repo for two-stream video classification in PyTorch; last month I reached 85.5% spatial-frame accuracy on split 1 of UCF101. I'd like to cooperate with you and help build a more useful framework (implementing some well-known works such as TSN, C3D, and so on), so I want to communicate more with you if you are interested.