bryanyzhu / two-stream-pytorch
PyTorch implementation of two-stream networks for video action recognition
License: MIT License
Hi, may I know whether there is a demo to detect human actions in a short video? Not reporting the accuracy of the network, just the predictions, like jump, run, fall, dance. Thanks.
Hello @bryanyzhu ,
Thank you for the work.
In the paper you mentioned that you rescaled the optical flow components to [0, 255] before feeding them to the temporal network. I want to make sure whether this is also the case for the pretrained flow resnet152 model.
Thank you
Hi. The code works well with RGB frames after I made some changes. However, it runs into problems when training with flow images: the loss and Prec@1 seem to stay unchanged.
I ran this code on 4 GPUs with a batch size of 224. I set the learning rate to the initial LR (0.005), decayed by 10 every 30 epochs. new_length was set to 5 and in_channels was changed to 10 (5*2). The flow images were computed with OpenCV and saved as '.jpg':
flow = cv2.calcOpticalFlowFarneback(prevGray, nextGray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
flow_x = cv2.normalize(flow[..., 0], None, 0, 255, cv2.NORM_MINMAX)
flow_y = cv2.normalize(flow[..., 1], None, 0, 255, cv2.NORM_MINMAX)
One figure (not shown here) gave the results at the beginning of training, and another the results after 30 epochs.
I tried vgg16 and inception_v3 (both with pretrained models and training from scratch), and different initial LRs from 0.005 down to 0.0001. Same issue every time; it's weird. Does anyone have comments on this?
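For context, here is a hedged sketch of the usual way such flow images are saved (TSN-style encoding with a fixed motion bound); this is an assumption about the pipeline, not the repo's own extraction code. One easy-to-miss detail: cv2.normalize on a float input returns floats, and the cast to uint8 has to happen before writing the JPEG.
# Hypothetical sketch, assuming TSN-style flow encoding (zero motion -> 128).
import cv2
import numpy as np

def save_flow(prev_gray, next_gray, out_x, out_y, bound=20.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Clip to [-bound, bound], map linearly to [0, 255], and cast to uint8
    # BEFORE writing; imwrite on a float array quantizes unpredictably.
    for comp, path in ((flow[..., 0], out_x), (flow[..., 1], out_y)):
        comp = np.clip(comp, -bound, bound)
        img = np.round((comp + bound) * (255.0 / (2 * bound))).astype(np.uint8)
        cv2.imwrite(path, img)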
Thanks for your open-source code! I have a problem with flow_vgg16. I train it with
python main_single_gpu.py /data/ltj/tsn -m flow -a flow_vgg16 --new_length=10 --epochs 350 --lr 0.001 --lr_steps 200 300
but the result of the last training epoch is only
* Prec@1 48.744 Prec@3 69.469
and I don't think I can reach 80% even when running temporal_demo.py. Can you give me some advice?
I changed only one line of your code, in the function change_key_name of flow_vgg16.py:
rgb_weight_mean = torch.mean(rgb_weight, dim=1, keepdim=True)
I just added keepdim=True, because without it the mean squeezes dim 1.
And I have another question about these lines in main_single_gpu.py:
clip_mean = [0.5, 0.5] * args.new_length
clip_std = [0.226, 0.226] * args.new_length
I want to know why clip_mean and clip_std need to be multiplied by new_length. Shouldn't the mean value be constant even though there are 10 samples?
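For reference, the multiplication is plausibly just tiling per-channel statistics: normalization is applied per channel, and a stacked flow clip has 2 * new_length channels, so the two-element mean/std lists are repeated to match. A minimal sketch (the values come from main_single_gpu.py; the rest is assumed):
# Why the mean/std lists are tiled: Normalize wants one value per channel.
import torch
import torchvision.transforms as transforms

new_length = 10
clip_mean = [0.5, 0.5] * new_length          # 20 entries, one per (x, y) channel
clip_std = [0.226, 0.226] * new_length

normalize = transforms.Normalize(mean=clip_mean, std=clip_std)
clip = torch.rand(2 * new_length, 224, 224)  # one stacked flow clip, not a batch
print(normalize(clip).shape)                 # torch.Size([20, 224, 224])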
Hi, I'm trying to fine-tune VGG16 as the motion model on the UCF101 dataset: I first stack 10 x-axis and 10 y-axis optical flow images into a 20-channel clip, then feed it to a VGG16 pretrained on ImageNet (changing in_channels from 3 to 20 and the last classifier layer to 101 outputs) for classification. I hit the same problem as #1.
But when I use a ResNet architecture, everything goes well. Could it be related to the preprocessing of the optical flow images? I don't normalize them.
My training acc@1 and test acc@1 are around 1% and don't go up.
I am running the cifar10_cnn.py file using this command and getting the following exception:
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python cifar10_cnn.py
Using TensorFlow backend.
Traceback (most recent call last):
File "cifar10_cnn.py", line 37, in
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
File "/home/taimoor/anaconda3/envs/my_env/lib/python3.6/site-packages/keras/datasets/cifar10.py", line 26, in load_data
data, labels = load_batch(fpath)
File "/home/taimoor/anaconda3/envs/my_env/lib/python3.6/site-packages/keras/datasets/cifar.py", line 18, in load_batch
f = open(fpath, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/taimoor/.keras/datasets/cifar-10-batches-py/data_batch_1'
The file is present in that directory. I tried giving the absolute path, but it still doesn't run; I get the same error.
I tried to get help from Google and tried many fixes found by searching the issue description, but it's not resolved.
I am stuck.
Hi, I was wondering whether the models can be trained together with one single loss instead of being trained separately.
I have seen other two-stream UCF code, but all of it seems to train the streams separately and then fuse by adding the output logits or something like that. Do you think it's possible to train both at once? Has this been done?
And if it has, are there any recommended hyperparameters? They seem to differ for each modality.
Thanks in advance!
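For what it's worth, a hedged sketch of one way to train both streams jointly with a single loss (late fusion of logits); the per-stream learning rates are assumptions, not values from this repo:
# A sketch, assuming rgb_model and flow_model are this repo's two networks.
import torch
import torch.nn as nn

class TwoStreamJoint(nn.Module):
    def __init__(self, rgb_model, flow_model):
        super().__init__()
        self.rgb = rgb_model
        self.flow = flow_model

    def forward(self, rgb_clip, flow_clip):
        # Late fusion: sum the logits, then one cross-entropy loss on the sum.
        return self.rgb(rgb_clip) + self.flow(flow_clip)

# model = TwoStreamJoint(rgb_model, flow_model).cuda()
# criterion = nn.CrossEntropyLoss()
# loss = criterion(model(rgb_batch, flow_batch), labels)
# Per-stream learning rates are common since the modalities differ, e.g.:
# optimizer = torch.optim.SGD([
#     {"params": model.rgb.parameters(), "lr": 1e-3},
#     {"params": model.flow.parameters(), "lr": 5e-3},
# ], momentum=0.9)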
Hello,
I noticed in the paper (https://arxiv.org/abs/1406.2199) that they ultimately use an SVM to fuse the two softmax scores computed by the two streams. Do you know how to use an SVM in this scenario?
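One plausible recipe with scikit-learn, hedged since the paper does not publish fusion code: treat each stream's softmax scores as features and train a linear SVM on their concatenation.
# Hypothetical sketch of SVM score fusion.
import numpy as np
from sklearn.svm import LinearSVC

# spatial_scores, temporal_scores: (num_videos, num_classes) softmax outputs
# labels: (num_videos,) ground-truth class indices
def svm_fuse(spatial_scores, temporal_scores, labels,
             test_spatial, test_temporal):
    train_feat = np.hstack([spatial_scores, temporal_scores])
    test_feat = np.hstack([test_spatial, test_temporal])
    clf = LinearSVC(C=1.0)          # C is an assumed, untuned value
    clf.fit(train_feat, labels)
    return clf.predict(test_feat)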
Hi. For flow-vgg you obtain 80%, while the performance reported in the paper you reproduce is 85.7%. What is the back-propagation depth in your experiments? It may be a factor affecting performance. Also, you didn't mention the setting details of VGG-16.
Hi, I just want to know whether batch norm affects the performance of rgb_vgg16, because it looks like you didn't use batch norm there:
def make_layers(cfg, batch_norm=False):
model = VGG(make_layers(cfg['D']), **kwargs)
I saw there is an rgb_vgg16_bn function, but it has no pretrained weights available, so how does it perform?
Excuse me, I want to know the speed of your project and the PyTorch version, and, if you don't mind, the other requirements.
Hi @bryanyzhu ,
As the title says, how is this possible using your pre-trained model?
Thank you.
Hi, how are you extracting frames from the videos?
I set the parameters as in the README but obtained 81.76% for the spatial stream on split 1 of the UCF101 dataset using ResNet152. How can I achieve better performance? Thanks.
Thank you for your code! How do I use an SVM to fuse the two-stream features? I have searched a lot but still can't complete the SVM feature fusion. Can you provide some code or some suggestions? Thank you!
Unable to run the source code; getting an error. I've tried many ways to solve it but couldn't.
I am using the latest Anaconda version, running this command and getting the error below:
conda install -c peterjc123 pytorch
pytorch-0.2.1- 100% |###############################| Time: 0:12:11 727.92 kB/s
pytorch-0.2.1- 100% |###############################| Time: 0:04:07 2.15 MB/s
pytorch-0.2.1- 100% |###############################| Time: 0:01:14 7.19 MB/s
CondaError: CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/peterjc123/win-64/pytorch-0.2.1-py36he6bf560_0.2.1cu80.tar.bz2
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
(the same CondaHTTPError repeats on every retry)
.....
I also disabled SSL verification in .condarc and tried again, but the same issue occurs again and again. I've been googling for the last three days but am stuck :(
Hello. I trained the resnet152 model on my own dataset. The training accuracy is quite good: Prec@1 in the train phase is 86%. But in the validation phase the accuracy is very low, and the gap between Prec@1 and Prec@3 is relatively large: 54% and 76% respectively. What is the reason for this? And what exactly do Prec@1 and Prec@3 represent? Looking forward to your reply, thank you!
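For reference, Prec@k is top-k accuracy: a sample counts as correct when the true label is among the k highest-scoring classes, so Prec@3 is always at least Prec@1. A minimal sketch of the computation (not the repo's exact accuracy() helper):
# Top-k accuracy over a batch of logits.
import torch

def topk_accuracy(output, target, ks=(1, 3)):
    maxk = max(ks)
    _, pred = output.topk(maxk, dim=1)       # (batch, maxk) class indices
    correct = pred.eq(target.view(-1, 1))    # (batch, maxk) hit mask
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]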
(1) When I train the flow model, it often runs out of memory during validation (at epoch 23 or 47, ...). Can you give me some advice? Thanks!
(2) How do I run it on multiple GPUs?
Log:
Current learning rate is 0.001000:
Epoch: [23][120/383] Time 10.661 (12.373) Loss 1.7992 (1.8488) Prec@1 47.200 (51.433)
Epoch: [23][240/383] Time 10.836 (12.372) Loss 1.7938 (1.8701) Prec@1 50.400 (50.050)
Epoch: [23][360/383] Time 10.346 (12.612) Loss 1.9167 (1.8574) Prec@1 52.000 (50.722)
main_single_gpu.py:282: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
input_var = torch.autograd.Variable(input, volatile=True)
main_single_gpu.py:283: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad():
instead.
target_var = torch.autograd.Variable(target, volatile=True)
main_single_gpu.py:291: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], input.size(0))
main_single_gpu.py:292: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top1.update(prec1[0], input.size(0))
main_single_gpu.py:293: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top3.update(prec3[0], input.size(0))
Test: [0/152] Time 18.244 (18.244) Loss 1.9410 (1.9410) Prec@1 44.000 (44.000) Prec@3 76.000 (76.000)
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main_single_gpu.py", line 362, in
main()
File "main_single_gpu.py", line 192, in main
prec1 = validate(val_loader, model, criterion)
File "main_single_gpu.py", line 286, in validate
output = model(input_var)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/media/ml/G/panna/two-stream-pytorch-master/models/flow_resnet.py", line 154, in forward
x = self.layer3(x)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/media/ml/G/panna/two-stream-pytorch-master/models/flow_resnet.py", line 87, in forward
out = self.conv3(out)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/pn-pytorch/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
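The UserWarnings in the log above point at a likely cause: on PyTorch >= 0.4, volatile=True is a no-op, so the validation pass keeps autograd state alive and memory grows until it runs out. A hedged sketch of the modern form of the loop (names assumed from main_single_gpu.py):
# Wrap validation in torch.no_grad() instead of Variable(..., volatile=True).
import torch

def validate(val_loader, model, criterion):
    model.eval()
    with torch.no_grad():                 # no autograd buffers are retained
        for input, target in val_loader:
            input, target = input.cuda(), target.cuda()
            output = model(input)
            loss = criterion(output, target)
            # use loss.item() instead of loss.data[0] on modern PyTorch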
Hi Yi,
How did you decide the LR steps? Did you follow another work, or find them by experimenting yourself?
Thanks in advance!
Hello, I want to know whether this repository can run in a different environment, such as CUDA 10 and Python 3.7 or higher.
Thank you for your code! Can you tell me which PyTorch and CUDA versions you used? I ran the code with pytorch==0.3.0 and CUDA 10.1, and after training for about 50 epochs this error occurred:
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/torch/lib/THC/THCBlas.cu:246
I want to use my own fall-detection dataset. Thanks.
I downloaded the pre-trained model (ucf101_s1_rgb_resnet152.pth.tar) you provided, but I can't extract the file. Could you please give me some advice? Thanks!
I am facing this error while installing dense_flow:
.
.
[ 57%] Building CXX object CMakeFiles/extract_gpu.dir/tools/extract_flow_gpu.cpp.o
[ 64%] Building CXX object CMakeFiles/extract_warp_gpu.dir/tools/extract_warp_flow_gpu.cpp.o
[ 71%] Building CXX object CMakeFiles/pydenseflow.dir/src/py_denseflow.cpp.o
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/extract_warp_gpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_warp_gpu.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/extract_cpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_cpu.dir/all] Error 2
CMakeFiles/Makefile2:178: recipe for target 'CMakeFiles/extract_gpu.dir/all' failed
make[1]: *** [CMakeFiles/extract_gpu.dir/all] Error 2
CMakeFiles/Makefile2:215: recipe for target 'CMakeFiles/pydenseflow.dir/all' failed
make[1]: *** [CMakeFiles/pydenseflow.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
How to solve this issue?
two-stream-pytorch/models/flow_resnet.py
Line 167 in 6b0b4ec
Hello bryanyzhu,
I'm reading your code, but I can't understand the effect of this function. Could you please explain it briefly?
Thanks
I have seen that in the test code, for horizontal flipping of the x component of the optical flow, the image is mirrored and then subtracted from 255. Why don't we just take the mirrored image instead? This is line 71 of VideoTemporalPrediction:
255 - img_x[:, ::-1]
Thanks in advance.
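For context, one hedged reading of that line: the x component is stored in [0, 255] with roughly 128 meaning zero motion, and mirroring a frame reverses its horizontal motion, so each value must also be negated about the midpoint, which is what v -> 255 - v does. A tiny demonstration:
# Mirroring alone flips pixel positions; 255 - v also flips motion direction.
import numpy as np

img_x = np.array([[160, 128, 96]], dtype=np.uint8)   # right, still, left
flipped = 255 - img_x[:, ::-1]
print(flipped)   # [[159 127  95]]: leftward motion is now rightward, etc.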
Hello, author. I have reproduced your code, but I want to use it to classify actions such as shaking hands and hugging while a video is being read in. How can I achieve this?
Hello. When I run main_single_gpu.py with RGB frames, I get the following error:
Traceback (most recent call last):
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 357, in <module>
main()
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 182, in main
train(train_loader, model, criterion, optimizer, epoch)
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/main_single_gpu.py", line 229, in train
output = model(input_var)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/Desktop/learn/reference/two-stream-pytorch/models/rgb_vgg16.py", line 30, in forward
x = self.features(x)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 254, in forward
self.padding, self.dilation, self.groups)
File "/home/zxf/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 52, in conv2d
return f(input, weight, bias)
RuntimeError: Need input.size[1] == 3 but got 30 instead.
The input size is (2, 30, 224, 224): 10 frames of each video are fed to VGG, and the network can't handle that input.
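One hedged note: the spatial stream expects 3-channel input, so for RGB new_length is normally 1. If stacked RGB frames are really wanted, the first convolution has to be rebuilt; a sketch assuming a torchvision-style VGG, not the repo's own code:
# Inflate the first conv to accept 3 * new_length channels.
import torch.nn as nn

def inflate_first_conv(model, new_length):
    old = model.features[0]               # first conv of a torchvision VGG
    new = nn.Conv2d(3 * new_length, old.out_channels,
                    kernel_size=old.kernel_size, stride=old.stride,
                    padding=old.padding)
    # Repeat the pretrained RGB kernels across time and rescale to keep
    # the activation magnitude roughly unchanged.
    new.weight.data = old.weight.data.repeat(1, new_length, 1, 1) / new_length
    new.bias.data = old.bias.data.clone()
    model.features[0] = new
    return model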
Hi bryanyzhu,
Thanks for your open-source code!
The spatial and temporal streams are implemented. I want to know whether the fusion of these two streams has been implemented as well; I didn't find the relevant source code in the project.
Hi,
I want to download these two pre-trained models, but I can't get the weights from Google because of the GFW in China:
Pre-trained RGB_ResNet152 Model, Pre-trained Flow_ResNet152 Model.
How can I get these weights?
My email is [email protected].
Thank you very much!
@bryanyzhu
When I run: python build_of.py --src_dir ./UCF-101 --out_dir ./ucf101_frames --df_path
I get an error: build_of.py: error: unrecognized arguments, followed by the usage message:
build_of.py [-h] [--src_dir SRC_DIR] [--out_dir OUT_DIR]
[--df_path DF_PATH] [--new_width NEW_WIDTH]
[--new_height NEW_HEIGHT] [--num_worker NUM_WORKER]
[--num_gpu NUM_GPU] [--out_format {dir,zip}]
[--ext {avi,mp4}]
Can the model directly take a video as input for testing? How do I do that?
Hi,
May I ask what is the expected best accuracy I might get? So far I have achieved 73.64% validation accuracy on RGB, and I'm currently running the flow experiments.
Thank you
Hello! It's very nice of you to update main_single_gpu.py. I found you changed the parameter new_length from 10 to 1. Does new_length represent the number of frames fed into the network? When I change the value of new_length, I get an error: RuntimeError: Need input.size[1] == 3 but got xx instead.
Hi yuanjun,
Your work is very useful; thank you so much for open-sourcing it. I'm trying to run your code, and I want to know the meaning of the ssn_test.py result (the variable rst). Can you help me point it out in this project?
Thanks a lot
How is the data organized in the folders?
Hello! Thanks for your code; I have learned a lot from it. But as a beginner, I am a little confused about the code below in VideoSpatialPrediction.py:
rgb_1 = rgb[:224, :224, :, :]        # top-left crop
rgb_2 = rgb[:224, -224:, :, :]       # top-right crop
rgb_3 = rgb[16:240, 60:284, :, :]    # center crop
rgb_4 = rgb[-224:, :224, :, :]       # bottom-left crop
rgb_5 = rgb[-224:, -224:, :, :]      # bottom-right crop
rgb_f_1 = rgb_flip[:224, :224, :, :]     # the same five crops taken
rgb_f_2 = rgb_flip[:224, -224:, :, :]    # from the horizontally
rgb_f_3 = rgb_flip[16:240, 60:284, :, :] # flipped frames
rgb_f_4 = rgb_flip[-224:, :224, :, :]
rgb_f_5 = rgb_flip[-224:, -224:, :, :]
# stack all ten 224x224 views along the last (sample) axis
rgb = np.concatenate((rgb_1, rgb_2, rgb_3, rgb_4, rgb_5,
                      rgb_f_1, rgb_f_2, rgb_f_3, rgb_f_4, rgb_f_5), axis=3)
_, _, _, c = rgb.shape
Why is the code written like this? I don't understand why we should do this or what it means.
Looking forward to your answer!
Hello, I wonder how I can train this code with several GPUs. When I ran main_single_gpu.py, only one GPU started working. How do I solve this problem?
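A minimal sketch of the usual nn.DataParallel route; build_model here is a hypothetical stand-in for constructing the repo's network, and main_single_gpu.py is, as the name says, single-GPU:
# Wrap the network so each batch is split across the visible GPUs.
import torch
import torch.nn as nn

model = build_model()                      # hypothetical: your rgb/flow network
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)         # scatters inputs, gathers outputs
model = model.cuda()
# Scale the batch size (and often the LR) with the number of GPUs.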
Thanks for sharing this repo; it is very helpful. Just a quick question about the pretrained temporal resnet152 model you have for download:
Thanks for the info!
Hi @bryanyzhu, thanks for your nice share!
Wang et al. [1] propose a method called cross-modality pre-training, which may improve the flow model's performance.
[1] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
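For reference, a hedged sketch of what cross-modality pre-training amounts to, similar in spirit to change_key_name in flow_vgg16.py: average the ImageNet-pretrained first-layer kernels over the RGB channel dimension and replicate them across the 2 * new_length flow channels.
# Convert pretrained RGB conv1 weights into flow conv1 weights.
import torch

def rgb_to_flow_weight(rgb_weight, new_length=10):
    # rgb_weight: (out_channels, 3, kH, kW) from an ImageNet-pretrained model
    mean = torch.mean(rgb_weight, dim=1, keepdim=True)   # (out, 1, kH, kW)
    return mean.repeat(1, 2 * new_length, 1, 1)          # (out, 2L, kH, kW)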
2017-09-21 04:19:56,066 - INFO - Building model ...
/home/ytan/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py:360: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
own_state[name].copy_(param)
2017-09-21 04:20:14,050 - INFO - Model flow_vgg16 is loaded.
2017-09-21 04:20:14,051 - INFO - Saving everything to directory ./checkpoints.
2017-09-21 04:20:14,100 - INFO - 13320 samples found, 9537 train samples and 3783 test samples.
2017-09-21 04:20:14,100 - INFO - 0.005
Could not load file datasets/JumpingJack/v_JumpingJack_g25_c04/flow_x_00029.jpg or datasets/JumpingJack/v_JumpingJack_g25_c04/flow_y_00029.jpg
Could not load file datasets/CliffDiving/v_CliffDiving_g17_c01/flow_x_00060.jpg or datasets/CliffDiving/v_CliffDiving_g17_c01/flow_y_00060.jpg
Could not load file datasets/RopeClimbing/v_RopeClimbing_g17_c05/flow_x_00074.jpg or datasets/RopeClimbing/v_RopeClimbing_g17_c05/flow_y_00074.jpg
Could not load file datasets/PlayingSitar/v_PlayingSitar_g24_c07/flow_x_00186.jpg or datasets/PlayingSitar/v_PlayingSitar_g24_c07/flow_y_00186.jpg
It just stops here and doesn't make any progress.
Hi, thank you very much for providing the code!
Could you tell me what accuracy you get on UCF101, and with what parameters?
Thanks again!
How do I test it in real time, or with any video as input, using VideoCapture?
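A hedged sketch of per-frame inference over OpenCV's VideoCapture; the preprocessing values and the loaded model are assumptions, not the repo's exact pipeline:
# Run the spatial model frame by frame on a webcam or video file.
import cv2
import torch
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

cap = cv2.VideoCapture(0)          # or a path like "video.mp4"
model.eval()                       # hypothetical: a loaded rgb_resnet152
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = transform(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).unsqueeze(0)
        pred = model(x).argmax(1).item()   # per-frame class index
        # average predictions over a window of frames for a stabler label
cap.release()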
If I want to obtain a new pre-trained Flow_VGG16 model on other video data, what should I do?
Hi @bryanyzhu,
I recently wrote a similar repo for two-stream video classification in PyTorch; last month I reached 85.5% spatial-frame accuracy on split 1 of UCF101. I'd like to cooperate with you and help build a more useful framework (implementing some well-known works such as TSN, C3D, and so on), so I want to communicate more with you if you are interested.