chaoyuaw / pytorch-coviar

Compressed Video Action Recognition

Home Page: https://www.cs.utexas.edu/~cywu/projects/coviar/

License: GNU Lesser General Public License v2.1

Python 61.86% Shell 2.39% C 35.75%

pytorch-coviar's Introduction

Compressed Video Action Recognition

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018.

Overview

This is a reimplementation of CoViAR in PyTorch (the original paper uses MXNet). This code currently supports UCF-101 and HMDB-51; Charades coming soon. (This is a work in progress. Any suggestions are appreciated.)

Results

This code produces comparable or better results than the original paper:
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
(Average of 3 splits; without optical flow.)

Data loader

We provide a Python data loader that directly takes a compressed video and returns its compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. We can thus train the model without extracting and storing all representations as image files.

In our experiments, it is fast enough that it does not slow down GPU training. Please see GETTING_STARTED.md for details and instructions.

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation largely borrows from tsn-pytorch by yjxiong. Part of the data loader implementation is modified from this tutorial and the FFmpeg extract_mv example.

pytorch-coviar's People

Contributors

Stargazers

Watchers

Forkers

zaie yangke13 dsp6414 yurkovanton manzilzaheer spxtrm jiarongqiu prpankajsingh ml-lab hzhang57 5kejun kekedan baiyancheng20 lolz0r zhengshou bityangke sunnyxiaohu manolo1988 gq124 wangshicr7 kaihemo aimeng100 ericwangyz jgyoung33 shubhampachori12110095 guocode nemonameless back2yes panna19951227 156aasdfg sufeidechabei tecmry vasusharma sadjadasghari salt-fly zhangfuhan klqulei jamiejackherer andrew-zhu joheny mxguo markov2016 hyzcn alex000kim etrigger zwcheng luojianp infinite-song cong-wu baek85 jiazewang asankagp spillai stjordanis chen8023 lollllcat michaelshiyu zyl19930813 threechen liyihan7 yuqihuo orm011 maruidear varun-karandikar30 sysuzyq felipecode videodnn emckwon daren996 wshenx caoliangjie nikhilperi q121q twilight-x sandra1985 iridescentz shaohuilin zyx1996 stevenaaaya linhduongtuan guoyangchen ndujar tbh8088 fengfan028 twilightzcx qsharpsword zxf864823150 ys-run azakhtyamov faustpy 1085437068 winniewu1998 dawnchou zherlock030 simonsxin edmontdants jinwooklim wenhuach coallaoh qianqianowo

pytorch-coviar's Issues

Multiple GPUs

Thanks for providing your code. During training, the GPU IDs were set to 0 and 1, but those GPUs were occupied on my server, so I changed the IDs to 1 and 2. Then an error occurred: "all tensors must be on devices[0]". I then added os.environ['CUDA_VISIBLE_DEVICES'] = "3,6", but it didn't work. How can I solve this problem?
Thanks for reading.
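A minimal sketch of the usual workaround, assuming the training script wraps the model in torch.nn.DataParallel. CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is first initialized (safest: before importing torch), and the visible GPUs are then renumbered from 0. So after exposing physical GPUs 3 and 6, the script must use device IDs 0 and 1, not 3 and 6; an error like "all tensors must be on devices[0]" typically means the model is not on the first ID in that list. The helper name restrict_gpus is hypothetical.

```python
import os


def restrict_gpus(physical_ids):
    """Expose only the given physical GPUs and return the IDs to use in-process.

    Call this before torch initializes CUDA (ideally before `import torch`).
    Afterwards the visible devices are renumbered cuda:0, cuda:1, ...
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    # Inside the process, the first visible GPU is device 0, the second is device 1, etc.
    return list(range(len(physical_ids)))
```

Usage would look like `device_ids = restrict_gpus([3, 6])`, then `import torch` and `torch.nn.DataParallel(model, device_ids=device_ids).cuda()`, so the model lands on cuda:0, which is physical GPU 3.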

How to obtain the residual information?

According to the code, the residual information is obtained by subtracting two frames. Can it be extracted directly from the compressed video instead?

What's GOP_SIZE=12?

Thanks for your nice work. What does GOP_SIZE=12 on line 20 of dataset.py mean?
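GOP stands for group of pictures. With GOP_SIZE = 12 and no B-frames, each group is one I-frame followed by 11 P-frames; 12 is also FFmpeg's default GOP size, which is presumably why the value is fixed. Below is a minimal sketch of how a global frame index can be mapped to the (GOP index, position within GOP) pair that a GOP-based loader needs. The function name and the rule of moving position 0 to the previous GOP for motion vectors and residuals are assumptions for illustration, not dataset.py's exact code.

```python
GOP_SIZE = 12  # frames per group of pictures: 1 I-frame + 11 P-frames


def gop_position(frame_idx, representation, gop_size=GOP_SIZE):
    """Map a global frame index to (gop_index, position_within_gop).

    Hypothetical helper. I-frames always sit at position 0 of their GOP.
    Motion vectors and residuals do not exist for an I-frame, so position 0
    is moved to the last P-frame of the previous GOP (an assumed convention).
    Frame 0 (the very first I-frame) has no motion and should not be sampled.
    """
    gop_index, pos = divmod(frame_idx, gop_size)
    if representation == "iframe":
        return gop_index, 0
    if pos == 0 and gop_index > 0:
        return gop_index - 1, gop_size - 1
    return gop_index, pos
```

For example, frame 25 lies in GOP 2 at position 1, and the motion vectors of frame 24 (an I-frame position) are taken from position 11 of GOP 1.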

ModuleNotFoundError: No module named 'coviar'

I executed sudo ./install.sh and got the following result:

Did it succeed? When I then tried the training command, I got the following error:

Does this mean that ./install.sh did not run successfully? I am a beginner; thank you very much for your guidance.

About counting frames

I find that the value returned by get_num_frames is 1 less than the actual video length. I created a video from 100 frames but only get 99 decoded frames; the missing frame is the last one. Can you help me out? Thanks very much!

codecs

When will you support other codecs? Thanks.

How to visualize the motion vector information

Assertion Error

@manzilzaheer @chaoyuaw

Hi,

Please check this error and tell me its possible cause and solution. I ran this on a server with 4 GPUs; a screenshot is attached.

numpy issue during ./install.sh

Thank you for your lovely paper. We are trying to get your code to work. We are using an Amazon AMI, have compiled FFmpeg, and are using Python 3 and PyTorch. When we run ./install.sh with the FFmpeg path, we get the following GCC error. Any idea how to resolve this? It mentions a deprecated NumPy API. Which version of NumPy was used?

We also followed the steps in other issues (#6 and #5), but the error still persists.

something wrong with ./install.sh

Hello! When I execute install.sh, I run into some problems.
Code:

    static struct PyModuleDef coviarmodule = {
        PyModuleDef_HEAD_INIT,
        "coviar",  /* name of module */
        NULL,      /* module documentation, may be NULL */
        -1,        /* size of per-interpreter state of the module,
                      or -1 if the module keeps state in global variables. */
        CoviarMethods
    };

The errors are:

    error: variable 'coviarmodule' has initializer but incomplete type
        static struct PyModuleDef coviarmodule = {
    error: variable 'PyModuleDef_HEAD_INIT' undeclared here (not in a function)
        PyModuleDef_HEAD_INIT,

How can I solve this problem?
Thanks!
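One likely cause, not confirmed in this thread, is that the extension is being compiled against Python 2 headers. PyModuleDef and PyModuleDef_HEAD_INIT exist only in the Python 3 C API, and under sudo a bare `python` may resolve to a system Python 2. The sketch below reports which interpreter and header directory a build run with the same command would use. It is deliberately Python 2 compatible, since Python 2 is exactly what it tries to detect. The function name is hypothetical.

```python
import sys
import sysconfig


def build_environment_report():
    """Report the interpreter and C header directory a setup.py run would use."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # Directory containing Python.h; PyModuleDef is declared only in Python 3 headers.
        "include_dir": sysconfig.get_paths()["include"],
        "has_pymoduledef_api": sys.version_info[0] >= 3,
    }


if __name__ == "__main__":
    for key, value in build_environment_report().items():
        print("%s: %s" % (key, value))
```

Run it with the same interpreter and privileges that install.sh uses (for example under sudo). If it reports version 2.x, pointing the build at python3 is the first thing to try.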

Data_loader

Hi,
I am facing this error while doing the installation steps in the data_loader folder.

Please guide me.

How do I "Modify setup.py to use your FFmpeg path (${FFMPEG_INSTALL_PATH})"?
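Below is a minimal sketch, assuming FFmpeg was installed under a prefix such as ${FFMPEG_INSTALL_PATH} with the usual include/ and lib/ subdirectories. The gcc command quoted in the Windows build issue on this page shows the extension linking avutil, avcodec, avformat and swscale from ./ffmpeg/lib/. The hypothetical helper below builds the matching keyword arguments for a setuptools/distutils Extension. On Linux, runtime_library_dirs embeds an rpath, which may also prevent "cannot open shared object file" errors when the module is imported. Any other include directories the original setup.py already lists (for example NumPy's headers) still need to be kept.

```python
import os

FFMPEG_LIBS = ["avutil", "avcodec", "avformat", "swscale"]


def ffmpeg_extension_kwargs(prefix=None):
    """Build Extension keyword arguments pointing at an FFmpeg install prefix.

    Hypothetical helper: uses FFMPEG_INSTALL_PATH when no prefix is given and
    falls back to ./ffmpeg, the relative location the stock paths refer to.
    """
    prefix = os.path.abspath(prefix or os.environ.get("FFMPEG_INSTALL_PATH", "./ffmpeg"))
    lib_dir = os.path.join(prefix, "lib")
    return {
        "include_dirs": [os.path.join(prefix, "include")],
        "library_dirs": [lib_dir],
        "libraries": list(FFMPEG_LIBS),
        # rpath so the dynamic loader finds libavutil.so.* etc. at import time (Linux).
        "runtime_library_dirs": [lib_dir],
    }
```

In setup.py this would be used as `Extension("coviar", sources=["coviar_data_loader.c"], **ffmpeg_extension_kwargs())`, merging in any include directories the existing file already passes.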

from coviar import load

Hi, in the instructions I'm at this part: "from coviar import load".
Where should I put this command? In my own program?
And what is "coviar"? There is no such file; what am I missing?

Sorry for my ignorance, and thanks for your help!
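`coviar` is not a Python file in the repository. It is the C extension module that install.sh builds in data_loader/ and installs into site-packages, so once the build succeeds, `from coviar import load` goes in your own Python code. Below is a minimal sketch, assuming the call load(video_path, gop_index, frame_index, representation, accumulate) with representation codes 0 (I-frame), 1 (motion vector) and 2 (residual) as described in GETTING_STARTED.md; check that file for the authoritative interface. The wrapper name and the lazy import are hypothetical.

```python
import importlib

REPRESENTATION_CODES = {"iframe": 0, "mv": 1, "residual": 2}


def load_compressed(video_path, gop_index, frame_index, representation, accumulate=False):
    """Thin wrapper around coviar.load (hypothetical helper).

    Imports the compiled extension lazily so that a missing build fails
    with a clear message instead of a bare ModuleNotFoundError.
    """
    try:
        coviar = importlib.import_module("coviar")
    except ImportError as exc:
        raise ImportError(
            "coviar extension not found: run install.sh in data_loader/ first"
        ) from exc
    code = REPRESENTATION_CODES[representation]
    return coviar.load(video_path, gop_index, frame_index, code, accumulate)
```

Under these assumptions, `load_compressed("video.mp4", 2, 5, "mv")` would return the motion vectors at position 5 (0-based) of the third GOP.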

Accumulated Representation.

Testing

training error

Hi, I ran into several problems during training.

ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
[jpegls @ 0x562d3c3db340] Found EOI before any SOF, ignoring
[jpegls @ 0x562d3c3db340] unable to decode APP fields: Invalid data found when processing input
[jpegls @ 0x562d3c3db340] unable to decode APP fields: Invalid data found when processing input
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
[jpegls @ 0x562d3c3db340] mjpeg: unsupported coding type (c5)
[jpegls @ 0x562d3c3db340] dqt: invalid precision
Decode Error.
Decoding video failed.
Error: loading video data/hmdb51/mpeg4_videos/stand/Man_Who_Cheated_Himself_512kb_stand_u_cm_np1_fr_med_11.mp4 failed.

Question about pre-processing mv and res

Hi, thanks for your good work. I have some questions about the mv and residual normalization; could you please explain the code below in more detail?

    if self._representation == 'mv':
        img = clip_and_scale(img, 20)  # Why use size=20?
        img += 128
        img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)
    elif self._representation == 'residual':
        img += 128
        img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)

    if self._representation == 'iframe':
        input = (input - self._input_mean) / self._input_std
    elif self._representation == 'residual':
        input = (input - 0.5) / self._input_std  # Why 0.5?
    elif self._representation == 'mv':
        input = (input - 0.5)
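The arithmetic behind these lines can be checked directly. clip_and_scale(img, 20), whose definition img * (127.5 / size) is quoted in a later issue on this page, maps motion vectors in [-20, 20] onto roughly [-127, 127]. Adding 128 and clipping then stores them as uint8, so zero motion lands at 128 and anything beyond ±20 saturates. Residuals are shifted the same way without scaling. Assuming the tensors are later divided by 255 (as a ToTensor-style step would do), "no change" becomes 128/255 ≈ 0.5, which would explain subtracting 0.5. The names encode_mv, encode_residual and to_network_input are hypothetical.

```python
import numpy as np


def clip_and_scale(img, size):
    # As quoted from dataset.py: map [-size, size] to about [-127.5, 127.5], truncating to int.
    return (img * (127.5 / size)).astype(np.int32)


def encode_mv(mv, size=20):
    """Store motion vectors as uint8 centered at 128; values beyond +/-size saturate."""
    img = clip_and_scale(mv, size) + 128
    return np.minimum(np.maximum(img, 0), 255).astype(np.uint8)


def encode_residual(res):
    """Store residuals as uint8 centered at 128 (no scaling, only clipping)."""
    img = np.asarray(res, dtype=np.int32) + 128
    return np.minimum(np.maximum(img, 0), 255).astype(np.uint8)


def to_network_input(img_u8):
    # Assumption: the pipeline rescales to [0, 1]; subtracting 0.5 re-centers "no change" near 0.
    return img_u8.astype(np.float32) / 255.0 - 0.5
```

So size=20 trades range for resolution: larger vectors are clipped, but small motions use more of the 8-bit range than they would with a larger size.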

cuda out of memory

Hi
When I set the batch size to 40 for HMDB-51 or 80 for UCF-101 with the I-frame representation, training stops with "CUDA: out of memory". I have to reduce the batch size to 20 or 30, but then training is very slow and takes 5 or more days on UCF-101. I use 4 Titan Xp GPUs to train the model. Have you seen the same problem, or could I change some configuration?

Thank you!

I want to ask about the usage of crops and segments in test.py

Thanks for your contribution!
I have a question about test.py:
parser.add_argument('--test_segments', type=int, default=25)
parser.add_argument('--test-crops', type=int, default=10)
What do the crops and segments do?
Thanks!

With Multiple GPUs

Decoding a video frame, given the previous frame, current motion vectors, and current residual image.

Assume we have pos_target=t, a reference frame at pos_target=t-1, and the motion vectors and residual image for pos_target=t, but not the original video file.

Given these constraints, I would like to reconstruct the frame at pos_target=t, as described in Equation 1 of your paper.

So far, I've tried decoding the frame at pos_target=t by: (1) creating a reference frame, which is just a copy of the t-1 frame; (2) performing motion compensation by copying 16x16 pixel blocks from the t-1 frame to the reference frame, based on the motion vectors; (3) adding the residual image to the motion-compensated reference frame.

This is the reference frame at pos_target=2:

This is the result after step (1), for pos_target=3:

This is the result after step (2), for pos_target=3:

The final result seems to have some compression artifacts, so I guess I'm not reconstructing the frame correctly. Is there a better way to do this (particularly, using ffmpeg)? Thanks!
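Equation 1 of the paper can be sketched at pixel level: the prediction for frame t is the reference frame t-1 warped by the motion vectors, and adding the residual recovers frame t. The sketch assumes a dense per-pixel field mv[y, x] = (dx, dy), meaning that pixel (x, y) of frame t is copied from pixel (x + dx, y + dy) of the reference, with integer displacements. That sign convention is an assumption; if the output looks shifted, try negating mv. Real decoders also use sub-pixel interpolation, variable block sizes and a decoded (lossy) reference, so this sketch should not be expected to match FFmpeg's output exactly. If the residual was saved as a visualization (shifted by 128 and clipped to uint8), undo the shift first, because clipping discards information. All function names are hypothetical.

```python
import numpy as np


def motion_compensate(ref, mv):
    """Warp the reference frame with a dense integer motion field.

    ref: (H, W, C) uint8 frame t-1; mv: (H, W, 2) with (dx, dy) per pixel.
    Source coordinates are clamped to the frame border.
    """
    h, w = mv.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(xs + mv[..., 0].astype(int), 0, w - 1)
    src_y = np.clip(ys + mv[..., 1].astype(int), 0, h - 1)
    return ref[src_y, src_x]


def residual_of(cur, ref, mv):
    """Residual = current frame minus its motion-compensated prediction (signed)."""
    return cur.astype(np.int32) - motion_compensate(ref, mv).astype(np.int32)


def reconstruct(ref, mv, residual):
    """Equation-1-style reconstruction: prediction plus residual, clipped to uint8."""
    pred = motion_compensate(ref, mv).astype(np.int32)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)
```

When the residual is computed with the same warp, the round trip is exact, which makes this a useful check that your motion compensation and the residual agree on the convention.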

train.py

I followed GETTING_STARTED.md, but when I run train.py I get:
Augmentation scales: [1, 0.875, 0.75, 0.66]
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream

Please help: ImportError: libavutil.so.56: cannot open shared object file: No such file or directory

After all the steps (except make clean), I got this error:
"ImportError: libavutil.so.56: cannot open shared object file: No such file or directory"
when running "import coviar" or "from coviar import load".
Any idea how to fix this?

Any help building coviar_data_loader on Windows?

I've tried to build coviar using mingw64

D:\bin\MinGW64\bin\gcc.exe -shared -s build\temp.win-amd64-3.6\Release\coviar_data_loader.o build\temp.win-amd64-3.6\Release\coviar.cp36-win_amd64.def -LD:\bin\python\libs -LD:\bin\python\PCbuild\amd64 -lpython36 -lmsvcr140 -o build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -lavutil -lavcodec -lavformat -lswscale -L./ffmpeg/lib/
running install
running build
running build_ext
running install_lib
copying build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -> C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages
running install_egg_info
Writing C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages\coviar-0.1-py3.6.egg-info

Successfully built.

But when I import coviar in python, it fails.

$ ipython
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import coviar
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-e416fb4448c7> in <module>
----> 1 import coviar

ImportError: DLL load failed: The specified module could not be found.

Thanks for any ideas.

Transfer learning using Coviar

I was wondering what your thoughts are on transfer learning in videos with CoViAR. I know some experiments were done by Karpathy et al., and a few other works tried some as well (B. Zhang L et al.). Were any transfer-learning experiments done with CoViAR?

error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

Hi, I ran into some problems when I tried sudo bash install.sh in data_loader:

coviar_data_loader.c:586:15: error: variable ‘coviarmodule’ has initializer but incomplete type
static struct PyModuleDef coviarmodule = {

coviar_data_loader.c:587:5: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)
PyModuleDef_HEAD_INIT,

error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

I have also tried the method mentioned in #33,
but it still doesn't work.

Do you have any suggestions?
Thanks! 😊

videos format

I have videos in MP4 format; why do I need to transcode them? Thank you.

Decoupled Model

Good day, @chaoyuaw,
Where in your code do you compute the decoupled model that breaks the dependency between P-frames?

How do I combine the CoViAR scores with the TSN scores?

After obtaining the CoViAR scores, how do I combine them with TSN by late fusion? Is there a script I can use for this? Thank you very much.
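Below is a minimal late-fusion sketch. Each model provides a (num_videos, num_classes) score array over the same videos in the same order. Each array is optionally converted to probabilities with a softmax, and a weighted sum is taken per video. The weights are placeholders to be chosen on held-out data, not the values used in the paper, and fusing raw scores versus softmax probabilities is a design choice rather than something this page specifies. The function names are hypothetical.

```python
import numpy as np


def softmax(scores):
    z = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def late_fusion(score_arrays, weights, use_softmax=True):
    """Weighted sum of per-model scores; returns fused (num_videos, num_classes) scores."""
    fused = None
    for scores, w in zip(score_arrays, weights):
        s = np.asarray(scores, dtype=np.float64)
        s = softmax(s) if use_softmax else s
        fused = w * s if fused is None else fused + w * s
    return fused
```

For example, `late_fusion([iframe_scores, mv_scores, residual_scores, tsn_scores], [1.0, 1.0, 1.0, 1.0]).argmax(axis=1)` gives one predicted class per video (all variable names hypothetical).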

Encountered "IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number" when training on HMDB51.

More specifically, when I ran the following commands:

I encountered the problem shown in the following picture:

I have not found a solution to this problem. Can you help me? Thank you very much.
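This error usually comes from PyTorch 0.4 and later, where a scalar loss is a 0-dim tensor. Older code that indexes it as `loss.data[0]`, a pattern common in TSN-derived training scripts, then fails, and the fix is to call `.item()` instead. The report does not show which line of train.py triggers it, so treat this as the likely cause rather than a confirmed one. The hypothetical helper below extracts a plain Python number from a 0-dim tensor, a NumPy scalar, or a number; it is exercised with NumPy here so it runs without a GPU.

```python
import numpy as np


def as_python_number(value):
    """Return a plain Python number from a 0-dim tensor/array or a number.

    Replaces the pre-0.4 idiom `loss.data[0]`, which fails on 0-dim tensors.
    """
    if hasattr(value, "item"):
        return value.item()
    return float(value)
```

In practice, replacing `loss.data[0]` (and similar indexing of accuracy tensors) with `loss.item()` wherever the training loop logs values is usually enough.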

Optical flow

Hi, thanks for your suggestions. I can now reproduce your results, but I have a question about the fusion with optical flow.

My guess is that, after training and testing an optical-flow BN-Inception network, you take its softmax scores, give them a weight like "wm" or "wi", and then combine them with the compressed-video scores. Is that right?

Thank you again!

./install.sh

Hi Sir,

Could you please check this error? How can I resolve it?

/usr/bin/ld cannot find -lavutil -lavcodec ...

I followed the install guide.
When I run
./install.sh
the following problem happens:

/usr/bin/ld cannot find -lavutil -lavcodec ...

I changed './ffmpeg/include/' on line 6 of setup.py to something like 'opencv_install_path/include/'.

Looking forward to your reply.

t-SNE

Hi @chaoyuaw

How can I use t-SNE to visualize videos?

Cannot compile coviar module

When I try to compile the coviar module, I get the following errors:
/usr/bin/ld: cannot find -lavformat
/usr/bin/ld: cannot find -lswscale

Visualise the MV

Hi, in the paper you said you visualized the MVs using the HSV color space. I couldn't find the code; is it included?

Thanks.
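The code for that figure is not included on this page, so here is a minimal sketch of the common HSV encoding. The motion direction sets the hue, the magnitude (normalized by the largest vector in the frame) sets the value, and saturation is fixed at 1, so static regions come out black. It assumes mv is an (H, W, 2) array of (dx, dy) per pixel, which is the assumed layout of the motion-vector output of coviar's load(). The function names are hypothetical. The result is a float RGB image in [0, 1] that any image viewer can display (for example matplotlib's imshow).

```python
import numpy as np


def hsv_to_rgb(hsv):
    """Vectorized HSV -> RGB for arrays with values in [0, 1] on the last axis."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    i = np.floor(h * 6).astype(int) % 6  # which 60-degree hue sector
    f = h * 6 - np.floor(h * 6)          # position inside that sector
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)
    r = np.choose(i, [v, q, p, p, t, v])
    g = np.choose(i, [t, v, v, q, p, p])
    b = np.choose(i, [p, p, t, v, v, q])
    return np.stack([r, g, b], axis=-1)


def mv_to_rgb(mv):
    """Color-code a motion field: hue = direction, brightness = magnitude."""
    dx = mv[..., 0].astype(np.float64)
    dy = mv[..., 1].astype(np.float64)
    hue = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)  # angle mapped to [0, 1]
    mag = np.hypot(dx, dy)
    peak = mag.max()
    value = mag / peak if peak > 0 else np.zeros_like(mag)
    hsv = np.stack([hue, np.ones_like(hue), value], axis=-1)
    return hsv_to_rgb(hsv)
```

With this mapping, motion to the right (dx > 0, dy = 0) is drawn in cyan at full brightness when it is the largest vector in the frame.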

environment

Hey, I have been setting up the environment for a long time. While installing FFmpeg, every time I solve one problem a new one appears. I am eager to reproduce your project. Could you provide a Docker image?

pytorch version

Which version of PyTorch is used in this project?

Help me with this error

Make clean problem

I followed the instructions and got to this command:
make clean

I understand that this is something general, but I have never used it,
so I just entered "make clean" in the FFmpeg directory and got:

Makefile:2: ffbuild/config.mak: No such file or directory
Makefile:40: /tools/Makefile: No such file or directory
Makefile:41: /ffbuild/common.mak: No such file or directory
Makefile:90: /libavutil/Makefile: No such file or directory
Makefile:90: /ffbuild/library.mak: No such file or directory
Makefile:92: /fftools/Makefile: No such file or directory
Makefile:93: /doc/Makefile: No such file or directory
Makefile:94: /doc/examples/Makefile: No such file or directory
Makefile:160: /tests/Makefile: No such file or directory
make: *** No rule to make target '/tests/Makefile'. Stop.

Am I supposed to do something else?

Using Python 2.7.12

Thanks,
Barak.

raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

I ran into a problem in transforms.py; could you please give me some advice? Thanks!

Traceback (most recent call last):
  File "train.py", line 275, in <module>
    main()
  File "train.py", line 104, in main
    train(train_loader, model, criterion, optimizer, epoch, cur_lr)
  File "train.py", line 134, in train
    for i, (input, target) in enumerate(train_loader):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 322, in __next__
    return self._process_next_batch(batch)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/code/project/pytorch-coviar/dataset.py", line 160, in __getitem__
    frames = self._transform(frames)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/data/code/project/pytorch-coviar/transforms.py", line 124, in __call__
    crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size)
  File "/data/code/project/pytorch-coviar/transforms.py", line 153, in _sample_crop_size
    w_offset = random.randint(0, image_w - crop_pair[0])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,-1, -1)

terminate called after throwing an instance of 'at::Error'
what(): CUDA error (29): driver shutting down (check_status at /pytorch/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: at::detail::CUDAStream_free(CUDAStreamInternals*&) + 0x50 (0x7fe59246aa50 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THCStream_free + 0x13 (0x7fe56f4d0953 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: std::_Rb_tree<std::shared_ptr, std::shared_ptr, std::_Identity<std::shared_ptr >, std::less<std::shared_ptr >, std::allocator<std::shared_ptr > >::_M_erase(std::_Rb_tree_node<std::shared_ptr >*) + 0x8e (0x7fe56f4c1fbe in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: + 0xd1ca71 (0x7fe56f4c5a71 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: + 0xd1caa0 (0x7fe56f4c5aa0 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: + 0x38e69 (0x7fe5b237ae69 in /lib64/libc.so.6)
frame #6: + 0x38eb5 (0x7fe5b237aeb5 in /lib64/libc.so.6)
frame #7: __libc_start_main + 0xfc (0x7fe5b2363b1c in /lib64/libc.so.6)
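The traceback ends in random.randint(0, image_w - crop_pair[0]) with the empty range (0, -1). That means the frame passed to the crop transform is narrower than the sampled crop, here by one pixel. This points to frames smaller than the training pipeline expects, for example videos that were not re-encoded and resized as described in GETTING_STARTED.md, or that decoded incorrectly, so the shape of the decoded frames is the first thing to check. The sketch below (hypothetical name) samples a crop offset but fails with a message naming the sizes involved, which makes the offending videos easier to find.

```python
import random


def sample_crop_offset(image_w, image_h, crop_w, crop_h, rng=random):
    """Pick a random top-left corner for a crop, with a clear error if it cannot fit."""
    if crop_w > image_w or crop_h > image_h:
        raise ValueError(
            "crop %dx%d does not fit frame %dx%d; check that the video was resized"
            " and decoded correctly" % (crop_w, crop_h, image_w, image_h)
        )
    return rng.randint(0, image_w - crop_w), rng.randint(0, image_h - crop_h)
```

Logging the video path together with this message in the dataset's loading code would identify which files need to be re-encoded.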

Why do we need the --disable-yasm option to configure FFmpeg?

Dear developer,

I don't see why we need to add the --disable-yasm option when configuring FFmpeg. Won't it cost a lot of performance?

Training Error

Hi,

I tried your code to reproduce the coviar result in your paper.

I followed all your instructions in GETTING_STARTED.md,
but I hit this error:
v_frame_idx = random.randint(seg_begin,seg_end - 1) ... ValueError: empty range for randrange() (1,0,-1)

It seems there is something wrong with the video processing.

Can you give me some hints?
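Here randint(seg_begin, seg_end - 1) receives an empty range, which means a segment contains no frames. That happens when the loader reports fewer frames than there are segments, for instance 0 or 1 frames for a video that failed to decode. Below is a minimal sketch of TSN-style training sampling that checks for this before drawing. It assumes each segment spans an equal share of the frames and that one frame is drawn per segment; the function name is hypothetical, and the repository's sampling (which also has to respect GOP boundaries) may differ.

```python
import random


def sample_training_frames(num_frames, num_segments, rng=random):
    """Draw one random frame index from each of num_segments equal chunks."""
    if num_frames < num_segments:
        raise ValueError(
            "video has %d frames but %d segments were requested; "
            "it may have failed to decode" % (num_frames, num_segments)
        )
    indices = []
    for k in range(num_segments):
        seg_begin = k * num_frames // num_segments
        seg_end = (k + 1) * num_frames // num_segments  # exclusive; non-empty when num_frames >= num_segments
        indices.append(rng.randint(seg_begin, seg_end - 1))
    return indices
```

Printing the path of any video that triggers the ValueError, then checking its frame count, usually shows which files did not transcode correctly.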

What does flush decoder mean?

pytorch-coviar/data_loader/coviar_data_loader.c

Lines 366 to 387 in 4f0857a

    //Flush Decoder
    packet.data = NULL;
    packet.size = 0;
    while(1){
        ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);
        if (ret < 0) {
            printf("Decode Error.\n");
            return -1;
        }
        if (!got_picture) {
            break;
        } else if (cur_gop == gop_target) {
            if ((cur_pos == 0 && accumulate) ||
                (cur_pos == pos_target - 1 && !accumulate) ||
                cur_pos == pos_target) {
                create_and_load_bgr(
                    pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
            }
        }
    }

    fclose(fp_in);

Hi, I want to know what "flush decoder" means. Thanks!
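Flushing means draining frames that the decoder is still holding after the last packet has been read. Some codecs output frames with a delay (FFmpeg marks them with AV_CODEC_CAP_DELAY). With the avcodec_decode_video2 API, those remaining frames are retrieved by repeatedly calling it with an empty packet (data = NULL, size = 0) until got_picture is 0, which is what the quoted loop does. Skipping this step drops the last frame(s). A missing final frame, as in the frame-count issue earlier on this page, is the typical symptom, although whether get_num_frames flushes was not checked here. The toy Python model below is not FFmpeg; it assumes a one-frame delay purely to show the effect.

```python
class DelayedDecoder:
    """Toy decoder that returns each frame one call late (an assumed one-frame delay)."""

    def __init__(self):
        self._held = None

    def decode(self, packet):
        """Feed a packet (None plays the role of the empty flush packet); return a frame or None."""
        out = self._held
        self._held = packet  # a real decoder would hold a decoded frame here
        return out


def decode_all(packets, flush=True):
    decoder = DelayedDecoder()
    frames = [f for f in (decoder.decode(p) for p in packets) if f is not None]
    if flush:
        # Flush: keep sending empty packets until the decoder has nothing left to return.
        while True:
            frame = decoder.decode(None)
            if frame is None:
                break
            frames.append(frame)
    return frames
```

With 100 packets, this model yields 99 frames without the flush and 100 with it.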

Decoding video failed.

Error: loading video data/hmdb51/mpeg4_videos/turn/SoundAndTheStory_turn_u_nm_np1_ba_med_1.mp4 failed.
Could not allocate video parser context
Decoding video failed.

Some questions about coviar data loader

Hi,
I have some questions after reading coviar_data_loader.c.
First, you initialize the variable accu_src_old as follows, but I wonder why:

                    for (size_t x = 0; x < w; ++x) {
                        for (size_t y = 0; y < h; ++y) {
                            accu_src_old[x * h * 2 + y * 2    ]  = x;
                            accu_src_old[x * h * 2 + y * 2 + 1]  = y;
                        }
                    }

Second, does the following code mean that every frame in the target GOP up to the target frame is decoded, but only the I-frame and the target frame are converted to BGR format?

            if (cur_gop == gop_target && cur_pos <= pos_target) {
                ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);  
......
                if (got_picture) {

                    if ((cur_pos == 0              && accumulate  && representation == RESIDUAL) ||
                        (cur_pos == pos_target - 1 && !accumulate && representation == RESIDUAL) ||
                        cur_pos == pos_target) {
                        create_and_load_bgr(
                            pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
                    }

Third, in dataset.py, I wonder why you process the image as follows:

def clip_and_scale(img, size):
    return (img * (127.5 / size)).astype(np.int32)

Thanks for your excellent work and code. Looking forward to your reply : )
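Initializing accu_src_old to each pixel's own coordinates (x, y) looks like an identity map: before any P-frame is processed, every pixel's source is itself. Composing the per-frame motion fields onto that map then traces each pixel back toward the I-frame, which is the idea behind the accumulated representation. The sketch below illustrates that composition in NumPy, using (H, W) indexing rather than the C code's x-major layout. The sign convention (pixel (x, y) comes from (x + dx, y + dy) in the previous frame), the border clamping and the function name are assumptions, not the loader's exact implementation.

```python
import numpy as np


def accumulate_motion(mvs):
    """Accumulate per-frame integer motion fields back to the first (I-)frame.

    mvs: list of (H, W, 2) arrays of (dx, dy), in decoding order after the I-frame.
    Returns an (H, W, 2) field pointing from each pixel of the last frame to its
    source pixel in the I-frame.
    """
    h, w = mvs[0].shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    acc_x, acc_y = xs.copy(), ys.copy()  # identity map, like the accu_src_old initialization
    for mv in mvs:
        # Where each pixel of this frame comes from in the previous frame (clamped).
        src_x = np.clip(xs + mv[..., 0].astype(int), 0, w - 1)
        src_y = np.clip(ys + mv[..., 1].astype(int), 0, h - 1)
        # Follow that previous-frame pixel's own path back to the I-frame.
        acc_x, acc_y = acc_x[src_y, src_x], acc_y[src_y, src_x]
    return np.stack([acc_x - xs, acc_y - ys], axis=-1)
```

Two frames that each move everything one pixel to the right accumulate to a displacement of two pixels, except near the border where the clamping stops the path.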

Motion Vectors and Residuals for Salience Detection

Dear @chaoyuaw and @manzilzaheer,

I am thinking of using motion vectors (extracted from compressed videos) to find attention points for saliency detection in human action recognition. What is your opinion about this?

Problem with the make clean phase

I have a problem with the "make clean" command; I'm getting these errors:

Using Python 3.6.3

Memory leak during testing?

Hi,
Has anyone used the testing script on a large number of videos? I suspect the data loader is leaking memory.
Checking the open file descriptors while running the testing script shows that the files aren't being closed.
'ls -1 /proc//fd | wc -l' gives the number of open file descriptors for a process.
This causes the data loader to leak and abort once it reaches the ulimit.
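One way to confirm the leak from inside Python is to count the entries in the process's file-descriptor directory before and after loading a batch of videos. In the flush loop quoted earlier on this page, the `return -1` on a decode error also exits before `fclose(fp_in)`, so every video that fails to decode may leave a file handle open. That is a candidate cause to check, not a confirmed one. The helper name is hypothetical; it reads /proc/self/fd on Linux and falls back to /dev/fd elsewhere.

```python
import os


def open_fd_count():
    """Number of file descriptors currently open in this process.

    Uses /proc/self/fd on Linux and /dev/fd as a best-effort fallback (e.g. macOS).
    The count includes the descriptor used while listing, which cancels out in differences.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))
```

Calling it every few hundred videos in the test loop and seeing the count grow steadily would confirm that the loader leaks file handles.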

Could not allocate video parser context

Hi!
I am facing this error during training: "Could not allocate video parser context", "Decoding video failed".
Please guide me.

Can someone share the model's weights data trained on UCF-101?

Recently, I have a course on video action recognition and the deadline is coming. CoViAR was selected as a comparison method, but training takes too long for our time and hardware. Could someone share the weights of the three models trained on UCF-101? Thanks very much!

