
pytorch-coviar's Introduction

Compressed Video Action Recognition

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl.
In CVPR, 2018. [Project Page]

Overview

This is a reimplementation of CoViAR in PyTorch (the original paper uses MXNet). This code currently supports UCF-101 and HMDB-51; Charades coming soon. (This is a work in progress. Any suggestions are appreciated.)

Results

This code produces comparable or better results than the original paper:
HMDB-51: 52% (I-frame), 40% (motion vector), 43% (residuals), 59.2% (CoViAR).
UCF-101: 87% (I-frame), 70% (motion vector), 80% (residuals), 90.5% (CoViAR).
(Average of 3 splits; without optical flow.)

Data loader

We provide a Python data loader that takes a compressed video directly and returns the compressed representation (I-frames, motion vectors, and residuals) as a NumPy array. We can thus train the model without extracting and storing all representations as image files.

In our experiments, it's fast enough so that it doesn't delay GPU training. Please see GETTING_STARTED.md for details and instructions.

Using CoViAR

Please see GETTING_STARTED.md for instructions for training and inference.

Citation

If you find this model useful for your research, please use the following BibTeX entry.

@inproceedings{wu2018coviar,
  title={Compressed Video Action Recognition},
  author={Wu, Chao-Yuan and Zaheer, Manzil and Hu, Hexiang and Manmatha, R and Smola, Alexander J and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={CVPR},
  year={2018}
}

Acknowledgment

This implementation largely borrows from tsn-pytorch by yjxiong. Part of the dataloader implementation is modified from this tutorial and FFmpeg extract_mv example.


pytorch-coviar's Issues

multiple GPUS

Thanks for providing your code. For training, the GPU ids were set to 0 and 1, but those GPUs were occupied on my server, so I changed the ids to 1 and 2. This raised an error: all tensors must be on devices[0]. I then added os.environ['CUDA_VISIBLE_DEVICES'] = "3,6", but that didn't work either. How can I solve this problem?
Thanks for reading.
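
One possible cause, offered as an assumption rather than a confirmed diagnosis: CUDA_VISIBLE_DEVICES has to be set before CUDA is initialized (safest is before importing torch), and once it is set the visible GPUs are renumbered from 0. With "3,6", physical GPU 3 becomes device 0 and physical GPU 6 becomes device 1, so the GPU ids given to the training script would stay 0 and 1. The sketch below only illustrates that renumbering for integer ids; logical_to_physical is a hypothetical helper, not part of this repository.

```python
import os

# Must run before CUDA is initialized (safest: before `import torch`).
os.environ["CUDA_VISIBLE_DEVICES"] = "3,6"


def logical_to_physical(visible_devices):
    """Map the logical ids a process sees (0, 1, ...) to physical GPU ids."""
    physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return dict(enumerate(physical))


mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {0: 3, 1: 6}
```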

What's GOP_SIZE=12?

Thanks for your nice work. What does GOP_SIZE=12 on line 20 of dataset.py mean?
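
For context, GOP stands for group of pictures: an I-frame followed by the P-frames that depend on it. Assuming the videos are encoded with a fixed GOP length of 12 (consistent with the constant, but not confirmed here), a frame index splits into a GOP index and a position inside that GOP. frame_to_gop_pos is a hypothetical helper used only for illustration.

```python
GOP_SIZE = 12  # value quoted in the question above


def frame_to_gop_pos(frame_idx, gop_size=GOP_SIZE):
    """Split a frame index into (GOP index, position inside that GOP)."""
    return frame_idx // gop_size, frame_idx % gop_size


for idx in (0, 11, 12, 30):
    gop, pos = frame_to_gop_pos(idx)
    # Position 0 is the I-frame under the fixed-GOP assumption.
    kind = "I-frame" if pos == 0 else "P-frame"
    print(idx, gop, pos, kind)
```

Under this assumption, frame 30 falls in GOP 2 at position 6, and every twelfth frame starts a new GOP.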

ModuleNotFoundError: No module named 'coviar'

I executed sudo ./install.sh and got the following result:
image
Did I succeed? Then, when I ran the training command, I got the following error:
image
Does this mean ./install.sh did not complete successfully? I am a beginner; thank you very much for your guidance.

About counting frames

I find that get_num_frames returns 1 less than the actual video length. I created a video from 100 frames but only get 99 decoded frames, and the missing frame is the last one. Can you help me out? Thanks very much!

codecs

When will you support other codecs? Thanks.

Assertion Error

@manzilzaheer @chaoyuaw

hi,

Could you please check this error and tell me the possible cause and a solution? I ran this on a server with 4 GPUs; a screenshot is attached.

numpy issue during ./install.sh

Thank you for your lovely paper. We are trying to get your code to work. We are using an Amazon AMI, have compiled FFmpeg, and are using Python 3 and PyTorch. When we run ./install.sh with the FFmpeg path, we get a GCC error (screenshot attached) that mentions a deprecated NumPy API. Any idea how to resolve this? Which version of NumPy was used?

We also followed the steps in other issues (#6 and #5), but the error persists.

something wrong with ./install.sh

Hello! When I execute install.sh, I run into some problems.
The code:

    static struct PyModuleDef coviarmodule = {
        PyModuleDef_HEAD_INIT,
        "coviar", /* name of module */
        NULL,     /* module documentation, may be NULL */
        -1,       /* size of per-interpreter state of the module, or -1 if the module keeps state in global variables. */
        CoviarMethods
    };

The errors are:

    error: variable 'coviarmodule' has initializer but incomplete type
    static struct PyModuleDef coviarmodule = {
    ^
    error: variable 'PyModuleDef_HEAD_INIT' undeclared here (not in a function)
    PyModuleDef_HEAD_INIT,
    ^

How can I solve this problem?
Thanks!

Data_loader

Hi
I am facing this error while doing installation steps in data_loader folder

error

Kindly please guide me.

How to Modify setup.py to use your FFmpeg path (${FFMPEG_INSTALL_PATH})?

from coviar import load

Hi, in the instructions I'm at this part: "from coviar import load".
I'm not sure where to put this command. In my own program?
And what is "coviar"? There is no such file. What am I missing?

Sorry for my ignorance, and thanks for your help!

training error

Hi, I met with several problems in the training process.

ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
[jpegls @ 0x562d3c3db340] Found EOI before any SOF, ignoring
[jpegls @ 0x562d3c3db340] unable to decode APP fields: Invalid data found when processing input
[jpegls @ 0x562d3c3db340] unable to decode APP fields: Invalid data found when processing input
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
[jpegls @ 0x562d3c3db340] mjpeg: unsupported coding type (c5)
[jpegls @ 0x562d3c3db340] dqt: invalid precision
Decode Error.
Decoding video failed.
Error: loading video data/hmdb51/mpeg4_videos/stand/Man_Who_Cheated_Himself_512kb_stand_u_cm_np1_fr_med_11.mp4 failed.

Question about pre-processing mv and res

Hi, thanks for your good work. I have some questions about the mv and residual normalization. Could you please explain the code below in more detail?

        img = clip_and_scale(img, 20)    # Why do you use size=20?
        img += 128
        img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)
    elif self._representation == 'residual':
        img += 128
        img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)

    if self._representation == 'iframe':
        input = (input - self._input_mean) / self._input_std
    elif self._representation == 'residual':
        input = (input - 0.5) / self._input_std    # Why 0.5?
    elif self._representation == 'mv':
        input = (input - 0.5)
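
A small numeric sketch may help with both questions. It reuses the clip_and_scale formula quoted from dataset.py in another issue on this page, and it assumes (not confirmed here) that the uint8 image is later divided by 255 before the 0.5 is subtracted. Under those assumptions, size=20 maps displacements of about ±20 onto the full 0..255 range, zero motion lands on 128, and subtracting 0.5 re-centers zero motion near 0. encode_mv is a hypothetical wrapper for illustration.

```python
import numpy as np


def clip_and_scale(img, size):
    # Formula as quoted from dataset.py in another issue on this page.
    return (img * (127.5 / size)).astype(np.int32)


def encode_mv(mv, size=20):
    """Scale motion values, shift by 128, and clip into the uint8 range."""
    img = clip_and_scale(mv, size)
    img += 128
    return np.minimum(np.maximum(img, 0), 255).astype(np.uint8)


mv = np.array([-40.0, -20.0, 0.0, 20.0, 40.0])
encoded = encode_mv(mv)
print(encoded.tolist())  # [0, 1, 128, 255, 255]

# Assumption: the tensor conversion divides by 255, so zero motion is ~0.5.
centered = encoded / 255.0 - 0.5
print(centered)
```

Values beyond ±20 saturate at 0 or 255, which is the clipping part of the name.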

cuda out of memory

Hi
When I set the batch size to 40 for HMDB-51 or 80 for UCF-101 with I-frames, training stops with "cuda: out of memory". I have to reduce the batch size to 20 or 30, but then training is very slow: UCF-101 needs 5 or more days. I am using 4 Titan Xp GPUs. Have you seen this problem, or are there configuration options I could change?

Thank you!

I want to ask the usage of crops and segments in test.py

Thanks for your contribution!
I have a question about test.py
parser.add_argument('--test_segments', type=int, default=25)
parser.add_argument('--test-crops', type=int, default=10)
What are crops and segments used for?
Thanks!
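
As a rough sketch, assuming test.py follows the TSN-style evaluation that this implementation borrows from: each video is sampled at test_segments temporal positions, each sampled frame is cropped test-crops times (commonly the four corners and the center, plus their horizontal flips), and the class scores of all views are averaged into one prediction per video. The scores below are random stand-ins, not real network outputs.

```python
import numpy as np

num_segments, num_crops, num_classes = 25, 10, 51  # 51 classes as in HMDB-51

rng = np.random.default_rng(0)
# One score vector per (segment, crop) view; random stand-ins for model outputs.
view_scores = rng.normal(size=(num_segments * num_crops, num_classes))

# Average over all 25 * 10 = 250 views to get one score vector per video.
video_score = view_scores.mean(axis=0)
prediction = int(video_score.argmax())
print(view_scores.shape, video_score.shape, prediction)
```

More segments and crops usually give a more stable prediction at the cost of proportionally more forward passes.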

Decoding a video frame, given the previous frame, current motion vectors, and current residual image.

Assume we have pos_target=t, a reference frame at pos_target=t-1, and the motion vectors and residual image for the given pos_target=t. However, assume we don't have the original video file.

Given these constraints, I would like to reconstruct the frame at pos_target=t, as described in Equation 1 of your paper.

So far, I've tried decoding the frame at pos_target=t by: (1) creating a reference frame, which is just a copy of the t-1 frame; (2) performing motion compensation by copying 16x16 pixel blocks from the t-1 frame to the reference frame, based on the motion vectors; (3) adding the residual image to the motion-compensated reference frame.

This is the reference frame at pos_target=2:
image

This is the result after step (1), for pos_target=3:
image

This is the result after step (2), for pos_target=3:
image

The final result seems to have some compression artifacts, so I guess I'm not reconstructing the frame correctly. Is there a better way to do this (particularly, using ffmpeg)? Thanks!
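
The three steps described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the decoder's actual procedure: it assumes one integer motion vector per 16x16 block, frame dimensions that are multiples of the block size, and the convention that the block at (x, y) is predicted from the reference at (x + dx, y + dy). Real encoders may use sub-pixel motion vectors and smaller blocks, which this sketch ignores and which could explain visible artifacts. reconstruct is a hypothetical helper.

```python
import numpy as np


def reconstruct(ref, mv, residual, block=16):
    """Motion-compensate `ref` block by block, then add the residual.

    ref:      (H, W, 3) uint8 previous frame
    mv:       (H // block, W // block, 2) integer (dx, dy) per block
    residual: (H, W, 3) signed difference image
    """
    h, w = ref.shape[:2]
    pred = np.empty_like(ref)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dx, dy = mv[by // block, bx // block]
            # Clamp the source block so it stays inside the frame.
            sy = int(np.clip(by + dy, 0, h - block))
            sx = int(np.clip(bx + dx, 0, w - block))
            pred[by:by + block, bx:bx + block] = ref[sy:sy + block, sx:sx + block]
    out = pred.astype(np.int32) + residual
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
mv = np.zeros((2, 2, 2), dtype=np.int64)
residual = np.zeros((32, 32, 3), dtype=np.int32)

same = reconstruct(ref, mv, residual)      # zero motion, zero residual
mv[0, 0] = (16, 0)                         # top-left block comes from 16 px to the right
shifted = reconstruct(ref, mv, residual)
```

Adding the residual in a signed integer type before clipping avoids uint8 wrap-around, which is another common source of artifacts.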

train.py

I followed GETTING_STARTED.md, but when I run train.py I get:
Augmentation scales: [1, 0.875, 0.75, 0.66]
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream
Could not open input stream

Any help for building coviar_data_loader on windows?

I've tried to build coviar using mingw64

D:\bin\MinGW64\bin\gcc.exe -shared -s build\temp.win-amd64-3.6\Release\coviar_data_loader.o build\temp.win-amd64-3.6\Release\coviar.cp36-win_amd64.def -LD:\bin\python\libs -LD:\bin\python\PCbuild\amd64 -lpython36 -lmsvcr140 -o build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -lavutil -lavcodec -lavformat -lswscale -L./ffmpeg/lib/
running install
running build
running build_ext
running install_lib
copying build\lib.win-amd64-3.6\coviar.cp36-win_amd64.pyd -> C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages
running install_egg_info
Writing C:\Users\Jim\AppData\Roaming\Python\Python36\site-packages\coviar-0.1-py3.6.egg-info

Successfully built.

But when I import coviar in python, it fails.

$ ipython
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import coviar
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-e416fb4448c7> in <module>
----> 1 import coviar

ImportError: DLL load failed: The specified module could not be found.

Thanks for any ideas.

Transfer learning using Coviar

I was wondering what your thoughts are on transfer learning for videos with CoViAR. I know some experiments were done by Karpathy et al., and a few other works tried similar experiments (B. Zhang et al.). Were any transfer learning experiments done with CoViAR?

error: command 'aarch64-linux-gnu-gcc' failed with exit status 1


Hi, there are some problems when I tried sudo bash install.sh in data_loader:

coviar_data_loader.c:586:15: error: variable ‘coviarmodule’ has initializer but incomplete type
static struct PyModuleDef coviarmodule = {

coviar_data_loader.c:587:5: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)
PyModuleDef_HEAD_INIT,

error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

And I have tried the method mentioned in #33
But it still doesn't work

Do you have any suggestions?
Thanks! 😊

videos format

I have videos in MP4 format, why do I need to transcode? Thank you.

Decoupled Model

Good day, respected @chaoyuaw,
Where in your code do you compute the decoupled model that breaks the dependency between P-frames?

Optical flow

Hi, thanks for your suggestions, I already can reproduce your code, and I have a problem with the fusion of optical flow.

My guess is that, after training and testing the optical-flow BN-Inception network, you take the softmax scores from optical flow, give them a weight like "wm, wi", and then combine them with the scores from the compressed-video streams. Is that right?

Thank you again!
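
The scheme guessed at in this question is usually called late fusion. The sketch below shows only that general idea; it is not confirmed to be what the authors do, and the scores and weights are made-up numbers for illustration.

```python
import numpy as np


def fuse(scores, weights):
    """Weighted sum of per-stream class scores (late fusion)."""
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical per-stream scores for a 3-class problem.
scores = {
    "iframe": np.array([0.6, 0.3, 0.1]),
    "mv": np.array([0.2, 0.5, 0.3]),
    "residual": np.array([0.3, 0.4, 0.3]),
    "flow": np.array([0.1, 0.7, 0.2]),
}
# Hypothetical weights; in practice they would be tuned on validation data.
weights = {"iframe": 2.0, "mv": 1.0, "residual": 1.0, "flow": 1.0}

fused = fuse(scores, weights)
prediction = int(fused.argmax())
print(fused, prediction)
```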

./install.sh

Hi Sir,

Could you please check this error? How can I resolve it?

error

/usr/bin/ld can't found -lavutil -lavcodec ...

I followed the install guide. When I run
./install.sh
the following problem occurs:

/usr/bin/ld can't found -lavutil -lavcodec ...

I modified line 6 in setup.py, changing './ffmpeg/include/' to something like 'opencv_install_path/include/'.

Looking forward to your reply.

Cannot compile coviar module

When I try to compile the coviar module, I get the following errors:
/usr/bin/ld: cannot find -lavformat
/usr/bin/ld: cannot find -lswscale

Visualise the MV

Hi, in the paper you said you visualised the MV using the HSV color space. I couldn't find the code. Is it included?

Thanks.
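
Until the authors point to their code, here is a minimal sketch of the common way to color-code a motion field in HSV: hue encodes direction and saturation encodes magnitude, so zero motion renders as white. This is an assumption about the general technique, not the authors' visualization code, and mv_to_rgb is a hypothetical helper.

```python
import colorsys

import numpy as np


def mv_to_rgb(mv, max_mag=None):
    """Color-code a (H, W, 2) motion field: hue = direction, saturation = magnitude."""
    dx = mv[..., 0].astype(float)
    dy = mv[..., 1].astype(float)
    mag = np.hypot(dx, dy)
    hue = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)  # direction mapped to [0, 1]
    scale = max_mag if max_mag else max(float(mag.max()), 1e-6)
    sat = np.clip(mag / scale, 0.0, 1.0)

    rgb = np.zeros(mv.shape[:2] + (3,), dtype=np.uint8)
    # Per-pixel loop for clarity; fine for small fields, slow for full frames.
    for y in range(mv.shape[0]):
        for x in range(mv.shape[1]):
            r, g, b = colorsys.hsv_to_rgb(float(hue[y, x]), float(sat[y, x]), 1.0)
            rgb[y, x] = (round(r * 255), round(g * 255), round(b * 255))
    return rgb


still = mv_to_rgb(np.zeros((2, 2, 2)))   # no motion anywhere
field = np.zeros((2, 2, 2))
field[0, 0] = (1.0, 0.0)                 # one pixel moving along +x
colored = mv_to_rgb(field)
```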

environment

Hey, I have been trying to set up the environment for a long time. With the FFmpeg installation, every time I solve one problem a new one appears. I am eager to reproduce your project. Can you provide a Docker image?

Make clean problem

I followed the instruction, got to this command:
make clean

I understand that this is a general command, but I have never used it,
so I just entered "make clean" in the FFmpeg directory and got:

Makefile:2: ffbuild/config.mak: No such file or directory
Makefile:40: /tools/Makefile: No such file or directory
Makefile:41: /ffbuild/common.mak: No such file or directory
Makefile:90: /libavutil/Makefile: No such file or directory
Makefile:90: /ffbuild/library.mak: No such file or directory
Makefile:92: /fftools/Makefile: No such file or directory
Makefile:93: /doc/Makefile: No such file or directory
Makefile:94: /doc/examples/Makefile: No such file or directory
Makefile:160: /tests/Makefile: No such file or directory
make: *** No rule to make target '/tests/Makefile'. Stop.

Am I supposed to do something else?

using Python 2.7.12

Thanks,
Barak.

raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

I ran into a problem in transforms.py. Could you please give me some advice? Thanks!

Traceback (most recent call last):
  File "train.py", line 275, in <module>
    main()
  File "train.py", line 104, in main
    train(train_loader, model, criterion, optimizer, epoch, cur_lr)
  File "train.py", line 134, in train
    for i, (input, target) in enumerate(train_loader):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 322, in __next__
    return self._process_next_batch(batch)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/code/project/pytorch-coviar/dataset.py", line 160, in __getitem__
    frames = self._transform(frames)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/data/code/project/pytorch-coviar/transforms.py", line 124, in __call__
    crop_w, crop_h, offset_w, offset_h = self._sample_crop_size(im_size)
  File "/data/code/project/pytorch-coviar/transforms.py", line 153, in _sample_crop_size
    w_offset = random.randint(0, image_w - crop_pair[0])
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/root/anaconda3/envs/caffe-tf/lib/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,-1, -1)

terminate called after throwing an instance of 'at::Error'
what(): CUDA error (29): driver shutting down (check_status at /pytorch/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: at::detail::CUDAStream_free(CUDAStreamInternals*&) + 0x50 (0x7fe59246aa50 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THCStream_free + 0x13 (0x7fe56f4d0953 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: std::_Rb_tree<std::shared_ptr, std::shared_ptr, std::_Identity<std::shared_ptr >, std::less<std::shared_ptr >, std::allocator<std::shared_ptr > >::_M_erase(std::_Rb_tree_node<std::shared_ptr >*) + 0x8e (0x7fe56f4c1fbe in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: + 0xd1ca71 (0x7fe56f4c5a71 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: + 0xd1caa0 (0x7fe56f4c5aa0 in /root/anaconda3/envs/caffe-tf/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: + 0x38e69 (0x7fe5b237ae69 in /lib64/libc.so.6)
frame #6: + 0x38eb5 (0x7fe5b237aeb5 in /lib64/libc.so.6)
frame #7: __libc_start_main + 0xfc (0x7fe5b2363b1c in /lib64/libc.so.6)
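
Reading the first traceback, the arguments (0,-1,-1) mean that image_w - crop_pair[0] evaluated to -1, i.e. the requested crop was one pixel wider than the decoded frame. A plausible but unconfirmed cause is that the videos were not resized to the size the transforms expect. The sketch below reproduces the failure mode in isolation and shows a guard with a clearer message; sample_offset is a hypothetical helper, not code from transforms.py.

```python
import random


def sample_offset(image_size, crop_size):
    """Pick a random crop offset, failing clearly if the crop cannot fit."""
    if crop_size > image_size:
        raise ValueError(
            f"crop ({crop_size}) is larger than the image ({image_size}); "
            "check the decoded frame size"
        )
    return random.randint(0, image_size - crop_size)


# random.randint(a, b) needs a <= b; randint(0, -1) is the case from the traceback.
try:
    random.randint(0, -1)
    raised = False
except ValueError:
    raised = True

offset = sample_offset(340, 224)  # fits: offset is somewhere in 0..116
print(raised, offset)
```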

Training Error

Hi,

I tried your code to reproduce the coviar result in your paper.

I followed all your instructions in GETTING_STARTED.md.
But I met this error:
v_frame_idx = random.randint(seg_begin,seg_end - 1) ... ValueError: empty range for randrange() (1,0,-1)

It seems there is something wrong with the video processing.

Can you give me some hints?

What does flush decoder mean?

//Flush Decoder
packet.data = NULL;
packet.size = 0;
while(1){
    ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);
    if (ret < 0) {
        printf("Decode Error.\n");
        return -1;
    }
    if (!got_picture) {
        break;
    } else if (cur_gop == gop_target) {
        if ((cur_pos == 0 && accumulate) ||
            (cur_pos == pos_target - 1 && !accumulate) ||
            cur_pos == pos_target) {
            create_and_load_bgr(
                pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
        }
    }
}
fclose(fp_in);

Hi, I would like to know what "flush decoder" means. Thanks!
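
In general terms, a video decoder may hold some frames internally and emit them with a delay, so after the last real packet there can still be frames left inside it. Feeding an empty packet (data = NULL, size = 0) asks the decoder to hand those remaining frames over, which is what the loop above does until no picture comes back. The toy model below illustrates only that idea; ToyDecoder and decode_all are hypothetical and are not the FFmpeg API.

```python
from collections import deque


class ToyDecoder:
    """Toy decoder with a fixed output delay; not the FFmpeg API."""

    def __init__(self, delay=1):
        self._buffer = deque()
        self._delay = delay

    def decode(self, packet):
        """Feed one packet (or None to flush); return a frame or None."""
        if packet is not None:
            self._buffer.append(f"frame({packet})")
            if len(self._buffer) <= self._delay:
                return None  # still buffering, nothing to output yet
        return self._buffer.popleft() if self._buffer else None


def decode_all(packets, flush=True):
    dec, frames = ToyDecoder(), []
    for p in packets:
        frame = dec.decode(p)
        if frame is not None:
            frames.append(frame)
    while flush:
        frame = dec.decode(None)  # empty packet: drain buffered frames
        if frame is None:
            break
        frames.append(frame)
    return frames


print(len(decode_all(range(100), flush=False)))  # 99
print(len(decode_all(range(100), flush=True)))   # 100
```

In this toy model, skipping the flush loses the final frame because it is still sitting in the buffer when the input ends.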

Decoding video failed.

Error: loading video data/hmdb51/mpeg4_videos/turn/SoundAndTheStory_turn_u_nm_np1_ba_med_1.mp4 failed.
Could not allocate video parser context
Decoding video failed.

Some questions about coviar data loader

Hi,
I have some questions when reading the coviar_data_loader.c
First, you initialize the variable accu_src_old as follows, but I wonder why:

                    for (size_t x = 0; x < w; ++x) {
                        for (size_t y = 0; y < h; ++y) {
                            accu_src_old[x * h * 2 + y * 2    ]  = x;
                            accu_src_old[x * h * 2 + y * 2 + 1]  = y;
                        }
                    }

Second, does the following code mean that every frame in the target GOP before the target frame is decoded, and only the I-frame and the target frame are converted to BGR format?

            if (cur_gop == gop_target && cur_pos <= pos_target) {
                ret = avcodec_decode_video2(pCodecCtx, pFrame, &got_picture, &packet);  
......
                if (got_picture) {

                    if ((cur_pos == 0              && accumulate  && representation == RESIDUAL) ||
                        (cur_pos == pos_target - 1 && !accumulate && representation == RESIDUAL) ||
                        cur_pos == pos_target) {
                        create_and_load_bgr(
                            pFrame, pFrameBGR, buffer, bgr_arr, cur_pos, pos_target);
                    }

Third, in dataset.py, I wonder why you process the img as follows:

def clip_and_scale(img, size):
    return (img * (127.5 / size)).astype(np.int32)

Thanks for your excellent work and code. Looking forward to your reply : )
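
On the first question, one possible reading (an assumption, not confirmed by the authors): accu_src_old stores, for every pixel (x, y), the coordinates it traces back to in the first frame of the GOP. Before any motion is applied each pixel refers to itself, which is the identity initialization in the loop above. Accumulating one more frame of motion then becomes a lookup into the old map. identity_map and accumulate are hypothetical helpers written in NumPy to illustrate this.

```python
import numpy as np


def identity_map(w, h):
    """accu[x, y] = (x, y): every pixel initially refers to itself."""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h), indexing="ij")
    return np.stack([xs, ys], axis=-1)


def accumulate(accu_old, ref):
    """Follow one more frame of motion.

    ref[x, y] holds the (x, y) position in the previous frame that pixel
    (x, y) was predicted from; the result refers back to the first frame.
    """
    return accu_old[ref[..., 0], ref[..., 1]]


w, h = 4, 3
accu0 = identity_map(w, h)

# Toy motion: every pixel is predicted from its left neighbor (clamped at x = 0).
ref = identity_map(w, h)
ref[..., 0] = np.maximum(ref[..., 0] - 1, 0)

accu1 = accumulate(accu0, ref)   # after one P-frame
accu2 = accumulate(accu1, ref)   # after two P-frames
print(accu2[3, 0])               # traces back two pixels to the left
```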

Testing Memory leakage ?

Hi ,
Has anyone used the testing script on a large number of videos? I suspect the data loader is leaking memory.
Checking the open file descriptors while the testing script runs shows that the files aren't being closed.
'ls -1 /proc/<pid>/fd | wc -l' gives the number of open file descriptors for a process.
This causes the data loader to leak and abort once it reaches the ulimit.
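
The same check can be done from inside Python, which makes it easier to log the count between batches. This is a Linux-only sketch that relies on the /proc filesystem; count_open_fds is a hypothetical helper, not part of the repository.

```python
import os


def count_open_fds(pid=None):
    """Count open file descriptors of a process via /proc (Linux only)."""
    pid = os.getpid() if pid is None else pid
    return len(os.listdir(f"/proc/{pid}/fd"))


before = count_open_fds()
handle = open(os.devnull)      # stands in for a video file left open
during = count_open_fds()
handle.close()
after = count_open_fds()
print(before, during, after)
```

If the count keeps growing while videos are loaded, files are being opened without a matching close.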

Can someone share the model's weights data trained on UCF-101?

I am currently taking a course on video action recognition and the deadline is approaching. CoViAR was selected as a comparison method, but training takes too long given our time and hardware. Could someone share the weights of the three models trained on UCF-101? Thanks very much!
