
slowfast's Introduction

PySlowFast

PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficient training. The backbone architectures it implements are listed in the Introduction below.

Introduction

The goal of PySlowFast is to provide a high-performance, lightweight PyTorch codebase offering state-of-the-art video backbones for video understanding research on different tasks (classification, detection, etc.). It is designed to support rapid implementation and evaluation of novel video research ideas. PySlowFast includes implementations of the following backbone network architectures:

  • SlowFast
  • Slow
  • C2D
  • I3D
  • Non-local Network
  • X3D
  • MViTv1 and MViTv2
  • Rev-ViT and Rev-MViT

Updates

License

PySlowFast is released under the Apache 2.0 license.

Model Zoo and Baselines

We provide a large set of baseline results and pretrained models for download in the PySlowFast Model Zoo.

Installation

Please find installation instructions for PyTorch and PySlowFast in INSTALL.md. You may follow the instructions in DATASET.md to prepare the datasets.

Quick Start

Follow the example in GETTING_STARTED.md to start playing with video models in PySlowFast.

Visualization Tools

We offer a range of visualization tools for the train/eval/test processes, model analysis, and running inference with trained models. More information is available at Visualization Tools.

Contributors

PySlowFast is written and maintained by Haoqi Fan, Yanghao Li, Bo Xiong, Wan-Yen Lo, and Christoph Feichtenhofer.

Citing PySlowFast

If you find PySlowFast useful in your research, please use the following BibTeX entry for citation.

@misc{fan2020pyslowfast,
  author =       {Haoqi Fan and Yanghao Li and Bo Xiong and Wan-Yen Lo and
                  Christoph Feichtenhofer},
  title =        {PySlowFast},
  howpublished = {\url{https://github.com/facebookresearch/slowfast}},
  year =         {2020}
}


slowfast's Issues

Question about the pooling layer

Hi, thank you for sharing the great codebase. I am looking at the SlowFast model builder and found an extra pooling layer between the res2 and res3 stages. I didn't find it in the paper.

https://github.com/facebookresearch/SlowFast/blob/master/slowfast/models/video_model_builder.py#L219-L225

In addition, both the kernel size and the stride are [1, 1, 1], which suggests no pooling is actually performed: the output has the same shape (and values) as the input.
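A quick check, independent of this repo, that a pooling layer with kernel_size=[1, 1, 1] and stride=[1, 1, 1] is an identity operation:

import torch
import torch.nn as nn

# Pooling over a single element with stride 1 returns the input unchanged.
pool = nn.MaxPool3d(kernel_size=[1, 1, 1], stride=[1, 1, 1], padding=0)
x = torch.randn(2, 8, 4, 56, 56)  # (N, C, T, H, W)
assert torch.equal(pool(x), x)    # identity: same shape, same values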

Actually, when I read the paper, it says on page 3 that

In our instantiations, we use no temporal downsampling layers (neither temporal pooling nor time-strided convolutions) throughout the Fast pathway, until the global pooling layer before classification.

So according to the paper this pooling layer shouldn't be here, at least for the Fast pathway. Could you clarify why it is needed? Thank you.

Confusion about the configuration in the paper

Hi, great work! I just have some confusion about a few details in the paper. As the paper mentions, SlowOnly res_101 and SlowFast res_101 were fine-tuned on the Charades dataset with only a single machine and a corresponding batch size of 16.

1. Does "single machine" refer to a single GPU, or to 4 or 8 GPUs?
2. If "single machine" refers to 8 GPUs, is the total batch size 16?
3. Was the model fine-tuned with precise_bn in your code?

Thanks very much. Looking forward to your reply!

Config for the Charades and AVA datasets?

Hi, great work on video understanding! Will you please release the configs for the Charades and AVA datasets? I am able to get almost the same result on Kinetics-400, but not a comparable result on Charades with a fine-tuned SlowOnly-res101; maybe I have just missed something important. Thanks very much!

CUDA and cuDNN version

What are the correct CUDA and cuDNN versions to match this SlowFast code on CentOS 7.2?

Traceback (most recent call last):
  File "/root/software_install/anaconda2/envs/pytorch1/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/root/code/sf/sf1/slowfast/utils/multiprocessing.py", line 49, in run
    torch.cuda.set_device(local_rank)
  File "/root/software_install/anaconda2/envs/pytorch1/lib/python3.6/site-packages/torch/cuda/__init__.py", line 300, in set_device
    torch._C._cuda_setDevice(device)
  File "/root/software_install/anaconda2/envs/pytorch1/lib/python3.6/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
    _check_driver()
  File "/root/software_install/anaconda2/envs/pytorch1/lib/python3.6/site-packages/torch/cuda/__init__.py", line 111, in _check_driver
    of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError:
The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

UnicodeDecodeError when loading the pre-trained model

Thank you for your great work!

When I tried to load the model you provided here, I encountered the following error:

Traceback (most recent call last):
  File "tools/run_net.py", line 152, in <module>
    main()
  File "tools/run_net.py", line 128, in main
    train(cfg=cfg)
  File "/home/ubuntu/projects/SlowFast/tools/train_net.py", line 217, in train
    convert_from_caffe2=cfg.TRAIN.CHECKPOINT_TYPE == "caffe2",
  File "/home/ubuntu/projects/SlowFast/slowfast/utils/checkpoint.py", line 212, in load_checkpoint
    checkpoint = torch.load(path_to_checkpoint, map_location="cpu")
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.7/site-packages/torch/serialization.py", line 603, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte

This probably occurred because the file was decoded differently from how it was encoded.
I would appreciate it if you could explain why this happens and how to solve it.

Thank you for your consideration!

Gap in testing with the pretrained model

Similar to issue #28, I also see a gap in test results with the pretrained model. I evaluated the SLOWFAST_4x16_R50 model; the top-1 accuracy is only 74.37 (below the reported 75.6).

Here is the running script I used:

[screenshot of the running script]

The version of pyav on my side is 6.2.0. All videos are resized so that the short edge is 256.

There are roughly 19,600 valid videos in my val dataset. For the missing videos, some links are broken, and some videos contain only the audio track.

Demo programme on camera input

Hi,

I find this project very interesting and thanks for open-sourcing it.

I am trying to make a demo programme that loads and runs the models (e.g. SlowFast) on input from a USB camera, to visually evaluate accuracy and performance. Would this be possible? If so, could you briefly elaborate on which modules should be used and how it could be implemented?

Thanks in advance.
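For what it's worth, a rough sketch of the capture side with OpenCV; the clip length, resolution, and normalization are illustrative placeholders, not PySlowFast API:

import cv2
import torch

# Grab one clip's worth of frames from a USB camera and pack them into the
# (N, C, T, H, W) layout that clip-based video models expect.
cap = cv2.VideoCapture(0)           # device 0 = first USB camera
frames = []
while len(frames) < 64:             # frames per clip; tune to the model config
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV returns BGR
    frame = cv2.resize(frame, (256, 256))
    frames.append(torch.from_numpy(frame).float() / 255.0)
cap.release()
clip = torch.stack(frames).permute(3, 0, 1, 2).unsqueeze(0)  # (1, C, T, H, W)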

Error while loading model

when I run

python tools/run_net.py \
  --cfg configs/Kinetics/SLOWFAST_8x8_R50.yaml \
  DATA.PATH_TO_DATA_DIR data/k400/val_mini.csv \
  TEST.CHECKPOINT_FILE_PATH checkpoints/k400/SLOWFAST_8x8_R50.pkl \
  TRAIN.ENABLE False

This error happens:
[INFO: checkpoint.py: 204]: res5_1_branch2b_bn_b: (512,) => s5.pathway0_res1.branch2.b_bn.bias: (512,)
[INFO: checkpoint.py: 204]: res3_1_branch2a_bn_riv: (128,) => s3.pathway0_res1.branch2.a_bn.running_var: (128,)
[INFO: checkpoint.py: 204]: res5_1_branch2b_bn_s: (512,) => s5.pathway0_res1.branch2.b_bn.weight: (512,)
[INFO: checkpoint.py: 204]: res4_4_branch2c_w: (1024, 256, 1, 1, 1) => s4.pathway0_res4.branch2.c.weight: (1024, 256, 1, 1, 1)
[INFO: checkpoint.py: 204]: t_res2_1_branch2c_w: (32, 8, 1, 1, 1) => s2.pathway1_res1.branch2.c.weight: (32, 8, 1, 1, 1)
[INFO: checkpoint.py: 204]: t_res5_0_branch1_w: (256, 128, 1, 1, 1) => s5.pathway1_res0.branch1.weight: (256, 128, 1, 1, 1)
[INFO: checkpoint.py: 204]: res2_2_branch2c_bn_riv: (256,) => s2.pathway0_res2.branch2.c_bn.running_var: (256,)
Traceback (most recent call last):
  File "tools/run_net.py", line 152, in <module>
    main()
  File "tools/run_net.py", line 148, in main
    test(cfg=cfg)
  File "/data/yangyang/code/sf/sf4/tools/test_net.py", line 102, in test
    convert_from_caffe2=cfg.TEST.CHECKPOINT_TYPE == "caffe2",
  File "/data/yangyang/code/sf/sf4/slowfast/utils/checkpoint.py", line 197, in load_checkpoint
    tuple(ms.state_dict()[converted_key].shape),
AssertionError: t_res4_5_branch2c_bn_subsample_w: (256, 128, 7, 1, 1) does not match s4_fuse.conv_f2s.weight: (256, 128, 5, 1, 1)

I downloaded the model directly from https://dl.fbaipublicfiles.com/pyslowfast/model_zoo/SLOWFAST_8x8_R50.pkl

My env info:
Python 3.6.9 in Anaconda
CentOS 7.2

Pretrained models not in .pkl format?

Well, we all know about the security risks that come with unpickling files from the internet...
Is it possible to provide the Model Zoo models in a non-pickled format?
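For torch-format checkpoints, newer PyTorch releases (1.13+) offer a partial mitigation; a sketch (the .pyth filename is illustrative):

import torch

# weights_only=True restricts unpickling to tensor data and refuses arbitrary
# objects. It helps with torch-format checkpoints, not the Caffe2 .pkl files.
checkpoint = torch.load("SLOWFAST_8x8_R50.pyth", map_location="cpu", weights_only=True)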

Error while loading a trained model

Hi team,
Fantastic work. I changed TRANS_FUNC to basic_transform. While testing on the test set, this error occurred. Could you help me fix it? Thank you so much.

Traceback (most recent call last):
  File "tools/run_net.py", line 152, in <module>
    main()
  File "tools/run_net.py", line 148, in main
    test(cfg=cfg)
  File "/root/slowfast-1/tools/test_net.py", line 106, in test
    cu.load_checkpoint(last_checkpoint, model, cfg.NUM_GPUS > 1)
  File "/root/slowfast/slowfast/utils/checkpoint.py", line 214, in load_checkpoint
    checkpoint = torch.load(path_to_checkpoint, map_location="cpu")
  File "/root/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 620, in _load
    deserialized_objects[key].set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 3084711 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount.load() > 0 INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1573049301898/work/c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /opt/conda/conda-bld/pytorch_1573049301898/work/c10/util/intrusive_ptr.h:348)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f6791e85687 in /root/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0x1436a9f (0x7f6799cffa9f in /root/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: THStorage_free + 0x17 (0x7f679a428947 in /root/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: + 0x3ec4cd (0x7f67c6b2a4cd in /root/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #20: __libc_start_main + 0xe7 (0x7f67cb60cb97 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

No instructions for using this package?

After installing the package, I found that a lot of usage instructions are missing for this package, for example:

  1. How to prepare the training data from benchmark datasets.
  2. Example commands to train the networks.
  3. Example commands to test the networks.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte

When I run "SLOWFAST_8x8_R50" model with the command below, I got "UnicodeDecodeError" error.

I also got 'RuntimeError: Invalid magic number; corrupt file?' when I load the weight file with "bytes" encoding.

Steps to reproduce

# version info
$ python --version
Python 3.7.2
$ pip freeze | grep -E "(av|torch)"
av==6.2.0
PyWavelets==1.0.2
torch==1.3.0
torchgeometry==0.1.2
torchvision==0.4.1

$ python tools/run_net.py \
  --cfg configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml \
  DATA.PATH_TO_DATA_DIR ${path_to_test_data} \
  TEST.CHECKPOINT_FILE_PATH ${path_to_model}/SLOWFAST_8x8_R50.pkl \
  TRAIN.ENABLE False

Observed Results

Traceback (most recent call last):
  File "tools/run_net.py", line 152, in <module>
    main()
  File "tools/run_net.py", line 148, in main
    test(cfg=cfg)
  File "/home/peterk/Workspace/slowfast/tools/test_net.py", line 102, in test
    convert_from_caffe2=cfg.TEST.CHECKPOINT_TYPE == "caffe2",
  File "/home/peterk/Workspace/slowfast/slowfast/utils/checkpoint.py", line 214, in load_checkpoint
    checkpoint = torch.load(path_to_checkpoint, map_location=torch.device("cpu"))
  File "/home/peterk/.local/anaconda3/envs/e_torch/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/peterk/.local/anaconda3/envs/e_torch/lib/python3.7/site-packages/torch/serialization.py", line 603, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 1: invalid continuation byte
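For context, the Model Zoo .pkl files are Caffe2-style pickles rather than torch.save archives, which is consistent with the failure above. A minimal sketch of inspecting one directly, mirroring the latin1 decoding the codebase itself uses for Caffe2 checkpoints:

import pickle

# Caffe2 checkpoints are plain pickles whose payload contains raw byte
# strings, so they need latin1 decoding and cannot be read by torch.load().
with open("SLOWFAST_8x8_R50.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")
print(sorted(data.keys()))  # typically includes a "blobs" dict of numpy arrays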

About fine-tuning on the Kinetics-600 dataset

Hi, I am using the pretrained Kinetics-400 model to fine-tune on Kinetics-600. Because the number of classes is different, I ignore the fc parameters when loading the pretrained Caffe2 model. When I try to train the network, several problems occur:

1. In distributed training, NaNs appear in s2 or s3. I thought the data might be wrong, but it is not: when I use a single GPU, everything is fine.
2. I have trained for 2-3 weeks; the loss decreases slowly, top-1 error is about 45-47%, and top-5 error is about 25-27%. Can you provide any suggestions?

Thanks a lot!

Is person-level classifier for AVA trained end-to-end with SlowFast?

The paper says: "We extract region-of-interest (RoI) features [6] at the last feature map of res5. We extend each 2D RoI at a frame into a 3D RoI by replicating it along the temporal axis, following [9]. We compute RoI features by RoIAlign [10] spatially, and global average pooling temporally. The RoI features are then max-pooled and fed to a per-class, sigmoid-based classifier for multi-label prediction."

As far as I understand, there is a separate classifier that operates on the features extracted from SlowFast. So this person-level action classifier is a different neural net that takes the features of each person (i.e. detections => RoI pooling from SlowFast => pooling => classifier NN) as input and predicts per-class sigmoid scores, right?
Therefore the SlowFast network is not actually trained on person-level labels but on scene-level labels, and the person-level classifier is trained separately?
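For reference, a minimal sketch of the head described in that quote, using torchvision's RoIAlign; the output size and spatial scale are illustrative assumptions, not the repo's exact code:

import torch
from torchvision.ops import roi_align

def person_features(feat5d, boxes, out_size=7, spatial_scale=1.0 / 16):
    """feat5d: (N, C, T, H, W) res5 features; boxes: (K, 5) rows of
    (batch_index, x1, y1, x2, y2) in input-image coordinates."""
    n, c, t, h, w = feat5d.shape
    per_frame = []
    for ti in range(t):
        # The same 2D box applied at every frame == a 3D RoI replicated in time.
        per_frame.append(
            roi_align(feat5d[:, :, ti], boxes, (out_size, out_size), spatial_scale)
        )
    roi_feats = torch.stack(per_frame, dim=2)   # (K, C, T, out, out)
    return roi_feats.mean(dim=2)                # global average pool over time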

AVA pretrained model

Hi,

Thanks for sharing this code.

Do you have an estimated timeframe for making the AVA pre-trained models available?

Query: Accessing intermediate-layer features

Hello Team,

Thank you for releasing the great framework.

I am trying to train the model on a different dataset and am thinking of extracting the I3D or SlowFast features (the output after the average pooling layer) once that is done. As I understand it, this could be done by tweaking the code in ./slowfast/build/lib/slowfast/models/head_helper.py:

def forward(self, inputs):
    assert (
        len(inputs) == self.num_pathways
    ), "Input tensor does not contain {} pathway".format(self.num_pathways)
    pool_out = []
    for pathway in range(self.num_pathways):
        m = getattr(self, "pathway{}_avgpool".format(pathway))
        pool_out.append(m(inputs[pathway]))
    x = torch.cat(pool_out, 1)
    # (N, C, T, H, W) -> (N, T, H, W, C).
    x = x.permute((0, 2, 3, 4, 1))
    feat = x  # save the feature before dropout and projection
    # Perform dropout.
    if hasattr(self, "dropout"):
        x = self.dropout(x)
    x = self.projection(x)

    # Performs fully convolutional inference.
    if not self.training:
        x = self.act(x)
        x = x.mean([1, 2, 3])

    x = x.view(x.shape[0], -1)
    return x, feat

Something like the above. But is there already an existing way to do this?
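One alternative that avoids editing head_helper.py is a forward hook. A sketch, assuming model is a built PySlowFast model; the attribute path to the pooling module is an assumption and depends on the config:

import torch

features = {}

def save_output(name):
    def hook(module, args, output):
        features[name] = output.detach()
    return hook

# Attribute path below is assumed; inspect print(model) to find yours.
handle = model.head.pathway0_avgpool.register_forward_hook(save_output("pool"))
with torch.no_grad():
    model(inputs)              # an ordinary forward pass fills `features`
handle.remove()
pooled = features["pool"]      # pooled (N, C, T', H', W') feature tensor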

Testing On SLOWFAST_4x16_R50.yaml Model

I have a single GPU in my machine and am trying to run the model by modifying the cfg to use 1 GPU instead of 8.
I have also downloaded the weights (SLOWFAST_4x16_R50.pkl) and placed them inside the checkpoints folder.

Everything compiles successfully, but when I run:
python tools/run_net.py --cfg configs/Kinetics/SLOWFAST_4x16_R50.yaml

this is what I get:

raise NotImplementedError("Unknown way to load checkpoint.")
NotImplementedError: Unknown way to load checkpoint.

Cannot reproduce the paper's result using the Nonlocal-Net data

We used the official code and ran the SLOWONLY_8x8_R50.yaml config. The settings are the same as stated in the paper, and we obtained the Kinetics data from Nonlocal-Net. But our reproduced result for the 8x8 model on val is 73.97, nearly a 1% drop compared to your model zoo.
What might be the reason, and what details have we overlooked?

UnicodeDecodeError when loading pickle file

Hi all,
I installed the required software with Python 3.7. I am trying to run a model from the Model Zoo and have downloaded it. When I run the command

python tools/run_net.py --cfg configs/Kinetics/I3D_NLN_8x8_R50.yaml DATA.PATH_TO_DATA_DIR ./files TEST.CHECKPOINT_FILE_PATH ./models/I3D_NLN_8x8_R50.pkl TRAIN.ENABLE False NUM_GPUS 1

I get the following error message:

[screenshot of the error message]

How can I solve this?

Cheers,
Andi

Question about selective decoding and video meta information

Hi, thank you for the great codebase. I was reading the dataloader code. It seems that, from this line, all the meta information is an empty dictionary. If that is the case, are we still doing selective decoding?

And by the way, could you explain what selective decoding is and what its advantage is? Thank you very much.

How to fine-tune SlowFast Kinetics model

Hi, thank you for sharing the great code.

We are trying to fine-tune the SlowFast Kinetics model with the UCF101 dataset. Could you provide fine-tuning code?
Thank you very much.
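Until dedicated fine-tuning code lands, the usual recipe is config overrides rather than new code: point training at the Kinetics checkpoint and change the class count. A hedged example (paths are placeholders; the keys exist in slowfast/config/defaults.py, and TRAIN.CHECKPOINT_TYPE caffe2 applies only to the .pkl Model Zoo files):

python tools/run_net.py \
  --cfg configs/Kinetics/SLOWFAST_8x8_R50.yaml \
  DATA.PATH_TO_DATA_DIR ${path_to_ucf101_lists} \
  MODEL.NUM_CLASSES 101 \
  TRAIN.CHECKPOINT_FILE_PATH ${path_to_kinetics_checkpoint} \
  TRAIN.CHECKPOINT_TYPE caffe2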

Why is softmax not applied at training time?

I noticed the activation (which is softmax) is only applied at test time. Why is that? Usually a model is trained with softmax; shouldn't the CrossEntropy loss break, since there would be negative outputs and log is not defined on the negative domain?

        # Performs fully convolutional inference.
        if not self.training:
            x = self.act(x)
            x = x.mean([1, 2, 3])

Also, why is self.projection said to be fully convolutional? It is a fully-connected layer, so in principle it shouldn't be invariant to the input image size. I guess you just mean that the weights are initialized similarly to conv layers, right?
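For what it's worth, PyTorch's cross-entropy applies log-softmax internally, so raw (possibly negative) logits are exactly what it expects at training time; a quick check:

import torch
import torch.nn.functional as F

# cross_entropy fuses log_softmax and nll_loss, so training on raw logits is
# correct; applying softmax inside the model would be redundant.
logits = torch.randn(4, 400)               # raw head outputs (can be negative)
target = torch.randint(0, 400, (4,))
loss_a = F.cross_entropy(logits, target)
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(loss_a, loss_b)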

.csv files for Kinetics Dataset

Hi, nice repo!
Could you please provide the train.csv, val.csv, and test.csv files mentioned in DATASET.md?
Also, if possible, could you share a link to your copy of the Kinetics dataset? Many of the original links are unavailable now.

Thank you
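For reference, DATASET.md describes these files as plain lists with one space-separated path and label pair per line, along these lines (paths and labels are illustrative):

/path/to/video_1.mp4 0
/path/to/video_2.mp4 42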

Kinetics Dataset

Hi, nice code!
I am a student at Seoul National University in Korea.

Could you please share the Kinetics dataset?
I am having a little trouble downloading the Kinetics dataset you used,
because some of the YouTube video links have disappeared :(

Best,
Myeongho Jeon

Small gap in testing

Hi, thanks for releasing the code.
I tested the SlowFast R50 models on K400 with PyTorch 1.3.
Both the 8x8 and the 4x16 models get 1.6% lower top-1 accuracy than the reported scores.

I am trying to figure out the cause of this small gap.
Do the official labels follow alphabetical order?
Is this something expected for weights converted from Caffe2?

Thanks

Script for counting flops

Hi,

Thanks very much for releasing the code of this awesome work! Could you possibly also share the script for counting FLOPs?
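In case it helps, a sketch with fvcore (a PySlowFast dependency); the model handle and dummy clip shapes are assumptions for illustration:

import torch
from fvcore.nn import FlopCountAnalysis

model.eval()
slow = torch.randn(1, 3, 8, 224, 224)    # slow-pathway clip, (N, C, T, H, W)
fast = torch.randn(1, 3, 32, 224, 224)   # fast-pathway clip
flops = FlopCountAnalysis(model, ([slow, fast],))
print(f"{flops.total() / 1e9:.1f} GFLOPs")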

RunTimeError: CUDA memory requirements

Hi,

I have installed PySlowFast for Python 3.7 and have CUDA 10.0 installed. I am now trying to run

python tools/run_net.py --cfg configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml DATA.PATH_TO_DATA_DIR ./files TEST.CHECKPOINT_FILE_PATH ./models/SLOWFAST_8x8_R50.pkl TRAIN.ENABLE False TEST.CHECKPOINT_TYPE caffe2 NUM_GPUS 1

and I am getting a CUDA out-of-memory error. My GPU has 6 GB; I also tried a GPU with 12 GB and the same error occurred. Here is the error message:

RuntimeError: CUDA out of memory. Tried to allocate 300.00 MiB (GPU 0; 5.93 GiB total capacity; 4.99 GiB already allocated; 232.31 MiB free; 17.16 MiB cached)

Is there anything I need to adapt?

Regards, Andi

Charades Model

Hi, I'm working on the Charades experiment.
Is it possible to release the Charades model that achieved 45.2% mAP? It seems not to be included in the model zoo.
I'm not sure whether it is on hold because of privacy concerns; it would be perfect if this model could be released.

Is it possible to just use the pre-trained model without a GPU?

def build_model(cfg):
    """
    Builds the video model.
    Args:
        cfg (configs): configs that contains the hyper-parameters to build the
        backbone. Details can be seen in slowfast/config/defaults.py.
    """
    assert (
        cfg.MODEL.ARCH in _MODEL_TYPES.keys()
    ), "Model type '{}' not supported".format(cfg.MODEL.ARCH)
    assert (
        cfg.NUM_GPUS <= torch.cuda.device_count()
    ), "Cannot use more GPU devices than available"

    # Construct the model
    model = _MODEL_TYPES[cfg.MODEL.ARCH](cfg)
    # Determine the GPU used by the current process
    cur_device = torch.cuda.current_device()
    # Transfer the model to the current GPU device
    model = model.cuda(device=cur_device)
    # Use multi-process data parallel model in the multi-gpu setting
    if cfg.NUM_GPUS > 1:
        # Make model replica operate on the current device
        model = torch.nn.parallel.DistributedDataParallel(
            module=model, device_ids=[cur_device], output_device=cur_device
        )
    return model

I find this project very intriguing, but the documentation seems sparse.
I wanted to use the pre-trained SlowFast R-50 model, but from the above code it looks like a GPU is required just to load the model?
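For reference, the GPU requirement above comes from the hard-coded model.cuda(...) call, not from the weights themselves. A minimal CPU-only sketch, assuming a torch-format checkpoint saved with a "model_state" entry (as this repo's own checkpoints are):

import torch

# Build the model without moving it to a GPU, then load weights onto the CPU.
model = _MODEL_TYPES[cfg.MODEL.ARCH](cfg)   # as in build_model, minus .cuda()
checkpoint = torch.load("checkpoint_epoch_00100.pyth", map_location="cpu")
model.load_state_dict(checkpoint["model_state"])
model.eval()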

The len(sys.argv) == 1 check is unnecessary when parsing arguments

Thank you for your great work!

I notice that you define "opts" as the only positional argument, with a default of None.
So I believe the command should be usable without any arguments (opts automatically set to None).
However, once I execute the following command,

python tools/run_net.py

I got the following error.

usage: run_net.py [-h] [--shard_id SHARD_ID] [--num_shards NUM_SHARDS]
                  [--init_method INIT_METHOD] [--cfg CFG_FILE]
                  ...

Provide SlowFast video training and testing pipeline.

positional arguments:
  opts                  See slowfast/config/defaults.py for all options

optional arguments:
  -h, --help            show this help message and exit
  --shard_id SHARD_ID   The shard id of current node, Starts from 0 to
                        num_shards - 1
  --num_shards NUM_SHARDS
                        Number of shards using by the job
  --init_method INIT_METHOD
                        Initialization method, includes TCP or shared file-
                        system
  --cfg CFG_FILE        Path to the config file

This error occurs because the code requires at least one argument, regardless of the default value, due to the following check:

if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)

I would suggest removing this check so that the command can be used without any arguments.

Thank you for your consideration!

Estimated Training Time & CPU Usage

Thanks for your work! When we train SLOWFAST_4x16_R50 on Kinetics with 8 GPUs, we find that the estimated training time fluctuates widely and all CPUs run at 100%. Are these normal behaviors?

As shown in the figure below, the ETA jumps from 3 days to 30 days, then goes back to normal. Based on the time interval between two checkpoint saves, training will take more than 10 days, rather than the 5 days shown in the log.

[screenshot of the training log]

All CPU cores are running at up to 100% usage.
[screenshot of CPU usage]

GPU usage is quite high. Do you think these are normal behaviors, or was something not set properly? Our batch size is 96. Looking forward to your advice. Thanks!

How does image resolution affect accuracy?

Hi, thanks for the great repo!

Did you have a chance to experiment with how input image resolution affects accuracy? I imagine you use 224x224 for training because of memory constraints and the large dataset, but would 640x640 make a large difference (excluding actions that involve small objects, e.g. smoking)?

Hyper-parameters for the AVA experiment

Hi,

Thanks for sharing the code. I notice that the hyper-parameters in the paper differ from the config files in the codebase (e.g. the learning rate schedule), so I suppose the AVA hyper-parameters may also be different. If possible, could you please provide the hyper-parameters used to train the AVA pipeline, especially for a small number of GPUs, to reproduce the results in the paper? Thanks.

Reproduction and comparison to video-nonlocal-net

Hi, Thank you for making this code available.

I am trying to reproduce the results with your code. I got 72.6% top-1 accuracy for I3D from [Kinetics/I3D_8x8_R50.yaml](https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/I3D_8x8_R50.yaml) on 8 GPUs, without modifying any other parameter, for 196 epochs.

I suppose that is because my copy of the Kinetics dataset is missing about 600 validation and 9K training videos. It would be a great help if you could comment on this.

Another question concerns comparing the results of this repo with the [video-nonlocal-net repo](https://github.com/facebookresearch/video-nonlocal-net). For instance, I3D_8x8_R50 has an accuracy of 73.5 here and 73.5 there, but the initialization here is random while video-nonlocal-net uses ImageNet initialization. However, the number of epochs there is much lower (~100).

What makes random initialization work as well as ImageNet initialization in video-nonlocal-net? Is it just the number of epochs, or something else?
How does one compare them?

Looking forward to your answers,
Gurkirt

[ERROR: decoder.py: 88]: mmco: unref short failure

Thank you for the awesome work.

I was trying to run run_net.py to test the pretrained model after following the installation instructions. However, I ran into this error:

[ERROR: decoder.py: 88]: mmco: unref short failure
Failed to decode with pyav with exception: 'av.container.input.InputContainer' object has no attribute 'close'

The command I used is:

python tools/run_net.py --cfg configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml TEST.CHECKPOINT_FILE_PATH SLOWFAST_8x8_R50.pkl TRAIN.ENABLE False NUM_GPUS 1 TEST.CHECKPOINT_TYPE caffe2

After searching around the internet, I modified line 165 of decoder.py to:
del container

Nevertheless, the error line [ERROR: decoder.py: 88]: mmco: unref short failure is still there and the code seems to hang.

[WARNING: decoder.py: 89]: Failed to parse extradata

Hello, thank you for your work!

I encountered an ffmpeg warning: [WARNING: decoder.py: 89]: Failed to parse extradata.
I looked up what causes this warning, but I have no ideas.

Since it is just a warning, can I ignore it, or might I be making a mistake somewhere?

Thank you for your great work again!
