Pseudo-3D Residual Networks

This repo implements the network structure of P3D [1] in PyTorch. The pre-trained model weights are converted from the caffemodel provided in the author's repo.

Requirements:

  • pytorch
  • numpy

Structure details

In the author's official repo, only P3D-199 is released. Besides this deepest P3D-199, I also implement P3D-63 and P3D-131, which are modified from ResNet50-3D and ResNet101-3D respectively. These two networks may be more convenient for users whose GPUs have limited memory.
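The model names can be reproduced with simple arithmetic. The sketch below rests on an assumption that this README does not state: each pseudo-3D bottleneck block replaces one 3x3 convolution with a spatial plus a temporal convolution (one extra layer per block), and the last residual stage keeps ordinary 2D convolutions. `RESNET_LAYOUTS` and `p3d_depth` are illustrative names, not part of `p3d_model`.

```python
# Standard ResNet bottleneck layouts: (nominal depth, blocks per stage res2..res5).
RESNET_LAYOUTS = {
    "ResNet-50": (50, (3, 4, 6, 3)),
    "ResNet-101": (101, (3, 4, 23, 3)),
    "ResNet-152": (152, (3, 8, 36, 3)),
}


def p3d_depth(resnet_depth, blocks_per_stage):
    """Depth after adding one extra conv layer per pseudo-3D block.

    Assumption: only blocks in the first three stages are pseudo-3D;
    the last stage stays 2D and adds no layers.
    """
    pseudo_3d_blocks = sum(blocks_per_stage[:3])
    return resnet_depth + pseudo_3d_blocks


for name, (depth, blocks) in RESNET_LAYOUTS.items():
    print("%s -> P3D-%d" % (name, p3d_depth(depth, blocks)))
```

Under that assumption, the three layouts give 63, 131 and 199 layers, matching the model names used in this repo.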

Pretrained weights

(Pretrained weights for P3D-63 and P3D-131 are not yet available.)

(Note: I am sorry that the download URLs of the pretrained weights were withdrawn for private reasons. For more information, please email me.) (Update: the model weights are now available.)

1. P3D-199 trained on the Kinetics dataset:

BaiduYun url

2. P3D-199 trained on Kinetics optical flow (TVL1):

BaiduYun url

Example Code

from __future__ import print_function
import torch
from p3d_model import P3D199

# Build P3D-199 with pretrained weights and a 400-class output layer.
model = P3D199(pretrained=True, num_classes=400)
model = model.cuda()

# Input layout: (batch, channels, frames, height, width).
# If modality == 'Flow', change the 2nd dimension from 3 to 2.
data = torch.rand(10, 3, 16, 160, 160).cuda()
out = model(data)
print(out.size(), out)
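The input layout used above can be made explicit with a small helper. This is only a sketch: `p3d_input_shape` is a hypothetical name that does not exist in `p3d_model`, and the defaults simply mirror the example values (10 clips of 16 frames at 160x160).

```python
def p3d_input_shape(modality="RGB", batch=10, frames=16, size=160):
    """Return the (N, C, T, H, W) shape of a clip batch for the given modality."""
    # RGB frames have 3 channels; optical flow has 2 (horizontal and vertical).
    channels = {"RGB": 3, "Flow": 2}
    if modality not in channels:
        raise ValueError("modality must be 'RGB' or 'Flow', got %r" % (modality,))
    return (batch, channels[modality], frames, size, size)


print(p3d_input_shape("RGB"))
print(p3d_input_shape("Flow"))
```

The returned tuple can be passed to `torch.rand(*shape)` to build a dummy batch for either modality.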

Ablation settings

  1. ST-Structures:

    All P3D models in this repo support various combinations of ST-Structures, such as ('A','B','C'), ('A','B') and ('A'). For example:

    model = P3D63(ST_struc=('A','B'))
    model = P3D131(ST_struc=('C'))
    
  2. Flow and RGB models:

    Set the parameter modality='RGB' for the RGB model, or modality='Flow' for the flow model. The flow model is trained on TVL1 optical flow images.

    model = P3D199(pretrained=True,modality='Flow')
    
  3. Finetune the model

    When fine-tuning the models on your custom dataset, use get_optim_policies() to set different learning rates for different layers. For example, when the dataset is small, only the several deepest layers need to be trained: set slow_rate=0.8 in the code and adjust the lr_mult and decay_mult values that follow it.
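The idea behind slow_rate can be sketched in plain Python. This is an illustration under stated assumptions, not the code of get_optim_policies(): the names `split_param_groups`, `slow_mult` and `fast_mult` are hypothetical, and the first `slow_rate` fraction of an input-to-output ordered parameter list is assumed to get the smaller learning rate.

```python
def split_param_groups(params, base_lr, slow_rate=0.8, slow_mult=0.1, fast_mult=1.0):
    """Split an ordered parameter list into a slow (early) and a fast (late) group.

    The first `slow_rate` fraction of `params` gets `base_lr * slow_mult`;
    the remaining, deepest entries get `base_lr * fast_mult`.
    """
    if not 0.0 <= slow_rate <= 1.0:
        raise ValueError("slow_rate must be in [0, 1]")
    n_slow = int(len(params) * slow_rate)
    groups = [
        {"name": "slow", "params": params[:n_slow], "lr": base_lr * slow_mult},
        {"name": "fast", "params": params[n_slow:], "lr": base_lr * fast_mult},
    ]
    # Drop empty groups so every returned group has parameters.
    return [g for g in groups if g["params"]]


# Stand-in for list(model.parameters()), ordered from input to output.
layers = ["layer%d.weight" % i for i in range(10)]
for group in split_param_groups(layers, base_lr=0.01):
    print(group["name"], len(group["params"]), group["lr"])
```

Each returned dict has a "params" and an "lr" key, the per-parameter-group format that PyTorch optimizers such as torch.optim.SGD accept, so with real parameters the list could be passed to an optimizer directly.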


Please cite this repo if you use it.

Experiment Results (beyond the paper)

(All of the following results were obtained with end-to-end training.)

Some of them outperform the state of the art.

  • Action recognition (mean accuracy on UCF101):

    modality/model        RGB      Flow     Fusion
    P3D199 (Sports-1M)    88.5%    -        -
    P3D199 (Kinetics)     91.2%    92.4%    98.3%

  • Action localization (mAP on Thumos14):

    steps: perframe + watershed

    Step                  perframe         localization
    P3D199 (Sports-1M)    0.451            0.25
    P3D199 (Kinetics)     0.569 (fused)    0.307

Reference:

[1] Z. Qiu, T. Yao, and T. Mei, "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks," ICCV 2017.

Contributors

  • qijiezhao
