Topic: video-understanding

Something interesting about video-understanding

👇 Here are 176 public repositories matching this topic...

alibaba-mmai-research / tadaconv

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.

Organization: alibaba-mmai-research

Home Page: https://tadaconv-iclr2022.github.io

action-recognition pytorch action-localization video-understanding video-classification self-supervised-learning tadaconv

amazon-science / video-contrastive-learning

Video Contrastive Learning with Global Context, ICCVW 2021

Organization: amazon-science

computer-vision video-understanding self-supervised-learning contrastive-learning iccv-2021

antoyang / frozenbilm

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

User: antoyang

Home Page: https://arxiv.org/abs/2206.08155

multimodal-learning video-understanding vqa weakly-supervised-learning large-language-models pre-training video-question-answering videoqa vision-and-language visual-question-answering

antoyang / just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

User: antoyang

Home Page: https://arxiv.org/abs/2012.00451

vqa visual-question-answering videoqa video-question-answering video-understanding question-generation weakly-supervised-learning vision-and-language pre-training multimodal-learning

antoyang / tubedetr

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

User: antoyang

spatio-temporal-video-grounding stvg vidstg hc-stvg vision-and-language multimodal-learning video-understanding visual-grounding

antoyang / vidchapters

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

User: antoyang

Home Page: http://arxiv.org/abs/2309.13952

dense-video-captioning multimodal-learning pre-training temporal-language-grounding video-captioning video-understanding vision-and-language weakly-supervised-learning vid2seq video-chapter-generation

boheumd / ma-lmm

[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

User: boheumd

Home Page: https://boheumd.github.io/MA-LMM/

llm video-understanding

chihyaoma / activity-recognition-with-cnn-and-rnn

Temporal Segments LSTM and Temporal-Inception for Activity Recognition

User: chihyaoma

activity-recognition video-understanding torch lstm-neural-networks convolutional-neural-networks

chinancheng / awesome-activity-prediction

A paper list on activity prediction and related areas

User: chinancheng

awesome-list action-recognition activity-recognition activity-prediction action-prediction activity-understanding video-understanding

cmhungsteve / sstda

[CVPR 2020] Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation (PyTorch)

User: cmhungsteve

Home Page: https://arxiv.org/abs/2003.02824

cvpr2020 pytorch domain-adaptation domain-discrepancy temporal-dynamics video action-segmentation self-supervised-learning video-understanding

cogito2012 / dear

[ICCV 2021 Oral] Deep Evidential Action Recognition

User: cogito2012

action-recognition openset-recognition video-understanding uncertainty-quantification evidential-deep-learning debiasing model-calibration ood-detection

fabienbaradel / object_level_visual_reasoning

PyTorch implementation of "Object level Visual Reasoning in Videos", F. Baradel, N. Neverova, C. Wolf, J. Mille, G. Mori, ECCV 2018

User: fabienbaradel

eccv-2018 computer-vision video-understanding human-object-interaction

ferreirafabio / video2tfrecord

Easily convert RGB video data (e.g. .avi) to the TensorFlow tfrecords file format for training, e.g., a neural network in TensorFlow. This implementation lets you limit the number of frames per video stored in the tfrecords.

User: ferreirafabio

video-understanding deep-learning preprocessor tensorflow tensorflow-tfrecords opencv neural-network optical-flow

henghuiding / mevis

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

User: henghuiding

Home Page: https://henghuiding.github.io/MeViS/

multimodal-learning referring-expression-comprehension referring-expression-segmentation referring-video-object-segmentation video-understanding mevis-dataset mose-dataset

hustvl / tevit

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Organization: hustvl

Home Page: https://arxiv.org/abs/2204.08412

instance-segmentation video-instance-segmentation video-understanding

jinwchoi / awesome-action-recognition

A curated list of resources on action recognition and related areas

User: jinwchoi

awesome-list awesome action-recognition action-classification action-detection activity-recognition activity-understanding video-understanding video-recognition video-processing

junweiliang / multiverse

Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction" and for the ECCV'20 SimAug paper.

User: junweiliang

Home Page: https://next.cs.cmu.edu/multiverse/

trajectory-prediction trajectory-prediction-benchmark computer-vision video-understanding 3d-simulation

mcg-nju / tdn

[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition

Organization: mcg-nju

Home Page: https://arxiv.org/abs/2012.10071

action-recognition video-understanding video-classification cvpr2021 pytorch temporal-modeling

mcg-nju / videomae

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Organization: mcg-nju

Home Page: https://arxiv.org/abs/2203.12602

self-supervised-learning action-recognition video-understanding masked-autoencoder transformer vision-transformer video-transformer mae pytorch video-representation-learning

mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

Organization: mit-han-lab

Home Page: https://arxiv.org/abs/1811.08383

acceleration low-latency temporal-modeling video-understanding efficient-model nvidia-jetson-nano tsm

movienet / movienet-tools

Tools for movie and video research

Organization: movienet

Home Page: http://movienet.github.io

movie computer-vision video-understanding action-recognition deep-learning vision-language cross-modality shot-detection person-analysis

nvlabs / step

STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)

Organization: nvlabs

action-detection spatial-temporal spatio-temporal video-action action-recognition ava ava-dataset ucf101 ucf101-dataset amp

open-mmlab / mmaction

An open-source toolbox for action understanding based on PyTorch

Organization: open-mmlab

Home Page: https://open-mmlab.github.io/

action-recognition action-detection video-understanding pytorch temporal-action-detection temporal-action-localization spatial-temporal-action-detection

open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Organization: open-mmlab

Home Page: https://mmaction2.readthedocs.io

action-recognition temporal-action-localization pytorch video-understanding tsn i3d slowfast ava spatial-temporal-action-detection benchmark

opengvlab / ask-anything

[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Organization: opengvlab

Home Page: https://vchat.opengvlab.com/

captioning-videos chatgpt gradio langchain video-question-answering video-understanding stablelm chat video big-model

opengvlab / internvideo

Video Foundation Models & Data for Multimodal Understanding

Organization: opengvlab

foundation-models video-understanding vision-transformer action-recognition masked-autoencoder multimodal open-set-recognition spatio-temporal-action-localization temporal-action-localization video-question-answering

opengvlab / videomaev2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Organization: opengvlab

Home Page: https://arxiv.org/abs/2303.16727

cvpr2023 foundation-model self-supervised-learning video-understanding action-detection action-recognition temporal-action-detection

paddlepaddle / paddlevideo

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton-based action recognition models, and practical applications for video tagging and sport action detection.

Organization: paddlepaddle

video-recognition tsm slowfast tsn bmn action-recognition youtube-8m kinetics400 video-understanding activitynet

pku-yuangroup / chat-univi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Organization: pku-yuangroup

Home Page: https://arxiv.org/abs/2311.08046

image-understanding large-language-models video-understanding vision-language-model

rlleshi / phar

Deep learning sex position classifier

User: rlleshi

action-recognition deep-learning porn-filter pornhub pytorch video-classification video-understanding human-action-recognition sex sex-classifier

rohitgirdhar / actionvlad

ActionVLAD for video action classification (CVPR 2017)

User: rohitgirdhar

Home Page: https://rohitgirdhar.github.io/ActionVLAD/

action-recognition deep-learning tensorflow video-processing video-understanding

rohitgirdhar / cater

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

User: rohitgirdhar

Home Page: https://rohitgirdhar.github.io/CATER/

video-recognition action-recognition deep-learning clevr video-understanding

showlab / awesome-video-diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.

Organization: showlab

awesome diffusion-models video-editing video-generation video-understanding video-restoration text-to-motion text-to-video

theshadow29 / awesome-grounding

Awesome Grounding: a curated list of research papers in visual grounding

User: theshadow29

computer-vision natural-language-processing grounding awesome-list papers arxiv visual-grounding image-grounding video-understanding video-grounding

ustc-video-understanding / i3d_finetune

TensorFlow code for finetuning I3D model on UCF101.

Organization: ustc-video-understanding

action-recognition video-understanding cnn deep-learning i3d

v-iashin / specvqgan

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

User: v-iashin

Home Page: https://v-iashin.github.io/SpecVQGAN

transformer vqvae gan pytorch audio-generation video-features melgan multi-modal video-understanding vggsound

wangheda / youtube-8m

The 2nd-place solution to the YouTube-8M Video Understanding Challenge by Team Monkeytyping (based on TensorFlow)

User: wangheda

Home Page: https://arxiv.org/abs/1706.05150

youtube-8m video-understanding computer-vision deep-learning ensemble-learning tensorflow

whwu95 / bike

【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

User: whwu95

Home Page: https://arxiv.org/abs/2301.00182

action-recognition cross-modal-learning video-language-understanding video-recognition video-understanding

whwu95 / cap4video

【CVPR'2023 Highlight】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

User: whwu95

Home Page: https://arxiv.org/abs/2301.00184

cross-modal-learning video-language-understanding video-text-retrieval video-understanding

whwu95 / mvfnet

【AAAI'2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition

User: whwu95

Home Page: https://arxiv.org/abs/2012.06977

efficient-video-recognition data-preparation model-zoo video-understanding temporal-modeling

whwu95 / text4vis

【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

User: whwu95

cross-modal-learning transfer-learning video-recognition video-understanding action-recognition

wujie1010 / awesome-temporally-language-grounding

A curated list of “Temporally Language Grounding” and related areas

User: wujie1010

video-understanding language-grounding temporal-activity-localization temporal-language-grounding charades activitynet-captions

xyzforever / bevt

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529

User: xyzforever

action-recognition bert deep-learning masked-autoencoder pytorch video-understanding foundation-models self-supervised-learning video-representation-learning

yjxiong / action-detection

Temporal action detection with SSN

User: yjxiong

action-recognition action-detection temporal-activity-localization video-understanding structured-segment-networks

yjxiong / temporal-segment-networks

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

User: yjxiong

temporal-segment-networks action-recognition video-understanding

yjxiong / tsn-pytorch

Temporal Segment Networks (TSN) in PyTorch

User: yjxiong

action-recognition deep-learning video-understanding pytorch temporal-segment-networks

yoosan / video-understanding-dataset

A collection of recent video understanding datasets, under construction!

User: yoosan

video-understanding datasets computer-vision action-recognition

youngwanlee / vov3d

Efficient 3D Backbone Network for Temporal Modeling

User: youngwanlee

Home Page: https://arxiv.org/abs/2012.00317

backbone-networks temporal-modeling efficient-model vovnet vov3d video-understanding 3d-cnn-architecture

yuhaocheng / pyanomaly

Useful Toolbox for Anomaly Detection

User: yuhaocheng

anomaly-detection artificial-intelligence artificial-neural-networks computer-vision machine-learning multimedia python pytroch video-analysis video-anomaly-detection video-understanding

zhang-can / pan-pytorch

[Code for the paper] PAN: Towards Fast Action Recognition via Learning Persistence of Appearance

User: zhang-can

Home Page: https://arxiv.org/abs/2008.03462

action-recognition video-understanding motion-representation
