Giter VIP home page Giter VIP logo

uvast's Introduction

UVAST: Unified Fully and Weakly Supervised Temporal Action Segmentation via Sequence to Sequence Translation

Official PyTorch implementation of the ECCV 2022 paper: UVAST: Unified Fully and Weakly Supervised Temporal Action Segmentation via Sequence to Sequence Translation The code allows users to reproduce and extend the results of our method. Please cite the paper when reporting, reproducing or extending the results.

Overview

This repository implements UVAST, a method for temporal action segmentation via sequence-to-sequence translation. Details regarding the required environment, datasets, training scripts and pretrained models can be found below.


ECCV2022

Enviroment

Pytorch == 1.10.0+cu102, torchvision == 0.11.1, python == 3.9.7, CUDA==10.2

Enviroment Setup

Install the required libaries as follows:

conda clean -a -y
conda create -n uvast python=3.9.7 numpy
conda activate uvast
conda install  --insecure pytorch=1.10.0 torchvision=0.11.1 torchaudio=0.10.0 cudatoolkit=11.3.1  -c pytorch
python -c "import torch; print(torch.__version__)"
conda install -c conda-forge tqdm
conda install -c conda-forge matplotlib
conda install -c conda-forge einops
conda install -c conda-forge torchinfo
conda install -c anaconda pandas
conda install -c conda-forge tensorboardx
conda install -c anaconda ipykernel
conda install ipython
conda install pip

Clone this repository

git clone https://github.com/boschresearch/uvast.git
cd uvast

Datasets

Use this link to download the features and the ground truth labels for the GTEA, 50Salads and Breakfast datasets (~30GB).

Extract data.zip so that data is placed inside the uvast folder. Otherwise you need to modify the --data_root flag to point to the data folder.

Compute mean durations

To compute the mean durations to be used in FIFA/Viterbi call (and optionally pass --data_root):

python compute_mean_dur.py

Training

We train the model in a two stages process: In the first stage, we train the encoder and the transcript decoder using Eq. (6) from the paper (and without the alignment decoder). In the second stage, we optionally train the alignment decoder (alternatively, Viterbi or FIFA can be used to compute durations).

All training scrips for all three datasets are provided with the pretrained_models. For each of the scripts you need to specify the --data_root (if data is not placed inside the uvast folder) and --experiment_path to specify where to save the models. For more information regarding the flags, please look into the information for each flag in run.py. An example training script for the first split of the gtea dataset is provided below:

Training script for stage 1:

python run.py --use_cuda --step_size 800 --dataset gtea --split 1 --split_segments --use_pe_tgt --do_framewise_loss --do_framewise_loss_g --framewise_loss_g_apply_nothing --do_segwise_loss --do_segwise_loss_g --segwise_loss_g_apply_logsoftmax --do_crossattention_action_loss_nll

Training script for stage 2:

python run.py --use_cuda --dataset gtea --split 1 --split_segments --use_pe_tgt --use_alignment_dec --do_crossattention_dur_loss_ce --aug_rnd_drop --pretrained_model pretrained_models/gtea/split1/gtea_split1_stage1.model

Note that for this stage you need to specify the pretrained model from the first stage via the --pretrained_model flag.

Evaluation

While the transcript decoder predicts the sequence of actions in the video, we propose three different approaches for predicting durations: a learnable alignment decoder (stage 2), or FIFA/Viterbi. All evaluation scripts along with pretrained model are provided in pretrained_models. To run the inference code the flag --inference_only needs to be added as well as --path_inference_model to point to the model to be evaluated.

An example script for testing a model is provided below:

Evaluate Alignment Decoder:

python run.py --use_cuda --dataset gtea --split 1 --path_inference_model pretrained_models/gtea/split1/gtea_split1_stage2.model --inference_only --split_segments --use_pe_tgt --use_alignment_dec 

Evaluate with Viterbi:

python run.py --use_cuda --dataset gtea --split 1 --use_viterbi --viterbi_sample_rate 1 --path_inference_model pretrained_models/gtea/split1/gtea_split1_stage1.model --inference_only --split_segments --use_pe_tgt

Evaluate with FIFA:

python run.py --use_cuda --dataset gtea --split 1 --use_fifa --fifa_init_dur --path_inference_model pretrained_models/gtea/split1/gtea_split1_stage2.model --inference_only --split_segments --use_pe_tgt --use_alignment_dec

Citation

If this code is useful in your research we would kindly ask you to cite our paper.

@inproceedings{uvast2022ECCV,
 title={Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation},
 author={Nadine Behrmann and S. Alireza Golestaneh and Zico Kolter and Juergen Gall and Mehdi Noroozi},
 booktitle={ECCV},
 year={2022}
}

License

This project is open-sourced under the AGPL-3.0 license. See the License file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Purpose of the Project

This software is a research prototype, solely developed for and published as part of the publication cited above. It will neither be maintained nor monitored in any way.

Contact

Please feel free to open an issue or contact us personally if you have questions, need help, or need explanations. Write to one of the following email addresses, and maybe put the other in the cc:

[email protected]

[email protected]

uvast's People

Contributors

nadinebehrmann avatar bhavyaalekhya avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.