
Playable Video Generation



Figure 1. Illustration of the proposed CADDY model for playable video generation.


Playable Video Generation
Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci
ArXiv

Paper: arXiv: Coming soon
Website
Live Demo

Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step, as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety.

Overview

Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels, as if playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A, which predicts the action label corresponding to the current action performed in the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.

Installation

Conda

The complete environment for execution can be installed with:

conda env create -f env.yml

conda activate video-generation

Docker

Build the docker image docker build -t video-generation:1.0 .

Run the docker image. Mount the root directory to /video-generation in the docker container: docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash

Directory structure

Please create the following directories in the root of the project:

  • results
  • checkpoints
  • data
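The three directories can be created in one step from the project root (a minimal shell sketch; the directory names are the ones listed above):

```shell
# Create the directories expected by the training and evaluation scripts
mkdir -p results checkpoints data
```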

Datasets

Datasets can be downloaded at the following link: Google Drive

  • Breakout: Coming soon
  • BAIR: bair_256_ours.tar.gz
  • Tennis: Coming soon

Please extract them under the data folder.
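For example, the downloaded archives (such as bair_256_ours.tar.gz) can be unpacked with a loop like the following (a sketch; it simply skips over archives that have not been downloaded yet):

```shell
# Unpack every downloaded dataset archive into the data folder
mkdir -p data
for archive in data/*.tar.gz; do
    [ -e "$archive" ] || continue  # no archives downloaded yet; skip
    tar -xzf "$archive" -C data
done
```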

Pretrained Models

Pretrained models can be downloaded at the following link: Google Drive

Please place the directories under the checkpoints folder.

Playing

After downloading the checkpoints, the models can be played with the following commands:

  • BAIR: python play.py --config configs/01_bair.yaml

  • Breakout: python play.py --config configs/breakout/02_breakout.yaml

  • Tennis: python play.py --config configs/03_tennis.yaml

Training

The models can be trained with the following commands:

python train.py --config configs/<config_file>

Multi-GPU support is active by default. Runs can be logged through Weights and Biases by running wandb init before launching the training command.

Evaluation

Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on the evaluation dataset. To build the evaluation dataset, please issue:

python build_evaluation_dataset.py --config configs/<config_file>

To run evaluation issue:

python evaluate_dataset.py --config configs/evaluation/configs/<config_file>

Evaluation results are saved under the evaluation_results directory.

