WebVid Dataset 🕸🎥

A large-scale text-video dataset containing 10 million video-text pairs scraped from stock footage sites. This dataset was used for large-scale pretraining to achieve state-of-the-art end-to-end retrieval in our frozen-in-time work, the code for which can be found here.

Terms of Access 🔓

You must not use the content in this dataset if you do not agree to the terms outlined in TERMS.md.

We do not own the copyright to any of the collected data and its use is authorised via the Intellectual Property Office's Exceptions to Copyright for Non-Commercial Research and Private Study.

Metadata 📝

Please read the Terms of Access above before downloading.

2.5M Subset

  • train (640MB) wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv
  • val (1.3MB) wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv

10M

  • train (2.7GB) wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv
  • val (1.3MB) wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
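
The column layout of these CSV files is not described here, so it can help to inspect a downloaded file before starting a large download. The following is a minimal sketch using only Python's standard library; the helper name peek_csv is hypothetical and not part of this repository.

```python
import csv
from itertools import islice


def peek_csv(csv_path, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) without reading the whole file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # islice stops after n_rows, so multi-gigabyte files are not loaded into memory.
        rows = list(islice(reader, n_rows))
        return list(reader.fieldnames or []), rows
```

For example, peek_csv("results_2M_val.csv") returns the header names and the first five rows of the smaller validation file, which is a cheap way to confirm the download completed and to see which columns are present.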

Download ⬇️

Our method:

  1. Download csv file(s) above to this repository
  2. pip install pandas numpy requests mpi4py
  3. To download with a single job: python download.py --csv_path results_2M_train.csv --partitions 1 --part 0 --data_dir ./data --processes 8. To split the download across N concurrent jobs, pass --partitions N and run each job with a different --part $idx. You can also set the number of processes with --processes; we recommend one per CPU.
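
As an illustration of splitting the download across N jobs, the sketch below generates one command line per partition using only the flags shown in step 3. The helper name build_download_commands is hypothetical, and it assumes --part is zero-indexed (0 to N-1), which is inferred from the single-job example using --partitions 1 --part 0. It only prints the commands; how they are submitted (cluster scheduler, separate terminals, etc.) is left open.

```python
import shlex


def build_download_commands(csv_path, partitions, data_dir="./data", processes=8):
    """Build one download.py command line per partition (one per concurrent job)."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    commands = []
    for part in range(partitions):  # assumption: parts are numbered 0..partitions-1
        commands.append([
            "python", "download.py",
            "--csv_path", csv_path,
            "--partitions", str(partitions),
            "--part", str(part),
            "--data_dir", data_dir,
            "--processes", str(processes),
        ])
    return commands


# Print the commands so that each one can be launched as its own job.
for cmd in build_download_commands("results_2M_train.csv", partitions=4):
    print(shlex.join(cmd))
```

Every job receives the same --partitions value and a distinct --part, so together the jobs cover the whole CSV file.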

video2dataset:

  1. pip install video2dataset
  2. Example downloading script. video2dataset has many options for subsampling the input data (FPS, resolution, cut detection, optical flow, etc.) so this script can be greatly modified to enrich/standardize the output dataset.
  3. Load into nicely batched tensors like this

Download CLIP Features ⬇️

CLIP ViT-B/32 features of this dataset, extracted at 1 FPS, are available to download at https://huggingface.co/datasets/iejMac/CLIP-WebVid (credit to iejMac). The pipeline for extracting CLIP features can be found at https://github.com/iejMac/clip-video-encode; see the example at the bottom of this README.

N.B: CLIP features could be slightly biased / degraded due to the watermarks, which were not removed during extraction.
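
Because the features are extracted at 1 FPS, each video yields roughly one feature vector per second. One common way to use such per-frame features (a simple baseline, not necessarily what was done in our work) is to average them into a single video vector and compare it with a text embedding by cosine similarity. The sketch below uses toy 3-dimensional vectors; the on-disk layout of the downloaded features is not described here, so loading them is left out and the function names are hypothetical.

```python
import math


def mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy stand-ins for real features: a 2-second clip and one text embedding.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text = [1.0, 1.0, 0.0]
print(cosine_similarity(mean_pool(frames), text))
```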

Disclaimer ⚠️

We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.

Cite 📋

If you use this dataset in your research, please cite:

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman.

@InProceedings{Bain21,
  author       = "Max Bain and Arsha Nagrani and G{\"u}l Varol and Andrew Zisserman",
  title        = "Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval",
  booktitle    = "IEEE International Conference on Computer Vision",
  year         = "2021",
}

FAQs 🙋

Q1: Can you provide the original videos for download?

A1: Since we do not own the videos in the dataset, we cannot legally provide them to you. The video owner(s) can choose to delete their videos at any time, in which case those videos will no longer be available in the dataset. Because of this, some videos in the dataset will unfortunately be lost over time, and we are unable to help with this issue. However, the sources are official and we expect the large majority of videos to be available for the foreseeable future.

Q2: Is it normal that a subset of videos cannot be retrieved from the provided URLs?

A2: Yes. See Q1.
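
To see how many videos could not be retrieved, you can compare the IDs you expected against the files that were actually saved. The sketch below is one way to do this under a stated assumption: that each downloaded video is saved as <id>.mp4 somewhere under the data directory. That naming scheme and the helper name find_missing are hypothetical, so adjust them to match the layout your download produced.

```python
from pathlib import Path


def find_missing(expected_ids, data_dir, suffix=".mp4"):
    """Return the sorted IDs that have no matching file anywhere under data_dir."""
    # Assumption: each video is stored as "<id><suffix>", possibly in subdirectories.
    found = {p.stem for p in Path(data_dir).rglob(f"*{suffix}")}
    return sorted(set(expected_ids) - found)
```

For example, find_missing(ids, "./data") with ids read from the metadata CSV returns the videos that were not downloaded, which you can then retry or exclude from training.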

Q3: I noticed there are watermarks on the videos, how will this affect training?

A3: We found we were still able to achieve top performance (with the watermarks) on downstream text-to-video retrieval, in both finetuning and zero-shot settings. We expect similar results on other video-language tasks but didn't test these in the paper. If you do use this dataset for other video-language tasks, we'd be interested to hear how it goes.

Contact Us

If you have a question that is not answered in the FAQs above, please create an issue in this repository.

If you would like to share feedback or report concerns, please email me at [email protected]
