
large-scale-vision's Introduction

Large-scale-Vision

  • SAM: Segment Anything [Paper] [Dataset]

    • SA-1B Dataset: 11M images, 1.1B masks, an average of 100 masks per image, and an average image resolution of 1500×2250 pixels.
  • ImageNet: A Large-Scale Hierarchical Image Database [Paper] [Dataset]

    • The widely used ILSVRC-2012 subset spans 1,000 object classes and contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
  • MVImgNet: A Large-scale Dataset of Multi-view Images [Paper] [Dataset]

    • It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds.
  • Large Scale Visual Food Recognition [Paper] [Dataset]

    • Food2K is a new large-scale, high-quality food recognition benchmark, presented as the largest food image dataset, with 2,000 categories and 1,036,564 images.
  • LogoDet-3K: A Large-scale Image Dataset for Logo Detection [Paper] [Dataset]

    • LogoDet-3K is a new large-scale logo dataset with 3,000 classes, 194,261 objects, and 158,652 images, presented as the largest logo detection dataset with full annotation.
  • Objects365: A Large-scale, High-quality Dataset for Object Detection [Paper] [Dataset]

    • Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the wild: 365 categories, 2 million images, and 30 million bounding boxes.
  • TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild [Paper] [Dataset]

    • A large-scale dataset and benchmark for object tracking in the wild, with more than 30K video sequences and more than 14M bounding boxes.
  • YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark [Paper] [Dataset]

    • Features: 5,000+ high-resolution YouTube videos, 90+ semantic categories, 7,800+ unique objects, 190k+ high-quality manual annotations, and 340+ minutes of total duration.
  • DINOv2: Learning Robust Visual Features without Supervision [Paper]

    • LVD-142M Dataset: A curated dataset containing 142M images
  • JFT-300M: Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [Paper]

    • An internal Google dataset, containing over one billion labels for the 300M images. Of the billion image labels, approximately 375M are selected via an algorithm that aims to maximize label precision of selected images.
    • V-MoE adopts JFT-300M as its pretraining dataset
    • ViT-22B extends JFT to around 4B images as its pretraining dataset
  • YFCC100M: The New Data in Multimedia Research [Paper] [Dataset]

    • The largest publicly and freely useable multimedia collection, containing the metadata of around 99.2 million photos and 0.8 million videos from Flickr
  • Open Images V7: From colouring-in to pointillism: revisiting semantic segmentation supervision [Paper] [Dataset]

    • Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. It has 15,851,536 boxes on 600 classes, 2,785,498 instance segmentations on 350 classes, 3,284,280 relationship annotations on 1,466 relationships, 675,155 localized narratives, 66,391,027 point-level annotations on 5,827 classes, and 61,404,966 image-level labels on 20,638 classes.
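
The annotation density of these datasets differs widely, and it can be derived from the counts quoted above. The following is a minimal Python sketch of that arithmetic. The dictionary layout and the function name `annotations_per_image` are illustrative assumptions, not part of any dataset's tooling, and the Objects365 figures are the rounded values given in the list.

```python
# Counts copied from the list above; names and layout are illustrative only.
DATASETS = {
    "SA-1B": {"annotations": 1_100_000_000, "images": 11_000_000},   # masks
    "Objects365": {"annotations": 30_000_000, "images": 2_000_000},  # boxes (rounded)
    "LogoDet-3K": {"annotations": 194_261, "images": 158_652},       # logo objects
}


def annotations_per_image(name: str) -> float:
    """Return the average number of annotations per image for one dataset."""
    entry = DATASETS[name]
    return entry["annotations"] / entry["images"]


if __name__ == "__main__":
    for name in DATASETS:
        print(f"{name}: {annotations_per_image(name):.2f} annotations per image")
```

For SA-1B this reproduces the quoted average of 100 masks per image; the other two ratios are derived here and are not stated by the dataset authors.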

Multimodal

  • LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs [Paper] [Dataset]

    • Number of unique samples: 413M
    • Number with height or width >= 1024: 26M
    • Number with height and width >= 1024: 9.6M
    • Number with height or width >= 512: 112M
    • Number with height and width >= 512: 67M
    • Number with height or width >= 256: 268M
    • Number with height and width >= 256: 211M
  • LAION-5B: An open large-scale dataset for training next generation image-text models [Paper] [Dataset]

    • A dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M
  • RedCaps: Web-curated image-text data created by the people, for the people [Paper] [Dataset]

    • 12M image-text pairs
  • Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark [Paper] [Dataset]

    • A large-scale Chinese VLP dataset with 100 million image-text pairs, covering a wide range of concepts.
  • WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning [Paper] [Dataset]

    • A large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages
  • COYO-700M: Image-Text Pair Dataset [Dataset]

    • A large-scale dataset that contains 747M image-text pairs
  • Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts [Paper] [Dataset]

    • A dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-training
  • ALIGN 1.8B: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [Paper]

    • A large-scale noisy image-text dataset with 1.8B image-text pairs
  • ALT200M: Scaling Up Vision-Language Pre-training for Image Captioning [Paper]

    • A large-scale image-text dataset consisting of up to 200 million image-text pairs from the web, based on the alt attribute of the images
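
The LAION figures above allow two quick sanity checks: how much larger LAION-5B is than LAION-400M, and what share of LAION-400M meets a given minimum resolution. The sketch below is a rough illustration in Python that uses only the counts quoted in this list. The constant and function names are hypothetical, and the inputs are rounded, so the results are approximate.

```python
# Counts quoted in the list above (rounded); names are illustrative only.
LAION_400M_UNIQUE = 413_000_000
LAION_5B_PAIRS = 5_850_000_000

# LAION-400M samples whose height AND width are at least the given size.
BOTH_SIDES_AT_LEAST = {
    256: 211_000_000,
    512: 67_000_000,
    1024: 9_600_000,
}


def scale_ratio() -> float:
    """How many times larger LAION-5B is than LAION-400M."""
    return LAION_5B_PAIRS / LAION_400M_UNIQUE


def share_at_least(min_side: int) -> float:
    """Fraction of LAION-400M with both sides >= min_side pixels."""
    return BOTH_SIDES_AT_LEAST[min_side] / LAION_400M_UNIQUE


if __name__ == "__main__":
    print(f"LAION-5B / LAION-400M: {scale_ratio():.1f}x")
    for side in sorted(BOTH_SIDES_AT_LEAST):
        print(f">= {side}px on both sides: {share_at_least(side):.1%}")
```

The ratio comes out at roughly 14, consistent with the "14x bigger" claim in the LAION-5B entry.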
