yioutpi / large-scale-vision

License: MIT License

Large-scale-Vision

SAM: Segment Anything [Paper] [Dataset]
- SA-1B Dataset: 11M images and 1.1B masks, averaging 100 masks per image, with an average image resolution of 1500×2250 pixels.
ImageNet: A Large-Scale Hierarchical Image Database [Paper] [Dataset]
- The ILSVRC 2012 subset of ImageNet spans 1,000 object classes and contains 1,281,167 training images, 50,000 validation images, and 100,000 test images.
MVImgNet: A Large-scale Dataset of Multi-view Images [Paper] [Dataset]
- It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds.
Large Scale Visual Food Recognition [Paper] [Dataset]
- Food2K is a new large-scale, high-quality food recognition benchmark and the largest food image dataset, with 2,000 categories and 1,036,564 images.
LogoDet-3K: A Large-scale Image Dataset for Logo Detection [Paper] [Dataset]
- LogoDet-3K is a new large-scale logo dataset with 3,000 classes, 194,261 objects, and 158,652 images, making it the largest fully annotated logo detection dataset.
Objects365: A Large-scale, High-quality Dataset for Object Detection [Paper] [Dataset]
- Objects365 is a dataset designed to spur object detection research, with a focus on diverse objects in the wild: 365 categories, 2 million images, and 30 million bounding boxes.
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild [Paper] [Dataset]
- More than 30K video sequences for object tracking, with more than 14M bounding boxes.
YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark [Paper] [Dataset]
- 5,000+ high-resolution YouTube videos, 90+ semantic categories, 7,800+ unique objects, 190k+ high-quality manual annotations, and 340+ minutes of total duration.
DINOv2: Learning Robust Visual Features without Supervision [Paper]
- LVD-142M Dataset: A curated dataset containing 142M images
JFT-300M: Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [Paper]
- An internal Google dataset, containing over one billion labels for the 300M images. Of the billion image labels, approximately 375M are selected via an algorithm that aims to maximize label precision of selected images.
- V-MoE adopts JFT-300M as its pretraining dataset
- ViT-22B extends JFT to around 4B images as its pretraining dataset
YFCC100M: The New Data in Multimedia Research [Paper] [Dataset]
- The largest public, freely usable multimedia collection, containing the metadata of around 99.2 million photos and 0.8 million videos from Flickr.
Open Images V7: From colouring-in to pointillism: revisiting semantic segmentation supervision [Paper] [Dataset]
- Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. It has 15,851,536 boxes on 600 classes; 2,785,498 instance segmentations on 350 classes; 3,284,280 relationship annotations on 1,466 relationships; 675,155 localized narratives; 66,391,027 point-level annotations on 5,827 classes; and 61,404,966 image-level labels on 20,638 classes.
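
Several entries above quote a total annotation count alongside an item count, so a per-item average follows by simple division. The sketch below is only a sanity check on the figures as quoted in this list; the dictionary labels and the helper names (`per_item_average`, `summarize`) are illustrative assumptions, and because some of the quoted counts are rounded, the resulting averages are approximate.

```python
# Headline counts copied from the list above as (total annotations, number of items).
# The labels are illustrative; they are not identifiers from any dataset's tooling.
DATASET_COUNTS = {
    "SA-1B masks per image": (1_100_000_000, 11_000_000),
    "Objects365 boxes per image": (30_000_000, 2_000_000),
    "MVImgNet frames per video": (6_500_000, 219_188),
    "LogoDet-3K objects per image": (194_261, 158_652),
}


def per_item_average(total: int, items: int) -> float:
    """Return the average number of annotations per item."""
    if items <= 0:
        raise ValueError("items must be positive")
    return total / items


def summarize(counts: dict) -> dict:
    """Map each label to its per-item average, rounded to two decimals."""
    return {
        name: round(per_item_average(total, items), 2)
        for name, (total, items) in counts.items()
    }


if __name__ == "__main__":
    for name, value in summarize(DATASET_COUNTS).items():
        print(f"{name}: {value}")
```

The SA-1B result agrees with the "average masks per image: 100" figure quoted in its entry, which is a quick way to confirm the counts were transcribed consistently.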

Multimodal

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs [Paper] [Dataset]
- 413M unique samples. Counts by resolution (height or width / height and width): >= 1024: 26M / 9.6M; >= 512: 112M / 67M; >= 256: 268M / 211M.
LAION-5B: An open large-scale dataset for training next generation image-text models [Paper] [Dataset]
- A dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M.
RedCaps: Web-curated image-text data created by the people, for the people [Paper] [Dataset]
- 12M image-text pairs
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark [Paper] [Dataset]
- A large-scale Chinese VLP dataset with 100 million image-text pairs, covering a wide range of concepts.
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning [Paper] [Dataset]
- A large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages
COYO-700M: Image-Text Pair Dataset [Dataset]
- A large-scale dataset that contains 747M image-text pairs
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts [Paper] [Dataset]
- A dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-training.
ALIGN 1.8B: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [Paper]
- A large-scale noisy image-text dataset with 1.8B image-text pairs.
ALT200M: Scaling Up Vision-Language Pre-training for Image Captioning [Paper]
- A large-scale image-text dataset consisting of up to 200 million image-text pairs collected from the web, based on the alt attribute of the images.
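
The LAION entries above are described as CLIP-filtered. As a rough illustration of what similarity-based filtering of image-text pairs can look like, the sketch below keeps only pairs whose image and text embeddings have a cosine similarity at or above a threshold. Everything here is an assumption made for illustration: the toy 3-dimensional vectors stand in for real CLIP embeddings, the function names are hypothetical, and the threshold value is arbitrary rather than taken from any dataset's documentation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold):
    """Keep (image_embedding, text_embedding, caption) triples scoring >= threshold."""
    return [p for p in pairs if cosine_similarity(p[0], p[1]) >= threshold]


# Toy embeddings standing in for real model outputs (hypothetical values).
TOY_PAIRS = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], "a photo of a dog"),         # well aligned
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0], "click here to subscribe"),  # orthogonal
]

THRESHOLD = 0.3  # illustrative value only, not a documented setting

kept = filter_pairs(TOY_PAIRS, THRESHOLD)

if __name__ == "__main__":
    for _, _, caption in kept:
        print(caption)
```

In a real pipeline the embeddings would come from a pretrained image-text model, and the threshold would be tuned against the noise level of the crawl.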