CosMo

Paper Link, Model Link, Dataset Link

  • CosMo is a fully open-source, comprehensive interleaved pre-training framework for image and video processing.

Its primary focus is in-context learning.

figures/main_ppl.png

News

  • 2/Jan/2024. We provide preprocessing scripts to prepare the pre-training and downstream datasets.
  • 2/Jan/2024. We release the Howto-Interlink7M dataset. See the Huggingface List View for details.

Functionality of This Code

  • Provides an Interleaved Image/Video Dataloader.
  • Integrates with WebDataset.
  • Utilizes Huggingface Transformers along with Deepspeed for training.
  • Incorporates contrastive loss.
  • Enables few-shot evaluation.
  • Supports instruction tuning.
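
To make the "contrastive loss" item above concrete, here is a minimal sketch of a generic symmetric image-text contrastive (InfoNCE-style) loss in plain Python. It is not taken from the CosMo code base: the function name `symmetric_contrastive_loss`, the temperature value, and the use of plain lists instead of tensors are all assumptions made for illustration, and the loss actually used by CosMo is defined in the paper and the training code.

```python
import math
from typing import List, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one row of logits against a target index."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def symmetric_contrastive_loss(
    image_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,  # illustrative value, not CosMo's setting
) -> float:
    """Average of image-to-text and text-to-image cross-entropy.

    Row k of image_emb is assumed to be paired with row k of text_emb;
    every other row in the batch acts as a negative.
    """
    images = [_normalize(v) for v in image_emb]
    texts = [_normalize(v) for v in text_emb]
    n = len(images)
    # Cosine similarity matrix scaled by the temperature.
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    image_to_text = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        _cross_entropy([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)
```

In this sketch, a batch whose image and text embeddings are correctly paired yields a loss near zero, while a batch with shuffled pairs yields a much larger loss.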

Model Card

See MODEL_CARD.md for more details.

HowToInterlink7M Dataset

See HowToInterlink7M.md.

Install

See INSTALL.md.

Dataset Preparation

See DATASET.md.

Pre-training

See PRETRAIN.md.

Few-shot Evaluation without Tuning

This code supports 44 downstream datasets, including but not limited to COCO, FLICKR30K, OK-VQA, TextVQA, VizWiz, VQAV2, Hatefulmemes, Vatex, TGIF, MSVD, and MSRVTT.

See Evaluation.md.
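
As a rough illustration of what few-shot evaluation without tuning involves, the sketch below assembles an interleaved in-context prompt for a visual question answering task: a few solved examples followed by an unanswered query. Everything here is hypothetical, including the `Example` class, the `build_few_shot_prompt` function, the `<image>` and `<|endofchunk|>` placeholder strings, and the "Question/Answer" template. The prompt format CosMo actually uses is described in Evaluation.md and the evaluation code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical placeholder strings; the real special tokens depend on the model.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Example:
    image_path: str
    question: str
    answer: str = ""  # left empty for the query example


def build_few_shot_prompt(
    shots: Sequence[Example], query: Example
) -> Tuple[str, List[str]]:
    """Return the interleaved text prompt and the ordered list of image paths.

    Each in-context shot contributes one image placeholder plus its solved
    question; the query contributes an image placeholder and an open answer
    slot for the model to complete.
    """
    parts: List[str] = []
    images: List[str] = []
    for shot in shots:
        parts.append(
            f"{IMAGE_TOKEN}Question: {shot.question} "
            f"Answer: {shot.answer}{END_OF_CHUNK}"
        )
        images.append(shot.image_path)
    parts.append(f"{IMAGE_TOKEN}Question: {query.question} Answer:")
    images.append(query.image_path)
    return "".join(parts), images


if __name__ == "__main__":
    shots = [
        Example("shot_0.jpg", "What animal is shown?", "a dog"),
        Example("shot_1.jpg", "What color is the car?", "red"),
    ]
    query = Example("query.jpg", "How many people are visible?")
    prompt, image_paths = build_few_shot_prompt(shots, query)
    print(prompt)
    print(image_paths)
```

The number of shots passed in is the "k" of k-shot evaluation; passing an empty list gives a zero-shot prompt containing only the query.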

Instruction Tuning

See TUNING.md.

Citation

If you find our work helpful, please consider citing the following paper:

@article{wang2024cosmo,
  title={COSMO: Contrastive Streamlined Multimodal Model with Interleaved Pre-Training},
  author={Wang, Alex Jinpeng and Li, Linjie and Lin, Kevin Qinghong and Wang, Jianfeng and Lin, Kevin and Yang, Zhengyuan and Wang, Lijuan and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2401.00849},
  year={2024}
}

Contact

Email: awinyimgprocess at gmail dot com

Acknowledgement

Our work is mainly based on the following projects:

MMC4, Open-flamingo, Open-CLIP, Huggingface Transformers, and WebDataset.
