
mix-generation's Introduction

MixGen: A New Multi-Modal Data Augmentation

This is the official PyTorch implementation of MixGen, which is a joint data augmentation technique for vision-language representation learning to improve data efficiency.

Here are some image-text pairs generated by MixGen:

How to use

MixGen is an input-level data augmentation technique that can be used in a plug-and-play fashion with existing vision-language learning methods, requiring only minimal code changes.

Here we use ALBEF (NeurIPS'21) as an illustrative example. We only need to add one line between the dataloader and the model forward pass in Pretrain.py.

That is, change from

for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    optimizer.zero_grad()

to

import mixgen as mg
for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
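    # MixGen: replace the first `num` image-text pairs in the batch with mixed pairs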
    image, text = mg.mixgen(image, text, num=16)
    optimizer.zero_grad()

And that's it! No other changes are needed. You can kick off training just like ALBEF does:

python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain.py
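
For reference, the core MixGen operation described in the paper, which linearly interpolates two images and concatenates their text descriptions to form each new pair, can be sketched roughly as follows. This is an illustrative sketch rather than the released implementation; the interpolation weight lam = 0.5 is the paper's stated default, while the function name and the in-place update are assumptions.

import torch

def mixgen_sketch(image, text, num, lam=0.5):
    """Illustrative sketch of MixGen (not the official code).

    image: (B, C, H, W) tensor holding a mini-batch of images
    text:  list of B raw caption strings
    num:   number of leading pairs to replace with mixed pairs
    lam:   image interpolation weight (paper default: 0.5)
    """
    for i in range(num):
        # pair sample i with sample i + num (requires num <= B / 2)
        image[i] = lam * image[i] + (1 - lam) * image[i + num]  # pixel-level interpolation
        text[i] = text[i] + " " + text[i + num]                  # text concatenation
    return image, text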

Citation

If you find MixGen useful in your research, please consider citing the following paper:

@InProceedings{Hao_2023_WACV,
    author    = {Hao, Xiaoshuai and Zhu, Yi and Appalaraju, Srikar and Zhang, Aston and Zhang, Wanqian and Li, Bo and Li, Mu},
    title     = {MixGen: A New Multi-Modal Data Augmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2023},
    pages     = {379-389}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

mix-generation's People

Contributors

amazon-auto, bryanyzhu


mix-generation's Issues

A question about applying MixGen to more specialized cases

I was excited to find a paper from Mu Li's team, and a multi-modal one at that, so I quickly tried it out in my experiments.
However, in my code, by the time the texts reach the step shown in the README, they have already been processed into 3-D tensors aligned with the images (with a batch_size dimension and so on), rather than str objects. It seems the simple call from the README cannot be used directly: adding a tensor to a str raises an error. Could you advise how to handle this situation?
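
One possible workaround in that situation is to apply MixGen before the captions are tokenized, so that they are still str objects; if that is not possible, the concatenation can be done on token-id tensors instead of strings. The sketch below is purely illustrative: the (B, L) token-id layout, the pad_id value, and the interpolation weight lam are assumptions, and special tokens such as [CLS]/[SEP] would need extra handling in practice.

import torch

def mixgen_token_ids(image, text_ids, num, lam=0.5, pad_id=0):
    """Illustrative MixGen variant for already-tokenized text (not from the official repo).

    image:    (B, C, H, W) float tensor of images
    text_ids: (B, L) integer tensor of token ids, padded with pad_id
    """
    L = text_ids.size(1)
    for i in range(num):
        # images: same linear interpolation as in the string-based version
        image[i] = lam * image[i] + (1 - lam) * image[i + num]
        # texts: strip padding, concatenate the two id sequences, truncate to L, re-pad
        a = text_ids[i][text_ids[i] != pad_id]
        b = text_ids[i + num][text_ids[i + num] != pad_id]
        merged = torch.cat([a, b])[:L]
        text_ids[i] = pad_id
        text_ids[i, : merged.numel()] = merged
    return image, text_ids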

Need your help.

Hello, thank you for your excellent work on MixGen and for sharing the code. I have two questions I would like to ask for your help with:

  1. In addition to being used during pre-training, is MixGen also used in downstream tasks, such as fine-tuning on the COCO dataset for the image-text retrieval task?
  2. Considering that the augmentation is performed within a mini-batch and involves text concatenation, will this affect the contrastive learning pre-training objective (where, as is common, the other samples in the mini-batch are treated as negative samples)?

Percentage of images using MixGen

The second page says "Given a minibatch of B randomly sampled image-text pairs, MixGen replaces the first M training samples with the newly generated pairs... By default, we set M = B/4 in Algorithm 1", and the fifth page says "a minibatch will contain 384 existing samples and 128 new image-text pairs."

But when I look at the code, MixGen replaces the whole batch with the newly generated pairs.

Is there a sampling operation behind it to select M newly generated pairs?
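
For reference, the usage snippet earlier in this README passes a fixed num=16; that matches the paper's default M = B/4 only if the per-GPU batch size is 64 (512 pairs in total across 8 GPUs), which is an assumption here. A batch-size-relative version of the same one-line change could be written as below; this is a sketch, not an official recommendation.

import mixgen as mg

for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    # paper default: replace the first M = B/4 pairs of each per-GPU mini-batch
    image, text = mg.mixgen(image, text, num=image.size(0) // 4)
    optimizer.zero_grad()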
