
mix-generation's Introduction

MixGen: A New Multi-Modal Data Augmentation

This is the official PyTorch implementation of MixGen, which is a joint data augmentation technique for vision-language representation learning to improve data efficiency.

Here are some image-text pairs generated by MixGen:

How to use

MixGen is an input-level data augmentation technique that can be used in a plug-and-play fashion with existing vision-language learning methods, requiring only minimal code changes.

Here we use ALBEF (NeurIPS'21) as an illustrative example. We only need to add one line between the dataloader and the model forward pass in Pretrain.py.

That is, change from

for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    optimizer.zero_grad()

to

import mixgen as mg
for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
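    # MixGen: replace the first `num` image-text pairs in the batch with mixed pairs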
    image, text = mg.mixgen(image, text, num=16)
    optimizer.zero_grad()

And that's it! No other changes are needed. You can kick off training just like ALBEF does:

python -m torch.distributed.launch --nproc_per_node=8 --use_env Pretrain.py
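
For reference, the core MixGen operation described in the paper, which linearly interpolates two images and concatenates their text descriptions to form each new pair, can be sketched roughly as follows. This is an illustrative sketch rather than the released implementation; the interpolation weight lam = 0.5 is the paper's stated default, while the function name and the in-place update are assumptions.

import torch

def mixgen_sketch(image, text, num, lam=0.5):
    """Illustrative sketch of MixGen (not the official code).

    image: (B, C, H, W) tensor holding a mini-batch of images
    text:  list of B raw caption strings
    num:   number of leading pairs to replace with mixed pairs
    lam:   image interpolation weight (paper default: 0.5)
    """
    for i in range(num):
        # pair sample i with sample i + num (requires num <= B / 2)
        image[i] = lam * image[i] + (1 - lam) * image[i + num]  # pixel-level interpolation
        text[i] = text[i] + " " + text[i + num]                  # text concatenation
    return image, text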

Citation

If you find MixGen useful in your research, please consider citing the following paper:

@InProceedings{Hao_2023_WACV,
    author    = {Hao, Xiaoshuai and Zhu, Yi and Appalaraju, Srikar and Zhang, Aston and Zhang, Wanqian and Li, Bo and Li, Mu},
    title     = {MixGen: A New Multi-Modal Data Augmentation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2023},
    pages     = {379-389}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

mix-generation's People

Contributors

amazon-auto, bryanyzhu


mix-generation's Issues

A question about applying MixGen to more specialized cases

I was excited to find a paper from Mu Li's team, and a multi-modal one at that, so I quickly tried it out in my experiments.
However, in my code, by the time the texts reach the step shown in the README, they have already been processed into 3-D tensors aligned with the images (with a batch_size dimension and so on), rather than str objects. It seems the simple call from the README cannot be used directly: adding a tensor to a str raises an error. Could you advise how to handle this situation?
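
One possible workaround in that situation is to apply MixGen before the captions are tokenized, so that they are still str objects; if that is not possible, the concatenation can be done on token-id tensors instead of strings. The sketch below is purely illustrative: the (B, L) token-id layout, the pad_id value, and the interpolation weight lam are assumptions, and special tokens such as [CLS]/[SEP] would need extra handling in practice.

import torch

def mixgen_token_ids(image, text_ids, num, lam=0.5, pad_id=0):
    """Illustrative MixGen variant for already-tokenized text (not from the official repo).

    image:    (B, C, H, W) float tensor of images
    text_ids: (B, L) integer tensor of token ids, padded with pad_id
    """
    L = text_ids.size(1)
    for i in range(num):
        # images: same linear interpolation as in the string-based version
        image[i] = lam * image[i] + (1 - lam) * image[i + num]
        # texts: strip padding, concatenate the two id sequences, truncate to L, re-pad
        a = text_ids[i][text_ids[i] != pad_id]
        b = text_ids[i + num][text_ids[i + num] != pad_id]
        merged = torch.cat([a, b])[:L]
        text_ids[i] = pad_id
        text_ids[i, : merged.numel()] = merged
    return image, text_ids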

Need your help.

Hello, thank you for your excellent work on MixGen and for sharing the code. I have two questions I would like to ask for your help with:

  1. In addition to being used during pre-training, is MixGen also used in downstream tasks, such as fine-tuning on the COCO dataset for the image-text retrieval task?
  2. Considering that the augmentation is performed within a mini-batch and involves text concatenation, will this affect the contrastive learning pre-training objective (where, as is common, the other samples in the mini-batch are treated as negative samples)?

Percentage of images using MixGen

The second page says "Given a minibatch of B randomly sampled image-text pairs, MixGen replaces the first M training samples with the newly generated pairs... By default, we set M = B/4 in Algorithm 1", and the fifth page says "a minibatch will contain 384 existing samples and 128 new image-text pairs."

But when I look at the code, MixGen replaces the whole batch with the newly generated pairs.

Is there a sampling operation behind it to select M newly generated pairs?
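
For reference, the usage snippet earlier in this README passes a fixed num=16; that matches the paper's default M = B/4 only if the per-GPU batch size is 64 (512 pairs in total across 8 GPUs), which is an assumption here. A batch-size-relative version of the same one-line change could be written as below; this is a sketch, not an official recommendation.

import mixgen as mg

for i, (image, text) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    # paper default: replace the first M = B/4 pairs of each per-GPU mini-batch
    image, text = mg.mixgen(image, text, num=image.size(0) // 4)
    optimizer.zero_grad()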
