
VQAMix: Conditional Triplet Mixup for Medical Visual Question Answering paper

IEEE Transactions on Medical Imaging

This repository is the official implementation of VQAMix for the visual question answering task in the medical domain. In this paper, we propose a simple yet effective data augmentation method, VQAMix, to mitigate the data limitation problem. Specifically, VQAMix generates more labeled training samples by linearly combining pairs of VQA samples, and it can be easily embedded into any visual-language model to boost performance.

This repository is based on and inspired by @Jin-Hwa Kim's work and @Aizo-ai's work. We sincerely thank them for sharing their code.

Citation

Please cite this paper in your publications if it helps your research:

@article{gong2022vqamix,
  title={VQAMix: Conditional Triplet Mixup for Medical Visual Question Answering},
  author={Haifan Gong and Guanqi Chen and Mingzhi Mao and Zhen Li and Guanbin Li},
  journal={IEEE Transactions on Medical Imaging},
  year={2022}
}

Overview of the vqamix framework

In VQAMix, two image-question pairs {Vi, Qi} and {Vj, Qj} are mixed. When the mixed sample is fed to the VQA model, the linguistic feature extracted from Qi interacts with the visual feature extracted from Vj, which constructs a new connection {Vj, Qi}; likewise for {Vi, Qj}. Thus, the label for the mixed image-question pair consists of four answer labels (Yi for {Vi, Qi}, Yj for {Vj, Qj}, Yk for {Vi, Qj}, and Yl for {Vj, Qi}), and the weights of these answer labels are the occurrence probabilities of the corresponding image-question pairs. Each answer A is encoded as a one-hot vector Y.
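The mixing described above can be sketched as follows. This is a minimal illustration on plain Python lists, not the repository's actual code (which operates on PyTorch tensors); the function name and signature are assumptions. The cross-pair answers Yk and Yl are unknown in practice, so the sketch keeps only the two specified label terms, in the spirit of the LML strategy described below:

```python
import random

def mixup_vqa(v_i, q_i, y_i, v_j, q_j, y_j, lam=None, alpha=1.0):
    """Sketch of VQAMix triplet mixup on feature vectors.

    v_* are image features, q_* are question features, and y_* are the
    one-hot answer labels for {Vi, Qi} and {Vj, Qj}.
    """
    if lam is None:
        # Mixing coefficient drawn from a Beta distribution, as in mixup.
        lam = random.betavariate(alpha, alpha)

    def mix(a, b):
        return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

    v_mix, q_mix = mix(v_i, v_j), mix(q_i, q_j)
    # Occurrence probabilities of the four induced pairs:
    #   {Vi,Qi}: lam^2   {Vj,Qj}: (1-lam)^2   {Vi,Qj}, {Vj,Qi}: lam*(1-lam)
    # The cross-pair labels Yk, Yl are missing, so only the two specified
    # terms are kept here (whether the weights are then renormalized is an
    # implementation choice not shown in this sketch).
    y_mix = [lam**2 * a + (1 - lam)**2 * b for a, b in zip(y_i, y_j)]
    return v_mix, q_mix, y_mix
```

With `lam=0.5`, each retained label term gets weight 0.25, matching the occurrence probability of its pair.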

Details of the vqamix framework

An overview of our proposed VQAMix enhanced by the Learning with Missing Labels (LML) and Learning with Conditional-mixed Labels (LCL) strategies. Two VQA samples are combined linearly during training. To ensure that the mixed label can supervise the learning of VQA models, both the LML and LCL schemes discard the two unspecified labels to solve the missing-label issue. Moreover, the LCL scheme further uses the category of the question to prevent the model from suffering from meaningless mixed labels.
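The conditional part of LCL can be sketched as below. This is a hypothetical illustration of the idea only: the function name, the category representation, and the fallback for mismatched categories are all assumptions, not the paper's exact formulation:

```python
def conditional_mixed_label(y_i, y_j, cat_i, cat_j, lam):
    """Sketch of Learning with Conditional-mixed Labels (LCL).

    y_i / y_j are one-hot answer labels; cat_i / cat_j are the categories
    of the two questions (e.g. "modality", "organ"); lam is the mixing
    coefficient used for the features.
    """
    if cat_i == cat_j:
        # Same question category: both answers stay meaningful for the
        # mixed sample, so keep a label term for each specified pair.
        return [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    # Different categories: a mixed answer would be meaningless, so fall
    # back to the label of the question that dominates the mix (one
    # plausible choice for this sketch).
    return list(y_i) if lam >= 0.5 else list(y_j)
```

The key design point is that label mixing is gated on the question categories, so the supervision signal never asks the model to blend answers from incompatible question types.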

Prerequisites

torch 1.6.0+
torchvision 0.6+

Dataset and Pre-trained Models

The processed data can be downloaded via Baidu Drive with the extraction code ioz1, or from the previous work MMQ link. The downloaded files should be extracted into the data_RAD/ and data_PATH directories.

The trained models are available at Baidu Drive with the extraction code i800.

Training and Testing

Run run_rad.sh or run_path.sh for training and evaluation. The resulting JSON files can be found in the results/ directory.

Comparison with the State of the Art

Comparison

License

MIT License

More information

If you have any problems, do not hesitate to contact us at [email protected].

vqamix's People

Contributors

haifangong


vqamix's Issues

dataset

Hello! Which file needs to be downloaded in the dataset folder and which file is generated by code? Could you please provide some more information? Thank you very much!

embed_tfidf_weights.pkl

How are embed_tfidf_weights.pkl, pretrained_ae.pth, and biowordvec_init_200d.npy generated? Thank you for your answer.

About the processed data

Thank you very much for your work; we think it is significant for research in this field. However, when reproducing your code, we found that the link you provided for the processed data points to saved_Models. We would appreciate it if you could provide your processed data.

code for processing the datasets

Thank you very much for your work. I now want to train the model on other datasets (such as VQA-Med). How should I process my data? We would appreciate it if you could provide your data-processing code.
