text2image
This repository contains the implementation of Text to Image Generation with Semantic-Spatial Aware GAN.
Note that this repo is not complete yet.
Network Structure
The structure of the spatial-semantic aware convolutional network (SSACN) is shown below.
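As a rough conceptual sketch only (not the repo's actual implementation; all module names, shapes, and parameter choices below are illustrative), an SSACN-style block modulates image features with text-conditioned affine parameters, gated by a spatial mask predicted from the features themselves:

```python
import torch
import torch.nn as nn

class SSACNBlockSketch(nn.Module):
    """Illustrative sketch of a semantic-spatial aware block: a mask predicted
    from the image features gates a text-conditioned affine modulation of those
    features (names and shapes are assumptions, not the paper's exact design)."""
    def __init__(self, channels, text_dim):
        super().__init__()
        self.mask_pred = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1), nn.Sigmoid()
        )
        self.gamma = nn.Linear(text_dim, channels)  # text -> per-channel scale
        self.beta = nn.Linear(text_dim, channels)   # text -> per-channel shift
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat, text_emb):
        # feat: (B, C, H, W) image features; text_emb: (B, text_dim) sentence embedding
        mask = self.mask_pred(feat)                               # (B, 1, H, W)
        gamma = self.gamma(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.beta(text_emb).unsqueeze(-1).unsqueeze(-1)
        modulated = feat * (1 + gamma) + beta                     # text-conditioned affine
        out = feat + mask * self.conv(modulated)                  # spatially gated residual
        return out, mask
```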
Requirements
- python 3.6+
- pytorch 1.0+
- numpy
- matplotlib
- opencv
Or install the full requirements by running:
pip install -r requirements.txt
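A quick way to confirm the environment meets the requirements above (a minimal sanity-check sketch; adjust to your setup):

```python
# Sanity-check the core dependencies listed above.
import sys
import torch
import numpy
import matplotlib
import cv2

assert sys.version_info >= (3, 6), "python 3.6+ is required"
print("python :", sys.version.split()[0])
print("pytorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("numpy  :", numpy.__version__)
print("opencv :", cv2.__version__)
```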
TODO
- instruction to prepare dataset
- remove all unnecessary files
- add link to download our pre-trained model
- clean code including comments
- instruction for training
- instruction for evaluation
Prepare data
- Download the preprocessed metadata for birds and coco and save them to
data/
- Download the birds image data. Extract them to
data/birds/
- Download coco dataset and extract the images to
data/coco/
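After downloading, the expected top-level layout can be checked with a small helper (a sketch; the directory names come from this README, and the contents of each folder are not verified):

```python
from pathlib import Path

# Top-level data directories named in the steps above.
for d in ["data/birds", "data/coco"]:
    print(f"{d}: {'found' if Path(d).is_dir() else 'MISSING'}")
```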
Pre-trained text encoder
- Download the pre-trained text encoder for CUB and save it to
DAMSMencoders/bird/inception/
- Download the pre-trained text encoder for coco and save it to
DAMSMencoders/coco/inception/
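Once downloaded, the encoder weights can be inspected with plain torch.load; in the sketch below the checkpoint file name text_encoder.pth is an assumption, so use whatever file the download actually contains:

```python
import torch

# Directory from this README; the checkpoint file name is an assumption.
ckpt_path = "DAMSMencoders/bird/inception/text_encoder.pth"
state = torch.load(ckpt_path, map_location="cpu")

# If the file is a plain state_dict, list a few parameter names and shapes.
if isinstance(state, dict):
    for name, value in list(state.items())[:10]:
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(name, shape)
```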
Trained model
You can download our trained models from our OneDrive repo.
Start training
See opts.py for the options; a quick way to inspect it is sketched below.
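Since the available options live in opts.py, one way to get a quick overview is simply to import the module from the repo root and list what it exposes (this is only a generic illustration; check the file itself for the actual arguments and the training entry point):

```python
# List the public names defined in opts.py (run from the repo root).
import opts

print([name for name in dir(opts) if not name.startswith("_")])
```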
Evaluation
Please run IS.py and test_lpips.py (remember to change the image paths) to evaluate the IS and diversity scores, respectively.
To evaluate the FID score, please use this repo: https://github.com/bioinf-jku/TTUR.
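As a reference for what the diversity metric measures, here is a minimal sketch of pairwise LPIPS distance using the lpips package (this is an illustration, not the repo's test_lpips.py; the random images are placeholders for generated samples of the same caption):

```python
import itertools
import lpips
import torch

# Perceptual distance model (AlexNet backbone) from the lpips package.
loss_fn = lpips.LPIPS(net="alex")

# Placeholder: generated images for one caption, each (3, H, W) scaled to [-1, 1].
images = [torch.rand(3, 256, 256) * 2 - 1 for _ in range(4)]

# Mean pairwise LPIPS distance = diversity of the set (higher = more diverse).
dists = [loss_fn(a.unsqueeze(0), b.unsqueeze(0)).item()
         for a, b in itertools.combinations(images, 2)]
print("mean pairwise LPIPS:", sum(dists) / len(dists))
```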
Performance
You should get scores close to those below after training with the xe loss for xxxxx epochs:
Qualitative Results
Some qualitative results on the coco and birds datasets from different methods are shown below:
The predicted mask maps at different stages are shown below:
Reference
If you find this repo helpful in your research, please consider citing our paper:
@article{liao2021text,
title={Text to Image Generation with Semantic-Spatial Aware GAN},
author={Liao, Wentong and Hu, Kai and Yang, Michael Ying and Rosenhahn, Bodo},
journal={arXiv preprint arXiv:2104.00567},
year={2021}
}
The code is released for academic research use only. For commercial use, please contact Wentong Liao.
Acknowledgements
This implementation borrows part of the code from DF-GAN.