This project enhances the AttnGAN model with BERT and CLIP models for richer text interpretation and more detailed image outputs.
- master: The original AttnGAN branch, updated to recent torch versions, with improved-gan from OpenAI used for Inception Score calculation. Serves as our baseline: DAMSM (Deep Attentional Multimodal Similarity Model) with RNN text encoder and CNN image encoder.
- bert: DAMSM with BERT-based text encoder and CNN image encoder
- clip: DAMSM with RNN-based text encoder and CLIP image encoder
- clip-text-image: DAMSM with CLIP text encoder and CLIP image encoder
- bert-clip: DAMSM with BERT text encoder and CLIP image encoder
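
The branch list above can be summarised as a small lookup table. This is only an illustrative sketch: the `BRANCH_ENCODERS` name and the `describe` helper are hypothetical and are not part of the repository.

```python
# Hypothetical lookup table: branch name -> (text encoder, image encoder),
# copied from the branch descriptions above.
BRANCH_ENCODERS = {
    "master": ("RNN", "CNN"),
    "bert": ("BERT", "CNN"),
    "clip": ("RNN", "CLIP"),
    "clip-text-image": ("CLIP", "CLIP"),
    "bert-clip": ("BERT", "CLIP"),
}


def describe(branch: str) -> str:
    """Return a one-line description of the DAMSM setup on a branch."""
    text_encoder, image_encoder = BRANCH_ENCODERS[branch]
    return f"DAMSM with {text_encoder} text encoder and {image_encoder} image encoder"


if __name__ == "__main__":
    for name in BRANCH_ENCODERS:
        print(f"{name}: {describe(name)}")
```

Only the encoder pairing differs between branches; everything else in this table is taken verbatim from the list.
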
- Download the preprocessed metadata for COCO and save it to `data/`
- Download the COCO dataset and extract the images to `data/coco/`
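
Once both downloads are in place, a quick check of the directory layout may save a failed run later. The sketch below assumes it is run from the repository root and only checks for the two directories named above; the `missing_dirs` helper is hypothetical, and the files expected inside each directory are not verified.

```python
from pathlib import Path

# Directories named in the setup steps above, relative to the repository root.
EXPECTED_DIRS = ("data", "data/coco")


def missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```
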
Install the following packages with pip:
python-dateutil
easydict
pandas
torchfile
nltk
scikit-image==0.19.0
torch
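
To confirm the environment after installation, something like the following could be used. It is a minimal sketch, not part of the repository: the `missing_packages` helper is hypothetical, and the mapping from pip names to import names (for example `scikit-image` to `skimage`) is written out by hand. It only checks that each module can be found; it does not verify the `scikit-image==0.19.0` version pin.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
PACKAGES = {
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
    "nltk": "nltk",
    "scikit-image": "skimage",
    "torch": "torch",
}


def missing_packages(packages=PACKAGES):
    """Return pip names whose module cannot be found in this environment."""
    missing = []
    for pip_name, module_name in packages.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```
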