Spaces:
Build error
Build error
| title: Text-to-Image Synthesis with AttnGAN | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: gray | |
| sdk: streamlit | |
| sdk_version: 1.40.1 | |
| app_file: app.py | |
| pinned: false | |
| #### Python 3.7+ and Pytorch 1.x | |
| Referenced from: https://github.com/taoxugit/AttnGAN | |
| ## Play with this model: [Demo Link](https://share.streamlit.io/gladiator07/text-to-image-synthesis-with-attngan/main/app.py) | |
| ## Sneak-peek into the webapp | |
|  | |
|  | |
| # AttnGAN | |
| Pytorch implementation for reproducing AttnGAN results in the paper [AttnGAN: Fine-Grained Text to Image Generation | |
| with Attentional Generative Adversarial Networks](http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_AttnGAN_Fine-Grained_Text_CVPR_2018_paper.pdf) by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research). | |
|  | |
| **Data** | |
| 1. Download preprocessed metadata for [birds](https://drive.google.com/open?id=1O_LtUP9sch09QH3s_EBAgLEctBQ5JBSJ) [coco](https://drive.google.com/open?id=1rSnbIGNDGZeHlsUlLdahj0RJ9oo6lgH9) and save them to `data/` | |
| 2. Download the [birds](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) image data. Extract them to `data/birds/` | |
| 3. Download [coco](http://cocodataset.org/#download) dataset and extract the images to `data/coco/` | |
| **Training** | |
| - Pre-train DAMSM models: | |
| - For bird dataset: `python pretrain_DAMSM.py --cfg cfg/DAMSM/bird.yml --gpu 0` | |
| - For coco dataset: `python pretrain_DAMSM.py --cfg cfg/DAMSM/coco.yml --gpu 1` | |
| - Train AttnGAN models: | |
| - For bird dataset: `python main.py --cfg cfg/bird_attn2.yml --gpu 2` | |
| - For coco dataset: `python main.py --cfg cfg/coco_attn2.yml --gpu 3` | |
| - `*.yml` files are example configuration files for training/evaluation our models. | |
| **Pretrained Model** | |
| - [DAMSM for bird](https://drive.google.com/open?id=1GNUKjVeyWYBJ8hEU-yrfYQpDOkxEyP3V). Download and save it to `DAMSMencoders/` | |
| - [DAMSM for coco](https://drive.google.com/open?id=1zIrXCE9F6yfbEJIbNP5-YrEe2pZcPSGJ). Download and save it to `DAMSMencoders/` | |
| - [AttnGAN for bird](https://drive.google.com/open?id=1lqNG75suOuR_8gjoEPYNp8VyT_ufPPig). Download and save it to `models/` | |
| - [AttnGAN for coco](https://drive.google.com/open?id=1i9Xkg9nU74RAvkcqKE-rJYhjvzKAMnCi). Download and save it to `models/` | |
| - [AttnDCGAN for bird](https://drive.google.com/open?id=19TG0JUoXurxsmZLaJ82Yo6O0UJ6aDBpg). Download and save it to `models/` | |
| - This is an variant of AttnGAN which applies the proposed attention mechanisms to DCGAN framework. | |
| **Sampling** | |
| - Run `python main.py --cfg cfg/eval_bird.yml --gpu 1` to generate examples from captions in files listed in "./data/birds/example_filenames.txt". Results are saved to `DAMSMencoders/`. | |
| - Change the `eval_*.yml` files to generate images from other pre-trained models. | |
| - Input your own sentence in "./data/birds/example_captions.txt" if you wannt to generate images from customized sentences. | |
| **Validation** | |
| - To generate images for all captions in the validation dataset, change B_VALIDATION to True in the eval_*.yml. and then run `python main.py --cfg cfg/eval_bird.yml --gpu 1` | |
| - We compute inception score for models trained on birds using [StackGAN-inception-model](https://github.com/hanzhanggit/StackGAN-inception-model). | |
| - We compute inception score for models trained on coco using [improved-gan/inception_score](https://github.com/openai/improved-gan/tree/master/inception_score). | |
| ### Creating an API | |
| [Evaluation code](eval) embedded into a callable containerized API is included in the `eval\` folder. | |
| ### Citing AttnGAN | |
| If you find AttnGAN useful in your research, please consider citing: | |
| ``` | |
| @article{Tao18attngan, | |
| author = {Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He}, | |
| title = {AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks}, | |
| Year = {2018}, | |
| booktitle = {{CVPR}} | |
| } | |
| ``` | |
| **Reference** | |
| - [StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/abs/1710.10916) [[code]](https://github.com/hanzhanggit/StackGAN-v2) | |
| - [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434) [[code]](https://github.com/carpedm20/DCGAN-tensorflow) | |
| ### References | |
| - [Research Paper](https://arxiv.org/abs/1711.10485) | |
| - [Explanation of the paper](https://www.youtube.com/watch?v=Epvh4EvznUA) | |
| - [Python 3.x implementation - Tensorflow](https://github.com/taki0112/AttnGAN-Tensorflow) | |
| - [Python 2.x implementation - PyTorch](https://github.com/taoxugit/AttnGAN) | |
| #### Note: This is a rough Readme as I am quite overloaded with work right now, this Readme will be updated soon with all the details (results, benchmarks, training hardware, model configurations, etc) |