Generative Adversarial Text to Image Synthesis
Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. Better mixing via deep representations. In ICML, 2013.
Denton, E. L., Chintala, S., Fergus, R., et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS, 2015.
Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
Dosovitskiy, A., Tobias Springenberg, J., and Brox, T. Learning to generate chairs with convolutional neural networks. In CVPR, 2015.
Farhadi, A., Endres, I., Hoiem, D., and Forsyth, D. Describing objects by their attributes. In CVPR, 2009.
Fu, Y., Hospedales, T. M., Xiang, T., Fu, Z., and Gong, S. Transductive multi-view embedding for zero-shot recognition and annotation. In ECCV, 2014.
Gauthier, J. Conditional generative adversarial nets for convolutional face generation. Technical report, 2015.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In NIPS, 2014.
Gregor, K., Danihelka, I., Graves, A., Rezende, D., and Wierstra, D. DRAW: A recurrent neural network for image generation. In ICML, 2015.
Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
Karpathy, A. and Li, F. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
Kiros, R., Salakhutdinov, R., and Zemel, R. S. Unifying visual-semantic embeddings with multimodal neural language models. In ACL, 2014.
Kumar, N., Berg, A. C., Belhumeur, P. N., and Nayar, S. K. Attribute and simile classifiers for face verification. In ICCV, 2009.
Lampert, C. H., Nickisch, H., and Harmeling, S. Attribute-based classification for zero-shot visual object categorization. TPAMI, 36(3):453–465, 2014.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft COCO: Common objects in context. In ECCV, 2014.
Mansimov, E., Parisotto, E., Ba, J. L., and Salakhutdinov, R. Generating images from captions with attention. In ICLR, 2016.
Mao, J., Xu, W., Yang, Y., Wang, J., and Yuille, A. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
Mirza, M. and Osindero, S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. Multimodal deep learning. In ICML, 2011.
Parikh, D. and Grauman, K. Relative attributes. In ICCV, 2011.
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
Reed, S., Sohn, K., Zhang, Y., and Lee, H. Learning to disentangle factors of variation with manifold interaction. In ICML, 2014.
Reed, S., Zhang, Y., Zhang, Y., and Lee, H. Deep visual analogy-making. In NIPS, 2015.
Reed, S., Akata, Z., Lee, H., and Schiele, B. Learning deep representations for fine-grained visual descriptions. In CVPR, 2016.
Ren, M., Kiros, R., and Zemel, R. Exploring models and data for image question answering. In NIPS, 2015.
Sohn, K., Shang, W., and Lee, H. Improved multimodal deep learning with variation of information. In NIPS, 2014.
Srivastava, N. and Salakhutdinov, R. R. Multimodal learning with deep Boltzmann machines. In NIPS, 2012.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In CVPR, 2015.
Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. Show and tell: A neural image caption generator. In CVPR, 2015.
Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. The Caltech-UCSD Birds-200-2011 dataset. Technical report, California Institute of Technology, 2011.
Wang, P., Wu, Q., Shen, C., Hengel, A. v. d., and Dick, A. Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570, 2015.