• Stars
    star
    773
  • Rank 58,789 (Top 2 %)
  • Language
    TeX
  • Created almost 3 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era


arXiv Survey Maintenance PR's Welcome GitHub license

This project is associated with our survey paper which comprehensively contextualizes the advance of the recent Multimodal Image Synthesis & Editing (MISE) and formulates taxonomies according to data modality and model architectures. The survey is featured on DeepAI and 机器之心.

Multimodal Image Synthesis and Editing: A Survey and Taxonomy [Paper] [Project]
Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewsk,
Christian Theobalt, Eric Xing


PR's Welcome You are welcome to promote papers via pull request.
The process to submit a pull request:

  • a. Fork the project into your own repository.
  • b. Add the Title, Author, Conference, Paper link, Project link, and Code link in README.md with below format:
**Title**<br>
*Author*<br>
Conference
[[Paper](Paper link)]
[[Code](Project link)]
[[Project](Code link)]
  • c. Submit the pull request to this branch.

Related Surveys & Projects

Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel
Neural Networks 2021 [Paper]

GAN Inversion: A Survey
Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
TPAMI 2022 [Paper] [Project]

Deep Image Synthesis from Intuitive User Input: A Review and Perspectives
Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
Computational Visual Media 2022 [Paper]

Awesome-Text-to-Image


Table of Contents (Work in Progress)

Methods:

Modalities & Datasets:

NeRF-based-Methods

Local 3D Editing via 3D Distillation of CLIP Knowledge
Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo
CVPR 2023 [Paper]

RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models
Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li
IJCAI 2023 [Paper]

DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation
Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang
arxiv 2023 [Paper] [Project]

AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt
arxiv 2023 [Paper] [Project]

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields
Ori Gordon, Omri Avrahami, Dani Lischinski
arxiv 2023 [Paper] [Project]

OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields
Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin
arxiv 2023 [Paper] [Project] [Code]

HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
Junzhe Zhu, Peiye Zhuang
arxiv 2023 [Paper] [Project]

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
arxiv 2023 [Paper] [Project]

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields
Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao
arxiv 2023 [Paper] [Project]

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong
arxiv 2023 [Paper] [Project]

DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model
Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun
arxiv 2023 [Paper] [Project] [Code]

CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout
Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang
arxiv 2023 [Paper]

Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes
Dana Cohen-Bar, Elad Richardson, Gal Metzer, Raja Giryes, Daniel Cohen-Or
arxiv 2023 [Paper] [Project] [Code]

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
arxiv 2023 [Paper] [Project] [Code]

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Jaehoon Ko, Hyeonsu Kim, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim
arxiv 2023 [Paper] [Project] [Code]

Text-To-4D Dynamic Scene Generation
Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman
arxiv 2023 [Paper] [Project]

Magic3D: High-Resolution Text-to-3D Content Creation
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
CVPR 2023 [Paper] [Project]

DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Gwanghyun Kim, Se Young Chun
CVPR 2023 [Paper] [Code] [Project]

Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao
arxiv 2022 [Paper] [Project]

DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall
arxiv 2022 [Paper] [Project]

Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
CVPR 2022 [Paper] [Code] [Project]

IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
SIGGRAPH Asia 2022 [Paper] [Code] [Project]

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields
Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
arxiv 2022 [Paper] [Code] [Project]

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
CVPR 2022 [Paper] [Code] [Project]

CG-NeRF: Conditional Generative Neural Radiance Fields
Kyungmin Jo, Gyumin Shim, Sanghun Jung, Soyoung Yang, Jaegul Choo
arxiv 2021 [Paper]

Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
arxiv 2021 [Paper] [Project]

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis
Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang
ICCV 2021 [Paper] [Code] [Project] [Video]


Diffusion-based-Methods

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li, Junnan Li, Steven C.H. Hoi
Arxiv 2023 [Paper] [Project] [Code]

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka
Arxiv 2023 [Paper] [Project] [Code]

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani Yael, Pritch Michael, Rubinstein Kfir Aberman
CVPR 2023 [Paper] [Project] [Code]

Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
CVPR 2023 [Paper] [Project] [Code]

Collaborative Diffusion for Multi-Modal Face Generation and Editing
Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
CVPR 2023 [Paper] [Project] [Code]

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel
CVPR 2023 [Paper] [Project] [Code]

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren
CVPR 2023 [Paper] [Project] [Code]

NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
CVPR 2023 [Paper] [Project] [Code]

Paint by Example: Exemplar-Based Image Editing With Diffusion Models
Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
CVPR 2023 [Paper] [Demo] [Code]

SpaText: Spatio-Textual Representation for Controllable Image Generation
Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin
CVPR 2023 [Paper] [Project]

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis
CVPR 2023 [Paper] [Project]

InstructPix2Pix Learning to Follow Image Editing Instructions
Tim Brooks, Aleksander Holynski, Alexei A. Efros
CVPR 2023 [Paper] [Project] [[Code]https://github.com/timothybrooks/instruct-pix2pix)]

Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Nithin Gopalakrishnan Nair, Chaminda Bandara, Vishal M Patel
CVPR 2023 [Paper] [Project] [Code]

DiffEdit: Diffusion-based semantic image editing with mask guidance
Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord
CVPR 2023 [Paper]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Arxiv 2022 [Paper] [Project]

Prompt-to-Prompt Image Editing with Cross-Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman1 Yael Pritch, Daniel Cohen-Or
Arxiv 2022 [Paper] [Project] [Code]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
Arxiv 2022 [Paper] [Project] [Code]

Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu
SIGGRAPH 2022 [Paper] [Project] [Code]

[DALL-E 2] Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
[Paper] [Code]

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR 2022 [Paper] [Code]

v objective diffusion
Katherine Crowson
[Code]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen
arxiv 2021 [Paper] [Code]

Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo
arxiv 2021 [Paper] [Code]

DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim, Jong Chul Ye
arxiv 2021 [Paper]

Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami, Dani Lischinski, Ohad Fried
CVPR 2022 [Paper] [Project] [Code]


Autoregressive-Methods

MaskGIT: Masked Generative Image Transformer
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
arxiv 2022 [Paper]

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
arxiv 2021 [Paper] [Project]

NÃœWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan
arxiv 2021 [Paper] [Code] [Video]

L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
arxiv 2021 [Paper] [Code]

M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
NeurIPS 2021 [Paper]

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer
NeurIPS 2021 [Paper] [Code] [Project]

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu
ACM MM 2021 [Paper] [Code]

Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu
ACM MM 2021 [Paper] [Code]

Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021 [Paper] [Code] [Project]

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
Alex Shonenkov and Michael Konstantinov
arxiv 2022 [Code]

Generate Images from Texts in Russian (ruDALL-E)
[Code] [Project]

Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
arxiv 2021 [Paper] [Code] [Project]

Compositional Transformers for Scene Generation
Drew A. Hudson, C. Lawrence Zitnick
NeurIPS 2021 [Paper] [Code]

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
EMNLP 2020 [Paper] [Code]

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022 [Paper]


Image-Quantizer

[TE-VQGAN] Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation
Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi
arxiv 2021 [Paper] [Code]

[ViT-VQGAN] Vector-quantized Image Modeling with Improved VQGAN
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu
arxiv 2021 [Paper]

[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu
arxiv 2021 [Paper]

[VQ-GAN] Taming Transformers for High-Resolution Image Synthesis
Patrick Esser, Robin Rombach, Björn Ommer
CVPR 2021 [Paper] [Code]

[Gumbel-VQ] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski, Steffen Schneider, Michael Auli
ICLR 2020 [Paper] [Code]

[EM VQ-VAE] Theory and Experiments on Vector Quantized Autoencoders
Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar
arxiv 2018 [Paper] [Code]

[VQ-VAE] Neural Discrete Representation Learning
Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
NIPS 2017 [Paper] [Code]

[VQ-VAE2 or EMA-VQ] Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, Oriol Vinyals
NIPS 2019 [Paper] [Code]

[Discrete VAE] Discrete Variational Autoencoders
Jason Tyler Rolfe
ICLR 2017 [Paper] [Code]

[DVAE++] DVAE++: Discrete Variational Autoencoders with Overlapping Transformations
Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
ICML 2018 [Paper] [Code]

[DVAE#] DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors
Arash Vahdat, Evgeny Andriyash, William G. Macready
NIPS 2018 [Paper] [Code]


GAN-based-Methods

GauGAN2
NVIDIA
[Project] [Video]

Multimodal Conditional Image Synthesis with Product-of-Experts GANs
Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
arxiv 2021 [Paper]

RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge
Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao
TCSVT 2021 [Paper]

TRGAN: Text to Image Generation Through Optimizing Initial Image
Liang Zhao, Xinwei Li, Pingda Huang, Zhikui Chen, Yanqi Dai, Tianyu Li
ICONIP 2021 [Paper]

Audio-Driven Emotional Video Portraits [Audio2Image]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
CVPR 2021 [Paper] [Code] [Project]

SketchyCOCO: Image Generation from Freehand Scene Sketches
Chengying Gao, Qi Liu, Qi Xu, Limin Wang, Jianzhuang Liu, Changqing Zou
CVPR 2020 [Paper] [Code] [Project]

Direct Speech-to-Image Translation [Audio2Image]
Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao
JSTSP 2020 [Paper] [Code] [Project]

MirrorGAN: Learning Text-to-image Generation by Redescription [Text2Image]
Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
CVPR 2019 [Paper] [Code]

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Text2Image]
Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
CVPR 2018 [Paper] [Code]

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
CVPR 2017 [Paper] [Code]

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
TPAMI 2018 [Paper] [Code]

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
ICCV 2017 [Paper] [Code]


GAN-Inversion-Methods

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt
SIGGRAPH 2023 [Paper] [Code]

HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu
arxiv 2021 [Paper] [Code]

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+ GAN Space Optimization
Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu
arxiv 2021 [Paper] [Code]

StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation
Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag
WACV 2022 [Paper] [Code] [Project]

Cycle-Consistent Inverse GAN for Text-to-Image Synthesis
Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao
ACM MM 2021 [Paper]

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021 [Paper] [Code] [Video]

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
ICCV 2021 [Paper] [Code] [Project]

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu
CVPR 2021 [Paper] [Code] [Video]

Paint by Word
David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, Antonio Torralba
arxiv 2021 [Paper]


Other-Methods

Language-Driven Image Style Transfer
Tsu-Jui Fu, Xin Eric Wang, William Yang Wang
arxiv 2021 [Paper]

CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon, Jong Chul Ye
arxiv 2021 [Paper] [Code]



Text-Encoding

FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela
arxiv 2021 [Paper]

Learning Transferable Visual Models From Natural Language Supervision (CLIP)
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
arxiv 2021 [Paper] [Code]


Audio-Encoding

Wav2CLIP: Learning Robust Audio Representations From CLIP (Wav2CLIP)
Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
ICASSP 2022 [Paper] [Code]

Datasets

Multimodal CelebA-HQ (https://github.com/IIGROUP/MM-CelebA-HQ-Dataset)

DeepFashion MultiModal (https://github.com/yumingj/DeepFashion-MultiModal)