  • Stars: 145
  • Rank: 254,144 (Top 6%)
  • Language: Python
  • License: Apache License 2.0
  • Created: about 1 year ago
  • Updated: about 1 year ago

Repository Details

Latent-based SR using MoE and a frequency-augmented VAE decoder

Image Super-Resolution via Latent Diffusion: A Sampling-Space Mixture of Experts and Frequency-Augmented Decoder Approach

Overview

The recent use of diffusion priors, enhanced by pre-trained text-to-image models, has markedly elevated the performance of image super-resolution (SR). In this project, we propose the Frequency-Augmented VAE (FA_VAE), a frequency compensation module that enhances frequency components and thus alleviates the reconstruction distortion caused by latent-space compression. Since FA_VAE is a stand-alone module, we also apply it to image reconstruction and text-to-image generation and show examples in this repository. In addition, we propose a Sampling-Space Mixture of Experts (SS-MoE) to achieve more powerful latent-based SR, which steadily increases model capacity without a significant increase in inference cost.
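To make the two ideas concrete, here is a minimal, illustrative sketch (not the repository's API: FrequencyCompensation, lowpass, and select_expert are hypothetical names, and the assumption that SS-MoE experts own disjoint ranges of sampling timesteps is ours). It boosts the high-frequency residual of a decoder feature map in the Fourier domain, and routes each diffusion sampling step to one expert denoiser.

# Illustrative sketch only; names and routing rule are assumptions, not this repo's API.
import torch
import torch.nn as nn


def lowpass(feat, cutoff=0.25):
    # Keep only a centered low-frequency box of the 2D spectrum.
    _, _, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat, norm="ortho"), dim=(-2, -1))
    yy = (torch.arange(H, device=feat.device) - H // 2).abs().view(-1, 1)
    xx = (torch.arange(W, device=feat.device) - W // 2).abs().view(1, -1)
    mask = ((yy <= cutoff * H / 2) & (xx <= cutoff * W / 2)).to(feat.dtype)
    spec = torch.fft.ifftshift(spec * mask, dim=(-2, -1))
    return torch.fft.ifft2(spec, norm="ortho").real


class FrequencyCompensation(nn.Module):
    # Toy frequency-compensation block: amplify the high-frequency residual of a
    # decoder feature map before the next upsampling stage.
    def __init__(self, channels, cutoff=0.25):
        super().__init__()
        self.cutoff = cutoff
        self.gain = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, feat):
        high = feat - lowpass(feat, self.cutoff)
        return feat + self.gain * high


def select_expert(t, num_steps, experts):
    # Sampling-space MoE routing (assumption: each expert owns a contiguous timestep range).
    idx = min(int(t / num_steps * len(experts)), len(experts) - 1)
    return experts[idx]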

Dependencies and Installation

# Clone this repository
git clone https://github.com/tencent-ailab/Frequency_Aug_VAE_MoESR.git
cd Frequency_Aug_VAE_MoESR

# Create a conda environment and activate it
conda env create --file my_environment.yaml
conda activate moe_sr

# Install xformers (optional)
conda install xformers -c xformers/label/dev

# Install taming
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e .

Inference

Super Resolution

8x SR

Download model_stage{x}.ckpt and vq-f4/fa_vae.pth to sr_8x_inf/models and run the following command.

cd sr_8x_inf
sh inf_moe_8x.sh
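Before launching the script, an optional sanity check can confirm that the downloaded checkpoints deserialize cleanly. A small sketch (the file names come from the instructions above; nothing here is part of the repository's scripts):

# Optional sanity check: confirm checkpoints in sr_8x_inf/models load before running inf_moe_8x.sh.
from pathlib import Path
import torch

model_dir = Path("sr_8x_inf/models")
for path in sorted(model_dir.glob("model_stage*.ckpt")) + [model_dir / "fa_vae.pth"]:
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    print(f"{path.name}: {len(state)} entries")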

Txt2Img with FA_VAE

Latent-based text-to-image generation (e.g., SD1.5) is also limited by the VAE's reconstruction accuracy, so the FA_VAE proposed for SR is useful here as well. We customize and train an FA_VAE for SD1.5; specifically, the LR input and the LR-conditioned fusion layers are removed, as shown in the following figure.

In practice, the only change needed is to replace the original decoder with the FA_VAE decoder. Download an SD1.5 base model (e.g., realv) and kl-f8/fa_vae.pth, set the paths in get_latent.py and decode.py, and then run the following commands (a decoder-swap sketch follows the commands).

cd vae_txt2img_inf
# get diffusion latent
python3 get_latent.py
# decode to get images
python3 decode.py
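Conceptually, decode.py decodes the saved diffusion latents with the frequency-augmented decoder instead of the stock one. A minimal sketch under stated assumptions: FAVAEDecoder is a hypothetical stand-in for the repository's decoder class, the latents are assumed to be saved by get_latent.py as a tensor of shape (N, 4, H/8, W/8), and the usual SD1.5 latent scaling factor 0.18215 is assumed.

# Conceptual sketch only: FAVAEDecoder, the file layout, and the scaling factor are assumptions.
import torch
from torchvision.utils import save_image
from fa_vae import FAVAEDecoder  # hypothetical import standing in for the repo's decoder class

device = "cuda" if torch.cuda.is_available() else "cpu"

decoder = FAVAEDecoder()
state = torch.load("kl-f8/fa_vae.pth", map_location="cpu")
decoder.load_state_dict(state.get("state_dict", state), strict=False)
decoder.eval().to(device)

latents = torch.load("latents.pt", map_location=device)  # (N, 4, H/8, W/8), saved by get_latent.py
with torch.no_grad():
    images = decoder(latents / 0.18215)                   # undo SD's latent scaling, then decode
images = (images.clamp(-1, 1) + 1) / 2                    # map [-1, 1] to [0, 1]
for i, img in enumerate(images):
    save_image(img, f"decoded_{i}.png")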

Results

8x SR

Qualitative Results

Quantitative Results

Image Reconstruction with FA_VAE

Benefiting from the frequency-augmented decoder, most of the distortion, especially facial distortion, is corrected, as can be seen by comparing the updated VAE with SD1.5's original VAE. We evaluate reconstruction performance on COCO val2017 and on a private dataset of 1000 high-quality images collected from the Internet. The evaluation tool is IQA-PyTorch.
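The paired-image metrics below (PSNR, SSIM, LPIPS) can be reproduced with IQA-PyTorch's pyiqa package; a minimal sketch, assuming reconstructed and reference images as (N, 3, H, W) tensors in [0, 1] (FID is computed separately over image folders):

# Metric sketch with IQA-PyTorch (pip install pyiqa); inputs are (N, 3, H, W) tensors in [0, 1].
import torch
import pyiqa

device = "cuda" if torch.cuda.is_available() else "cpu"
recon = torch.rand(4, 3, 256, 256, device=device)  # stand-in for reconstructed images
ref = torch.rand(4, 3, 256, 256, device=device)    # stand-in for ground-truth images

for name in ["psnr", "ssim", "lpips"]:
    metric = pyiqa.create_metric(name, device=device)
    score = metric(recon, ref).mean().item()        # full-reference metrics take (distorted, reference)
    print(f"{name}: {score:.4f}")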

Qualitative Results

Quantitative Results

COCO 2017 (256x256, val, 5000 images)

Model            PSNR↑    SSIM↑     LPIPS↓    FID↓
SD1.5 original   25.40    0.7418    0.0746    17.66
Ours             26.04    0.7576    0.0702    16.12

Private test set (256x256, 1000 images)

Model            PSNR↑    SSIM↑     LPIPS↓    FID↓
SD1.5 original   27.64    0.8376    0.0524    19.62
Ours             28.50    0.8563    0.0424    15.71

Txt2Img with FA_VAE

Furthermore, we validate the effectiveness of FA_VAE on text-to-image generation. As shown in the figure, it also corrects the distortion produced by the original VAE. Note that we only replace SD1.5's decoder with the FA_VAE decoder, which means it is compatible with all SD1.5 base models.

Citation

Please cite us if our work is useful for your research.

@article{luo2023Image,
    author  = {Luo, Feng and Xiang, Jinxi and Zhang, Jun and Han, Xiao and Yang, Wei},
    title   = {Image Super-resolution via Latent Diffusion: a Sampling-space Mixture of Experts and Frequency-augmented Decoder Approach},
    journal = {arXiv preprint arXiv:2310.12004},
    year    = {2023}
}

License

This project is released under the Apache 2.0 license.

Acknowledgement

This project is based on StableSR, Latent Diffusion and BasicSR. Thanks for their awesome work.

Contact

If you have any questions, please feel free to contact me at [email protected].
