  • Stars: 168
  • Rank 225,507 (Top 5 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 2 years ago
  • Updated over 1 year ago

Repository Details

[CVPR 2022 Oral] PyTorch re-implementation of "MAXIM: Multi-Axis MLP for Image Processing", with *training code*. Official JAX repo: https://github.com/google-research/maxim

MAXIM: Multi-Axis MLP for Image Processing (CVPR 2022 Oral)

This repo is a PyTorch re-implementation of the CVPR 2022 Oral paper "MAXIM: Multi-Axis MLP for Image Processing" by Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li.

Google Research, University of Texas at Austin

Disclaimer: This repo is a work in progress. No timelines are guaranteed.

News

  • April 12, 2022: Initialized the PyTorch repo for MAXIM.
  • March 29, 2022: The official JAX code and models have been released at [google-research/maxim].
  • March 29, 2022: MAXIM is selected for an ORAL presentation at CVPR 2022 🎉
  • March 3, 2022: Paper accepted at CVPR 2022.

Abstract: Recent progress on Transformers and multi-layer perceptron (MLP) models provide new network architectural designs for computer vision tasks. Although these models proved to be effective in many vision tasks such as image recognition, there remain challenges in adapting them for low-level vision. The inflexibility to support high-resolution images and limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP based architecture called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both these modules are exclusively based on MLPs, but also benefit from being both global and 'fully-convolutional', two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement while requiring fewer or comparable numbers of parameters and FLOPs than competitive models.
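
To make the "local and global" spatial mixing in the abstract more concrete, here is a minimal pure-Python sketch of one way a feature map can be partitioned along two axes. It assumes the local branch mixes positions inside fixed-size, non-overlapping windows, and the global branch mixes positions that share the same offset across a fixed grid of windows. The function names (`partition`, `local_groups`, `global_groups`) are hypothetical and are not taken from this repository or the official JAX code; the sketch only shows which positions would be grouped together, not the gated MLP itself.

```python
# Illustrative sketch only: which pixel positions would be mixed together by a
# local (block) branch and a global (grid) branch. All names are hypothetical.

def partition(height, width, win_h, win_w):
    """Split an H x W map into non-overlapping win_h x win_w windows.

    Returns a list of windows; each window is a list of (row, col)
    positions in row-major order.
    """
    if height % win_h or width % win_w:
        raise ValueError("window size must divide the feature-map size")
    windows = []
    for top in range(0, height, win_h):
        for left in range(0, width, win_w):
            windows.append([(top + r, left + c)
                            for r in range(win_h) for c in range(win_w)])
    return windows


def local_groups(height, width, block):
    """Assumed local branch: positions inside one block x block window."""
    return partition(height, width, block, block)


def global_groups(height, width, grid):
    """Assumed global branch: a fixed grid x grid layout of windows, where
    positions with the same offset inside their window are grouped across
    all windows."""
    if height % grid or width % grid:
        raise ValueError("grid size must divide the feature-map size")
    windows = partition(height, width, height // grid, width // grid)
    # Transpose "window x offset" into "offset x window".
    return [list(group) for group in zip(*windows)]


if __name__ == "__main__":
    print(local_groups(4, 4, 2)[0])   # neighbouring positions
    print(global_groups(4, 4, 2)[0])  # strided positions spanning the map
```

Under these assumptions, each local group covers a small neighbourhood, while each global group samples the whole map with a fixed stride. The number of positions per local group depends only on the block size, and the number per global group only on the grid size, which is one way such a layout can stay usable as the input resolution grows.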


Architecture

Model overview

Installation

TBD

Results and Pre-trained models

TBD

Demo

Results

Image Denoising
Image Deblurring

Synthetic blur

Realistic blur

Image Deraining

Rain streak

Rain drop

Image Dehazing
Image Enhancement

Citation

Should you find this repository useful, please consider citing:

@article{tu2022maxim,
  title={MAXIM: Multi-Axis MLP for Image Processing},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  journal={CVPR},
  year={2022},
}

Acknowledgement

This repository is built on Restormer. Our work is also inspired by HiT, MPRNet, and HINet.

More Repositories

  1. VIDEVAL (MATLAB, 124 stars): [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
  2. BVQA_Benchmark (Python, 116 stars): A resource list and performance benchmark for blind video quality assessment (BVQA) models on user-generated content (UGC) datasets. [IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
  3. RAPIQUE (MATLAB, 48 stars): [IEEE OJSP'2021] "RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content", Zhengzhong Tu, Xiangxu Yu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
  4. COVER (Python, 37 stars): 🏆 [CVPRW 2024] COVER: A Comprehensive Video Quality Evaluator. 🥇 Winner solution for Video Quality Assessment Challenge at the 1st AIS 2024 workshop @ CVPR 2024
  5. MAXIM (20 stars): [CVPR 2022] Unofficial repository for "MAXIM: Multi-Axis MLP for Image Processing". Official repo: https://github.com/google-research/maxim
  6. Temporal_Pooling (Python, 7 stars): Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment
  7. VMEON-pytorch (Python, 4 stars): This is a GitHub copy of [ACM Multimedia'18] End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks.
  8. DebandingNet (Python, 3 stars)
  9. LIVE-YT-Banding-Database (2 stars): This is the repository for LIVE-YouTube banding database.
  10. zhengzhongtu (TypeScript, 1 star): Personal Website