MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
Mask2Former
Check out Mask2Former, a universal architecture based on the MaskFormer meta-architecture that achieves state-of-the-art results on panoptic, instance, and semantic segmentation across four popular datasets (ADE20K, Cityscapes, COCO, Mapillary Vistas).
Features
- Better results while being more efficient.
- Unified view of semantic- and instance-level segmentation tasks.
- Supports major semantic segmentation datasets: ADE20K, Cityscapes, COCO-Stuff, Mapillary Vistas.
- Supports ALL Detectron2 models.
Installation
See installation instructions.
Getting Started
See Preparing Datasets for MaskFormer.
See Getting Started with MaskFormer.
Model Zoo and Baselines
We provide a large set of baseline results and trained models, available for download in the MaskFormer Model Zoo.
License
The majority of MaskFormer is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license.
Citing MaskFormer
If you use MaskFormer in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}