Discover facebookresearch/sapiens Open Source project

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

44,989

segment-anything

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

42,134

Detectron

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

25,771

fairseq

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

25,718

detectron2

Library for fast text representation and classification.

25,567

fastText

A library for efficient similarity search and clustering of dense vectors.

24,973

faiss

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

24,035

audiocraft

Inference code for CodeLlama models

19,691

codellama

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

13,303

sam2

End-to-End Object Detection with Transformers

11,906

detr

Foundational Models for State-of-the-Art Speech and Text Translation

11,076

seamless_communication

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

10,584

ParlAI

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

10,085

maskrcnn-benchmark

High-Resolution 3D Human Digitization from A Single Image.

9,104

pifuhd

Hydra is a framework for elegantly configuring complex applications

8,923

hydra

Implementation of Nougat: Neural Optical Understanding for Academic Documents

8,550

nougat

Code to accompany "A Method for Animating Children's Drawings of the Human Figure"

8,088

AnimatedDrawings

ImageBind: One Embedding Space to Bind Them All

8,032

ImageBind

Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. Supports a number of candid inference solutions such as HF TGI and VLLM for local or cloud deployment. Demo apps to showcase Llama2 for WhatsApp & Messenger.

7,630

llama-recipes

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

7,402

pytorch3d

PyTorch code and models for the DINOv2 self-supervised learning method.

7,322

dinov2

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

7,278

DensePose

A natural language modeling framework based on PyTorch

6,547

pytext

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

6,357

DiT

Repo for external large-scale work

5,995

metaseq

Code for the paper Hybrid Spectrogram and Waveform Source Separation

5,947

demucs

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

5,886

SlowFast

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

5,678

mae

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

5,495

mmf

Code release for ConvNeXt model

5,235

ConvNeXt

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

4,971

dino

A data augmentations library for audio, image, text, and video.

4,830

AugLy

Kats is a kit to analyze time series data: a lightweight, easy-to-use, generalizable, and extendable framework for time series analysis, from understanding key statistics and characteristics, to detecting change points and anomalies, to forecasting future trends.

4,739

Kats

Reading Wikipedia to Answer Open-Domain Questions

4,387

DrQA

Hackable and optimized Transformers building blocks, supporting a composable construction.

4,374

xformers

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722

4,191

moco

Learning embeddings for classification, retrieval and ranking.

4,035

StarSpace

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

3,856

lingua

Facebook AI Research Sequence-to-Sequence Toolkit

3,829

fairseq-lua

A Python toolbox for performing gradient-free optimization

3,765

nevergrad

Official DeiT repository

3,446

deit

An implementation of a deep learning recommendation model (DLRM)

3,425

dlrm

A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

3,417

ReAgent

Language-Agnostic SEntence Representations

3,395

LASER

Efficient 3D human pose estimation in video using 2D keypoint trajectories

3,308

VideoPose3D

Generate embeddings from large-scale graph-structured data.

3,294

PyTorch-BigGraph

Torch implementation of DeepMask and SharpMask

3,238

deepmask

A library for Multilingual Unsupervised or Supervised word Embeddings

3,113

MUSE

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

3,094

vissl

A deep learning library for video understanding research.

3,038

pytorchvideo

PyTorch original implementation of Cross-lingual Language Model Pretraining.

2,885

XLM

Code and dataset for photorealistic Codec Avatars driven from audio

2,763

audio2photoreal

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

2,696

ijepa

PyTorch code and models for V-JEPA self-supervised learning from video.

2,670

jepa

A flexible, high-performance 3D simulator for Embodied AI research.

2,646

habitat-sim

CoTracker is a model for tracking any point (pixel) on a video.

2,621

co-tracker

HiPlot makes understanding high dimensional data easy

2,564

hiplot

PyTorch extensions for high performance and large scale training.

2,481

fairscale

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

2,319

encodec

InferSent sentence embeddings

2,313

InferSent

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

2,264

Pearl

PyRobot: An Open Source Robotics Research Platform

2,193

pyrobot

DarkForest, the Facebook Go engine.

2,109

darkforestGo

An End-To-End, Lightweight and Flexible Platform for Game Research

2,108

ELF

Codebase for Image Classification Research, written in PyTorch.

2,089

pycls

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

2,053

esm

A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

2,026

frankmocap

Non-local Neural Networks for Video Classification

1,972

video-nonlocal-net

A Python tool for evaluating the quality of sentence embeddings.

1,931

SentEval

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

1,930

habitat-lab

Implementation of a classification framework from the paper Aggregated Residual Transformations for Deep Neural Networks

1,867

ResNeXt

Submanifold sparse convolutional networks

1,863

SparseConvNet

Schedule-Free Optimization in PyTorch

1,847

schedule_free

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

1,842

chameleon

PyTorch implementation of SwAV: https://arxiv.org/abs/2006.09882

1,811

swav

A domain specific language to express machine learning workloads.

1,790

TensorComprehensions

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

1,747

Mask2Former

Collection of common code that's shared among different research projects in FAIR computer vision team.

1,638

fvcore

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf

1,623

TransCoder

PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

1,611

poincare-embeddings

Deep Hough Voting for 3D Object Detection in Point Clouds

1,587

votenet

A mix of GAN implementations including progressive growing

1,563

pytorch_GAN_zoo

An end-to-end PyTorch framework for image and video classification

1,554

ClassyVision

Deep Clustering for Unsupervised Learning of Visual Features

1,552

deepcluster

higher is a PyTorch library that lets users obtain higher-order gradients over losses spanning training loops rather than individual training steps.

1,544

higher

Phrase-Based & Neural Unsupervised Machine Translation

1,524

UnsupervisedMT

We estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.

1,496

consistent_depth

Code release for ConvNeXt V2 model

1,479

ConvNeXt-V2

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

1,454

Detic

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

1,446

end-to-end-negotiator

DomainBed is a suite to test domain generalization algorithms

1,368

DomainBed

A Torch implementation of the object detection network from "A MultiPath Network for Object Detection" (https://arxiv.org/abs/1604.02135)

1,355

multipathnet

A platform for developing AI systems as described in A Roadmap towards Machine Intelligence - http://arxiv.org/abs/1511.08130

1,349

CommAI-env

A library for differentiable nonlinear optimization

1,324

theseus

Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.

1,306

DPR

A framework for Privacy Preserving Machine Learning

1,292

CrypTen

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities.

1,283

denoiser

Learning Continuous Signed Distance Functions for Shape Representation

1,272

DeepSDF

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

1,191

TimeSformer