Ferret: Refer and Ground Anything Anywhere at Any Granularity

An End-to-End MLLM that Accepts Any-Form Referring and Grounds Anything in Response. [Paper]

Haoxuan You*, Haotian Zhang*, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang [*: equal contribution]

Overview

Figure: Diagram of the Ferret model.

Key Contributions:

  • Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in an MLLM.
  • GRIT Dataset (~1.1M samples) - A large-scale, hierarchical, robust ground-and-refer instruction-tuning dataset.
  • Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.

Release

  • [10/30] 🔥 We released the code of the FERRET model.

Usage and License Notices: The data and code are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.

Install

  1. Clone this repository and navigate to the FERRET folder
git clone https://github.com/apple/ml-ferret
cd ml-ferret
  2. Install the package
conda create -n ferret python=3.10 -y
conda activate ferret
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install pycocotools
pip install protobuf==3.20.0
  3. Install additional packages for training
pip install ninja
pip install flash-attn --no-build-isolation
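After these steps, a quick environment check can catch a missing package before training or serving starts. The following is a minimal sketch, not part of the repository. It assumes the import names `ferret`, `pycocotools`, `google.protobuf`, and `flash_attn` correspond to the pip packages installed above. It uses only the standard library (`importlib.util.find_spec`, `importlib.metadata.version`) and compares protobuf against the 3.20.0 pin from step 2.

```python
"""Hypothetical sanity check for a FERRET install (not shipped with the repo)."""
from __future__ import annotations

import importlib.util
from importlib import metadata

# Assumed import names for the packages installed in steps 2 and 3.
REQUIRED = ["ferret", "pycocotools", "google.protobuf"]
TRAINING_ONLY = ["flash_attn"]  # only needed for training


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # a parent package (e.g. `google`) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


def protobuf_version() -> str | None:
    """Return the installed protobuf distribution version, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("missing (required):", missing_modules(REQUIRED) or "none")
    print("missing (training):", missing_modules(TRAINING_ONLY) or "none")
    version = protobuf_version()
    if version != "3.20.0":
        print(f"warning: protobuf is {version}, the install steps pin 3.20.0")
```

Using `find_spec` checks that each module can be located without importing it, so the check does not initialize heavy packages such as `flash_attn`.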

Train

FERRET is trained on 8 A100 GPUs with 80GB of memory each. To train on fewer GPUs, reduce per_device_train_batch_size and increase gradient_accumulation_steps accordingly. Always keep the global batch size the same: per_device_train_batch_size x gradient_accumulation_steps x num_gpus.
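The rule above can be turned into a small helper that picks gradient_accumulation_steps for a given GPU count. This is a minimal sketch, not code from the repository. The global batch size of 128 comes from the hyperparameter table. The per-device batch sizes in the example are hypothetical values chosen for illustration, not the values used in the provided scripts.

```python
"""Sketch: keep the global batch size fixed when training on fewer GPUs."""


def gradient_accumulation_steps(global_batch_size: int,
                                per_device_train_batch_size: int,
                                num_gpus: int) -> int:
    """Solve global = per_device x accumulation x num_gpus for accumulation."""
    per_step = per_device_train_batch_size * num_gpus
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global batch {global_batch_size} is not divisible by "
            f"per_device ({per_device_train_batch_size}) x num_gpus ({num_gpus})"
        )
    return global_batch_size // per_step


if __name__ == "__main__":
    # 128 is the global batch size from the table; per-device sizes are hypothetical.
    for gpus, per_device in [(8, 16), (4, 16), (2, 8)]:
        steps = gradient_accumulation_steps(128, per_device, gpus)
        print(f"{gpus} GPUs, per-device {per_device}: gradient_accumulation_steps={steps}")
```

Raising an error on a non-divisible combination is a deliberate choice. Rounding would silently change the effective batch size and therefore the training dynamics.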

Hyperparameters

We use a set of hyperparameters similar to that of LLaVA (Vicuna) for finetuning.

Hyperparameter | Global Batch Size | Learning Rate | Epochs | Max Length | Weight Decay
FERRET-7B      | 128               | 2e-5          | 3      | 2048       | 0
FERRET-13B     | 128               | 2e-5          | 3      | 2048       | 0

Prepare Vicuna checkpoint and LLaVA's projector

Before you start, prepare our base model, Vicuna, an instruction-tuned chatbot. Please download its weights by following the instructions here. FERRET uses Vicuna v1.3.

Then download LLaVA's first-stage pre-trained projector weights (7B, 13B).

FERRET Training

The training scripts are provided (7B, 13B).

Evaluation

Please see this doc for the details.

Demo

To run our demo, you need to train FERRET and use the checkpoints locally. The demo uses a Gradio web UI. Please run the following commands one by one.

Launch a controller

python -m ferret.serve.controller --host 0.0.0.0 --port 10000

Launch a Gradio web server

python -m ferret.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --add_region_feature

Launch a model worker

This is the worker that loads the checkpoint and runs inference on the GPU. Each worker is responsible for a single model, specified by --model-path.

CUDA_VISIBLE_DEVICES=0 python -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./checkpoints/FERRET-13B-v0 --add_region_feature

Wait until the process finishes loading the model and prints "Uvicorn running on ...". Then refresh your Gradio web UI; the model you just launched will appear in the model list.
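The three commands above can also be started from one script. The following is a minimal sketch, not part of the repository. It reuses exactly the modules and flags shown above, but substitutes `sys.executable` for `python` so the current interpreter (for example, the `ferret` conda env) is used. The 5-second delay after starting the controller is a hypothetical value. You still need to watch the worker's output for "Uvicorn running on ..." before refreshing the UI.

```python
"""Sketch: launch the FERRET demo controller, web server, and one model worker."""
import os
import subprocess
import sys
import time

CONTROLLER_URL = "http://localhost:10000"


def controller_cmd() -> list:
    return [sys.executable, "-m", "ferret.serve.controller",
            "--host", "0.0.0.0", "--port", "10000"]


def web_server_cmd() -> list:
    return [sys.executable, "-m", "ferret.serve.gradio_web_server",
            "--controller", CONTROLLER_URL, "--model-list-mode", "reload",
            "--add_region_feature"]


def worker_cmd(gpu: int, port: int, model_path: str):
    """Build argv and env for one worker; its --port and --worker URL must agree."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    argv = [sys.executable, "-m", "ferret.serve.model_worker",
            "--host", "0.0.0.0", "--controller", CONTROLLER_URL,
            "--port", str(port), "--worker", f"http://localhost:{port}",
            "--model-path", model_path, "--add_region_feature"]
    return argv, env


def launch(model_path: str = "./checkpoints/FERRET-13B-v0",
           gpu: int = 0, port: int = 40000) -> list:
    procs = [subprocess.Popen(controller_cmd())]
    time.sleep(5)  # hypothetical delay so the controller is up before clients connect
    procs.append(subprocess.Popen(web_server_cmd()))
    argv, env = worker_cmd(gpu, port, model_path)
    procs.append(subprocess.Popen(argv, env=env))
    return procs


if __name__ == "__main__":
    processes = launch()
    try:
        for p in processes:
            p.wait()
    except KeyboardInterrupt:
        for p in processes:
            p.terminate()
```

Because each worker serves a single model, serving a second checkpoint would mean calling `worker_cmd` again with a different GPU, port, and `--model-path`. The port and the `--worker` URL are derived from the same value so they cannot drift apart.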

Figure: Example of the Ferret interactive demo.

Citation

If you find Ferret useful, please cite using this BibTeX:

@article{you2023ferret,
  title={Ferret: Refer and Ground Anything Anywhere at Any Granularity},
  author={You, Haoxuan and Zhang, Haotian and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zirui and Cao, Liangliang and Chang, Shih-Fu and Yang, Yinfei},
  journal={arXiv preprint arXiv:2310.07704},
  year={2023}
}

Acknowledgement

  • LLaVA: the codebase we built upon.
  • Vicuna: the LLM codebase.
