
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

A picture speaks volumes, as do the words that frame it.



Introduction Video 🎥

Turn on the music if possible 🎧

trex2_ongithub.mp4

News 📰

  • 2024-03-27: Online Gradio demo hosted at https://huggingface.co/spaces/Mountchicken/T-Rex2
  • 2024-03-26: New Gradio demo available 🤗! Got our API but don't know how to use it? We now provide a local Gradio demo with a GUI to support your usage. Check it out here: Gradio APP

Contents 📜

1. Introduction 📚

Object detection, the ability to locate and identify objects within an image, is a cornerstone of computer vision, pivotal to applications ranging from autonomous driving to content moderation. A notable limitation of traditional object detection models is their closed-set nature. These models are trained on a predetermined set of categories, confining them to recognizing only those specific categories. The training process itself is arduous, demanding expert knowledge, extensive datasets, and intricate model tuning to achieve desirable accuracy. Moreover, the introduction of a novel object category exacerbates these challenges, requiring the entire process to be repeated.

T-Rex2 addresses these limitations by integrating both text and visual prompts in one model, thereby harnessing the strengths of both modalities. The synergy of text and visual prompts equips T-Rex2 with robust zero-shot capabilities, making it a versatile tool in the ever-changing landscape of object detection.

What Can T-Rex2 Do 📝

T-Rex2 is well-suited for a variety of real-world applications, including but not limited to agriculture, industry, livestock and wild-animal monitoring, biology, medicine, OCR, retail, electronics, transportation, and logistics. T-Rex2 supports three major workflows: interactive visual prompt, generic visual prompt, and text prompt. Together they cover most application scenarios that require object detection.

workflows.mp4

2. Try Demo 🎮

We have opened an online demo for T-Rex2. Check our demo here

3. API Usage Examples 📚

We are now offering free API access to T-Rex2. For educators, students, and researchers, we offer an API with extensive usage quotas to support your educational and research endeavors. You can request an API token here: request API.

Setup

Install the API package and acquire the API token from the email.

```bash
git clone https://github.com/IDEA-Research/T-Rex.git
cd T-Rex
pip install dds-cloudapi-sdk==0.1.1
pip install -v -e .
```

Interactive Visual Prompt API

  • In the interactive visual prompt workflow, users provide visual prompts, in box or point format, on a given image to specify the objects to be detected.

    ```bash
    python demo_examples/interactive_inference.py --token <your_token>
    ```

    • You should get the following visualization results in demo_vis/
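Conceptually, the prompts passed to this script are boxes in [x1, y1, x2, y2] pixel coordinates or points in [x, y]. The helper below is a hypothetical sketch of building and validating such a prompt list; the field names (`type`, `coords`) are illustrative assumptions, not the SDK's actual schema.

```python
# Hypothetical sketch of the visual-prompt format: boxes are
# [x1, y1, x2, y2] and points are [x, y], in pixel coordinates.
# Field names below are illustrative assumptions, not the SDK schema.

def build_visual_prompts(boxes=None, points=None):
    """Validate raw coordinates and pack them into prompt dicts."""
    prompts = []
    for box in boxes or []:
        x1, y1, x2, y2 = box
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box: {box}")
        prompts.append({"type": "rect", "coords": [x1, y1, x2, y2]})
    for point in points or []:
        x, y = point
        prompts.append({"type": "point", "coords": [x, y]})
    if not prompts:
        raise ValueError("at least one box or point prompt is required")
    return prompts

prompts = build_visual_prompts(boxes=[[10, 20, 110, 220]], points=[[64, 64]])
print(prompts)
```

Validating prompts client-side like this catches degenerate boxes before an API round trip.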

Generic Visual Prompt API

  • In the generic visual prompt workflow, users provide visual prompts on one reference image and detect objects in another image.

    ```bash
    python demo_examples/generic_inference.py --token <your_token>
    ```

    • You should get the following visualization results in demo_vis/

Customize Visual Prompt Embedding API

In this workflow, you can customize a visual prompt embedding for an object category using multiple images. With this embedding, you can detect objects in any image.

```bash
python demo_examples/customize_embedding.py --token <your_token>
```

  • You should get a download link for this visual prompt embedding in safetensors format. Save it and use it for embedding_inference.
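Conceptually, building a customized embedding from multiple example images amounts to aggregating per-image prompt features into one category-level vector. The sketch below shows mean aggregation in plain Python; the service's actual aggregation method is internal to the API, so this is only an assumption for intuition.

```python
# Illustrative sketch: aggregate per-image visual-prompt features into a
# single category embedding by averaging. The actual aggregation used by
# the T-Rex2 service is internal to the API; this is an assumption.

def aggregate_embeddings(per_image_embeddings):
    """Average a list of equal-length feature vectors element-wise."""
    if not per_image_embeddings:
        raise ValueError("need at least one embedding")
    dim = len(per_image_embeddings[0])
    if any(len(e) != dim for e in per_image_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(vals) / len(per_image_embeddings)
            for vals in zip(*per_image_embeddings)]

category_embedding = aggregate_embeddings([[1.0, 0.0], [0.0, 1.0]])
print(category_embedding)  # [0.5, 0.5]
```

Averaging over several example images is what makes the resulting embedding more robust than a prompt drawn on a single image.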

Embedding Inference API

With the visual prompt embedding generated by the previous API, you can detect objects in any image.

```bash
python demo_examples/embedding_inference.py --token <your_token>
```
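Reusing a saved embedding boils down to comparing it against candidate region features and keeping the matches. A minimal sketch of that idea using cosine similarity; the threshold value and matching rule are illustrative assumptions, since the service's real matching logic is not public.

```python
import math

# Sketch: score candidate region features against a stored category
# embedding with cosine similarity and keep regions above a threshold.
# Threshold and matching rule are illustrative assumptions.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_regions(category_embedding, region_features, threshold=0.8):
    """Return indices of regions whose features match the embedding."""
    return [i for i, feat in enumerate(region_features)
            if cosine(category_embedding, feat) >= threshold]

regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(match_regions([1.0, 0.0], regions))  # [0, 2]
```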

4. Local Gradio Demo with API 🎨

4.1. Setup

  • Install the T-Rex2 API if you haven't done so
  • Install gradio and other dependencies

```bash
# install gradio and other dependencies
pip install gradio==4.22.0
pip install gradio-image-prompter
```

4.2. Run the Gradio Demo

```bash
python gradio_demo.py --trex2_api_token <your_token>
```

4.3. Basic Operations

  • Draw Box: Draw a box on the image to specify the object to be detected. Drag the left mouse button to draw a box.
  • Draw Point: Draw a point on the image to specify the object to be detected. Click the left mouse button to draw a point.
  • Interactive Visual Prompt: Provide visual prompts in box or point format on a given image to specify the object to be detected. The Input Target Image and the Interactive Visual Prompt Image should be the same image.
  • Generic Visual Prompt: Provide visual prompts on multiple reference images and detect on the other image.
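When a user drags a box on the canvas, the start and end corners can arrive in any order, so the raw drag coordinates need normalizing into [x1, y1, x2, y2] before being used as a prompt. A small sketch of that normalization (the demo's actual handling may differ):

```python
def drag_to_box(start, end):
    """Normalize two drag corners into an [x1, y1, x2, y2] box."""
    (sx, sy), (ex, ey) = start, end
    return [min(sx, ex), min(sy, ey), max(sx, ex), max(sy, ey)]

# A drag from bottom-right to top-left still yields a valid box.
print(drag_to_box((120, 40), (30, 90)))  # [30, 40, 120, 90]
```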

5. Related Works

🔥 We have released the training and inference code and the demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

BibTeX 📚

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
