  • Stars
    936
  • Rank 47,099 (Top 1.0 %)
  • Language
    Python
  • License
    MIT License
  • Created 6 months ago
  • Updated 3 months ago

Repository Details

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

multimodal-maestro



👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!

💻 install

⚠️ Our package has been renamed to maestro. Install the package in a 3.11>=Python>=3.8 environment.

pip install maestro
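
Before installing, it can help to confirm that the interpreter falls inside the supported range. The snippet below is a minimal sketch using only the standard library. The helper name `is_supported` is hypothetical and is not part of maestro.

```python
import sys

# Supported range taken from the install note: 3.8 <= Python <= 3.11.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair falls inside the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```

Only the major and minor numbers are compared, so any patch release of 3.8 through 3.11 passes the check.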

🔌 API

🚧 The project is still under construction. The redesigned API is coming soon.


🧑‍🍳 prompting cookbooks

  • Prompt LMMs with Multimodal Maestro (Colab)
  • Manually annotate ONE image and let GPT-4V annotate ALL of them (Colab)

🚀 example

Find dog.

>>> The dog is prominently featured in the center of the image with the label [9].
👉 read more
  • load image

    import cv2
    
    image = cv2.imread("...")
  • create and refine marks

    import maestro
    
    generator = maestro.SegmentAnythingMarkGenerator(device='cuda')
    marks = generator.generate(image=image)
    marks = maestro.refine_marks(marks=marks)
  • visualize marks

    mark_visualizer = maestro.MarkVisualizer()
    marked_image = mark_visualizer.visualize(image=image, marks=marks)


  • prompt

    prompt = "Find dog."
    
    response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
    >>> "The dog is prominently featured in the center of the image with the label [9]."
    
  • extract related marks

    masks = maestro.extract_relevant_masks(text=response, detections=marks)
    >>> {'9': array([
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     ...,
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False]])
    ... }
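
To make the last step more concrete, here is a minimal sketch of how bracketed labels such as `[9]` could be pulled out of a response and used to pick the matching masks. This is not maestro's implementation. The helpers `extract_labels` and `select_masks` and the toy mask data are hypothetical, and plain nested lists stand in for NumPy boolean arrays.

```python
import re


def extract_labels(text):
    """Return the numeric labels written as [n] in the text, in order, without duplicates."""
    labels = []
    for label in re.findall(r"\[(\d+)\]", text):
        if label not in labels:
            labels.append(label)
    return labels


def select_masks(text, masks_by_label):
    """Keep only the masks whose label is mentioned in the text."""
    return {
        label: masks_by_label[label]
        for label in extract_labels(text)
        if label in masks_by_label
    }


# Toy stand-ins for real boolean masks (hypothetical data, 2x2 grids).
masks_by_label = {
    "6": [[False, False], [True, True]],
    "9": [[True, False], [False, False]],
}
response = "The dog is prominently featured in the center of the image with the label [9]."
selected = select_masks(response, masks_by_label)
print(sorted(selected))
```

Labels that appear in the text but have no corresponding mask are skipped rather than raising an error, which is one reasonable choice when the model mentions a label that does not exist.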
    


🚧 roadmap

  • Rewrite the maestro API.
  • Update the HF space.
  • Add a documentation page.
  • Add GroundingDINO prompting strategy.
  • CogVLM demo.
  • Qwen-VL demo.

💜 acknowledgement

🦸 contribution

We would love your help in making this repository even better! If you notice any bugs or have suggestions for improvement, feel free to open an issue or submit a pull request.

More Repositories

1

supervision

We write your reusable computer vision tools. 💜
Python
13,787
star
2

notebooks

Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
Jupyter Notebook
4,102
star
3

awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
Python
1,577
star
4

inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Python
1,009
star
5

webcamGPT

webcamGPT - chat with video stream 💬 + 📸
Python
236
star
6

roboflow-100-benchmark

Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
Jupyter Notebook
223
star
7

roboflow-python

The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.
Python
201
star
8

dji-aerial-georeferencing

Detect objects in drone videos and plot them on a map
JavaScript
156
star
9

neuralhash-collisions

A catalog of naturally occurring images whose Apple NeuralHash is identical.
JavaScript
149
star
10

template-python

A template repo holding our common setup for a python project
Python
66
star
11

video-inference

Example showing how to do inference on a video file with Roboflow Infer
Shell
47
star
12

auto-annotate

A simple tool for automatic image annotation using Roboflow API
Python
36
star
13

roboflow-computer-vision-utilities

Interface with the Roboflow API and Python package for running inference (receiving predictions) and customizing result images from your Roboflow Train computer vision models.
Python
31
star
14

cvevals

Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, models hosted on Roboflow)
Python
29
star
15

homepage-demo

Build an in-browser model experience like the one on the Roboflow homepage.
JavaScript
27
star
16

blackjack-basic-strategy

A computer vision powered Blackjack basic strategy app powered by Roboflow.
JavaScript
27
star
17

quickstart-python

Start using computer vision in two minutes with our interactive Python notebook experience.
Jupyter Notebook
23
star
18

deploy-models-with-grpc-pytorch-asyncio

Article about deploying machine learning models using grpc, pytorch and asyncio
Python
23
star
19

roboflow-collect

Passively collect images for computer vision datasets on the edge.
Python
20
star
20

polygonzone

A web utility to draw polygons and retrieve their coordinates for computer vision applications.
JavaScript
19
star
21

gpt-checkup

Monitor the performance of OpenAI's GPT-4V model over time.
HTML
19
star
22

clip_video_app

Flask-based web application designed to compare text and image embeddings using the CLIP model.
Python
18
star
23

supashim

Use Supabase as a drop-in replacement for Firebase
JavaScript
17
star
24

roboflow-api-snippets

repo for versioning snippets that show how to use Roboflow APIs
Python
17
star
25

RoboflowExpoExample

Java
15
star
26

cookbooks

Templates for computer vision projects, referenced in Roboflow blog posts.
Python
14
star
27

rabbit-deterrence

Uses computer vision to deter rabbits from eating your vegetables
Python
14
star
28

rickblocker

Audio visual mitigation of Rickrolls using computer vision.
JavaScript
14
star
29

inference-server-old

Object detection inference with Roboflow Train models on NVIDIA Jetson devices.
JavaScript
13
star
30

roboflow-ios-starter

Official starter project for building iOS apps with Roboflow.
Swift
12
star
31

cog-vlm-client

Simple CogVLM client script
Python
12
star
32

inference-client

Python
11
star
33

magic-scissors

Synthetic data for object detection and segmentation
Python
9
star
34

roboflow-react-app

react starter app for roboflow inference
JavaScript
8
star
35

roboflow-nest

Using Roboflow with the Nest camera API
JavaScript
8
star
36

yolov5-custom-training-tutorial

Jupyter Notebook
8
star
37

OBS-Controller

This is a public repo for the Roboflow OBS Gesture Controller. The gesture controller currently responds to four gestures, "Up", "Down", "Stop", and "Grab". Performing these gestures will allow you to transition scenes and grab source objects inside of OBS.
TypeScript
8
star
38

streamlit-web-app

A web-based application for testing models trained with Roboflow. Powered by Streamlit.
Python
7
star
39

inference-dashboard-example

Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Python
6
star
40

roboflow-cli

Command Line Interface for Roboflow
JavaScript
5
star
41

yolov8-OpenVINO

Deploy a YOLOv8 model (ONNX format) to an Amazon SageMaker endpoint for serving inference requests using ONNXRuntime
Jupyter Notebook
5
star
42

foundation-vision-benchmark

A qualitative set of tests for use in evaluating the capabilities of foundation vision models.
4
star
43

roboflow-100-3d-website

roboflow-100-3d-website
JavaScript
4
star
44

streamlit-bccd

Streamlit App for Blood Cell Count Dataset
Python
4
star
45

roboflow-jetson-license-plate

Mashup Roboflow Object Detection with OCR to read license plates.
Python
4
star
46

stable-diffusion-demo

Generating 1k images using Stable Diffusion and uploading them into your Roboflow project
Jupyter Notebook
4
star
47

scavenger-hunt

Roboflow SXSW Scavenger Hunt game.
JavaScript
4
star
48

supervision-annotators-hf-space

Demo of Annotators through Gradio
Python
4
star
49

trt-demos

This is a repo for Roboflow TFT python examples.
Python
3
star
50

model-library

3
star
51

roboflow-object-counting

Interface with the Roboflow API and Python package for object counting in your computer vision models.
Jupyter Notebook
3
star
52

roboflow-node

Roboflow CLI and API module for node
JavaScript
3
star
53

roboflow-red

A visual way to interact with computer vision using Node-RED
JavaScript
3
star
54

synthetic-fruit-dataset

Code for Roboflow's How to Create a Synthetic Dataset tutorial.
JavaScript
3
star
55

roboflow-swift

Swift
2
star
56

fast-ai-resnet32

Jupyter Notebook
2
star
57

roboflow-swift-examples

Swift
2
star
58

c3-sapphire-rapids

Jupyter Notebook
2
star
59

roboflow-object-tracking

Python
1
star
60

smooth-frame

Python
1
star
61

tao-toolkit-with-roboflow

Jupyter Notebook
1
star
62

ODinW-RF100-challenge-issues

ODinW RF100 📸 challenge issues/discussions repository
1
star
63

clip-benchmark

Python
1
star
64

yolov8-website

Source code for the yolov8.com website.
CSS
1
star
65

external-bugtracker

1
star
66

stacked-boxes-email-notification

A small project demonstrating how Roboflow's Inference APIs can be used to trigger email notifications.
Python
1
star
67

server-benchmark

A script you can use to benchmark the Roboflow Deploy targets with your custom trained model on your hardware.
JavaScript
1
star
68

lenny

Lenny uses 500+ blog posts, 100+ docs pages, and Roboflow developer documentation to answer questions about computer vision and Roboflow.
HTML
1
star