
ROS SAM

This package is what the name suggests: Meta's Segment Anything model wrapped in a ROS node. This wrapper offers:

  • ROS services for segmenting images using point and box queries.
  • An RQT interface for specifying point queries interactively.
  • A Python client which handles the serialization of queries.

Installation

Installation is easy (a sketch of the corresponding commands follows the steps below):

  1. Start by cloning this package into your ROS environment.
  2. Download the checkpoints for the desired SAM models to the models directory in this package.
  3. Install SAM by running pip install git+https://github.com/facebookresearch/segment-anything.git.
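
For reference, the steps above correspond roughly to the commands below. This is only a sketch: the workspace path, the build step, and the checkpoint file name are assumptions, and the repository URL is left as a placeholder.

cd ~/catkin_ws/src                                                   # assumed location of your catkin workspace
git clone <URL of this repository> ros_sam
pip install git+https://github.com/facebookresearch/segment-anything.git
# download the desired SAM checkpoint(s), e.g. sam_vit_h_4b8939.pth, into ros_sam/models/
cd ~/catkin_ws && catkin_make                                        # or catkin build, so the service definitions are generated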

Using ROS SAM standalone

Run the SAM ROS node using rosrun:

rosrun ros_sam sam_node.py

The node has two parameters:

  • ~model: SAM model to use, defaults to vit_h. Check the SAM documentation for options.
  • ~cuda: whether to use CUDA and which device, defaults to cuda. Use cpu if you have no CUDA. If you want to use a specific GPU, set something like cuda:1 (see the example below).
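
For example, a hypothetical invocation that selects the smaller vit_b model and runs on the CPU (the parameter values are illustrative):

rosrun ros_sam sam_node.py _model:=vit_b _cuda:=cpu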

The node currently offers a single service, ros_sam/segment, which can be called to segment an image. Check rossrv show ros_sam/Segmentation for the request and response specification.

You can test SAM by starting the node and then running rosrun ros_sam sam_test.py, which should yield an example segmentation result.

ROS Services

ros_sam offers a single service, segment, of type ros_sam/Segmentation. The service definition is:

sensor_msgs/Image        image            # Image to segment
geometry_msgs/Point[]    query_points     # Points to start segmentation from
int32[]                  query_labels     # Mark points as positive or negative samples
std_msgs/Int32MultiArray boxes            # Boxes can only be positive samples
bool                     multimask        # Generate multiple masks
bool                     logits           # Send back logits

---

sensor_msgs/Image[]   masks            # Masks generated for the query
float32[]             scores           # Scores for the masks
sensor_msgs/Image[]   logits           # Logit activations of the masks

The service request takes the input image, the point prompts, their corresponding labels, and the box prompt. The response contains the segmentation masks, their confidence scores, and the logit activations of the masks.
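
For illustration, a direct call to the service could look like the following sketch. It assumes the node above is running, that the class generated from Segmentation.srv is importable as ros_sam.srv.Segmentation, and that cv_bridge is installed; the request fields are filled positionally in the order of the definition above.

import cv2
import rospy
from cv_bridge import CvBridge
from geometry_msgs.msg import Point
from std_msgs.msg import Int32MultiArray
from ros_sam.srv import Segmentation  # generated from Segmentation.srv

rospy.init_node('sam_query_example')
rospy.wait_for_service('ros_sam/segment')
segment = rospy.ServiceProxy('ros_sam/segment', Segmentation)

# Convert an OpenCV image to a sensor_msgs/Image
bridge = CvBridge()
image_msg = bridge.cv2_to_imgmsg(cv2.imread('path/to/image.png'), encoding='bgr8')

points = [Point(x=100, y=100, z=0), Point(x=200, y=200, z=0)]  # pixel coordinates
labels = [1, 0]                      # 1 = positive sample, 0 = negative sample
boxes = Int32MultiArray()            # empty: no box prompt in this example

# image, query_points, query_labels, boxes, multimask, logits
response = segment(image_msg, points, labels, boxes, True, False)
print('%d masks, scores: %s' % (len(response.masks), list(response.scores)))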

To learn more about the types and use of different queries, please refer to the original SAM tutorial.

The service calls are wrapped up conveniently in the ROS SAM client.

Using with RQT click interface

To use the GUI, install the following package in your ROS workspace:

rqt_image_view_seg

Run the launch file:

roslaunch ros_sam gui_test.launch

Check the terminal and wait until the SAM model has finished loading.

Two windows will load: one titled rqt_image_view_seg__ImageView and the other rqt_image_view__ImageView (note the missing _seg). The first window is where you click, so select the topic of the camera you want to view from its drop-down menu. In the second window, select /rqt_image_segmentation/masked_image; this is where the segmented image will be displayed.

Using ROS SAM client

Alternatively, if you don't feel like assembling the service calls yourself, you can use the ROS SAM client.

Initialize the client with the name of the SAM segmentation service:

from ros_sam import SAMClient
sam_client = SAMClient('ros_sam')

Call the segment method with the input image, the prompt points, and their corresponding labels. This returns three segmentation masks for the object and their corresponding confidence scores:

import cv2
import numpy as np

img = cv2.imread('path/to/image.png')
points = np.array([[100, 100], [200, 200], [300, 300]])
labels = [1, 1, 0]
masks, scores = sam_client.segment(img, points, labels)

There are additional utilities for visualizing segmentation masks and input prompts:

import matplotlib.pyplot as plt
from ros_sam import show_mask, show_points

show_mask(masks[0], plt.gca())
show_points(points, np.asarray(labels), plt.gca())
plt.show()

Citing ROS SAM

If you use ROS SAM in your work, please cite our paper:

@article{buchanan2023online,
  title={Online Estimation of Articulated Objects with Factor Graphs using Vision and Proprioceptive Sensing},
  author={Buchanan, Russell and R{\"o}fer, Adrian and Moura, Jo{\~a}o and Valada, Abhinav and Vijayakumar, Sethu},
  journal={arXiv preprint arXiv:2309.16343},
  year={2023}
}

And please also cite the original Segment Anything paper:

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
