AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection
By Thanh-Toan Do*, Anh Nguyen*, Ian Reid (* equal contribution)
Requirements
- Caffe
  - Install Caffe: see the Caffe installation instructions (http://caffe.berkeleyvision.org/installation.html).
  - Caffe must be built with support for Python layers.
- Hardware
  - To train a full AffordanceNet, you'll need a GPU with ~11GB of memory (e.g. Titan, K20, K40, Tesla, ...).
  - To test a full AffordanceNet, you'll need a GPU with ~6GB of memory.
- [Optional] For robotic demo
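The hardware requirements above can be checked before building anything. The following is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the thresholds of 11000 and 6000 MiB are rough stand-ins for the "~11GB" and "~6GB" figures, and the function names are illustrative and not part of the repository.

```python
import subprocess

# Rough thresholds in MiB. The requirements only say "~11GB" and "~6GB",
# so these exact cut-offs are assumptions made for this sketch.
TRAIN_MIN_MIB = 11000
TEST_MIN_MIB = 6000


def parse_memory_mib(text):
    """Parse nvidia-smi output that has one integer (MiB) per line."""
    return [int(line.strip()) for line in text.splitlines() if line.strip()]


def classify_gpu(total_mib):
    """Map total GPU memory to what it is roughly enough for."""
    if total_mib >= TRAIN_MIN_MIB:
        return "train and test"
    if total_mib >= TEST_MIN_MIB:
        return "test only"
    return "insufficient"


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of each visible GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.total",
        "--format=csv,noheader,nounits",
    ])
    return parse_memory_mib(out.decode("utf-8"))
```

Calling `[classify_gpu(m) for m in query_gpu_memory_mib()]` gives one verdict per GPU. Actual memory use depends on the network and batch settings, so treat the result as a hint only.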
Installation
- Clone the AffordanceNet repository into your $AffordanceNet_ROOT folder.
- Build Caffe and pycaffe:
    cd $AffordanceNet_ROOT/caffe-affordance-net
    # Now follow the Caffe installation instructions: http://caffe.berkeleyvision.org/installation.html
    # If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe
- Build the Cython modules:
    cd $AffordanceNet_ROOT/lib
    make
- Download the pretrained weights (Google Drive, One Drive). These weights were trained on the training set of the IIT-AFF dataset:
  - Extract the file you downloaded to $AffordanceNet_ROOT
  - Make sure the caffemodel file is at this path: $AffordanceNet_ROOT/pretrained/AffordanceNet_200K.caffemodel
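A quick way to confirm that the pretrained weights were extracted to the expected location is sketched below. It only uses the path given in the installation steps; the helper names are hypothetical and not part of the repository.

```python
import os


def pretrained_model_path(root):
    """Return the caffemodel path named in the installation steps."""
    return os.path.join(root, "pretrained", "AffordanceNet_200K.caffemodel")


def check_pretrained(root):
    """Return (path, exists) for the pretrained weights under `root`."""
    path = pretrained_model_path(root)
    return path, os.path.isfile(path)
```

For example, `check_pretrained(os.environ["AffordanceNet_ROOT"])` returns the expected path together with `True` or `False`, assuming the `AffordanceNet_ROOT` environment variable is set in your shell.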
Demo
After successfully completing installation, you'll be ready to run the demo.
- Export pycaffe path:
    export PYTHONPATH=$AffordanceNet_ROOT/caffe-affordance-net/python:$PYTHONPATH
- Demo on static images:
    cd $AffordanceNet_ROOT/tools
    python demo_img.py
  - You should see the detected objects and their affordances.
- (Optional) Demo on depth camera (such as Asus Xtion):
  - With AffordanceNet and the depth camera, you can easily select the object of interest and its affordances for robotic applications such as grasping, pouring, etc.
  - First, launch your depth camera with ROS, OpenNI, etc.
    cd $AffordanceNet_ROOT/tools
    python demo_asus.py
  - You may want to change the object id and/or affordance id (lines 380 and 381 in demo_asus.py). Currently, we select the bottle and its grasp affordance.
  - The 3D grasp pose can be visualized with rviz.
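As an alternative to exporting PYTHONPATH in the shell, the pycaffe directory from the first demo step can be added from inside a Python script. This is a minimal sketch, not something the demo scripts are known to do; `add_pycaffe_to_path` is a hypothetical helper.

```python
import os
import sys


def add_pycaffe_to_path(root):
    """Prepend <root>/caffe-affordance-net/python to sys.path (only once)."""
    pycaffe_dir = os.path.join(root, "caffe-affordance-net", "python")
    if pycaffe_dir not in sys.path:
        sys.path.insert(0, pycaffe_dir)
    return pycaffe_dir
```

After `add_pycaffe_to_path(os.environ["AffordanceNet_ROOT"])`, an `import caffe` in the same process should resolve to the bundled build, provided pycaffe was compiled. Unlike `export`, this only affects the current Python process.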
Training
- We train AffordanceNet on the IIT-AFF dataset
  - The IIT-AFF dataset must be formatted like the Pascal-VOC dataset for training.
  - For your convenience, we did it for you. Just download this file (Google Drive, One Drive) and extract it into your $AffordanceNet_ROOT folder.
  - The extracted folder should contain three sub-folders: $AffordanceNet_ROOT/data/cache, $AffordanceNet_ROOT/data/imagenet_models, and $AffordanceNet_ROOT/data/VOCdevkit2012.
- Train AffordanceNet:
    cd $AffordanceNet_ROOT
    ./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] [--set ...]
  - e.g.: ./experiments/scripts/faster_rcnn_end2end.sh 0 VGG16 pascal_voc
  - We use the pascal_voc alias although we're training on the IIT-AFF dataset.
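Before launching a long training run, it can help to verify that the three data sub-folders listed above exist. The sketch below does that and assembles the example command as an argument list; the helper names and default arguments are illustrative and simply mirror the example invocation.

```python
import os

# Sub-folders of <root>/data that the extracted archive should contain.
EXPECTED_DATA_DIRS = ("cache", "imagenet_models", "VOCdevkit2012")


def missing_data_dirs(root):
    """List the expected sub-folders of <root>/data that do not exist."""
    data = os.path.join(root, "data")
    return [d for d in EXPECTED_DATA_DIRS
            if not os.path.isdir(os.path.join(data, d))]


def training_command(gpu_id=0, net="VGG16", dataset="pascal_voc"):
    """Build the argument list matching the example training command."""
    return ["./experiments/scripts/faster_rcnn_end2end.sh",
            str(gpu_id), net, dataset]
```

If `missing_data_dirs(root)` returns an empty list, the command can be started with `subprocess.call(training_command(), cwd=root)`, which is equivalent to running the script from `$AffordanceNet_ROOT`.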
Notes
- AffordanceNet vs. Mask-RCNN: AffordanceNet can be considered a generalization of Mask-RCNN to the case where each instance contains multiple classes.
- The current network architecture is slightly different from the one in the paper, but it achieves the same accuracy.
- Train AffordanceNet on your data:
  - Format your images as in the Pascal-VOC dataset (as in the $AffordanceNet_ROOT/data/VOCdevkit2012 folder).
  - Prepare the affordance masks (as in the $AffordanceNet_ROOT/data/cache folder): for each object in the image, create a mask and save it as a .sm file. See $AffordanceNet_ROOT/utils for details.
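To make the first note concrete, the toy example below contrasts a single foreground mask per instance with a mask that carries several affordance labels inside one instance. It is purely illustrative: the label ids are hypothetical, and the nested-list layout is neither the .sm file format nor the network's actual output format.

```python
BACKGROUND = 0

# Hypothetical label ids, chosen only for this illustration.
GRASP, CONTAIN = 1, 2

# A toy 4x4 affordance mask for one detected object instance.
instance_mask = [
    [0, 2, 2, 0],
    [0, 2, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]


def binary_mask(mask):
    """Collapse all affordance labels into a single foreground class."""
    return [[int(v != BACKGROUND) for v in row] for row in mask]


def affordance_mask(mask, label):
    """Keep only the pixels that carry the given affordance label."""
    return [[int(v == label) for v in row] for row in mask]


def pixel_counts(mask):
    """Count pixels per label, background included."""
    counts = {}
    for row in mask:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return counts
```

Here `binary_mask(instance_mask)` corresponds to the one-class-per-instance view, while `affordance_mask(instance_mask, GRASP)` isolates one of several affordance regions within the same object.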
Citing AffordanceNet
If you find AffordanceNet useful in your research, please consider citing:
@inproceedings{AffordanceNet18,
title={AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection},
author={Do, Thanh-Toan and Nguyen, Anh and Reid, Ian},
booktitle={International Conference on Robotics and Automation (ICRA)},
year={2018}
}
If you use the IIT-AFF dataset, please consider citing:
@inproceedings{Nguyen17,
title={Object-Based Affordances Detection with Convolutional Neural Networks and Dense Conditional Random Fields},
author={Nguyen, Anh and Kanoulas, Dimitrios and Caldwell, Darwin G and Tsagarakis, Nikos G},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2017}
}
License
MIT License
Acknowledgement
This repo uses a lot of source code from Faster-RCNN.