PoseLSTM and PoseNet implementation in PyTorch
This is the PyTorch implementation for PoseLSTM and PoseNet, developed based on Pix2Pix code.
Prerequisites
- Linux
- Python 3.5.2
- CPU or NVIDIA GPU + CUDA CuDNN
Getting Started
Installation
- Install PyTorch and dependencies from http://pytorch.org
- Clone this repo:
git clone https://github.com/hazirbas/posenet-pytorch
cd posenet-pytorch
pip install -r requirements.txt
PoseNet train/test
- Download a Cambridge Landscape dataset (e.g. KingsCollege) under datasets/ folder.
- Compute the mean image
python util/compute_image_mean.py --dataroot datasets/KingsCollege --height 256 --width 455 --save_resized_imgs
- Train a model:
python train.py --model posenet --dataroot ./datasets/KingsCollege --name posenet/KingsCollege/beta500 --beta 500 --gpu 0
- To view training errors and loss plots, set
--display_id 1
, runpython -m visdom.server
and click the URL http://localhost:8097. Checkpoints are saved under./checkpoints/posenet/KingsCollege/beta500/
. - Test the model:
python test.py --model posenet --dataroot ./datasets/KingsCollege --name posenet/KingsCollege/beta500 --gpu 0
The test errors will be saved to a text file under ./results/posenet/KingsCollege/beta500/
.
PoseLSTM train/test
- Train a model:
python train.py --model poselstm --dataroot ./datasets/KingsCollege --name poselstm/KingsCollege/beta500 --beta 500 --niter 1200 --gpu 0
- Test the model:
python test.py --model poselstm --dataroot ./datasets/KingsCollege --name poselstm/KingsCollege/beta500 --gpu 0
Initialize the network with the pretrained googlenet trained on the Places dataset
If you would like to initialize the network with the pretrained weights, download the places-googlenet.pickle file under the pretrained_models/ folder:
wget https://vision.in.tum.de/webarchive/hazirbas/poselstm-pytorch/places-googlenet.pickle
Optimization scheme and loss weights
- We use the training scheme defined in PoseLSTM
- Note that mean subtraction is not used in PoseLSTM models
- Results can be improved with a hyper-parameter search
Dataset | beta | PoseNet (CAFFE) | PoseNet | PoseLSTM (TF) | PoseLSTM |
---|---|---|---|---|---|
King's College | 500 | 1.92m 5.40° | 1.19m 4.51° | 0.99m 3.65° | 0.90m 3.96° |
Old Hospital | 1500 | 2.31m 5.38° | 1.91m 4.05° | 1.51m 4.29° | 1.79m 4.28° |
Shop Façade | 100 | 1.46m 8.08° | 1.30m 8.13° | 1.18m 7.44° | 0.98m 6.20° |
St Mary's Church | 250 | 2.65m 8.48° | 1.89m 7.27° | 1.52m 6.68° | 1.68m 6.41° |
Citation
@inproceedings{PoseNet15,
title={PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization},
author={Alex Kendall, Matthew Grimes and Roberto Cipolla },
journal={ICCV},
year={2015}
}
@inproceedings{PoseLSTM17,
author = {Florian Walch and Caner Hazirbas and Laura Leal-Taixé and Torsten Sattler and Sebastian Hilsenbeck and Daniel Cremers},
title = {Image-based localization using LSTMs for structured feature correlation},
month = {October},
year = {2017},
booktitle = {ICCV},
eprint = {1611.07890},
url = {https://github.com/NavVisResearch/NavVis-Indoor-Dataset},
}
Acknowledgments
Code is inspired by pytorch-CycleGAN-and-pix2pix.