  • Stars: 381
  • Rank: 112,502 (Top 3%)
  • Language: C++
  • License: MIT License
  • Created: almost 6 years ago
  • Updated: about 1 year ago


Repository Details

A simple monocular visual odometry (part of vSLAM) using ORB keypoints, with initialization, tracking, a local map, and bundle adjustment. (WARNING: this project is tuned for a course demo, not for real-world applications!)

Monocular Visual Odometry

A monocular visual odometry (VO) with 4 components: initialization, tracking, local map, and bundle adjustment.

I did this project after reading the Slambook.
It was also my final project for the course EESC-432 Advanced Computer Vision at NWU in March 2019.

A demo:

In the figure above:
The left side shows the video with the detected keypoints.
The right side shows the camera trajectory corresponding to the video: the white line is the VO estimate and the green line is the ground truth. Red markers on the white line are keyframes. The points are map points; the red ones are newly triangulated.
You can download the video here.

Report

My PDF course report is here. It describes the algorithms more clearly than this README, so I suggest reading it.

Directory

1. Algorithm

This VO is achieved by the following procedures/algorithms:

1.1. Initialization

Estimate relative camera pose:
Given a video, set the 1st frame (image) as the reference and do feature matching with the 2nd frame. Compute the essential matrix (E) and the homography matrix (H) between the two frames. Compute their symmetric transfer errors by the method in the ORB-SLAM paper and choose the better model (i.e., choose H if H/(E+H) > 0.45). Decompose E or H into the relative pose between the two frames, i.e., the rotation (R) and translation (t). With OpenCV, E gives 1 result and H gives 2 results satisfying the criterion that points are in front of the camera. For E there is only a single result to choose; for H, choose the one that makes the image plane and the world-point plane more parallel.
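Below is a minimal OpenCV sketch of this model selection and decomposition; it is not the code of this repository. The inlier counts merely stand in for the ORB-SLAM symmetric-transfer-error scores (computed by checkEssentialScore / checkHomographyScore in motion_estimation.h), and the final choice among the homography candidates is only indicated by a comment.

#include <opencv2/opencv.hpp>
#include <vector>

// pts1, pts2: matched pixel coordinates in frame 1 and frame 2; K: 3x3 intrinsics (CV_64F).
void estimateRelativePose(const std::vector<cv::Point2f> &pts1,
                          const std::vector<cv::Point2f> &pts2,
                          const cv::Mat &K, cv::Mat &R, cv::Mat &t)
{
    // Essential matrix and homography, both with RANSAC outlier rejection.
    cv::Mat inliers_E, inliers_H;
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0, inliers_E);
    cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, inliers_H);

    // Stand-in scores; the project uses the ORB-SLAM symmetric transfer error instead.
    double score_E = cv::countNonZero(inliers_E);
    double score_H = cv::countNonZero(inliers_H);

    if (score_H / (score_E + score_H) > 0.45)
    {
        // H yields several candidate poses; keep those with points in front of the
        // camera, then pick the one whose plane normal is closest to the optical axis.
        std::vector<cv::Mat> Rs, ts, normals;
        cv::decomposeHomographyMat(H, K, Rs, ts, normals);
        R = Rs[0];  // ... replace with the candidate selected by the checks above ...
        t = ts[0];
    }
    else
    {
        // E yields a single pose after the cheirality check inside recoverPose.
        cv::recoverPose(E, pts1, pts2, K, R, t, inliers_E);
    }
}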

Keyframe and local map:
Insert both the 1st and the K-th frame as keyframes. Triangulate their inlier matched keypoints to obtain the points' world positions. These points are called map points and are pushed to the local map.
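A minimal sketch of this triangulation step, assuming the two keyframe poses T1 and T2 are given as 3x4 [R|t] matrices (CV_64F) in the world frame; it is not the repository's own helper.

#include <opencv2/opencv.hpp>
#include <vector>

// pts1, pts2: inlier matched pixel coordinates; K: 3x3 intrinsics (CV_64F).
std::vector<cv::Point3f> triangulateMatches(const std::vector<cv::Point2f> &pts1,
                                            const std::vector<cv::Point2f> &pts2,
                                            const cv::Mat &K,
                                            const cv::Mat &T1, const cv::Mat &T2)
{
    cv::Mat P1 = K * T1, P2 = K * T2;  // projection matrices P = K * [R|t]

    cv::Mat pts4d;  // homogeneous 3D points, one column per match
    cv::triangulatePoints(P1, P2, pts1, pts2, pts4d);
    pts4d.convertTo(pts4d, CV_32F);

    std::vector<cv::Point3f> map_points;
    for (int i = 0; i < pts4d.cols; ++i)
    {
        cv::Mat x = pts4d.col(i);
        x /= x.at<float>(3, 0);  // de-homogenize
        map_points.emplace_back(x.at<float>(0, 0), x.at<float>(1, 0), x.at<float>(2, 0));
    }
    return map_points;  // these become the map points pushed to the local map
}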

Check triangulation result:
If the median triangulation angle is smaller than a threshold, I abandon this 2nd frame and repeat the above process on frame 3, 4, etc. If at frame K the triangulation angle is larger than the threshold, the initialization is completed.
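The check could look like the following sketch (an assumed helper with an illustrative threshold, not the project's code): compute the angle between the two viewing rays of every triangulated point and require the median angle to exceed the threshold.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

bool isTriangulationGoodEnough(const std::vector<cv::Point3f> &map_points,
                               const cv::Point3f &cam_center_1,
                               const cv::Point3f &cam_center_2,
                               double min_median_angle_deg = 1.0)  // illustrative threshold
{
    std::vector<double> angles;
    for (const auto &p : map_points)
    {
        cv::Point3f ray1 = p - cam_center_1, ray2 = p - cam_center_2;
        double n1 = std::sqrt(static_cast<double>(ray1.dot(ray1)));
        double n2 = std::sqrt(static_cast<double>(ray2.dot(ray2)));
        double cos_angle = ray1.dot(ray2) / (n1 * n2);
        angles.push_back(std::acos(cos_angle) * 180.0 / CV_PI);
    }
    if (angles.empty()) return false;
    // Median triangulation angle (upper median for an even count).
    std::nth_element(angles.begin(), angles.begin() + angles.size() / 2, angles.end());
    return angles[angles.size() / 2] >= min_median_angle_deg;
}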

Change scale:
Scale the translation t to the same length as the ground truth, so that I can compare the result with the ground truth. Then scale the map points accordingly.
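A minimal sketch of this rescaling (variable names are illustrative, not the project's):

#include <opencv2/opencv.hpp>
#include <vector>

// Rescale the estimated translation to the ground-truth length and apply the
// same factor to the triangulated map points.
void rescaleToGroundTruth(cv::Mat &t_estimated,            // 3x1, CV_64F
                          const cv::Mat &t_ground_truth,   // 3x1, CV_64F
                          std::vector<cv::Point3f> &map_points)
{
    double scale = cv::norm(t_ground_truth) / cv::norm(t_estimated);
    t_estimated = t_estimated * scale;
    for (auto &p : map_points)
        p *= static_cast<float>(scale);
}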

1.2. Tracking

Keep estimating the next camera pose. First, find the map points that are in the current camera view. Then do feature matching to find 2D-3D correspondences between the 3D map points and the 2D image keypoints. Finally, estimate the camera pose by RANSAC and PnP.
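A minimal sketch of this step using OpenCV's solvePnPRansac; the argument values are illustrative defaults, not necessarily the ones used in this repository.

#include <opencv2/opencv.hpp>
#include <vector>

bool trackFrame(const std::vector<cv::Point3f> &matched_map_points,   // 3D map points
                const std::vector<cv::Point2f> &matched_keypoints,    // matched 2D keypoints
                const cv::Mat &K,
                cv::Mat &R, cv::Mat &t)
{
    cv::Mat rvec, tvec, inliers;
    bool ok = cv::solvePnPRansac(matched_map_points, matched_keypoints, K,
                                 cv::Mat(),   // no distortion coefficients
                                 rvec, tvec,
                                 false,       // no initial guess
                                 100,         // RANSAC iterations
                                 8.0,         // reprojection error threshold (pixels)
                                 0.99,        // confidence
                                 inliers);
    if (!ok) return false;
    cv::Rodrigues(rvec, R);  // rotation vector -> rotation matrix
    t = tvec;
    return true;
}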

1.3. Local Map

Insert keyframe: If the relative pose between the current frame and the previous keyframe is large enough, i.e., the translation or rotation is larger than its threshold, insert the current frame as a keyframe.
Do feature matching between the current and the previous keyframe. Get inliers by the epipolar constraint. If an inlier cv::KeyPoint hasn't been triangulated before, triangulate it and push it to the local map.
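The keyframe decision could be as simple as the following sketch; the thresholds are illustrative, not the values from config.yaml.

#include <opencv2/opencv.hpp>

// Insert a keyframe when the relative motion exceeds either threshold.
bool shouldInsertKeyframe(const cv::Mat &R_rel,   // 3x3 relative rotation
                          const cv::Mat &t_rel,   // 3x1 relative translation
                          double min_translation = 0.05,
                          double min_rotation_rad = 5.0 * CV_PI / 180.0)
{
    cv::Mat rvec;
    cv::Rodrigues(R_rel, rvec);  // the rotation angle is the norm of the rotation vector
    return cv::norm(t_rel) > min_translation || cv::norm(rvec) > min_rotation_rad;
}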

Clean up local map: Remove map points that (1) are not in the current view, (2) have a viewing angle larger than the threshold, or (3) are rarely matched as inlier points. (See Slambook Chapter 9.4.)
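A sketch of these culling rules, with hypothetical MapPoint fields and illustrative thresholds (not the repository's MapPoint class):

#include <algorithm>
#include <vector>

struct MapPoint
{
    bool in_current_view;   // (1) still inside the current camera view?
    double view_angle;      // (2) angle between current and reference viewing rays
    int times_matched;      // how often it was matched
    int times_inlier;       // how often the match was an inlier
};

void cleanUpLocalMap(std::vector<MapPoint> &map_points,
                     double max_view_angle = 0.5,     // rad, illustrative
                     double min_inlier_ratio = 0.25)  // illustrative
{
    auto bad = [&](const MapPoint &p) {
        double inlier_ratio =
            p.times_matched > 0 ? double(p.times_inlier) / p.times_matched : 0.0;
        return !p.in_current_view || p.view_angle > max_view_angle ||
               inlier_ratio < min_inlier_ratio;
    };
    map_points.erase(std::remove_if(map_points.begin(), map_points.end(), bad),
                     map_points.end());
}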

Graph/Connections between map points and frames:
Graphs are built at two stages of the algorithm:

  1. After PnP, based on the 3D-2D correspondences, I update the connections between map points and the current keypoints.
  2. During triangulation, I also update the 2D-3D correspondences between the current keypoints and the triangulated map points, either through a direct link or by going through previous keypoints that have already been triangulated.
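The bookkeeping behind these connections can be pictured with the following hypothetical records (not the repository's Frame/MapPoint classes): each map point stores which frame observed it and at which keypoint index, and each frame maps keypoint indices back to map-point ids. After PnP and after triangulation, both sides of this mapping are updated for every inlier correspondence.

#include <unordered_map>

struct MapPointRecord
{
    int id;
    // frame id -> index of the keypoint that observed this map point
    std::unordered_map<int, int> observations;
};

struct FrameRecord
{
    int id;
    // keypoint index -> map point id (filled after PnP and after triangulation)
    std::unordered_map<int, int> kpt_to_mappoint;
};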

1.4. Bundle Adjustment

Since I've built the graph in the previous step, I know the 3D-2D point correspondences in all frames.

Apply optimization to the previous N frames, where the cost function is the sum of the reprojection errors of all 3D-2D point pairs. By computing the derivatives with respect to (1) the points' 3D positions and (2) the camera poses, we can solve the optimization problem using the Gauss-Newton method and its variants. This is done with g2o and its built-in datatypes VertexSBAPointXYZ, VertexSE3Expmap, and EdgeProjectXYZ2UV. See Slambook Chapter 4 and Chapter 7.8.2 for more details.
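Below is a minimal g2o sketch in the spirit of Slambook Chapter 7.8.2, not the code in optimization/g2o_ba.h; the exact solver construction differs slightly between g2o versions, and all variable and type names outside the g2o API are illustrative.

#include <g2o/core/block_solver.h>
#include <g2o/core/optimization_algorithm_levenberg.h>
#include <g2o/core/sparse_optimizer.h>
#include <g2o/solvers/csparse/linear_solver_csparse.h>
#include <g2o/types/sba/types_six_dof_expmap.h>
#include <Eigen/Core>
#include <memory>
#include <vector>

struct Observation { int pose_id, point_id; Eigen::Vector2d uv; };

void bundleAdjust(std::vector<g2o::SE3Quat> &poses,
                  std::vector<Eigen::Vector3d> &points,
                  const std::vector<Observation> &obs,
                  double fx, double cx, double cy)
{
    using Block = g2o::BlockSolver_6_3;
    auto linear_solver = std::make_unique<g2o::LinearSolverCSparse<Block::PoseMatrixType>>();
    auto *algorithm = new g2o::OptimizationAlgorithmLevenberg(
        std::make_unique<Block>(std::move(linear_solver)));

    g2o::SparseOptimizer optimizer;
    optimizer.setAlgorithm(algorithm);

    // Camera intrinsics shared by all reprojection edges.
    auto *camera = new g2o::CameraParameters(fx, Eigen::Vector2d(cx, cy), 0);
    camera->setId(0);
    optimizer.addParameter(camera);

    // One VertexSE3Expmap per camera pose (vertex 1 of each edge).
    for (size_t i = 0; i < poses.size(); ++i) {
        auto *v = new g2o::VertexSE3Expmap();
        v->setId(int(i));
        v->setEstimate(poses[i]);
        v->setFixed(i == 0);  // fix the first pose to remove gauge freedom
        optimizer.addVertex(v);
    }
    // One VertexSBAPointXYZ per map point (vertex 0 of each edge).
    int point_id_offset = int(poses.size());
    for (size_t i = 0; i < points.size(); ++i) {
        auto *v = new g2o::VertexSBAPointXYZ();
        v->setId(point_id_offset + int(i));
        v->setEstimate(points[i]);
        v->setMarginalized(true);  // exploit the sparse structure via the Schur complement
        optimizer.addVertex(v);
    }
    // One reprojection-error edge per 3D-2D correspondence.
    for (const auto &o : obs) {
        auto *e = new g2o::EdgeProjectXYZ2UV();
        e->setVertex(0, optimizer.vertex(point_id_offset + o.point_id));
        e->setVertex(1, optimizer.vertex(o.pose_id));
        e->setMeasurement(o.uv);
        e->setInformation(Eigen::Matrix2d::Identity());
        e->setParameterId(0, 0);
        optimizer.addEdge(e);
    }

    optimizer.initializeOptimization();
    optimizer.optimize(20);

    // Read back the optimized estimates.
    for (size_t i = 0; i < poses.size(); ++i)
        poses[i] = static_cast<g2o::VertexSE3Expmap *>(optimizer.vertex(int(i)))->estimate();
    for (size_t i = 0; i < points.size(); ++i)
        points[i] = static_cast<g2o::VertexSBAPointXYZ *>(
                        optimizer.vertex(point_id_offset + int(i)))->estimate();
}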

1.5. Other details

Image features:
Extract ORB keypoints and descriptors. Then apply a simple grid sampling to obtain keypoints uniformly distributed across the image.
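A minimal sketch of such grid sampling (cell size and per-cell count are illustrative): extract many ORB keypoints first, keep only the strongest few per grid cell, then compute descriptors for the survivors.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <map>
#include <vector>

void extractUniformORB(const cv::Mat &image,
                       std::vector<cv::KeyPoint> &keypoints, cv::Mat &descriptors,
                       int cell_size = 40, int max_per_cell = 2)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);  // extract many candidates first
    std::vector<cv::KeyPoint> raw;
    orb->detect(image, raw);

    // Bucket the keypoints into grid cells.
    std::map<std::pair<int, int>, std::vector<cv::KeyPoint>> grid;
    for (const auto &kp : raw)
        grid[{int(kp.pt.x) / cell_size, int(kp.pt.y) / cell_size}].push_back(kp);

    // Keep the strongest `max_per_cell` keypoints of each cell.
    keypoints.clear();
    for (auto &cell : grid)
    {
        auto &kps = cell.second;
        std::sort(kps.begin(), kps.end(),
                  [](const cv::KeyPoint &a, const cv::KeyPoint &b) { return a.response > b.response; });
        for (int i = 0; i < std::min(max_per_cell, (int)kps.size()); ++i)
            keypoints.push_back(kps[i]);
    }
    orb->compute(image, keypoints, descriptors);  // describe the surviving keypoints
}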

Feature matching:
Two methods are implemented; a match is considered good if:
(1) the feature distance is smaller than a threshold, as described in the Slambook; or
(2) the ratio of the smallest to the second-smallest distance is smaller than a threshold, as proposed in Prof. Lowe's 2004 SIFT paper.
The first method is adopted, because its parameters are easier to tune so that fewer false matches are generated.
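Both criteria are sketched below for ORB (Hamming) descriptors; the thresholds are illustrative and not taken from this repository's configuration.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// (1) Slambook-style: keep matches whose distance is below max(2 * min_dist, 30).
std::vector<cv::DMatch> matchByDistanceThreshold(const cv::Mat &desc1, const cv::Mat &desc2)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<cv::DMatch> matches, good;
    matcher.match(desc1, desc2, matches);

    double min_dist = 1e9;
    for (const auto &m : matches) min_dist = std::min(min_dist, double(m.distance));
    for (const auto &m : matches)
        if (m.distance <= std::max(2.0 * min_dist, 30.0)) good.push_back(m);
    return good;
}

// (2) Lowe's ratio test: the best distance must be clearly smaller than the 2nd best.
std::vector<cv::DMatch> matchByRatioTest(const cv::Mat &desc1, const cv::Mat &desc2,
                                         float ratio = 0.8f)
{
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(desc1, desc2, knn, 2);

    std::vector<cv::DMatch> good;
    for (const auto &m : knn)
        if (m.size() == 2 && m[0].distance < ratio * m[1].distance) good.push_back(m[0]);
    return good;
}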

2. File Structure

2.1. Folders

  • include/: C++ header files.
  • src/: C++ source files (definitions).
  • test/: test scripts for the C++ functions.
  • data/: stored images.

The main scripts and classes for the VO are in include/my_slam/vo/. I adopted this structure from Slambook Chapter 9.

2.2. Functions

Functions are declared in include/. Some of its folders contain a README. See the tree structure below for an overview:

include/
└── my_slam
    ├── basics
    │   ├── basics.h
    │   ├── config.h
    │   ├── yaml.h
    │   ├── eigen_funcs.h
    │   ├── opencv_funcs.h
    │   └── README.md
    ├── common_include.h
    ├── display
    │   ├── pcl_display.h
    │   └── pcl_display_lib.h
    ├── geometry
    │   ├── camera.h
    │   ├── epipolar_geometry.h
    │   ├── feature_match.h
    │   └── motion_estimation.h
    ├── optimization
    │   └── g2o_ba.h
    └── vo
        ├── frame.h
        ├── map.h
        ├── mappoint.h
        ├── README.md
        ├── vo_commons.h
        ├── vo_io.h
        └── vo.h

3. Dependencies

Required: OpenCV, Eigen, Sophus, g2o.
See the details below:

(1) OpenCV 4.0
A tutorial for installing OpenCV 4.0: link.

You may need a version newer than 3.4.5, because I use the function
filterHomographyDecompByVisibleRefpoints, which appears in OpenCV 3.4.5.

(2) Eigen 3
It is a library for matrix arithmetic. See its official page. Install it with:

$ sudo apt-get install libeigen3-dev

(Note: Eigen has only header files; there are no ".so" or ".a" files.)

(3) Sophus

It is based on Eigen and contains datatypes for Lie groups and Lie algebras (SE3/SO3/se3/so3).

Download the library here: https://github.com/strasdat/Sophus. Run cmake and make. Since I failed to make install it, I manually moved "/Sophus/sophus" to "/usr/include/sophus" and moved "libSophus.so" to "/usr/lib". Then, in my CMakeLists.txt, I add: set(THIRD_PARTY_LIBS libSophus.so).

If there is an error at "unit_complex_.real() = 1.;", replace it and its following line with "unit_complex_ = std::complex<double>(1., 0.);".

(4) g2o

First install either of the following two packages:

$ sudo apt-get install libsuitesparse
$ sudo apt-get install libsuitesparse-dev

Download here: https://github.com/RainerKuemmerle/g2o.
Check out the latest 2017 version. Run cmake, make, and make install.

4. How to Run

Compile:

mkdir -p build lib bin   
cd build && cmake .. && make -j4 && cd ..  

Download some data:

mkdir -p data
cd data
git clone https://github.com/felixchenfy/Monocular-Visual-Odometry-Data
mv Monocular-Visual-Odometry-Data/* ./
rm -rf Monocular-Visual-Odometry-Data
ls
# dataset_images_matlab  README.md  result  test_data

Then, take a look at the configurations in config/config.yaml. The file paths have already been configured, so you don't need to change anything at this moment.

Finally, run:

bin/run_vo config/config.yaml

5. Results

I tested the current implementation on the TUM fr1_desk and fr1_xyz datasets, but the performance on both is poor. I guess this is due to too few detected keypoints, which leads to too few keypoint matches. A likely solution is to adopt ORB-SLAM's method of extracting a sufficient number of uniformly distributed keypoints across different scales, and to do guided matching based on the estimated camera motion.

Despite the poor performance on the fr1 datasets, my program works well on the New Tsukuba Stereo Database, whose images and scenes are synthetic and have abundant high-quality keypoints. The results are shown below.

I tested my VO with 3 different settings: (1) No optimization. (2) Optimize on map points and current camera pose. (3) Optimize on previous 5 camera poses. See videos below:

(1) No optimization:

(2) Optimize on points + current pose:

(3) Optimize on previous 5 poses:

The results show that: (1) optimization improves accuracy, and (2) the estimated trajectory is close to the ground truth.

6. Reference

(1) Slambook:
I read Dr. Xiang Gao's Slambook before writing the code. The book provides both vSLAM theory and easy-to-read code examples in every chapter.

The framework of my program is based on Chapter 9 of the Slambook, which is an RGB-D visual odometry project. The classes declared in include/vo/ are based on that chapter.

Several files are mainly copied from or built on top of the Slambook's code. I also borrowed other small pieces of code from the Slambook, but since they are only a few lines here and there, I don't list them individually.

In short, the Slambook was a huge help for me and for this project.

(2) Matlab VO tutorial:
This is a Matlab tutorial on monocular visual odometry. Since the Slambook doesn't say much about monocular VO, I turned to this Matlab tutorial for a solution. It helped me a lot in clarifying the whole workflow.

The dataset I used is also the same as in this Matlab tutorial: the New Tsukuba Stereo Database.

(3) ORB-SLAM/ORB-SLAM2 papers

I borrowed their criteria for choosing between the essential matrix and the homography (for decomposition to obtain the relative camera pose). The copied functions are checkEssentialScore and checkHomographyScore in motion_estimation.h.

7. To Do

Improvements

  • In bundle adjustment, I cannot optimize (1) multiple frames and (2) map points at the same time; it returns a huge error. I haven't figured out why.

  • If a region of the image has only a few keypoints, extract more there.

  • Utilize the epipolar constraint to do feature matching.
