Rust CV
Rust CV is a project to implement computer vision algorithms, abstractions, and systems in Rust. #[no_std] is supported where possible.
Documentation
Each crate has its own documentation, but the easiest way to browse all of it at once is to look at the docs for the cv batteries-included crate.
Check out our tutorial book! The book source for the tutorials can be found in the tutorial directory of the repository. The example code used in the tutorial can be found in the tutorial-code directory. The resources for tutorials can be found in the site res directory.
About
This repository contains all computer vision crates for Rust CV in a mono-repo, including utilities as well as libraries. When updating libraries, all the crates in this repository must build for a PR to be accepted. Rust CV also maintains some other computer vision crates, which live in the GitHub organization rather than in this repository.
Each crate has its own associated license. The crates in Rust CV use several different open source licenses, mostly MIT. See the crate directories (or their crates.io entries) for their individual licenses.
Each library was originally its own separate repository before being incorporated into the mono-repo. The old repositories that are now in this repo are all archived, but they still exist so that tagged versions, associated commits, and issues can be found. All new PRs should be made to this repository.
What is computer vision
Many people are familiar with convolutional neural networks and machine learning (ML) in computer vision, but computer vision is much more than that. Computer vision broadly encompasses image processing, photogrammetry, and pattern recognition. Machine learning can be used in all of these domains (e.g. denoisers, depth map prediction, and face detection), but it is not required. Almost none of the algorithms in this repository are based on machine learning, but that does not mean you cannot use machine learning with these tools. Please take a look at https://www.arewelearningyet.com/ for Rust ML tools. We may expand into ML more in the future for tasks at which ML outperforms statistical algorithms.
Build
Be sure to have Rust installed. The following packages are needed on Ubuntu 20.04 (built using Rust 1.53.0):
- CMake
sudo apt install cmake
- build-essential
sudo apt-get install build-essential
- freetype2
sudo apt-get install libfreetype-dev
- libxkbcommon
sudo apt install libxkbcommon-dev
If you have not already done so, install Rust:
curl https://sh.rustup.rs -sSf | sh
Clone and Build
cd <directory to keep cloned repo>
git clone https://github.com/rust-cv/cv.git
cd cv
cargo build
Goals
One of the first things that Rust CV focused on was algorithms in the domain of photogrammetry. Today, Rust has enough photogrammetry algorithms to perform SfM and visual SLAM. The image processing and pattern recognition domains are still weak.
Here are some of the domains of computer vision that Rust CV intends to pursue, along with examples from each domain (not all algorithms below live within the Rust CV organization, some of these may already exist without our knowledge, and some things may have changed since this was last updated):
- Image processing (Wikipedia)
- Diffusion & blur
- Gaussian blur (Wikipedia)
- Fast Explicit Diffusion (FED) (implementation exists within the akaze crate)
- Contrast enhancement
- Normalization (Wikipedia)
- Histogram equalization (Wikipedia)
- Edge detection (Wikipedia) & gradient extraction (Wikipedia)
- Perceptual hash (Wikipedia)
- Photogrammetry
- Feature extraction (Wikipedia)
- Camera models and calibration (converting both from bearings to image coordinates and from image coordinates to bearings)
- Pinhole Camera (Wikipedia)
- Skew, focals, and principal point
- Kn radial distortion (Wikipedia)
- K1 radial distortion
- K1-K6 radial distortion
- Fisheye Camera (Wikipedia)
- Skew, focals, and principal point
- K1-K4 fisheye distortion (same as OpenCV)
- Equirectangular (Wikipedia)
- Matching (Wikipedia)
- Descriptor matching strategies
- Brute force (for camera tracking with binary features)
- HGG (for loop closure)
- HNSW (for loop closure)
- Filtering strategies
- Symmetric matching/uniquely best match (exists within cv-sfm, but not reusable)
- Lowe's ratio test matching
- Geometric verification (uses the abstractions in sample-consensus)
- Consensus algorithms
- Estimation algorithms
- P3P (Wikipedia)
- Motion estimation (Wikipedia)
- Eight Point (Wikipedia)
- Nister-Stewenius (basically done, but not packaged up)
- Models
- Essential matrix (Wikipedia)
- With residual for feature matches
- Pose of world relative to camera (Wikipedia)
- With residual for feature to world matches
- Relative pose of camera (Wikipedia)
- With residual for feature matches
- Homography matrix (Wikipedia)
- With residual for feature matches
- Trifocal Tensor (Wikipedia)
- With residual for three-feature matches (not currently in cv-core, as there is no trifocal tensor yet)
- PnP (estimation, outlier filtering, and optimization) (incomplete)
- Image registration (Wikipedia)
- Real-time depth-map estimation (for direct visual odometry algorithms that require it) (Wikipedia)
- Visual concept detection (used for loop closure)
- Bag LSH/Simhash for binary features
- Bag of Visual Words (BoW, see Wikipedia article)
- Second order occurrence pooling (as per the paper "Higher-order Occurrence Pooling for Bags-of-Words: Visual Concept Detection")
- Fisher vector encoding
- Learned place recognition (also see pattern recognition domain below)
- Reconstruction (Wikipedia)
- Visibility graph (Wikipedia)
- Graph optimization
- Loop closure (Wikipedia)
- Exporting (point cloud Wikipedia)
- To NVM file
- To PLY file (Wikipedia)
- Post-reconstruction depth-map estimation (for reconstruction post-processing) (Wikipedia)
- Densification
- Using more extracted features
- Using curvature maxima (as per "VITAMIN-E: VIsual Tracking And MappINg with Extremely Dense Feature Points")
- Using patch-match (Wikipedia) and depth-map
- Using depth-map only with edge detection
- Meshing (Wikipedia)
- Delaunay triangulation
- Filtered with NLTGV minimization (as per "VITAMIN-E: VIsual Tracking And MappINg with Extremely Dense Feature Points")
- Poisson surface reconstruction (Wikipedia)
- Surface refinement
- Texturing (related Wikipedia)
- Pattern recognition
- k-NN search
- Brute force
- HGG (for loop closure)
- HNSW
- FLANN
- Face recognition (Wikipedia)
- Articulated body pose estimation (Wikipedia)
- Object recognition (Wikipedia)
- Place recognition (can assist in MVG)
- Segmentation (Wikipedia)
- Semantic segmentation mapping (see this blog post)
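To make the "Camera models and calibration" goal above more concrete, here is a minimal sketch of a pinhole camera with skew, focals, principal point, and a single K1 radial distortion term, going both from a 3D point to image coordinates and from image coordinates back to a bearing. The `PinholeK1` type, its field names, and the distortion convention (normalized coordinates scaled by `1 + k1 * r^2`) are assumptions made for illustration; this is not the API of any Rust CV crate.

```rust
/// Hypothetical pinhole intrinsics with one radial distortion coefficient.
/// Illustrative only; not a type from any Rust CV crate.
#[derive(Debug, Clone, Copy)]
struct PinholeK1 {
    fx: f64,   // focal length in x (pixels)
    fy: f64,   // focal length in y (pixels)
    cx: f64,   // principal point x (pixels)
    cy: f64,   // principal point y (pixels)
    skew: f64, // skew between the image axes
    k1: f64,   // radial distortion coefficient (assumed convention)
}

impl PinholeK1 {
    /// Project a point in the camera frame to pixel coordinates.
    /// Returns `None` for points at or behind the camera plane.
    fn project(&self, p: [f64; 3]) -> Option<[f64; 2]> {
        if p[2] <= 0.0 {
            return None;
        }
        // Normalized image coordinates.
        let (x, y) = (p[0] / p[2], p[1] / p[2]);
        // Radial distortion factor: 1 + k1 * r^2.
        let d = 1.0 + self.k1 * (x * x + y * y);
        let (xd, yd) = (x * d, y * d);
        Some([
            self.fx * xd + self.skew * yd + self.cx,
            self.fy * yd + self.cy,
        ])
    }

    /// Convert pixel coordinates to a unit bearing in the camera frame.
    /// The distortion is inverted with a few fixed-point iterations,
    /// which is adequate only for mild distortion.
    fn bearing(&self, px: [f64; 2]) -> [f64; 3] {
        // Undo the intrinsics to get distorted normalized coordinates.
        let yd = (px[1] - self.cy) / self.fy;
        let xd = (px[0] - self.cx - self.skew * yd) / self.fx;
        // Start from the distorted point and refine the undistorted estimate.
        let (mut x, mut y) = (xd, yd);
        for _ in 0..10 {
            let d = 1.0 + self.k1 * (x * x + y * y);
            x = xd / d;
            y = yd / d;
        }
        let n = (x * x + y * y + 1.0).sqrt();
        [x / n, y / n, 1.0 / n]
    }
}
```

Calling `project` on a point and then `bearing` on the result should recover the direction of the original point, which is a convenient sanity check for any camera model of this kind.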
To support computer vision tooling, the following will be implemented:
- Point clouds (Wikipedia)
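As an illustration of two items from the matching goals above (brute-force descriptor matching for binary features, filtered with Lowe's ratio test), here is a minimal sketch using only the standard library. The function names, the 256-bit descriptor size, and the choice to return no matches when fewer than two candidates exist are assumptions made for this example; it does not reflect the API of any Rust CV crate.

```rust
/// Hamming distance between two 256-bit binary descriptors
/// (the descriptor size is an assumption for this sketch).
fn hamming(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Brute-force matching with Lowe's ratio test.
///
/// For each query descriptor, find the nearest and second-nearest
/// candidates by Hamming distance, and keep the match only if
/// `best < ratio * second`. Returns `(query_index, candidate_index)` pairs.
fn match_ratio(
    queries: &[[u8; 32]],
    candidates: &[[u8; 32]],
    ratio: f32,
) -> Vec<(usize, usize)> {
    let mut matches = Vec::new();
    // The ratio test needs a second-nearest neighbor to compare against.
    if candidates.len() < 2 {
        return matches;
    }
    for (qi, q) in queries.iter().enumerate() {
        let mut best: Option<(usize, u32)> = None;
        let mut second = u32::MAX;
        for (ci, c) in candidates.iter().enumerate() {
            let d = hamming(q, c);
            match best {
                // Not better than the current best: may still be second-best.
                Some((_, bd)) if d >= bd => second = second.min(d),
                // New best: the previous best becomes the second-best.
                _ => {
                    if let Some((_, bd)) = best {
                        second = bd;
                    }
                    best = Some((ci, d));
                }
            }
        }
        if let Some((ci, bd)) = best {
            if (bd as f32) < ratio * (second as f32) {
                matches.push((qi, ci));
            }
        }
    }
    matches
}
```

A ratio around 0.7 to 0.8 is a common starting point; lower values reject more ambiguous matches at the cost of fewer matches overall. This scan is linear in the number of candidates per query, which is why approximate structures such as HNSW are listed for loop closure.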
Credits
Thanks go to TheiaSfM and all of its authors, as their abstractions are a direct inspiration for this crate. In some cases, the names of abstractions may be borrowed directly where they are consistent. See the TheiaSfM documentation for details.
"Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age" is an excellent paper that compiles information about modern SLAM algorithms and papers.