
🟢 Gaussian Splatting Notes (WIP)

The text version of my explanatory stream (in Chinese, with English CC) on gaussian splatting: https://youtube.com/live/1buFrKUaqwM

📖 Table of contents

📑 Introduction

This guide aims at deciphering the formulae in the rasterization process (forward and backward). It focuses only on these two parts, and I want to provide as many details as possible, since this is where the core of the algorithm lies. I will paste related code from the original repo to help you identify where to look.

If you see a section starting with 💡, it's something I think is important to understand.

Before continuing, please read the original paper to get the big picture of how the gaussian splatting algorithm works. Also note that the full algorithm has other important parts, such as point densification and pruning, which won't be covered in this article since I think those parts are relatively easy to understand.

➡️ Forward pass

The forward pass consists of two parts:

  1. Compute the attributes of each gaussian
  2. Compute the color of each pixel

1. Compute the attributes of each gaussian

Each gaussian holds the following raw attributes:

# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0)            # world coordinate
self._features_dc = torch.empty(0)    # diffuse color
self._features_rest = torch.empty(0)  # spherical harmonic coefficients
self._scaling = torch.empty(0)        # 3d scale
self._rotation = torch.empty(0)       # rotation expressed in quaternions
self._opacity = torch.empty(0)        # opacity

# they are initialized as empty tensors then assigned with values on
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215
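
As a side note, these raw tensors are not used as-is: when queried, the model passes them through activation functions (roughly, an exponential for the scaling, a sigmoid for the opacity, and a normalization for the rotation quaternions) so that the optimized values always stay in a valid range. A minimal sketch of the idea, not the repo's exact code:

# minimal sketch (hypothetical values): raw parameters live in an unconstrained
# space and are mapped to valid values through activations
import torch
import torch.nn.functional as F

raw_scaling = torch.randn(5, 3)    # stored in log space -> exp keeps scales positive
raw_rotation = torch.randn(5, 4)   # unnormalized quaternions
raw_opacity = torch.randn(5, 1)    # logits -> sigmoid keeps opacity in (0, 1)

scaling = torch.exp(raw_scaling)
rotation = F.normalize(raw_rotation)   # unit quaternions (r, x, y, z)
opacity = torch.sigmoid(raw_opacity)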

To project the gaussian onto a 2D image, we must go through some more computations to transform the attributes to 2D:

1-1. Compute derived attributes (radius, uv, cov2D)

First, from scaling and rotation, we can compute 3D covariance from the formula

$\Sigma = RSS^TR^T \quad \text{Eq. 6}$ where

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
  1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
  2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
  2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);

and

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;

Note that S is multiplied with a scale factor mod that is kept as 1.0 during training.

At inference time, this value (scaling_modifier) can be modified in

# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):

to control the scale of the gaussians. In their demo they showed how it looks by setting this number to something <1 (shrinking the size). Theoretically this value can also be set >1 to increase the size.


💡 quote from the paper 💡

An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.

The design of optimizing the 3D covariance by decomposing it into R and S separately is not a random choice. It is a trick we call "reparametrization": by expressing the covariance as $RSS^TR^T$, it is guaranteed to always be positive semi-definite (a matrix of the form $A^TA$ is always positive semi-definite; here $A = S^TR^T$).
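
To make this concrete, here is a small numpy sketch (my own illustration, not repo code) that builds $\Sigma$ from a quaternion and a scale vector, using the same matrix entries as the forward.cu snippet above, and checks that the result is symmetric and positive semi-definite:

# sketch: build the 3D covariance from a quaternion and per-axis scales (Eq. 6)
import numpy as np

def quat_to_rotmat(r, x, y, z):
    # standard quaternion-to-rotation formula (assumes a normalized quaternion)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - r*z),     2*(x*z + r*y)],
        [2*(x*y + r*z),     1 - 2*(x*x + z*z), 2*(y*z - r*x)],
        [2*(x*z - r*y),     2*(y*z + r*x),     1 - 2*(x*x + y*y)],
    ])

q = np.array([0.9, 0.1, 0.3, -0.2])
q /= np.linalg.norm(q)                 # quaternions must be unit-length
R = quat_to_rotmat(*q)
S = np.diag([0.5, 1.0, 2.0])           # mod * scale on the diagonal

cov3d = R @ S @ S.T @ R.T              # Eq. 6
print(np.allclose(cov3d, cov3d.T))     # True: symmetric
print(np.linalg.eigvalsh(cov3d))       # all eigenvalues >= 0: positive semi-definite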


Next, we need to get 3 things: radius, uv and cov (2D covariance, or equivalently its inverse conic) which are the 2D attributes of a gaussian projected on an image.

We can get cov by $\Sigma' = JW\Sigma W^TJ^T \quad \text{Eq. 5}$

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
		cov3D[0], cov3D[1], cov3D[2],
		cov3D[1], cov3D[3], cov3D[4],
		cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
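
If you prefer to read Eq. 5 in numpy, here is a rough sketch with made-up numbers (my own illustration; I assume the usual EWA-style Jacobian of the perspective projection here, with the last row dropped since only the 2D image-plane covariance matters):

# sketch of Eq. 5 with hypothetical inputs
import numpy as np

fx, fy = 800.0, 800.0                 # focal lengths in pixels
t = np.array([0.3, -0.2, 2.5])        # gaussian center in camera space
W = np.eye(3)                         # rotation part of the world-to-camera transform
Sigma = np.diag([0.04, 0.01, 0.09])   # some 3D covariance (cov3D)

# Jacobian of the perspective projection, linearized at t
J = np.array([
    [fx / t[2], 0.0,       -fx * t[0] / t[2]**2],
    [0.0,       fy / t[2], -fy * t[1] / t[2]**2],
])

cov2d = J @ W @ Sigma @ W.T @ J.T     # Eq. 5: a symmetric 2x2 matrix
print(cov2d)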

Let's write $cov = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (remember the 2D and 3D covariance matrices are symmetric) for the calculations that we're going to do in the following.

Its inverse conic (honestly I don't know why they've chosen such a bad variable name, calling it cov_inv would've been 100x better) can be expressed as $conic = \frac{1}{ac-b^2}\begin{pmatrix} c & -b \\ -b & a \end{pmatrix}$ (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the diagonal entries, put negative signs on the off-diagonal entries and finally put a 1/det in front of everything).

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv };  // since the covariance matrix is symmetric, we only need to save the upper triangle
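
You can convince yourself of this inverse formula with a quick numpy check (my own example, not repo code):

# check the closed-form 2x2 inverse against numpy
import numpy as np

a, b, c = 2.0, 0.5, 1.5                       # cov = [[a, b], [b, c]]
det = a * c - b * b
conic = np.array([[c, -b], [-b, a]]) / det    # swap diagonal, negate off-diagonal, divide by det
print(np.allclose(conic, np.linalg.inv(np.array([[a, b], [b, c]]))))  # True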

💡 A small trick to ensure the numerical stability of the inverse of cov 💡

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;

By construction, cov is only positive semi-definite (recall that it's in the form $A^TA$), which is not sufficient for this matrix to be invertible (and we need it to be, because we need to calculate Eq. 4).

Here we add 0.3 to the diagonal to make it invertible. Why does this work? Let's write $cov = A^TA$; adding some positive value to the diagonal means adding $\lambda I$ to the matrix ($\lambda$ is the value we add, and $I$ is the identity matrix), so $cov = A^TA + \lambda I$. Now for any non-zero vector $x$, if we compute $x^T \cdot cov \cdot x$, it is equal to $x^TA^TAx + \lambda x^Tx = ||Ax||^2 + \lambda ||x||^2$, which is strictly positive. Why are we computing this quantity? This is actually the definition of a matrix being positive definite (note that we have gotten rid of the semi-), which means not only that it's invertible, but also that all of its eigenvalues are strictly positive.
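
A quick numerical illustration of this (my own example): take a rank-deficient, hence non-invertible, PSD matrix and see that adding 0.3 to the diagonal makes it comfortably invertible:

# the lambda * I trick in numpy
import numpy as np

A = np.array([[1.0, 2.0]])
cov = A.T @ A                       # positive semi-definite but rank 1
print(np.linalg.det(cov))           # ~0: singular, not invertible
cov_reg = cov + 0.3 * np.eye(2)     # add lambda = 0.3 to the diagonal
print(np.linalg.det(cov_reg))       # 1.59 > 0: invertible
print(np.linalg.eigvalsh(cov_reg))  # all eigenvalues strictly positive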


Having cov in hand, we can now proceed to compute the radius of a gaussian.

Theoretically, when projecting an ellipsoid onto an image, you get an ellipse, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the radius attribute represents.

How to get the radius from cov? Let's make analogy from the 1-dimensional case.

Imagine we have a 1D gaussian like the following:

(figure: a 1D gaussian bell curve)

How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the graph. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \cdot \sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).

Fortunately, the analogy applies to any dimension, just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).

We said $r = 3 \cdot \sqrt{var}$. How, then, do we get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two principal axes are the two eigenvalues of the covariance matrix. Therefore, the problem now comes down to calculating these two eigenvalues.

I could've given you the answer directly, but out of personal preference (I ❤️ linear-algebra), I want to detail it more. First of all, for a square matrix $A$ we say it has eigenvalue $\lambda$ with the associated eigenvector $x$ if $\lambda$ and $x$ satisfy $Ax = \lambda x, x \neq 0$. There are as many eigenvalues (and associated eigenvectors) as the dimension of $A$ if we operate in the domain of complex numbers.

In general, to calculate all eigenvalues of $A$, we solve the equation $det(A-λ\cdot I) = 0$ (the variable being $λ$). If we replace with the cov matrix we have above, this equation can be expressed as $(a-λ)(c-λ)-b^2 = 0$ which is a quadratic equation that all of us are familiar with.
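
Expanding, this is $\lambda^2 - (a+c)\lambda + (ac-b^2) = 0$, and the quadratic formula gives $\lambda_{1,2} = \frac{a+c}{2} \pm \sqrt{\left(\frac{a+c}{2}\right)^2 - (ac-b^2)}$, i.e. $mid \pm \sqrt{mid^2 - det}$ with $mid = \frac{a+c}{2}$ and $det = ac-b^2$.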

The solutions (the eigenvalues) are lambda1 and lambda2 in the following code:

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);  // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det));  // I'm not too sure what 0.1 serves here
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));

Then we finally get radius as 3 times the square root of the bigger eigenvalue:

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2)));  // ceil() to make it at least 1 because we operate in pixel space

The last thing, which is probably the most obvious, is the uv (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) };  // I like to call it uv
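
For reference, a rough numpy equivalent of this projection (my own sketch; I assume here that ndc2Pix maps NDC coordinates in [-1, 1] to pixel coordinates along the lines of ((v + 1) * size - 1) / 2):

# sketch of the center projection with hypothetical values
import numpy as np

def ndc2pix(v, size):
    # assumed behavior of ndc2Pix: map [-1, 1] to pixel coordinates
    return ((v + 1.0) * size - 1.0) * 0.5

W, H = 1920, 1080
projmatrix = np.eye(4)                    # placeholder full projection matrix (world -> clip space)
p_orig = np.array([0.2, -0.1, 3.0, 1.0])  # gaussian center in homogeneous world coordinates

p_hom = projmatrix @ p_orig
p_proj = p_hom[:3] / (p_hom[3] + 1e-7)    # perspective division -> NDC
uv = np.array([ndc2pix(p_proj[0], W), ndc2pix(p_proj[1], H)])
print(uv)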

Phew, we finally got the three quantities we need to know: radius, uv and conic. Let's move on to the next part.

1-2. Compute which tiles each gaussian covers

Before computing the color of each pixel, the authors introduce a special but very effective trick that significantly accelerates rendering. Specifically, we divide the whole image into tiles, 16x16-pixel blocks like the following (the tiles might exceed the image borders if the height/width is not a multiple of 16):

(figure: an image divided into a grid of 16x16-pixel tiles, each labeled with its tile number and tile coordinates)

We also order the tiles in row-major order (the top-left tile is tile 0, the one to its right is tile 1, etc.). The number below each tile number in the figure is its tile coordinates.

Then, we compute which tiles each gaussian covers using the uv and radius computed above, as sketched below.
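
The idea is to take the axis-aligned bounding square of the circle (uv ± radius), divide by the tile size, and clamp to the tile grid; every tile in the resulting rectangle is considered covered. A hedged Python sketch of this logic (the rasterizer does the equivalent in CUDA; the exact code may differ):

# sketch: which 16x16 tiles does a splat with center uv and this radius touch?
import numpy as np

BLOCK = 16  # tile size in pixels

def tiles_covered(uv, radius, grid_w, grid_h):
    # returns the covered tile rectangle as (min corner inclusive, max corner exclusive)
    x_min = int(np.clip((uv[0] - radius) // BLOCK, 0, grid_w))
    y_min = int(np.clip((uv[1] - radius) // BLOCK, 0, grid_h))
    x_max = int(np.clip((uv[0] + radius + BLOCK - 1) // BLOCK, 0, grid_w))
    y_max = int(np.clip((uv[1] + radius + BLOCK - 1) // BLOCK, 0, grid_h))
    return (x_min, y_min), (x_max, y_max)

# a gaussian centered at pixel (100, 40) with radius 20 on a 640x480 image
grid_w, grid_h = (640 + BLOCK - 1) // BLOCK, (480 + BLOCK - 1) // BLOCK
print(tiles_covered(np.array([100.0, 40.0]), 20.0, grid_w, grid_h))
# ((5, 1), (8, 4)): tiles with x index 5..7 and y index 1..3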

2. Compute the color of each pixel
