🟢 Gaussian Splatting Notes (WIP)

The text version of my explanatory stream (Chinese with English CC) on gaussian splatting https://youtube.com/live/1buFrKUaqwM

📖 Table of contents

Introduction
Forward pass
- placeholder
Backward pass
- placeholder

📑 Introduction

This guide aims to decipher the formulae in the rasterization process (forward and backward). It focuses only on these two parts, and I want to provide as many details as possible since this is where the core of the algorithm lies. I will paste related code from the original repo to help you identify where to look.

If you see sections starting with 💡, it's something I think important to understand.

Before continuing, please read the original paper to get the big picture of how the gaussian splatting algorithm works. Also note that the full algorithm has other important parts, such as point densification and pruning, which won't be covered in this article since I think those parts are relatively easy to understand.

➡️ Forward pass

The forward pass consists of two parts:

Compute the attributes of each gaussian
Compute the color of each pixel

1. Compute the attributes of each gaussian

Each gaussian holds the following raw attributes:

# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0)            # world coordinate
self._features_dc = torch.empty(0)    # diffuse color
self._features_rest = torch.empty(0)  # spherical harmonic coefficients
self._scaling = torch.empty(0)        # 3d scale
self._rotation = torch.empty(0)       # rotation expressed in quaternions
self._opacity = torch.empty(0)        # opacity

# they are initialized as empty tensors then assigned with values on
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215

To project the gaussian onto a 2D image, we must go through some more computations to transform the attributes to 2D:

1-1. Compute derived attributes (radius, uv, cov2D)

First, from scaling and rotation, we can compute 3D covariance from the formula

$\Sigma = RSS^TR^T \quad \text{Eq. 6}$ where

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
  1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
  2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
  2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);

and

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;

Note that S is multiplied with a scale factor mod that is kept as 1.0 during training.

At inference time, this value (scaling_modifier) can be modified in

# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):

to control the scale of the gaussians. In their demo they showed how it looks by setting this number to something <1 (shrinking the size). Theoretically this value can also be set >1 to increase the size.

💡 quote from the paper 💡

An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.

The design of optimizing the 3D covariance by decomposing it into R and S separately is not a random choice. It is a trick called "reparametrization". Writing it as $RSS^TR^T = (RS)(RS)^T$ guarantees that it is always positive semi-definite (any matrix of the form $AA^T$ or $A^TA$ is positive semi-definite).
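
To make the reparametrization concrete, here is a minimal pure-Python sketch of Eq. 6. The helper names (`quat_to_rotmat`, `covariance_3d`, `quad_form`) are mine, not from the repo, and the matrices are plain row-major lists, so this only illustrates the math and not the CUDA memory layout. The `mod` argument plays the role of scaling_modifier.

```python
import math

def quat_to_rotmat(q):
    """Quaternion (r, x, y, z) -> 3x3 rotation matrix as row-major lists."""
    r, x, y, z = q
    n = math.sqrt(r * r + x * x + y * y + z * z)  # normalize so R is a rotation
    r, x, y, z = r / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - r * z), 2 * (x * z + r * y)],
        [2 * (x * y + r * z), 1 - 2 * (x * x + z * z), 2 * (y * z - r * x)],
        [2 * (x * z - r * y), 2 * (y * z + r * x), 1 - 2 * (x * x + y * y)],
    ]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def covariance_3d(scale, quat, mod=1.0):
    """Sigma = R S S^T R^T, computed as (RS)(RS)^T."""
    R = quat_to_rotmat(quat)
    S = [[mod * scale[0], 0.0, 0.0],
         [0.0, mod * scale[1], 0.0],
         [0.0, 0.0, mod * scale[2]]]
    M = matmul(R, S)
    return matmul(M, transpose(M))

def quad_form(cov, v):
    """v^T cov v, which is never negative for a positive semi-definite cov."""
    return sum(v[i] * cov[i][j] * v[j] for i in range(3) for j in range(3))

cov3d = covariance_3d(scale=(0.5, 1.0, 2.0), quat=(0.9, 0.1, -0.3, 0.2))
print(cov3d)
print(quad_form(cov3d, (1.0, -2.0, 0.5)))  # non-negative whatever vector you try
```

Whatever values gradient descent pushes into the scale and the quaternion, the result stays symmetric and positive semi-definite, which is the whole point of the trick.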

Next, we need to get 3 things: radius, uv and cov (2D covariance, or equivalently its inverse conic) which are the 2D attributes of a gaussian projected on an image.

We can get cov by $\Sigma' = JW\Sigma W^TJ^T \quad \text{Eq. 5}$

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
		cov3D[0], cov3D[1], cov3D[2],
		cov3D[1], cov3D[3], cov3D[4],
		cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
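
For intuition, here is a small sketch of Eq. 5 in plain Python. It follows the equation literally rather than the operand order of the quoted CUDA code. The function name `project_covariance` is hypothetical, and I am assuming that W is the 3x3 rotation part of the world-to-camera transform, that `t` is the gaussian center in camera space, and that J is the usual Jacobian of a pinhole projection with focal lengths `fx` and `fy`.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def project_covariance(cov3d, W, t, fx, fy):
    """Sigma' = J W Sigma W^T J^T, returned as the (a, b, c) entries of the 2x2 result."""
    tx, ty, tz = t
    # Assumed Jacobian of (x, y, z) -> (fx * x / z, fy * y / z), evaluated at t
    J = [
        [fx / tz, 0.0, -fx * tx / (tz * tz)],
        [0.0, fy / tz, -fy * ty / (tz * tz)],
    ]
    T = matmul(J, W)                                  # 2x3
    cov2d = matmul(matmul(T, cov3d), transpose(T))    # 2x2
    return cov2d[0][0], cov2d[0][1], cov2d[1][1]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
isotropic = [[0.04, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.04]]
a, b, c = project_covariance(isotropic, identity, t=(0.0, 0.0, 2.0), fx=100.0, fy=100.0)
print(a, b, c)  # an isotropic gaussian on the optical axis projects to a circle
```

In this toy setup the 3D standard deviation is 0.2 at depth 2 with focal length 100, so the 2D standard deviation is 0.2 * 100 / 2 = 10 pixels and the variance is about 100 on both axes.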

Let's put $cov = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (remember the 2D and 3D covariance matrices are symmetric) for the calculations that follow.

Its inverse conic (honestly I don't know why they've chosen such a bad variable name, calling it cov_inv would've been 100x better) can be expressed as $conic = \frac{1}{ac-b^2}\begin{pmatrix} c & -b \\ -b & a \end{pmatrix}$ (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the two diagonal entries, put negative signs on the off-diagonal entries and finally put a 1/det in front of everything).

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv };  // since the covariance matrix is symmetric, we only need to save the upper triangle
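
If you want to convince yourself that this really is the inverse, here is a tiny Python check. The name `conic_from_cov` is mine, and the three arguments correspond to cov.x, cov.y and cov.z in the CUDA code.

```python
def conic_from_cov(a, b, c):
    """Inverse of the symmetric matrix [[a, b], [b, c]], stored as its upper triangle."""
    det = a * c - b * b
    if det == 0.0:
        raise ValueError("covariance is singular, cannot invert")
    det_inv = 1.0 / det
    return (c * det_inv, -b * det_inv, a * det_inv)

a, b, c = 4.0, 1.0, 2.0
ca, cb, cc = conic_from_cov(a, b, c)

# Multiply cov by conic; the result should be the 2x2 identity matrix.
product = [
    [a * ca + b * cb, a * cb + b * cc],
    [b * ca + c * cb, b * cb + c * cc],
]
print(product)
```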

💡 A small trick to ensure the numerical stability of the inverse of cov 💡

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;

By construction, cov is only positive semi-definite (recall that it's in the form $A^TA$), which is not sufficient for this matrix to be invertible (and we need it to be invertible because we need to calculate Eq. 4).

Here we add 0.3 to the diagonal to make it invertible. Why does this work? Let's put $cov = A^TA$; adding some positive value to the diagonal means adding $\lambda I$ to the matrix ($\lambda$ is the value we add, and $I$ is the identity matrix), so $cov = A^TA + \lambda I$. Now for any nonzero vector $x$, if we compute $x^T \cdot cov \cdot x$, it is equal to $x^TA^TAx + \lambda x^Tx = ||Ax||^2 + \lambda ||x||^2$, which is strictly positive. Why are we computing this quantity? This is actually the definition of a matrix being positive definite (note that we have gotten rid of the semi-), which means that not only is it invertible, but all of its eigenvalues are also strictly positive.
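
A quick numerical illustration of the argument, as a sketch with made-up numbers: take a rank-1 matrix of the form $A^TA$ (so it is singular) and see what adding 0.3 to the diagonal does. The helper names are hypothetical.

```python
def det2(a, b, c):
    """Determinant of the symmetric matrix [[a, b], [b, c]]."""
    return a * c - b * b

def quad_form2(a, b, c, x):
    """x^T [[a, b], [b, c]] x"""
    return a * x[0] * x[0] + 2.0 * b * x[0] * x[1] + c * x[1] * x[1]

# A = [[1, 2]] gives A^T A = [[1, 2], [2, 4]], which has rank 1
a, b, c = 1.0, 2.0, 4.0
lam = 0.3
null_vec = (2.0, -1.0)  # A @ null_vec = 0, so the quadratic form vanishes here

det_before = det2(a, b, c)
det_after = det2(a + lam, b, c + lam)
q_before = quad_form2(a, b, c, null_vec)
q_after = quad_form2(a + lam, b, c + lam, null_vec)

print(det_before, det_after)  # zero before, strictly positive after
print(q_before, q_after)      # zero before, lam * ||x||^2 after
```

The vector (2, -1) is the worst case for the original matrix, and after the shift its quadratic form is exactly $\lambda ||x||^2 = 0.3 \cdot 5 = 1.5$, matching the formula above.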

Having cov in hand, we can now proceed to compute the radius of a gaussian.

Theoretically, when projecting an ellipsoid onto an image, you get an ellipse, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the radius attribute represents.

How do we get the radius from cov? Let's draw an analogy with the 1-dimensional case.

Imagine we have a 1D gaussian like the following:

How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that if we crop the graph from $-r$ to $r$, it still covers most of the graph. Following this intuition and our high-school math knowledge, it is not difficult to come up with the value $r = 3 \cdot \sqrt{var}$ where $var$ is the variance of this gaussian (btw, this covers 99.73% of the gaussian).
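
The 99.73% figure can be checked with the error function from the Python standard library: the mass of a 1D gaussian within $k$ standard deviations of its mean is $\text{erf}(k/\sqrt{2})$. The function name `coverage` is just for this sketch.

```python
import math

def coverage(k):
    """Fraction of a 1D gaussian lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```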

Fortunately, the analogy applies to any dimension, just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).

We said $r = 3 \cdot \sqrt{var}$. How, then, do we get the $var$ of a 2D gaussian given its covariance matrix? The variances along the two axes of the ellipse are the two eigenvalues of the covariance matrix. Therefore, the problem now comes down to calculating these two eigenvalues.

I could've given you the answer directly, but out of personal preference (I ❤️ linear-algebra), I want to detail it more. First of all, for a square matrix $A$ we say it has eigenvalue $\lambda$ with the associated eigenvector $x$ if $\lambda$ and $x$ satisfy $Ax = \lambda x, x \neq 0$. There are as many eigenvalues as the dimension of $A$ (counted with multiplicity) if we operate in the domain of complex numbers; for a real symmetric matrix like cov, they are all real.

In general, to calculate all eigenvalues of $A$, we solve the equation $\det(A-\lambda I) = 0$ (the variable being $\lambda$). If we substitute the cov matrix we have above, this equation becomes $(a-\lambda)(c-\lambda)-b^2 = 0$, which is a quadratic equation that all of us are familiar with.

The solutions (eigenvalues) are lambda1 and lambda2 in the following code

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);  // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det));  // mid*mid - det = ((a-c)/2)^2 + b^2 >= 0 in exact math; the 0.1 floor presumably guards sqrt against rounding error
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));

Then we finally get radius as 3 times the square root of the bigger eigenvalue:

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2)));  // ceil() to make it at least 1 because we operate in pixel space
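
Here is the same computation as a Python sketch that mirrors the quoted lines (including the 0.1 floor), so you can plug in numbers. The name `radius_from_cov` is hypothetical. The example matrix is chosen so that the eigenvalues come out as round numbers.

```python
import math

def radius_from_cov(a, b, c, floor=0.1):
    """Eigenvalues of [[a, b], [b, c]] and the pixel radius 3 * sqrt(largest eigenvalue)."""
    det = a * c - b * b
    mid = 0.5 * (a + c)  # half of the trace
    root = math.sqrt(max(floor, mid * mid - det))
    lambda1 = mid + root
    lambda2 = mid - root
    radius = math.ceil(3.0 * math.sqrt(max(lambda1, lambda2)))
    return lambda1, lambda2, radius

lambda1, lambda2, radius = radius_from_cov(5.0, 2.0, 2.0)
print(lambda1, lambda2, radius)  # 6.0 1.0 8
```

As a sanity check, the two eigenvalues sum to the trace (5 + 2 = 7) and multiply to the determinant (5 * 2 - 2 * 2 = 6), as the roots of the quadratic equation should.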

The last thing, which is probably the most obvious, is the uv (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) };  // I like to call it uv
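
The projection can be sketched in Python as follows. Two assumptions here: I write the 4x4 matrix as row-major nested lists acting on a column vector (the memory layout in the repo may differ), and I assume that ndc2Pix maps an NDC value $v \in [-1, 1]$ to $((v + 1) \cdot S - 1) / 2$, which puts pixel centers at integer coordinates. The toy projection matrix is made up for the example and is not a real camera.

```python
def transform_point_4x4(p, m):
    """Apply a row-major 4x4 matrix m to the point p = (x, y, z, 1)."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(4))

def ndc_to_pix(v, size):
    """Assumed NDC -> pixel mapping: -1 is the left edge of pixel 0, +1 the right edge of the last pixel."""
    return ((v + 1.0) * size - 1.0) * 0.5

def project_to_pixel(p, projmatrix, width, height):
    hx, hy, hz, hw = transform_point_4x4(p, projmatrix)
    inv_w = 1.0 / (hw + 1e-7)  # small epsilon avoids division by zero
    return ndc_to_pix(hx * inv_w, width), ndc_to_pix(hy * inv_w, height)

# Toy projection: x and y pass through, w = z (perspective divide by depth)
toy_proj = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
uv = project_to_pixel((0.0, 0.0, 2.0), toy_proj, width=640, height=480)
print(uv)  # a point on the optical axis lands at the image center
```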

Phew, we finally got the three quantities we need to know: radius, uv and conic. Let's move on to the next part.

1-2. Compute which tiles each gaussian covers

Before computing the color of an image, the authors introduce a special but very effective trick that significantly accelerates rendering. Specifically, we divide the whole image into tiles, which are 16x16 pixel blocks like the following (the tiles might exceed image borders if height/width is not a multiple of 16):

We also order the tiles in row-major order (left-top is tile 0, the one on its right is 1, etc). The number below the tile number is its tile coordinates.
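
The tile layout is easy to reproduce in a few lines of Python. This is only a sketch of the indexing scheme described above, with hypothetical helper names.

```python
TILE = 16  # tiles are 16x16 pixel blocks

def tile_grid(width, height, tile=TILE):
    """Number of tiles along x and y, rounding up so the whole image is covered."""
    return ((width + tile - 1) // tile, (height + tile - 1) // tile)

def tile_id(tx, ty, grid_w):
    """Row-major tile number from tile coordinates (tx, ty)."""
    return ty * grid_w + tx

def tile_of_pixel(px, py, tile=TILE):
    """Tile coordinates of the tile containing pixel (px, py)."""
    return (px // tile, py // tile)

grid = tile_grid(100, 50)
print(grid)                      # (7, 4): 7 * 16 = 112 > 100, so the last column overshoots
print(tile_id(1, 0, grid[0]))    # 1, the tile right of the top-left one
print(tile_id(0, 1, grid[0]))    # 7, the first tile of the second row
print(tile_of_pixel(99, 49))     # (6, 3), the bottom-right tile
```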

Then, we compute which tiles each gaussian covers by using the uv and radius computed above. See the following figure:
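
In code, the coverage test can be sketched like this. I am assuming the simplest variant: take the square bounding box of the circle (center uv, radius radius), convert its corners to tile coordinates, and clamp to the grid. The function name `covered_tiles` is mine, and the real implementation may differ in details.

```python
TILE = 16

def covered_tiles(uv, radius, grid, tile=TILE):
    """Tile coordinates (tx, ty) touched by the bounding square of the circle."""
    grid_w, grid_h = grid
    u, v = uv
    # Lower bounds are inclusive, upper bounds exclusive; both clamped to the grid.
    x_min = min(grid_w, max(0, int((u - radius) / tile)))
    y_min = min(grid_h, max(0, int((v - radius) / tile)))
    x_max = min(grid_w, max(0, int((u + radius + tile - 1) / tile)))
    y_max = min(grid_h, max(0, int((v + radius + tile - 1) / tile)))
    return [(tx, ty) for ty in range(y_min, y_max) for tx in range(x_min, x_max)]

grid = (7, 4)  # e.g. a 100x50 image split into 16x16 tiles
tiles = covered_tiles(uv=(40.0, 20.0), radius=10, grid=grid)
print(tiles)                                      # [(1, 0), (2, 0), (3, 0), (1, 1), (2, 1), (3, 1)]
print([ty * grid[0] + tx for tx, ty in tiles])    # row-major tile numbers: [1, 2, 3, 8, 9, 10]
print(covered_tiles((-100.0, -100.0), 5, grid))   # [] for a gaussian far outside the image
```

The circle spans x from 30 to 50 and y from 10 to 30, so it touches tile columns 1 to 3 and tile rows 0 to 1, six tiles in total.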

kwea123/gaussian_splatting_notes
