Toy Path Tracer
Toy path tracer I did in 2018 for my own learning purposes. Somewhat based on Peter Shirley's Ray Tracing in One Weekend minibook (highly recommended!), and on Kevin Beason's smallpt.
I decided to write blog posts about things I discover as I do this:
- Part 0: Intro
- Part 1: Initial C++ and walkthrough
- Part 2: Fix stupid performance issue
- Part 3: C#, Unity and Burst
- Part 4: Correctness fixes and Mitsuba
- Part 5: simple GPU version via Metal
- Part 6: simple GPU version via D3D11
- Part 7: initial C++ SIMD & SoA
- Part 8: SSE SIMD for HitSpheres
- Part 9: ryg optimizes my code
- Part 10: Update all implementations to match
- Part 11: Buffer-oriented approach on CPU
- Part 12: Buffer-oriented approach on GPU D3D11
- Part 13: GPU thread group data optimization
- Part 14: Make it run on iOS
- Part 15: A bunch of path tracing links
- Part 16: Unity C# Burst optimization
- Part 17: WebAssembly
Note: it can only do spheres, there is no bounding volume hierarchy of any sort, and a lot of stuff is hardcoded.
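To make the "spheres only, no acceleration structure" point concrete, here is a minimal sketch of what a brute-force closest-hit query over a sphere list looks like. The `Vec3`, `Ray`, and `Sphere` types and the `HitSpheresBruteForce` name are made up for this illustration and are not the repository's actual code; the ray direction is assumed to be normalized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration only.
struct Vec3 { float x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };            // dir is assumed to be unit length
struct Sphere { Vec3 center; float radius; };

// Returns the index of the closest sphere hit with t in (tMin, tMax), or -1 on a miss.
// Every sphere is tested for every ray: O(N) work per ray, no BVH or grid.
int HitSpheresBruteForce(const Ray& r, const std::vector<Sphere>& spheres,
                         float tMin, float tMax, float& outT)
{
    assert(tMin < tMax);
    int hitIndex = -1;
    float closest = tMax;
    for (std::size_t i = 0; i < spheres.size(); ++i)
    {
        // Solve |origin + t*dir - center|^2 = radius^2 for t (dir is unit length).
        Vec3 oc = r.origin - spheres[i].center;
        float b = dot(oc, r.dir);
        float c = dot(oc, oc) - spheres[i].radius * spheres[i].radius;
        float discr = b * b - c;
        if (discr <= 0.0f)
            continue;                        // ray misses this sphere
        float s = std::sqrt(discr);
        float t = -b - s;                    // near intersection
        if (t <= tMin || t >= closest)
            t = -b + s;                      // fall back to the far intersection
        if (t > tMin && t < closest)
        {
            closest = t;
            hitIndex = static_cast<int>(i);
        }
    }
    outT = closest;
    return hitIndex;
}
```

With ~50 spheres a linear scan like this is cheap enough; a bounding volume hierarchy would only start to pay off with much larger scenes.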
Performance numbers in Mray/s on a scene with ~50 spheres and two light sources, running on the CPU:
Language | Approach | Ryzen 5950 | AMD TR1950 | MBP 2021 | MBP 2018 | MBA 2020 | iPhone 11 | iPhone X | iPhone SE |
---|---|---|---|---|---|---|---|---|---|
C++ | SIMD Intrinsics | 281.0 | 187.0 | 105.4 | 74.0 | 32.3 | 26.4 | 12.9 | 8.5 |
C++ | Scalar | 141.2 | 100.0 | 84.8 | 35.7 | 15.9 | | | |
C++ | WebAssembly (no threads, no SIMD) | 8.4 | 5.0 | 8.1 | 5.6 | | | | |
C# | Unity Burst "manual" SIMD | 227.2 | 133.0 | 103.7 | 60.0 | 29.7 | | | |
C# | Unity Burst | 82.0 | 36.0 | | | | | | |
C# | Unity (Editor) | 6.5 | 3.4 | | | | | | |
C# | Unity (player Mono) | 6.7 | 3.5 | | | | | | |
C# | Unity (player IL2CPP) | 39.1 | 63.8 | 17.2 | | | | | |
C# | .NET 6.0 | 91.5 | 53.0 | 40.9 | | | | | |
C# | .NET Core 2.0 | 86.1 | 53.0 | 23.6 | | | | | |
C# | Mono --llvm | 35.1 | 22.0 | | | | | | |
C# | Mono | 23.6 | 3.6 | 6.1 | | | | | |
More detailed specs of the machines above are:
- Ryzen 5950: AMD Ryzen 5950X (3.4GHz, 16c/32t), Visual Studio 2022.
- AMD TR1950: AMD ThreadRipper 1950X (3.4GHz, SMT disabled - 16c/16t), Visual Studio 2017.
- MBP 2021: Apple MacBook Pro M1 Max (8+2 cores), Xcode 13.2.
- MBP 2018: Apple MacBook Pro mid-2018 (Core i9 2.9GHz, 6c/12t).
- MBA 2020: Apple MacBook Air 2020 (Core i7 1.2GHz, 4c/8t).
- iPhone 11: A13 chip.
- iPhone X: A11 chip.
- iPhone SE: A9 chip.
Software versions:
- Unity 2021.3.16. Burst 1.6.6 (safety checks off). C# testing in editor, Release mode.
- Mono 6.12.
And on the GPU, via a compute shader in D3D11 or Metal depending on the platform:
GPU | Perf |
---|---|
D3D11 | |
GeForce RTX 3080Ti | 3920 |
GeForce GTX 1080Ti | 1854 |
Metal | |
MBP 2021 (M1 Max) | 1065 |
MBP 2018 (Radeon Pro 560X) | 246 |
MBA 2020 (Iris Plus) | 201 |
iPhone 11 Pro (A13) | 80 |
iPhone X (A11) | 46 |
iPhone SE (A9) | 20 |
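For readers unfamiliar with the unit: Mray/s is millions of rays traced per second of wall-clock time. The sketch below shows one way such a figure could be computed. The function names and the idea of a render callback that returns its ray count are assumptions for illustration, not a description of how the numbers above were actually measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Converts a ray count and elapsed wall-clock time into millions of rays per second.
// Returns 0 when no time has elapsed, to avoid dividing by zero.
double MRaysPerSecond(std::uint64_t rayCount, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return static_cast<double>(rayCount) / (seconds * 1.0e6);
}

// Times a hypothetical render callback that returns how many rays it traced
// for one frame, and reports the average throughput over frameCount frames.
template <typename RenderFn>
double MeasureMRaysPerSecond(RenderFn&& renderFrame, int frameCount)
{
    assert(frameCount > 0);
    using Clock = std::chrono::steady_clock;
    std::uint64_t totalRays = 0;
    const auto start = Clock::now();
    for (int i = 0; i < frameCount; ++i)
        totalRays += renderFrame();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return MRaysPerSecond(totalRays, elapsed.count());
}
```

As a sanity check on the arithmetic, 281 million rays traced in one second comes out as 281 Mray/s.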
A lot of stuff in the implementation is totally suboptimal or using the tech in a "wrong" way. I know it's just a simple toy, ok :)
Building
- C++ projects:
  - Windows (Visual Studio 2017) in `Cpp/Windows/ToyPathTracer.sln`. DX11 Win32 app that displays result as a fullscreen CPU-updated or GPU-rendered texture. Pressing G toggles between GPU and CPU tracing, A toggles animation, P toggles progressive accumulation.
  - Mac/iOS (Xcode 10) in `Cpp/Apple/ToyPathTracer.xcodeproj`. Metal app that displays result as a fullscreen CPU-updated or GPU-rendered texture. Pressing G toggles between GPU and CPU tracing, A toggles animation, P toggles progressive accumulation. Should work on both Mac (`Test Mac` target) and iOS (`Test iOS` target).
  - WebAssembly in `Cpp/Emscripten/build.sh`. CPU, single threaded, no SIMD.
- C# project in `Cs/TestCs.sln`. A command line app that renders some frames and dumps out final TGA screenshot at the end.
- Unity project in `Unity`. I used Unity 2021.3.16.