kernel_compiler_patch
This patch adds extra optimization and tuning choices for kernel builds by exposing more micro-architecture options under:
Processor type and features --->
Processor family --->
Why a specific patch?
The kernel build uses its own set of compiler flags, KCFLAGS, rather than the usual CFLAGS. For example, see:
Alternative way to define a -march= option without this patch
As pointed out by codemac in this topic, one can simply export values for KCFLAGS and KCPPFLAGS before calling make to achieve the same result; see here.
export KCFLAGS=' -march=znver3 -mtune=znver3'
export KCPPFLAGS=' -march=znver3 -mtune=znver3'
make all
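The same export can be wrapped in a small helper so that the -march and -mtune values stay in sync. This is only a sketch: `march_flags` is a hypothetical name, not something the kernel or this patch provides, and `znver3` is just the value from the example above.

```shell
# Hypothetical helper: print "-march=X -mtune=Y" (Y defaults to X).
march_flags() {
    printf '%s\n' "-march=$1 -mtune=${2:-$1}"
}

# Build the flag string once and reuse it for both variables.
KCFLAGS="$(march_flags znver3)"
KCPPFLAGS="$KCFLAGS"
export KCFLAGS KCPPFLAGS

echo "KCFLAGS=$KCFLAGS"
echo "KCPPFLAGS=$KCPPFLAGS"

# Then, from the kernel source tree:
# make all
```

Pick the -march= value for your own CPU from the table in the next section.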
Expanded CPUs include
CPU Family | -march= | Min GCC Ver | Min Clang Ver |
---|---|---|---|
Native optimizations autodetected by GCC | native | 4.2 | 3.8 |
Generic 64-bit level v2 | x86-64-v2 | 11.1 | 12.0 |
Generic 64-bit level v3 | x86-64-v3 | 11.1 | 12.0 |
Generic 64-bit level v4 | x86-64-v4 | 11.1 | 12.0 |
AMD Improved K8-family | k8-sse3 | 9.3 | 9.0 |
AMD K10-family | amdfam10 | 9.3 | 9.0 |
AMD Family 10h (Barcelona) | barcelona | 9.3 | 9.0 |
AMD Family 14h (Bobcat) | btver1 | 9.3 | 9.0 |
AMD Family 16h (Jaguar) | btver2 | 9.3 | 9.0 |
AMD Family 15h (Bulldozer) | bdver1 | 9.3 | 9.0 |
AMD Family 15h (Piledriver) | bdver2 | 9.3 | 9.0 |
AMD Family 15h (Steamroller) | bdver3 | 9.3 | 9.0 |
AMD Family 15h (Excavator) | bdver4 | 9.3 | 9.0 |
AMD Family 17h (Zen) | znver1 | 9.3 | 9.0 |
AMD Family 17h (Zen 2) | znver2 | 9.3 | 9.0 |
AMD Family 19h (Zen 3) | znver3 | 10.3 | 12.0 |
AMD Family 19h (Zen 4) | znver4 | 13.0 | ??? |
Intel Bonnell family Atom | bonnell | 9.3 | 9.0 |
Intel Silvermont family Atom | silvermont | 9.3 | 9.0 |
Intel Goldmont family Atom (Apollo Lake and Denverton) | goldmont | 9.3 | 9.0 |
Intel Goldmont Plus family Atom (Gemini Lake) | goldmont-plus | 9.3 | 9.0 |
Intel 1st Gen Core i3/i5/i7-family (Nehalem) | nehalem | 9.3 | 9.0 |
Intel 1.5 Gen Core i3/i5/i7-family (Westmere) | westmere | 9.3 | 9.0 |
Intel 2nd Gen Core i3/i5/i7-family (Sandybridge) | sandybridge | 9.3 | 9.0 |
Intel 3rd Gen Core i3/i5/i7-family (Ivybridge) | ivybridge | 9.3 | 9.0 |
Intel 4th Gen Core i3/i5/i7-family (Haswell) | haswell | 9.3 | 9.0 |
Intel 5th Gen Core i3/i5/i7-family (Broadwell) | broadwell | 9.3 | 9.0 |
Intel 6th Gen Core i3/i5/i7-family (Skylake) | skylake | 9.3 | 9.0 |
Intel 6th Gen Core i7/i9-family (Skylake X) | skylake-avx512 | 9.3 | 9.0 |
Intel 8th Gen Core i3/i5/i7-family (Cannon Lake) | cannonlake | 9.3 | 9.0 |
Intel 10th Gen Core i7/i9-family (Ice Lake) | icelake-client | 9.3 | 9.0 |
Intel Xeon (Cascade Lake) | cascadelake | 10.2 | 10.0 |
Intel Xeon (Cooper Lake) | cooperlake | 10.2 | 10.0 |
Intel 3rd Gen 10nm++ i3/i5/i7/i9-family (Tiger Lake) | tigerlake | 10.2 | 10.0 |
Intel 4th Gen 10nm++ Xeon (Sapphire Rapids) | sapphirerapids | 11.1 | 12.0 |
Intel 11th Gen i3/i5/i7/i9-family (Rocket Lake) | rocketlake | 11.1 | 12.0 |
Intel 12th Gen i3/i5/i7/i9-family (Alder Lake) | alderlake | 11.1 | 12.0 |
Intel 13th Gen i3/i5/i7/i9-family (Raptor Lake) | raptorlake | 13.0 | 15.0.5 |
Intel 5th Gen 10nm++ Xeon (Emerald Rapids) | emeraldrapids | 13.0 | ??? |
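Before picking a target, the installed compiler can be checked against the minimum versions in the table. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does). `version_ge` and `min_gcc_for` are hypothetical helper names, and only a few rows of the table are copied into the lookup.

```shell
# version_ge A B: succeed if dotted version A is >= B.
# Assumes sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum GCC version for a few -march values, copied from the table.
min_gcc_for() {
    case "$1" in
        x86-64-v2|x86-64-v3|x86-64-v4) echo 11.1 ;;
        znver3) echo 10.3 ;;
        znver4|raptorlake|emeraldrapids) echo 13.0 ;;
        *) return 1 ;;   # not in this abbreviated lookup
    esac
}

# Example check for znver3; skipped when gcc is not installed.
if command -v gcc >/dev/null 2>&1; then
    have="$(gcc -dumpfullversion -dumpversion 2>/dev/null)"
    need="$(min_gcc_for znver3)"
    if version_ge "$have" "$need"; then
        echo "gcc $have meets the minimum ($need) for znver3"
    else
        echo "gcc $have is below the minimum ($need) for znver3"
    fi
fi
```

Extending `min_gcc_for` to the full table, or adding a Clang column, follows the same pattern.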
Benchmarks
Intro
Three different machines were tested using a make-based endpoint, each running a generic x86-64 kernel and then an otherwise identical kernel built with the optimized GCC options.
Conclusion
There are small but real speed increases from running with this patch, as judged by a make endpoint. The increases are on par with the speed increase that the upstream-sanctioned core2 option gives users, so not including additional options seems somewhat arbitrary to me.
Details
- Three test machines: Intel Xeon X3360, Intel Core i7-2620M, Intel Core i7-3660K.
- All ran the make benchmark (linked below) 35 times while booted into a 'generic' kernel. Then all ran the same make benchmark 35 times after booting into an optimized kernel. Below are the optimizations chosen for each machine.
- X3360 = core2
- i7-2620M = sandybridge
- i7-3660K = ivybridge
- Results were analyzed for statistical significance via ANOVA plots that clearly show statistically significant albeit small differences.
Discussion
- All the assumptions for ANOVA are met:
- Data are normally distributed, as shown in the normal quantile plots.
- The population variances are fairly equal (Levene and Bartlett tests).
- The ANOVA plots clearly show significance.
- Pair-wise analysis by Tukey-Kramer shows significance at the 0.05 level for all CPUs compared.
Below are the differences in median values:
CPU | Difference in median value |
---|---|
core2 | +87.5 ms |
sandybridge | +79.7 ms |
ivybridge | +257.2 ms |
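A difference in medians like the ones above can be computed from two lists of per-run timings. This is only a sketch and not the author's analysis; `median` is a hypothetical helper, and the numbers piped into it below are arbitrary placeholders, not measurements.

```shell
# median: read one number per line on stdin and print the median.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Arbitrary placeholder input to show the call shape.
printf '%s\n' 3 1 2 | median

# With real data, one timing per line in two hypothetical files:
# a="$(median < generic_times.txt)"
# b="$(median < optimized_times.txt)"
# awk -v a="$a" -v b="$b" 'BEGIN { print a - b }'
```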
References
- Bash script that controls the benchmark: https://github.com/graysky2/bin/blob/master/bench
- Log file generated by script: http://repo-ck.com/bench/compile_time_optimization.txt.gz
Credit
- Original author: jeroen AT linuxforge DOT net
- Link to original version: http://www.linuxforge.net/docs/linux/linux-gcc.php
Legacy support
Support for older versions of the Linux kernel and of GCC can be found in the outdated_versions directory.