sgemm_hsw


This is an implementation of sgemm_kernel on L1d cache.

Test CPU: AMD Zen2 4750G, single core @ 4.45 GHz
Theoretical fp32 peak of a single Zen2 core (two 256-bit FMA units, each handling 8 fp32 lanes at 2 FLOPs per lane per cycle):
4.45 GHz x (8 x 2 + 8 x 2) = 142.4 GFLOPS
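
The peak figure can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the helper name peak_gflops and its parameters are hypothetical, and it assumes two FMA units per core with 8 fp32 lanes each, which is how the (8 x 2 + 8 x 2) term reads.

```c
/* Theoretical fp32 peak of one core, in GFLOPS.
 * ghz       : core clock in GHz
 * fma_units : FMA pipes per core (assumed to be 2 here)
 * lanes     : fp32 lanes per vector (8 for a 256-bit register)
 * Each fused multiply-add counts as 2 FLOPs per lane. */
static double peak_gflops(double ghz, int fma_units, int lanes)
{
    const int flops_per_fma = 2; /* one multiply + one add */
    return ghz * fma_units * lanes * flops_per_fma;
}
```

Calling peak_gflops(4.45, 2, 8) gives the 142.4 GFLOPS used as the baseline.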

Test case:
./sgemm_l1d 24 64
This runs sgemm_kernel with m = 24 and k = 64; n is fixed at 24.
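
For orientation, a naive reference of the same shape might look like the sketch below. It is not the repository's sgemm_naive: the function name sgemm_naive_ref, the argument order, the row-major layout, and the choice to overwrite C rather than accumulate into it are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical reference kernel: C (m x n) = A (m x k) * B (k x n).
 * All three matrices are assumed to be dense and row-major. */
static void sgemm_naive_ref(size_t m, size_t n, size_t k,
                            const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}
```

With m = 24, n = 24 and k = 64, this writes 24 x 24 = 576 output values, which matches the array count reported by fp_diff.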

Test output:
sgemm_kernel_x64_fma(24, 24, 64): time = 0.520822 us, perf = 141.560930 GFLOPS.
sgemm_naive result: naive.bin
sgemm_kernel_x64_fma_m4n24 result: tuned.bin
Use fp_diff (https://github.com/pigirons/fp_diff) to compare the two result files.

Efficiency:
141.56093 / 142.4 = 99.4%
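
The reported performance and efficiency follow from the timing. The sketch below assumes the usual SGEMM convention of 2 x m x n x k FLOPs (one multiply and one add per inner-product term); the helper names are hypothetical and not taken from the repository.

```c
/* GFLOPS achieved by an m x n x k SGEMM that ran in time_us microseconds.
 * Every multiply-add in the inner product is counted as 2 FLOPs. */
static double sgemm_gflops(int m, int n, int k, double time_us)
{
    double flops = 2.0 * m * n * k;
    double time_ns = time_us * 1e3;
    return flops / time_ns; /* FLOPs per nanosecond equals GFLOPS */
}

/* Measured performance as a percentage of the theoretical peak. */
static double efficiency_percent(double measured_gflops, double peak_gflops)
{
    return 100.0 * measured_gflops / peak_gflops;
}
```

For the run above, sgemm_gflops(24, 24, 64, 0.520822) comes out at roughly 141.56, the small difference from the printed value being due to the rounded time, and efficiency_percent(141.56093, 142.4) is about 99.4.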

Diff results:
./fp_diff naive.bin tuned.bin

Array Count: 576

Average Absolute Error: 4.387564e-07
Max Absolute Error Position: 102
Max Absolute Error: 3.814697e-06

Average Relative Error: 2.836071e-08
Max Relative Error Position: 102
Max Relative Error: 2.285095e-07

Number of Relative Error(>1e-1): 0
Number of Relative Error(>1e-2): 0
Number of Relative Error(>1e-3): 0
Number of Relative Error(>1e-4): 0
Number of Relative Error(>1e-5): 0
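
The statistics in this report can be approximated with a short routine. This is a sketch under assumptions, not fp_diff's actual code: relative error is taken here as |ref - test| / |ref|, elements with ref == 0 fall back to the absolute error, and the struct and function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Summary of the element-wise difference between two float arrays. */
struct diff_stats {
    double avg_abs;     /* mean absolute error */
    double max_abs;     /* largest absolute error */
    size_t max_abs_pos; /* index of the largest absolute error */
    double avg_rel;     /* mean relative error */
    double max_rel;     /* largest relative error */
    size_t max_rel_pos; /* index of the largest relative error */
    size_t over_tol;    /* elements whose relative error exceeds tol */
};

static struct diff_stats compare_arrays(const float *ref, const float *test,
                                        size_t count, double tol)
{
    struct diff_stats s = {0.0, 0.0, 0, 0.0, 0.0, 0, 0};
    double sum_abs = 0.0, sum_rel = 0.0;

    for (size_t i = 0; i < count; ++i) {
        double abs_err = fabs((double)ref[i] - (double)test[i]);
        /* Assumption: avoid dividing by zero by using the absolute error. */
        double rel_err = (ref[i] != 0.0f) ? abs_err / fabs((double)ref[i])
                                          : abs_err;
        sum_abs += abs_err;
        sum_rel += rel_err;
        if (abs_err > s.max_abs) { s.max_abs = abs_err; s.max_abs_pos = i; }
        if (rel_err > s.max_rel) { s.max_rel = rel_err; s.max_rel_pos = i; }
        if (rel_err > tol)
            ++s.over_tol;
    }
    if (count > 0) {
        s.avg_abs = sum_abs / (double)count;
        s.avg_rel = sum_rel / (double)count;
    }
    return s;
}
```

Running it once per threshold (1e-1 down to 1e-5) on the 576 values of naive.bin and tuned.bin would produce counts comparable to the last five lines of the report, provided fp_diff uses the same definition of relative error.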