v2.4.10 SGEMM TF32 Stage 2/3
What's Changed
- [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/76
- [SGEMM] Add SGEMM WMMA TF32 Stage2/3 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/77
- [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/78
- [SGEMM] Add Kernel cudaFuncSetAttribute hint by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/79
- [RoPE] Add minimal RoPE f32/f32x4 pack impl by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/80
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.9...v2.4.10