Releases: xlite-dev/LeetCUDA
v3.0.9
What's Changed
- feat: add some torch.distributed examples by @DefTruth in #313
- feat: add some torch.distributed examples by @DefTruth in #315
- feat: add a naive CuTe flash-attn by @botbw in #314
- fix(kernels): correct typos in LayerNorm kernel at lines 73, 110, 346, 443 by @nxdxml in #317
- misc: manually update submodules by @DefTruth in #318
- chore: add naive cute flash-attn index by @DefTruth in #319
- add triton merge_attn_states zhihu blog by @DefTruth in #320
New Contributors
Full Changelog: v3.0.8...v3.0.9
v3.0.8
LeetCUDA v3.0.7
What's Changed
- Update mat-transpose/README.md by @DefTruth in #300
- feat: add triton fused-softmax by @DefTruth in #301
- misc: add pre-commit & format by @DefTruth in #302
- misc: add developer guide by @DefTruth in #303
- misc: add developer guide by @DefTruth in #304
- misc: fix typo by @DefTruth in #305
- Update CONTRIBUTE.md by @DefTruth in #306
- feat: update pre-commit max-length=80 by @DefTruth in #307
Full Changelog: v3.0.6...v3.0.7
LeetCUDA v3.0.6
What's Changed
- misc: update merge_attn_states unit tests by @DefTruth in #281
- misc: update merge_attn_states docs by @DefTruth in #282
- misc: update merge_attn_states docs by @DefTruth in #283
- feat: remove merge_attn_states kernel helper func by @DefTruth in #284
- misc: remove static flag for to/from_float by @DefTruth in #285
- misc: add new zhihu tech blog link by @DefTruth in #287
- misc: add debug flag for ncu profile by @DefTruth in #288
- bugfix: corrected theta calculation in RoPE CUDA kernel by @jiaau in #290
- docs: Add my ring-attention zhihu blog by @DefTruth in #291
- Add simple CuTe mat-transpose implementations by @botbw in #292
- Update README.md by @DefTruth in #296
- Update README.md by @DefTruth in #297
- Update README.md by @DefTruth in #298
- Rename to LeetCUDA by @DefTruth in #299
New Contributors
Full Changelog: v3.0.5...v3.0.6
v3.0.5
What's Changed
- [Misc] Automated submodule update by @DefTruth in #261
- Update README.md by @tpoisonooo in #264
- Update README.md by @DefTruth in #265
- bugfix: only export per token softmax kernels by @DefTruth in #266
- misc: update vllm latest slides by @DefTruth in #267
- feat: add triton vector_add kernel by @DefTruth in #268
- feat: add triton merge_attn_states kernel by @DefTruth in #269
- feat: add cuda merge_attn_states kernel by @DefTruth in #270
- feat: update cuda merge_attn_states kernel by @DefTruth in #271
- misc: dispatch CUDA merge_attn_states by @DefTruth in #273
- misc: add triton kernel index by @DefTruth in #274
- Fix mistake in 2D mat-transpose grid initialization by @bear-zd in #275
- misc: update cuda merge_attn_states kernel by @DefTruth in #276
- kernel: optimize merge_attn_states CUDA kernel dispatch by @DefTruth in #278
- feat: optimize merge_attn_states thread block dispatch by @DefTruth in #279
New Contributors
- @tpoisonooo made their first contribution in #264
Full Changelog: v3.0.4...v3.0.5
v3.0.4
What's Changed
- [Docs] Add vLLM + DeepSeek-R1 671B deploy blog by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/259
Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.3...v3.0.4
v3.0.3
What's Changed
- [Misc] Automated submodule update by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/257
Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.2...v3.0.3
v3.0.2
What's Changed
- Fix typo in block_all_reduce.cu by @wplf in https://github.com/DefTruth/CUDA-Learn-Notes/pull/247
- fix typo about enougth by @wplf in https://github.com/DefTruth/CUDA-Learn-Notes/pull/248
- [FFPA] Add FFPA tech zhihu blog by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/252
- [FFPA] Update FFPA(Split-D) blog title by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/253
- [Misc] Automated submodule update by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/254
New Contributors
- @wplf made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/247
Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.1...v3.0.2
v3.0.1
What's Changed
- [README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/240
- [Bugfix] remove some error comments by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/241
- Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/242
- Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/243
- Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/244
- [misc] Add hgemm-tensorcores-mma submodule✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/246
Full Changelog: DefTruth/CUDA-Learn-Notes@v3.0.0...v3.0.1
v3.0.0
What's Changed
- [README] Add cuffpa-py library News🔥 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/216
- [README] Update cuffpa-py library News🔥 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/217
- [README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/218
- [README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/219
- [Misc] fix ffpa-attn-mma bench links typo by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/220
- [Misc] fix ffpa-attn-mma bench links typo by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/221
- [Misc] fix ffpa-attn-mma bench links typo by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/222
- [README] Update README by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/223
- [README] Add 🤖ffpa-attn-mma D=512 bench by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/224
- [submodule] Add ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/225
- [misc] Update ffpa-attn-mma & cutlass submodules by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/226
- [FFPA] add ffpa-attn-mma kernels to lists by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/227
- [Misc] Update ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/228
- [README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/229
- Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/230
- [Misc] update ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/231
- [Misc] update ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/233
- [Misc] update ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/234
- [Misc] update ffpa-attn-mma submodule by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/236
- [Release] Bump up to v3.0.0 (#237) by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/238
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.6.15...v3.0.0