Temp #19

apicalshark · 2024-11-15T07:55:12Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…ggml-org#10222) Fixes ggml-org#9582 Spawning too many concurrent copies of glslc leads to "Failed to create pipes" errors on Linux. This change applies the same throttling we use for multithreaded pipeline creation.
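The throttling described above can be sketched as follows. This is a minimal illustration, not the project's build code: `run_throttled` and `max_concurrent` are hypothetical names, and each job is a plain Python callable standing in for one compiler invocation.

```python
import threading


def run_throttled(jobs, max_concurrent):
    """Run every callable in `jobs`, with at most `max_concurrent` in flight."""
    slots = threading.BoundedSemaphore(max_concurrent)
    results = [None] * len(jobs)
    threads = []

    def worker(index, job):
        try:
            results[index] = job()
        finally:
            slots.release()  # free the slot even if the job raised

    for index, job in enumerate(jobs):
        slots.acquire()  # block here until a slot is free
        thread = threading.Thread(target=worker, args=(index, job))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
    return results
```

In a real build step, each job would presumably wrap one process launch, so the cap bounds the number of pipes and child processes open at any moment.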

* tests: Fix memory bandwidth calculation for perf tests
  Add a flops calculation for flash attention. Add one GGML_OP_CPY perf test.
* vulkan: Optimize contiguous copies
  Add a variant of the copy shader for when the tensors are contiguous. Avoid the complex addressing calculations, and do four elements per invocation to hide some other overhead. Apply similar changes to the scale shader, since scale is always contiguous. Add a "progress bar" for shader compiles.

* Fixes broken build for the SYCL CUDA backend caused by non-explicit gemm call in outprod (merged in with RWKV6 in Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration ggml-org#10133)
* Marks permuted MUL_MAT as unsupported to be able to run test-backend-ops
* Fixes asserts in norm to fix debug builds.

The converter script can now read these two fields as detailed base model and dataset sources. This was done so that it will be easier for Hugging Face to integrate detailed metadata as needed.

- base_model_sources (List[dict], optional)
- dataset_sources (List[dict], optional)

Datasets are now represented as:

- general.dataset.count
- general.dataset.{id}.name
- general.dataset.{id}.author
- general.dataset.{id}.version
- general.dataset.{id}.organization
- general.dataset.{id}.description
- general.dataset.{id}.url
- general.dataset.{id}.doi
- general.dataset.{id}.uuid
- general.dataset.{id}.repo_url

This also adds the following metadata to the base model:

- general.base_model.{id}.description
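To make the key layout concrete, here is a small sketch that flattens a `dataset_sources` list into the `general.dataset.*` keys listed above. It is not the converter's code: the function name `dataset_metadata` is hypothetical, the zero-based `{id}` and the assumption that each source dict uses the field names directly as keys are guesses, and a plain dict stands in for the real metadata writer.

```python
# Field names taken from the key list in the commit message.
DATASET_FIELDS = (
    "name", "author", "version", "organization", "description",
    "url", "doi", "uuid", "repo_url",
)


def dataset_metadata(dataset_sources):
    """Flatten a list of dataset dicts into general.dataset.* key/value pairs."""
    kv = {"general.dataset.count": len(dataset_sources)}
    for idx, source in enumerate(dataset_sources):  # assumption: ids start at 0
        for field in DATASET_FIELDS:
            if field in source:  # every per-dataset field is treated as optional
                kv[f"general.dataset.{idx}.{field}"] = source[field]
    return kv
```

For example, a single source `{"name": "example-set", "doi": "10.0000/example"}` (made-up values) would yield `general.dataset.count`, `general.dataset.0.name`, and `general.dataset.0.doi`.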

…10272)
* server : fix validate_model_chat_template
* server : fix chat res

Signed-off-by: tianzixuan <tianzixuan335@hellobike.com>

* llama: propagating the results of `graph_compute` to the user interface
* llama: reverting kv_cache in case of failed compute
* llama: `llama_kv_cache_state` was removed, only the result of `llama_graph_compute` is returned
* llama: restore a kv_cache in case of failed computation
* llama: correct reverting of the entire batch. also updates `llama_kv_cache_find_slot`, will correctly count the number of `used` cells for recurrent models
* llama: updated comments
* llama : add comments about KV cache state after error

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
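The revert-on-failure idea in these commits can be illustrated with a toy model. Everything here is hypothetical: `ToyKVCache` and `decode_batch` are not llama.cpp names, the cache is a plain list of cells, and `compute` is any callable that returns 0 on success and a nonzero status on failure.

```python
class ToyKVCache:
    """Stand-in for a KV cache: just an ordered list of occupied cells."""

    def __init__(self):
        self.cells = []


def decode_batch(cache, batch, compute):
    """Add `batch` to the cache, run `compute`, and undo the add on failure."""
    snapshot = list(cache.cells)  # remember the state before this batch
    cache.cells.extend(batch)     # reserve cells for the whole batch
    status = compute(batch)
    if status != 0:
        cache.cells = snapshot    # revert the entire batch, not just part of it
    return status                 # propagate the compute result to the caller
```

The caller sees the compute status directly and can rely on the cache holding only successfully processed batches.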

…ggml-org#10259) Also add vk_matmul_pipeline2 to hold f16/f32 accumulator versions of a pipeline. This isn't really used yet.

Reuse the index calculations across all of src0/src1/dst. Add a shader variant for when src0/src1 have the same dimensions and the additional modulus operations for src1 aren't needed. Div/mod are slow, so add "fast" div/mod that have a fast path when the calculation isn't needed or can be done more cheaply.
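The commit message does not spell out which fast paths the shader uses, so the sketch below only shows two plausible ones as assumptions: skipping the division when the dividend is smaller than the divisor, and replacing the modulus with a bit mask when the divisor is a power of two. The names `fast_div` and `fast_mod` are illustrative, and both functions assume a non-negative `a` and a positive `b`.

```python
def fast_div(a, b):
    """Integer division with a shortcut for a < b (the quotient is 0)."""
    if a < b:
        return 0  # calculation not needed
    return a // b


def fast_mod(a, b):
    """Modulus with a cheaper mask when b is a power of two."""
    if (b & (b - 1)) == 0:
        return a & (b - 1)  # power-of-two divisor: keep the low bits
    return a % b
```

Both helpers return exactly what `//` and `%` would for the assumed input range, so they can replace the plain operators without changing results.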

* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>

…-org#9921)
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* sycl: Use syclcompat::dp4a
* Using the syclcompat version allows the compiler to optimize the operation with a native function
* Update news section
* Update CI Windows oneAPI version to 2025.0
* Reword doc
* Call syclcompat::dp4a inside dpct::dp4a
  This reverts commit 90cb61d.
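For readers unfamiliar with the operation, a dp4a-style instruction treats two 32-bit integers as four packed 8-bit lanes each, multiplies the lanes pairwise, and adds the sum to an accumulator. The pure-Python reference below shows the signed variant only; it is not the syclcompat implementation, `pack_int8x4` is a hypothetical helper, and the result is not wrapped to 32 bits as hardware would do.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit values into one signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<4b", *values))[0]


def dp4a(a, b, c):
    """Return c plus the dot product of the four signed bytes of a and b."""
    a_lanes = struct.unpack("<4b", struct.pack("<i", a))
    b_lanes = struct.unpack("<4b", struct.pack("<i", b))
    return c + sum(x * y for x, y in zip(a_lanes, b_lanes))
```

With lanes `[1, 2, 3, 4]` and `[5, 6, 7, 8]` and an accumulator of 10, the result is 10 + 5 + 12 + 21 + 32 = 80.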

Co-authored-by: noemotiovon <noemotiovon@gmail.com>

jeffbolznv and others added 18 commits November 11, 2024 18:13

server : fix incorrect res in validate_model_chat_template (ggml-org#…

0e712a5

server : add missing docs (ggml-org#10269)

ff7fb67

docs : update bindings list (ggml-org#10261)

1ee9eea

sync : ggml

5ea926d

vulkan: Use macros to make the mat mul pipeline creation more concise (…

66798e4

speculative : fix out-of-bounds access (ggml-org#10289)

2a82891

CUDA: no -sm row for very small matrices (ggml-org#10185)

4a8ccb3

ggml : build backends as libraries (ggml-org#10256)

ae8de6d

backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (ggml…

1607a5e

scripts : fix regex in sync [no ci]

4802ad3

cann: dockerfile and doc adjustment (ggml-org#10302)

231f936

github-actions bot added the documentation, examples, server, build, devops, nix, testing, python, script, ggml, SYCL, and Nvidia GPU labels Nov 15, 2024

github-actions bot added the Kompute and Apple Metal labels Nov 15, 2024

Merge branch 'master' into temp

729d133

apicalshark merged commit 4a91abe into master Nov 15, 2024
4 of 48 checks passed

apicalshark deleted the temp branch November 15, 2024 08:24
