@@ -15,20 +15,23 @@ The tl;dr:
Token-level perplexity tests for various full-precision and quantized models using FP16, FP8 and Q4 cache
modes. Dataset is The Pile, 10 rows of 512 tokens per test.

- Model | Precision | FP16 cache | FP8 cache | Q4 cache
- --------|-----------|------------|-----------|---------
- Mistral 7B Instruct | 3.0 bpw | 13.33 | 13.43 | 13.41
- -- | 3.5 bpw | 13.07 | 13.14 | 13.12
- -- | 4.0 bpw | 12.90 | 12.90 | 12.90
- -- | 5.0 bpw | 12.73 | 12.73 | 12.75
- -- | 6.0 bpw | 12.73 | 12.75 | 12.74
- -- | FP16 | 12.69 | 12.71 | 12.72
- Mixtral 8x7B | 3.5 bpw | 10.27 | 10.41 | 10.39
- -- | 4.0 bpw | 10.09 | 10.26 | 10.23
- -- | 5.0 bpw | 10.02 | 10.16 | 10.15
- Llama2 7B | 4.0 bpw | 11.43 | 11.92 | 11.74
- -- | 5.0 bpw | 11.13 | 11.40 | 11.31
- -- | FP16 | 10.91 | 11.24 | 11.16
+ Results are updated for the new method, which uses Hadamard rotations on the keys/values. Old results for
+ version 0.0.18 and prior are kept for reference.
+
+ Model | Precision | FP16 cache | FP8 cache | Q4 cache (old) | Q4 cache
+ --------|-----------|------------|-----------|----------------|---------
+ Mistral 7B Instruct | 3.0 bpw | **13.33** | 13.43 | 13.41 | **13.37**
+ -- | 3.5 bpw | **13.07** | 13.14 | 13.12 | **13.09**
+ -- | 4.0 bpw | **12.90** | 12.90 | 12.90 | **12.90**
+ -- | 5.0 bpw | **12.73** | 12.73 | 12.75 | **12.75**
+ -- | 6.0 bpw | **12.73** | 12.75 | 12.74 | **12.74**
+ -- | FP16 | **12.69** | 12.71 | 12.72 | **12.69**
+ Mixtral 8x7B | 3.5 bpw | **10.27** | 10.41 | 10.39 | **10.32**
+ -- | 4.0 bpw | **10.09** | 10.26 | 10.23 | **10.19**
+ -- | 5.0 bpw | **10.02** | 10.16 | 10.15 | **10.04**
+ Llama2 7B | 4.0 bpw | **11.43** | 11.92 | 11.74 | **11.60**
+ -- | 5.0 bpw | **11.13** | 11.40 | 11.31 | **11.19**
+ -- | FP16 | **10.91** | 11.24 | 11.16 | **10.05**
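For reference, the token-level perplexity reported in the table above is the exponential of the mean negative log-likelihood over the test tokens. A minimal sketch of the computation (function name hypothetical, not taken from the codebase):

```python
import math

def perplexity(token_logprobs):
    # Token-level perplexity: exp of the mean negative log-likelihood.
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns every token probability 1/4 has perplexity 4,
# regardless of sequence length (here, one 512-token row as in the tests).
lp = [math.log(0.25)] * 512
print(round(perplexity(lp), 6))  # → 4.0
```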
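As a rough illustration of why rotating the keys/values helps Q4 quantization, the sketch below (hypothetical helper names, not the project's actual kernels) applies a normalized Hadamard matrix before 4-bit rounding. The rotation is orthogonal, so it can be undone exactly after dequantization, and spreading outlier channels across all dimensions shrinks the per-row quantization scale:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    # Normalized by sqrt(n) so H is orthogonal: H @ H.T == I.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant4(x):
    # Per-row symmetric 4-bit quantization (illustrative, not the real kernel).
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

def dequant4(q, scale):
    return q * scale

def roundtrip_rotated(v, H):
    # Rotate -> quantize -> dequantize -> rotate back.
    q, s = quant4(v @ H)
    return dequant4(q, s) @ H.T

rng = np.random.default_rng(0)
head_dim = 128
v = rng.standard_normal((16, head_dim))
v[:, 0] *= 50.0                    # inject an outlier channel
H = hadamard(head_dim)

err_plain = np.mean((dequant4(*quant4(v)) - v) ** 2)
err_rot = np.mean((roundtrip_rotated(v, H) - v) ** 2)
print(f"plain MSE {err_plain:.3f}  rotated MSE {err_rot:.3f}")
```

With the outlier channel present, the rotated round-trip error comes out much lower than quantizing the raw values, which is the effect the new Q4 column reflects.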
### HumanEval
@@ -37,6 +40,8 @@ The following are HumanEval tests on various full-precision and quantized models
respectively. Number of samples per task is limited to 10 (still giving 39360 completions in total, produced
over about 24 hours).

+ The following tests were done prior to the improvements in 0.0.18-dev.
+
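With n samples per task, pass@k figures like those below are commonly computed with the unbiased estimator from the HumanEval paper, 1 - C(n-c, k)/C(n, k), where c is the number of correct samples. A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).
    # n = samples per task, c = correct samples among them.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per task and 3 correct, pass@1 reduces to c/n:
print(pass_at_k(10, 3, 1))  # → 0.3
```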
#### pass@1
Model | Precision | FP16 cache | Q4 cache | diff