Commit 112a825

update the default collation of GBK from gbk_bin to gbk_chinese_ci (#20234)
1 parent 713d9e1 commit 112a825

3 files changed, +19 -42 lines changed

character-set-and-collation.md

+1 -1
@@ -104,7 +104,7 @@ SHOW CHARACTER SET;
 +---------+-------------------------------------+-------------------+--------+
 | ascii   | US ASCII                            | ascii_bin         |      1 |
 | binary  | binary                              | binary            |      1 |
-| gbk     | Chinese Internal Code Specification | gbk_bin           |      2 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
 | latin1  | Latin1                              | latin1_bin        |      1 |
 | utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
 | utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |

character-set-gbk.md

+7 -31
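@@ -7,6 +7,8 @@ summary: This document describes TiDB's support for the GBK character set.

 TiDB supports the GBK character set starting from v5.4.0. This document describes TiDB's support for and compatibility with the GBK character set.

+Starting from TiDB v6.0.0, the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持) is enabled by default, so the default collation of the GBK character set in TiDB is `gbk_chinese_ci`, which is consistent with MySQL.
+
 ```sql
 SHOW CHARACTER SET WHERE CHARSET = 'gbk';
 ```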
@@ -15,7 +17,7 @@ SHOW CHARACTER SET WHERE CHARSET = 'gbk';
 +---------+-------------------------------------+-------------------+--------+
 | Charset | Description                         | Default collation | Maxlen |
 +---------+-------------------------------------+-------------------+--------+
-| gbk     | Chinese Internal Code Specification | gbk_bin           |      2 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
 +---------+-------------------------------------+-------------------+--------+
 1 row in set (0.00 sec)
 ```
@@ -40,38 +42,12 @@ SHOW COLLATION WHERE CHARSET = 'gbk';

 ### Collation compatibility

-The default collation of the character set in MySQL is `gbk_chinese_ci`. Unlike MySQL, the default collation of the GBK character set in TiDB is `gbk_bin`. In addition, the `gbk_bin` collation supported by TiDB also differs from the `gbk_bin` collation supported by MySQL: TiDB converts GBK to `utf8mb4` and then performs a binary sort.
-
-To make TiDB compatible with the collations of the MySQL GBK character set, you need to set the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` when you first initialize the TiDB cluster, which enables the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持). For newly deployed systems, this setting is the default.
-
-After the new framework for collations is enabled, if you check the collations of the GBK character set, you can see that the default collation of GBK in TiDB has switched to `gbk_chinese_ci`.
+The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. The default collation of the GBK character set in TiDB depends on the value of the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap):

-```sql
-SHOW CHARACTER SET WHERE CHARSET = 'gbk';
-```
+- By default, the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is `true`, which means that the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持) is enabled. The default collation of the GBK character set is `gbk_chinese_ci`.
+- When the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is `false`, the new framework for collations is disabled, and the default collation of the GBK character set is `gbk_bin`.

-```
-+---------+-------------------------------------+-------------------+--------+
-| Charset | Description                         | Default collation | Maxlen |
-+---------+-------------------------------------+-------------------+--------+
-| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
-+---------+-------------------------------------+-------------------+--------+
-1 row in set (0.00 sec)
-```
-
-```sql
-SHOW COLLATION WHERE CHARSET = 'gbk';
-```
-
-```
-+----------------+---------+----+---------+----------+---------+---------------+
-| Collation      | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
-+----------------+---------+----+---------+----------+---------+---------------+
-| gbk_bin        | gbk     | 87 |         | Yes      |       1 | PAD SPACE     |
-| gbk_chinese_ci | gbk     | 28 | Yes     | Yes      |       1 | PAD SPACE     |
-+----------------+---------+----+---------+----------+---------+---------------+
-2 rows in set (0.00 sec)
-```
+In addition, the `gbk_bin` collation supported by TiDB differs from the `gbk_bin` collation supported by MySQL: TiDB converts GBK to `utf8mb4` and then performs a binary sort.

 ### Illegal character compatibility

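As a side note on the `gbk_bin` line in the hunk above: converting GBK to `utf8mb4` before a binary sort can produce a different order than comparing the GBK bytes directly. The following is a minimal sketch in plain Python, not TiDB or MySQL code. It assumes that a byte-wise comparison of the GBK encoding is a fair stand-in for MySQL's `gbk_bin`, and the two sample characters are chosen here purely for illustration.

```python
# Illustrative only: compares two byte-wise orderings in plain Python.
# It does not call TiDB or MySQL; the sample characters are hypothetical test data.
words = ["啊", "一"]

# Order by the raw GBK bytes (assumed stand-in for a binary sort on GBK data).
by_gbk_bytes = sorted(words, key=lambda s: s.encode("gbk"))

# Order by the UTF-8 bytes, i.e. after converting the text to utf8mb4 first.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_gbk_bytes)   # ['啊', '一']
print(by_utf8_bytes)  # ['一', '啊']
```

The two orders differ because `啊` is the first Chinese character in the GB2312 range of GBK, while `一` (U+4E00) has the lower Unicode code point and therefore the smaller UTF-8 encoding.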

sql-statements/sql-statement-show-character-set.md

+11 -10
@@ -26,16 +26,17 @@ SHOW CHARACTER SET;
 ```

 ```
-+---------+---------------+-------------------+--------+
-| Charset | Description   | Default collation | Maxlen |
-+---------+---------------+-------------------+--------+
-| utf8    | UTF-8 Unicode | utf8_bin          |      3 |
-| utf8mb4 | UTF-8 Unicode | utf8mb4_bin       |      4 |
-| ascii   | US ASCII      | ascii_bin         |      1 |
-| latin1  | Latin1        | latin1_bin        |      1 |
-| binary  | binary        | binary            |      1 |
-+---------+---------------+-------------------+--------+
-5 rows in set (0.00 sec)
++---------+-------------------------------------+-------------------+--------+
+| Charset | Description                         | Default collation | Maxlen |
++---------+-------------------------------------+-------------------+--------+
+| ascii   | US ASCII                            | ascii_bin         |      1 |
+| binary  | binary                              | binary            |      1 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
+| latin1  | Latin1                              | latin1_bin        |      1 |
+| utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
+| utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |
++---------+-------------------------------------+-------------------+--------+
+6 rows in set (0.00 sec)
 ```

 ```sql
