Commit 5eeac85

Oreoxmt authored and ti-chi-bot committed
This is an automated cherry-pick of #20234
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
1 parent 181e6b2 commit 5eeac85

3 files changed: +29 -11 lines changed

character-set-and-collation.md (+1 -1)

@@ -72,7 +72,7 @@ SHOW CHARACTER SET;
 +---------+-------------------------------------+-------------------+--------+
 | ascii   | US ASCII                            | ascii_bin         |      1 |
 | binary  | binary                              | binary            |      1 |
-| gbk     | Chinese Internal Code Specification | gbk_bin           |      2 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
 | latin1  | Latin1                              | latin1_bin        |      1 |
 | utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
 | utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |

character-set-gbk.md (+17)

@@ -7,6 +7,7 @@ summary: This document describes TiDB's support for the GBK character set.
 
 TiDB supports the GBK character set starting from v5.4.0. This document describes TiDB's support for and compatibility with the GBK character set.
 
+<<<<<<< HEAD
 ```sql
 SHOW CHARACTER SET WHERE CHARSET = 'gbk';
 +---------+-------------------------------------+-------------------+--------+
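@@ -36,6 +37,9 @@ The default collation of the MySQL character set is `gbk_chinese_ci`. Unlike MySQL, Ti
 To make TiDB compatible with the collations of the MySQL GBK character set, you need to set the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` when you first initialize the TiDB cluster, which enables the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持).
 
 After you enable the new framework for collations, if you check the collations corresponding to the GBK character set, you can see that the default TiDB GBK collation has changed to `gbk_chinese_ci`.
+=======
+Starting from TiDB v6.0.0, the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持) is enabled by default, so the default collation of the TiDB GBK character set is `gbk_chinese_ci`, consistent with MySQL.
+>>>>>>> 112a825ed1 (update the default collation of GBK from gbk_bin to gbk_chinese_ci (#20234))
 
 ```sql
 SHOW CHARACTER SET WHERE CHARSET = 'gbk';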
@@ -56,6 +60,19 @@ SHOW COLLATION WHERE CHARSET = 'gbk';
 2 rows in set (0.00 sec)
 ```
 
+## MySQL compatibility
+
+This section describes the compatibility of the GBK character set in TiDB with MySQL.
+
+### Collation compatibility
+
+The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. The default collation of the GBK character set in TiDB depends on the value of the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap):
+
+- By default, the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is `true`, which means that the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持) is enabled. The default collation of the GBK character set is `gbk_chinese_ci`.
+- When the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is `false`, the new framework for collations is disabled, and the default collation of the GBK character set is `gbk_bin`.
+
+In addition, the `gbk_bin` collation supported by TiDB differs from the `gbk_bin` collation supported by MySQL: TiDB converts GBK to `utf8mb4` and then performs a binary sort.
+
 ### Illegal character compatibility
 
 * When the system variables [`character_set_client`](/system-variables.md#character_set_client) and [`character_set_connection`](/system-variables.md#character_set_connection) are not both set to `gbk`, TiDB handles illegal characters in the same way as MySQL.
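
The `gbk_bin` note in the hunk above says that TiDB converts GBK to `utf8mb4` before sorting in binary order. The following Python sketch is only an illustration of why that can change the resulting order. It models a binary collation as plain byte-wise comparison of the encoded strings, and it assumes that comparing the GBK bytes directly is a fair stand-in for the other behavior. The function names and sample characters are hypothetical and do not come from TiDB or MySQL.

```python
# Illustrative model only: a "binary" collation is treated here as a plain
# byte-wise comparison of the encoded strings. Real collation implementations
# may differ in details such as padding or trailing-space handling.

def sort_by_gbk_bytes(words):
    """Order strings by their GBK byte sequences (assumed model of comparing GBK bytes directly)."""
    return sorted(words, key=lambda w: w.encode("gbk"))


def sort_by_utf8_bytes(words):
    """Order strings by their UTF-8 byte sequences (model of converting to utf8mb4 first)."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


# Hypothetical sample data chosen so that the two orders disagree.
words = ["中", "啊", "一"]

print(sort_by_gbk_bytes(words))   # ['啊', '一', '中']
print(sort_by_utf8_bytes(words))  # ['一', '中', '啊']
```

In this model the two orders disagree because the GBK codes of these characters (`0xB0A1`, `0xD2BB`, `0xD6D0`) and their Unicode code points (`U+554A`, `U+4E00`, `U+4E2D`) are arranged differently, so an application that relies on `gbk_bin` ordering may want to verify results on both systems.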

sql-statements/sql-statement-show-character-set.md (+11 -10)

@@ -25,16 +25,17 @@ SHOW CHARACTER SET;
 ```
 
 ```
-+---------+---------------+-------------------+--------+
-| Charset | Description   | Default collation | Maxlen |
-+---------+---------------+-------------------+--------+
-| utf8    | UTF-8 Unicode | utf8_bin          |      3 |
-| utf8mb4 | UTF-8 Unicode | utf8mb4_bin       |      4 |
-| ascii   | US ASCII      | ascii_bin         |      1 |
-| latin1  | Latin1        | latin1_bin        |      1 |
-| binary  | binary        | binary            |      1 |
-+---------+---------------+-------------------+--------+
-5 rows in set (0.00 sec)
++---------+-------------------------------------+-------------------+--------+
+| Charset | Description                         | Default collation | Maxlen |
++---------+-------------------------------------+-------------------+--------+
+| ascii   | US ASCII                            | ascii_bin         |      1 |
+| binary  | binary                              | binary            |      1 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
+| latin1  | Latin1                              | latin1_bin        |      1 |
+| utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
+| utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |
++---------+-------------------------------------+-------------------+--------+
+6 rows in set (0.00 sec)
 ```
 
 ```sql
