What is the difference between the utf8mb4 and utf8 character sets in MySQL?
I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious how the utf8mb4 family of encodings differs from the other encoding types defined in MySQL Server.
Is there any particular benefit to, or recommendation for, using utf8mb4 rather than utf8?
Best answer
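UTF-8 is a variable-length encoding: storing one code point takes between one and four bytes. However, MySQL's encoding called "utf8" (an alias of "utf8mb3") stores at most three bytes per code point.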
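To make the one-to-four-byte range concrete, here is a minimal Python sketch that measures the UTF-8 length of one sample character per length class. The helper name `utf8_length` and the chosen sample characters are illustrative assumptions, not anything defined by MySQL.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of one character."""
    return len(ch.encode("utf-8"))


# One sample per UTF-8 length class (samples chosen for illustration only).
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign, still inside the BMP
    "\U0001F600",  # U+1F600, an emoji outside the BMP
]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
```

Only the last sample needs a fourth byte, which is exactly the case that "utf8"/"utf8mb3" cannot store.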
So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.
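As a rough client-side check, the sketch below flags characters whose code point lies above 0xFFFF, which are the ones a three-byte "utf8"/"utf8mb3" column cannot hold. The names `BMP_MAX`, `non_bmp_chars` and `fits_in_utf8mb3` are hypothetical helpers for illustration; the check looks only at the code point range and does not reproduce MySQL's own validation.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def non_bmp_chars(text: str) -> list:
    """Return the characters in text whose code point lies above the BMP."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most three bytes in UTF-8."""
    return not non_bmp_chars(text)


print(fits_in_utf8mb3("caf\u00e9 \u20ac5"))   # BMP characters only
print(fits_in_utf8mb3("hello \U0001F600"))    # contains an emoji above U+FFFF
```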
This is what (a previous version of the same page of) the MySQL documentation has to say about it:
The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters:
For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.
So if you want your column to support storing characters that lie outside the BMP, such as emoji (and you usually do), use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?.
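If an existing table needs to be switched over, MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET` statement is one way to do it. The sketch below only builds the statement text; the helper name `convert_table_sql`, the table name `comments`, and the choice of `utf8mb4_unicode_ci` as collation are assumptions for illustration, so pick the collation that suits your data and try the statement on a copy first.

```python
def convert_table_sql(table: str,
                      charset: str = "utf8mb4",
                      collation: str = "utf8mb4_unicode_ci") -> str:
    """Build an ALTER TABLE statement that converts a table to the given charset.

    The table name is backtick-quoted, with embedded backticks doubled.
    """
    quoted = "`" + table.replace("`", "``") + "`"
    return (f"ALTER TABLE {quoted} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};")


# "comments" is a hypothetical table name.
print(convert_table_sql("comments"))
```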
https://stackoverflow.com/questions/30074492/