41から始めました

文字通り41歳から始めたブログです。DB(MySQL)を使ってお仕事してるので、DB周りの話を中心に最近始めたこととかをTwitterのノリで書いています。なお、本サイトにおいて示されている見解は私個人の見解であり、所属団体や組織を代表するものではありません。

文字コードとデフォルトの照合順序とPAD、それとutf8mb4のそれ

自分用のメモです

最近、照合順序の話を良くしている気がするので、これが見たくなる時があるので自分向けに。

文字コードとデフォルトの照合順序とPAD

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME, IS_COMPILED, SORTLEN, PAD_ATTRIBUTE FROM information_schema.COLLATIONS WHERE IS_DEFAULT = 'Yes';
+--------------------+---------------------+-------------+---------+---------------+
| CHARACTER_SET_NAME | COLLATION_NAME      | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+--------------------+---------------------+-------------+---------+---------------+
| armscii8           | armscii8_general_ci | Yes         |       1 | PAD SPACE     |
| ascii              | ascii_general_ci    | Yes         |       1 | PAD SPACE     |
| big5               | big5_chinese_ci     | Yes         |       1 | PAD SPACE     |
| binary             | binary              | Yes         |       1 | NO PAD        |
| cp1250             | cp1250_general_ci   | Yes         |       1 | PAD SPACE     |
| cp1251             | cp1251_general_ci   | Yes         |       1 | PAD SPACE     |
| cp1256             | cp1256_general_ci   | Yes         |       1 | PAD SPACE     |
| cp1257             | cp1257_general_ci   | Yes         |       1 | PAD SPACE     |
| cp850              | cp850_general_ci    | Yes         |       1 | PAD SPACE     |
| cp852              | cp852_general_ci    | Yes         |       1 | PAD SPACE     |
| cp866              | cp866_general_ci    | Yes         |       1 | PAD SPACE     |
| cp932              | cp932_japanese_ci   | Yes         |       1 | PAD SPACE     |
| dec8               | dec8_swedish_ci     | Yes         |       1 | PAD SPACE     |
| eucjpms            | eucjpms_japanese_ci | Yes         |       1 | PAD SPACE     |
| euckr              | euckr_korean_ci     | Yes         |       1 | PAD SPACE     |
| gb18030            | gb18030_chinese_ci  | Yes         |       2 | PAD SPACE     |
| gb2312             | gb2312_chinese_ci   | Yes         |       1 | PAD SPACE     |
| gbk                | gbk_chinese_ci      | Yes         |       1 | PAD SPACE     |
| geostd8            | geostd8_general_ci  | Yes         |       1 | PAD SPACE     |
| greek              | greek_general_ci    | Yes         |       1 | PAD SPACE     |
| hebrew             | hebrew_general_ci   | Yes         |       1 | PAD SPACE     |
| hp8                | hp8_english_ci      | Yes         |       1 | PAD SPACE     |
| keybcs2            | keybcs2_general_ci  | Yes         |       1 | PAD SPACE     |
| koi8r              | koi8r_general_ci    | Yes         |       1 | PAD SPACE     |
| koi8u              | koi8u_general_ci    | Yes         |       1 | PAD SPACE     |
| latin1             | latin1_swedish_ci   | Yes         |       1 | PAD SPACE     |
| latin2             | latin2_general_ci   | Yes         |       1 | PAD SPACE     |
| latin5             | latin5_turkish_ci   | Yes         |       1 | PAD SPACE     |
| latin7             | latin7_general_ci   | Yes         |       1 | PAD SPACE     |
| macce              | macce_general_ci    | Yes         |       1 | PAD SPACE     |
| macroman           | macroman_general_ci | Yes         |       1 | PAD SPACE     |
| sjis               | sjis_japanese_ci    | Yes         |       1 | PAD SPACE     |
| swe7               | swe7_swedish_ci     | Yes         |       1 | PAD SPACE     |
| tis620             | tis620_thai_ci      | Yes         |       4 | PAD SPACE     |
| ucs2               | ucs2_general_ci     | Yes         |       1 | PAD SPACE     |
| ujis               | ujis_japanese_ci    | Yes         |       1 | PAD SPACE     |
| utf16              | utf16_general_ci    | Yes         |       1 | PAD SPACE     |
| utf16le            | utf16le_general_ci  | Yes         |       1 | PAD SPACE     |
| utf32              | utf32_general_ci    | Yes         |       1 | PAD SPACE     |
| utf8mb3            | utf8mb3_general_ci  | Yes         |       1 | PAD SPACE     |
| utf8mb4            | utf8mb4_0900_ai_ci  | Yes         |       0 | NO PAD        |
+--------------------+---------------------+-------------+---------+---------------+
41 rows in set (0.00 sec)

utf8mb4の照合順序とPADの一覧

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME, IS_COMPILED, SORTLEN, PAD_ATTRIBUTE FROM information_schema.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8mb4';
+--------------------+----------------------------+-------------+---------+---------------+
| CHARACTER_SET_NAME | COLLATION_NAME             | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+--------------------+----------------------------+-------------+---------+---------------+
| utf8mb4            | utf8mb4_general_ci         | Yes         |       1 | PAD SPACE     |
| utf8mb4            | utf8mb4_bin                | Yes         |       1 | PAD SPACE     |
| utf8mb4            | utf8mb4_unicode_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_icelandic_ci       | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_latvian_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_romanian_ci        | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_slovenian_ci       | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_polish_ci          | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_estonian_ci        | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_spanish_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_swedish_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_turkish_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_czech_ci           | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_danish_ci          | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_lithuanian_ci      | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_slovak_ci          | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_spanish2_ci        | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_roman_ci           | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_persian_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_esperanto_ci       | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_hungarian_ci       | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_sinhala_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_german2_ci         | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_croatian_ci        | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_unicode_520_ci     | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_vietnamese_ci      | Yes         |       8 | PAD SPACE     |
| utf8mb4            | utf8mb4_0900_ai_ci         | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_de_pb_0900_ai_ci   | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_is_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_lv_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ro_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sl_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_pl_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_et_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_es_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sv_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_tr_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_cs_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_da_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_lt_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sk_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_es_trad_0900_ai_ci | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_la_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_eo_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_hu_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_hr_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_vi_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_0900_as_cs         | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_de_pb_0900_as_cs   | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_is_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_lv_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ro_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sl_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_pl_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_et_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_es_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sv_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_tr_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_cs_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_da_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_lt_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sk_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_es_trad_0900_as_cs | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_la_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_eo_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_hu_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_hr_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_vi_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ja_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ja_0900_as_cs_ks   | Yes         |      24 | NO PAD        |
| utf8mb4            | utf8mb4_0900_as_ci         | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ru_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_ru_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_zh_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_0900_bin           | Yes         |       1 | NO PAD        |
| utf8mb4            | utf8mb4_nb_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_nb_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_nn_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_nn_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sr_latn_0900_ai_ci | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_sr_latn_0900_as_cs | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_bs_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_bs_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_bg_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_bg_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_gl_0900_ai_ci      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_gl_0900_as_cs      | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_mn_cyrl_0900_ai_ci | Yes         |       0 | NO PAD        |
| utf8mb4            | utf8mb4_mn_cyrl_0900_as_cs | Yes         |       0 | NO PAD        |
+--------------------+----------------------------+-------------+---------+---------------+
89 rows in set (0.00 sec)

ドキュメント

dev.mysql.com