UTF-8

UTF-8 is a character encoding form and character encoding scheme for ISO/IEC 10646 (UCS) and Unicode. It uses 8-bit code units and encodes each character as a variable-length sequence of 1 to 4 bytes.
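The variable length of 1 to 4 bytes can be observed with a minimal sketch that uses Python's built-in `utf-8` codec. The helper name `utf8_length` and the four sample characters are chosen here for illustration and do not come from any specification.

```python
# Sample characters, written as escapes so the source stays ASCII-only.
# They are illustrative choices, one for each possible encoded length.
samples = ["A", "\u00e9", "\u3042", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the byte count, and the bytes in hexadecimal.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s): {encoded.hex(' ')}")
```

Characters in the ASCII range occupy a single byte, while characters with higher code points occupy two, three, or four bytes.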

Its formal name is “UCS Transformation Format 8” in ISO/IEC 10646 and “Unicode Transformation Format-8” in Unicode. The two are compatible over the range of codes that ISO/IEC 10646 and Unicode share. The encoding is also specified in an RFC^[1].

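Because it is designed so that ASCII characters such as “/” never appear in the second or later bytes of a multi-byte sequence, it is also called UTF-FSS (File System Safe). Its former name was UTF-2.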
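The file-system-safe property described above can be checked with a short sketch. The function name `trailing_bytes_are_non_ascii` and the sample path are hypothetical and serve only as illustration. The sketch assumes the input contains no lone surrogates, which Python's `utf-8` codec refuses to encode.

```python
def trailing_bytes_are_non_ascii(text: str) -> bool:
    """Return True if, for every character in text, each byte after the
    first byte of its UTF-8 encoding is outside the ASCII range (>= 0x80)."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if any(b < 0x80 for b in encoded[1:]):
            return False
    return True


# Hypothetical path with non-ASCII directory and file names.
path = "\u65e5\u672c\u8a9e/\u30c6\u30b9\u30c8.txt"
raw = path.encode("utf-8")

# The byte 0x2F ("/") occurs only where a real "/" character was encoded,
# so splitting the raw bytes never cuts a multi-byte sequence in half.
parts = [p.decode("utf-8") for p in raw.split(b"/")]

print(trailing_bytes_are_non_ascii(path))
print(parts)
```

Since every piece decodes cleanly, software that treats a path as raw bytes and looks only for the ASCII separator can handle UTF-8 names without knowing the encoding.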

UTF-8 is widely used for data interchange and as a file encoding.

It was originally devised at Bell Labs by Ken Thompson, following design guidelines set out by Rob Pike, as the encoding to be used in Plan 9^[2]^[3].

[1] RFC 3629 UTF-8, a transformation format of ISO 10646

[2] RFC 3629 Page-3

[3] Rob Pike's UTF-8 history
