This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: UTF-8 character encoding
On 6/26/18, Thomas Wolff wrote:
> This encoding scheme is wrong; where did you get it from? Maybe it's the
> obsolete UTF-8...
http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
I thought I saw something about UTF-8 being able to handle a 31-bit
value. Is that also obsolete/wrong?
How about this for the current encoding scheme:
http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf
Table 3-6. UTF-8 Bit Distribution

Bits  Scalar Value                First Byte  Second Byte  Third Byte  Fourth Byte
7     00000000 0xxxxxxx           0xxxxxxx
11    00000yyy yyxxxxxx           110yyyyy    10xxxxxx
16    zzzzyyyy yyxxxxxx           1110zzzz    10yyyyyy     10xxxxxx
21    000uuuuu zzzzyyyy yyxxxxxx  11110uuu    10uuzzzz     10yyyyyy    10xxxxxx
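
To check my reading of the table, here is a rough Python sketch of an encoder that follows the four rows above. The function name utf8_encode is just something I made up for illustration. The range checks assume the Unicode definition of a scalar value (0 to 0x10FFFF, excluding the surrogates 0xD800 to 0xDFFF), so nothing beyond 21 bits is accepted. Treat it as a sketch, not a reference implementation.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the bit layout of Table 3-6.

    Hypothetical helper for illustration only.
    """
    # Assumption: only Unicode scalar values are valid input, so
    # surrogates and anything above 0x10FFFF are rejected.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    if cp < 0x80:
        # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Compare against Python's built-in encoder, one sample per table row.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X}", utf8_encode(cp).hex(), chr(cp).encode("utf-8").hex())
```

Running it prints the bytes from the sketch next to the ones from Python's own UTF-8 encoder, one sample per table row, so any mismatch with the table would show up right away.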
Lee
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple