This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: UTF-8 character encoding


On 6/26/18, Thomas Wolff wrote:

> This encoding scheme is wrong; where did you get it from? Maybe it's the
> obsolete UTF-8...

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

I thought I saw something about UTF-8 being able to handle a 31-bit
value. Is that also obsolete/wrong?

How about this for the current encoding scheme:
http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf

Table 3-6.  UTF-8 Bit Distribution
Bits  Scalar Value                First Byte  Second Byte  Third Byte  Fourth Byte
   7  00000000 0xxxxxxx           0xxxxxxx
  11  00000yyy yyxxxxxx           110yyyyy    10xxxxxx
  16  zzzzyyyy yyxxxxxx           1110zzzz    10yyyyyy     10xxxxxx
  21  000uuuuu zzzzyyyy yyxxxxxx  11110uuu    10uuzzzz     10yyyyyy    10xxxxxx
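
To check my reading of the table, here is a minimal Python sketch of that
bit distribution. The function name utf8_encode is my own (hypothetical),
and the range check assumes the current definition of a Unicode scalar
value (0 to 0x10FFFF, excluding the surrogates 0xD800 to 0xDFFF), so a
31-bit value is rejected rather than encoded.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the Table 3-6 bit layout."""
    # Assumption: only scalar values are valid input, i.e. at most 21 bits
    # and no surrogate code points.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    # Compare against Python's built-in encoder for one value per row.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

The comparison with chr(cp).encode("utf-8") is only a sanity check against
Python's own encoder for one sample value per table row.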

Lee

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

