This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: UTF-8 character encoding

On 25.06.2018 at 20:33, Lee wrote:
On 6/24/18, L A Walsh wrote:
Lee wrote:
So... keep it simple, set
and use vi or something else that comes with cygwin to create the file
and I'll have a file with UTF-8 character encoding - correct?
The first 128 characters of UTF-8 (code points 0 through 127) are
identical to those of ASCII and of latin1 (iso-8859-1).

If you don't use any characters that need accents or special symbols,
then nothing will be encoded differently from ASCII, because it's only
the characters OVER the first 127
(see chart @
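A minimal sketch of the point above, using only Python's built-in codecs. The sample strings are made up for illustration: text that stays within ASCII yields the same bytes in ASCII, latin1, and UTF-8, while an accented character does not.

```python
ascii_text = "plain text"
accented = "caf\u00e9"  # the last character is U+00E9, above the ASCII range

# ASCII-only text produces identical bytes in all three encodings.
same = (
    ascii_text.encode("ascii")
    == ascii_text.encode("latin-1")
    == ascii_text.encode("utf-8")
)

latin1_bytes = accented.encode("latin-1")  # U+00E9 becomes one byte
utf8_bytes = accented.encode("utf-8")      # U+00E9 becomes two bytes

print(same, latin1_bytes.hex(), utf8_bytes.hex())
```

So a file only differs between these encodings once it contains a character above code point 127.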
I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
0xff is part of the utf-8 encoding.  This chart makes things clearer
... at least for me :)
  The proposed UCS transformation format encodes UCS values in the range
  [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
  bytes.  For all encodings of more than one byte, the initial byte
  determines the number of bytes used and the high-order bit in each byte
  is set.

  An easy way to remember this transformation format is to note that the
  number of high-order 1's in the first byte is the same as the number of
  subsequent bytes in the multibyte character:

  Bytes  Bits  Hex Min   Hex Max   Byte Sequence in Binary
  1         7  00000000  0000007f  0zzzzzzz
  2        13  00000080  0000207f  10zzzzzz 1yyyyyyy
  3        19  00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
  4        25  00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
  5        31  02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv
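The rule in the quoted text can be sketched roughly as follows. This only models the scheme exactly as the table states it (ranges and the count of high-order 1 bits); it does not pack any bits, since the table does not say how the value bits are ordered. The names RANGES, length_for_value, and trailing_bytes are hypothetical, chosen for this illustration.

```python
# (Hex Min, Hex Max) per sequence length, copied from the quoted table.
RANGES = [
    (0x00000000, 0x0000007F),  # 1 byte
    (0x00000080, 0x0000207F),  # 2 bytes
    (0x00002080, 0x0008207F),  # 3 bytes
    (0x00082080, 0x0208207F),  # 4 bytes
    (0x02082080, 0x7FFFFFFF),  # 5 bytes
]


def length_for_value(value):
    """Return how many bytes the quoted scheme uses for a UCS value."""
    for length, (lo, hi) in enumerate(RANGES, start=1):
        if lo <= value <= hi:
            return length
    raise ValueError("value outside [0, 0x7fffffff]")


def trailing_bytes(first_byte):
    """Count the high-order 1 bits of the first byte.

    Per the quoted text, this equals the number of subsequent bytes.
    """
    count = 0
    mask = 0x80
    while first_byte & mask:
        count += 1
        mask >>= 1
    return count


print(length_for_value(0x207F), length_for_value(0x2080))
print(trailing_bytes(0b10111111), trailing_bytes(0b11110000))
```

Each of the first four ranges holds exactly 2 to the power of the "Bits" column values, which is consistent with the table subtracting the range minimum before packing.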
This encoding scheme is wrong; where did you get it from? Maybe it's the obsolete UTF-8...
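For comparison with the quoted table, here is a small sketch that prints the bit patterns Python's built-in "utf-8" codec actually produces. The helper name utf8_bits and the sample characters are made up for this illustration.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as space-separated binary."""
    return " ".join(f"{b:08b}" for b in ch.encode("utf-8"))


# One sample character for each sequence length from 1 to 4 bytes.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", utf8_bits(ch))
```

In this output a two-byte sequence starts with 110 and every following byte starts with 10, whereas the quoted table starts a two-byte sequence with 10 and marks following bytes with only a single high-order 1.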
