This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] Remove 0x005C conversion from __jisx0208_from_ucs4_lat1 for ISO-2022-JP


Ulrich Drepper wrote:
> The implemented behavior has been added on demand and changing this
> will break code.

Probably the demand was to map U+005C to a particular ISO-2022-JP character.
But what the glibc code currently does is map U+005C either to one ISO-2022-JP
character (equivalent to U+005C) or to another ISO-2022-JP character
(equivalent to U+FF3C), depending on the preceding characters.

$ printf "\xe3\x81\x82\x5c" | /usr/bin/iconv -f utf-8 -t iso-2022-jp \
  | /usr/bin/iconv -f iso-2022-jp -t ucs-4le \
  | hexdump -e '"%06.6_ax  " 16/4 "%08X " "\n"'
000000  00003042 0000FF3C

$ printf "\xe3\x81\x82 \x5c" | /usr/bin/iconv -f utf-8 -t iso-2022-jp \
  | /usr/bin/iconv -f iso-2022-jp -t ucs-4le \
  | hexdump -e '"%06.6_ax  " 16/4 "%08X " "\n"'
000000  00003042 00000020 0000005C
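The state dependence can be illustrated with a toy model that reproduces the two runs above. It is a hypothetical sketch, not glibc's iso-2022-jp code: the names `encode_model` and `jisx0208_lookup_model` are made up, only ASCII and a two-entry JIS X 0208 table are modelled, and the rule "stay in the current character set if it can represent the character" is an assumption chosen to mimic the observed output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a stateful ISO-2022-JP encoder, NOT glibc's code. */
enum charset { CS_ASCII, CS_JISX0208 };

/* Hypothetical stand-in for __jisx0208_from_ucs4_lat1: returns the JIS X 0208
   code for ucs, or 0 if there is none. The U+005C entry imitates the 0x005C
   conversion that the patch proposes to remove. */
static unsigned jisx0208_lookup_model(unsigned ucs)
{
    switch (ucs) {
    case 0x3042: return 0x2422; /* HIRAGANA LETTER A */
    case 0x005C: return 0x2140; /* row 1, cell 32; glibc decodes it as U+FF3C */
    default:     return 0;
    }
}

/* Write the designation escape sequence for cs; returns its length. */
static size_t emit_escape(enum charset cs, unsigned char *out)
{
    static const unsigned char esc_ascii[3] = { 0x1B, '(', 'B' };
    static const unsigned char esc_jis[3]   = { 0x1B, '$', 'B' };
    memcpy(out, cs == CS_ASCII ? esc_ascii : esc_jis, 3);
    return 3;
}

/* Encode n code points into out (needs at most 5 * n + 3 bytes).
   Returns the number of bytes written, or 0 if a character is unmappable. */
static size_t encode_model(const unsigned *in, size_t n, unsigned char *out)
{
    enum charset cur = CS_ASCII; /* ISO-2022-JP starts in ASCII */
    size_t len = 0;

    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        unsigned ucs = in[i];
        unsigned jis = jisx0208_lookup_model(ucs);

        if (cur == CS_JISX0208 && jis != 0) {
            /* Current set can represent ucs: no escape is emitted.
               This is where U+005C after a kana becomes 0x2140. */
        } else if (ucs < 0x80) {
            if (cur != CS_ASCII) {
                len += emit_escape(CS_ASCII, out + len);
                cur = CS_ASCII;
            }
        } else if (jis != 0) {
            len += emit_escape(CS_JISX0208, out + len);
            cur = CS_JISX0208;
        } else {
            return 0; /* not representable in this toy model */
        }

        if (cur == CS_ASCII) {
            out[len++] = (unsigned char)ucs;
        } else {
            out[len++] = (unsigned char)(jis >> 8);
            out[len++] = (unsigned char)(jis & 0xFF);
        }
    }
    if (cur != CS_ASCII) /* return to the initial state at the end */
        len += emit_escape(CS_ASCII, out + len);
    return len;
}
```

With input {U+3042, U+005C} the model emits `ESC $ B 24 22 21 40 ESC ( B`, while {U+3042, U+0020, U+005C} yields `ESC $ B 24 22 ESC ( B 20 5C`: the same U+005C lands in different character sets purely because of what precedes it.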

If the demand was to map U+005C to FULLWIDTH REVERSE SOLIDUS, the current
behaviour is incorrect. If the demand was to map U+005C to REVERSE SOLIDUS,
the current behaviour is incorrect as well. Either way, it looks like an
implementation bug, not a desired behaviour.

GNU libiconv, by the way, always maps U+005C to REVERSE SOLIDUS.
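For contrast, here is a hypothetical sketch of a state-independent choice, in which every code point below 0x80 (including U+005C) is always written in the ASCII set, so the preceding characters cannot change how U+005C is encoded. This is a toy model with made-up names (`encode_fixed_choice`, `jisx0208_lookup_model`), not libiconv's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, NOT libiconv's source: the character set for each
   character is decided by the character alone, not by the encoder state. */
enum charset { CS_ASCII, CS_JISX0208 };

/* Hypothetical one-entry JIS X 0208 table; note there is no U+005C entry. */
static unsigned jisx0208_lookup_model(unsigned ucs)
{
    return ucs == 0x3042 ? 0x2422 : 0; /* HIRAGANA LETTER A only */
}

/* Write the designation escape sequence for cs; returns its length. */
static size_t emit_escape(enum charset cs, unsigned char *out)
{
    static const unsigned char esc_ascii[3] = { 0x1B, '(', 'B' };
    static const unsigned char esc_jis[3]   = { 0x1B, '$', 'B' };
    memcpy(out, cs == CS_ASCII ? esc_ascii : esc_jis, 3);
    return 3;
}

/* Encode n code points into out (needs at most 5 * n + 3 bytes).
   Returns the number of bytes written, or 0 if a character is unmappable. */
static size_t encode_fixed_choice(const unsigned *in, size_t n, unsigned char *out)
{
    enum charset cur = CS_ASCII;
    size_t len = 0;

    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        unsigned ucs = in[i];
        unsigned jis = 0;
        enum charset want;

        if (ucs < 0x80) {
            want = CS_ASCII; /* U+005C always becomes the byte 0x5C */
        } else if ((jis = jisx0208_lookup_model(ucs)) != 0) {
            want = CS_JISX0208;
        } else {
            return 0; /* not representable in this toy model */
        }
        if (want != cur) {
            len += emit_escape(want, out + len);
            cur = want;
        }
        if (cur == CS_ASCII) {
            out[len++] = (unsigned char)ucs;
        } else {
            out[len++] = (unsigned char)(jis >> 8);
            out[len++] = (unsigned char)(jis & 0xFF);
        }
    }
    if (cur != CS_ASCII) /* return to the initial state at the end */
        len += emit_escape(CS_ASCII, out + len);
    return len;
}
```

Here {U+3042, U+005C} gives `ESC $ B 24 22 ESC ( B 5C` and {U+3042, U+0020, U+005C} gives `ESC $ B 24 22 ESC ( B 20 5C`; in both cases U+005C decodes back to U+005C.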

Bruno

