This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



Re: [PATCH RFC] Character set support


Two comments:
The patch passes a lot of integers around to refer to characters.
That doesn't make a lot of sense to me; we should either be passing
char *, so that we can decode multibyte sequences, or using wchar_t
explicitly and autoconfing for it.

I see hardcoded support for a couple of simplistic charsets; would it
be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
available? Gcj is natively UTF-8, and I have some open Debian bug
reports about this.
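Here is a minimal sketch of such a fallback. It assumes the printing code is handed a `const char *` into the target string, as suggested above, rather than an `int`. The name `utf8_decode` and its interface are hypothetical and do not come from the posted patch. It handles only UTF-8, and it rejects malformed input rather than trying to recover from it.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the posted patch: decode one UTF-8
   sequence starting at S, with at most N bytes available.  On success,
   store the code point in *CP and return the number of bytes consumed
   (1-4).  Return 0 for a malformed, overlong, or truncated sequence.  */
static size_t
utf8_decode (const char *s, size_t n, unsigned long *cp)
{
  const unsigned char *p = (const unsigned char *) s;
  unsigned long c;
  size_t len, i;

  if (n == 0)
    return 0;

  if (p[0] < 0x80)
    {
      *cp = p[0];               /* Plain ASCII: one byte.  */
      return 1;
    }
  else if ((p[0] & 0xE0) == 0xC0)
    {
      len = 2;
      c = p[0] & 0x1F;
    }
  else if ((p[0] & 0xF0) == 0xE0)
    {
      len = 3;
      c = p[0] & 0x0F;
    }
  else if ((p[0] & 0xF8) == 0xF0)
    {
      len = 4;
      c = p[0] & 0x07;
    }
  else
    return 0;                   /* Stray continuation or invalid lead byte.  */

  if (n < len)
    return 0;                   /* Sequence runs past the buffer.  */

  for (i = 1; i < len; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return 0;               /* Expected a continuation byte.  */
      c = (c << 6) | (p[i] & 0x3F);
    }

  /* Reject overlong encodings, UTF-16 surrogates, and values past
     U+10FFFF.  */
  if ((len == 2 && c < 0x80)
      || (len == 3 && c < 0x800)
      || (len == 4 && c < 0x10000)
      || (c >= 0xD800 && c <= 0xDFFF)
      || c > 0x10FFFF)
    return 0;

  *cp = c;
  return len;
}
```

For example, the bytes `\xE2\x82\xAC` decode to U+20AC with a length of 3. The overlong pair `\xC0\x80` is rejected.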

Absolutely --- as I say in the comments to charset.c:

   At the moment, GDB only supports single-byte, stateless character
   sets.  This includes the ISO-8859 family (ASCII extended with
   accented characters, and (I think) Cyrillic, for European
   languages), and the EBCDIC family (used on IBM's mainframes).
   Unfortunately, it excludes many Asian scripts, the fixed- and
   variable-width Unicode encodings, and other desirable things.
   Patches are welcome!  (For example, it would be nice if the Java
   string support could simply get absorbed into some more general
   multi-byte encoding support.)
I think this should be mentioned in the documentation.

Andrew

But it seemed to me that supporting stateless variable-width encodings
was going to be a *lot* of work.  Specifically, how the printing code
should change was a bit beyond me.
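One reason the printing code is hard to change: with a single-byte, stateless charset, byte i of the target buffer is character i, so a printer can walk, count, and truncate a string byte by byte. With a variable-width encoding it has to step through the buffer one sequence at a time. The sketch below shows only that stepping, for UTF-8. The helpers `utf8_seq_len` and `count_chars` are hypothetical and not part of the patch. A real printer would also need to validate continuation bytes and decide how to display malformed input.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence whose
   first byte is LEAD, or 0 if LEAD cannot start a sequence.  */
static size_t
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  if ((lead & 0xF8) == 0xF0)
    return 4;
  return 0;
}

/* Count the characters in the N-byte buffer S by advancing one whole
   sequence per character, instead of one byte per character as a
   single-byte charset allows.  Return -1 on a bad lead byte or on a
   sequence that runs past the end of the buffer.  Continuation bytes
   are not checked; this only illustrates the stepping.  */
static long
count_chars (const char *s, size_t n)
{
  size_t i = 0;
  long count = 0;

  while (i < n)
    {
      size_t len = utf8_seq_len ((unsigned char) s[i]);
      if (len == 0 || len > n - i)
        return -1;
      i += len;
      count++;
    }
  return count;
}
```

For example, the six bytes `h\xC3\xA9llo` ("héllo") count as five characters, so any byte-indexed logic in the printer would be off by one.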

Regarding `int' vs. `wchar_t': the wchar_t we could detect with
autoconf is a host type.  It has no necessary relationship to the
`wchar_t' on the target.  LONGEST might be a better choice than `int',
but `wchar_t' is worse.
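To make the host/target distinction concrete: a target character must be read using the target's width and byte order. A 16-bit Java `char` is one example, and a 4-byte target `wchar_t` is another. The host's `wchar_t` plays no part; it is commonly 32 bits with glibc and 16 bits on Windows. The helper below is a hypothetical sketch, not GDB code. It approximates GDB's `LONGEST` as `long long`, which is an assumption made for illustration.

```c
#include <stddef.h>

/* Assumption: LONGEST is a host integer type at least as wide as any
   target integer; approximated here as long long.  */
typedef long long LONGEST;

/* Hypothetical helper: read a WIDTH-byte target character from BUF
   (WIDTH <= sizeof (LONGEST)), honouring the target's byte order, and
   return its value as a LONGEST.  WIDTH and BIG_ENDIAN describe the
   target, never the host's sizeof (wchar_t).  */
static LONGEST
extract_target_char (const unsigned char *buf, size_t width, int big_endian)
{
  unsigned long long val = 0;
  size_t i;

  for (i = 0; i < width; i++)
    {
      /* Visit bytes from most to least significant.  */
      size_t idx = big_endian ? i : width - 1 - i;
      val = (val << 8) | buf[idx];
    }
  return (LONGEST) val;
}
```

The same value, U+263A, comes out whether it was stored as a 4-byte big-endian or a 2-byte little-endian target character. That would not be true of code that copied bytes into a host `wchar_t`.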


