This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.
Re: printing wchar_t*
> From: Vladimir Prus <ghost@cs.msu.su>
> Date: Mon, 17 Apr 2006 10:17:40 +0400
> Cc: pkoning@equallogic.com,
> gdb@sources.redhat.com
>
> On Friday 14 April 2006 21:10, Eli Zaretskii wrote:
>
> > > > If we want to support wchar_t arrays that store UTF-16, we will need
> > > > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > > > codepoints, and output those.
> > >
> > > > That's what I mentioned in a reply to Jim -- since the current string
> > > > printing code operates "one wchar_t at a time", it's not suitable for
> > > > outputting UTF-16 encoded wchar_t values to the user.
> >
> > I don't understand: if the wchar_t array holds a UTF-16 encoding, then
> > when you receive the entire string, you have a UTF-16 encoding of what
> > you want to display, and you yourself said that displaying a UTF-16
> > > encoded string is easy for you. So where is the problem? Is it only
> > > that you cannot know the length of the UTF-16 encoded string? Or is
> > > there something else missing?
>
> For my frontend -- there's no problem, I can handle UTF-16 myself. However, if
> gdb is to ever produce output in UTF-8
We were talking about wchar_t and wide character strings, which UTF-8
isn't. Let's not confuse ourselves more than we already have. Adding
support to GDB for converting arbitrarily encoded text into UTF-8 would
be a giant job.
> then it should handle surrogate pairs itself. Taking the first and
> second elements of a surrogate pair and converting each to UTF-8 individually
> won't work, for obvious reasons.
I don't think it's quite as ``obvious'' as you imply. Handling
surrogates is generally a job for a display engine, so a UTF-8 enabled
terminal could very well do it itself. I don't know if they actually
do that, though. But anyway, this is a different issue.
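
To make the surrogate-pair point concrete, here is a minimal sketch in C
of what combining a pair before UTF-8 encoding could look like. This is
not GDB code: the function names (utf16_decode, utf8_encode,
utf16_to_utf8) are hypothetical, it assumes the wchar_t data has already
been read into an array of 16-bit units in host byte order, and it
replaces any unpaired surrogate with U+FFFD, which is just one possible
policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the UTF-16 units at s (len units remain,
   len >= 1).  Returns the number of units consumed: 2 for a valid
   surrogate pair, otherwise 1.  An unpaired surrogate yields U+FFFD
   (an assumption of this sketch, not a GDB convention).  */
static size_t
utf16_decode (const uint16_t *s, size_t len, uint32_t *cp)
{
  uint16_t hi = s[0];

  if (hi >= 0xD800 && hi <= 0xDBFF && len >= 2)
    {
      uint16_t lo = s[1];

      if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
          /* Combine the two 10-bit halves and add the 0x10000 offset.  */
          *cp = 0x10000 + (((uint32_t) (hi - 0xD800) << 10)
                           | (uint32_t) (lo - 0xDC00));
          return 2;
        }
    }

  /* A surrogate that is not part of a valid pair.  */
  *cp = (hi >= 0xD800 && hi <= 0xDFFF) ? 0xFFFD : hi;
  return 1;
}

/* Encode one code point (at most U+10FFFF) as UTF-8 into out.
   Returns the number of bytes written, 1 to 4.  */
static size_t
utf8_encode (uint32_t cp, unsigned char *out)
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Convert n UTF-16 units to UTF-8.  The caller must provide at least
   3 * n bytes in out.  Returns the number of bytes written.  */
size_t
utf16_to_utf8 (const uint16_t *in, size_t n, unsigned char *out)
{
  size_t i = 0, written = 0;

  while (i < n)
    {
      uint32_t cp;

      i += utf16_decode (in + i, n - i, &cp);
      written += utf8_encode (cp, out + written);
    }
  return written;
}
```

With this sketch, the pair 0xD834 0xDD1E (U+1D11E) comes out as the
single four-byte sequence F0 9D 84 9E. Encoding each half on its own
would instead give two three-byte sequences, which is not valid UTF-8.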