This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support

From: Tom Tromey <tromey at redhat dot com>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: Julian Brown <julian at codesourcery dot com>, gdb-patches at sourceware dot org
Date: Thu, 15 Jan 2009 16:58:21 -0700
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
References: <20090115202411.5f154657@rex.config> <m3r634jneg.fsf@fleche.redhat.com> <Pine.LNX.4.64.0901152112030.15655@digraph.polyomino.org.uk>
Reply-to: tromey at redhat dot com

>>>>> "Joseph" == Joseph S Myers <joseph@codesourcery.com> writes:

Joseph> (Of course, now C++0x has and C1x has accepted (not yet in a
Joseph> draft) a lot of further new string syntax that Jakub has
Joseph> implemented for GCC 4.5.)

Yeah, I haven't looked at that yet.

Joseph> If you handle input of the new string syntax, do you also
Joseph> handle the interesting concatenation issues?  "\xab" L"c" is a
Joseph> wide string with two characters, L'\xab' and L'c' (plus the
Joseph> trailing NUL); you do not interpret '\xab' as a member of the
Joseph> target narrow character set and convert to the target wide
Joseph> character set (nor do you interpret it as L"\xabc", with a
Joseph> single escape sequence), so you can't convert escape sequences
Joseph> to bytes of a string until after you know whether the final
Joseph> string is narrow or wide (or some other variant, in
Joseph> C++0x/C1x).

I think my patch handles this correctly, though I have not written any
tests for it yet.
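
For reference, the language rule Joseph describes can be checked with a
small standalone C99 snippet.  This only illustrates the rule that the
parser has to reproduce; it is not a test from the patch, and the
function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* A narrow literal next to a wide literal forms one wide string
   (C99 and later).  The \xab escape stops at the end of its own
   literal token, so this has two elements before the NUL.  */
static const wchar_t mixed[] = "\xab" L"c";

/* A single wide literal: the hex escape consumes every following
   hex digit, so this has one element, 0xabc.  */
static const wchar_t fused[] = L"\xabc";

size_t mixed_length(void) { return wcslen(mixed); }
size_t fused_length(void) { return wcslen(fused); }

/* Checks the two interpretations against each other.  */
void check_concatenation(void)
{
    assert(mixed_length() == 2);
    assert(mixed[0] == (wchar_t) 0xab);
    assert(mixed[1] == L'c');
    assert(fused_length() == 1);
    assert(fused[0] == (wchar_t) 0xabc);
}
```

The point for a debugger's parser is the same as for a compiler: the
escape has to be kept unexpanded until the type of the whole
concatenation is known.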

What I do is construct an OP_STRING in a new format.  This is done in
the C parser.  This format describes the resulting type, and then has
each sub-string included separately.  Some escape processing is done
in the lexer, but not everything, and in particular not \x.

Then, the C language overrides the interpretation of OP_STRING to do
its work.  This step converts the strings to the desired target
format.
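
The two steps above could be sketched roughly as follows.  This is a
hypothetical, much simplified model and not the code in the patch:
the struct, the enum and sketch_evaluate are made-up names, only \x
escapes are handled, and ordinary characters are copied through
without any charset conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "new format": the type of the whole
   concatenation plus each sub-string kept separately, with its \x
   escapes still unexpanded.  */
enum sketch_string_kind { SKETCH_NARROW, SKETCH_WIDE };

struct sketch_string {
    enum sketch_string_kind kind;
    size_t n_pieces;
    const char *const *pieces;
};

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Evaluation step: expand S into OUT, which holds OUT_CAP elements
   including the terminating 0.  Returns the element count before the
   terminator, or (size_t) -1 if OUT is too small or a \x value does
   not fit in a narrow element.  */
size_t sketch_evaluate(const struct sketch_string *s,
                       uint32_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < s->n_pieces; i++) {
        const char *p = s->pieces[i];

        while (*p != '\0') {
            uint32_t value;

            if (p[0] == '\\' && p[1] == 'x' && hex_value(p[2]) >= 0) {
                /* The escape cannot run past the end of its piece.  */
                value = 0;
                p += 2;
                while (hex_value(*p) >= 0)
                    value = value * 16 + (uint32_t) hex_value(*p++);
            } else {
                value = (unsigned char) *p++;
            }
            if (s->kind == SKETCH_NARROW && value > 0xff)
                return (size_t) -1;
            if (n + 1 >= out_cap)
                return (size_t) -1;
            out[n++] = value;
        }
    }
    if (out_cap == 0)
        return (size_t) -1;
    out[n] = 0;
    return n;
}
```

With the pieces "\xab" and "c" and kind SKETCH_WIDE this yields the
two elements 0xab and 'c'; with the single piece "\xabc" it yields
the one element 0xabc, matching the distinction Joseph raised.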

This could all be done in the parser, of course, but I chose to defer
part of it to expression evaluation for a reason.  This approach gives
us the ability to use a single expression across multiple inferiors,
which may (in theory -- not practice, yet) have different
target-charset settings.

It does have another user-visible effect, which is that the encoded
value of a string in a breakpoint condition will change when the
target-charset is changed.  I tend to think this is a feature.
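
To make the effect of deferring the conversion concrete, here is a toy
sketch of encoding the same literal text for two different target
charsets at evaluation time.  It is an illustration only: the names
are hypothetical, the host charset is assumed to be ASCII, and only
upper-case letters and digits are mapped for the EBCDIC case.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_charset { SKETCH_ASCII, SKETCH_EBCDIC };

/* Map one host character (assumed ASCII) to TARGET.  Characters this
   toy table does not cover come back as 0.  */
static unsigned char sketch_encode_char(char c, enum sketch_charset target)
{
    if (target == SKETCH_ASCII)
        return (unsigned char) c;
    if (c >= 'A' && c <= 'I') return (unsigned char) (0xC1 + (c - 'A'));
    if (c >= 'J' && c <= 'R') return (unsigned char) (0xD1 + (c - 'J'));
    if (c >= 'S' && c <= 'Z') return (unsigned char) (0xE2 + (c - 'S'));
    if (c >= '0' && c <= '9') return (unsigned char) (0xF0 + (c - '0'));
    return 0;
}

/* Encode TEXT for TARGET into OUT (OUT_CAP bytes, including the
   terminating 0).  Returns the length, or (size_t) -1 if OUT is too
   small.  The stored TEXT never changes; only the result does.  */
size_t sketch_encode(const char *text, enum sketch_charset target,
                     unsigned char *out, size_t out_cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    for (; text[n] != '\0'; n++) {
        if (n + 1 >= out_cap)
            return (size_t) -1;
        out[n] = sketch_encode_char(text[n], target);
    }
    if (out_cap == 0)
        return (size_t) -1;
    out[n] = 0;
    return n;
}
```

The same stored text "A1" comes out as the bytes 0x41 0x31 for the
ASCII target and 0xC1 0xF1 for the EBCDIC target, which is why a
condition comparing against a string literal follows the current
target-charset setting.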

Finally, my patch supports UCNs (universal character names) in strings
and character literals, though, I suspect, incorrectly.  I haven't dug
into it.  In any case the differences are only likely to be noticed in
fairly unusual code.

Tom

References:
- [PATCH/WIP] C/C++ wchar_t/Unicode printing support
  - From: Julian Brown
- Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
  - From: Tom Tromey
- Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
  - From: Joseph S. Myers