This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Re: [RFC] Signed/unsigned character arrays are not strings

From: Jim Blandy <jimb at codesourcery dot com>
To: Mark Kettenis <mark dot kettenis at xs4all dot nl>
Cc: drow at false dot org, eliz at gnu dot org, dewar at adacore dot com, nickrob at snap dot net dot nz, jan dot kratochvil at redhat dot com, Mathieu dot Lacage at sophia dot inria dot fr, gdb at sourceware dot org
Date: Tue, 27 Feb 2007 16:42:31 -0800
Subject: Re: [RFC] Signed/unsigned character arrays are not strings
References: <17887.62990.937672.281975@kahikatea.snap.net.nz> <20070224161315.GA27534@caradoc.them.org> <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net> <20070226004457.GA9926@caradoc.them.org> <17892.4014.160191.285423@kahikatea.snap.net.nz> <45E42969.1030007@adacore.com> <20070227131442.GA20718@caradoc.them.org> <ulkij2tva.fsf@gnu.org> <20070227215316.GA26262@caradoc.them.org> <200702272211.l1RMBVvI028239@brahms.sibelius.xs4all.nl>

Okay, here's a horrible idea.  :)  With this patch:

$ cat chars.c
#include <stdio.h>
#include <stdint.h>

typedef char byte_t;

char *c = "chars";
unsigned char *uc = "unsigned chars";
signed char *sc = "signed chars";
byte_t *b = "bytes";
int8_t *i8 = "int8_t's";
uint8_t *ui8 = "uint8_t's";

int
main (int argc, char **argv)
{
  puts ("Hi!");
}
$ gcc -g chars.c -o chars
$ ~/uberbaum/build-cvs-out/gdb/gdb chars
GNU gdb 6.6.50.20070227-cvs
...
(gdb) print c
$1 = 0x8048450 "chars"
(gdb) print uc
$2 = (unsigned char *) 0x8048456 "unsigned chars"
(gdb) print sc
$3 = (signed char *) 0x8048465 "signed chars"
(gdb) print b
$4 = (byte_t *) 0x8048472
(gdb) print i8
$5 = (int8_t *) 0x8048478
(gdb) print ui8
$6 = (uint8_t *) 0x8048481
(gdb) start
Breakpoint 1 at 0x8048365: file chars.c, line 16.
Starting program: /home/jimb/play/chars 
main () at chars.c:16
16        puts ("Hi!");
(gdb) print $xmm0
$7 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {0 <repeats 16 times>}, v8_int16 = {
    0, 0, 0, 0, 0, 0, 0, 0}, v4_int32 = {0, 0, 0, 0}, v2_int64 = {0, 0}, 
  uint128 = 0x00000000000000000000000000000000}
(gdb) 

Because C uses the same one-byte integer types for characters and for
small numbers, heuristics about what's textual and what's numeric are
inevitable here.  Whether or not you like the one I'm suggesting, we
should definitely consolidate the heuristic in one place, so it's
consistent.

If people like this, there are probably test suite changes needed.

gdb/ChangeLog:
2007-02-27  Jim Blandy  <jimb@codesourcery.com>

	* c-valprint.c (textual_element_type): New function.
	(c_val_print): Use textual_element_type to decide whether to print
	arrays, pointer types, and integer types as strings and characters.
	(c_value_print): Doc fix.

Index: gdb/c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.42
diff -u -r1.42 c-valprint.c
--- gdb/c-valprint.c	26 Jan 2007 20:54:16 -0000	1.42
+++ gdb/c-valprint.c	28 Feb 2007 00:39:07 -0000
@@ -56,6 +56,56 @@
 }
 
 
+/* Return non-zero if an array of TYPE or a pointer to TYPE should be
+   printed as a textual string, or zero if it should be treated as an
+   array of, or pointer to, integers.  */
+static int
+textual_element_type (struct type *type)
+{
+  /* GDB doesn't use TYPE_CODE_CHAR for the C 'char' types; instead,
+     it uses one-byte TYPE_CODE_INT types, with TYPE_NAMEs like
+     "char", "unsigned char", etc. and appropriate flags.  For various
+     reasons, this works out well in some places.
+
+     But this means that we have no clear distinction between types
+     representing text and types representing one-byte integers, used
+     numerically.  It's not too uncommon for programs to use 'unsigned
+     char' and 'signed char' for text.
+
+     So, our heuristic is that, if a one-byte TYPE_CODE_INT has a
+     TYPE_NAME of "char" or something ending with " char", then we
+     treat it as text; otherwise, we assume it's being used as data.
+     This makes all our SIMD types like builtin_type_v8_int8 and the
+     <stdint.h> types like uint8_t print numerically, but all 'char'
+     types print textually.  Code which says what it means does
+     well.  */
+  struct type *true_type = check_typedef (type);
+
+  if (TYPE_CODE (true_type) == TYPE_CODE_CHAR)
+    return 1;
+
+  /* Is this a one-byte integer type?  */
+  if (TYPE_CODE (true_type) == TYPE_CODE_INT
+      && TYPE_LENGTH (true_type) == 1)
+    {
+      int name_len;
+      
+      /* All integer types should have names.  */
+      gdb_assert (TYPE_NAME (type));
+
+      name_len = strlen (TYPE_NAME (type));
+
+      /* Is the name "char", or does it end with " char"?  */
+      if (strcmp (TYPE_NAME (type), "char") == 0
+          || (name_len > 5
+              && strcmp (TYPE_NAME (type) + name_len - 5, " char") == 0))
+        return 1;
+    }
+
+  return 0;
+}
+
+
 /* Print data of type TYPE located at VALADDR (within GDB), which came from
    the inferior at address ADDRESS, onto stdio stream STREAM according to
    FORMAT (a letter or 0 for natural format).  The data at VALADDR is in
@@ -94,11 +144,9 @@
 	    {
 	      print_spaces_filtered (2 + 2 * recurse, stream);
 	    }
-	  /* For an array of chars, print with string syntax.  */
-	  if (eltlen == 1 &&
-	      ((TYPE_CODE (elttype) == TYPE_CODE_INT && TYPE_NOSIGN (elttype))
-	       || ((current_language->la_language == language_m2)
-		   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+
+	  /* Print arrays of textual chars with a string syntax.  */
+          if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's'))
 	    {
 	      /* If requested, look for the first null char and only print
@@ -191,12 +239,11 @@
 	      deprecated_print_address_numeric (addr, 1, stream);
 	    }
 
-	  /* For a pointer to char or unsigned char, also print the string
+	  /* For a pointer to a textual type, also print the string
 	     pointed to, unless pointer is null.  */
 	  /* FIXME: need to handle wchar_t here... */
 
-	  if (TYPE_LENGTH (elttype) == 1
-	      && TYPE_CODE (elttype) == TYPE_CODE_INT
+	  if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's')
 	      && addr != 0)
 	    {
@@ -398,7 +445,7 @@
 	     Since we don't know whether the value is really intended to
 	     be used as an integer or a character, print the character
 	     equivalent as well. */
-	  if (TYPE_LENGTH (type) == 1)
+	  if (textual_element_type (type))
 	    {
 	      fputs_filtered (" ", stream);
 	      LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
@@ -500,7 +547,9 @@
       || TYPE_CODE (type) == TYPE_CODE_REF)
     {
       /* Hack:  remove (char *) for char strings.  Their
-         type is indicated by the quoted string anyway. */
+         type is indicated by the quoted string anyway.
+         (Don't use textual_element_type here; quoted strings
+         are always exactly (char *).)  */
       if (TYPE_CODE (type) == TYPE_CODE_PTR
 	  && TYPE_NAME (type) == NULL
 	  && TYPE_NAME (TYPE_TARGET_TYPE (type)) != NULL

Follow-Ups:
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Jim Blandy

References:
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Nick Roberts
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Daniel Jacobowitz
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Nick Roberts
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: mathieu lacage
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Jan Kratochvil
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Daniel Jacobowitz
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Nick Roberts
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Robert Dewar
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Daniel Jacobowitz
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Eli Zaretskii
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Daniel Jacobowitz
- Re: [RFC] Signed/unsigned character arrays are not strings
  - From: Mark Kettenis
