This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

A case for `void *' for pointers to arbitrary (byte) buffers


   Date: Tue, 03 May 2005 10:42:46 -0400
   From: Andrew Cagney <cagney@gnu.org>

   ... and such debate belongs on gdb@.  Presumably Mark will respond to 
   your proposal.

OK, here it is.

The Problem
-----------

In many places GDB needs to pass along and pick apart blobs of memory.
In many cases, GDB does this using `char *' pointers.  Small blobs of
memory are often allocated on the stack as `char[]' or `unsigned
char[]'.  Unfortunately ISO C doesn't specify whether `char' is signed
or unsigned.  On most systems, char is signed, but on some systems
(most notably powerpc) it is unsigned.  This leads to many
platform-dependent type conversion bugs in code.  Since we have been
sloppy about the (un)signedness of char in the GDB codebase, there
probably are quite a few of these bugs in GDB.

Fortunately recent versions of GCC have started to warn about
suspicous type conversions.  But because of our sloppyness, the
current GDB doesn't compiler with -Werror on systems with this new
version of GCC.  Compiling without -Werror will hide many bugs in our
codebase, so we should fix this in GDB.


The proper pointer type for arbitrary buffers
---------------------------------------------

Looking at ISO C and in particular the ISO C standard library, we
notice that there are two families of functions that deal with blobs
of memory:

* Functions starting with mem, such as memcpy(), memset(), etc.  These
  functions all use the `void *' type to refer to the blobs of memory
  they operate on.

* Functions starting with str, such as strcpy(), strlen(), etc.  These
  functions don't really operate on arbitrary blobs of memory, but
  rather on NUL-terminated character strings.  They all use the `char
  *' type to refer to these strings.

So from a language standpoint of view we should use `char *' for
NUL-terminated sequences of bytes and `void *' for arbitrary sequences
of bytes.  A nice thing about `void *' is that you can implicitly cast
it to every other pointer type in the world.


Why doesn't GDB use `void *'?
-----------------------------

Well, it does, in some places.  But the vast majority of our codebase
predates ISO C, or at least GDB was supposed to be compiled on
compilers that predate ISO C.  We couldn't count on `void *' being
part of the standard.  And even after we made an ISO C compiler a
requirement for compiling GDB new code still followed the bad examples
from the K&R era.  And of course there is also some laziness involved.
One of the good things about `void *' pointers is that you can't do
any pointer arithmetic with them.  So if you want to do pointer
arithmetic (and GDB has some genuine need to do that) you'll have to
explicitly cast to the right pointer type.  This is a good thing.
This forces you to think about what you're doing.  In the cases where
it matters.


An example
----------

An example of a good GDB interface where the above applies is
extract_unsigned_integer().  Its full prototype is:

LONGEST extract_unsigned_integer (const void *addr, int len);

Its implementation converts the `void *' pointer into an `unsigned
char *' pointer before doing pointer arithmetic.


An example of a bad interface is read_memory().  Its full prototype is:

void read_memory (CORE_ADDR memaddr, char *myaddr, int len)

This function reads LEN bytes from MEMADDR and stores them in MYADDR.
But MYADDR is not necessarily a NUL-terminated string.  The second
argument should be `void *'.


Recently a few `struct value'-related functions were converted from
using `char *' to `bfd_byte *'.  If we take for example unpack_long():

LONGEST unpack_long(struct type *, const bfd_byte *valaddr);

Its interface is very similar to extract_unsigned_integer(), in fact
it is implemented using extract_unsigned_integer().  Therefore it
should use the same pointer type as extract_unsigned_integer(): `void
*'.


Do we need a `bfd_byte *' or `gdb_byte *'?
------------------------------------------

We might.  Many implementations will need to cast to some sort of byte
type to do arithmetic or look at individual bytes.  As we have seen,
using can be `char *' is problematic.  In most (problematic) cases we
really mean `unsigned char', only a few will warrant `signed char'.
And `bfd_byte *' is a nice abbreviation.  Personally I'm perfectly
happy with writing `unsigned char'.  IMHO it has the benefit that it
is perfectly clear that this type is unsigned, even to someone who has
never been exposed to the GDB codebase.


Why not use `xxx_byte *' instead of `void *'?
---------------------------------------------

* It's nonstandard.  Why do we need a nonstandard type if a perfectly
  god standard type is available?

* It doesn't really solve the issue.  It only propagates the problem
  one level up or down.  Since `xxx_byte *' is nothing but a typedef
  for `unsigned char *', someone calling a functions with `xxx_byte *'
  as one of its arguments with a `char *' argument will suffer from
  the warning that raised this entire issue; `void *' breaks the chain
  immediately.


Conclusion
----------

It's clear that we should work actively towards reserving the
unqualified `char *' for strings, and probably never ever use an
unqualified `char' as a data type (the C type of a single character is
`int').  Since GCC 4.0 has been released and will probably be used by
many in a few months time, we should solve this issue now, and
preferably in our complete codebase.  Hopefully we can get consesus
soon, and start taking this issue into account when we review patches.

Mark


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]