This is the mail archive of the gdb-patches@sourceware.cygnus.com mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
RFA: utils.c changes for converting to/from DOUBLEST

To: David Taylor <taylor at cygnus dot com>, gdb-patches at sourceware dot cygnus dot com
Subject: RFA: utils.c changes for converting to/from DOUBLEST
From: Kevin Buettner <kevinb at cygnus dot com>
Date: Mon, 17 Jan 2000 18:33:40 -0700
David, et.al,

I've been working on a port of gdb to a 64 bit architecture.  The
floating point registers for this architecture are 82 bits long.  I've
defined an appropriate floatformat struct for this architecture, but
found several bugs in the functions responsible for converting these
arbitrary sized floating point numbers to DOUBLEST (and back again).

Here is a summary of the problems:
    1) Buffer overruns and underruns.
    2) Little endian formats whose length in bits is not
       evenly divisible by eight ends up accessing the
       wrong set of bits.
    3) When extracting a field which is entirely contained
       in one byte, not all of the bits are zero'd that should
       be.

My patch appears in its entirety at the end of this message.  It is
somewhat complicated, so I will attempt to explain the various
problems and why my patch fixes these problems.

The fixes are in get_field() and put_field().  These functions are
called by floatformat_to_doublest() and floatformat_from_doublest()
(respectively) to extract (or insert) a bitfield of at most 32 bits
from/to a floating point number.

get_field() works by first computing a starting byte index (cur_byte)
and an amount to shift (cur_bitshift).  The index refers to a byte in
the floating point number to convert and will start off (always)
referring to a *partial* byte.  By this I mean that some (perhaps all)
of the bits will always need to be shifted out, leaving only the bits
forming the low part of the bitfield.  Even when the the starting byte
*could* refer to the actual full first byte of the field to be
extracted, it won't.  In this case, the algorithm will arrange for
cur_bitshift to be -8 which will cause the byte in question to be
discarded.  (If you're getting an uneasy feeling at this point, I
assure you that your unease is justified.)

Let us first examine the code that the original algorithm uses to
set cur_byte and cur_bitshift.  It looked like this:

  cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
  if (order == floatformat_little || order == floatformat_littlebyte_bigword)
    cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) - cur_byte - 1;
  cur_bitshift =
    ((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;

Here is the equivalent code in the revised algorithm (comments elided):

  if (order == floatformat_little || order == floatformat_littlebyte_bigword)
    {
      int excess = FLOATFORMAT_CHAR_BIT - (total_len % FLOATFORMAT_CHAR_BIT);
      cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) 
                 - ((start + len + excess) / FLOATFORMAT_CHAR_BIT);
      cur_bitshift = ((start + len + excess) % FLOATFORMAT_CHAR_BIT) 
                     - FLOATFORMAT_CHAR_BIT;
    }
  else
    {
      cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
      cur_bitshift =
        ((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;
    }

The first thing to note is that the manner in which cur_byte and
cur_bitshift is computed for big endian float formats remains the
same.

Now let's consider the computation of cur_byte for little endian
(and littlebyte_bigword) formats when total_len is evenly divisible
by 8.  (All existing floatformat structs have a total_len which is
evenly divisible by 8.)

When total_len is evenly divisible by 8, excess will be set to 8. 
(Note:  FLOATFORMAT_CHAR_BIT is defined to be 8.) The fact that excess
is 8 in this circumstance means that

        cur_byte = (total_len / FLOATFORMAT_CHAR_BIT)
                 - ((start + len + 8) / FLOATFORMAT_CHAR_BIT

        =>

        cur_byte = (total_len / FLOATFORMAT_CHAR_BIT)
                 - (((start + len) / FLOATFORMAT_CHAR_BIT + 1)

        =>

        cur_byte = (total_len / FLOATFORMAT_CHAR_BIT)
                 - ((start + len) / FLOATFORMAT_CHAR_BIT - 1

If we choose, we could rewrite this as two assignment statements:

        cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
        cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) - cur_byte - 1;

In other words, cur_byte will be computed the same way in both the
original and revised algorithms when total_len is evenly divisible by
8.

Now let's do a similar analysis of cur_bitshift when total_len
is evenly divisible by 8... Again excess will be 8.  This means
that

      cur_bitshift = ((start + len + 8) % FLOATFORMAT_CHAR_BIT) 
                     - FLOATFORMAT_CHAR_BIT;

But since FLOATFORMAT_CHAR_BIT is defined to be 8, we can rewrite
the above as

      cur_bitshift = ((start + len) % FLOATFORMAT_CHAR_BIT) 
                     - FLOATFORMAT_CHAR_BIT;

Note that this is the same statement which was used to compute
cur_bitshift in the original algorithm.

Okay...  we've verified that both the original and revised algorithms
will function the same when working on big endian floats.  We've also
verified that they will operate the same way for little endian floats
whose length in bits is evenly divisible by 8.  This means that this
portion of the revised algorithm will continue to function as it
always has for all existing floatformats.

So what happens for little endian floatformats when total_len is not 
evenly divisible by eight?  Let us consider a simple example...

    Example 1: Let order == floatformat_little, total_len == 14,
    start == 1 and len == 2.

    This means that we wish to extract two bits as shown
    in the following picture (Note that the bit numbering used by the
    algorithm has bit zero referring to the most significant bit; big
    endian formats have a very regular structure with bit 0 being on
    the far left; OTOH, little endian is somewhat messy):

         Byte 0             Byte 1
    | 1 1 1 1         |                 |
    | 3 2 1 0 9 8 7 6 | 5 4 3 2 1 0 _ _ |
                              -+-
                               |
                               +-- bits to extract

    The original algorithm would cause cur_byte to be set to 0 and
    cur_bitshift to be set to -5.  This means that byte 0 is fetched
    and the contents will be shifted right by 5.  This is plainly
    wrong because a) The wrong byte is being extracted from.  b) Even
    if it were the correct byte, the shift value is incorrect.  c)
    Assuming that both a and b were okay, the high bits after the
    shift are being retained.

    The revised algorithm causes cur_byte to be set to 1 and
    cur_bitshift to be set to -3.  This means that byte 1 is fetched
    and the contents are shifted right by 3 (which is correct). 
    Zeroing of unwanted bits is taken care of later on in the revised
    algorithm.

Let us now turn our attention to the fetching of the first byte.
The original algorithm did it like this:

  result = *(data + cur_byte) >> (-cur_bitshift);

Remember that uneasy feeling you had (or should have had) before?
I'll now give two examples to help perhaps explain that unease.

    Example 2: Let order=floatformat_big, total_len=16, start=0,
    len=16.  We can illustrate the situation with the following
    picture:

         Byte 0             Byte 1
    |                 |     1 1 1 1 1 1 |
    | 0 1 2 3 4 5 6 7 | 8 9 0 1 2 3 4 5 |
      -------------------------+-------
                               |
                               +-- bits to extract

    cur_byte will be 2 for this example and cur_bitshift will be -8.
    But byte 2 doesn't appear in the picture!  Well, suppose we extract
    it anyway.  When we do, we'll just shift it right by 8 (--8 == 8)
    which means that it'll get entirely shifted away!

    Example 3: Let order=floatformat_little, total_len=16, start=0,
    len=16.  (Same as example 2, but with little endian byte order.)
    This has a similar looking picture.

         Byte 0             Byte 1
    | 1 1 1 1 1 1     |                 |
    | 5 4 3 2 1 0 9 8 | 7 6 5 4 3 2 1 0 |
      -------------------------+-------
                               |
                               +-- bits to extract

    In this case (regardless of which algorithm is used), cur_byte
    will be -1.  Sort of.  cur_byte is unsigned, so it'll actually
    be the largest unsigned integer possible.  On machines which
    have the property that 
    
        sizeof(unsigned int) == sizeof (unsigned char *),

    this code will probably work.  The byte immediately before byte 0
    will be read, even though this is technically a buffer underrun. 
    (Just as example 2 illustrates an overrun.) On a machine where
    sizeof(unsigned int) == 4 and sizeof(unsigned char *) == 8, the
    above code is disasterous.  The machine will attempt to fetch
    
        data + 4294967296 
        
    which will likely result in a segmentation violation.  (It did on
    the platform that I'm porting to.)

Okay, so we have overruns and underruns.  How do we fix it?  When
this case occurs, cur_bitshift will always be -8 which means that
*if* the data in question could be fetched safely, it would be
shifted into nothingness anyway.  Here is how we do it safely in
the revised algorithm:

  if (cur_bitshift > -FLOATFORMAT_CHAR_BIT)
    result = *(data + cur_byte) >> (-cur_bitshift);
  else
    result = 0;

I.e, we simply avoid fetching the over/underrun byte and zero out
our result, since the byte would be shifted into nothingness
anyway.

The original algorithm contains the following lines which appear in
the while loop for fetching the bytes after the first byte:

      if (len - cur_bitshift < FLOATFORMAT_CHAR_BIT)
        /* This is the last byte; zero out the bits which are not part of
           this field.  */
        result |=
          (*(data + cur_byte) & ((1 << (len - cur_bitshift)) - 1))
          << cur_bitshift;
      else

As the comment says, these lines are supposed to handle the zeroing of
the unwanted bits.  And in fact it does (if it ever gets into the
loop).  But as we saw in example 1, this code does not account for
what happens when the bit field to extract doesn't straddle two (or
more) bytes.  In order to account for both cases, I simply moved the
zeroing of unwanted bits after the while loop like so:

  if (len < sizeof(result) * FLOATFORMAT_CHAR_BIT)
    /* Mask out bits which are not part of the field */
    result &= ((1UL << len) - 1);

The changes to put_field() are very similar to those made to
get_field(), with the exception that the problem of zeroing unwanted
bits doesn't exist.  Therefore I won't give a blow-by-blow account. 
(Unless you really want me to.)

I have tested this patch on Linux/x86 and Solaris/sparc and see no new
regressions.  (I chose these two because they're little endian and big
endian respectively.)

I request approval for committing these changes.

	* utils.c (get_field, put_field): Fix buffer underruns and
	overruns.  Also, handle case where total_len is not evenly
	divisible by 8.
	(getfield): Make sure zeroing of unwanted bits occurs even
	when bit field to extract does not straddle two or more
	bytes.

Index: utils.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gdb/utils.c,v
retrieving revision 1.229
diff -u -p -r1.229 utils.c
--- utils.c	1999/12/14 12:40:46	1.229
+++ utils.c	2000/01/17 23:53:39
@@ -3203,12 +3203,31 @@ get_field (data, order, total_len, start
   int cur_bitshift;
 
   /* Start at the least significant part of the field.  */
-  cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
   if (order == floatformat_little || order == floatformat_littlebyte_bigword)
-    cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) - cur_byte - 1;
-  cur_bitshift =
-    ((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;
-  result = *(data + cur_byte) >> (-cur_bitshift);
+    {
+      /* We start counting from the other end (i.e, from the high bytes
+	 rather than the low bytes).  As such, we need to be concerned
+	 with what happens if bit 0 doesn't start on a byte boundary. 
+	 I.e, we need to properly handle the case where total_len is
+	 not evenly divisible by 8.  So we compute ``excess'' which
+	 represents the number of bits from the end of our starting
+	 byte needed to get to bit 0. */
+      int excess = FLOATFORMAT_CHAR_BIT - (total_len % FLOATFORMAT_CHAR_BIT);
+      cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) 
+                 - ((start + len + excess) / FLOATFORMAT_CHAR_BIT);
+      cur_bitshift = ((start + len + excess) % FLOATFORMAT_CHAR_BIT) 
+                     - FLOATFORMAT_CHAR_BIT;
+    }
+  else
+    {
+      cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
+      cur_bitshift =
+	((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;
+    }
+  if (cur_bitshift > -FLOATFORMAT_CHAR_BIT)
+    result = *(data + cur_byte) >> (-cur_bitshift);
+  else
+    result = 0;
   cur_bitshift += FLOATFORMAT_CHAR_BIT;
   if (order == floatformat_little || order == floatformat_littlebyte_bigword)
     ++cur_byte;
@@ -3218,20 +3237,16 @@ get_field (data, order, total_len, start
   /* Move towards the most significant part of the field.  */
   while (cur_bitshift < len)
     {
-      if (len - cur_bitshift < FLOATFORMAT_CHAR_BIT)
-	/* This is the last byte; zero out the bits which are not part of
-	   this field.  */
-	result |=
-	  (*(data + cur_byte) & ((1 << (len - cur_bitshift)) - 1))
-	  << cur_bitshift;
-      else
-	result |= *(data + cur_byte) << cur_bitshift;
+      result |= (unsigned long)*(data + cur_byte) << cur_bitshift;
       cur_bitshift += FLOATFORMAT_CHAR_BIT;
       if (order == floatformat_little || order == floatformat_littlebyte_bigword)
 	++cur_byte;
       else
 	--cur_byte;
     }
+  if (len < sizeof(result) * FLOATFORMAT_CHAR_BIT)
+    /* Mask out bits which are not part of the field */
+    result &= ((1UL << len) - 1);
   return result;
 }
 
@@ -3368,15 +3383,28 @@ put_field (data, order, total_len, start
   int cur_bitshift;
 
   /* Start at the least significant part of the field.  */
-  cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
   if (order == floatformat_little || order == floatformat_littlebyte_bigword)
-    cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) - cur_byte - 1;
-  cur_bitshift =
-    ((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;
-  *(data + cur_byte) &=
-    ~(((1 << ((start + len) % FLOATFORMAT_CHAR_BIT)) - 1) << (-cur_bitshift));
-  *(data + cur_byte) |=
-    (stuff_to_put & ((1 << FLOATFORMAT_CHAR_BIT) - 1)) << (-cur_bitshift);
+    {
+      int excess = FLOATFORMAT_CHAR_BIT - (total_len % FLOATFORMAT_CHAR_BIT);
+      cur_byte = (total_len / FLOATFORMAT_CHAR_BIT) 
+                 - ((start + len + excess) / FLOATFORMAT_CHAR_BIT);
+      cur_bitshift = ((start + len + excess) % FLOATFORMAT_CHAR_BIT) 
+                     - FLOATFORMAT_CHAR_BIT;
+    }
+  else
+    {
+      cur_byte = (start + len) / FLOATFORMAT_CHAR_BIT;
+      cur_bitshift =
+	((start + len) % FLOATFORMAT_CHAR_BIT) - FLOATFORMAT_CHAR_BIT;
+    }
+  if (cur_bitshift > -FLOATFORMAT_CHAR_BIT)
+    {
+      *(data + cur_byte) &=
+	~(((1 << ((start + len) % FLOATFORMAT_CHAR_BIT)) - 1)
+	  << (-cur_bitshift));
+      *(data + cur_byte) |=
+	(stuff_to_put & ((1 << FLOATFORMAT_CHAR_BIT) - 1)) << (-cur_bitshift);
+    }
   cur_bitshift += FLOATFORMAT_CHAR_BIT;
   if (order == floatformat_little || order == floatformat_littlebyte_bigword)
     ++cur_byte;
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]