This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



backtrace through 'sleep', (1255 and 1253)


Here's what I've learned so far.

This is the code for 'sleep' in /lib/i686/libc.so.6:

  push %ebp
  xor  %ecx, %ecx
  mov  %esp, %ebp
  push %edi
  xor  %edx, %edx
  ...
  call __i686.get_pc_thunk.bx
  add  $0x7bfab, %ebx
  sub  $0x1cc, %esp
  ...

This is on a Red Hat Linux 8 system, native i686-pc-linux-gnu.

This is C code, not hand-coded assembler!  The "xor" instructions have been
mixed into the prologue.  They are just setting some variables to zero.
The call to __i686.get_pc_thunk.bx comes from gcc -fpic.

Here is the code in i386_frame_cache:

  frame_unwind_register (next_frame, I386_EBP_REGNUM, buf);
  cache->base = extract_unsigned_integer (buf, 4);
  if (cache->base == 0)
    return cache;

  cache->save_regs[I386_EIP_REGNUM] = 4;

  cache->pc = frame_func_unwind (next_frame);
  if (cache->pc != 0)
    i386_analyze_prologue (cache->pc, frame_pc_unwind (next_frame), cache);

  if (cache->locals < 0)
    {
      /* We didn't find a valid frame, which means that CACHE->base
         currently holds the frame pointer for our calling frame.  If
         we're at the start of a function, or somewhere half-way its
         prologue, the function's frame probably hasn't been fully
         setup yet.  Try to reconstruct the base address for the stack
         frame by looking at the stack pointer.  For truly "frameless"
         functions this might work too.  */

      frame_unwind_register (next_frame, I386_ESP_REGNUM, buf);
      cache->base = extract_unsigned_integer (buf, 4) + cache->sp_offset;
    }

The etiology is:

  The prologue analyzer fails on this function because of the 
  'xor %ecx, %ecx'.

  So cache->locals == -1.

  /* We didn't find a valid frame ... */

  So the code behaves as if it were in a frameless function.  It reads
  the stack pointer, adds cache->sp_offset to it, and uses the result
  as the frame base.

Whereas, in reality, the pc is in the middle of 'sleep' (well past the
prologue), and there is a perfectly good frame.  In fact if I undo the
bogus re-assignment to cache->base in this case then the stack trace
works fine.
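
To make the failure mode concrete, here is a toy model of that decision.
It is only a sketch: the struct, the function name and the register
values are made up for illustration and are not gdb's real data
structures.  It mirrors just the part of the excerpt that picks
cache->base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the i386 frame cache.  */
struct toy_frame_cache
{
  uint32_t base;   /* frame base chosen so far */
  long locals;     /* < 0 means the prologue analyzer found no frame */
  long sp_offset;  /* adjustment applied to %esp in the fallback */
};

/* Mirror the decision in the excerpt: start from the unwound %ebp and,
   if the analyzer reported no frame, overwrite the base with
   %esp + sp_offset.  */
static uint32_t
toy_pick_base (struct toy_frame_cache *cache, uint32_t ebp, uint32_t esp)
{
  assert (cache != 0);

  cache->base = ebp;
  if (cache->base == 0)
    return cache->base;

  if (cache->locals < 0)
    cache->base = esp + (uint32_t) cache->sp_offset;

  return cache->base;
}
```

With locals == -1 the perfectly good %ebp value is thrown away in favor
of a value derived from %esp.  With locals >= 0 the %ebp value is kept,
which is what happens when the prologue is analyzed successfully.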

Now, what to do about it ...

Red Hat Linux 8 has an rpm for a debug version of glibc.  The
glibc-debug rpm installs libraries in /usr/lib/debug, rather than
overwriting /lib/i686.  I installed glibc-debug and set LD_LIBRARY_PATH
to /usr/lib/debug, and it worked!  The test cases in gdb/1253 and
gdb/1255 both backtraced just fine!

Also, static-linking with glibc works, because the static version
of 'sleep' has different code (no -fpic) with a prologue that gdb
can digest.

So we can either:

. Document the problem and tell people to use a debugging glibc or
  static-link their program.  Also send a message to vendors that they may
  want to make the debugging glibc the default glibc.  Vendors may even
  want to patch their gcc to not mix other instructions into the prologue,
  because gdb is a lot more sensitive to un-analyzable prologues now.

. Ask the gcc guys directly to not schedule any instructions between
  'push %ebp' and 'mov %esp, %ebp'.

. Change gdb so that the prologue reader is more powerful.  It doesn't
  take much to get through the 'xor %ecx, %ecx' instruction.  The
  trouble is that there could be a billion different instructions
  in there ('mov $immediate, any-register').  The advantage is that
  this would work without any changes to external software.

. Do nothing, let the users suffer.

. Something else?
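
For the "more powerful prologue reader" option, here is a minimal
sketch of what skipping 'xor %reg, %reg' could look like.  This is not
the code in i386_analyze_prologue.  The function names are hypothetical,
and it recognizes only the 32-bit register-to-itself xor (opcode 0x31
or 0x33) between 'push %ebp' (0x55) and 'mov %esp, %ebp' (0x89 0xe5 or
0x8b 0xec).  It works on a byte buffer rather than target memory.

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if the two bytes at P encode 'xor %reg, %reg' with
   the same 32-bit register as source and destination: opcode 0x31 or
   0x33, ModRM with mod == 3 and reg == rm.  */
static int
toy_is_self_xor (const unsigned char *p)
{
  unsigned char modrm = p[1];

  if (p[0] != 0x31 && p[0] != 0x33)
    return 0;
  return (modrm >> 6) == 3 && ((modrm >> 3) & 7) == (modrm & 7);
}

/* Scan CODE for 'push %ebp' ... 'mov %esp, %ebp', tolerating self-xor
   instructions in between.  Return the offset just past the mov, or
   -1 if the pattern is not found.  */
static long
toy_find_frame_setup (const unsigned char *code, size_t len)
{
  size_t i = 1;

  assert (code != NULL);

  if (len < 1 || code[0] != 0x55)       /* push %ebp */
    return -1;

  /* Skip any number of two-byte 'xor %reg, %reg' instructions.  */
  while (i + 1 < len && toy_is_self_xor (code + i))
    i += 2;

  if (i + 1 < len
      && ((code[i] == 0x89 && code[i + 1] == 0xe5)      /* mov %esp, %ebp */
          || (code[i] == 0x8b && code[i + 1] == 0xec)))
    return (long) (i + 2);

  return -1;
}
```

Fed the start of 'sleep' as shown above (55, 31 c9, 89 e5, ...), this
gets past the xor and finds the frame setup.  It still gives up on
anything else the scheduler might put there, which is the "billion
different instructions" problem.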

Michael C

