This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.
backtrace through 'sleep', (1255 and 1253)
- From: Michael Elizabeth Chastain <mec at shout dot net>
- To: gdb at sources dot redhat dot com
- Date: Sat, 2 Aug 2003 11:18:28 -0400
- Subject: backtrace through 'sleep', (1255 and 1253)
Here's what I've learned so far.
This is the code for 'sleep' in /lib/i686/libc.so.6:
    push %ebp
    xor %ecx, %ecx
    mov %esp, %ebp
    push %edi
    xor %edx, %edx
    ...
    call __i686.get_pc_thunk.bx
    add $0x7bfab, %ebx
    sub $0x1cc, %esp
    ...
This is on a Red Hat Linux 8 system, native i686-pc-linux-gnu.
This is C code, not hand-coded assembler! The compiler has scheduled the 'xor'
instructions into the prologue. They just set some variables to zero.
The call to __i686.get_pc_thunk.bx comes from gcc -fpic.
Here is the code in i386_frame_cache:
    frame_unwind_register (next_frame, I386_EBP_REGNUM, buf);
    cache->base = extract_unsigned_integer (buf, 4);
    if (cache->base == 0)
      return cache;
    cache->saved_regs[I386_EIP_REGNUM] = 4;
    cache->pc = frame_func_unwind (next_frame);
    if (cache->pc != 0)
      i386_analyze_prologue (cache->pc, frame_pc_unwind (next_frame), cache);
    if (cache->locals < 0)
      {
        /* We didn't find a valid frame, which means that CACHE->base
           currently holds the frame pointer for our calling frame.  If
           we're at the start of a function, or somewhere half-way its
           prologue, the function's frame probably hasn't been fully
           setup yet.  Try to reconstruct the base address for the stack
           frame by looking at the stack pointer.  For truly "frameless"
           functions this might work too.  */
        frame_unwind_register (next_frame, I386_ESP_REGNUM, buf);
        cache->base = extract_unsigned_integer (buf, 4) + cache->sp_offset;
      }
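To make the failure mode concrete, here is a minimal C model of just the base-selection logic in the excerpt above. The struct, its two fields, and the function name are hypothetical stand-ins, not gdb's real types; the only point is that when locals is -1 the unwound %ebp is thrown away in favor of %esp plus an offset.

```c
#include <stdint.h>

/* Hypothetical, pared-down stand-in for gdb's frame cache: only the
   two fields this sketch needs.  Not gdb's real struct. */
struct frame_cache_sketch
{
  int locals;         /* -1 when no frame setup was recognized */
  int32_t sp_offset;  /* offset accumulated by the prologue analyzer */
};

/* Mirror the base selection in the excerpt: start from the unwound
   %ebp, bail out early if it is zero, and if the analyzer did not
   recognize a frame setup, rebuild the base from %esp instead. */
static uint32_t
frame_base_sketch (uint32_t ebp, uint32_t esp,
                   const struct frame_cache_sketch *cache)
{
  uint32_t base = ebp;

  if (base == 0)
    return base;

  /* The fallback that misfires for 'sleep': a valid %ebp is
     replaced by %esp + sp_offset (unsigned wraparound arithmetic). */
  if (cache->locals < 0)
    base = esp + (uint32_t) cache->sp_offset;

  return base;
}
```

With made-up values locals = -1 and sp_offset = 4, a call with ebp = 0xbffff8a8 and esp = 0xbffff6c0 returns 0xbffff6c4, discarding the perfectly good %ebp; with locals = 0 the same call returns 0xbffff8a8.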
The etiology is:
The prologue analyzer fails on this function because of the
'xor %ecx, %ecx' between 'push %ebp' and 'mov %esp, %ebp'.
So cache->locals == -1, and i386_frame_cache takes the
/* We didn't find a valid frame ... */ branch.
So the code behaves as if it's in a frameless function: it grabs
the stack pointer, adds an offset to it, and uses that as the frame base.
Whereas, in reality, the pc is in the middle of 'sleep' (well past the
prologue), and there is a perfectly good frame. In fact, if I undo the
bogus reassignment of cache->base in this case, the stack trace
works fine.
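To see why the analyzer gives up, it helps to look at the bytes. Below is a hypothetical strict matcher in C that accepts only 'push %ebp' immediately followed by 'mov %esp, %ebp'. It is a simplified illustration of the behavior described above, not gdb's actual i386_analyze_prologue, and the byte values are the standard i386 encodings for the instructions in the listing, assumed rather than dumped from this libc.

```c
#include <stddef.h>

/* Hypothetical strict matcher: 'push %ebp' (0x55) must be followed
   directly by 'mov %esp, %ebp', which assemblers encode as 89 e5 or
   8b ec.  Returns 1 if the frame setup is recognized, 0 otherwise. */
static int
strict_frame_setup (const unsigned char *code, size_t len)
{
  if (len < 3 || code[0] != 0x55)               /* push %ebp */
    return 0;
  return (code[1] == 0x89 && code[2] == 0xe5)   /* mov %esp,%ebp */
      || (code[1] == 0x8b && code[2] == 0xec);  /* same, other form */
}

/* First bytes of the 'sleep' prologue from the listing, assuming
   the usual encodings for each instruction. */
static const unsigned char sleep_prologue[] = {
  0x55,        /* push %ebp       */
  0x31, 0xc9,  /* xor  %ecx, %ecx */
  0x89, 0xe5,  /* mov  %esp, %ebp */
  0x57,        /* push %edi       */
  0x31, 0xd2   /* xor  %edx, %edx */
};

/* A conventional, unscheduled prologue for comparison. */
static const unsigned char classic_prologue[] = {
  0x55,        /* push %ebp       */
  0x89, 0xe5   /* mov  %esp, %ebp */
};
```

On sleep_prologue, strict_frame_setup returns 0 because the byte after 0x55 is 0x31 (the xor), not 0x89 or 0x8b; on classic_prologue it returns 1.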
Now, what to do about it ...
Red Hat Linux 8 has an rpm for a debug version of glibc. The
glibc-debug rpm installs libraries in /usr/lib/debug, rather than
overwriting /lib/i686. I installed glibc-debug and set LD_LIBRARY_PATH
to /usr/lib/debug, and it worked! The test cases in gdb/1253 and
gdb/1255 both backtraced just fine!
Statically linking against glibc also works, because the static version
of 'sleep' has different code (no -fpic) with a prologue that gdb
can digest.
So we can either:
. Document the problem and tell people to use a debugging glibc or
statically link their programs. Also tell vendors that they may
want to make the debugging glibc the default glibc. Vendors may even
want to patch their gcc not to mix other instructions into the prologue,
because gdb is now a lot more sensitive to unanalyzable prologues.
. Ask the gcc guys directly to not schedule any instructions between
'push %ebp' and 'mov %esp, %ebp'.
. Change gdb so that the prologue reader is more powerful. It doesn't
take much to get through the 'xor %ecx, %ecx' instruction. The
trouble is that there could be a billion different instructions
in there ('mov any-register, immediate'). The advantage is that
this would work without any changes to external software.
. Do nothing, let the users suffer.
. Something else?
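The "more powerful prologue reader" option can be sketched as a hypothetical tolerant matcher in C: after 'push %ebp', it skips a bounded number of instructions known not to touch %esp or %ebp, then requires 'mov %esp, %ebp'. The two skippable forms (register self-xor and 'mov $imm32, %reg'), the bound of four, and all names are assumptions for illustration; a real analyzer would need a broader decoder, which is exactly the "billion different instructions" concern.

```c
#include <stddef.h>

/* Arbitrary bound on skipped instructions, chosen for this sketch. */
#define MAX_SKIPPED_INSNS 4

/* Length of a recognized harmless instruction at CODE, or 0.
   Register numbers: 0=eax 1=ecx 2=edx 3=ebx 4=esp 5=ebp 6=esi 7=edi.
   %esp and %ebp are excluded because writing them would break the
   frame setup we are trying to recognize. */
static size_t
harmless_insn_len (const unsigned char *code, size_t len)
{
  /* xor %reg, %reg: opcode 31 or 33, ModRM in register form (mod=11)
     with the same register in both fields. */
  if (len >= 2 && (code[0] == 0x31 || code[0] == 0x33))
    {
      unsigned modrm = code[1];
      unsigned reg = (modrm >> 3) & 7, rm = modrm & 7;
      if ((modrm >> 6) == 3 && reg == rm && reg != 4 && reg != 5)
        return 2;
    }
  /* mov $imm32, %reg: opcode b8+r followed by a 4-byte immediate. */
  if (len >= 5 && code[0] >= 0xb8 && code[0] <= 0xbf)
    {
      unsigned reg = code[0] - 0xb8u;
      if (reg != 4 && reg != 5)
        return 5;
    }
  return 0;
}

/* Returns 1 if 'push %ebp', then up to MAX_SKIPPED_INSNS harmless
   instructions, then 'mov %esp, %ebp' (89 e5 or 8b ec) is found. */
static int
tolerant_frame_setup (const unsigned char *code, size_t len)
{
  size_t pos = 1;
  int skipped = 0;

  if (len < 1 || code[0] != 0x55)   /* push %ebp */
    return 0;

  /* harmless_insn_len never returns more than the bytes left,
     so pos stays <= len. */
  while (skipped < MAX_SKIPPED_INSNS)
    {
      size_t n = harmless_insn_len (code + pos, len - pos);
      if (n == 0)
        break;
      pos += n;
      skipped++;
    }

  if (len - pos < 2)
    return 0;
  return (code[pos] == 0x89 && code[pos + 1] == 0xe5)
      || (code[pos] == 0x8b && code[pos + 1] == 0xec);
}
```

On the 'sleep' bytes (55 31 c9 89 e5 ...) this accepts the frame setup, while a sequence such as 'xor %ebp, %ebp' (31 ed) after the push is still rejected, since skipping it would hide a clobbered frame pointer.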
Michael C