This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


RE: fail to attach to process on Solaris


Thank you very much for your response.

I did peek at /proc from the command line when the breakpoint in
find_procinfo_or_die() was hit - there was no corresponding LWP.  Nor
does it seem that any LWP with that thread number ever existed.

Here's more extensive info, with a complete stack trace, and some
preliminary info printed after gdb attaches.  This time, the thread
number that triggered the problem was 65.

I've also filed this info under bug 13212.  I added
"puts(pi->pathname);" in procfs.c after line 725, to log creation of
procinfo_list entries.

(gdb) c
Continuing.
procfs: couldn't find pid 12276 (kernel thread 67) in procinfo list.
(gdb)

Apparently the function 'procfs.c:find_procinfo_or_die()' fails, and the
exception it throws forces gdb back to the command prompt.

There's some confusion about which threads are in LWPs, and which ones
aren't.  When this failure occurs, this is the stack trace:
Breakpoint 1, find_procinfo_or_die (pid=16946, tid=65) at procfs.c:489
489           if (tid)
(gdb) whe
#0  find_procinfo_or_die (pid=16946, tid=65) at procfs.c:489
#1  0x000a1cd0 in procfs_fetch_registers (ops=0x7293d8, regcache=0x71b1d0,
    regnum=-1) at procfs.c:3483
#2  0x0012feec in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b1d0,
    regnum=-1) at sol-thread.c:457
#3  0x00231af0 in target_fetch_registers (regcache=0x71b1d0, regno=-1)
    at target.c:3417
#4  0x00130e48 in ps_lgetregs (ph=0x700998, lwpid=65, gregset=0xffbfe37c)
    at sol-thread.c:923
#5  0xff0735dc in td_thr_getgregs () from /usr/lib/libthread_db.so.1
#6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
    regnum=68) at sol-thread.c:473
#7  0x00231af0 in target_fetch_registers (regcache=0x71b3b0, regno=68)
    at target.c:3417
#8  0x0016dd6c in regcache_raw_read (regcache=0x71b3b0, regnum=68,
    buf=0xffbfe5c8 "") at regcache.c:604
#9  0x0016e5e0 in regcache_cooked_read (regcache=0x71b3b0, regnum=68,
    buf=0xffbfe5c8 "") at regcache.c:695
#10 0x0016ea0c in regcache_cooked_read_unsigned (regcache=0x71b3b0, regnum=68,
    val=0xffbfe638) at regcache.c:746
#11 0x0016fb18 in regcache_read_pc (regcache=0x71b3b0) at regcache.c:990
#12 0x001f43bc in switch_to_thread (ptid=...) at thread.c:1005
#13 0x0008c024 in switch_to_program_space_and_thread (pspace=0x8264f0)
    at progspace.c:493
#14 0x0014e89c in insert_breakpoint_locations () at breakpoint.c:1895
#15 0x0014e7a4 in insert_breakpoints () at breakpoint.c:1856
#16 0x001da9ec in proceed (addr=18446744073709551615, 
    siggnal=TARGET_SIGNAL_DEFAULT, step=0) at infrun.c:2056
#17 0x001d1084 in continue_1 (all_threads=0) at infcmd.c:701
#18 0x001d1464 in continue_command (args=0x0, from_tty=1) at infcmd.c:793
#19 0x000e276c in do_cfunc (c=0x7435a0, args=0x0, from_tty=1)
    at ./cli/cli-decode.c:67
#20 0x000e60f8 in cmd_func (cmd=0x7435a0, args=0x0, from_tty=1)
    at ./cli/cli-decode.c:1777
#21 0x0006cfa4 in execute_command (p=0x71a879 "", from_tty=1) at top.c:428
#22 0x00201e38 in command_handler (command=0x71a878 "c") at event-top.c:499
#23 0x00202844 in command_line_handler (rl=0x9e7ec8 "c") at event-top.c:704
#24 0x00372098 in rl_callback_read_char () at callback.c:205
#25 0x00200fc0 in rl_callback_read_char_wrapper (client_data=0x0)
    at event-top.c:177
#26 0x00201c8c in stdin_event_handler (error=0, client_data=0x0)
    at event-top.c:434
#27 0x001ffc44 in handle_file_event (data=Cannot access memory at address 0x0
) at event-loop.c:831
#28 0x001fed3c in process_event () at event-loop.c:402
#29 0x001feeb4 in gdb_do_one_event (data=0x0) at event-loop.c:467
#30 0x001f6c20 in catch_errors (func=0x1fed58 <gdb_do_one_event>, 
    func_args=0x0, errstring=0x5473d0 "", mask=6) at exceptions.c:521
#31 0x00104d48 in tui_command_loop (data=0x0) at ./tui/tui-interp.c:172
#32 0x001f7b20 in current_interp_command_loop () at interps.c:291
#33 0x0005defc in captured_command_loop (data=0x0) at ./main.c:228
#34 0x001f6c20 in catch_errors (func=0x5deec <captured_command_loop>, 
    func_args=0x0, errstring=0x5296e0 "", mask=6) at exceptions.c:521
#35 0x0005f750 in captured_main (data=0xffbff340) at ./main.c:936
#36 0x001f6c20 in catch_errors (func=0x5df48 <captured_main>,
    func_args=0xffbff340, errstring=0x5296e0 "", mask=6) at exceptions.c:521
#37 0x0005f794 in gdb_main (args=0xffbff340) at ./main.c:945
#38 0x0005d924 in main (argc=3, argv=0xffbff3b4) at gdb.c:35

Some relevant data:

(gdb) up 4
#4  0x00130e48 in ps_lgetregs (ph=0x700998, lwpid=65, gregset=0xffbfe37c)
    at sol-thread.c:923
923       target_fetch_registers (regcache, -1);
(gdb) p *regcache
$1 = {descr = 0x84fc40, aspace = 0x7aa258, registers = 0x187c9d0 "", 
  register_status = 0x8e13f0 "", readonly_p = 0, ptid = {pid = 16946, 
    lwp = 65, tid = 0}}
(gdb) up 2
#6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
    regnum=68) at sol-thread.c:473
473       val = p_td_thr_getgregs (&thandle, gregset);
(gdb) p *regcache
$2 = {descr = 0x84fc40, aspace = 0x7aa258, registers = 0x846c48 "", 
  register_status = 0x14f37c0 "", readonly_p = 0, ptid = {pid = 16946, 
    lwp = 0, tid = 65}}

glenn.burkhardt $ ls /proc/16946/lwp/
1/   13/  17/  21/  25/  29/  32/  36/  4/   43/  47/  50/  54/  58/  62/  8/
10/  14/  18/  22/  26/  3/   33/  37/  40/  44/  48/  51/  55/  59/  63/  9/
11/  15/  19/  23/  27/  30/  34/  38/  41/  45/  49/  52/  56/  6/   66/
12/  16/  20/  24/  28/  31/  35/  39/  42/  46/  5/   53/  57/  60/  7/
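
A rough way to repeat this check without eyeballing the ls output is
sketched here in Python.  It is only an illustration: the helper names
lwp_ids() and missing_lwps() and the proc_root parameter are made up for
this sketch, and it assumes the /proc/<pid>/lwp/<id>/ layout shown in the
listing.

```python
import os


def lwp_ids(pid, proc_root="/proc"):
    """Return the set of LWP ids that have a directory under <proc_root>/<pid>/lwp.

    Assumes one numerically named subdirectory per LWP, as in the listing.
    """
    lwp_dir = os.path.join(proc_root, str(pid), "lwp")
    return {int(name) for name in os.listdir(lwp_dir) if name.isdigit()}


def missing_lwps(pid, wanted, proc_root="/proc"):
    """Return, sorted, the ids in `wanted` that have no LWP directory."""
    present = lwp_ids(pid, proc_root)
    return sorted(set(wanted) - present)
```

For a directory with the contents listed above,
missing_lwps(16946, range(1, 67)) would return [2, 61, 64, 65].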

And after issuing the 'attach' command, gdb prints:
[New process 16946]
[Thread debugging using libthread_db enabled]
/proc/16946/lwp/59
/proc/16946/lwp/3
/proc/16946/lwp/4
/proc/16946/lwp/5
/proc/16946/lwp/6
/proc/16946/lwp/7
/proc/16946/lwp/8
/proc/16946/lwp/9
/proc/16946/lwp/10
/proc/16946/lwp/11
/proc/16946/lwp/12
/proc/16946/lwp/13
/proc/16946/lwp/14
/proc/16946/lwp/15
/proc/16946/lwp/16
/proc/16946/lwp/17
/proc/16946/lwp/18
/proc/16946/lwp/19
/proc/16946/lwp/20
/proc/16946/lwp/21
/proc/16946/lwp/22
/proc/16946/lwp/23
/proc/16946/lwp/24
/proc/16946/lwp/25
/proc/16946/lwp/26
/proc/16946/lwp/27
/proc/16946/lwp/28
/proc/16946/lwp/29
/proc/16946/lwp/30
/proc/16946/lwp/31
/proc/16946/lwp/32
/proc/16946/lwp/33
/proc/16946/lwp/34
/proc/16946/lwp/35
/proc/16946/lwp/36
/proc/16946/lwp/37
/proc/16946/lwp/38
/proc/16946/lwp/39
/proc/16946/lwp/40
/proc/16946/lwp/41
/proc/16946/lwp/42
/proc/16946/lwp/43
/proc/16946/lwp/44
/proc/16946/lwp/45
/proc/16946/lwp/46
/proc/16946/lwp/47
/proc/16946/lwp/48
/proc/16946/lwp/49
/proc/16946/lwp/50
/proc/16946/lwp/51
/proc/16946/lwp/52
/proc/16946/lwp/53
/proc/16946/lwp/54
/proc/16946/lwp/55
/proc/16946/lwp/56
/proc/16946/lwp/57
/proc/16946/lwp/58
/proc/16946/lwp/60
/proc/16946/lwp/62
/proc/16946/lwp/63
/proc/16946/lwp/66
[New LWP    66        ]
[New LWP    63        ]
[New LWP    62        ]
[New LWP    60        ]
[New LWP    58        ]
[New LWP    57        ]
[New LWP    56        ]
[New LWP    55        ]
[New LWP    54        ]
[New LWP    53        ]
[New LWP    52        ]
[New LWP    51        ]
[New LWP    50        ]
[New LWP    49        ]
[New LWP    48        ]
[New LWP    47        ]
[New LWP    46        ]
[New LWP    45        ]
[New LWP    44        ]
[New LWP    43        ]
[New LWP    42        ]
[New LWP    41        ]
[New LWP    40        ]
[New LWP    39        ]
[New LWP    38        ]
[New LWP    37        ]
[New LWP    36        ]
[New LWP    35        ]
[New LWP    34        ]
[New LWP    33        ]
[New LWP    32        ]
[New LWP    31        ]
[New LWP    30        ]
[New LWP    29        ]
[New LWP    28        ]
[New LWP    27        ]
[New LWP    26        ]
[New LWP    25        ]
[New LWP    24        ]
[New LWP    23        ]
[New LWP    22        ]
[New LWP    21        ]
[New LWP    20        ]
[New LWP    19        ]
[New LWP    18        ]
[New LWP    17        ]
[New LWP    16        ]
[New LWP    15        ]
[New LWP    14        ]
[New LWP    13        ]
[New LWP    12        ]
[New LWP    11        ]
[New LWP    10        ]
[New LWP    9        ]
[New LWP    8        ]
[New LWP    7        ]
[New LWP    6        ]
[New LWP    5        ]
[New LWP    4        ]
[New LWP    3        ]
[New LWP    59        ]
[New Thread 1 (LWP 1)]
[New Thread 3        ]
[New Thread 4 (LWP 4)]
[New Thread 5 (LWP 5)]
[New Thread 6        ]
[New Thread 7 (LWP 7)]
[New Thread 8 (LWP 8)]
[New Thread 9        ]
[New Thread 10 (LWP 10)]
[New Thread 11 (LWP 11)]
[New Thread 12        ]
[New Thread 13 (LWP 13)]
[New Thread 14 (LWP 14)]
[New Thread 15        ]
[New Thread 16 (LWP 16)]
[New Thread 17 (LWP 17)]
[New Thread 18        ]
[New Thread 19 (LWP 19)]
[New Thread 20 (LWP 20)]
[New Thread 21        ]
[New Thread 22        ]
[New Thread 23        ]
[New Thread 24 (LWP 24)]
[New Thread 25        ]
[New Thread 26        ]
[New Thread 27        ]
[New Thread 28 (LWP 28)]
[New Thread 29 (LWP 29)]
[New Thread 30        ]
[New Thread 31 (LWP 31)]
[New Thread 32 (LWP 32)]
[New Thread 33 (LWP 33)]
[New Thread 34        ]
[New Thread 35        ]
[New Thread 36        ]
[New Thread 37        ]
[New Thread 38        ]
[New Thread 39        ]
[New Thread 40        ]
[New Thread 41        ]
[New Thread 42        ]
[New Thread 43 (LWP 43)]
[New Thread 44        ]
[New Thread 45 (LWP 45)]
[New Thread 46        ]
[New Thread 47        ]
[New Thread 48        ]
[New Thread 49        ]
[New Thread 50        ]
[New Thread 51        ]
[New Thread 52        ]
[New Thread 53        ]
[New Thread 54        ]
[New Thread 55        ]
[New Thread 56 (LWP 56)]
[New Thread 57        ]
[New Thread 58 (LWP 58)]
[New Thread 59 (LWP 59)]
[New Thread 60        ]
[New Thread 62        ]
[New Thread 63        ]
[New Thread 66 (LWP 66)]
[New Thread 2        ]
[New Thread 61        ]
[New Thread 64        ]
[New Thread 65        ]
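
To pick out the threads that gdb reported without an LWP, the
"[New Thread N (LWP M)]" lines can be parsed mechanically.  The Python
sketch below is an illustration only; the function names are invented
here, and the regular expression assumes the exact message format shown
in the output above.

```python
import re

# Matches "[New Thread 4 (LWP 4)]" as well as "[New Thread 3        ]".
# Lines such as "[New LWP    66        ]" are deliberately not matched.
_NEW_THREAD = re.compile(r"\[New Thread (\d+)(?: \(LWP (\d+)\))?\s*\]")


def thread_to_lwp(attach_output):
    """Map each thread id to its LWP id, or to None when no LWP was printed."""
    mapping = {}
    for match in _NEW_THREAD.finditer(attach_output):
        tid = int(match.group(1))
        lwp = int(match.group(2)) if match.group(2) else None
        mapping[tid] = lwp
    return mapping


def unbound_threads(attach_output):
    """Return, sorted, the thread ids that were reported without an LWP."""
    mapping = thread_to_lwp(attach_output)
    return sorted(tid for tid, lwp in mapping.items() if lwp is None)
```

Fed the attach output above, unbound_threads() would include 65 in its
result, since that thread was printed without an "(LWP ...)" suffix.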

So, thread 65 isn't executing in an LWP.  But the call to ps_lgetregs()
gets made assuming that the registers from LWP 65 are wanted, instead of
thread 65.  There's no LWP 65 on the procinfo_list, so the call fails.
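
The failure can be mimicked with a toy model in Python.  This is not
gdb's code: the procinfo list is reduced to a set of (pid, lwp) pairs,
ProcfsError is an invented exception type, and the two functions only
borrow the names seen in the stack trace.  The LWP ids used are a few of
those logged at attach time.

```python
class ProcfsError(Exception):
    """Stand-in for the error that throws gdb back to its prompt."""


def find_procinfo_or_die(procinfo_list, pid, tid):
    """Toy lookup: succeed only if (pid, tid) is on the procinfo list."""
    if (pid, tid) in procinfo_list:
        return (pid, tid)
    raise ProcfsError(
        "procfs: couldn't find pid %d (kernel thread %d) in procinfo list."
        % (pid, tid))


def ps_lgetregs(procinfo_list, pid, lwpid):
    """Toy callback: treats its argument as an LWP id, as in frame #4."""
    return find_procinfo_or_die(procinfo_list, pid, lwpid)


if __name__ == "__main__":
    # A few of the LWP ids that were logged when gdb attached.
    procinfo_list = {(16946, lwp) for lwp in (59, 60, 62, 63, 66)}

    print(ps_lgetregs(procinfo_list, 16946, 59))  # an LWP that exists

    try:
        # Thread id 65 handed over as if it were an LWP id.
        ps_lgetregs(procinfo_list, 16946, 65)
    except ProcfsError as err:
        print(err)
```

The second call fails with the same kind of message gdb printed, because
65 is a thread id with no LWP of that number on the list.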


> -----Original Message-----
> From: Pedro Alves [mailto:pedro@codesourcery.com] 
> Sent: Wednesday, September 21, 2011 10:26 AM
> To: Burkhardt, Glenn
> Cc: gdb@sourceware.org
> Subject: Re: fail to attach to process on Solaris
> 
> On Wednesday 21 September 2011 00:22:21, Burkhardt, Glenn wrote:
> > The problem appears to be that the thread debug library has a callback
> > for the register get operation that's connected to
> > "sol-thread.c:ps_lgetregs()".  In the case that fails, the thread
> > exists, but the calling sequence tries to look up registers for an LWP
> > with the same ID as the thread.
> 
> This is Solaris 9, with the default 1:1 model thread library, right?
> 
> > #0  find_procinfo_or_die (pid=12276, tid=67) at procfs.c:489
> > #1  0x000a1cd0 in procfs_fetch_registers (ops=0x7293d8, regcache=0x71b1d0,
> >     regnum=-1) at procfs.c:3483
> > #2  0x0012feec in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b1d0,
> >     regnum=-1) at sol-thread.c:457
> > #3  0x00231af0 in target_fetch_registers (regcache=0x71b1d0, regno=-1)
> >     at target.c:3417
> > #4  0x00130e48 in ps_lgetregs (ph=0x700998, lwpid=67, gregset=0xffbfe37c)
> >     at sol-thread.c:923
> > #5  0xff0735dc in td_thr_getgregs () from /usr/lib/libthread_db.so.1
> > #6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
> >     regnum=68) at sol-thread.c:473
> 
> But what is the rest of the stack trace?  IOW, where's this 
> being called from?
> 
> > 
> > For this stack trace of 'gdb', 'sol_thread_fetch_registers()' is passed
> >
> > (gdb) frame
> > #6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
> >     regnum=68) at sol-thread.c:473
> > 473       val = p_td_thr_getgregs (&thandle, gregset);
> > (gdb) p *regcache
> > $24 = {descr = 0x84fc40, aspace = 0x7aa258, registers = 0x846c48 "",
> >   register_status = 0x14f37c0 "", readonly_p = 0, ptid = {pid = 12276,
> >     lwp = 0, tid = 67}}
> > 
> > So it's looking for registers from a thread that's not associated with
> > an LWP.  But the function 'ps_lgetregs()' is always looking for the
> > registers on the LWP list.
> >
> > I can't see how the callback 'ps_lgetregs()' is connected to the
> > thread debug library.  In fact, the documentation for the thread debug
> > library seems sparse.  I've only been able to find out about it in the
> > man pages and the comments section of sol-thread.c.  So any pointers to
> > documentation would be helpful.
> 
> That's about all there is...  Luckily or not, glibc copied 
> the same interface out of Solaris, so people who understand 
> the Linux version can understand the Solaris one with ease.  
> Older Solaris versions supported an M:N thread model, where 
> multiple user space threads would be mapped to the same 
> kernel thread (LWP), and sometimes even to no kernel thread 
> (LWP) (when they're idle).  libthread_db.so is a library the 
> system provides, that debuggers load into their own address 
> space, and that serves as a bridge between user threads and 
> however they're mapped underneath.  So in this case, GDB 
> wants to fetch the registers of some thread.  It asks 
> libthread_db.so for its registers.  libthread_db.so 
> internally knows that that thread is mapped into LWP 67, and 
> to serve GDB's initial request, it needs to fetch the 
> registers of LWP 67.  libthread_db.so can't read registers 
> off of an LWP itself, but the debugger client can.  So 
> libthread_db.so calls back into the debugger through the 
> `ps_lgetregs' function of the proc_service interface (see man 
> ps_lgetregs).
> ps_lgetregs ends up recursing into 
> sol_thread_fetch_registers, but this time, inferior_ptid 
> points directly into an LWP, so we just pass the request 
> directly to the LWP support layer in procfs.c.  It's at this 
> point that things are failing for some reason.
> 
> So, next step would be understanding whether LWP 67 really 
> still exists or not at the failure point.  Can you find that 
> out peeking at /proc/... from the command line?  Maybe the 
> LWP had just exited while GDB was attaching to the process, 
> but GDB hadn't processed the exit event yet?  Or has GDB failed in the
> thread->lwp id mappings somewhere?
> 
> --
> Pedro Alves
> 

