This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Async signal safe TLS accesses
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Paul Pluzhnikov <ppluzhnikov at google dot com>
- Cc: Andrew Hunter <ahh at google dot com>, Rich Felker <dalias at aerifal dot cx>, GNU C Library <libc-alpha at sourceware dot org>, <allan at archlinux dot org>, Carlos O'Donell <carlos at redhat dot com>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
- Date: Thu, 9 Jan 2014 02:40:27 +0000
- Subject: Re: [PATCH] Async signal safe TLS accesses
On Wed, 8 Jan 2014, Paul Pluzhnikov wrote:
> On Wed, Jan 8, 2014 at 2:04 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>
> > The problem doesn't reproduce under GDB. With a core dump I get the not
> > particularly helpful backtrace:
> >
> > Core was generated by `./tst-tls7 --direct'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 0x0fdef838 in ?? ()
> > (gdb) bt
> > #0 0x0fdef838 in ?? ()
> > #1 <signal handler called>
> > #2 0x0fe999e0 in __GI___libc_malloc (bytes=bytes@entry=128) at malloc.c:2900
> > #3 0x100016e4 in spin (ignored=<optimized out>) at tst-tls7.c:35
> > #4 0x0ffa8d0c in start_thread (arg=0xf6a8b470) at pthread_create.c:311
> > #5 0x0fefe704 in clone ()
> > at ../sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S:102
>
> Where line 2900 is this one, right?
Yes.
> Could you "print *ar_ptr" in frame #2.
$1 = {mutex = 2, flags = 3, fastbinsY = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0}, top = 0xf3d00570, last_remainder = 0x0, bins = {
0xf3d00040, 0xf3d00040, 0xf3d00048, 0xf3d00048, 0xf3d00050, 0xf3d00050,
0xf3d00058, 0xf3d00058, 0xf3d00060, 0xf3d00060, 0xf3d00068, 0xf3d00068,
0xf3d00070, 0xf3d00070, 0xf3d00078, 0xf3d00078, 0xf3d00080, 0xf3d00080,
0xf3d00088, 0xf3d00088, 0xf3d00090, 0xf3d00090, 0xf3d00098, 0xf3d00098,
0xf3d000a0, 0xf3d000a0, 0xf3d000a8, 0xf3d000a8, 0xf3d000b0, 0xf3d000b0,
0xf3d000b8, 0xf3d000b8, 0xf3d000c0, 0xf3d000c0, 0xf3d000c8, 0xf3d000c8,
0xf3d000d0, 0xf3d000d0, 0xf3d000d8, 0xf3d000d8, 0xf3d000e0, 0xf3d000e0,
0xf3d000e8, 0xf3d000e8, 0xf3d000f0, 0xf3d000f0, 0xf3d000f8, 0xf3d000f8,
0xf3d00100, 0xf3d00100, 0xf3d00108, 0xf3d00108, 0xf3d00110, 0xf3d00110,
0xf3d00118, 0xf3d00118, 0xf3d00120, 0xf3d00120, 0xf3d00128, 0xf3d00128,
0xf3d00130, 0xf3d00130, 0xf3d00138, 0xf3d00138, 0xf3d00140, 0xf3d00140,
0xf3d00148, 0xf3d00148, 0xf3d00150, 0xf3d00150, 0xf3d00158, 0xf3d00158,
0xf3d00160, 0xf3d00160, 0xf3d00168, 0xf3d00168, 0xf3d00170, 0xf3d00170,
0xf3d00178, 0xf3d00178, 0xf3d00180, 0xf3d00180, 0xf3d00188, 0xf3d00188,
0xf3d00190, 0xf3d00190, 0xf3d00198, 0xf3d00198, 0xf3d001a0, 0xf3d001a0,
0xf3d001a8, 0xf3d001a8, 0xf3d001b0, 0xf3d001b0, 0xf3d001b8, 0xf3d001b8,
0xf3d001c0, 0xf3d001c0, 0xf3d001c8, 0xf3d001c8, 0xf3d001d0, 0xf3d001d0,
0xf3d001d8, 0xf3d001d8, 0xf3d001e0, 0xf3d001e0, 0xf3d001e8, 0xf3d001e8,
0xf3d001f0, 0xf3d001f0, 0xf3d001f8, 0xf3d001f8, 0xf3d00200, 0xf3d00200,
0xf3d00208, 0xf3d00208, 0xf3d00210, 0xf3d00210, 0xf3d00218, 0xf3d00218,
0xf3d00220, 0xf3d00220, 0xf3d00228, 0xf3d00228, 0xf3d00230, 0xf3d00230,
0xf3d00238, 0xf3d00238, 0xf3d00240, 0xf3d00240, 0xf3d00248, 0xf3d00248,
0xf3d00250, 0xf3d00250, 0xf3d00258, 0xf3d00258, 0xf3d00260, 0xf3d00260,
0xf3d00268, 0xf3d00268, 0xf3d00270, 0xf3d00270, 0xf3d00278, 0xf3d00278,
0xf3d00280, 0xf3d00280, 0xf3d00288, 0xf3d00288, 0xf3d00290, 0xf3d00290,
0xf3d00298, 0xf3d00298, 0xf3d002a0, 0xf3d002a0, 0xf3d002a8, 0xf3d002a8,
0xf3d002b0, 0xf3d002b0, 0xf3d002b8, 0xf3d002b8, 0xf3d002c0, 0xf3d002c0,
0xf3d002c8, 0xf3d002c8, 0xf3d002d0, 0xf3d002d0, 0xf3d002d8, 0xf3d002d8,
0xf3d002e0, 0xf3d002e0, 0xf3d002e8, 0xf3d002e8, 0xf3d002f0, 0xf3d002f0,
0xf3d002f8, 0xf3d002f8, 0xf3d00300, 0xf3d00300, 0xf3d00308, 0xf3d00308,
0xf3d00310, 0xf3d00310, 0xf3d00318, 0xf3d00318, 0xf3d00320, 0xf3d00320,
0xf3d00328, 0xf3d00328, 0xf3d00330, 0xf3d00330, 0xf3d00338, 0xf3d00338,
0xf3d00340, 0xf3d00340, 0xf3d00348, 0xf3d00348, 0xf3d00350, 0xf3d00350,
0xf3d00358, 0xf3d00358...}, binmap = {0, 0, 0, 0}, next = 0xf5f00010,
next_free = 0x0, system_mem = 135168, max_system_mem = 135168}
> Does your compiler and/or configure force GNU2 TLS to be used?
I don't believe it's supported for powerpc, only for ARM and x86/x86_64.
> (How did you configure? Which compiler did you use?)
I didn't use any special options to select a particular powerpc variant,
so it's whatever processor GCC and glibc default to. This particular
build is with GCC 4.7 branch (but the problem also occurs with mainline).
> If GNU2 TLS is not in the picture, then I think it's likely that the
> patch(es) missed some path from TLS into malloc, and therefore arena
> is getting corrupted by TLS access from within a signal. We'll need to
> reproduce this here (fortunately we do have some ppc64 machines to do
> that on).
If the <signal handler called> backtrace is correct, and the main thread
backtrace in dlclose is also correct, I don't understand how a signal
handler could be executing in one thread while dlclose is executing in
the main thread (indeed, the lack of a function name in the frame above
the signal handler would be explained by the handler being located in a
module that's in the process of being unloaded). So I wonder whether
there could be something wrong with sem_wait / sem_post on powerpc (which
has its own sem_post variant), causing dlclose to execute before all
threads have finished handling the signal, although I haven't managed to
identify a bug there (and one might expect such a bug to cause more than
just this one new test to fail).
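
To make the ordering I have in mind concrete, here is a minimal sketch of
the kind of handshake I'd expect between the handlers and the main thread.
It is not the code of tst-tls7.c: run_handshake, on_sigusr1, handler_done
and the other names are hypothetical, the handler body is a stand-in for
the TLS access, and the comment marks where dlclose would go. The point is
only that the main thread must not get past the sem_wait loop until every
thread's handler has posted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 16

/* All names here are hypothetical; this is not taken from tst-tls7.c. */
static sem_t handler_done;   /* posted once by each handler invocation */
static atomic_int handled;   /* number of handler invocations seen */
static atomic_int stop;      /* tells the spinning threads to exit */

static void on_sigusr1(int sig)
{
    (void) sig;
    /* The real test would access TLS from the loaded module here. */
    atomic_fetch_add(&handled, 1);
    sem_post(&handler_done);          /* sem_post is async-signal-safe */
}

static void *spin(void *arg)
{
    (void) arg;
    while (!atomic_load(&stop))
        sched_yield();
    return NULL;
}

/* Signal nthreads threads, wait for every handler to post, and return
   how many handlers had run by the time the waits completed.  */
int run_handshake(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    struct sigaction sa;
    int i;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&handled, 0);
    atomic_store(&stop, 0);
    if (sem_init(&handler_done, 0, 0) != 0)
        abort();

    /* Install the handler before any thread can receive the signal. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        abort();

    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, spin, NULL) != 0)
            abort();
    for (i = 0; i < nthreads; i++)
        if (pthread_kill(tids[i], SIGUSR1) != 0)
            abort();

    /* One wait per thread; retry if the wait itself is interrupted. */
    for (i = 0; i < nthreads; i++)
        while (sem_wait(&handler_done) != 0)
            if (errno != EINTR)
                abort();

    /* Only at this point would it be reasonable to call dlclose. */
    int seen = atomic_load(&handled);

    atomic_store(&stop, 1);
    for (i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&handler_done);
    return seen;
}
```

If sem_wait could return without a matching sem_post having happened, the
value read after the wait loop could be smaller than nthreads, which is
the sort of misbehaviour I am speculating about above.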
--
Joseph S. Myers
joseph@codesourcery.com