This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



mixing clone(CLONE_VM|CLONE_FILES) with libc


I am currently working on a project where I use thread-processes
created with clone(CLONE_VM|CLONE_FILES). The scenario is as follows
(and I apologize that this is a little long):

I am running Linux 2.6.9 with glibc 2.3.4 installed. I have this
problem on both 32-bit and 64-bit x86 architectures.

The application I am working on is multi-threaded. At some point, one
of its threads creates a new thread using pthread_create(). I will call
the new thread M.

M in turn calls clone() with the CLONE_VM|CLONE_FILES flags. I'll call
the resulting thread-process C. M then does some other work and exits.
At any point there may be multiple C's running, but only one M at a
time. C's may be terminated at any point, which is done by sending them
SIGQUIT. C's always call _exit() to terminate.
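
A minimal sketch of this arrangement, not the actual (proprietary)
code: the names spawn_and_quit_c, c_body, on_sigquit, C_STACK_SIZE and
C_EXIT_CODE are hypothetical, and the assumption that C ends up in
_exit() by way of a SIGQUIT handler is mine. The handler is installed
before clone() so that C inherits it (no CLONE_SIGHAND is passed), and
__WALL is needed in waitpid() because the clone flags carry no exit
signal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for clone() and MAP_STACK */
#endif
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define C_STACK_SIZE (64 * 1024)   /* hypothetical stack size for C */
#define C_EXIT_CODE  7             /* hypothetical exit code used by C */

/* Assumed termination path: SIGQUIT handler that calls _exit(). */
static void on_sigquit(int sig)
{
    (void)sig;
    _exit(C_EXIT_CODE);            /* _exit() is async-signal-safe */
}

/* Body of C: sleep until a signal arrives. */
static int c_body(void *arg)
{
    (void)arg;
    for (;;)
        pause();
}

/* Plays the role of M: creates one C, sends it SIGQUIT, reaps it.
 * Returns C's exit code, or -1 on failure. */
int spawn_and_quit_c(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    /* Install the handler first; C gets a copy of the handler table. */
    if (sigaction(SIGQUIT, &sa, &old) != 0)
        return -1;

    void *stack = mmap(NULL, C_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        sigaction(SIGQUIT, &old, NULL);
        return -1;
    }

    /* The stack grows down on x86, so pass the top of the mapping. */
    pid_t pid = clone(c_body, (char *)stack + C_STACK_SIZE,
                      CLONE_VM | CLONE_FILES, NULL);

    /* Restore the caller's own SIGQUIT disposition. */
    sigaction(SIGQUIT, &old, NULL);
    if (pid == -1) {
        munmap(stack, C_STACK_SIZE);
        return -1;
    }

    int status = 0;
    int rc = -1;
    if (kill(pid, SIGQUIT) == 0 &&
        waitpid(pid, &status, __WALL) == pid &&
        WIFEXITED(status))
        rc = WEXITSTATUS(status);

    munmap(stack, C_STACK_SIZE);   /* safe only after C has exited */
    return rc;
}
```

Note that without CLONE_SETTLS, C in this sketch runs on the TLS area
of the thread that called clone(), which is the sharing described in
the next paragraph.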

After much testing, it became apparent that the TLS area used by M
could at some point be deallocated, and C would then crash when making
libc calls that use this area. I wrote my own wrapper for clone so that
I could use the CLONE_SETTLS feature. To create a new TLS area, I
allocate a zeroed 4 KiB page and copy over the 16*sizeof(void *) header.

This works fine as long as C does not call printf(). If C calls
printf(), I seem to run into some sort of deadlock: M gets stuck in
__lll_mutex_lock_wait(), as shown in the backtrace below:

#0  0x0000003e56bd2d2b in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
#1  0x0000003e56d315f0 in _IO_stdfile_2_lock () from /lib64/tls/libc.so.6
#2  0x00000000412089f0 in ?? ()
#3  0x0000003e56b5c9b4 in ?? () from /lib64/tls/libc.so.6
#4  0x0000000000000003 in ?? ()
#5  0x0000000001305f67 in do_free (ptr=0x3e56d2f8c0) at src/tcmalloc.cc:2342
#6  0x0000003e56b5c830 in puts () from /lib64/tls/libc.so.6
#7  0x00000000006fc643 in CloneManager::do_clone (arg=0x1baae40)
    at somedirectory/clone.cc:694
#8  0x0000003e57606137 in start_thread () from /lib64/tls/libpthread.so.0
#9  0x0000003e56bc7543 in clone () from /lib64/tls/libc.so.6
#10 0x0000000000000000 in ?? ()

The code is compiled to inline everything, so the stack trace is not
very informative. do_clone() is the body of M, and the call being
executed at clone.cc:694 is a printf(). Unfortunately I cannot provide
any of my code since it is proprietary.

Whether there are any C's running at this point or not does not make a
difference.

Any ideas on what might be causing this? Do I perhaps need to
initialize any of the remainder of the pthread structure? I have been
stuck on this for a few days and any help whatsoever would be much
appreciated. Please let me know if you want/need more information.

Thank you,

Lukasz

