This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.
Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.
This is a race condition: the sigaction wrapper in linuxthreads/signals.c is not atomic. There is a window between registering the pthread signal handler and storing the real signal handler in sighandler[].old. What is the best way to solve this?

Andreas.
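A minimal sketch of one possible ordering fix. It assumes a library that, like the wrapper described above, installs its own trampoline in the kernel and keeps the user's handler in a table. All names here (user_handler, trampoline, wrapped_sigaction, demo_handler) are hypothetical and are not linuxthreads internals. The user's handler is published before the kernel disposition changes, and the trampoline refuses to call SIG_IGN or SIG_DFL. On glibc, SIG_IGN is defined as ((__sighandler_t) 1), so calling it as a function jumps to address 0x1. That matches the eip='00000001' in the report below. The sketch assumes plain sa_handler handlers (no SA_SIGINFO) and lock-free pointer atomics, and it does not serialize concurrent callers.

```c
#define _GNU_SOURCE            /* for NSIG and sigaction() under strict modes */
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

typedef void (*handler_t)(int);

/* Hypothetical table of the user's real handlers, read by the trampoline.
   Static atomics are zero-initialized (NULL, treated like SIG_DFL). */
static _Atomic(handler_t) user_handler[NSIG];

/* Hypothetical trampoline: the only function the kernel ever calls. */
static void trampoline(int sig)
{
    handler_t h = atomic_load(&user_handler[sig]);

    /* SIG_IGN and SIG_DFL are magic values, not functions: never call them.
       Simplification: the default action for SIG_DFL is not emulated. */
    if (h == NULL || h == SIG_IGN || h == SIG_DFL)
        return;
    h(sig);
}

/* Hypothetical wrapper: publish the real handler BEFORE the kernel can start
   calling the trampoline, so the trampoline never sees a stale entry. */
static int wrapped_sigaction(int sig, const struct sigaction *act,
                             struct sigaction *oact)
{
    struct sigaction kact, koact;
    handler_t prev;

    assert(sig > 0 && sig < NSIG);
    prev = atomic_load(&user_handler[sig]);
    memset(&koact, 0, sizeof(koact));
    if (act != NULL) {
        kact = *act;
        /* Only real functions are routed through the trampoline. */
        if (act->sa_handler != SIG_IGN && act->sa_handler != SIG_DFL)
            kact.sa_handler = trampoline;
        atomic_store(&user_handler[sig], act->sa_handler);   /* step 1 */
    }
    if (sigaction(sig, act != NULL ? &kact : NULL, &koact) != 0) { /* step 2 */
        atomic_store(&user_handler[sig], prev);   /* roll back on failure */
        return -1;
    }
    if (oact != NULL) {
        *oact = koact;
        if (koact.sa_handler == trampoline)
            oact->sa_handler = prev;   /* report the user's handler */
    }
    return 0;
}

/* Hypothetical demo handler used in the usage example. */
static volatile sig_atomic_t demo_hits;

static void demo_handler(int sig)
{
    (void) sig;
    demo_hits++;
}
```

For example, calling wrapped_sigaction(SIGUSR1, &sa, NULL) with sa.sa_handler = demo_handler and then raise(SIGUSR1) runs demo_handler through the trampoline. Switching to SIG_IGN hands that disposition straight to the kernel, and oact still reports demo_handler rather than the trampoline.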
- Subject: bug in thread support
- From: bazsi at balabit dot hu (Balazs Scheidler)
- Date: Sun, 21 Oct 2001 14:22:21 +0200
Hi,

I sent this information and an example program to the Linux kernel folks, but they responded that this must be a libc bug instead, so I'm sending it to you. (The thread on the linux-kernel mailing list should give you additional information.)

The problem: we are developing a massively multithreaded application that sends syslog() messages from its threads. The problem I'm encountering seems to be related to SIGPIPE handling (either the kernel signal code, the libc signal code or the linuxthreads signal code).

Our application starts a new thread for each new TCP session. When the remote end closes its socket, writing to the socket may cause a SIGPIPE to be delivered and EPIPE to be returned from write(). If this SIGPIPE happens at about the same time as a syslog() libc call, a segmentation fault occurs.

Since core dumping of multithreaded programs does not work reliably, I implemented a quick & dirty backtrace function, which dumps the stack when a signal occurs (see the attached test program). My backtrace function reports that the SIGSEGV occurs at virtual address 0x1:

    bazsi@hugefw:~$ cc -g -lpthread stressthreads.c
    bazsi@hugefw:~$ ./a.out
    Signal (11) received, stackdump follows; eax='ffffffe0', ebx='0000001d', ecx='bc5ff96c', edx='00000400', eip='00000001'
    retaddr=0x1, ebp=0xbc5ff944
    retaddr=0x8048a2a, ebp=0xbc5ffd74
    retaddr=0x4001bc9f, ebp=0xbc5ffe34
    bazsi@hugefw:~$ gdb a.out
    GNU gdb 19990928
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i686-pc-linux-gnu"...
    (gdb) info line *0x8048a2a
    Line 80 of "stressthreads.c" starts at address 0x8048a12 <thread_func+118>
    and ends at 0x8048a2d <thread_func+145>.
    (gdb) l stressthreads.c:80
    75      #endif
    76
    77        memset(buf, 'a', sizeof(buf));
    78        for (i = 0; i < 1024; i++)
    79          {
    80            write(fd, buf, sizeof(buf));
    81          }
    82        close(fd);
    83        //syslog(LOG_DEBUG, "thread stopped...%p\n", pthread_self());
    84        free(arg);
    (gdb) x/2i 0x8048a25
    0x8048a25 <thread_func+137>:    call   0x8048680 <write>
    0x8048a2a <thread_func+142>:    add    $0x10,%esp

So the address 0x8048a2a is where the write() call returns.

The attached test program reproduces the SIGSEGV, although the time needed depends on whether you are using an SMP or a non-SMP kernel. With an SMP kernel on more than one processor, it crashes within 1 second.

Some instructions on using the attached test programs:

1) stressthreads.c is the server, which crashes. Compile it with "gcc stressthreads.c -lpthread" and run it. It binds to 0.0.0.0:10000 and listens for incoming connections. For each connection it syslog()s a message and writes 1MB of data to the socket. The syslog() call is protected by a mutex (which I don't think is necessary; glibc seems to do its own locking).

2) test-zorp.py is a small Python script that starts several parallel threads. Each thread connects to the server, reads 1024 bytes of data and closes the connection (this causes a nice SIGPIPE in the server process). Since the script was only put together to reproduce the problem, it does no argument parsing: you will need to adjust the server's IP address at the end of the script (the test() function call).

The application sets the SIGPIPE handler to a dummy function that does nothing but return. (Earlier it was SIG_IGNed, but since I suspected that to be the source of the problems, I changed the code to use an empty function.)

The crash does _NOT_ occur if the threads do not send log messages via syslog(). I have implemented my own syslog() routines for the time being, and with those the crash doesn't occur.
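The report relies on write() returning EPIPE once the peer has closed and SIGPIPE is ignored. A minimal sketch that checks this locally, assuming a socketpair() is an acceptable stand-in for the TCP connection in stressthreads.c. The helper name write_after_peer_close is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ignores SIGPIPE process-wide (as main() in
   stressthreads.c does), closes one end of a stream socketpair, writes to
   the other end and returns the resulting errno (0 if the write succeeded,
   -1 if the setup itself failed). */
static int write_after_peer_close(void)
{
    int sv[2];
    char byte = 'a';
    ssize_t n;
    int err;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                   /* the "remote end" closes its socket */

    errno = 0;
    n = write(sv[0], &byte, 1);     /* would raise SIGPIPE if not ignored */
    err = (n == -1) ? errno : 0;
    close(sv[0]);
    return err;
}
```

With SIGPIPE ignored, the call returns EPIPE instead of killing the process. On Linux, send() with the MSG_NOSIGNAL flag is an alternative that suppresses SIGPIPE for a single call without touching the process-wide disposition.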
I tried to narrow down the problem further, but simply changing the SIGPIPE handler during thread execution (which is what syslog() does) was not enough to trigger it.

There are several defines that change the behaviour of stressthreads.c:

BACKTRACE  when set to 1, use my backtrace function to report the exact location of the SIGSEGV; otherwise SIGSEGV is not caught.
SYSLOG     when set to 1, the threads send info to syslog. The crash doesn't occur when this is 0.
SIGACTION  when set to 1, use SIGPIPE set/reset code similar to what is found in the syslog() function. The crash didn't occur for me with this.

The environment I have here is Debian GNU/Linux potato:

    ii  libc6          2.1.3-18       GNU C Library: Shared libraries and Timezone
    bazsi@hugefw:~$ uname -a
    Linux hugefw 2.2.19 #2 SMP Thu Sep 27 17:23:56 CEST 2001 i686 unknown

(hugefw has two PIII 800MHz processors)

If you need more information, please tell me; I'd be glad to help.

Thanks in advance.

-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1

stressthreads.c:

    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <syslog.h>
    #include <stdio.h>
    #include <string.h>     /* memset() */
    #include <unistd.h>
    #include <stdlib.h>
    #include <errno.h>

    #define BACKTRACE 0     /* 1: install the SIGSEGV stack dumper below */
    #define SYSLOG 1        /* 1: call syslog() from each thread (shows the crash) */
    #define SIGACTION 0     /* 1: mimic syslog()'s SIGPIPE set/reset instead */

    #if BACKTRACE
    static inline void
    z_dump_backtrace(unsigned long eip, unsigned long first_ebp)
    {
      /* NOTE: this is i386 specific; it walks the saved frame-pointer chain */
      unsigned long *ebp;

      fprintf(stderr, "retaddr=0x%lx, ebp=0x%lx\n", eip, first_ebp);
      ebp = (unsigned long *) first_ebp;
      while (ebp > (unsigned long *) &ebp && *ebp)
        {
          fprintf(stderr, "retaddr=0x%lx, ebp=0x%lx\n", *(ebp+1), *ebp);
          ebp = (unsigned long *) *ebp;
        }
    }

    void
    z_fatal_signal_handler(int signo)
    {
      /* i386-specific hack: finds the kernel-saved struct sigcontext at a
         fixed offset from a local variable on the signal stack frame */
      struct sigcontext *p = (struct sigcontext *) (((char *) &p) + 16);

      fprintf(stderr, "Signal (%d) received, stackdump follows; eax='%08lx', ebx='%08lx', ecx='%08lx', edx='%08lx', eip='%08lx'\n",
              signo, p->eax, p->ebx, p->ecx, p->edx, p->eip);
      z_dump_backtrace(p->eip, p->ebp);
      exit(1);
    }
    #endif

    pthread_mutex_t syslog_mutex = PTHREAD_MUTEX_INITIALIZER;

    void *
    thread_func(void *arg)
    {
      int fd = *(int *) arg;
      int i;
      char buf[1024];

    #if SYSLOG
      /* this shows the problem */
      pthread_mutex_lock(&syslog_mutex);
      syslog(LOG_DEBUG, "thread started...%p\n", (void *) pthread_self());
      pthread_mutex_unlock(&syslog_mutex);
    #endif
    #if SIGACTION
      /* this is not enough, the crash doesn't occur */
      struct sigaction sa, oldsa;

      memset(&sa, 0, sizeof(sa));
      sa.sa_handler = SIG_IGN;
      sigaction(SIGPIPE, &sa, &oldsa);
      for (i = 0; i < 102400; )
        i++;
      sigaction(SIGPIPE, &oldsa, NULL);
    #endif

      /* write 1024 x 1KB; fails with EPIPE once the client has gone away */
      memset(buf, 'a', sizeof(buf));
      for (i = 0; i < 1024; i++)
        {
          write(fd, buf, sizeof(buf));
        }
      close(fd);
      //syslog(LOG_DEBUG, "thread stopped...%p\n", (void *) pthread_self());
      free(arg);
      return NULL;
    }

    int
    main(void)
    {
      int fd;
      struct sockaddr_in sin;
      int tmp = 1;

    #if BACKTRACE
      signal(SIGSEGV, z_fatal_signal_handler);
    #endif
      signal(SIGPIPE, SIG_IGN);

      fd = socket(AF_INET, SOCK_STREAM, 0);
      memset(&sin, 0, sizeof(sin));
      sin.sin_family = AF_INET;
      sin.sin_port = htons(10000);
      sin.sin_addr.s_addr = htonl(INADDR_ANY);
      setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &tmp, sizeof(tmp));
      if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0)
        {
          perror("bind");
          return 1;
        }
      listen(fd, 255);
      while (1)
        {
          int newfd;
          socklen_t tmplen;
          pthread_t t;

          tmplen = sizeof(sin);
          newfd = accept(fd, (struct sockaddr *) &sin, &tmplen);
          if (newfd == -1)
            {
              perror("accept");
            }
          else
            {
              /* one thread per connection; the thread frees 'state' */
              int *state = (int *) malloc(sizeof(int));

              *state = newfd;
              pthread_create(&t, NULL, thread_func, state);
            }
        }
    }

test-zorp.py:

    #!/usr/bin/python
    from socket import *
    from time import time, sleep
    from thread import start_new_thread, get_ident
    from os import system
    import sys

    def httptest(name, url):
        id = get_ident()
        j = 0
        while 1:
            j = j + 1
            i = "%u,%u" % (id, j)
            print "%s,%s,started" % (name, i)
            t1 = time()
            try:
                f = socket(AF_INET, SOCK_STREAM)
                f.connect(url)
                f.recv(1024)    # socket objects have recv(), not read()
                f.close()       # closing early makes the server's write() hit EPIPE/SIGPIPE
            except:
                print "%s,%s,failed,%s" % (name, i, sys.exc_value)
            t2 = time()
            print "%s,%s,elapsed,%f" % (name, i, (t2 - t1))

    def test(name, url, count):
        for i in range(1, count):
            print "starting %s thread(%u,%u)" % (name, i, count)
            start_new_thread(httptest, (name, url))

    # adjust the server address here
    test("rovid", ("192.168.131.149", 10000), 100)
    # ping runs until interrupted, keeping the main thread (and the client threads) alive
    system("ping 192.168.131.149")
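The SIGACTION variant above installs SIG_IGN for the duration. If, as the diagnosis at the top suggests, the wrapper interposes its own handler only when a real function is registered (not for SIG_IGN or SIG_DFL), then that variant never opens the window. A minimal sketch of a variant that installs a do-nothing function instead and then restores the previous disposition. It is an assumption, not verified here, that this mirrors what glibc's syslog() did in 2.1.3. The names noop_handler and sigpipe_dance are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a library's temporary SIGPIPE handler. */
static void noop_handler(int sig)
{
    (void) sig;
}

/* Install a real function (not SIG_IGN) as the SIGPIPE handler, run the
   optional work callback, then restore the previous disposition.
   Returns 0 on success, -1 if either sigaction() call fails. */
static int sigpipe_dance(void (*work)(void))
{
    struct sigaction sa, oldsa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &oldsa) != 0)
        return -1;
    if (work != NULL)
        work();
    return sigaction(SIGPIPE, &oldsa, NULL);
}
```

Calling sigpipe_dance() from thread_func in place of the SIGACTION block would test whether the crash reproduces without syslog(). Whether it does is not known here.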
-- 
Andreas Schwab                                  "And now for something
Andreas.Schwab@suse.de                           completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5