This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



[bazsi@balabit.hu (Balazs Scheidler)] bug in thread support


This is a race condition: the sigaction wrapper in linuxthreads/signals.c
is not atomic. There is a window between registering the pthread signal
handler and storing the real signal handler in sighandler[].old. How can
this best be solved?

Andreas.



Hi,

I sent this information and an example program to the Linux kernel
folks, but they responded that this must be a libc bug, so I'm sending it
to you. (The thread on the linux-kernel mailing list should give you
additional information beyond this message.)

So the problem: we are developing a massively multithreaded application
whose threads send messages with syslog(). The problem I'm encountering
seems to be related to SIGPIPE handling (in either the kernel signal code,
the libc signal code, or the linuxthreads signal code).

Our application starts a new thread for each new TCP session. When the
remote end closes its socket, writing to it may cause a SIGPIPE to be
delivered and write() to return EPIPE. If this SIGPIPE happens at about the
same time as a syslog() libc call, a segmentation fault occurs. Since core
dumps of multithreaded programs do not work reliably, I implemented a
quick-and-dirty backtrace function, which dumps the stack when a signal
occurs (see the attached test program).
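As an alternative to walking `%ebp` by hand (which is i386-specific and breaks with `-fomit-frame-pointer`), glibc provides `backtrace()` and `backtrace_symbols_fd()` in `<execinfo.h>`. A minimal sketch, assuming glibc; the helper names `dump_backtrace_fd`, `fatal_signal_handler` and `install_fatal_handler` are hypothetical. `backtrace()` is not formally async-signal-safe, so treat this as a debugging aid rather than a guaranteed-safe handler.

```c
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Capture the current call stack and write one line per frame to fd.
   Returns the number of frames captured. */
int dump_backtrace_fd(int fd)
{
  void *frames[MAX_FRAMES];
  int n = backtrace(frames, MAX_FRAMES);

  backtrace_symbols_fd(frames, n, fd);  /* writes directly, no malloc() */
  return n;
}

static void fatal_signal_handler(int signo)
{
  static const char msg[] = "fatal signal, stackdump follows\n";

  (void) signo;
  write(STDERR_FILENO, msg, sizeof(msg) - 1);
  dump_backtrace_fd(STDERR_FILENO);
  _exit(1);
}

/* Call once at startup: the first backtrace() call may load libgcc
   (allocating memory), which is risky inside a signal handler. */
void install_fatal_handler(void)
{
  void *warmup[1];

  backtrace(warmup, 1);
  signal(SIGSEGV, fatal_signal_handler);
}
```

Linking with `-rdynamic` lets `backtrace_symbols_fd()` print function names instead of bare addresses.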

My backtrace function reports that the SIGSEGV occurs at virtual address
0x1:

bazsi@hugefw:~$ cc -g -lpthread stressthreads.c 
bazsi@hugefw:~$ ./a.out 
Signal (11) received, stackdump follows; eax='ffffffe0', ebx='0000001d', ecx='bc5ff96c', edx='00000400', eip='00000001'
retaddr=0x1, ebp=0xbc5ff944
retaddr=0x8048a2a, ebp=0xbc5ffd74
retaddr=0x4001bc9f, ebp=0xbc5ffe34
bazsi@hugefw:~$ gdb a.out 
GNU gdb 19990928
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) info line *0x8048a2a 
Line 80 of "stressthreads.c" starts at address 0x8048a12 <thread_func+118>
   and ends at 0x8048a2d <thread_func+145>.
(gdb) l stressthreads.c:80
75	#endif
76	
77	  memset(buf, 'a', sizeof(buf));
78	  for (i = 0; i < 1024; i++)
79	    {
80	      write(fd, buf, sizeof(buf));
81	    }
82	  close(fd);
83	  //syslog(LOG_DEBUG, "thread stopped...%p\n", pthread_self());
84	  free(arg);
(gdb) x/2i 0x8048a25
0x8048a25 <thread_func+137>:	call   0x8048680 <write>
0x8048a2a <thread_func+142>:	add    $0x10,%esp

So the return address 0x8048a2a points to the instruction right after the write() call, i.e. where write() returns.

The attached test program reproduces the SIGSEGV, although the time needed
to do so depends on whether you are using an SMP or a non-SMP kernel. An SMP
kernel on a machine with more than one processor crashes within 1 second.

Some instructions on how to use the attached test programs:
1) stressthreads.c is the server, which crashes. Compile it with
     gcc stressthreads.c -lpthread

   and run it. It binds to 0.0.0.0:10000 and listens for incoming
   connections. For each connection it syslog()s a message and writes 1MB
   of data to the accepted socket. The syslog() call is protected by a
   mutex (which I don't think is necessary; glibc seems to do its own
   locking).

2) test-zorp.py, a small Python script that starts several parallel
   threads; each one connects to the server, reads 1024 bytes of data, and
   closes the connection (which causes a nice SIGPIPE in the server
   process).

   Since this script was only put together to reproduce the problem, it
   does no argument parsing. You will need to adjust the server's IP
   address in the test() call at the end of the script.
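For readers who prefer C, one iteration of the Python client's loop can be sketched as follows. `fetch_and_close` is a hypothetical helper: it connects, reads up to `len` bytes, and closes while the server may still be writing, which is what is expected to make the server's later `write()` calls fail with EPIPE (and raise SIGPIPE).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to ip:port, read up to len bytes, then close the socket even
   though the server may still be sending.  Returns the number of bytes
   read, or -1 if the connection could not be made. */
ssize_t fetch_and_close(const char *ip, unsigned short port,
                        char *buf, size_t len)
{
  struct sockaddr_in sin;
  size_t got = 0;
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  if (fd == -1)
    return -1;
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(port);
  if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1
      || connect(fd, (struct sockaddr *) &sin, sizeof(sin)) == -1)
    {
      close(fd);
      return -1;
    }
  while (got < len)
    {
      ssize_t n = read(fd, buf + got, len - got);
      if (n <= 0)
        break;                 /* EOF or error: stop reading */
      got += (size_t) n;
    }
  close(fd);                   /* unread data left: peer's writes may now fail */
  return (ssize_t) got;
}
```

Calling it in a loop from many threads, e.g. `fetch_and_close("192.168.131.149", 10000, buf, sizeof(buf))`, mirrors what test-zorp.py does.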

The application sets the SIGPIPE handler to a dummy function that does
nothing but return. (Earlier it was set to SIG_IGN, but since I suspected
that as the source of the problems, I changed the code to use an empty
function. The attached test program still uses SIG_IGN.)
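If the goal is only that a closed peer should not raise SIGPIPE, Linux (since 2.2) also supports the `MSG_NOSIGNAL` flag for `send()`, which reports the condition purely as `EPIPE` and leaves signal dispositions untouched. A sketch of a write loop using it; `send_all_nosignal` is a hypothetical name, and this sidesteps the handler rather than fixing the libc race.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Write all of buf to a connected socket without ever raising SIGPIPE.
   Returns 0 on success, -1 with errno set (e.g. EPIPE) on failure. */
int send_all_nosignal(int fd, const char *buf, size_t len)
{
  while (len > 0)
    {
      ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
      if (n == -1)
        {
          if (errno == EINTR)
            continue;          /* interrupted before sending: retry */
          return -1;
        }
      buf += n;
      len -= (size_t) n;
    }
  return 0;
}
```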

The crash does _NOT_ occur if the threads do not send log messages via
syslog(). For the time being I have implemented my own syslog() routines,
and with them the crash doesn't occur. I tried to narrow the problem down
further, but simply changing the SIGPIPE handler during thread execution
(which is what syslog() does) was not enough to trigger it.
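The replacement syslog() routines are not shown in the report. A minimal sketch of what such a routine might look like, under stated assumptions: it sends one `<PRI>message` datagram to a Unix-domain socket (normally `/dev/log`, assumed here to be a datagram socket) and never calls `sigaction()`. `my_syslog_to` is hypothetical and omits the timestamp and tag that glibc's syslog() adds.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Format "<priority>message" and send it as one datagram to the Unix
   socket at path.  MSG_NOSIGNAL keeps sendto() from raising SIGPIPE, so
   no signal handler is ever installed.  Returns 0 on success, -1 on
   failure. */
int my_syslog_to(const char *path, int priority, const char *fmt, ...)
{
  struct sockaddr_un addr;
  char msg[1024];
  va_list ap;
  int len, fd;
  ssize_t rc;

  len = snprintf(msg, sizeof(msg), "<%d>", priority);
  va_start(ap, fmt);
  vsnprintf(msg + len, sizeof(msg) - len, fmt, ap);
  va_end(ap);

  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

  fd = socket(AF_UNIX, SOCK_DGRAM, 0);   /* fresh socket per call: no shared state */
  if (fd == -1)
    return -1;
  rc = sendto(fd, msg, strlen(msg), MSG_NOSIGNAL,
              (struct sockaddr *) &addr, sizeof(addr));
  close(fd);
  return rc == -1 ? -1 : 0;
}
```

For example, `my_syslog_to("/dev/log", LOG_USER | LOG_DEBUG, "thread started...%p", (void *) pthread_self())`.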

There are several macros changing the behaviour of stressthreads.c; each is
enabled when set to 1 (they are tested with #if):

BACKTRACE when enabled, my backtrace function reports the exact location
          of the SIGSEGV; otherwise SIGSEGV is not caught.
SYSLOG    when enabled, the threads send info to syslog. The crash doesn't
          occur with this disabled.
SIGACTION when enabled, the threads use SIGPIPE set/reset code similar to
          what is found in the syslog() function. The crash didn't occur
          for me.

The environment I have here is Debian GNU/Linux potato:

ii  libc6          2.1.3-18       GNU C Library: Shared libraries and Timezone
bazsi@hugefw:~$ uname -a
Linux hugefw 2.2.19 #2 SMP Thu Sep 27 17:23:56 CEST 2001 i686 unknown

(hugefw has two PIII 800 MHz processors)

If you need more information, please tell me; I'd be glad to help.

Thanks in advance.
-- 
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <signal.h>
#include <syslog.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/un.h>
#include <string.h>   /* memset() */
#include <errno.h>

#define BACKTRACE 0
#define SYSLOG 1
#define SIGACTION 0

#if BACKTRACE
static inline void
z_dump_backtrace(unsigned long eip, unsigned long first_ebp)
{
  /* NOTE: this is i386 specific */
  unsigned long *ebp;
  
  fprintf(stderr, "retaddr=0x%lx, ebp=0x%lx\n", eip, first_ebp);
  
  ebp = (unsigned long *) first_ebp;
  while (ebp > (unsigned long *) &ebp && *ebp) 
    {
      fprintf(stderr, "retaddr=0x%lx, ebp=0x%lx\n", *(ebp+1), *ebp);
      ebp = (unsigned long *) *ebp;
    }
}

void
z_fatal_signal_handler(int signo)
{
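  /* i386-only hack: assumes the kernel-pushed struct sigcontext lies 16
     bytes above this local variable in the (non-RT) signal frame; this
     depends on how the compiler lays out this function's frame. */
  struct sigcontext *p = (struct sigcontext *) (((char *) &p) + 16);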

  fprintf(stderr, "Signal (%d) received, stackdump follows; eax='%08lx', ebx='%08lx', ecx='%08lx', edx='%08lx', eip='%08lx'\n",
        signo, p->eax, p->ebx, p->ecx, p->edx, p->eip);
  z_dump_backtrace(p->eip, p->ebp);
  exit(1);
}
#endif

pthread_mutex_t syslog_mutex = PTHREAD_MUTEX_INITIALIZER;

void *thread_func(void *arg)
{
  int fd = *(int *) arg;
  int i;
  char buf[1024];

#if SYSLOG
  /* this shows the problem */
  pthread_mutex_lock(&syslog_mutex);
  syslog(LOG_DEBUG, "thread started...%p\n", (void *) pthread_self());
  pthread_mutex_unlock(&syslog_mutex);
#endif

#if SIGACTION
  /* this is not enough, the crash doesn't occur */
  struct sigaction sa, oldsa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  sigaction(SIGPIPE, &sa, &oldsa);
  
  for (i = 0; i < 102400; )
    i++;
  sigaction(SIGPIPE, &oldsa, NULL);
#endif

  memset(buf, 'a', sizeof(buf));
  for (i = 0; i < 1024; i++)
    {
      write(fd, buf, sizeof(buf));
    }
  close(fd);
  //syslog(LOG_DEBUG, "thread stopped...%p\n", pthread_self());
  free(arg);
  return NULL;
}

int main()
{
  int fd;
  struct sockaddr_in sin;
  int tmp = 1;
  
#if BACKTRACE
  signal(SIGSEGV, z_fatal_signal_handler);
#endif
  signal(SIGPIPE, SIG_IGN);
  
  fd = socket(AF_INET, SOCK_STREAM, 0);
  
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(10000);
  sin.sin_addr.s_addr = INADDR_ANY;
  
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &tmp, sizeof(tmp));
  
  if (bind(fd, (struct sockaddr *) &sin, sizeof(sin)) < 0)
    {
      perror("bind");
      return 0;
    }
  
  listen(fd, 255);

  while (1)
    {
      int newfd;
      socklen_t tmplen;
      pthread_t t;
      
      tmplen = sizeof(sin);
      newfd = accept(fd, (struct sockaddr *) &sin, &tmplen);
      if (newfd == -1)
        {
          perror("accept");
        }
      else
        {
          int *state = (int *) malloc(sizeof(int));
          
          *state = newfd;
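          if (pthread_create(&t, NULL, thread_func, state) == 0)
            pthread_detach(t);  /* never joined: free resources on exit */
          else
            {
              close(newfd);
              free(state);
            }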
        }
    }
}
#!/usr/bin/python

from socket import *
from time import time, sleep


from thread import start_new_thread, get_ident
from os import system
import sys

def httptest(name,url):
        id = get_ident()
        j=0
        while 1:
                j=j+1
                i="%u,%u"%(id,j)
                print "%s,%s,started"%(name,i)
                t1=time()
                try:
                        f=socket(AF_INET, SOCK_STREAM)
                        f.connect(url)
                        f.recv(1024)
                        f.close()
                except:
                        print "%s,%s,failed,%s"%(name,i,sys.exc_value)
                t2=time()
                print "%s,%s,elapsed,%f"%(name,i,(t2-t1))

def test(name,url,count):
        for i in range(1,count):
                print "starting %s thread(%u,%u)"%(name,i,count)
                start_new_thread(httptest,(name,url))

test("rovid", ("192.168.131.149", 10000), 100)
system("ping 192.168.131.149")






-- 
Andreas Schwab                                  "And now for something
Andreas.Schwab@suse.de				completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
