This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Unwarranted assumption in tst-waitid, or a kernel bug?


Greetings,

We've recently noticed intermittent failures in posix/tst-waitid and
rt/tst-mqueue5 under newer kernels.

The attached test is distilled from tst-waitid, and

- passes on kernel 2.6.18
- fails after ~30000 iterations on kernel 2.6.26
- fails after ~10 iterations on kernel 2.6.34

In addition to kernels we build ourselves, the failure has been observed on
the "stock" Lucid distribution (2.6.32-24-generic #41-Ubuntu SMP), as well as
Fedora 11 (2.6.29.6-167.fc11.i586) and Fedora 13 (2.6.33.3-85.fc13.i686,
2.6.34.6-54.fc13.x86_64), but only on multi-processor machines.

The test succeeds when built with -DSKIP_SIGSTOP.

Is there some standard that says that the glibc expectation is correct, and
that SIGCHLD *must* be delivered before waitpid() returns?

If not, it seems that tst-waitid should be fixed (e.g. by nanosleep()ing
for 1 usec, though there is probably a better fix).

Thanks,
--
Paul Pluzhnikov



#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>

// Both are read inside the signal handler, hence volatile.
volatile sig_atomic_t expecting_sigchld;
volatile int fork_counter;

void sighandler(int signo)
{
  if (signo != SIGCHLD) {
    fprintf(stderr, "Unexpected signal %d\n", signo);
    abort();
  }
  if (!expecting_sigchld) {
    fprintf(stderr, "Unexpected SIGCHLD, fork_counter = %d\n", fork_counter);
    abort();
  }
}

#ifndef SKIP_SIGSTOP
# define SKIP_SIGSTOP 0
#endif

int main(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = &sighandler;
  sa.sa_flags = SA_RESTART;
  if (0 != sigaction(SIGCHLD, &sa, NULL)) {
    perror("sigaction");
    abort();
  }

  while (fork_counter++ < 1000000) {
    pid_t pid;
    int status;
    struct timespec ts = { 0, 1000 };  // 1 usec
    switch ((pid = fork())) {
    case -1:
      perror("fork");
      abort();
    case 0:
      // child
      while (1) sleep(3600);
      abort();  // unreached
    default:
      // parent
      expecting_sigchld = 1;
#if !SKIP_SIGSTOP
      kill(pid, SIGSTOP);
      if (pid != waitpid(pid, &status, WUNTRACED)) {
        perror("waitpid");
        abort();
      }
      // A reasonable expectation is that SIGCHLD is delivered
      // before waitpid() returns successfully.
      expecting_sigchld = 0;
      nanosleep(&ts, NULL);  // aborts on Lucid
      expecting_sigchld = 1;
      kill(pid, SIGCONT);
#endif
      expecting_sigchld = 1;
      kill(pid, SIGKILL);
      if (pid != waitpid(pid, &status, 0)) {
        perror("waitpid");
        abort();
      }
      // A reasonable expectation is that SIGCHLD is delivered
      // before waitpid() returns successfully.
      expecting_sigchld = 0;
      break;
    }
    if (fork_counter % 10000 == 0) {
      // Print progress.
      fprintf(stderr, ".");
    }
  }
  fprintf(stderr, "\n");
  return 0;
}
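
As one possible alternative to the nanosleep() workaround mentioned above,
here is a minimal sketch of a check that does not depend on whether SIGCHLD
arrives before or after waitpid() returns. It assumes a single-threaded
process, keeps SIGCHLD blocked, and collects it synchronously with
sigtimedwait() after waitpid() has reported the stop. The names
noop_handler, consume_sigchld and stop_child_and_check_sigchld, and the
5 second timeout, are made up for illustration; none of this comes from
tst-waitid itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical no-op handler, so SIGCHLD has a non-default disposition. */
static void noop_handler(int signo) { (void)signo; }

/* Hypothetical helper: wait up to 5 seconds (arbitrary) for a blocked
   SIGCHLD to become pending and consume it.  Returns 0 if a SIGCHLD was
   consumed, -1 on timeout or error.  */
static int consume_sigchld(const sigset_t *set)
{
  struct timespec timeout = { 5, 0 };
  siginfo_t info;
  int rc;

  do {
    rc = sigtimedwait(set, &info, &timeout);
  } while (rc == -1 && errno == EINTR);
  return rc == SIGCHLD ? 0 : -1;
}

/* Stop a fresh child, reap the stop status, then check that a SIGCHLD
   shows up, without assuming it was generated before waitpid() returned.
   Returns 0 if the notification was seen, -1 otherwise.  */
int stop_child_and_check_sigchld(void)
{
  struct sigaction sa, old_sa;
  sigset_t chld, old_mask;
  int status, result = -1;
  pid_t pid;

  sigemptyset(&chld);
  sigaddset(&chld, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &chld, &old_mask) != 0)
    return -1;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = noop_handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGCHLD, &sa, &old_sa) != 0) {
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return -1;
  }

  pid = fork();
  if (pid == 0) {
    /* Child: idle until the parent kills it.  */
    for (;;)
      pause();
  }
  if (pid > 0) {
    if (kill(pid, SIGSTOP) == 0
        && waitpid(pid, &status, WUNTRACED) == pid
        && WIFSTOPPED(status))
      /* SIGCHLD may or may not be pending yet; either case is fine.  */
      result = consume_sigchld(&chld);

    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) == pid)
      consume_sigchld(&chld);  /* Drain the exit notification too.  */
  }

  /* Restore the caller's SIGCHLD disposition and signal mask.  */
  sigaction(SIGCHLD, &old_sa, NULL);
  sigprocmask(SIG_SETMASK, &old_mask, NULL);
  return result;
}
```

A caller would invoke stop_child_and_check_sigchld() once per iteration and
treat a non-zero return as a failure. Because the signal is collected
explicitly, the check should hold regardless of which order the kernel
happens to use.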

