This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
PATCH: Declare wait_node structure volatile
- From: Mark Brown <bmark at us dot ibm dot com>
- To: libcalpha <libc-alpha at sources dot redhat dot com>
- Date: Wed, 20 Mar 2002 15:37:29 -0600
- Subject: PATCH: Declare wait_node structure volatile
This is for the problem described in:
http://bugzilla.linux.ibm.com/show_bug.cgi?id=581
"Default pthread mutexes cause SIGSEGV on SMP IA64 machines"
We have found that the default type of pthread mutex (TIMED) does not
function properly on SMP machines. The problem is timing dependent, so
it SIGSEGVs in slightly different ways, but it appears to be entirely
related to wait_node handling.
Here is an example backtrace:
#0 __pthread_alt_unlock (lock=0x80000fffffffb7b8) at spinlock.c:396
#1 0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x80000fffffffb7a0) at mutex.c:195
#2 0x4000000000000c20 in lock_unlock ()
#3 0x4000000000000ca0 in critical ()
#4 0x2000000000066e20 in pthread_start_thread (arg=0x2000000000effa60) at manager.c:284
#5 0x2000000000249d00 in __clone2 () at soinit.c:56
#6 0x2000000000068df0 in __pthread_mutex_unlock (mutex=0x0) at mutex.c:195
#7 0x00000000 in ?? ()
The example testcase below creates 2 threads that loop forever, locking
and unlocking a mutex. The main loop repeatedly creates a thread that
locks and unlocks the mutex once and then exits. As wait_nodes are
allocated on the stack, the short-lived thread is what appears to be
provoking the SIGSEGV, but I suspect that there are locking issues
within wait_node_dequeue. I have been unable to recreate this problem
on a 4-way PowerPC 64 machine.
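To make the stack-allocation point concrete, here is a minimal
single-threaded sketch of a waiter whose queue node lives in its own
stack frame. All names (demo_node, demo_queue, demo_wait_once) are
hypothetical and are not the linuxthreads code; locking is deliberately
left out. It only shows why the node must be unlinked before the frame
that owns it goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait node; not the real linuxthreads type. */
struct demo_node {
    struct demo_node *next;   /* next waiter, NULL-terminated list */
    int id;                   /* illustrative payload */
};

/* Hypothetical wait queue head. */
static struct demo_node *demo_queue = NULL;

/* Push a node on the front of the queue. */
static void demo_enqueue(struct demo_node *n)
{
    n->next = demo_queue;
    demo_queue = n;
}

/* Unlink a node; returns 1 if it was found and removed, 0 otherwise. */
static int demo_dequeue(struct demo_node *n)
{
    struct demo_node **pp;

    for (pp = &demo_queue; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* A waiter whose node lives on its own stack.
   Returns the number of nodes still queued when it is about to return. */
static int demo_wait_once(int id)
{
    struct demo_node self = { NULL, id };  /* valid only until return */
    struct demo_node *p;
    int left = 0;

    demo_enqueue(&self);
    /* ... a real waiter would block here until it is woken ... */
    assert(demo_dequeue(&self) == 1);      /* unlink before the frame dies */

    for (p = demo_queue; p != NULL; p = p->next)
        left++;
    return left;
}
```

If the dequeue step were skipped, or raced with another thread walking
the list, demo_queue would keep a pointer into a dead stack frame, which
is the kind of failure a short-lived thread would expose first.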
I have tried using ADAPTIVE mutexes; however, the problem remains
because some C library functions, such as calloc, use mutexes too.
This patch just declares the members of the wait_node structure to be
volatile, so that an optimizing compiler does not cache them (loading
them into registers once and then reusing the register copies
indefinitely without reloading them from memory). This is a typical
problem on SMP machines.
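For comparison only, and not what this patch does: volatile constrains
the compiler, but by itself it does not order memory accesses between
processors. With a C11 compiler (an assumption, since C11 postdates
glibc 2.2.4), a shared flag that must be re-read on every iteration can
be sketched with <stdatomic.h>. The names demo_flag, demo_setter and
demo_spin_until_set are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared flag, written by one thread and polled by another. */
static atomic_int demo_flag;

/* Thread body: publish the flag. */
static void *demo_setter(void *arg)
{
    (void) arg;
    atomic_store(&demo_flag, 1);
    return NULL;
}

/* Spin until the other thread sets the flag.
   Returns the final flag value, or -1 if the thread could not be created. */
static int demo_spin_until_set(void)
{
    pthread_t t;

    atomic_store(&demo_flag, 0);
    if (pthread_create(&t, NULL, demo_setter, NULL) != 0)
        return -1;

    /* Every iteration performs a real load; the compiler may not keep
       a stale copy of the flag in a register. */
    while (atomic_load(&demo_flag) == 0)
        ;

    pthread_join(t, NULL);
    return atomic_load(&demo_flag);
}
```

Calling demo_spin_until_set() from main (built with -pthread) should
return 1 once the setter thread has run.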
(A test program is appended after my .sig)
-------
2002-01-29 Mark Brown <bmark@us.ibm.com>
linuxthreads/spinlock.c: Declare wait_node structure volatile
--- glibc-2.2.4.orig/linuxthreads/spinlock.c Sat Sep 1 04:13:46 2001
+++ glibc-2.2.4/linuxthreads/spinlock.c Wed Mar 20 12:24:09 2002
@@ -269,9 +269,9 @@
struct wait_node {
- struct wait_node *next; /* Next node in null terminated linked list */
- pthread_descr thr; /* The thread waiting with this node */
- int abandoned; /* Atomic flag */
+volatile struct wait_node *next; /* Next node in null terminated linked list */
+volatile pthread_descr thr; /* The thread waiting with this node */
+volatile int abandoned; /* Atomic flag */
};
static long wait_node_free_list;
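One detail of the declarations above may be worth checking: in C,
"volatile struct wait_node *next" declares a pointer to a volatile node,
while "struct wait_node * volatile next" would make the pointer field
itself volatile. For a pointer typedef, which I believe pthread_descr
is, "volatile pthread_descr thr" does qualify the pointer itself. The
sketch below uses hypothetical stand-in types (demo_wait_node,
demo_descr) and C11 _Generic (an assumption about the compiler) to show
which type each spelling yields.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real pthread_descr is internal to linuxthreads. */
struct demo_thread;
typedef struct demo_thread *demo_descr;    /* pointer typedef */

struct demo_wait_node {
    volatile struct demo_wait_node *next;  /* pointer to volatile node;
                                              the pointer itself is not volatile */
    volatile demo_descr thr;               /* volatile pointer */
    volatile int abandoned;                /* volatile int */
};

/* Evaluates to 1 if expr is a pointer to a volatile node,
   0 if it is a pointer to a plain node. _Generic ignores top-level
   qualifiers of expr, so only the pointed-to type is inspected. */
#define DEMO_POINTS_TO_VOLATILE(expr)            \
    _Generic((expr),                             \
        volatile struct demo_wait_node *: 1,     \
        struct demo_wait_node *: 0)

/* The spelling used in the patch: the pointee is volatile. */
static int demo_patch_spelling(void)
{
    struct demo_wait_node n = { NULL, NULL, 0 };
    return DEMO_POINTS_TO_VOLATILE(n.next);
}

/* The alternative spelling: the pointer is volatile, the pointee is not. */
static int demo_alt_spelling(void)
{
    struct demo_wait_node * volatile alt = NULL;
    return DEMO_POINTS_TO_VOLATILE(alt);
}
```

If the intent is for the link field itself to be reloaded on every
access, the second spelling may be the one that is wanted; that is my
reading and not something the patch states.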
---
Mark S. Brown
Senior Technical Staff Member, IBM Server Group
512.838.3926 fax 512.838.3882
bmark@us.ibm.com
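/* alt_test.c */
#define _GNU_SOURCE /* make the non-default pthread declarations visible */
#include <pthread.h>
#include <sched.h> /* sched_yield */
#include <stdio.h>
#include <time.h>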
/* Lock and unlock the mutex once, yielding around the lock call. */
void lock_unlock(pthread_mutex_t *mux)
{
    sched_yield();
    pthread_mutex_lock(mux);
    sched_yield();
    pthread_mutex_unlock(mux);
}

/* Short-lived thread: take the mutex once and exit. */
void *critical(void *arg)
{
    pthread_mutex_t *mux;

    mux = (pthread_mutex_t *) arg;
    lock_unlock(mux);
    return NULL;
}

/* Long-lived thread: take the mutex forever. */
void *loopcritical(void *arg)
{
    pthread_mutex_t *mux;

    mux = (pthread_mutex_t *) arg;
    do {
        sched_yield();
        sched_yield();
        lock_unlock(mux);
    } while (1);
    return NULL;
}
int main(int argc, char *argv[])
{
    pthread_mutex_t mux;
    pthread_mutexattr_t ma;
    pthread_t thr[3];

    (void) argv;
    pthread_mutexattr_init(&ma);
    /* Any command-line argument selects an ADAPTIVE mutex. */
    if (argc > 1)
        pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ADAPTIVE_NP);
    pthread_mutex_init(&mux, &ma);
    /* Two threads that lock and unlock the mutex forever. */
    pthread_create(&thr[0], NULL, loopcritical, (void *) &mux);
    pthread_create(&thr[2], NULL, loopcritical, (void *) &mux);
    /* Repeatedly spawn a short-lived thread that takes the mutex once. */
    do {
        pthread_create(&thr[1], NULL, critical, (void *) &mux);
        sched_yield();
        lock_unlock(&mux);
        pthread_join(thr[1], NULL);
    } while (1);
}