This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.


Showstopper for 2.1.3

To: libc-alpha Mailinglist <libc-alpha at sourceware dot cygnus dot com>
Subject: Showstopper for 2.1.3
From: Andreas Jaeger <aj at suse dot de>
Date: 13 Feb 2000 08:27:50 +0100
Cc: khendricks at ivey dot uwo dot ca


Hi glibc developers,

we received the appended bug report.  We should fix this before we
release 2.1.3.

Andreas

Subject: Digested Articles
From: khendricks at ivey dot uwo dot ca
Date: Sun Feb 13 08:26:28 2000

Topics:
   libc/1597: Last linuxthreads condvar change committed on Feb 11 broke jdk again!
   libc/1598: pthread cond timed wait still broken, addendum to Bug 1597

----------------------------------------------------------------------

Date: Sat, 12 Feb 2000 16:35:29 -0500
From: khendricks@ivey.uwo.ca
To: bugs@gnu.org
Subject: libc/1597: Last linuxthreads condvar change committed on Feb 11 broke jdk again!
Message-Id: <200002122135.QAA28445@delysid.gnu.org>

>Number:         1597
>Category:       libc
>Synopsis:       Last linuxthreads condvar change committed on Feb 11 broke jdk again!
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    libc-gnats
>State:          open
>Class:          sw-bug
>Submitter-Id:   unknown
>Arrival-Date:   Sat Feb 12 16:40:01 EST 2000
>Last-Modified:
>Originator:     khendricks@ivey.uwo.ca
>Organization:
net
>Release:        glibc 2.1.3 branch Feb 11 cvs
>Environment:
Linuxppc 2.2.14, glibc 2.1.3 from cvs Feb 11
>Description:
A change committed to condvar.c in linuxthreads on Feb 11 has broken the jdk
again.  A 30-second timeout in a high-signal environment now hangs forever.

This worked after my two patches were put in just recently but this new change
just broke it again.

DOES ANYONE TEST PATCHES BEFORE COMMITTING THEM?

Please revert this change until I can figure out what is wrong with the new
code in condvar.c (the tight loop code) when it is used in high-signal
environments.

>How-To-Repeat:
Use jdk RC4 with glibc 2.1.3 cvs dated on or after Feb 11.
>Fix:
>Audit-Trail:
>Unformatted:

------------------------------

Date: Sat, 12 Feb 2000 23:18:30 -0500
From: khendricks@ivey.uwo.ca
To: bugs@gnu.org
Subject: libc/1598: pthread cond timed wait still broken, addendum to Bug 1597
Message-Id: <200002130418.XAA29768@delysid.gnu.org>

>Number:         1598
>Category:       libc
>Synopsis:       pthread cond timed wait still broken, addendum to Bug 1597
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    libc-gnats
>State:          open
>Class:          sw-bug
>Submitter-Id:   unknown
>Arrival-Date:   Sat Feb 12 23:20:01 EST 2000
>Last-Modified:
>Originator:     khendricks@ivey.uwo.ca
>Organization:
net
>Release:        2.1.3 cvs Feb 11
>Environment:
Linuxppc 2.2.14, glibc 2.1.3 cvs Feb 11
>Description:
This is an addendum to Bug 1597. Please see that bug number for a full
description.

I edited condvar.c, turned on pthreads debugging, and changed the while
loop that repeatedly calls __libc_nanosleep so that it prints (via MSG())
the reltime value each time the syscall is interrupted.

Would you believe reltime was actually increasing!

Here is a short snippet showing, for each thread id, the values of
reltime.tv_sec and reltime.tv_nsec and the value of errno (in this case 4 is
EINTR).  There are two threads in pthread_cond_timedwait in this example.
Look at the values for thread 19016, which was originally told to wait for
exactly 30 seconds.

19016 : reltime: 30 250000000 4
18987 : reltime: 1 460000000 4
19016 : reltime: 30 250000000 4
18987 : reltime: 1 470000000 4
19016 : reltime: 30 250000000 4
18987 : reltime: 1 480000000 4
19016 : reltime: 30 260000000 4
18987 : reltime: 1 490000000 4
19016 : reltime: 30 270000000 4
18987 : reltime: 1 500000000 4
19016 : reltime: 30 280000000 4
18987 : reltime: 1 510000000 4
19016 : reltime: 30 280000000 4
18987 : reltime: 1 510000000 4
19016 : reltime: 30 280000000 4
18987 : reltime: 1 510000000 4
19016 : reltime: 30 280000000 4

Notice that for thread 19016 the tv_nsec field has actually grown, from
250000000 to 280000000, by the end of the snippet.

It seems the kernel routine (see linux/kernel/sched.c) converts the
time to jiffies and, when interrupted, converts jiffies back to time.

Unfortunately, some bug in this conversion returns a remaining time that is
higher than the time passed in if the sleep is interrupted fast enough.
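
A simplified model of how such a round trip could gain time is sketched
below.  It is not the kernel's code: it assumes HZ is 100 (10 ms per jiffy),
that the request is rounded up to whole jiffies, and that one extra jiffy is
added so the sleep is never shorter than requested.  All function names are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Assumed values for illustration only: HZ = 100 means 10 ms per jiffy. */
#define HZ 100L
#define NSEC_PER_SEC 1000000000L
#define NSEC_PER_JIFFY (NSEC_PER_SEC / HZ)

/* Hypothetical model: round the request up to whole jiffies, then add one
   extra jiffy so that the sleep is never shorter than requested. */
static unsigned long model_timespec_to_jiffies(struct timespec ts)
{
    assert(ts.tv_sec >= 0 && ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
    unsigned long ticks = (unsigned long)ts.tv_sec * HZ
        + (unsigned long)((ts.tv_nsec + NSEC_PER_JIFFY - 1) / NSEC_PER_JIFFY);
    return ticks + 1;
}

/* Hypothetical model: convert a jiffy count back to seconds/nanoseconds. */
static struct timespec model_jiffies_to_timespec(unsigned long ticks)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(ticks / HZ);
    ts.tv_nsec = (long)(ticks % HZ) * NSEC_PER_JIFFY;
    return ts;
}

/* Simulate `interrupts` sleeps that are each interrupted before a single
   jiffy elapses, feeding the reported remaining time back in as the next
   request, as a tight retry loop would. */
static struct timespec model_interrupted_sleeps(struct timespec req,
                                                int interrupts)
{
    for (int i = 0; i < interrupts; i++) {
        unsigned long ticks = model_timespec_to_jiffies(req);
        req = model_jiffies_to_timespec(ticks);
    }
    return req;
}
```

Under these assumptions every immediately interrupted sleep adds one jiffy
(10 ms) to the remaining time, so a tight loop that is interrupted often
enough would never reach zero.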

So I think this is a kernel bug that is exposed by this new tight loop.

The previous version used in condvar timed wait always decreased the time,
since a lot of time actually elapsed outside of __libc_nanosleep and it
overwhelmed any tiny increases due to conversion to and from jiffies.
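
One way to make a retry loop insensitive to the reported remaining time is to
recompute the wait from an absolute deadline on every iteration.  The sketch
below illustrates that idea only; it is not the condvar.c code, the helper
names time_left and sleep_until are hypothetical, and it assumes a POSIX
system that provides clock_gettime and nanosleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Store (deadline - now) in *left.  Returns 1 if time remains, or 0 if the
   deadline has been reached or passed (in which case *left is zeroed). */
static int time_left(const struct timespec *deadline,
                     const struct timespec *now,
                     struct timespec *left)
{
    left->tv_sec = deadline->tv_sec - now->tv_sec;
    left->tv_nsec = deadline->tv_nsec - now->tv_nsec;
    if (left->tv_nsec < 0) {
        left->tv_sec -= 1;
        left->tv_nsec += NSEC_PER_SEC;
    }
    if (left->tv_sec < 0 || (left->tv_sec == 0 && left->tv_nsec == 0)) {
        left->tv_sec = 0;
        left->tv_nsec = 0;
        return 0;
    }
    return 1;
}

/* Sleep until an absolute CLOCK_REALTIME deadline, restarting after EINTR.
   The remaining time is always derived from the clock, never from the value
   nanosleep reports back.  Returns 0 once the deadline has passed, or -1 on
   any error other than EINTR. */
static int sleep_until(const struct timespec *deadline)
{
    struct timespec now, left;

    for (;;) {
        if (clock_gettime(CLOCK_REALTIME, &now) != 0)
            return -1;
        if (!time_left(deadline, &now, &left))
            return 0;
        if (nanosleep(&left, NULL) != 0 && errno != EINTR)
            return -1;
    }
}
```

Because the loop rereads the clock after every wake-up, any rounding in the
sleep itself cannot push the deadline further away.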

I have no idea whether this bug exists on Linux x86 or not.  The kernel
routine in question is not architecture-specific, so it should be used by
everyone.

This was the first time I have ever seen time actually increase!

Until this kernel issue is resolved, please revert your last condvar.c patch
back to what it was previously.

Thank you.

Kevin
>How-To-Repeat:
Play around with jdk native threads and linu
>Fix:
>Audit-Trail:
>Unformatted:

------------------------------

End of forwardFqClEx Digest
***************************


-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.rhein-neckar.de

Follow-Ups:
- Re: Showstopper for 2.1.3
  - From: Ulrich Drepper
