Fix nanosleep returning negative rem

Corinna Vinschen corinna-cygwin@cygwin.com
Wed Jul 21 09:30:50 GMT 2021


On Jul 21 09:07, David Allsopp wrote:
> > On Jul 20 16:16, David Allsopp wrote:
> > > I've pushed a repro case for this to
> > > https://github.com/dra27/cygwin-nanosleep-bug.git
> > >
> > > Originally noticed as the main CI system for OCaml has been failing
> > > sporadically for the signal.ml test mentioned in that repo. This
> > > morning I tried hammering that test on my dev machine and discovered
> > > that it fails very frequently. No idea if that's drivers, Windows 10
> > > updates, number of cores or what, but it was definitely happening, and
> > > easily.
> > >
> > > Drilling further, it appears that NtQueryTimer is able to return a
> > > negative value in the TimeRemaining field even when SignalState is
> > > false. The values I've seen have always been < 15ms - i.e. less than
> > > the timer resolution, so I wonder if there is a point at which the
> > > timer has elapsed but has not been signalled, but WaitForMultipleObjects
> > > returns because of the EINTR signal.
> > > Mildly surprising that it seems to be so reproducible.
> > >
> > > Anyway, a patch is attached which simply guards a negative return
> > > value. The test on tbi.SignalState is in theory unnecessary.
> > 
> > Thanks for the patch, I think your patch is fine.  However, I'd like to
> > dig a bit into this to see what exactly happens.  Do you have a very
> > simple testcase in plain C, by any chance?
> 
> https://github.com/dra27/cygwin-nanosleep-bug/blob/main/signal.c was
> as simple as I'd gone at this stage (eliminating OCaml from the
> equation!). It might be possible to get it to happen without all the
> pthreads stuff: having confirmed it definitely wasn't OCaml and been
> able to put the appropriate system_printf's into cygwait to see that
> NtQueryTimer really was returning this small negative value, I stopped
> simplifying.
> 
> Does that repro case trigger on your system too?

I'm not sure.  Would the output " - nanosleep failed: ..." indicate the
bug has been triggered?  If so, no, I can't reproduce this on my system.

I wrote a quick STC using the NT API calls and I can't reproduce the
problem with this code either.  The output is either

  SignalState: 1 TimeRemaining: -5354077459183

or

  SignalState: 0 TimeRemaining: 653

I never get a small negative value in the latter case.  Can you
reproduce your problem with this testcase or tweak it to reproduce it?

Either way, your patch as a safeguard should be ok.
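
For illustration only, the kind of guard being discussed could look
roughly like the sketch below.  This is not the actual patch or the
Cygwin code: the helper name remaining_to_timespec is hypothetical, and
it assumes the remaining time arrives as a signed 64-bit count of 100ns
units, as in the TimeRemaining field returned by NtQueryTimer.  Any
negative value is treated as "no time left", so the caller never sees a
negative rem.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not the real Cygwin implementation: convert a
   remaining time given in 100ns units into a struct timespec.  A
   negative input (timer already elapsed) is clamped to zero so that
   neither tv_sec nor tv_nsec can ever be negative. */
static void
remaining_to_timespec (int64_t remaining_100ns, struct timespec *rem)
{
  if (remaining_100ns < 0)
    remaining_100ns = 0;
  /* 10,000,000 units of 100ns make up one second. */
  rem->tv_sec = remaining_100ns / 10000000LL;
  rem->tv_nsec = (remaining_100ns % 10000000LL) * 100LL;
}
```

With this sketch, a small negative reading such as -140000 (14ms past
expiry) yields a rem of zero seconds and zero nanoseconds rather than a
negative timespec.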


Thanks,
Corinna

