double-fork issue on Windows on ARM64

Jeremy Drake cygwin@jdrake.com
Wed May 8 18:54:21 GMT 2024


(this is the same issue discussed in
https://cygwin.com/pipermail/cygwin-patches/2024q1/012621.html)

On MSYS2, running on Windows on ARM64 only, we've been plagued by issues
with processes hanging up.  Usually pacman, when it is trying to validate
signatures with gpgme.  When a process is hung in this way, no debugger
seems to be able to attach properly.

After many months of off-and-on progress trying to debug this, we've
*finally* got an idea of what behavior is causing this, and a standalone
reproducer that runs on Cygwin.

> A common symptom is that the hanging process has a command-line that is
> identical to its parent process' command-line (indicating that it has
> been fork()ed), and anecdotally, the hang occurs when _exit() calls
> proc_terminate() which is then blocked by a call to TerminateThread()
> with an invalid thread handle (for more details, see
> https://github.com/msys2/msys2-autobuild/issues/62#issuecomment-1951796327).
>
> In my tests, I found that the hanging process is spawned from
> _gpgme_io_spawn() which lets the child process immediately spawn another
> child. That seems like a fantastic way to find timing-related bugs in
> the MSYS2/Cygwin runtime.
>
> As a work-around, it does seem to help if we avoid that double-fork.

That led me to make the attached reproducer, which is based on the code
from _gpgme_io_spawn.  I originally expected that this would require some
timing adjustment, hence the defines to change the binary and argument (I
expected to use /bin/sleep and different values).  It turns out, this
reproduces readily with /bin/true.

I build this with `gcc -ggdb -o testfork testfork.c`, and this reproduces:
* on a Raspberry PI 4 running Windows 10, with an i686 msys2 runtime
* on a QC710 running Windows 11 23H2, with x86_64 msys2 runtime (this
seems to reproduce it most readily).
* on a hyper-v virtual machine on Dev Kit 2023 running Windows 11 23H2,
with x86_64 msys2 runtime or Cygwin 3.5.3.  This seems to require running
two instances of testfork.exe at the same time.

When attaching to the hung process, gdb shows
(gdb) i thr
  Id   Target Id                Frame
  1    Thread 6516.0xbe8        error return
/cygdrive/d/a/scallywag/gdb/gdb-13.2-1.x86_64/src/gdb-13.2/gdb/windows-nat.c:748
was 31: A device attached to the system is not functioning.
0x0000000000000000 in ?? ()
  2    Thread 6516.0x1b28 "sig" 0x00007ff8051a8a64 in ?? ()
* 3    Thread 6516.0x12b4       0x00007ff8051b4374 in ?? ()


Let me know if I can provide any additional info, or anything else we can
try to help debug this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testfork.c
Type: text/x-c
Size: 1045 bytes
Desc: 
URL: <https://cygwin.com/pipermail/cygwin-developers/attachments/20240508/7026bbfc/attachment.bin>


More information about the Cygwin-developers mailing list