killpg(pgid, 0) fails if the process is in the middle of spawnve()

Jun T takimoto-j@kba.biglobe.ne.jp
Wed May 18 11:19:31 GMT 2022


Dear Cygwin developers,

It seems killpg(2) on Cygwin has a problem as described below.
Can this be (easily) fixed?

[1] The problem

killpg(pgid, 0) (or kill_pgrp(pgid, si_signo=0), in signal.cc)
fails (returns -1) even when there is a process in the process
group pgid, if the process is in the middle of spawnve().
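
For reference, the kind of existence check in question can be sketched
as follows. This is only an illustration, not code from zsh or Cygwin;
the helper name pgrp_has_members is made up. It relies on the POSIX
rule that signal 0 performs the existence and permission checks
without delivering a signal.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose killpg() under strict -std= modes */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from zsh or Cygwin): returns 1 if the process
   group pgid has at least one member, 0 if it has none, and -1 on any
   other error.  Signal 0 performs only the existence and permission
   checks; no signal is delivered. */
static int pgrp_has_members(pid_t pgid)
{
    if (killpg(pgid, 0) == 0)
        return 1;
    if (errno == EPERM)     /* members exist, but we may not signal them */
        return 1;
    if (errno == ESRCH)     /* no process found in the group */
        return 0;
    return -1;
}
```

With the behaviour described above, such a helper would return 0 for
a group whose only member is in the middle of spawnve(), although
the group is not really empty.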

[2] A problem of zsh on Cygwin that is caused by [1]

More than a year ago, a user of zsh on Cygwin/MSYS2 reported
to the zsh/workers mailing list:
https://www.zsh.org/mla/workers/2021/msg00060.html

As described in this post, it can sometimes (or frequently)
happen that a pipeline like 'ls | less' results in:

zsh% ls --color | less
zsh: done                    ls --color |
zsh: suspended (tty output)  less

How frequently you get this may depend on your hardware,
but if it happens you will find it quite annoying.

[3] How does [1] cause [2]?

According to the strace output, what is happening is as follows:

The main zsh (zsh0) fork()s two subshells, zsh1 for ls and
zsh2 for less.
zsh1 becomes a process group leader (pid=pgid=101, for example),
gets tty (becomes foreground), and calls execve(ls).
zsh2 becomes a member of the process group pgid=101, and
calls execve(less).

When ls exits, zsh0 gets SIGCHLD, and in the signal handler
it calls killpg(101, 0) to see if there are any processes
remaining in the process group 101. At this point zsh2/less
is still in the process group 101, so killpg(101, 0) should
succeed.

But when problem [2] happens, zsh2 has already called execve(less)
or spawnve(_P_OVERLAY,less), but spawnve() has not finished yet.

There are two Windows processes (zsh2 and less), but it _seems_
neither of them is included in the list of win-pids created by
    winpids pids ((DWORD) PID_MAP_RW);
at line 358 of signal.cc. So kill_pgrp() fails, and zsh0 thinks
that there is no foreground process remaining, and regains tty.
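
The sequence above might be exercised outside zsh with something like
the following sketch. It is untested on Cygwin and is only an
assumption about how the race could be probed; the function name
count_failed_probes and the choice of "sleep" as the exec'ed program
are arbitrary. The child becomes a process group leader and calls
execlp(), while the parent repeatedly calls killpg(pgid, 0) and
counts the probes that fail with ESRCH.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose killpg() under strict -std= modes */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test helper.  Forks a child that becomes the leader of a
   new process group and then exec()s "sleep", while the parent probes the
   group `attempts` times with killpg(pgid, 0).  Returns the number of
   probes that failed with ESRCH, or -1 if fork() failed. */
static int count_failed_probes(int attempts)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        setpgid(0, 0);                      /* become group leader */
        execlp("sleep", "sleep", "2", (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    /* Also set the group from the parent, so that the group exists
       before the first probe regardless of scheduling.  This may fail
       harmlessly if the child has already called exec. */
    setpgid(child, child);

    int failures = 0;
    for (int i = 0; i < attempts; i++) {
        if (killpg(child, 0) == -1 && errno == ESRCH)
            failures++;
    }

    kill(child, SIGKILL);                   /* clean up the child */
    waitpid(child, NULL, 0);
    return failures;
}
```

Since the child is not reaped until after the loop, every probe should
succeed, so the function is expected to return 0. A non-zero result on
Cygwin would be consistent with problem [1].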

Later spawnve(less) completes, and less wants to write to stdout,
but it does not have the tty, so it is stopped by SIGTTOU.

Is it possible to fix problem [1], so that killpg(pgid, 0)
succeeds even when every process in the process group pgid
is in the middle of spawnve()?

--
Jun

