[PATCH] Cygwin: console: Avoid slipping past disable_master_thread check.

Johannes Schindelin Johannes.Schindelin@gmx.de
Sat Feb 3 14:53:29 GMT 2024


Hi Takashi,

On Sat, 3 Feb 2024, Takashi Yano wrote:

> If disable_master_thread flag is set between the code checking that
> flag not be set and the code acquiring input_mutex, input record is
> processed once after setting disable_master_thread flag. This patch
> prevents that.
>
> Fixes: d4aacd50e6cf ("Cygwin: console: Add missing input_mutex guard.")
> Signed-off-by: Takashi Yano <takashi.yano@nifty.ne.jp>

Thank you for double-checking this (and for finding and fixing the bug).

I wonder what could be a symptom of this bug. I ask because we have
noticed a couple of inexplicable hangs in GitHub workflow runs in the Git
for Windows and the MSYS2 projects, hangs that are almost certainly due to
the ConPTY code in Cygwin, and I hope to figure out what causes them, and
even more importantly, how to fix those. Maybe the bug you fixed in this
patch could explain this?

Concretely, the hangs occur typically when some `pacman` process (a
package manager using the MSYS2 runtime, i.e. the Cygwin runtime with
several dozen patches on top) calls a few non-Cygwin processes. Those
processes seem to succeed, but there is an extra `pacman` process left
hanging around (reported using the same command-line as its parent
process, indicating a `fork()`ed child process or maybe that
signal-handler that is spawned for every non-Cygwin child process) and at
that point the program hangs indefinitely (or at least until the GitHub
workflow run times out after 6 hours).

I was not able to obtain any helpful stacktraces, they all seem to make no
sense, I only vaguely remember that one thread was waiting for an object,
but that could be a false flag.

Stopping those hanging `pacman` processes via `wmic process ... delete`
counter-intuitively fails to result in `pacman` to exit with a non-zero
exit code. Instead, the program now runs to completion successfully!

Most recently, these hangs became almost reliably reproducible, when we
started changing a certain GitHub workflow that runs on a Windows/ARM64
build agent. I suspect that Windows/ARM64 just makes this much more
likely, but that the bug is also there on regular Windows/x86_64.

What changed in the GitHub workflow is a new PowerShell script that runs
`pacman` a couple of times, trying to update packages, and after that
force-reinstalls a certain package for the benefit of its post-install
script. This post-install script is run using the (MSYS) Bash, and it
calls among other things (non-MSYS) `git.exe`. And that's where it hangs
almost every time.

When I log into the build agent via RDP, I do not see any `git.exe`
process running, but the same type of hanging `pacman` process as
indicated above. Using the `wmic` command to stop the hanging process lets
the GitHub workflow continue without any indication of failure.

Running the PowerShell script manually succeeds every single time, so I
think the hang might be connected to running things headlessly.

Do you have any idea what the bug could be? Or how I could diagnose this
better? Attaching via `gdb` only produces unhelpful stacktraces (that may
even be bogus, by the looks of it). Or do you think that your patch that I
am replying to could potentially fix this problem? How could the code be
improved to avoid those hangs altogether, or at least to make them easier
to diagnose?

Ciao,
Johannes

> ---
>  winsup/cygwin/fhandler/console.cc | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/winsup/cygwin/fhandler/console.cc b/winsup/cygwin/fhandler/console.cc
> index 6a42b4949..1c8d383cd 100644
> --- a/winsup/cygwin/fhandler/console.cc
> +++ b/winsup/cygwin/fhandler/console.cc
> @@ -420,6 +420,12 @@ fhandler_console::cons_master_thread (handle_set_t *p, tty *ttyp)
>  	}
>
>        WaitForSingleObject (p->input_mutex, mutex_timeout);
> +      /* Ensure accessing input recored is not disabled. */
> +      if (con.disable_master_thread)
> +	{
> +	  ReleaseMutex (p->input_mutex);
> +	  continue;
> +	}
>        total_read = 0;
>        switch (cygwait (p->input_handle, (DWORD) 0))
>  	{
> @@ -4545,8 +4551,6 @@ fhandler_console::set_disable_master_thread (bool x, fhandler_console *cons)
>  	return;
>      }
>    const _minor_t unit = cons->get_minor ();
> -  if (con.disable_master_thread == x)
> -    return;
>    cons->acquire_input_mutex (mutex_timeout);
>    con.disable_master_thread = x;
>    cons->release_input_mutex ();
> --
> 2.43.0
>
>


More information about the Cygwin-patches mailing list