Deadlock of the process tree when running make
Alexey Izbyshev
izbyshev@ispras.ru
Wed Apr 13 23:17:38 GMT 2022
On 2022-04-13 19:48, Alexey Izbyshev wrote:
> On 2022-04-11 13:10, Alexey Izbyshev wrote:
> What's probably not normal is the behavior of the hanging conhost.exe.
> I've compared the points where conhost.exe is blocked, and all threads
> except one in the model case are doing the same things as in the hanging
> case. The remaining thread is blocked in
> ReadFile("\Device\NamedPipe\") (i.e. the read end of "hWritePipe" of
> pcon) instead of trying to enter a critical section like thread 1
> above. So now I'm starting to suspect that it's a conhost.exe bug
> rather than a Cygwin bug.
>
> I'll try to poke around the hanging conhost.exe some more, and maybe
> I'll also try to create a faster reproducer.
>
I've studied the conhost.exe hang, and it does indeed look like a conhost.exe bug.
TL;DR: https://github.com/microsoft/terminal/pull/12181
The full story:
I dumped conhost.exe, opened the dump in WinDbg, and looked at the stack
trace of the hanging thread:
ntdll!NtWaitForAlertByThreadId+0x14
ntdll!RtlpWaitOnAddressWithTimeout+0x81
ntdll!RtlpWaitOnAddress+0xae
ntdll!RtlpWaitOnCriticalSection+0xfd
ntdll!RtlpEnterCriticalSectionContended+0x1c4
ntdll!RtlEnterCriticalSection+0x42
conhost!Microsoft::Console::Render::Renderer::_PaintFrameForEngine+0x54
conhost!Microsoft::Console::Render::Renderer::TriggerTeardown+0x19e60
conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit+0x21
conhost!Microsoft::Console::PtySignalInputThread::_GetData+0x65
conhost!Microsoft::Console::PtySignalInputThread::_InputThread+0x25
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
By looking at the assembly, I found that it hangs *after* ReadFile() on
the pipe completes. A leaked write handle would keep ReadFile() itself
blocked, so the problem is definitely not a leak of hWritePipe in
bash.exe or elsewhere.
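The inference above rests on how pipes report end-of-file: a pending read on the read end only completes with EOF once every handle to the write end has been closed, so a leaked copy of the write end keeps the reader stuck in the read call. The following is a minimal sketch of that behavior using Python's os.pipe() and os.dup() as a portable stand-in for the Windows pipe handles. It is not Cygwin or conhost code; `leaked_fd` plays the role of a hypothetically leaked copy of hWritePipe, and the function names are illustrative.

```python
import os
import threading


def read_until_eof(fd, result):
    """Block in os.read() until the pipe reports EOF (an empty read)."""
    while os.read(fd, 4096):
        pass  # discard any data; only EOF matters here
    result["eof"] = True


def demo_leaked_write_end(timeout=0.5):
    """Return (blocked_while_leaked, eof_after_all_closed)."""
    read_fd, write_fd = os.pipe()
    leaked_fd = os.dup(write_fd)  # hypothetical leaked copy of the write end
    result = {"eof": False}
    reader = threading.Thread(target=read_until_eof, args=(read_fd, result),
                              daemon=True)
    reader.start()

    os.close(write_fd)            # the owner closes its write handle...
    reader.join(timeout)
    blocked_while_leaked = reader.is_alive()  # ...but the read is still pending

    os.close(leaked_fd)           # closing the last write handle delivers EOF
    reader.join(5.0)
    eof_seen = result["eof"]
    if not reader.is_alive():
        os.close(read_fd)         # safe: no thread is reading from it anymore
    return blocked_while_leaked, eof_seen
```

Calling `demo_leaked_write_end()` is expected to return `(True, True)`: the reader stays blocked while the leaked copy is open and sees EOF only after it is closed. Since the conhost.exe thread had already got past ReadFile(), it was not in the first state.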
Searching for the function names, I found this issue:
https://github.com/microsoft/terminal/issues/1810.
It is a different bug, but the discussion and the patch show that
synchronization on startup/shutdown is a disaster.
Then I looked at the code and identified that the hang happens while
attempting to lock the console at [1]. After studying how this lock is
used in other parts of the code, I noticed that
PtySignalInputThread::_Shutdown() (which is further up in the call stack
of the hanging function) uses ProcessCtrlEvents() incorrectly: the
latter unconditionally unlocks the console, but this thread has never
taken the lock at that point. Then I looked at a more recent version of
the code and discovered the patch to _Shutdown() that I referenced
above.
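The lock contract described above can be modelled in a few lines. This is a hypothetical Python sketch, not conhost code: threading.RLock stands in for the console lock (which, per the stack trace, is built on a critical section and, like RLock, is recursive), and `process_ctrl_events()`, `shutdown_unpatched()` and `shutdown_patched()` are illustrative names mirroring ProcessCtrlEvents() and the two versions of _Shutdown(). One difference matters: RLock raises RuntimeError when a thread releases a lock it does not hold, whereas the Win32 documentation for LeaveCriticalSection() warns that doing so may cause a thread in EnterCriticalSection() to wait indefinitely, which is consistent with the stack trace above.

```python
import threading

# Hypothetical stand-in for the console lock (recursive, like a critical section).
console_lock = threading.RLock()


def process_ctrl_events():
    """Model of ProcessCtrlEvents(): releases the console lock unconditionally,
    so it is only correct if the caller already holds the lock."""
    # ... dispatch pending control events (omitted in this sketch) ...
    console_lock.release()


def shutdown_unpatched():
    """Shape of the unpatched _Shutdown(): never takes the lock."""
    process_ctrl_events()


def shutdown_patched():
    """Shape of the patched _Shutdown(): takes the lock first (the
    LockConsole() call), so the release inside process_ctrl_events()
    is balanced."""
    console_lock.acquire()
    process_ctrl_events()


def try_shutdown(fn):
    """Return None on success, or the error raised by an unbalanced release."""
    try:
        fn()
        return None
    except RuntimeError as exc:
        return exc
```

Here `try_shutdown(shutdown_patched)` returns None, while `try_shutdown(shutdown_unpatched)` returns the RuntimeError from RLock. Python makes the unbalanced release visible, whereas a critical section may silently end up in a state where a later EnterCriticalSection() never returns.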
I've also verified that the assembly of _Shutdown() (which is inlined into
PtySignalInputThread::_GetData()) corresponds to the unpatched version
(i.e. without the LockConsole() call):
call conhost!CloseConsoleProcessState (00007ff6`22e7013c)
call conhost!ProcessCtrlEvents (00007ff6`22e262a0)
mov ecx,6Dh
call conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit (00007ff6`22e3c730)
I'm not sure why this bug is not triggered more frequently, but one
possible reason, as indicated by comment [2], is that the bad path is
only taken if there are live clients after ClosePseudoConsole() is
called, which is probably rare.
A potential workaround on the Cygwin side would be to ensure that the
pseudoconsole has no clients before calling ClosePseudoConsole(),
but I don't know whether that's possible.
[1] https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/renderer/base/renderer.cpp#L75
[2] https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/PtySignalInputThread.cpp#L205
Alexey