Deadlock of the process tree when running make

Alexey Izbyshev izbyshev@ispras.ru
Wed Apr 13 23:17:38 GMT 2022


On 2022-04-13 19:48, Alexey Izbyshev wrote:
> On 2022-04-11 13:10, Alexey Izbyshev wrote:
> What's probably not normal is the behavior of the hanging conhost.exe.
> I've compared the points where conhost.exe is blocked, and all but one
> threads in the model case are doing the same things as in the hanging
> case, but the remaining thread is blocked in
> ReadFile("\Device\NamedPipe\") (i.e. the read end of "hWritePipe" of
> pcon) instead of trying to enter a critical section like thread 1
> above. So now I'm starting to doubt that it's a cygwin bug and not
> some conhost.exe bug.
> 
> I'll try to poke around the hanging conhost.exe some more, and also
> may be will try to create a faster reproducer.
> 
I've studied conhost.exe hang, and it indeed looks like it's buggy.

TLDR: https://github.com/microsoft/terminal/pull/12181

The full story:

I dumped conhost.exe, opened the dump in windbg and looked at the stack 
trace of the hanging thread:

ntdll!NtWaitForAlertByThreadId+0x14
ntdll!RtlpWaitOnAddressWithTimeout+0x81
ntdll!RtlpWaitOnAddress+0xae
ntdll!RtlpWaitOnCriticalSection+0xfd
ntdll!RtlpEnterCriticalSectionContended+0x1c4
ntdll!RtlEnterCriticalSection+0x42
conhost!Microsoft::Console::Render::Renderer::_PaintFrameForEngine+0x54
conhost!Microsoft::Console::Render::Renderer::TriggerTeardown+0x19e60
conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit+0x21
conhost!Microsoft::Console::PtySignalInputThread::_GetData+0x65
conhost!Microsoft::Console::PtySignalInputThread::_InputThread+0x25
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

By looking at assembly, I've found that it hangs *after* ReadFile() on 
the pipe completes, so the problem is definitely not a leak of 
hWritePipe in bash.exe or elsewhere.

Using the function names, I've found this issue: 
https://github.com/microsoft/terminal/issues/1810.

This is a different one, but the discussion and the patch shows that 
synchronization on startup/shutdown is a disaster.

Then I looked at the code and identified that hang happens while 
attempting to lock the console at [1]. After studying how this lock is 
used in other parts of the code, I noticed that 
PtySignalInputThread::_Shutdown() (which is further up in the call stack 
of the hanging function) uses ProcessCtrlEvents() incorrectly, because 
the latter unconditionally unlocks the console, but the lock is never 
taken by this thread at this point. Then I looked at a more recent 
version of the code and discovered the patch to _Shutdown() which I 
referenced above.

I've also verified that assembly of _Shutdown() (which is inlined into 
PtySignalInputThread::_GetData()) corresponds to the unpatched version 
(i.e. without LockConsole() call):

call    conhost!CloseConsoleProcessState (00007ff6`22e7013c)
call    conhost!ProcessCtrlEvents (00007ff6`22e262a0)
mov     ecx,6Dh
call    
conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit 
(00007ff6`22e3c730)

I'm not sure why this bug is not triggered more frequently, but one 
possible reason, as indicated by comment [2], is that the bad path is 
only taken if there are live clients after ClosePseudoConsole() is 
called, which is probably rare.

A potential workaround on Cygwin side would be to ensure that the 
pseudoconsole doesn't have clients before calling ClosePseudoConsole(), 
but I don't know whether it's possible.

[1] 
https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/renderer/base/renderer.cpp#L75

[2] 
https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/PtySignalInputThread.cpp#L205

Alexey


More information about the Cygwin mailing list