python fails asyncio tests (py 3.7 & 3.8)

Corinna Vinschen corinna-cygwin@cygwin.com
Wed Dec 2 13:38:13 GMT 2020


Hi Mark,

On Dec  2 01:01, Mark Geisert wrote:
> Hi folks,
> I'm following up on the OP's investigation supplied in
> https://cygwin.com/pipermail/cygwin/2020-November/246832.html .
> The situation is a socket select thread stuck in a wait-for-event loop that
> doesn't realize select() is trying to cleanup that thread before returning a
> result to the app.  Here is the relevant part of an strace log:
> 
> >   114 8495682 [main] python3.8 1987 start_thread_socket: stuff_start 0xFFFF8C38
> >    68 8495750 [main] python3.8 1987 cygthread::create: name socksel, id 0x737C, this 0x180234778
> >    76 8495826 [main] python3.8 1987 cygthread::create: activated name 'socksel', thread_sync 0x3A8 for id 0x737C
> >   122 8495948 [socksel] python3.8 1987 thread_socket: stuff_start 0xFFFF8C38, timeout 4294967295
> >    78 8496026 [main] python3.8 1987 select_stuff::wait: m 4, us 10000, wmfo_timeout -1
> >    77 8496103 [socksel] python3.8 1987 fhandler_socket_local::af_local_connect: af_local_connect called, no_getpeereid=0
> >   115 8496218 [socksel] python3.8 1987 fhandler_socket_local::af_local_send_secret: Sending af_local secret succeeded
> >    95 8496313 [socksel] python3.8 1987 fhandler_socket_local::af_local_recv_secret: entered
> > 11450 8507763 [main] python3.8 1987 select_stuff::wait: wait_ret 3, m = 4.  verifying
> >   135 8507898 [main] python3.8 1987 select_stuff::wait: timed out
> >    98 8507996 [main] python3.8 1987 select_stuff::wait: returning 1
> >    84 8508080 [main] python3.8 1987 select: sel.wait returns 1
> >    73 8508153 [main] python3.8 1987 select_stuff::cleanup: calling cleanup routines
> >    78 8508231 [main] python3.8 1987 socket_cleanup: si 0x800324910 si->thread 0x180234778
> [end of strace.. nothing further happens]
> 
> The 'socksel' thread is shown entering af_local_recv_secret(), so this is
> all part of local socket connection startup, when a secret is sent and
> received, then credentials are sent and received.  The socksel thread is
> looping on a WSAWaitForMultipleEvents() call.  The OP suggested using
> select_info.stop_thread to indicate that wait loops should exit.  That would
> work further up the stack, but at this level the code doesn't (currently)
> see the appropriate select_info.

This is apparently an old problem in the still current AF_LOCAL
implementation.  Christian Franke encountered it when porting postfix:

https://sourceware.org/legacy-ml/cygwin/2014-08/msg00420.html

The problem is the security handshake between the listening/accepting
socket and the connecting socket.  The connecting socket sends its half
of the handshake and waits for accept on the other side to return the
other half.  However, if the listening side doesn't accept right away,
the connecting side hangs.
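
The scenario can be sketched as a small test case: a listener that binds
and listens but never calls accept, and a client that connects to it.
This is only an illustration of the description above, not code from
Cygwin or from the OP's report.  The helper name connect_without_accept
and the socket path are made up for the example.  On a system without
the handshake, connect is expected to return 0 as soon as the connection
is queued in the backlog; with the handshake in place, a blocking
connect would be expected to wait for the other half instead.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect to a listener that never calls accept().
   Returns the result of connect(), or -1 if the setup fails.  */
int
connect_without_accept (const char *path)
{
  struct sockaddr_un addr;
  int lfd, cfd, rc = -1;

  if (strlen (path) >= sizeof addr.sun_path)
    return -1;
  memset (&addr, 0, sizeof addr);
  addr.sun_family = AF_LOCAL;
  strcpy (addr.sun_path, path);
  unlink (path);		/* remove a stale socket file, if any */

  lfd = socket (AF_LOCAL, SOCK_STREAM, 0);
  if (lfd < 0)
    return -1;
  cfd = socket (AF_LOCAL, SOCK_STREAM, 0);
  if (cfd >= 0
      && bind (lfd, (struct sockaddr *) &addr, sizeof addr) == 0
      && listen (lfd, 1) == 0)
    /* Deliberately no accept() on lfd.  Without the handshake the
       connection is just queued and connect() returns.  */
    rc = connect (cfd, (struct sockaddr *) &addr, sizeof addr);

  if (cfd >= 0)
    close (cfd);
  close (lfd);
  unlink (path);
  return rc;
}
```

Calling connect_without_accept ("/tmp/af_local_demo.sock") under strace
should show whether the client gets stuck in af_local_recv_secret.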

The workaround right now is to call

  fd = socket (AF_LOCAL, SOCK_STREAM, 0);
  /* NULL/0 as option value switches the handshake off. */
  setsockopt (fd, SOL_SOCKET, SO_PEERCRED, NULL, 0);

This disables the security handshake.
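
A minimal sketch of how an application might wrap the workaround, with
error checking added.  The helper name local_socket_no_handshake is
hypothetical.  The sketch assumes that passing NULL and 0 as the option
value is the form Cygwin accepts for switching the handshake off, and
that the call has to happen before the socket is connected.  The call
is guarded with __CYGWIN__ because setting SO_PEERCRED is not expected
to work on other systems.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_LOCAL stream socket and, on Cygwin,
   switch off the security handshake before the socket is used.
   Returns the descriptor, or -1 on failure.  */
int
local_socket_no_handshake (void)
{
  int fd = socket (AF_LOCAL, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
#ifdef __CYGWIN__
  /* Assumption: NULL/0 is the accepted argument form on Cygwin.  */
  if (setsockopt (fd, SOL_SOCKET, SO_PEERCRED, NULL, 0) < 0)
    {
      close (fd);
      return -1;
    }
#endif
  return fd;
}
```

Note that SO_PEERCRED credentials are presumably not available on a
socket set up this way, since the handshake is what exchanges them.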


Corinna

