connect() hangs on a listen()ing AF_UNIX socket

Corinna Vinschen
Fri Aug 22 09:39:00 GMT 2014

On Aug 21 21:14, Christian Franke wrote:
> Corinna Vinschen wrote:
> >On Aug 21 18:16, Christian Franke wrote:
> >>Corinna Vinschen wrote (in thread "[ITP] libsuexec 1.0"):
> >>>Postfix for Cygwin would be *so* nice.  Sigh.  ...
> >>Due to the following problem, Postfix hangs during startup (and blocks any
> >>possible "[ITP] postfix ..."):
> >>
> >>If an AF_UNIX socket is in listen()ing state, a client connect() should
> >>succeed immediately. On Cygwin, connect() waits until the server side
> >>accept()s the connection.
> >>
> >>Testcase:
> >>...
> >>
> >>
> >>This is likely because fhandler_socket::af_local_connect() waits for some
> >>secret. Sending it in af_local_accept() is too late in this case.
> >>
> >>Unfortunately the event handling of postfix relies on the correct behavior
> >>and there is possibly no easy workaround.
> >Off the top of my head I don't see one inside the Cygwin DLL :(
> Complex but may work: A fhandler_socket::listen() on an AF_UNIX/SOCK_STREAM
> socket starts a thread which accept()s connections, performs the handshake
> and puts the new socket descs in a queue. fhandler_socket::accept4() then no
> longer calls accept() but waits for the next entry in the queue.

Yeah, that might be very tricky, especially if the executable forks and
execs after calling listen.
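
To make the expected behavior from the original report concrete, here
is a minimal sketch.  It is NOT the elided testcase from the report;
the function name and socket path are made up for illustration.  On a
system with native AF_UNIX sockets, the blocking connect() below should
return right away even though nobody ever calls accept().  With the
behavior described in the report, it would wait instead.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: returns the result of a blocking connect() to a
   socket that is listen()ing but never accept()s, or -1 on setup errors. */
int connect_without_accept(const char *path)
{
    struct sockaddr_un addr;
    int srv, cli, rc = -1;

    assert(path != NULL);
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length checked above */
    unlink(path);                   /* remove a stale socket file, if any */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv >= 0 && cli >= 0
        && bind(srv, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(srv, 5) == 0) {
        /* The server side deliberately never calls accept(). */
        rc = connect(cli, (struct sockaddr *)&addr, sizeof addr);
    }

    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    unlink(path);
    return rc;
}
```

A caller would simply check that connect_without_accept("/tmp/some.sock")
returns 0 without blocking.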

> >The problem is that the packet exchange at the start of an
> >accept/connect is required to be able to exchange credentials.  This in
> >turn is required for getpeereid and the SO_PEERCRED socket option which
> >is utilized at least by sshd.
> Easier and may work for Postfix: Add a Cygwin specific socket option like
> SO_DONT_NEED_PEERCRED which is set immediately after Postfix calls
> socket(AF_UNIX, SOCK_STREAM). If set, no handshake occurs on
> connect()/accept(). getpeereid()/SO_PEERCRED should fail then.

Well, it's not *only* SO_PEERCRED.  Another, older part of the
handshake is about recognizing the peer.  Since AF_UNIX sockets don't
exist on Windows, Cygwin is using AF_INET sockets under the hood, and
so *any* Windows process could accidentally connect to a Cygwin AF_UNIX
socket.  The handshake also aims to avoid this scenario.  Only if the
handshake succeeds can the peers be sure they are talking to another
Cygwin process that assumes an AF_UNIX socket.

A Cygwin-specific socket option which switches off the handshake would
disable this peer recognition.  How bad is that?  I'm not sure.

Another potential solution might be to defer the AF_UNIX handshake to
the first send/recv:

Whatever the peers do, there is a certain protocol used.  That means,
there's an implicit understanding who's going to do the first send and
who's doing the first recv.  So, after connect/accept, both sides of the
sockets go into "connected_but_handshake_missing" mode.  On the first
send/recv, the handshake gets started and, if it fails, send/recv
fails with an error.

This might be easier to implement and might even get rid of the special
code in select that handles the AF_UNIX handshake after a non-blocking
connect.  The potential problem here is that this might require another
set of changes to cover select...
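
As a rough illustration of the deferred-handshake idea, here is a
user-space sketch.  It is not Cygwin code: struct lazy_sock, lazy_init,
lazy_send, lazy_recv and the 16-byte shared secret are all hypothetical
names and simplifications.  Each side sends its secret before its first
real transfer and checks the peer's secret on its first recv, so
nothing is exchanged at connect/accept time.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define SECRET_LEN 16

/* Hypothetical wrapper around a connected stream socket. */
struct lazy_sock {
    int fd;
    unsigned char secret[SECRET_LEN]; /* both peers must know this value */
    int secret_sent;                  /* our half of the handshake done? */
    int peer_verified;                /* peer's half checked? */
};

void lazy_init(struct lazy_sock *s, int fd, const unsigned char *secret)
{
    assert(s != NULL && secret != NULL);
    s->fd = fd;
    memcpy(s->secret, secret, SECRET_LEN);
    s->secret_sent = 0;    /* "connected_but_handshake_missing" */
    s->peer_verified = 0;
}

static int send_secret_once(struct lazy_sock *s)
{
    if (s->secret_sent)
        return 0;
    if (send(s->fd, s->secret, SECRET_LEN, 0) != SECRET_LEN)
        return -1;
    s->secret_sent = 1;
    return 0;
}

static int verify_peer_once(struct lazy_sock *s)
{
    unsigned char buf[SECRET_LEN];

    if (s->peer_verified)
        return 0;
    if (recv(s->fd, buf, SECRET_LEN, MSG_WAITALL) != SECRET_LEN
        || memcmp(buf, s->secret, SECRET_LEN) != 0) {
        errno = ECONNABORTED;   /* handshake failed: the transfer fails */
        return -1;
    }
    s->peer_verified = 1;
    return 0;
}

/* First send: push our secret ahead of the payload. */
ssize_t lazy_send(struct lazy_sock *s, const void *buf, size_t len)
{
    if (send_secret_once(s) < 0)
        return -1;
    return send(s->fd, buf, len, 0);
}

/* First recv: send our secret if needed, then check the peer's. */
ssize_t lazy_recv(struct lazy_sock *s, void *buf, size_t len)
{
    if (send_secret_once(s) < 0 || verify_peer_once(s) < 0)
        return -1;
    return recv(s->fd, buf, len, 0);
}
```

Note that in this sketch the sending side only learns whether the peer
is genuine once it does its own first recv, and a select() on the raw
descriptor knows nothing about the pending handshake, which is the kind
of extra work in select mentioned above.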


