AF_UNIX/SOCK_DGRAM is dropping messages

Ken Brown kbrown@cornell.edu
Tue Mar 30 14:17:13 GMT 2021


On 3/24/2021 5:18 AM, Kristian Ivarsson via Cygwin wrote:
> Hi Glenn
> 
> Thanks for the reply, so more below
> 
>>> Hi all
>>>
>>> Using AF_UNIX/SOCK_DGRAM with the current version (3.2.0) seems to drop messages, or at least they are not received in the order they were sent
>>>
>>> Attached is a C-ish (though it uses C++ threads) sample program that essentially creates a "client" that writes as much as possible and a "server" that consumes as much as possible
>>>
>>> It seems like some buffer is filled and then messages are dropped, or at least it appears so (if someone is about to test this, the "sleep" might need to be adjusted in order to make this happen)
>>>
>>> Hopefully it's just a flaw in our application (and sample), but as far as we can see, this should work
>>>
>>>
>>> Does anyone, perhaps named Ken, have any insightful thoughts about this?
>>
>>
>>> const int size = BUFSIZ * 2;
>>
>>
>>>      char buffer[size] = {};
>>>
>>>      for( int idx = 0; idx < count; ++idx)
>>>      {
>>>          memcpy( buffer, &idx, sizeof idx);
>>>
>>>          const ssize_t result = sendto( fd, buffer, size, 0, (struct sockaddr*)&address, sizeof address);
>>
>>
>>>              const ssize_t result = recv( fd, buffer, size, 0);
>> ...
>>>              int index = 0;
>>>              memcpy( &index, buffer, sizeof idx);
>>
>> This appears to be a programming error, unrelated to Cygwin.
>>
>> I know that what you provided was an example test case, but you might want to check whether your app is sending far more than the actual payload. In the example you provided, you are sending 16KB instead of 4 bytes for the count.
> 
> Sending a larger buffer (in this case 16 KB) is intentional, though only the first sizeof(int) bytes are relevant. The reason is simply to send many bytes and verify that they arrive on the other side in the correct order
> 
> 
>> Is your code handling partial read/recv and partial write/sendto? (It is definitely a bug in the use of recv() in the sample code.)
> 
> It was not, and the updated version does not either. That is not the issue, though, but I added a test to verify that the whole chunk is sent/read
> 
>> Partial reads and writes can occur more frequently with non-blocking sockets, but it is still good defensive programming to detect and handle partial read/writes.
> 
> That might be the case, but these are blocking calls (or maybe I've misunderstood the flags?). Regardless, the test case is not about how to handle partial writes/reads, but about showing that messages seem to be lost. Of course the code should not be flawed, so thanks for the feedback
> 
> It almost seems like it has UDP semantics, where packets can get lost or arrive out of sequence, and of course SOCK_DGRAM tells you that, but the Linux unix(7) man page says "UNIX domain datagram sockets are always reliable and don't reorder datagrams"
> 
> It seems that once an internal buffer of about 64 KB is filled, the rest of the packets are dropped until the buffer is consumed. In this case the first 32 packets arrive in the correct order, but after that some arbitrary packet (with index > 32) ends up at the "server"
> 
>> It goes without saying that if your protocol sends a fixed-size chunk of data, you should ensure that you read the entire fixed size, even if only using part of the data.
> 
> That's done in the updated version, or at least verified

Thanks for the test case.  I can confirm the problem.  I'm not familiar enough 
with the current AF_UNIX implementation to debug this easily.  I'd rather spend 
my time on the new implementation (on the topic/af_unix branch).  It turns out 
that your test case fails there too, but in a completely different way, due to a 
bug in sendto for datagrams.  I'll see if I can fix that bug and then try again.
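
For anyone who wants to try the same kind of check without the attachment, 
here is a minimal sketch in plain C.  It is not the original test case: it 
assumes a socketpair() instead of bound sockets with sendto(), fork() instead 
of C++ threads, and the names check_dgram_order and DGRAM_SIZE as well as the 
2 second receive timeout are made up for illustration.  The sender stamps 
each fixed-size datagram with its index, and the receiver verifies that every 
datagram arrives whole and in order.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { DGRAM_SIZE = 2048 };  /* hypothetical payload size, not from the attachment */

/* Sends `count` fixed-size datagrams from a child process and checks in the
 * parent that each one arrives whole and in order.  Returns the number of
 * problems seen (0 means no loss, truncation, or reordering was observed),
 * or -1 if the sockets or the child could not be set up. */
int check_dgram_order(int count)
{
    assert(count >= 0);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                        /* child: the "client" */
        char buffer[DGRAM_SIZE] = {0};
        close(fds[0]);
        for (int idx = 0; idx < count; ++idx) {
            memcpy(buffer, &idx, sizeof idx);
            /* Treat anything other than a full-size send as a failure. */
            if (send(fds[1], buffer, sizeof buffer, 0) != (ssize_t)sizeof buffer)
                _exit(1);
        }
        _exit(0);
    }

    /* parent: the "server" */
    close(fds[1]);
    struct timeval timeout = { .tv_sec = 2, .tv_usec = 0 };
    /* Assumed timeout so that lost datagrams do not block recv() forever. */
    setsockopt(fds[0], SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout);

    int errors = 0;
    char buffer[DGRAM_SIZE];
    for (int expected = 0; expected < count; ++expected) {
        const ssize_t result = recv(fds[0], buffer, sizeof buffer, 0);
        if (result == -1) {                /* timeout or error: count the rest as missing */
            errors += count - expected;
            break;
        }
        int index = -1;
        if (result == (ssize_t)sizeof buffer)
            memcpy(&index, buffer, sizeof index);
        if (index != expected)
            ++errors;                      /* short, lost, or out-of-order datagram */
    }

    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ++errors;                          /* the sender itself reported a failure */
    return errors;
}
```

Calling check_dgram_order(64) from main() and printing the result should give 
0 on an implementation that keeps datagrams reliable and ordered; a nonzero 
value would point at the kind of loss or reordering described above.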

Ken

