This is the mail archive of the cygwin mailing list for the Cygwin project.



readv() questions


Warning - LONG and network code related - do not read if not interested
or not versed.

I'm currently trying to debug an issue where readv() seems to fill iovecs
with bad data, or otherwise overflow, when it has to deal with a large
receive buffer. I say large receive buffer because I can replicate the
issue by writev()ing 1000+ iovecs on the sending side and readv()ing on
the Cygwin side continuously. I have verified several times that the
sending side is writev()ing the correct iovecs, with lengths intact and as
I specified them. However, upon readv()ing the same data back on the Cygwin
end, after a sporadic amount of data has been transferred (usually around
100 iovecs or so), I get a spurious iovec filled with data I did not
originally send out.

[ sending-side (Linux 2.6.9-22.0.1.EL) ]
--> writev() + write(variable length, sent in preceding iovec)
--> 100mb uplink
--> 384/1500 dsl up/downlink
--> readv() + read(length derived from iovec received)
[ receiving-side (Cygwin 1.5.20s(0.155/4/2) 20060427) ]

The header itself, sent as an array of six iovecs, is small, 13 bytes in
total:
1 byte (total length)
1 byte (variable length)
1 byte (flag)
2 bytes (header data)
4 bytes (header data)
4 bytes (header data)
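
For concreteness, the 13-byte header above could be described to
writev()/readv() roughly as follows. This is only a sketch: the struct,
field, and function names (wire_header, header_to_iov, and so on) are
hypothetical and chosen for illustration; only the field sizes come from
the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical in-memory form of the header; names are illustrative. */
struct wire_header {
    uint8_t  total_len;  /* 1 byte: total length */
    uint8_t  var_len;    /* 1 byte: length of the payload that follows */
    uint8_t  flags;      /* 1 byte: flag */
    uint16_t hdr_a;      /* 2 bytes: header data */
    uint32_t hdr_b;      /* 4 bytes: header data */
    uint32_t hdr_c;      /* 4 bytes: header data */
};

enum { HEADER_IOVCNT = 6, HEADER_WIRE_BYTES = 13 };

/* Point one iovec at each field, so that any padding the compiler adds
   inside the struct never reaches the wire.  Returns the byte total. */
size_t header_to_iov(struct wire_header *h, struct iovec iov[HEADER_IOVCNT])
{
    size_t total = 0;
    int i;

    iov[0].iov_base = &h->total_len; iov[0].iov_len = sizeof h->total_len;
    iov[1].iov_base = &h->var_len;   iov[1].iov_len = sizeof h->var_len;
    iov[2].iov_base = &h->flags;     iov[2].iov_len = sizeof h->flags;
    iov[3].iov_base = &h->hdr_a;     iov[3].iov_len = sizeof h->hdr_a;
    iov[4].iov_base = &h->hdr_b;     iov[4].iov_len = sizeof h->hdr_b;
    iov[5].iov_base = &h->hdr_c;     iov[5].iov_len = sizeof h->hdr_c;

    for (i = 0; i < HEADER_IOVCNT; i++)
        total += iov[i].iov_len;

    assert(total == HEADER_WIRE_BYTES);  /* 1 + 1 + 1 + 2 + 4 + 4 */
    return total;
}
```

The same array can be handed to writev() on the sender and readv() on the
receiver, with iov[1] carrying the length of the payload that follows.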

On the sending side I writev() the header to the network stack, and then
immediately issue a write() containing the variable-length data, whose
length I stored in the header (iovec[1]). On the receiving end it is the
same deal, just in reverse: readv(), passing a char * in iovec[1] and
relying on readv() to fill it with the correct data received, which I then
use as the length for a read() to get the variable-length data that follows.

A few things:

1. Sanity check: nobody sees anything wrong with this fairly standard
procedure, correct?

2. What exactly is the purpose of dummytest() within
/winsup/cygwin/miscfuncs.cc?

The call to check_iovec_for_read from within readv():

    extern "C" ssize_t
    readv (int fd, const struct iovec *const iov, const int iovcnt)
    {
      extern int sigcatchers;
      const int e = get_errno ();

      int res = -1;

      const ssize_t tot = check_iovec_for_read (iov, iovcnt);

check_iovec_for_read is a macro defined as:

winsup.h:#define check_iovec_for_read(a, b) check_iovec ((a), (b), false)

The actual check_iovec() call with preceding dummytest():

    162 static char __attribute__ ((noinline))
    163 dummytest (volatile char *p)
    164 {
    165   return *p;
    166 }
    167 ssize_t
    168 check_iovec (const struct iovec *iov, int iovcnt, bool forwrite)
    169 {
    170   if (iovcnt <= 0 || iovcnt > IOV_MAX)
    171     {
    172       set_errno (EINVAL);
    173       return -1;
    174     }
    175
    176   myfault efault;
    177   if (efault.faulted (EFAULT))
    178     return -1;
    179
    180   size_t tot = 0;
    181
    182   while (iovcnt != 0)
    183     {
    184       if (iov->iov_len > SSIZE_MAX || (tot += iov->iov_len) > SSIZE_MAX)
    185         {
    186           set_errno (EINVAL);
    187           return -1;
    188         }
    189
    190       volatile char *p = ((char *) iov->iov_base) + iov->iov_len - 1;
    191       if (!iov->iov_len)
    192         /* nothing to do */;
    193       else if (!forwrite)
    194         *p  = dummytest (p);
    195       else
    196         dummytest (p);
    197
    198       iov++;
    199       iovcnt--;
    200     }
    201
    202   assert (tot <= SSIZE_MAX);
    203
    204   return (ssize_t) tot;
    205 }

Lines 190 to 196 seem completely pointless to me unless I'm missing
something, which I believe to be the case here. Can someone explain it? Due
to the use of volatile and the explicit noinline attribute, I have a
feeling it's some form of memory assertion - but why?
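
Taken in isolation, lines 190 to 196 touch the last byte of each buffer: a
load followed by a store of the same value when the buffer is about to be
read into, and a load only when it is about to be written from. The sketch
below reproduces just that access pattern outside of Cygwin; the names
probe_load and probe_last_byte are hypothetical. It is an assumption, not
something the quoted source states, that the point is to provoke any access
fault inside check_iovec(), where the myfault handler can turn it into
EFAULT.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for dummytest(): a volatile load the compiler must perform. */
char probe_load(volatile char *p)
{
    return *p;
}

/* Touch the last byte of a buffer the way check_iovec() does.
   for_write == 0: the buffer is a readv() destination, so load the byte
                   and store it back (it has to be writable).
   for_write != 0: the buffer is a writev() source, so a load is enough. */
void probe_last_byte(void *base, size_t len, int for_write)
{
    volatile char *p;

    if (len == 0)
        return;                 /* nothing to touch */

    assert(base != NULL);
    p = (char *)base + len - 1;

    if (!for_write)
        *p = probe_load(p);     /* value is unchanged, but memory is written */
    else
        (void)probe_load(p);
}
```

On a valid buffer the probe leaves the contents exactly as they were, which
is why the code looks like a no-op when read on its own.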

Anyway, the problem *does not* occur if I run it under strace (which smells
of a race) or if I throttle the data manually by sending only a set amount
and then requesting an ack from the receiving side (which is what I use the
flags variable for). If I go fully unthrottled, with no acks, just write it
all to the wire and read it all from the wire, the s* hits the fan.

What I believe is causing the issue is an MTU-related problem. It almost
always seems to get into weirdness right around 1452 bytes transferred. I
have verified, via Ethereal, that my assertions (which check that the
variable length stored in the header I sent == what is stored in the
received iovec) fail when readv() reads data at the border of a TCP packet
in the stream (i.e. the next portion of an iovec, or the next iovec
entirely, is in the next packet). Ethereal also verifies that the data sent
is exactly as I had placed it on the sending stack via writev() on the
sending host, and that the problems occur when iovec data, or iovecs within
the array passed to readv(), span TCP packets.
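
For comparison, POSIX allows readv() on a stream socket to return fewer
bytes than the iovecs have room for, for instance when only part of the
header has arrived so far. A receiver can tolerate that by calling readv()
again for the remainder. Below is a minimal sketch of such a loop, assuming
a blocking descriptor; iov_advance and readv_full are hypothetical helper
names, not part of Cygwin or of the code in this mail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Skip `done` bytes that have already been received.  Advances *iov past
   fully consumed entries, trims a partially consumed one in place, and
   returns the number of entries still to be filled. */
int iov_advance(struct iovec **iov, int iovcnt, size_t done)
{
    assert(iov != NULL);

    while (iovcnt > 0 && done >= (*iov)->iov_len) {
        done -= (*iov)->iov_len;
        (*iov)++;
        iovcnt--;
    }
    if (iovcnt > 0 && done > 0) {
        (*iov)->iov_base = (char *)(*iov)->iov_base + done;
        (*iov)->iov_len -= done;
    }
    return iovcnt;
}

/* Keep calling readv() until every iovec is full, EOF is seen, or an
   error other than EINTR occurs.  Returns the bytes read, or -1 on error.
   The iovec array may be modified when a read stops inside an entry. */
ssize_t readv_full(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    iovcnt = iov_advance(&iov, iovcnt, 0);      /* drop empty entries */
    while (iovcnt > 0) {
        ssize_t n = readv(fd, iov, iovcnt);

        if (n < 0) {
            if (errno == EINTR)
                continue;                        /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;                               /* EOF: short total */
        total += n;
        iovcnt = iov_advance(&iov, iovcnt, (size_t)n);
    }
    return total;
}
```

A caller would compare the return value against the expected header size
(13 here) before trusting any field, and rebuild the iovec array before
reusing it, since a short read can leave an entry trimmed.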

I'm slowly going through the code, which can be a mission, but I'm beginning
to wonder if this section:

    void
    fhandler_base::raw_read (void *ptr, size_t& ulen)
    {
    #define bytes_read ulen

      HANDLE h = NULL;      /* grumble */
      int prio = 0;         /* ditto */
      DWORD len = ulen;

      ulen = (size_t) -1;
      if (read_state)
        {
          h = GetCurrentThread ();
          prio = GetThreadPriority (h);
          SetThreadPriority (h, THREAD_PRIORITY_TIME_CRITICAL);
          signal_read_state (1);
        }
      BOOL res = ReadFile (get_handle (), ptr, len, (DWORD *) &ulen, 0);
      if (read_state)
        {
          signal_read_state (1);
          SetThreadPriority (h, prio);
        }
      if (!res)
        {
          /* Some errors are not really errors.  Detect such cases here.  */

          DWORD  errcode = GetLastError ();
          switch (errcode)
            {
            case ERROR_BROKEN_PIPE:
              /* This is really EOF.  */
              bytes_read = 0;
              break;
            case ERROR_MORE_DATA:
              /* `bytes_read' is supposedly valid.  */
              break;
            case ERROR_NOACCESS:

is the culprit... There *are* some relatively spooky-looking calls in there,
coming from a POSIX perspective.

But according to my MS API docs on ReadFile, it shall not return until
it has read the number of bytes requested (or timed out, as specified
through SetCommTimeouts I believe, although I do not see it used under
fhandler_base; I presume there is another way through the win32 API when
using sockets?):

"If hFile is not opened with FILE_FLAG_OVERLAPPED and lpOverlapped is NULL,
the read operation starts at the current file position and ReadFile does
not return until the operation is complete, and then the system updates
the file pointer."

ERROR_MORE_DATA is, not surprisingly, defined as:
"ERROR_MORE_DATA: More data is available."

The API references it here:

"If a named pipe is being read in message mode and the next message is
longer than the nNumberOfBytesToRead parameter specifies, ReadFile returns
FALSE and GetLastError returns ERROR_MORE_DATA. The remainder of the message
may be read by a subsequent call to the ReadFile or PeekNamedPipe function."

However, this applies to named pipes, not necessarily sockets.  But I'm
wary of this section:

            case ERROR_MORE_DATA:
              /* `bytes_read' is supposedly valid.  */
              break;

Mainly because I do not see an explicit check anywhere of the form:

if (len != bytes_read)	/* bytes_read is really ulen */
	handle_problem();

Let's just throw out the wild assumption that win32 does something funky
when data requested via ReadFile() spans an MTU-sized boundary or resides
in a following TCP packet of the stream, and reports an error of
ERROR_MORE_DATA. An example would be my case, where I request 13 bytes and
get, for instance, 2.  Upon returning from raw_read(), not much is done in
the way of error checking either:

Within fhandler_base::read():
      raw_read (ptr + copied_chars, len);
      if (!copied_chars)
        /* nothing */;
      else if ((ssize_t) len > 0)
        len += copied_chars;
      else
        len = copied_chars;

      if (rbinary () || len <= 0)
        goto out;


My actual readv() wrapping code is very basic and standard, so I don't think
it's doing anything evil or causing a problem:

    size_t n_recv_iov(int s, const struct iovec *v, size_t c, int tout)
    {
            size_t          br;
            int             res;
            struct timeval  to;
            fd_set          fds, fds_m;

            FD_ZERO(&fds_m);
            FD_SET(s, &fds_m);

            while (1) {
                    fds = fds_m;
                    to.tv_sec = tout;
                    to.tv_usec = 0;

                    if ((br = readv(s, v, c)) == (size_t)-1) {
                            switch (errno) {
                            case EWOULDBLOCK:
                            case EINTR:
                                    break;
                            default:
                                    perror("readv");
                                    return -1;
                            }
                    } else {
                            break;
                    }

                    if ((res = select(s + 1, &fds, NULL, NULL, &to)) == 0)
                            return -1;      /* timeout */
                    else if (res == -1) {
                            perror("select");
                            return -1;      /* never happen */
                    }
            }

            return br;
    }

And my call to it is basic as well:

            IOV_SET(&packet[0], &byte_tl, sizeof(byte_tl));
            IOV_SET(&packet[1], &byte_vl, sizeof(byte_vl));
            IOV_SET(&packet[2], &byte_flags, sizeof(byte_flags));
            IOV_SET(&packet[3], &nbo_s, sizeof(nbo_s));
            IOV_SET(&packet[4], &nbo_t_onl, sizeof(nbo_t_onl));
            IOV_SET(&packet[5], &nbo_t_ofl, sizeof(nbo_t_ofl));

            for (error = 0; !error; ) {
                    error = 1;

                    if ((hl = n_recv_iov(s, packet, NE(packet), 60)) == (size_t)-1)
                            break;

                    assert(byte_vl < sizeof(byte_var));

                    if ((vl = n_recv(s, byte_var, byte_vl, 60)) == (size_t)-1)
                            break;
                    if (hl == 0 || vl == 0)
                            break;

                    error = 0;

                    /* process_data(); */
            }

Sorry for the ultra-long mail, but I know for a fact that readv() on Cygwin
is doing bad things when faced with a lot of data to read from the wire. Any
insights?

-cl

