
Re: [Fwd: 1.5.11-1: sftp performance problem]


On Sep 12,  4:42pm, Corinna Vinschen wrote:
-- Subject: Re: [Fwd: 1.5.11-1: sftp performance problem]
>
> Do I understand that right?  sftp is in the blocking read on the pipe,
> there is data in the pipe and nevertheless read doesn't return?  That's
> odd.

Yes, very.  I'm still experimenting, so the following description
should be considered tentative:

ReadFile seems to want to return the full number of bytes requested
rather than a partial amount, at least the way we're using it with
pipes now.

Bytes in the pipe seem to be buffered on the write side until there
are enough to satisfy the read, or the pipe fills (or maybe there is
some other high-water mark, but it is very close to full).  The system
also seems to empty the buffer after a certain amount of time ... I'm
not sure about the interval, but it's fairly long.

This all works most of the time.  But it interacts badly with the
POSIX atomic write requirements related to PIPE_BUF.  In particular,
select should say that a pipe is not writable when there are fewer than
PIPE_BUF bytes of space available (and our implementation does this).
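
To make the PIPE_BUF condition concrete, a writability check of that
kind can be done roughly like this (sketch only, not a quote of our
select code; the FILE_PIPE_LOCAL_INFORMATION layout and the information
class value 24 are copied from the DDK's ntifs.h, the PIPE_BUF value is
just illustrative, and you have to link against ntdll):

#include <windows.h>
#include <winternl.h>

#ifndef NT_SUCCESS
#define NT_SUCCESS(s) (((NTSTATUS) (s)) >= 0)
#endif

#define PIPE_BUF 4096   /* POSIX atomic-write limit; value illustrative */

/* Copied from ntifs.h; not declared in winternl.h. */
typedef struct _FILE_PIPE_LOCAL_INFORMATION
{
  ULONG NamedPipeType;
  ULONG NamedPipeConfiguration;
  ULONG MaximumInstances;
  ULONG CurrentInstances;
  ULONG InboundQuota;
  ULONG ReadDataAvailable;
  ULONG OutboundQuota;
  ULONG WriteQuotaAvailable;
  ULONG NamedPipeState;
  ULONG NamedPipeEnd;
} FILE_PIPE_LOCAL_INFORMATION;

#define FilePipeLocalInformation ((FILE_INFORMATION_CLASS) 24)

/* Return TRUE if a write of up to PIPE_BUF bytes would not block,
   i.e. at least PIPE_BUF bytes of write quota are still free. */
static BOOL
pipe_is_writable (HANDLE write_handle)
{
  IO_STATUS_BLOCK io;
  FILE_PIPE_LOCAL_INFORMATION fpli;

  NTSTATUS status = NtQueryInformationFile (write_handle, &io, &fpli,
                                            sizeof fpli,
                                            FilePipeLocalInformation);
  if (!NT_SUCCESS (status))
    return FALSE;
  /* WriteQuotaAvailable shrinks as unread data piles up in the pipe,
     but it knows nothing about a reader already blocked on the other
     end, which is exactly the problem described below. */
  return fpli.WriteQuotaAvailable >= PIPE_BUF;
}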

So, when sftp does a large blocking read (at least 16k, which is the
size of the pipe buffer), and ssh almost fills the pipe, then select
stops further writes from happening, and we only deliver data when
the timer expires, which is why transfers are so slow.

This also explains why the problem disappears when we use sftp -B to
reduce the size of the reads, because the data is delivered before
the pipe fills to within PIPE_BUF bytes.  And similarly when I increase
the size of the pipe buffer.

Most other programs that use select or nonblocking I/O aren't affected,
because they only try to read whatever is already in the pipe, and
they get the data immediately.

> |                                         I guess this means that local
> | pipes always do buffering as described in the previous paragraph, and
> | this can't be disabled using FILE_FLAG_WRITE_THROUGH.
> 
> Did you try that?

I haven't yet, but I will.  Disabling buffering would fix the problem.
Or if we could somehow control the buffering parameters (the high-water
mark or the timer), that would also probably be sufficient.  In particular,
setting the high-water mark to reserve PIPE_BUF bytes would be perfect.
I want to try FlushFileBuffers too, but we can't let it block (hmmm ...
maybe in a separate thread?).
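
For what it's worth, the experiment I have in mind looks roughly like
this (sketch only; CreatePipe doesn't take flags, so it assumes we can
route the pipe through CreateNamedPipe/CreateFile to get
FILE_FLAG_WRITE_THROUGH onto the write handle, and the pipe name and
buffer sizes are made up):

#include <windows.h>
#include <stdio.h>

int
main (void)
{
  const char *name = "\\\\.\\pipe\\write-through-test";
  char buf[16384];
  HANDLE rd, wr;
  DWORD n;

  /* Read end: one inbound instance, 16K buffers as in our pipes. */
  rd = CreateNamedPipeA (name, PIPE_ACCESS_INBOUND,
                         PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
                         1, 16384, 16384, 0, NULL);
  if (rd == INVALID_HANDLE_VALUE)
    return 1;

  /* Write end: opened with FILE_FLAG_WRITE_THROUGH in the hope that it
     defeats the write-side buffering described above. */
  wr = CreateFileA (name, GENERIC_WRITE, 0, NULL, OPEN_EXISTING,
                    FILE_FLAG_WRITE_THROUGH, NULL);
  if (wr == INVALID_HANDLE_VALUE)
    return 1;

  if (!WriteFile (wr, "hello", 5, &n, NULL))
    return 1;
  /* FlushFileBuffers (wr) is the other thing to try here, but since it
     can block it would have to run in a separate thread. */

  /* A large blocking read: the test is whether this returns the 5 bytes
     right away or only after the pipe fills or the timer expires. */
  if (!ReadFile (rd, buf, sizeof buf, &n, NULL))
    return 1;
  printf ("read %lu bytes\n", (unsigned long) n);
  return 0;
}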

The real problem is that NtQueryInformationFile returns information
about the data buffered on the write side of the pipe, and that
doesn't account for the possibility that a large read might already
be blocked on the other side, which effectively means that more space
is actually available.  If we could determine how much data the blocked
read has requested, that would help.  The system must know, since it
seems to affect the buffering algorithm, but I don't know offhand how
to get at the info.

> Dunno if that's a *better* idea, but would it be reasonable to try changing
> pipes to use overlapped I/O?

Maybe, but that seems complicated.  I'm hoping for something simpler.

>                               Or what if a read from a pipe always asks for
> the number of available bytes using NtQueryInformationFile and then only
> actually reads this number of bytes and returns that to the caller?
> 
-- End of excerpt from Corinna Vinschen

That's similar to nonblocking reads, and it might work.  For blocking
reads, I guess we would never attempt to read more than the size of the
pipe minus PIPE_BUF.  If we don't want to return partial data, we could
repeat the read until all of the data has been delivered to the calling
program, but I think partial reads should be OK.

On the read side, we'd use PeekNamedPipe instead of NtQueryInformationFile.
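
Roughly like this, I think (sketch only; the blocking flag, the error
convention and the wait loop are just stand-ins for whatever the
fhandler really does):

#include <windows.h>

static DWORD
pipe_read (HANDLE h, void *buf, DWORD len, BOOL blocking)
{
  DWORD avail, n;

  for (;;)
    {
      if (!PeekNamedPipe (h, NULL, 0, NULL, &avail, NULL))
        return (DWORD) -1;        /* write side closed or other error */
      if (avail > 0)
        break;
      if (!blocking)
        return 0;                 /* nonblocking: nothing there, EAGAIN */
      Sleep (10);                 /* placeholder: don't busy-wait for real */
    }

  /* Only ask ReadFile for what PeekNamedPipe said is already buffered,
     so we never sit in ReadFile waiting for the full requested amount.
     This is a partial read if avail < len; the caller has to cope. */
  if (avail < len)
    len = avail;
  if (!ReadFile (h, buf, len, &n, NULL))
    return (DWORD) -1;
  return n;
}

Done that way, a blocking read never sits inside ReadFile asking for
more than is already buffered, so the pipe drains as soon as data shows
up and select's PIPE_BUF reservation can't wedge the writer.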

--
Bob

