ssh.exe on cygwin: Write error / odd problem, any hints welcome
Sun Jul 14 07:47:00 GMT 2013
Dear cygwin list;
I'm having a problem which appears to be related to Cygwin / ssh / Windows. The symptom is the message "Write failed: Connection reset by peer". Others have reported it when using rsync with Cygwin, but the error actually comes from ssh.exe, and I've reproduced it by pasting a large amount of data into an ssh window.
1. With respect to rsync, it might be because select() reports that the pipe file handle to ssh is writable when it isn't. I understand this has been a problem in the past, and may still be. This is not the avenue I'm chasing right now, so please don't everyone point out that select + pipe writability + Cygwin is a persistent problem.
2. I suspect ssh.exe + Cygwin is deadlocking somewhere. The client and server keepalive mechanism seems especially likely to be involved: I can reproduce the error consistently, and just before the "Write failed" message appears I get a keepalive packet. This is the problem I originally set out to solve.
3. My reason for writing to this list, however, is that there appears to be some sort of global resource that I managed to bugger up, and that only a reboot corrected. I had been experimenting with overflowing ssh.exe by pasting a great deal of data into it while connected to a server, and putting various debug statements into ssh.exe to track down the issue. I then decided to give plink.exe a try, thinking the problem might be with Windows itself, so a different program that uses neither ssh.exe nor Cygwin seemed worth testing. After I did that, a number of ssh.exe processes that we normally run for maintenance jobs all started failing intermittently to the test server. I then shut down all software using Cygwin, including all the ssh.exe processes. We gave it 5 minutes or so, and then started up the maintenance ssh.exe processes again (they do various port forwardings). They continued to have problems moving port-forwarded data. As far as I know these ssh.exe processes are entirely network based: I don't think they use stdin/stdout/stderr or pipes for pure network forwarding channels.
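To illustrate the select() concern in point 1, here is a minimal sketch (not taken from rsync or openssh) of a writer that treats "writable" from select() as a hint only. It assumes the descriptor has been put into non-blocking mode; the function name write_all and the stall counter are made up for illustration.

```python
import os
import select
import time


def write_all(fd, data, timeout=5.0):
    """Write all of data to a non-blocking fd.

    select() is treated as a hint: if it reports the fd as writable but
    the write would still block, we count a "stall" and try again
    instead of failing.
    Returns (bytes_written, stalls).
    """
    view = memoryview(data)
    sent = 0
    stalls = 0
    while sent < len(view):
        _, writable, _ = select.select([], [fd], [], timeout)
        if not writable:
            raise TimeoutError("fd not reported writable within timeout")
        try:
            # os.write may accept fewer bytes than offered.
            sent += os.write(fd, view[sent:])
        except BlockingIOError:
            # select() said writable, but the write would have blocked.
            stalls += 1
            time.sleep(0.01)  # avoid a tight spin while the pipe drains
    return sent, stalls
```

A non-zero stall count on a given platform would suggest that select() is over-reporting writability there; a writer that uses blocking writes after select() could hang in the same situation.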
I completely appreciate this isn't entirely a Cygwin problem. That said, is anyone aware of any static buffers or other resources built into Cygwin, or used by Cygwin but static in Windows, that could have been filled or exhausted? I don't even know which avenue to chase now, so I'm soliciting hints. That is, something whereby pasting "too much" data into plink.exe on stdin somehow causes Cygwin ssh.exe processes to start having problems, and which moreover does not recover when all Cygwin processes are stopped and restarted.
Are there any hints or pointers to code within openssh, Cygwin, etc., that might be worth poking around in and adding debug statements to?
The good news is that this is the first time we've seen anything remotely like this, and we have a fairly extensive install base. We seem to be able to reproduce it consistently on our test setup, which is good. The test machine happens to be Windows Server 2008 R2. We've never seen any evidence of fork() or other problems of the kind that more typically get reported.
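For anyone wanting to try this, a rough sketch of how the paste test might be scripted rather than done by hand. This is not what was actually run: the function name flood_stdin, the payload size, and the example command are placeholders, and piping data to a child's stdin is only an approximation of pasting into an interactive window.

```python
import subprocess


def flood_stdin(cmd, total_bytes=8 * 1024 * 1024, chunk=4096):
    """Feed total_bytes of filler lines to cmd's stdin.

    Returns (bytes_handed_over, error, returncode). bytes_handed_over
    counts data given to the pipe's buffer, which may be slightly more
    than the child actually received if the pipe broke.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    line = b"x" * (chunk - 1) + b"\n"
    handed_over = 0
    error = None
    try:
        while handed_over < total_bytes:
            proc.stdin.write(line)
            handed_over += len(line)
        proc.stdin.flush()
    except (BrokenPipeError, ConnectionResetError) as exc:
        # The child went away mid-transfer; record it instead of crashing.
        error = exc
    finally:
        try:
            proc.stdin.close()
        except OSError:
            pass  # closing a broken pipe can raise again; already recorded
    return handed_over, error, proc.wait()
```

It could be pointed at something like flood_stdin(["ssh", "user@testserver", "cat > /dev/null"]), where the host name is hypothetical, and the returned error and exit status compared between ssh.exe and plink.exe.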