problems with gawk 3.1.5-3 hanging -- more info

David Carter carter@carter.to
Thu Mar 30 21:51:00 GMT 2006


Corinna Vinschen wrote:

> O_TEXT is correct because gawk is a text tool in the first place and
> it should treat input lines identical, regardless if they have DOS
> or UNIX lineendings.

Hi Corinna, thanks for the prompt reply.

If I understand you correctly, the fix in -3 has to do with converting
DOS-style CRLFs to LFs. This appears to be the issue. The output from
rsync (on all platforms--windows/unix/POSIX/whatever) contains bare CR
characters (0x0d) with no LF after them. This is what makes rsync's
progress output "overwrite" itself when you run it alone from a bash
prompt.

Here's a snippet of hexdump output from rsync:

$ rsync -Pv /cygdrive/c/backup2 10.0.0.204:~ | xxd
0000000: 6261 636b 7570 320a 2020 2020 2020 2020  backup2.
0000010: 2037 3030 2020 2030 2520 2020 2030 2e30   700   0%    0.0
0000020: 306b 422f 7320 2020 2030 3a30 303a 3030  0kB/s    0:00:00
0000030: 0d20 2020 2020 3133 3736 3137 3620 2020  .     1376176
0000040: 3025 2020 2020 312e 3238 4d42 2f73 2020  0%    1.28MB/s
0000050: 2020 303a 3133 3a33 350d 2020 2020 2032    0:13:35.     2

You can see the 0d all by itself at address 0000030, and again at 0000059.
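
For anyone who wants to check their own output for the same thing, here 
is a minimal sketch that counts CR bytes not followed by LF. The 
function name count_bare_cr and the sample bytes fed to it are made up 
for illustration; they stand in for rsync progress output and are not a 
real capture. awk only ever sees od's plain-text dump here, never the 
raw CRs.

```shell
# Count CR bytes (octal 015) that are not immediately followed by LF (012).
# Hypothetical helper; reads the stream to inspect on stdin.
count_bare_cr() {
    # od dumps every byte as 3-digit octal; tr puts one byte per line
    od -An -v -t o1 | tr -s ' ' '\n' | awk '
        NF == 0 { next }                      # skip empty lines left by tr
        prev == "015" && $1 != "012" { n++ }  # previous CR had no LF after it
        { prev = $1 }
        END { if (prev == "015") n++; print n + 0 }'
}

# Invented sample: two progress updates ending in a bare CR, one in CRLF
printf 'backup2\n   0%%\r  50%%\r 100%%\r\n' | count_bare_cr
```

In real use the printf would be replaced by the rsync command above.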

It appears to me that, because the input is opened as O_TEXT, gawk is
hanging while it waits for an LF to follow the CR (an LF that never
comes). Does this sound likely to you?

> I can't tell why it fails for you, because I can't reproduce this
> locally.  

I'm working on a short script that reproduces the problem for all 
parties; I'll post it here when I have it. Or would you rather I send it 
directly to you?

Also, I took a look at some of the source for other utilities that work
with text input; these included tail, head, cat, and sed. I don't see
any of those utilities opening the input file the way gawk now does,
and in fact a look at the ChangeLog for coreutils hints that they
used setmode at one time and have since removed it (why, I don't know).
Comments like this abound in the ChangeLog:

ChangeLog:      * src/cat.c (main): Avoid setmode; use POSIX-specified 
routines instead.

My thinking was, "gawk should probably open files the same way sed 
does," but maybe my thinking is in error on this point. Your thoughts?

> As for the O_BINARY mode, in theory there's a way to
> accomplish that without rebuilding gawk by setting the BINMODE
> variable:
> 
>   gawk -v BINMODE=r [...]
> 
> Unfortunately it turns out that this doesn't work because gawk fails
> to call the setmode function in this case on Cygwin.  I'll upload a
> patched gawk soon.  If you want to apply it by yourself, try this:
>  (snip...)

This is a suitable workaround for me, but I would like to humbly submit
that gawk shouldn't hang regardless of the input given to it. If the
input isn't acceptable, perhaps it should report an error on stderr and
exit. Your thoughts?

Again, I'll come up with a short shell script that reproduces the issue 
for you, and hopefully together we can come up with an agreeable solution.

Regards;

David Carter



