This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



RE: echo is wrong...


My comments are marked below by:
[Bob McGowan]
comment
[END]

-----Original Message-----
From: Andrew Dalgleish [mailto:andrewd@axonet.com.au]
Sent: Monday, April 13, 1998 11:51 PM
To: gnu-win32@cygnus.com
Subject: RE: echo is wrong...




--> -----Original Message-----
--> From:	Larry Hall [SMTP:lhall@rfk.com]
--> Sent:	1998 April 13, Monday 23:53
--> To:	earnie_boyd@hotmail.com
--> Cc:	gw32
--> Subject:	Re: echo is wrong...
--> 
--> At 05:13 AM 4/13/98 -0700, Earnie Boyd wrote:
--> >---Larry Hall <lhall@rfk.com> wrote:
--> 
--<snip>
--> >Why?  Who in their right mind would want anything but binary pipe
--> >reads?  What purpose would text pipes give?  I can't think of any.
--> >Pipes should always just pass along any data received.  They should
--> >never do anything with the data, including interpret a ^Z as the
--> >end of file.
--<snip> 
--> I completely agree with you Earnie.  Not that I want to start up a
--> text vs binary war again but I've always come down on the side of
--> using binary.  While there may be reasons why it's beneficial to have
--> "text" mode files, it's not at all clear to me that there are any
--> benefits whatsoever to having "text" mode pipes.  If there are some
--> good reasons (and it might be interesting to hear what people think
--> these could be), it's also not clear to me that there are enough
--> Win32 programs that would rely on "text" mode pipes to warrant the
--> pain it causes all those who attempt to use the Cygwin utilities.
--[Andrew Dalgleish]  
--Assuming you have text mode files, there is a very good reason for
--using text mode pipes.
--It is not a good idea to have a tool operate in two different modes
--(one mode for reading from a file, one mode for reading from a pipe).
--The characters which get passed through a pipe should be exactly the
--same as the characters which would be written to a file.
--This means translating end-of-line when reading and writing to a pipe,
--but *only* if a tool opens the pipe in text mode.
[Bob McGowan]
You appear to be assuming that the application (more, cat or whatever)
manipulates the pipe.  I don't know how MS Win systems (or DOS, for that
matter) do it, but I do know that in UNIX shells (including bash) all
I/O redirection is handled by the shell.  And the shell does not "know"
whether the tool being used will want its data in binary or text mode.
Safest, I think, is to do binary mode for the pipe.  Then at least the
data is passed in a consistent way.
[END]
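
To make the "pipes should just pass the data along" point concrete,
here is a minimal C sketch, assuming a POSIX system.  The helper names
pipe_roundtrip() and pipe_preserves() are hypothetical and are not taken
from any Cygwin source.  The pipe is created with pipe(), the bytes are
written in and read back, and the result is compared with what went in.
A ^Z (0x1A) and a CR/LF pair are just bytes as far as the pipe is
concerned.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: push n bytes through a POSIX pipe and read them
 * back into out (capacity cap).  Returns the number of bytes read, or
 * -1 on error.  The same process writes and then reads, so n must be
 * small enough to fit in the pipe buffer. */
long pipe_roundtrip(const char *in, size_t n, char *out, size_t cap)
{
    int fds[2];
    size_t total = 0;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], in, n) != (ssize_t)n) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* the reader now sees EOF as read() == 0 */

    while (total < cap) {
        ssize_t got = read(fds[0], out + total, cap - total);
        if (got < 0) {
            close(fds[0]);
            return -1;
        }
        if (got == 0)           /* end of data: zero bytes, no marker */
            break;
        total += (size_t)got;
    }
    close(fds[0]);
    return (long)total;
}

/* Hypothetical helper: returns 1 if the bytes come out of the pipe
 * exactly as they went in, 0 otherwise (n is limited to 256 here). */
int pipe_preserves(const char *data, size_t n)
{
    char out[256];

    if (n > sizeof out)
        return 0;
    return pipe_roundtrip(data, n, out, sizeof out) == (long)n
        && memcmp(data, out, n) == 0;
}
```

Calling pipe_preserves("123\x1a" "456\r\n", 9) exercises the case being
argued about: an embedded ^Z followed by a CR/LF pair.  Any text-mode
behaviour has to come from a layer above the pipe itself.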
----snip snip


--The plain vanilla Win32 tools are just as inconsistent with ^Z.
--What little documentation there is suggests that ^Z is only used to
--terminate stdin coming from the console, and is NOT the end-of-file
--marker when reading from a file or a pipe.
[Bob McGowan]
The whole point of this discussion is that the ^Z IS interpreted.  A
binary file, containing an embedded ^Z character, read through a text
mode file descriptor, will return EOF on reading the ^Z character.  This
results in the "truncated" file problems that so many posters have been
talking about.
[END]
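
As an illustration of the truncation described above, here is a small C
simulation of the two read modes.  This is only a sketch of the commonly
described DOS-style text-mode rules (stop at ^Z, turn CR/LF into LF); it
is an assumption for illustration, not the actual Cygwin or MS runtime
code, and the names text_mode_read() and binary_mode_read() are
hypothetical.

```c
#include <stddef.h>

#define CTRL_Z 0x1A

/* Simulated text-mode read: copy src to dst, treating ^Z as end of
 * file and dropping the CR of each CR/LF pair.  dst may be NULL to just
 * count.  Returns the number of bytes the caller would see. */
size_t text_mode_read(const char *src, size_t n, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] == CTRL_Z)
            break;                          /* ^Z ends the "file" */
        if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
            continue;                       /* CR/LF becomes LF */
        if (dst)
            dst[out] = src[i];
        out++;
    }
    return out;
}

/* Simulated binary-mode read: every byte is passed through untouched. */
size_t binary_mode_read(const char *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++)
        if (dst)
            dst[i] = src[i];
    return n;
}
```

With the 8-byte file from the examples in this thread ("123", ^Z,
"456", newline), the simulated text-mode reader hands back only the
first 3 bytes while the binary reader hands back all 8, which is the
"truncated file" symptom.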

--Remember that fgetc() returns an int so it can hold EOF; if ^Z were
--the end-of-file then fgetc() would return a char.
[Bob McGowan]
This logic does not work, for 2 reasons:
1) If you go back to C compilers ported to the DOS environment, you will
find all sorts of UNIX'ish stuff that is clearly not supported by DOS (a
good example is the stat structure, which has all 3 time fields, which
all hold the same value, as well as fields for user id, group id and
i-node number, none of which are valid for DOS).  The DOS way of doing
things is being translated, as best as possible, into the UNIX/C way.
So, for text mode file descriptors, the underlying code could very well
take a ^Z character and return whatever it needs to emulate the UNIX/C
world.
2) Even on UNIX, this does not quite work.  At the OS level, the read()
system call returns 0 (zero characters read) on EOF, which is then
translated by higher level routines to be whatever EOF is defined to be.
The reason EOF is defined as an int is so it can hold a value (generally
-1, but not necessarily) that is guaranteed to NOT be a char.
[END]
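
A short C sketch of the second point, using only standard stdio calls.
The helper name bytes_before_eof() is hypothetical.  It writes some
bytes to a temporary file, reads them back with fgetc() into an int, and
counts how many come back before EOF.  Bytes such as 0x1A and 0xFF come
back as ordinary values (26 and 255), while EOF is a separate negative
int that is only reported once the underlying read finds no more data.

```c
#include <stdio.h>

/* Hypothetical helper: write n bytes to a temporary file (tmpfile()
 * opens it in binary update mode), then count how many bytes fgetc()
 * returns before it reports EOF.  Returns -1 on error. */
long bytes_before_eof(const unsigned char *data, size_t n)
{
    FILE *fp = tmpfile();
    long count = 0;
    int c;      /* int, not char: must hold every byte value AND EOF */

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    while ((c = fgetc(fp)) != EOF)
        count++;                /* 0x1A and 0xFF are counted like any byte */

    fclose(fp);
    return count;
}
```

If c were declared as a plain char on a platform where char is signed
and EOF is -1, the byte 0xFF would compare equal to EOF and the loop
would stop early.  That is the reason for the int return type, and it
says nothing either way about whether a text-mode layer underneath
chooses to report EOF when it meets a ^Z.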
--file A contains "123^Z456\n" (8 characters, ^Z == 0x1A)

--type A
--displays "123"

--more<A
--displays "123-456"

--type A|more
--displays "123-456"

--type A>B
--leaves B with 8 chars (123^Z456\n)

--type A|more>B
--leaves B with 11 chars (123^Z456\r\n\r\n)

--more<A >B
--leaves B with 9 chars (123^Z456\r\n)

--I would suggest that ^Z is *never* used for the end-of-file when
--reading from a file or pipe.
[Bob McGowan]
Per my comments above, I clearly disagree with this statement.  Also, MS
itself has had to deal with this sort of thing.  Refer to the
documentation for "copy" and the /b switch, which forces binary mode.  I
have done binary downloads of "split" files which I needed to use in the
MS environment.  The tool to "cat" them together is "copy":
    copy a+b+c+... destfile
But, the file+file format defaults to text mode and the above fails.
The proper format is:
    copy /b a+b+c+... destfile
And the reason has always been the presence, in the binary "split"
pieces, of ^Z characters.

The series of examples only proves that the utility is taking a peek at
what is going on (writing a pipe vs. the console) and changing the file
mode as the programmer deemed necessary.  And I guess this could
technically be taken to mean that ^Z is then not used as an end of file
mark, but that is because the file is "probably" being accessed in
binary mode (note this is a guess, I have no access to any source to
prove the point).
[END]

--I use text files, and on the few occasions I run into problems I
--remind myself that cygwin32 is not unix.
--It's great, but it's not unix, so I don't expect everything to work
--perfectly.
--But I am satisfied more often than I am not.

--Regards,
--Andrew Dalgleish
[Bob McGowan]
I think that the series of examples you have here just shows to what
lengths MS Win has to go to get things "right".  Also, the point that
this is NOT UNIX is well taken.  A good understanding of the DOS/MS Win
way of doing things helps a lot in understanding what is going on.  And
it is infinitely better, as it stands.

But, the environment being set up IS trying to emulate UNIX as much as
possible.  And pipes as well as commands like "cat" are "expected" to do
the right thing with both text and binary items.  I think the safest,
most consistent and reliable way of working this is to use binary mode
file semantics in all cases.  The other alternative would be to add code
to test files and adjust the I/O in some way.  But this adds complexity
and potential problems.

The final point is that I not only want the tools to work in as much of
a UNIX way as possible, but they have to work consistently on both text
and binary files to be useful.  And pipes and commands that "always"
work in text mode cause me no end of problems.
[END]

---
Please accept my apologies for sending you this directly as well as to
the list.  The list is so flakey and slow currently that I felt this
would be the more reliable and speedy way to get an answer to you.

Bob McGowan
i'm:  bob dot mcgowan at artecon dot com 
-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".

