This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Additional carriage return added by cygwin commands to DOS text files


ttjqryfbndgdx wrote:
Note that I don't have the issue with cat.
bash-3.2$ cat test1 > test2
bash-3.2$ xxd test2
0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

"cat" treats both its input and its output as binary, so "cat a > b" is always equivalent to "cp a b".

Now, if you think that cat should treat the files as text, with Cygwin removing the CRs on input and adding them back on output:
there is an error on input (the CRs are not removed)
and an error on output (they are not added).
The two errors cancel each other out, so the result is still correct.


I don't have it with sort used alone :
bash-3.2$ /usr/bin/sort test1 > test2
bash-3.2$ xxd test2
0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

"sort" opens both its input and its output as text. It is what I call a "good text filter", like "more".


But get it when using sort in a pipe with cat :
bash-3.2$ cat test1 | /usr/bin/sort > test2
bash-3.2$ xxd test2
0000000: 6161 610d 0d0a 6262 620d 0d0a            aaa...bbb...

"cat" opens test1 in binary: error on input.
The unexpected CRs go into cat's memory, then into the pipe, then into sort's memory, then into the output file, where additional CRs are inserted because sort uses text-mode output.
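
To make the byte flow concrete, here is a small Python model of the two cases. It is only a sketch: text_read, text_write and sort_lines are hypothetical helpers that imitate the text-mode translation described above, not Cygwin's actual code.

```python
def text_read(raw: bytes) -> bytes:
    """Model a text-mode read: CRLF on disk becomes LF in memory."""
    return raw.replace(b"\r\n", b"\n")


def text_write(data: bytes) -> bytes:
    """Model a text-mode write: every LF in memory becomes CRLF on disk."""
    return data.replace(b"\n", b"\r\n")


def sort_lines(data: bytes) -> bytes:
    """Sort the lines of a buffer, keeping each line ending attached."""
    return b"".join(sorted(data.splitlines(keepends=True)))


test1 = b"aaa\r\nbbb\r\n"

# cat test1 | sort > test2
# cat reads in binary and the pipe is not translated, so the CRs reach sort,
# which then adds a second CR when it writes in text mode.
bad = text_write(sort_lines(test1))

# sort test1 > test2
# sort strips the CRs on input and adds them back on output.
good = text_write(sort_lines(text_read(test1)))

print(bad.hex(" "))   # 61 61 61 0d 0d 0a 62 62 62 0d 0d 0a
print(good.hex(" "))  # 61 61 61 0d 0a 62 62 62 0d 0a
```

The first line printed matches the xxd dump of the broken test2, the second matches the correct one.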


But using more instead of cat solves the issue :
bash-3.2$ more test1 | /usr/bin/sort > test2
bash-3.2$ xxd test2
0000000: 6161 610d 0a62 6262 0d0a                 aaa..bbb..

Same as sort.


test1 is opened in text mode by more, so the CRs are automatically stripped.
The correct data, free of CRs, goes through the memory of "more", the pipe, then the memory of "sort".
Then test2 is opened for output in text mode and the CRs automagically appear.


The key thing to understand is that when text files are opened in text mode (as they always should be), programs never see the CRs in memory. Cygwin strips them automatically when reading from real files and adds them back when writing to real files. Note that pipes (unlike real files) are always binary: nothing is stripped or added there, so the data in a pipe is expected to be free of CRs.
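
The same idea can be observed outside Cygwin. As an analogy only (this is Python's own newline translation, not Cygwin's mechanism), the sketch below writes a file in a CRLF text mode, then reads it back both in binary and in text mode. The temporary file name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test1")

    # Text-mode write with CRLF translation: each "\n" is stored as "\r\n".
    with open(path, "w", newline="\r\n") as f:
        f.write("aaa\nbbb\n")

    # Binary read: the bytes exactly as they are stored on disk.
    with open(path, "rb") as f:
        on_disk = f.read()

    # Text-mode read with universal newlines: "\r\n" comes back as "\n".
    with open(path, "r", newline=None) as f:
        in_memory = f.read()

print(on_disk)          # b'aaa\r\nbbb\r\n'
print(repr(in_memory))  # 'aaa\nbbb\n'
```

The program only ever handles "\n" in memory; the CRs exist solely in the file on disk.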

No mystery (but hard to understand at first).

--
Vincent Rivière


