line endings, file path names (was: Updated: sed-4.1.5-2)

Joachim Achtzehnter joachima@netacquire.com
Thu Aug 3 18:43:00 GMT 2006


Corinna Vinschen wrote:

[JA wrote:]
>> Thank you very much for this fix. It will make life easier for all of
>> us who struggle with a mix of native and Cygwin tools. It is very much
>> appreciated that as far as line endings are concerned the attitude
>> taken by Cygwin developers is not "use POSIX line endings".
> 
> Sorry, but that's not why I did it.  My personal opinion is still 
> strongly on the "use POSIX line endings" side.

Too bad.

> I made the fix only so that other mailing lists don't suffer

This is a strange reason for changing sed's functional behaviour, but since
I like the outcome I won't complain. :-)

> CRLF lineendings are in the top 10 of the worst ideas in the OS
> business.

I agree 100%, and I also agree that DOS path names were a horrendous idea 
too, but neither of these questions is at issue here.

> and I'm seriously contemplating (for years) to just remove textmode
> from Cygwin.

This is where I disagree completely. It does not follow from "CRLF was a 
bad idea" that "hence we should not support it". That would just be 
sticking your head in the sand. Bad idea or not, you, or rather a text 
processing tool like sed, cannot avoid being faced with millions of 
documents that use CRLF, and a few with Mac line endings too. Realizing 
that it was a bad idea does not make them go away.

The only realistic approach here, and more so with line endings than with 
the path name issue, is that taken by XML (about which I usually have no 
good word to say):

  2.11 End-of-Line Handling

  XML parsed entities are often stored in computer files which, for editing
  convenience, are organized into lines. These lines are typically separated
  by some combination of the characters carriage-return (#xD) and line-feed
  (#xA).

  To simplify the tasks of applications, the characters passed to an
  application by the XML processor must be as if the XML processor
  normalized all line breaks in external parsed entities (including the
  document entity) on input, before parsing, by translating both the
  two-character sequence #xD #xA and any #xD that is not followed by #xA
  to a single #xA character.
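
To make the rule concrete, here is a minimal sketch in Python, assuming the 
whole input is already in memory as a str. The function name 
normalize_line_endings is hypothetical; it is not taken from sed, Cygwin or 
any XML library. The only subtlety is ordering: CRLF pairs must be 
collapsed before lone CRs are translated, otherwise each CRLF would turn 
into two line feeds.

```python
def normalize_line_endings(text: str) -> str:
    """Normalize line breaks as described in XML 1.0 section 2.11 (sketch).

    Hypothetical helper: every CR LF pair becomes a single LF, and any CR
    that is not followed by LF (old Mac style) also becomes LF.
    """
    # Collapse CR LF pairs first so their CR is not turned into an extra LF.
    text = text.replace("\r\n", "\n")
    # Any CR still left is a lone CR; translate it to LF as well.
    return text.replace("\r", "\n")


# DOS, Mac and POSIX line endings in one input all come out as LF.
print(repr(normalize_line_endings("dos\r\nmac\runix\n")))
```

A tool that applied something like this to its input would see only LF 
line breaks, whichever convention the file was written with.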

With respect to "text mode", don't forget that this is also part of the ISO 
standard for C and C++, although those standards don't go as far as XML does.

Another way to look at the issue: you can always blame the whole mess on 
those who started the CRLF thing, and I'm all on your side, but users of 
your tools will have to muddle through this mess one way or another. You 
can make life easier for your users by making the tools tolerate inputs 
affected by the mess that exists in real life, or you can make it 
difficult. If you take the latter route, people will gravitate 
toward other tools in the long run. Cygwin has become as popular as it is 
because it helped get the job done, where the job is dealing with a mixed 
environment (POSIX-like behaviour in a non-POSIX environment).

Joachim

-- 
work:     joachima@netacquire.com   (http://www.netacquire.com)
private:  joachim@kraut.ca          (http://www.kraut.ca)
