This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: igncr vs text mode mounts, performance vs compatibility


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Rob Walker on 10/18/2006 6:38 PM:
> I looked into my scripts a little harder, have better results, some new
> conclusions:

Rob, please avoid http://cygwin.com/acronyms/#TOFU.

Thanks for calculating some timings.

> 
> -----------------------------------------------------
> line ending  | mount mode | igncr | "user" time
> -----------------------------------------------------
> CRLF         |  text      |  set  | 1.0114s

Here, both cygwin and bash are checking for \r (the bash check never finds
any, since the text mount has already stripped them), and bash is forced
to read the file one byte at a time.

> -----------------------------------------------------
> CRLF         |  text      | clear | 0.984s

Slightly faster; bash is still forced to read one byte at a time, but it
is not wasting effort checking for \r.  This matches the bash-3.1-6
behavior, regardless of mount point.

> -----------------------------------------------------
>  LF          |  text      |  set  | 0.56995s
> -----------------------------------------------------
>  LF          |  text      | clear | 0.5653s

For these two, now bash can read a buffer at a time.  OK, so bash's check
for \r is in the noise compared to the speed penalty for slower file reads.
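
As a rough illustration of why read granularity matters so much, here is a
small Python sketch (not bash's actual code) that scans the same file once
with one read call per byte and once with a larger buffer.  The function
name, the 8192-byte buffer size, and the generated test file are all
assumptions made for the example.

```python
import os
import tempfile
import time


def count_newlines(path, chunk_size):
    """Count b'\\n' bytes using unbuffered reads of at most chunk_size bytes.

    Returns (newline_count, number_of_read_calls), where the read-call
    count includes the final read that reports end of file.
    """
    # O_BINARY only exists on Windows; elsewhere the flag is simply 0.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    newlines = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1
            if not data:
                break
            newlines += data.count(b"\n")
    finally:
        os.close(fd)
    return newlines, calls


if __name__ == "__main__":
    # Hypothetical workload: a script made of many short LF-terminated lines.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"echo hi\n" * 20000)
    os.close(fd)
    try:
        for size in (1, 8192):
            start = time.perf_counter()
            lines, calls = count_newlines(path, size)
            elapsed = time.perf_counter() - start
            print(f"chunk={size}: {lines} lines, {calls} reads, {elapsed:.4f}s")
    finally:
        os.remove(path)
```

The absolute numbers will differ from the bash timings above; the point is
only that the number of read calls grows with the file size in the
byte-at-a-time case.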

> -----------------------------------------------------
> CRLF         |  bin       |  set  | 0.59435s

When bash itself must filter \r, the cost is still measurable, making this
case even slower than a text mount that has no \r to filter.

> -----------------------------------------------------
> CRLF         |  bin       | clear | whoops!
> -----------------------------------------------------
>  LF          |  bin       |  set  | 0.5545s
> -----------------------------------------------------
>  LF          |  bin       | clear | 0.5576s

Indeed, as I predicted, LF-only files on binary mounts are as fast as you
can get; the minor difference here on igncr is probably due to statistical
variance.

> 
> In the bin mode section (the Cygwin recommended mount mode): note here
> that there's an approx 7% penalty between the most accommodating case
> (CRLF on a binmode mount with igncr set) and the most restrictive case
> (LF only on a bin mode mount with igncr clear).  Less than 10% penalty
> on this perverse benchmark (handling _nothing_ but linefeeds) seems like
> a small price for compatibility.

But there's also the issue of POSIX compatibility - ignoring \r is not
POSIX compatible.  And any speed penalty, however slight, that is
noticeable in a benchmark, even if the penalty is in the noise for real
life cases, is worth addressing - if everyone took the attitude that their
patch was only 10% worse in the worst case, we'd have some slow programs.
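
For reference, the "approx 7%" figure quoted above can be reproduced from
the two bin-mount rows of the table.  This is just the arithmetic, written
as a short Python sketch; the variable and function names are mine.

```python
# "user" times in seconds, copied from the quoted table.
CRLF_BIN_IGNCR_SET = 0.59435   # CRLF file, bin mount, igncr set
LF_BIN_IGNCR_CLEAR = 0.5576    # LF file, bin mount, igncr clear


def relative_penalty(slow, fast):
    """Return how much slower `slow` is than `fast`, in percent."""
    return (slow - fast) / fast * 100.0


penalty = relative_penalty(CRLF_BIN_IGNCR_SET, LF_BIN_IGNCR_CLEAR)
print(f"{penalty:.1f}%")  # prints 6.6%
```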

On the other hand, the volume of complaints on the mailing list is a real
cost, although much harder to measure objectively.  If I make igncr the
default (or more likely, if I make it depend on the state of
POSIXLY_CORRECT), I will noticeably save myself time by not having to
plow through so many emails from clueless users wondering why their CRLF
scripts don't work on cygwin, even though those same scripts won't work on
Linux either.

>>
>> Are you saying that these people expect bash to treat CRLF as if the
>> CR were non-whitespace?  Can you give me an example where this would
>> be a useful feature?

It may not be a well-used feature, but I won't go so far as to call it not
useful.  One possible use - a script written with \n line endings, but
which wants to intentionally generate an output file with \r\n line
endings (this sounds like something sharutils might want to do).  On
Linux, a literal \r in a here-doc is written to the file.  So it stands to
reason that someone might want to do the same action on cygwin when using
a binary mount.  Since cygwin's goal is to provide a Linux emulation, I
don't see any reason to artificially limit cygwin by making bash always
ignore \r; rather, I think it is only safe to ignore \r when explicitly
told to do so (either by a text mount, or by using igncr).
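
To make that concrete, here is a small Python sketch of what would be lost.
It models a script with \n line endings whose here-doc body carries literal
\r bytes, and compares the here-doc body before and after an unconditional
\r filter.  The strip_cr and heredoc_body helpers are hypothetical stand-ins
written for this example (the parser is deliberately simplistic); they are
not how bash implements igncr.

```python
# A script with LF line endings; only the here-doc body contains \r.
SCRIPT = b"cat > out.txt <<'EOF'\nline one\r\nline two\r\nEOF\n"


def strip_cr(script):
    """Drop every carriage return, as an always-on \\r filter would."""
    return script.replace(b"\r", b"")


def heredoc_body(script, delimiter=b"EOF"):
    """Return the bytes between the first line and the delimiter line.

    Simplified on purpose: assumes the here-doc starts on line 1 and
    that the delimiter appears alone on its own line.
    """
    body = []
    for line in script.split(b"\n")[1:]:
        if line == delimiter:
            break
        body.append(line + b"\n")
    return b"".join(body)


as_written = heredoc_body(SCRIPT)
after_filter = heredoc_body(strip_cr(SCRIPT))

print(as_written)    # b'line one\r\nline two\r\n'
print(after_filter)  # b'line one\nline two\n'
```

With the filter always on, the script has no way to ask for the CRLF
output it was written to produce.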

- --
Life is short - so eat dessert first!

Eric Blake             ebb9@byu.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFN2y+84KuGfSFAYARAvs/AKDB1KWuMvOwVL7a2XRqapHpI0kO4QCeIv5U
dhd/hrxm0UJUf1Cs0F0OFF4=
=pKZR
-----END PGP SIGNATURE-----
