This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



LONG! perl-5.8.0 handling of \n (was: perl-5.6.1 handling of \n)


The reason I got interested in this was that 5.8.0 broke code
that worked in 5.6.1. The code compared the number of bytes in
the internal representation of an email message with the number
stored in the file.

Here is the result of my earlier script run on 5.6.1:

For underlying /binary/ mount mode
Discipline: default     String length: 8        File size: 8
Discipline: binary      String length: 8        File size: 8
Discipline: text        String length: 10       File size: 10

For underlying /text/ mount mode
Discipline: default     String length: 10       File size: 10
Discipline: binary      String length: 8        File size: 8
Discipline: text        String length: 10       File size: 10

Here is the same script on 5.8.0:

For underlying /binary/ mount mode
Discipline: default     String length: 8        File size: 10
Discipline: binary      String length: 8        File size: 8
Discipline: text        String length: 8        File size: 10

For underlying /text/ mount mode
Discipline: default     String length: 8        File size: 10
Discipline: binary      String length: 8        File size: 8
Discipline: text        String length: 8        File size: 10

If some of the values were 'wrong' under 5.6.1, at least they
were equal :-) With 5.8.0, it is finding the 'right' string
length in all cases, but now this value is only equal to the file
size when binmode() is used (i.e. writing a Unix-style file is
forced), even on an underlying binary mode mount.
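
The comparison above can be reproduced with a short sketch. It is
not the original script (which is not shown here): it assumes
explicit PerlIO layers (:raw and :crlf) in place of mount modes,
and lengths_for_layer is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given PerlIO layer and return
# (length of the Perl string, size of the file on disk).
# Hypothetical helper, for illustration only.
sub lengths_for_layer {
    my ($layer, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open(my $out, ">$layer", $path) or die "open $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";
    return (length $text, -s $path);
}

my $sample = "abc\ndef\n";    # 8 characters, two of them newlines
for my $layer (':raw', ':crlf') {
    my ($len, $size) = lengths_for_layer($layer, $sample);
    printf "Layer: %-8s String length: %-4d File size: %d\n",
        $layer, $len, $size;
}
```

With :crlf each "\n" is written as CR LF, so the file grows by one
byte per line while the string length stays the same.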

It appears that the following, from perldoc perlcygwin, is no
longer an adequate account of what is happening:

       o Text/Binary
           When a file is opened it is in either text or binary
           mode.  In text mode a file is subject to CR/LF/Ctrl-Z
           translations.  With Cygwin, the default mode for an
           open() is determined by the mode of the mount that
           underlies the file.  Perl provides a binmode()
           function to set binary mode on files that otherwise would
           be treated as text.  sysopen() with the "O_TEXT" flag
           sets text mode on files that otherwise would be
           treated as binary:

It appears that it is no longer just a choice between writing to
a binary mode mount or with binmode, as opposed to a text mode
mount or with O_TEXT.

According to perldoc perldelta:

       o   Previous versions of perl and some readings of some
           sections of Camel III implied that ":raw" "discipline"
           was the inverse of  ":crlf".  Turning off "clrfness"
           is no longer enough to make a stream truly binary. So
           the PerlIO ":raw" discipline is now formally defined
           as being equivalent to binmode(FH) - which is in turn
           defined as doing whatever is necessary to pass each
           byte as-is without any translation.  In particular
           binmode(FH) - and hence ":raw" - will now turn off
           both CRLF and UTF-8  translation and remove other
           "layers" (e.g. :encoding()) which would modify byte
           stream.
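
A minimal sketch of what that passage describes, assuming a handle
opened with an explicit :crlf layer: calling binmode() before
writing should remove the translation again. size_after_write is a
hypothetical helper name, not something from perldelta.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open with an explicit :crlf layer, optionally call binmode(),
# write $text, and return the resulting file size in bytes.
# Hypothetical helper, for illustration only.
sub size_after_write {
    my ($text, $use_binmode) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open(my $out, '>:crlf', $path) or die "open $path: $!";
    binmode($out) if $use_binmode;    # documented as equivalent to :raw
    print {$out} $text;
    close $out or die "close $path: $!";
    return -s $path;
}

printf "with :crlf only:   %d bytes\n", size_after_write("abc\ndef\n", 0);
printf "after binmode(FH): %d bytes\n", size_after_write("abc\ndef\n", 1);
```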

This seems to be a consequence of the new IO system:

       o   IO is now by default done via PerlIO rather than
           system's "stdio".  PerlIO allows "layers" to be "pushed"
           onto a file handle to alter the handle's behaviour.
           Layers can be specified at open time via 3-arg form of
           open:

              open($fh,'>:crlf :utf8', $path) || ...

           or on already opened handles via extended "binmode":

              binmode($fh,':encoding(iso-8859-7)');

           The built-in layers are: unix (low level read/write),
           stdio (as in previous Perls), perlio
           (re-implementation of stdio buffering in a portable manner), crlf
           (does CRLF <=> "\n" translation as on Win32, but
           available on any platform).  A mmap layer may be
           available if platform supports it (mostly UNIXes).

           Layers to be applied by default may be specified via
           the 'open' pragma.

perldoc perlio says about defaults:

       If the platform is MS-DOS like and normally does CRLF to
       "\n" translation for text files then the default layers
       are :

         unix crlf

       (The low level "unix" layer may be replaced by a platform
       specific low level layer.)

       Otherwise if "Configure" found out how to do "fast" IO
       using system's stdio, then the default layers are :

         unix stdio

       Otherwise the default layers are

         unix perlio

       ...

       The default can be overridden by setting the environment
       variable PERLIO to a space separated list of layers (unix
       or platform low level layer is always pushed first).

       ...

         cd .../perl/t
         PERLIO=stdio  ./perl harness
         PERLIO=perlio ./perl harness
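
One way to see which defaults are actually in effect is to ask Perl
for the layers on a handle. This sketch assumes a Perl that provides
PerlIO::get_layers, which may not include 5.8.0 itself;
layers_for_mode is a hypothetical helper name. The output depends on
the platform and on the PERLIO environment variable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the names of the PerlIO layers on a handle opened with
# the given mode string (e.g. '<' or '<:crlf').
# Hypothetical helper, for illustration only.
sub layers_for_mode {
    my ($mode) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open(my $fh, $mode, $path) or die "open $path: $!";
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

print "default open: @{[ layers_for_mode('<') ]}\n";
print "with :crlf:   @{[ layers_for_mode('<:crlf') ]}\n";
```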


So my earlier script may have been an adequate test bed for
5.6.1. The read on the file used a default open, and the string
read in seemed to reflect what had been written to the file. With
5.8.0 however, the read with a default open appears to be doing a
translation of CRLF to \n, because the platform is 'MS-DOS like'.
I need a script to test the effects of the various layers.
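
A starting point for such a script might look like the sketch
below. It covers only the read side and only the :raw and :crlf
layers; read_length is a hypothetical helper name, and the file is
written with :raw so that its contents are known byte for byte.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Store $bytes untranslated, read them back through $layer, and
# return the length of the string Perl sees.
# Hypothetical helper, for illustration only.
sub read_length {
    my ($layer, $bytes) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open(my $out, '>:raw', $path) or die "open $path: $!";
    print {$out} $bytes;
    close $out or die "close $path: $!";

    open(my $in, "<$layer", $path) or die "open $path: $!";
    local $/;                         # slurp the whole file
    my $data = <$in>;
    close $in;
    return length $data;
}

my $dos_text = "abc\r\ndef\r\n";      # 10 bytes on disk
for my $layer (':raw', ':crlf') {
    printf "Layer: %-8s String length: %d\n",
        $layer, read_length($layer, $dos_text);
}
```

Reading through :crlf turns each CR LF pair into a single "\n",
which matches the shorter string length reported under 5.8.0.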

In any case, looking at the results of the earlier script for
5.8.0 at the top of the email and comparing them with those for
5.6.1, it also appears that default writes to a file, EVEN IF ON
AN UNDERLYING BINARY MOUNT, will now leave CRs in the file. This
is something that people won't be too happy about, I think.

-- 
Greg Matheson                    The best jokes are 
Chinmin College                  those you play on
                                 yourself.
Taiwan Penpals Archive <URL: http://netcity.hinet.net/kurage>
