This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



perl & \n (was: perl-5.8.0 breaks code "working" on 5.6.1 over "\n")


This is a followup to my email about differences between the way
perl-5.8.0 and perl-5.6.1 handle \n. This is a cygwin problem
but also one about portable code.

I am going to write about some experimentation. A lot of this
is for my own benefit, trying to understand \n on cygwin.

The code I am interested in is comparing the length of a string
with \n in it and the corresponding number of bytes in the file
the string comes from.

According to perldoc perlcygwin:

       o Text/Binary
           When a file is opened it is in either text or binary
           mode.  In text mode a file is subject to CR/LF/Ctrl-Z
           translations.  With Cygwin, the default mode for an
           open() is determined by the mode of the mount that
           underlies the file.  Perl provides a binmode() function
           to set binary mode on files that otherwise would
           be treated as text.  sysopen() with the "O_TEXT" flag
           sets text mode on files that otherwise would be
           treated as binary:

Here is code which shows the problem.

-s file returns the size of the file on disk in bytes. length
returns the number of bytes in the string.

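        #!/usr/bin/perl

        use Fcntl;      # O_WRONLY, O_CREAT, O_TEXT for open line 1

        # Uncomment one of the ways of opening the output file:
        # sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);      # open line 1
        # open O, ">file.txt";      # open line 2
        # binmode O, ':raw';    # open line 3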
        print O "123\n567\n";
        close O;

        open I, "file.txt";
        while ( <I> )
        {
                $string .= $_;
        }

        print "String is: $string\n";
        print "Length of string is: " . length ( $string ) . "\n";
        print "Size of file is: " . -s "file.txt";

For Win32 systems:

This is perl, v5.6.1 built for MSWin32-x86-multi-thread

With just open O, ">file.txt" uncommented

	String is: 123
	567

	Length of string is: 8
	Size of file is: 10

In a standard DOS text file, each \n is written as CRLF, so the file
is 2 bytes longer than the string: -s "file.txt" != length $string
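
The arithmetic behind those two numbers can be sketched like this. It
assumes every \n in the string becomes a CR LF pair on disk; the
variable names are mine, just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The string as perl sees it in memory: each "\n" is one character.
my $string = "123\n567\n";

# Count the newlines; tr/// with an empty replacement list only counts.
my $newlines = ( $string =~ tr/\n// );

# Assumption: in DOS text mode each "\n" goes to disk as CR LF (2 bytes),
# so the file gains one extra byte per newline.
my $expected_size = length($string) + $newlines;

print "Length of string is: ", length($string), "\n";
print "Newlines in string: $newlines\n";
print "Expected size of DOS text file: $expected_size\n";
```

That gives 8 for the string and 10 for the file, which matches the
Win32 result above.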

If binmode O, ':raw' is also uncommented

	String is: 123
	567

	Length of string is: 8
	Size of file is: 8

Just the \cJ is written to disk. But then the file is unreadable
by Notepad and other Windows applications.

This was for comparison. Of course on Unix, binmode doesn't make
any difference. The length of both string and file are 8.

Now for cygwin. With perl, v5.6.1 built for cygwin-multi

I am going to try the combinations suggested in perldoc
perlcygwin.

With just the 'open O, ">file.txt";' line uncommented,

	sizetest.pl
	String is: 123
	567

	Length of string is: 10
	Size of file is: 8

That's strange: it's the opposite of Win32, where the string is 8
bytes and the file is 10. I think the underlying mode here is text
mode. I think on cygwin, perl's -s test must know what the
underlying mode is, which it doesn't know on Win32.
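
One way to check what really ended up on disk, rather than guessing
from -s, would be to read the file back in binary mode and dump the
bytes in hex. This is only a sketch: raw_bytes is a helper name I made
up, and sample.bin is a scratch file written with explicit CR LF bytes
so that the result doesn't depend on the mount mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the raw bytes of a file, with no
# CR/LF translation on the way in.
sub raw_bytes {
    my ($file) = @_;
    open( my $in, '<', $file ) or die "Cannot open $file: $!";
    binmode $in;                 # read exactly what is on disk
    local $/;                    # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Write a scratch file in binary mode with explicit CR LF line endings.
my $file = "sample.bin";
open( my $out, '>', $file ) or die "Cannot write $file: $!";
binmode $out;
print $out "123\r\n567\r\n";
close $out;

my $bytes = raw_bytes($file);
my $size  = -s $file;
print "Bytes on disk (hex): ", unpack( "H*", $bytes ), "\n";
print "Length of raw read: ", length($bytes), "\n";
print "Size of file is: $size\n";
```

If a 0d shows up before each 0a in the dump, the file has CRLF line
endings, whatever length says about a string read in text mode.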

Now with 'binmode O, ':raw';' uncommented to force the binary
mode write on "file.txt".

	sizetest.pl
	String is: 123
	567

	Length of string is: 8
	Size of file is: 8

Only \cJ is being written to disk.

Now with 'sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);' to force text mode.

	String is: 123
	567

	Length of string is: 8
	Size of file is: 8

Here the read seems to have been aware that the file was text mode.
But in this same case, with the O_TEXT flag, if the file has no 'txt'
extension, the result is different:

	String is: 123
	567

	Length of string is: 8
	Size of file is: 10

The result of the -s test, at least for 5.6.1, and at least in
the case of a forced text mode write, seems to depend on
whether the file has a 'txt' extension. The name does not have an
effect when I try forcing the binary mode write, or the default mode
write.

Now with the new perl-5.8.0, there is PerlIO, which apparently
replaces the C stdio library, and disciplines, which expand
binmode possibilities beyond :raw and :crlf.
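
To make the two disciplines concrete, here is a minimal sketch that
writes the same string once through :raw and once through :crlf and
compares the sizes. It assumes a perl with PerlIO (5.8.0 or later) and
the three-argument form of open; the file names and the write_with
helper are made up for the example. I would expect 8 and 10 on a Unix
system, but I have not checked it against the different cygwin mount
modes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write $text to $file through the given
# discipline (":raw" or ":crlf") and return the size on disk.
sub write_with {
    my ( $discipline, $file, $text ) = @_;
    open( my $out, ">$discipline", $file ) or die "Cannot write $file: $!";
    print $out $text;
    close $out or die "Cannot close $file: $!";
    return -s $file;
}

my $text      = "123\n567\n";
my $raw_size  = write_with( ':raw',  'raw.txt',  $text );
my $crlf_size = write_with( ':crlf', 'crlf.txt', $text );

print "Length of string is: ", length($text), "\n";
print "Size with :raw is: $raw_size\n";      # \n stays a single \cJ
print "Size with :crlf is: $crlf_size\n";    # each \n becomes \cM\cJ
```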

With just 'open O, ">file.txt";', the default write line, uncommented,

	String is: 123
	567

	Length of string is: 8
	Size of file is: 10

This is the opposite of 5.6.1, at least with underlying text
mode, and the same as Win32 with 5.6.1.  The underlying mode is
DOS text, I am pretty sure.

Again with 'binmode O, ':raw';' uncommented to force the binary
mode write on "file.txt".

	String is: 123
	567

	Length of string is: 8
	Size of file is: 8

This is the same as 5.6.1.

And with 'sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);' to
force text mode.

	String is: 123
	567

	Length of string is: 8
	Size of file is: 8

This is the same as 5.6.1. And now, with 5.8.0, whether the file
has an extension or not doesn't matter for any of these write
methods.

Well, that's the end of my experiments. They are not complete. I
didn't try default binary mode mounts. It seems the most reliable
or portable approach is a forced text mode write to a file
with a 'txt' file extension, or is that just jumping to a
conclusion?

In any case, with the aim being to write portable code that also
works on cygwin whether you are using 5.6.1 or 5.8.0, what do we
suggest to developers? I don't think you want to force a binary
file mail message format on cygwin users, because this would mean
you couldn't use Windows applications. I also don't think
developers are keen on sprinkling binmode all through their IO
routines, especially when it won't help solve Win32
problems.

-- 
Greg Matheson                You can't get there from here.
Chinmin College

Taiwan Penpals Archive <URL: http://netcity.hinet.net/kurage>


