This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project. See the Cygwin home page for more information.

Re: text/binary



---Peter Ring <PRI@cddk.dk> wrote:
>
> There's a discussion about this subject going on on the Perl-XML list.
> I'd like to quote a recent posting (see below). The idea is that the 
> *application*, or in this context, an application framework, should 
> hide the different record separator (linebreak) conventions.
> 
> What I miss is a discussion about the following:
> 
> - how to decide which convention to follow when physically 
>   *writing* a file to a mixed OS file system
8<

That would depend on the intended use of the file and the tools that
will process it.  If it is a text file that may be shared among
different platforms, then you'll have to either give the user the
option to decide how to save the file or supply tools to convert the
file from one format to another.  An initialization file would hold
the user's default preference.  You would have to write the file in
raw (binary) mode, inserting the correct line-ending bytes yourself.
Upon reading the file you would have to do the reverse.  This is just
a short list of what would be necessary, and I'm not claiming to be an
authority on this.
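
As a rough illustration of the raw-mode idea, here is a minimal Python
sketch.  The helper names (write_lines, read_lines), the EOL table and
the "unix" default are my own assumptions for the example, standing in
for whatever the user's initialization file would supply.

```python
# Hypothetical table of conventions; a real tool would read the
# user's default from an initialization file (assumption).
EOL = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}

def write_lines(path, lines, convention="unix"):
    """Write each line followed by the chosen terminator, untranslated."""
    eol = EOL[convention]
    # Binary mode: the runtime neither adds nor removes CR characters.
    with open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("utf-8") + eol)

def read_lines(path, convention="unix"):
    """Reverse of write_lines: split on the same terminator."""
    eol = EOL[convention]
    with open(path, "rb") as f:
        data = f.read()
    parts = data.split(eol)
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return [p.decode("utf-8") for p in parts]
```

Because the file is opened in binary mode on both sides, the bytes on
disk are the same no matter which platform runs the code.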

8<
> - how to parse different combinations of the most common record 
>   separator characters when *reading*. E.g., should "\xA\xD\xA" 
>   correspond to "\n\n", "\n\n\n", or whatever?
8<

That would depend on the platform the text file came from and on the
intended use of the file.  If it is a text file and all you're
concerned with is an extra \r character, it will be at the beginning
of the line, the end of the line, or maybe both.  Adjust the line to
remove or ignore them.
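
A small Python sketch of both approaches, with function names that are
my own invention: one strips stray \r characters from a line that has
already been split, the other applies the rule that XML 1.0 specifies
for parsers (CR LF and a lone CR each become a single LF) before any
splitting happens.

```python
import re

def strip_stray_cr(line):
    """Drop CR characters left at either end of an already-split line."""
    return line.strip("\r")

def normalize_newlines(text):
    """Map CR LF and lone CR to LF (the XML 1.0 end-of-line rule)."""
    # CR LF is listed first in the pattern so that a pair collapses
    # to one LF instead of being counted as two separate breaks.
    return re.sub(r"\r\n|\r", "\n", text)
```

Under that rule Peter's "\xA\xD\xA" comes out as "\n\n": the leading
\xA is one line break and the \xD\xA pair is the second.  Counting
every CR and LF separately would give three, which is why the choice
has to be made deliberately.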

8<
> I guess the answer is "it depends". With XML, you don't have to 
> guess the character set (since the encoding will either be UTF-8 or 
> UTF-16, or will be declared in the XML declaration of the document). 
> But you still have to guess the record separator, or know from context.
8< 

Yep.  It all depends on the intended use.  Here's a test in insanity:
write a program to convert a MS WORD document into an html document. ;^)

8<
> Kind regards
8<

And the same to you

8< 
> Peter Ring
8<

Earnie.

P.S.: I hope you don't mind my including the list on this.  My
apologies if you do.

8< 
> -----Original Message-----
> From: Bart Schuller [mailto:schuller+perl-xml@lunatech.com] 
> Sent: Thursday, March 04, 1999 23:58
> To: Perl-XML Mailing List
> Subject: Re: writing XML: use \n or \xA ?
> 
> 
> On Thu, Mar 04, 1999 at 12:48:11PM -0800, Tim Bray wrote:
> > I'm happy with the solution Enno Derksen proposed.  Because that way,
> > if you write an XML file on a Mac, with \xD linebreaks, then you ship it
> > over to a unix or DOS box, it will show up, to a XML::Parser customer,
> > as a logical \n.  It's reasonable to argue about whether this should be
> > done at the expat, the XS, or the XML::Parser layer, but I wouldn't
> > want to lose expat's current behavior of treating \xD, \xA, and \xD\xA
> > as the same thing.  Changing it down in expat might risk a code fork.
> > James?
> 
> I fail to see how specifying that XML::Parser will only ever give perl's
> "\n" would risk losing its ability to treat \xD, \xA, and \xD\xA on
> _input_ as the same thing.
> 
> The XML people standardized on \xA without taking any specific
> programming language into account, just that it can handle Unicode.
> Given that there is one unique end-of-line marker with no way of knowing
> which of the 3 from the previous paragraph it really was, it doesn't
> matter which one we choose.
> 
> Given further that perl's "\n" would have certain advantages, I'd say
> that
> 
>     every perl XML API should present end of line as perl's "\n"
> 
> The advantages:
> 
> - when printed out, it can be parsed again, because "\n" maps to one of
>   the 3 sequences (at least on platforms that can be expected to handle
>   unicode at all). \xA has the same advantage
> - we can take advantage of perl's bias for handling "\n" as default
>   record separator and in regular expressions with /./
> - there's no danger of one platform becoming a source of not quite
>   portable code because _it_ happens to use \xA, and so is easier to
>   program for
> 
> Some of the same points can be made for the stdio library, which also
> special-cases "\n", but I leave that to users of other languages.
> 
> Basically, just because XML did something that should have been done
> ages ago (standardize which byte sequence represents end of line),
> doesn't mean that that makes it easier to process using existing OS's,
> libs and programming languages.
> 
> -- 
> The idea is that the first face shown to people is one they can readily
> accept - a more traditional logo. The lunacy element is only revealed
> subsequently, via the LunaDude. [excerpted from the Lunatech Identity
> Manual]
> 
> 
 
==
-                        \\||//
-------------------o0O0--Earnie--0O0o-------------------
--                earnie_boyd@yahoo.com               --
-- http://www.freeyellow.com/members5/gw32/index.html --
----------------------ooo0O--O0ooo----------------------

PS: Newbies, you should visit my page.