Wed Mar 10 06:05:00 GMT 1999
---Peter Ring <PRI@cddk.dk> wrote:
> There's a discussion about this subject going on on the Perl-XML list.
> I'd like to quote a recent posting (see below). The idea is that the
> *application*, or in this context, an application framework, should
> hide the different record separator (linebreak) conventions.
> What I miss is a discussion about the following:
> - how to decide which convention to follow when physically
> *writing* a file to a mixed OS file system
That depends on the intended use of the file and the tools that will
process it. If it is a text file that programs on one target platform
will read, but it may also be shared among different platforms, then
you'll have to give the user either an option to choose the line-ending
convention when saving, or tools to convert the file from one format
to another. An initialization file would have to supply defaults that
reflect the user's usual preferences. You would have to write the
file in raw (binary) mode, inserting the correct bytes for the line
endings yourself. When reading the file you would do the reverse.
This is just a short list of what would be necessary, and I'm not
claiming to be an authority on this.
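To make the raw-mode idea concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Perl). The names LINE_ENDINGS, write_lines, and read_lines are hypothetical, and the "convention" argument stands in for whatever the user's initialization file would supply.

```python
# Hypothetical table of line-ending conventions; the keys are
# illustrative names, not taken from any real tool.
LINE_ENDINGS = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def write_lines(path, lines, convention="unix"):
    """Write lines in binary mode, appending the chosen terminator.

    Binary mode keeps the runtime from translating "\n" behind our
    back, so the bytes on disk are exactly the ones we ask for.
    """
    eol = LINE_ENDINGS[convention]
    with open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("utf-8") + eol)


def read_lines(path, convention="unix"):
    """Reverse of write_lines: split on the same terminator."""
    eol = LINE_ENDINGS[convention]
    with open(path, "rb") as f:
        data = f.read()
    parts = data.split(eol)
    # A file that ends with a terminator leaves one empty trailing piece.
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8") for p in parts]
```

This only works when reader and writer agree on the convention; guessing the convention of a file of unknown origin is a separate problem.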
> - how to parse different combinations of the most common record
> separator characters when *reading*. E.g., should "\xA\xD\xA"
> correspond to "\n\n", "\n\n\n", or whatever?
That depends on the platform the text file came from and on the
intended use of the file. If it is a text file and all you're
concerned with is an extra \r character, it will be at the beginning
of the line, at the end of the line, or maybe both. Strip or ignore
it when you process the line.
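One possible reading policy, sketched in Python for illustration only: treat "\r\n" as a single line break first, then treat any remaining lone "\r" or "\n" as a break of its own. This is the same rule XML 1.0 uses for end-of-line handling, but it is just one defensible choice; the function names below are hypothetical.

```python
import re

# Alternation order matters: "\r\n" is tried before a lone "\r",
# so a DOS line break counts once rather than twice.
_EOL = re.compile(rb"\r\n|\r|\n")


def normalize_newlines(data: bytes) -> bytes:
    """Map each of \\r\\n, lone \\r, and lone \\n to a single \\n."""
    return _EOL.sub(b"\n", data)


def strip_stray_cr(line: str) -> str:
    """Drop \\r characters left at the start or end of a split line."""
    return line.strip("\r")
```

Under this policy the sequence "\xA\xD\xA" from the question reads as a lone "\n" followed by "\r\n", so normalize_newlines(b"\n\r\n") gives b"\n\n", that is, two line breaks. A file that really used "\n\r" as its separator would be read differently, which is why the answer still depends on where the file came from.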
> I guess the answer is "it depends". With XML, you don't have to
> guess the character set (since the encoding will either be UTF-8 or
> UTF-16, or will be declared in the XML declaration of the document).
> But you still have to guess the record separator, or know it from context.
Yep. It all depends on the intended use. Here's a test in insanity:
write a program to convert an MS Word document into an HTML document. ;^)
> Kind regards
And the same to you
> Peter Ring
P.S.: I hope you don't mind my including the list on this. My
apologies if you do.
> -----Original Message-----
> From: Bart Schuller [ mailto:email@example.com ]
> Sent: Thursday, March 04, 1999 23:58
> To: Perl-XML Mailing List
> Subject: Re: writing XML: use \n or \xA ?
> On Thu, Mar 04, 1999 at 12:48:11PM -0800, Tim Bray wrote:
> > I'm happy with the solution Enno Derksen proposed. Because that
> > if you write an XML file on a Mac, with \xD linebreaks, then you
> > over to a unix or DOS box, it will show up, to a XML::Parser
> > as a logical \n. It's reasonable to argue about whether this
> > done at the expat, the XS, or the XML::Parser layer, but I
> > want to lose expat's current behavior of treating \xD, \xA, and
> > as the same thing. Changing it down in expat might risk a code
> > James?
> I fail to see how specifying that XML::Parser will only ever give
> "\n" would risk losing its ability to treat \xD, \xA, and \xD\xA on
> as the same thing.
> The XML people standardized on \xA without taking any specific
> programming language into account, just that it can handle Unicode.
> Given that there is one unique end-of-line marker with no way of
> which of the 3 from the previous paragraph it really was, it doesn't
> matter which one we choose.
> Given further that perl's "\n" would have certain advantages, I'd say
> every perl XML API should present end of line as perl's "\n"
> The advantages:
> - when printed out, it can be parsed again, because "\n" maps to one
> the 3 sequences (at least on platforms that can be expected to
> unicode at all). \xA has the same advantage
> - we can take advantage of perl's bias for handling "\n" as default
> record separator and in regular expressions with /./
> - there's no danger of one platform becoming a source of not quite
> portable code because _it_ happens to use \xA, and so is easier to
> program for
> Some of the same points can be made for the stdio library, which also
> special-cases "\n", but I leave that to users of other languages.
> Basically, just because XML did something that should have been done
> ages ago (standardize which byte sequence represents end of line),
> doesn't mean that that makes it easier to process using existing OS's,
> libs and programming languages.
> The idea is that the first face shown to people is one they can
> accept - a more traditional logo. The lunacy element is only revealed
> subsequently, via the LunaDude. [excerpted from the Lunatech Identity
-- firstname.lastname@example.org --
-- http://www.freeyellow.com/members5/gw32/index.html --
PS: Newbies, you should visit my page.