This is the mail archive of the xsl-list@mulberrytech.com mailing list.

RE: html to xml

To: xsl-list at mulberrytech dot com
Subject: RE: html to xml
From: Sebastian Rahtz <sebastian dot rahtz at computing-services dot oxford dot ac dot uk>
Date: Thu, 26 Oct 2000 17:01:52 +0100
References: <OFB232F744.9DC1B94C-ON85256984.004C4064@pok.ibm.com>
Reply-To: xsl-list at mulberrytech dot com

Joseph Kesselman/Watson/IBM writes:
 > 
 > >If your HTML is valid, you can try James Clark's tool SX
 > 
 > If it isn't valid HTML,  "tidy"  will clean it up... and then XMLify it, if
 > you use the right options. Tidy is available from the W3C's website.

hmm. having been fighting this tidy-then-transform system for the last
day or two, can anyone tell me how they solve two (related) problems?

 a) as we know, authors scatter <h1>, <h3> etc across their document
 like pointers. my target DTD needs structured divisions. who has some
 good XSLT code to sort it out? I have evolved a dirtyish solution,
 involving disable-output-escaping, but if someone else has a reliable
 clean system, I'd love to see it

 b) HTML allows PCDATA practically anywhere, so far as I can see. so
 I get

   <h3>Hello</h3>
   I am the walrus

 where my target DTD wants something more like

  <h3>Hello</h3>
  <p>I am the walrus</p>

  How do others deal with this?
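
 To make the two problems concrete, here is a minimal sketch of one way
 to handle both in a single pass. It is written in Python rather than
 XSLT, purely so that it is self-contained and runnable; it is not the
 dirtyish disable-output-escaping solution mentioned above, and not a
 recommendation from anyone on the list. Assumptions: the input is
 tidy's well-formed output with no namespace on the elements, the
 target wants nested <div> elements (a hypothetical name, as are
 restructure and BLOCK), and every element not listed in BLOCK counts
 as inline. The idea is to keep a stack of open divisions keyed by
 heading level, and to collect stray text and inline elements into a
 <p> until the next block-level element turns up.

```python
import copy
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")
# Assumed set of block-level elements; anything else is treated as inline.
BLOCK = {"p", "ul", "ol", "dl", "table", "pre", "blockquote", "div", "hr"}


def restructure(body):
    """Return a new <body> with nested <div>s and stray text wrapped in <p>."""
    root = ET.Element("body")
    stack = [(0, root)]  # (heading level, open container), innermost last
    para = None          # <p> currently collecting stray text and inline elements

    def add_text(text):
        nonlocal para
        # Ignore missing text, and whitespace that would only start an empty <p>.
        if not text or (para is None and not text.strip()):
            return
        if para is None:
            para = ET.SubElement(stack[-1][1], "p")
            para.text = text.lstrip()
        elif len(para):
            para[-1].tail = (para[-1].tail or "") + text
        else:
            para.text = (para.text or "") + text

    def flush():
        nonlocal para
        # Close the pending <p>, trimming trailing whitespace from its last text.
        if para is not None:
            if len(para):
                para[-1].tail = (para[-1].tail or "").rstrip() or None
            else:
                para.text = (para.text or "").rstrip()
        para = None

    add_text(body.text)
    for child in body:
        node = copy.deepcopy(child)
        tail, node.tail = node.tail, None
        match = HEADING.match(node.tag)
        if match:
            flush()
            level = int(match.group(1))
            # Close every open division at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            div.append(node)
            stack.append((level, div))
        elif node.tag in BLOCK:
            flush()
            stack[-1][1].append(node)
        else:
            # Inline element outside any block: it belongs in a <p>.
            if para is None:
                para = ET.SubElement(stack[-1][1], "p")
            para.append(node)
        add_text(tail)
    flush()
    return root


html = ("<body><h1>Intro</h1><h3>Hello</h3>\n  I am the <b>walrus</b>\n"
        "<h1>Next</h1><p>ok</p></body>")
result = restructure(ET.fromstring(html))
print(ET.tostring(result, encoding="unicode"))
```

 A skipped level (<h1> followed directly by <h3>) simply nests the <h3>
 division inside the <h1> division; a DTD that insists on every
 intermediate level would need extra handling that this sketch does
 not attempt.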

sebastian

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Follow-Ups:
- Elements to attributes
  - From: Peter Sparkes
- RE: html to xml
  - From: Lisa van Gelder

References:
- RE: html to xml
  - From: Joseph Kesselman/Watson/IBM
