This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: html to xml
- To: xsl-list at mulberrytech dot com
- Subject: RE: html to xml
- From: Sebastian Rahtz <sebastian dot rahtz at computing-services dot oxford dot ac dot uk>
- Date: Thu, 26 Oct 2000 17:01:52 +0100
- References: <OFB232F744.9DC1B94C-ON85256984.004C4064@pok.ibm.com>
- Reply-To: xsl-list at mulberrytech dot com
Joseph Kesselman/Watson/IBM writes:
>
> >If your HTML is valid, you can try James Clark's tool SX
>
> If it isn't valid HTML, "tidy" will clean it up... and then XMLify it, if
> you use the right options. Tidy is available from the W3C's website.
hmm. having been fighting this tidy-then-transform system for the last
day or two, can anyone tell me how they solve two (related) problems?
a) as we know, authors scatter <h1>, <h3> etc across their document
like pointers. my target DTD needs structured divisions. who has some
good XSLT code to sort it out? I have evolved a dirtyish solution,
involving disable-output-escaping, but if someone else has a reliable
clean system, I'd love to see it
b) HTML allows PCDATA practically anywhere, so far as I can see. so
I get
<h3>Hello</h3>
I am the walrus
where my target DTD wants something more like
<h3>Hello</h3>
<p>I am the walrus</p>
How do others deal with this?
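
For illustration only, the logic behind both problems can be sketched
outside XSLT. The Python below is a minimal sketch, not XSLT and not a
method anyone in this thread has posted. It assumes tidy has already
produced well-formed XML, that the target DTD accepts nested <div>
wrappers (an assumption; substitute your own division element), and that
loose text contains no inline markup such as <b>. The function names
wrap_loose_text and nest_by_heading are hypothetical. The first pass
wraps stray text in <p>; the second walks the flat children with a stack
keyed on heading level, so a skipped level (h1 followed by h3) still
nests sensibly.

```python
import xml.etree.ElementTree as ET

# Map "h1".."h6" to their numeric levels.
HEADING_LEVEL = {"h%d" % n: n for n in range(1, 7)}


def wrap_loose_text(parent):
    """Wrap text sitting directly inside `parent` in <p> elements.

    ElementTree keeps such text in parent.text (before the first child)
    and in each child's .tail (after that child).
    """
    rebuilt = []

    def add_para(text):
        # Ignore whitespace-only runs between elements.
        if text and text.strip():
            p = ET.Element("p")
            p.text = text.strip()
            rebuilt.append(p)

    add_para(parent.text)
    parent.text = None
    for child in list(parent):
        rebuilt.append(child)
        add_para(child.tail)
        child.tail = None
    parent[:] = rebuilt


def nest_by_heading(parent):
    """Group flat children of `parent` into nested <div> elements.

    Each heading opens a new <div>; it first closes every open <div>
    whose heading level is the same or deeper.
    """
    children = list(parent)
    for child in children:
        parent.remove(child)

    # Stack of (heading level, container). Level 0 is the parent itself
    # and is never popped, because real heading levels start at 1.
    stack = [(0, parent)]
    for child in children:
        level = HEADING_LEVEL.get(child.tag)
        if level is not None:
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            stack.append((level, div))
        stack[-1][1].append(child)


if __name__ == "__main__":
    body = ET.fromstring(
        "<body><h1>A</h1>intro<h3>Hello</h3>I am the walrus"
        "<h1>B</h1><p>done</p></body>"
    )
    wrap_loose_text(body)
    nest_by_heading(body)
    print(ET.tostring(body, encoding="unicode"))
```

In the demo input, "I am the walrus" ends up as a <p> inside the <div>
opened by the <h3>, which in turn sits inside the <div> opened by the
first <h1>. The same stack idea would need to be re-expressed as
sibling recursion or keys in XSLT 1.0; this sketch only shows the
grouping rule.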
sebastian
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list