This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: rendering marginal XML
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] rendering marginal XML
- From: Peter Flynn <peter at silmaril dot ie>
- Date: Fri, 02 Nov 2001 15:24:08 +0000
- Organization: Silmaril Consultants
- References: <20011026154357.FVQE5977.femail30.sdc1.sfba.home.com@there>
- Reply-To: xsl-list at lists dot mulberrytech dot com
Jay Kline wrote:
> I run a list server that generates its logs in XML format.
> It appears to be valid XML,
Only if you have a DTD for it. Otherwise it's just well-formed
> but it uses a method that is rather clumbersome.
(or "evilly-formed" as a colleague of mine refers to this type
of stuff :-). "Clumbersome" is an excellent description. It
looks as if it was designed by someone who had heard XML
described, but had never seen any before.
> It uses this form:
>
> <msgSent>
> <time>time sent</time>
> <origin>me@here.com</origin>
> <r>you@there.com</r>
> <recieved>time recieved</recieved>
> <status>Any error messages, etc</status>
> <r>you2@there.com</r>
> <recieved>time recieved</recieved>
> <status>Any error messages, etc</status>
> (this repeats for each recipient)
> </msgSent>
> (this repeats for each message)
>
> The problem is the <recieved> and <status> tags refer to the
> imediately preceding <r> tag. I would like to generate a list
> from these logs that contains only email addresses that had errors.
The first thing I do with defective designs like this is rationalise
the file so I can work with it. In this case it is simple to
pass it through sgmlnorm (part of James Clark's SP) pretending it
is SGML, so you can force the addition of a new element type to
enclose r, recieved [do they really spell it like that?] and status.
$ sgmlnorm sgml-spec.dec message.sgml >message.xml
where sgml-spec.dec is (in my test) the old DocBook SGML Declaration
with GENERAL YES changed to GENERAL NO, and a trivial DTD:
<!ELEMENT msgSent - - (time,origin,trace+)>
<!ELEMENT trace O O (r,recieved,status?)>
<!ELEMENT (time,origin,r,recieved,status) - - (#PCDATA)>
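(assuming status is optional and is only present where there has
been an error). The "O O" on trace marks both its start-tag and
end-tag as omissible, so the parser infers them and sgmlnorm writes
them out explicitly. The result is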
<msgSent>
<time>time sent</time>
<origin>me@here.com</origin>
<trace>
<r>you@there.com</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
</trace>
<trace>
<r>you2@there.com</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
</trace>
</msgSent>
(indents courtesy of xxml.el). Now you can test in XPath for
the presence or absence of "trace/status".
That took about a minute to write and another minute to test.
It makes a lot of assumptions about the non-use of declared or
undeclared entities, other element types and constraints you may
have omitted for brevity, etc. It might in your case be simpler
just to run it through sed or some other editing process to do
the same job. Some people will also consider it overkill: your
call.
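
As a rough illustration of the "some other editing process" route, here
is a minimal sketch in Python (standing in for sed, and not part of the
original suggestion). It assumes the standard xml.etree.ElementTree
module and the flat layout quoted above; wrap_traces is a hypothetical
helper name, and the sketch makes the same assumptions about entities
and other element types as the sgmlnorm approach.

```python
import xml.etree.ElementTree as ET

# Flat form copied from the quoted log excerpt (placeholder values).
FLAT = """<msgSent>
<time>time sent</time>
<origin>me@here.com</origin>
<r>you@there.com</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
<r>you2@there.com</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
</msgSent>"""


def wrap_traces(msg):
    """Regroup the flat children of one msgSent element, in place.

    Every <r> opens a new <trace>; whatever follows it (recieved,
    status) goes into that trace until the next <r> is seen.
    """
    children = list(msg)
    for child in children:
        msg.remove(child)
    trace = None
    for child in children:
        if child.tag == "r":
            trace = ET.SubElement(msg, "trace")
        if trace is None:
            msg.append(child)    # time and origin, before the first <r>
        else:
            trace.append(child)
    return msg


msg = wrap_traces(ET.fromstring(FLAT))
print(ET.tostring(msg, encoding="unicode"))
```

For a whole log you would call wrap_traces on each msgSent element in
turn; the enclosing root element is not shown in the original post, so
that loop is left out here.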
But it's a fine example of an XML structure designed without
forethought or foreknowledge: thanks for sharing it.
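
For the test on "trace/status" mentioned above, a minimal sketch using
the limited XPath subset in Python's xml.etree.ElementTree might look
like the following. The sample is the normalised output with the status
removed from the second trace (a hypothetical variation, to show the
filter doing something), and failed_recipients is an illustrative name.
In XSLT the equivalent select would be along the lines of
"trace[status]/r".

```python
import xml.etree.ElementTree as ET

# Normalised form; the second trace has no status (hypothetical
# variation, assuming status only appears when there was an error).
NORMALISED = """<msgSent>
<time>time sent</time>
<origin>me@here.com</origin>
<trace>
<r>you@there.com</r>
<recieved>time recieved</recieved>
<status>Any error messages, etc</status>
</trace>
<trace>
<r>you2@there.com</r>
<recieved>time recieved</recieved>
</trace>
</msgSent>"""


def failed_recipients(xml_text):
    """Return the addresses whose trace carries a status child."""
    root = ET.fromstring(xml_text)
    # trace[status] keeps only trace elements that have a status child
    return [r.text for r in root.findall(".//trace[status]/r")]


print(failed_recipients(NORMALISED))
```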
///Peter
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list