This is the mail archive of the xsl-list@mulberrytech.com mailing list .
> Why doesn't this XML content: • produce this output: • after parsing/xslt in my xhtml document???

Because HTML has a close enough family resemblance to XML that the presumption is that, for any string '& a m p ;' in your input (spaces put there to sanitize for obnoxious mailers), you want to *see* an & character displayed in your HTML browser, which requires that it be *represented* as '& a m p ;' in your (conformant) HTML source code (i.e. the serialized output of your transform).
> It's bloody nigh impossible to get my XML parser (Xalan-Java) NOT to recognize entities except for this one case where recognizing it would solve all my problems.

Nope, it's recognizing this one too; it's just properly turning it *back* into an entity when you are serializing the file.
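To make that round trip concrete, here is a minimal sketch. It uses Python's standard-library parser as a stand-in for Xalan-Java (an assumption; no XSLT is involved, only parse and serialize), and it assumes the input in question was the bullet reference with its ampersand escaped, `&amp;#8226;`, which is a guess because the archive shows both strings as •.

```python
import xml.etree.ElementTree as ET

# Assumed input: the ampersand is escaped, so "#8226;" is ordinary text,
# not a character reference.
source = "<p>&amp;#8226;</p>"

elem = ET.fromstring(source)

# The parser did recognize &amp; and resolved it to a bare "&".
# The text node now holds the seven literal characters  & # 8 2 2 6 ;
parsed_text = elem.text

# A bare "&" is not allowed in well-formed output, so the serializer
# escapes it again on the way out.
serialized = ET.tostring(elem, encoding="unicode")

print(repr(parsed_text))
print(serialized)
```

The parser is not ignoring anything here: it resolves the one reference that is actually present (the escaped ampersand), and the serializer then has to escape the resulting & again.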
> The xsl list FAQ under "Entities" item 13 "Passing Entities through a Transform" says that all entities are resolved before the transform and implies the only way to get around this is with a Perl script to strip entities of their ampersands. This cannot be the whole truth because:
>
> a) xalan won't resolve &#amp; in the above example

The whole idea of changing the & into &#amp; is to stop it from being a reference (no, you're right, Xalan won't resolve it), thereby allowing the transmission of the string unchanged, so it can be twiddled back into the entity reference. If it had been a reference going in, it would have disappeared, leaving behind ... the character it had referred to. (Lots of the time this is actually fine.)
> and b) everyone trying to produce html for posting would be screwed by having XML docs with proper unicode references--nobody could set stuff up so cruelly (right?)

Well, actually they had no choice: it was either be cruel to be kind, or magically uninstall all the browsers ever deployed in the bad old days of HTML, when browsers cared less about "standards" than about conquering the universe. (Come to think of it, that would have been nice; I wonder why they didn't.)
> c) In XSLT Quickly, there's an example of how to define entities in the xsl stylesheet using <xsl:text> to avoid this (p.90-91)--only you can't use this technique on a numbered entity because evidently that's not valid xml so they don't exist, even though they're all over the place.

Who says it's not valid XML? You can refer with a numbered character reference (entity) to any character allowed in XML.
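A quick way to check this for yourself, sketched with Python's standard-library parser (an assumption standing in for whatever parser sits in front of your processor): decimal and hexadecimal character references need no declaration at all, whereas a named entity such as `&bull;` (chosen here purely for illustration) is rejected unless something declares it.

```python
import xml.etree.ElementTree as ET

# Numeric character references are built into XML; no DTD is needed.
# Decimal 8226 and hexadecimal 2022 name the same character, U+2022 BULLET.
elem = ET.fromstring("<p>&#8226; and &#x2022;</p>")
text = elem.text

# A named entity is different: without a declaration the document is
# not well-formed, and the parser refuses it.
try:
    ET.fromstring("<p>&bull;</p>")
    undefined_entity_rejected = False
except ET.ParseError:
    undefined_entity_rejected = True

print(repr(text), undefined_entity_rejected)
```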
> I know this is an old subject; but after hours of investigating, I still don't get it. I need to know why the above example doesn't produce the right numbered entity reference, and what other ways there are to preserve entities through a transform

You can't. An entity reference cannot be preserved, period. The whole idea is that a parser will resolve the reference, turning it into the thing you said it was supposed to be.

> , and possibly how unicode/numbered entities are defined and can be redefined. There just has to be a way to do this within xslt. I'm sorry that I still don't get this--please help anyway, somebody.

*If* you are writing your output to a file -- and always will be -- you can use a feature supported in some XSL processors that starts with a 'd' and has three words, two of which are "output" and "escaping" (I forget the third). But this is *not as honest* a solution as the paper-bag workaround. At least then you are aware of what you are doing.
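The pieces above can be sketched end to end. This is only an illustration under stated assumptions: Python's standard library stands in for the parser and serializer (no XSLT processor is involved), and `restore_numeric_refs` is a hypothetical helper showing the kind of post-processing "twiddle" discussed earlier, not anything taken from the FAQ.

```python
import re
import xml.etree.ElementTree as ET

# 1. A real character reference is resolved by the parser; what the
#    transform would see is the character, not the reference.
elem = ET.fromstring("<p>&#8226;</p>")
resolved = elem.text  # a single character, U+2022

# 2. The serializer decides how that character is written back out.
#    ASCII cannot hold U+2022, so a numeric reference is generated;
#    a Unicode string can hold it directly.
as_ascii = ET.tostring(elem, encoding="us-ascii")
as_unicode = ET.tostring(elem, encoding="unicode")

# 3. Hypothetical post-processing step: if the input carried the string
#    as '&amp;#8226;', the output will too, and a text-level pass can
#    turn it back into a reference after serialization.
def restore_numeric_refs(serialized: str) -> str:
    """Rewrite '&amp;#NNN;' or '&amp;#xHHH;' as '&#NNN;' or '&#xHHH;'."""
    return re.sub(r"&amp;(#(?:[0-9]+|x[0-9A-Fa-f]+);)", r"&\1", serialized)

restored = restore_numeric_refs("<p>&amp;#8226;</p>")

print(as_ascii, as_unicode, restored)
```

Step 2 is often all that is needed: whether a numeric reference shows up in the output depends on the output encoding chosen at serialization time, not on what the input looked like. Step 3 operates on text after the fact, so you at least know exactly what you are doing to the file.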