This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Escaped character entities and the MSXML3 parser (was: RE: ampersand output)


Since most of this thread revolves around the specific
use of Microsoft's MSXML3 parser rather than XSLT in general,
I believe the issue is similar to the one
I am currently struggling with (some on this list
have yet to grasp my dilemma, and the replies
show that).

So regardless of what the XSLT spec currently
states, the real world is having a difficult
time in this one particular area of using XML.
Before some readers say this scenario has been
hashed out before, I ask for a little time to make the case.


A common multi-application scenario is that
app#1 provides a web interface that accumulates
data from a client so that they can customize
the visual aspects of a web page for a web application;
i.e. app#1 (using a web interface) asks the user
to check off background colors, font sizes, font types,
font colors, hyperlink addresses, image hrefs, etc.,
and saves all of that in an XML structure:

<CustomOptionsForClientsWebPage>
	<page>
		<tabledatacell number="1">
			<anImage footnote="Painted by Bill &amp; Jane Painter"
				href="http://someserver/images/MSASPApp.asp&amp;ImageName=bluegoose">bluegoose.jpg</anImage>
			<texttag>This is a blue goose</texttag>
			<fontcolor>#123456</fontcolor>
			<fontname>verdana</fontname>
		</tabledatacell>
	</page>
	.
	.
	.
</CustomOptionsForClientsWebPage>

App#1 would also provide a preview mode for the client
so they can look at a demo page before saving
their choices for later use in production.

So the above structure is assembled either on the
client and posted as one long XML string, or it
is assembled on a server; in either case the
XML structure is created and stored in a database
field (as a binary type, for those who wonder just how
it is done).


Now when preview is requested, the preview mode
calls upon App#2 to assemble the structure into
a preview HTML page string. App#2 does this rather
than App#1 because building the preview is the same
method as building the final page for production;
all that is required is that right before App#2
Response.Writes to the client in production mode,
the string is instead routed to a new opennewwindow
function on the client running App#1, so they see
the production page in a second browser window as
a preview.

App#2 reads the XML structure from the database field,
loads it into the MSXML3 parser, and then creates an
XSLT page (or HTML page) from many string fragments
containing HTML tag attributes.

And finally, here is the problem with all this:


When the program (App#2) assembles the XSLT or HTML
string structure:
1...it first reads the XML structure from the database
2...it then loads the XML structure into the MSXML3 parser (instance A)
3...it then navigates the XML structure via MSXML3 to the
   property it needs to read
4...it then places the value from the XML structure into
   an HTML element attribute
5...and finally it loads that resulting structure
   into another instance of the MSXML3 parser (instance B),
   transforms it into HTML, and Response.Writes that to the client
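The five steps above can be sketched with Python's standard `xml.etree.ElementTree` standing in for MSXML3 (an assumption; MSXML3 itself is a COM component driven from script, and its API is not shown here). The function name `build_preview` and the mapping of `footnote` onto an `<img title>` attribute are hypothetical. The one deliberate difference from string splicing is step 4: the value goes into the output through the DOM rather than being pasted into a string, so the serializer re-escapes it before instance B loads it.

```python
import xml.etree.ElementTree as ET

# The stored structure from the example, trimmed to one cell.
STORED = (
    '<CustomOptionsForClientsWebPage><page>'
    '<tabledatacell number="1">'
    '<anImage footnote="Painted by Bill &amp; Jane Painter" '
    'href="http://someserver/images/MSASPApp.asp&amp;ImageName=bluegoose">'
    'bluegoose.jpg</anImage>'
    '</tabledatacell></page></CustomOptionsForClientsWebPage>'
)


def build_preview(stored_xml: str) -> str:
    """Hypothetical App#2: read stored XML, copy one value, emit new markup."""
    # Steps 1-2: load the stored structure (parser "instance A").
    source = ET.fromstring(stored_xml)
    # Step 3: navigate to the property; the parser returns it unescaped.
    footnote = source.find("page/tabledatacell/anImage").get("footnote")
    # Step 4: set the value as an attribute through the DOM, not by string
    # concatenation, so the serializer escapes "&" again on output.
    img = ET.Element("img", {"title": footnote})
    output = ET.tostring(img, encoding="unicode")
    # Step 5: a second parser ("instance B") can now load the result.
    ET.fromstring(output)
    return output


print(build_preview(STORED))
# <img title="Painted by Bill &amp; Jane Painter" />
```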



Seems fine, doesn't it? And it does work very well until
you run into the following snag:



If in step 4 you need to set some HTML property whose value
contains a character entity, then when you read it from parser
instance A into the HTML element or attribute, instance A
has already unescaped it, so that when parser instance B
attempts to load the result, the load fails.

For example, take the footnote attribute of the anImage element in
the example above.
Parser instance A will load the following fine:
	footnote="Painted by Bill &amp; Jane Painter"
but parser instance B will not load the following:
	footnote="Painted by Bill & Jane Painter"
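The failure is easy to reproduce outside MSXML3. This sketch uses Python's `xml.etree.ElementTree` as a stand-in parser (an assumption, not MSXML3 itself): instance A hands back the attribute value with `&amp;` already decoded to `&`, and splicing that raw value into a new markup string produces something instance B rejects as not well-formed.

```python
import xml.etree.ElementTree as ET

# Parser "instance A" loads the escaped attribute without complaint.
a = ET.fromstring('<anImage footnote="Painted by Bill &amp; Jane Painter"/>')
value = a.get("footnote")
print(value)  # Painted by Bill & Jane Painter  (no longer escaped)

# Splicing the raw value back into markup by string concatenation...
spliced = '<img title="' + value + '"/>'

# ...hands parser "instance B" a bare "&", which is not well-formed XML.
try:
    ET.fromstring(spliced)
    loaded = True
except ET.ParseError as err:
    loaded = False
    print("instance B failed:", err)
```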

Now some on this list have said "double escape it", meaning
that you would escape it twice, so that after parser instance
A loaded it, it would still be escaped, and then
parser instance B could load it, e.g.

	footnote="Painted by Bill &amp;amp; Jane Painter"

That would appear logical, until you realize that
the number of times the XML structure is loaded and
read in App#1 is not the same as the number of times in
App#2, so the double escaping would be out of
sync between the two applications.
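The drift can be shown by counting round trips: each load-and-read through a parser removes exactly one level of escaping, so a value escaped twice survives exactly one round trip and no more. The helper `read_through_parser` is hypothetical and again uses Python's `xml.etree.ElementTree` in place of MSXML3.

```python
import xml.etree.ElementTree as ET


def read_through_parser(escaped_text: str) -> str:
    """Hypothetical helper: load a fragment, then read its text back out."""
    return ET.fromstring("<t>" + escaped_text + "</t>").text


double = "Bill &amp;amp; Jane"
once = read_through_parser(double)   # first load/read (e.g. in App#1)
print(once)                          # Bill &amp; Jane  -> still loadable
twice = read_through_parser(once)    # second load/read (e.g. in App#2)
print(twice)                         # Bill & Jane      -> escaping used up

# A third load of the now-raw text fails, so the number of escapes has to
# match the number of round trips in every application that touches it.
try:
    read_through_parser(twice)
    third_ok = True
except ET.ParseError:
    third_ok = False
```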

So the application builder is left without any
method by which they can depend upon the MSXML3
parser to reliably provide a string value that
has a character entity in it.

The simple rule provided by Microsoft is that
if an XML element or attribute has one of these
characters in it, e.g.

&
<
>
'

you escape it once so that it will load successfully.
But if you read it out and then put it into
another structure that will subsequently be loaded by
any other instance of the MSXML3 parser, it will of course
fail unless you do one of six things:

1.)
escape the character entity x number of times in the original structure,
and then keep a counter somewhere that tells you
how many times you have loaded it and read it out,
and know how many total times you will do this
before you do the final thing with it

2.)
write a nice little custom function that adds
and removes the escaping after each
read from a parser and before each load
(oh hey, nice boat anchor you wrapped on that
MSXML3 speed boat there, buddy...)
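For option 2, the escaping function does not necessarily have to be hand-written; many platforms ship one. A minimal sketch, assuming Python: `xml.sax.saxutils.quoteattr` (for attributes) and `escape` (for element text) re-escape a value read from instance A right before it is spliced into markup for instance B, so escaping happens once per hop rather than being pre-counted in the stored data. The function `splice_attribute` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def splice_attribute(tag: str, name: str, raw_value: str) -> str:
    """Hypothetical helper: build '<tag name="..."/>' from an unescaped value."""
    # quoteattr escapes &, < and > and wraps the value in suitable quotes.
    return "<%s %s=%s/>" % (tag, name, quoteattr(raw_value))


# Instance A: the value comes back unescaped.
value = ET.fromstring(
    '<anImage footnote="Painted by Bill &amp; Jane Painter"/>'
).get("footnote")

markup = splice_attribute("img", "title", value)
print(markup)  # <img title="Painted by Bill &amp; Jane Painter"/>

# Element text uses escape() instead of quoteattr().
text_markup = "<p>" + escape("Bill & Jane <Painters>") + "</p>"
print(text_markup)  # <p>Bill &amp; Jane &lt;Painters&gt;</p>

# Instance B now loads the spliced markup and recovers the original value.
reloaded = ET.fromstring(markup).get("title")
```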

3.)
use custom character entities instead of the ones provided
by the XML community;

so instead of using "&amp;" for an "&", you would use
"JoeAmp&amp;JoeAmp", and then you would always have to send
all the reads and writes from the parser to the function that
adds or removes your little custom escape (as if that would make
a nice improvement in scalability)

4.)
load it only once in a single instance of the parser
and develop only monolithic web applications

5.)
don't use character entities

6.)
don't use MSXML3 nor XML nor XSLT...


Before I go, let me just add...

I did find that you could write to the text
property of the MSXML3 parser after the initial load,
and once you wrote an escaped character entity to
the text property, it stayed escaped. That was
a bit odd, since the text property automagically
unescaped the value when loading the original XML
structure.

What is interesting is this: if the character
entity stays escaped after you load it and then
immediately re-write the original value
to it, what is going on inside the MSXML3 parser
that keeps the text value from being transformed after
writing? Is there some secret switch...

...in a state of wonder...
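A plausible explanation for the "secret switch", sketched with Python's `xml.etree.ElementTree` (an assumption that MSXML3's text property behaves analogously; its internals are not shown here): a DOM stores character data unescaped. Reading the text decodes `&amp;` to `&`; writing a string to the text stores exactly those characters, and the serializer escapes them again on output. So writing `&amp;` stores a literal ampersand followed by `amp;`, which serializes as `&amp;amp;` and reads back as `&amp;`.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring("<footnote>Bill &amp; Jane</footnote>")

# Reading: the entity was decoded when the document was loaded.
before = node.text
print(before)                                  # Bill & Jane

# Writing an already-escaped string stores the characters '&amp;' literally...
node.text = "Bill &amp; Jane"
after = node.text
print(after)                                   # Bill &amp; Jane

# ...and serialization escapes that '&' once more.
serialized = ET.tostring(node, encoding="unicode")
print(serialized)                              # <footnote>Bill &amp;amp; Jane</footnote>
```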

Thanks to those who took the time to read this all the
way through to understand some of the things real-world
coders are dealing with in production.

Sincerely
JDGarrett
(P.S. forgive any typos)






|-----Original Message-----
|From: owner-xsl-list@lists.mulberrytech.com
|[mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Julian
|Reschke
|Sent: Thursday, October 18, 2001 1:31 PM
|To: xsl-list@lists.mulberrytech.com
|Subject: RE: [xsl] ampersand output
|
|
|Look at the FAQ again :-)
|
|Ampersands in the HTML src attribute MUST be escaped as "&amp;" (otherwise
|it's invalid HTML).
|
|
|> -----Original Message-----
|> From: owner-xsl-list@lists.mulberrytech.com
|> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Eric Vitiello
|> Sent: Thursday, October 18, 2001 8:07 PM
|> To: xsl-list
|> Subject: [xsl] ampersand output
|>
|>
|> I've been following the &nbsp; thread, and looked up my question
|> in the FAQ, but I've been unable to find an answer.
|>
|> I've also seen some messages with examples of exactly what I'm
|> trying to do, but they aren't working...
|>
|> I'm trying to output the following stylesheet:
|>
|> <?xml version="1.0"?>
|> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|> version="1.0">
|> <xsl:output method="html"/>
|>
|>   <xsl:template match="/family-tree">
|>     <html>
|>       <body>
|>       	<embed
|> src="/default.asp?{'person=p1'}{'&amp;tree='}{@surname}"
|> width="600" height="300" type="image/svg+xml"/>
|>       </body>
|>     </html>
|>   </xsl:template>
|> </xsl:stylesheet>
|>
|>
|> the problem is the SRC tag.  instead of outputting a & it is
|> outputting &amp;  so the output looks like:
|>
|> <html>
|> 	<body>
|> 		<embed
|> src="/default.asp?person=p1&amp;tree=vitiello" width="600"
|> height="300" type="image/svg+xml"/>
|> 	</body>
|> </html>
|>
|> I have also tried &#38;  but it also outputs &amp;
|>
|> I am using MSXML 3.0.
|>
|> any ideas?
|>
|> Eric Vitiello
|> perceive designs
|> <www.perceive.net>
|>
|>
|>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
|>
|
|
|




