This is the mail archive of the xsl-list@mulberrytech.com mailing list.



That unwanted white space in HTML output


Warren Hedley wrote:
> The whitespace between <a> and <img> elements is a fairly 
> common problem [...] can anyone suggest any other element
> types where this behaviour might be necessary?

Yes: all "inline" elements. The HTML 4 DTDs enumerate them as follows:

(strict)
TT | I | B | BIG | SMALL | EM | STRONG | DFN | CODE | SAMP | KBD | VAR |
CITE | ABBR | ACRONYM | A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB | SUP
| SPAN | BDO | INPUT | SELECT | TEXTAREA | LABEL | BUTTON

(transitional)
TT | I | B | U | S | STRIKE | BIG | SMALL | EM | STRONG | DFN | CODE | SAMP
| KBD | VAR | CITE | ABBR | ACRONYM | A | IMG | APPLET | OBJECT | FONT |
BASEFONT | BR | SCRIPT | MAP | Q | SUB | SUP | SPAN | BDO | IFRAME | INPUT |
SELECT | TEXTAREA | LABEL | BUTTON

I believe a future version of the XSLT spec should include the following clause:
"When emitting a result tree as HTML, whitespace should never be added
inside inline elements."

Example:

What would normally be emitted as unindented XML like this:
<p><a href="foo"><img src="bar"/></a><br/>some text</p>

...could be emitted as indented HTML like this:
<p>
<a href="foo"><img src="bar"/></a><br/>some text
</p>
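A minimal Python sketch of the proposed rule, assuming the result tree is available as an xml.etree.ElementTree element. The names indent_html, _verbatim and _has_inline_content and the two-space indent are illustrative choices, not part of XT or any other XSLT processor. It is a little more conservative than the example above: whitespace is added only between block-level nodes, so a block that holds text or inline children (like the <p> here) is written out exactly as it is. Empty elements are written XML-style (<br/>), as in the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sketch of the proposed rule; not taken from XT or any
# other XSLT processor. Inline names follow the transitional DTD list.
INLINE = frozenset(
    "tt i b u s strike big small em strong dfn code samp kbd var cite abbr "
    "acronym a img applet object font basefont br script map q sub sup span "
    "bdo iframe input select textarea label button".split()
)


def _attrs(elem: ET.Element) -> str:
    """Serialize elem's attributes in document order."""
    return "".join(f' {k}="{escape(v)}"' for k, v in elem.attrib.items())


def _verbatim(elem: ET.Element) -> str:
    """Serialize elem and its subtree exactly, never adding whitespace."""
    if elem.text is None and len(elem) == 0:
        return f"<{elem.tag}{_attrs(elem)}/>"  # XML-style, as in the example
    parts = [f"<{elem.tag}{_attrs(elem)}>", escape(elem.text or "", quote=False)]
    for child in elem:
        parts.append(_verbatim(child))
        # ElementTree stores the text that follows a child in child.tail.
        parts.append(escape(child.tail or "", quote=False))
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


def _has_inline_content(elem: ET.Element) -> bool:
    """True if elem directly contains non-blank text or an inline child."""
    if (elem.text or "").strip():
        return True
    return any(c.tag in INLINE or (c.tail or "").strip() for c in elem)


def indent_html(elem: ET.Element, level: int = 0, unit: str = "  ") -> str:
    """Indent elem, adding whitespace only between block-level nodes."""
    pad = unit * level
    # Inline elements, empty elements, and blocks holding text or inline
    # children are written unchanged, since added whitespace there could
    # become a word separator.
    if elem.tag in INLINE or len(elem) == 0 or _has_inline_content(elem):
        return pad + _verbatim(elem)
    # Only block children remain: give each its own indented line. Blank
    # text between them in the input is replaced by this indentation.
    lines = [f"{pad}<{elem.tag}{_attrs(elem)}>"]
    lines += [indent_html(child, level + 1, unit) for child in elem]
    lines.append(f"{pad}</{elem.tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<body><p><a href="foo"><img src="bar"/></a><br/>some text</p>'
        "<p>more</p></body>"
    )
    print(indent_html(doc))
    # <body>
    #   <p><a href="foo"><img src="bar"/></a><br/>some text</p>
    #   <p>more</p>
    # </body>
```

The design choice is to never place whitespace next to text or inline content at all, rather than relying on the browser to ignore a line break just inside <p>.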


This rule is needed because any added whitespace, together with any whitespace
adjacent to it, is interpreted as a single "word separator" relative to the
surrounding text. The browser is supposed to render this separator in a
manner appropriate to the script of the language being used, which is not
always predictable. In languages written in Latin script, the word separator
is a breaking space.
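The collapsing described above can be modelled very roughly in Python. This is a deliberate simplification, not how any browser actually lays out a page: visible_sequence is a hypothetical helper that replaces each <img> tag with the placeholder [img] and each <br> with [br], drops every other tag, and turns each run of white space into a single space standing in for the word separator. Treating only space, tab, form feed, CR and LF as white space is also an assumption of the sketch.

```python
import re

# Rough model for illustration only, not how a browser actually works.
# Simplification: only space, tab, form feed, CR and LF count as white space.
_WHITESPACE_RUN = re.compile(r"[ \t\f\r\n]+")
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
_BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
_ANY_TAG = re.compile(r"<[^>]+>")


def visible_sequence(html: str) -> str:
    """Show where word separators end up in an inline HTML fragment."""
    text = _IMG_TAG.sub("[img]", html)      # images become placeholders
    text = _BR_TAG.sub("[br]", text)        # forced line breaks, likewise
    text = _ANY_TAG.sub("", text)           # other tags add no content
    return _WHITESPACE_RUN.sub(" ", text)   # each run -> one separator


if __name__ == "__main__":
    compact = '<a href="foo"><img src="bar"/></a><br/>some text'
    indented = '<a href="foo">\n  <img src="bar"/>\n</a>\n<br/>\nsome text'
    print(repr(visible_sequence(compact)))   # '[img][br]some text'
    print(repr(visible_sequence(indented)))  # ' [img] [br] some text'
```

In the indented version, separators appear inside the link on both sides of the image and between the image and the line break; whether each one is actually drawn is up to the browser, while the compact version gives it nothing to draw.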

In the case of inline images, applets and objects, the image, applet or
object ends up being treated like a piece of text, with its bottom edge
aligned on the baseline of the adjacent text, as per the spec. This is
normally desirable behavior, but it can be a problem if you are trying to
stack images on top of each other: the space allotted below the baseline for
descending characters, plus the space between the bottom edge of the
descenders and the top edge of the next line of text, often shows up as
unwanted gaps between the images.

I made an example of this at http://www.skew.org/xml/misc_demos/whitespace/
and reported it to James Clark as an argument for changing the behavior of
XT's HTMLOutputHandler. He gave me a simple "thanks" for the info, but the
problem has yet to be resolved.

In the meantime, I've modified HTMLOutputHandler.java with an ugly
workaround: removing 'br' from its list of blockElements (having it there
seems to be an error anyway, since BR is one of the inline elements listed
above). This of course doesn't resolve every situation, but it was enough
for my purposes, for now.
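XT's HTMLOutputHandler.java is not reproduced here, so the following is only a hypothetical Python model of why removing 'br' from blockElements helps, assuming the handler starts a new line before the start tag of each element in that list. break_before_blocks and its block_elements parameter are names made up for this illustration.

```python
import re
from typing import AbstractSet

# Hypothetical model of a list-driven indenter, NOT XT's actual code.
# Matches the opening of a start tag and captures the element name;
# end tags ("</p") do not match because '/' is not a letter.
_START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")


def break_before_blocks(html: str, block_elements: AbstractSet[str]) -> str:
    """Insert a line break before the start tag of every element whose
    lowercased name is in block_elements."""

    def add_break(match: re.Match) -> str:
        if match.group(1).lower() in block_elements:
            return "\n" + match.group(0)
        return match.group(0)

    # Drop the break that would otherwise precede the very first tag.
    return _START_TAG.sub(add_break, html).lstrip("\n")


if __name__ == "__main__":
    source = '<p><a href="foo"><img src="bar"/></a><br/>some text</p>'
    # With 'br' treated as a block element, a line break (and so a word
    # separator) lands between the linked image and the <br/>.
    print(break_before_blocks(source, {"p", "br"}))
    # With 'br' removed from the list, the inline content is left alone.
    print(break_before_blocks(source, {"p"}))
```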


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
