This is the mail archive of the xsl-list@mulberrytech.com mailing list .


Re: Passing through character entities intact

To: xsl-list at lists dot mulberrytech dot com
Subject: Re: [xsl] Passing through character entities intact
From: Mike Brown <mike at skew dot org>
Date: Sat, 13 Oct 2001 12:38:19 -0600 (MDT)
Reply-To: xsl-list at lists dot mulberrytech dot com

David Carlisle wrote:
> The browsers don't normally have any method of specifying the font
> encoding, do they? Only the character encoding in the file.

Saying a font can support a particular encoding is a bit misleading. There
is no mapping of fonts to 'supported' encodings, because the relationships
between encodings, scripts, and the subsets of Unicode characters that
each font contains glyphs for can get quite complex.

My understanding is that the font data files themselves typically map
glyph rendering instructions to a subset of Unicode values in the Basic
Multilingual Plane, and they also provide an internal listing of scripts
they support (e.g., 'Latin', 'Hebrew', 'Cyrillic', etc.).

Browsers can use this information to offer their users the ability to
choose, for each script, which fonts to use from among those that support
that particular script. IE is particularly good at this.

HTML and CSS document authors can specify what fonts to try to use, of
course, but it is an exercise for those document authors to know which
fonts are likely to support which characters and scripts. Doing so
requires treating encoding as an almost entirely separate issue.

Also, in HTML/CSS, the fonts are only referenced by name or more general
classifications like 'serif', 'sans-serif', and 'monospace', so one can't
predict whether the user agent will have a particular version of a font
available. For example, MS occasionally updates the fonts it ships with
its software in order to provide greater coverage. There is no provision
in HTML/CSS for testing this coverage or inferring it from version
numbers.

...

Why this thread has gone on this long, I do not know. The original poster
stated that he knew what the issues were and he knew it wasn't wrong to be
getting character references from Xalan. He asked the question in terms of
"passing through" entity references from source to output, but he was
really just asking how to get &mdash; in his output, for Netscrape
compatibility.

The simple answer (well, I suppose simplicity is relative) should have
been that XSLT processors are under no obligation to use the SGML/HTML
character entity references when outputting HTML, because all HTML user
agents are required to support numeric character references. That
Netscape 4 doesn't is Netscape's problem; the XSLT spec isn't going to
force XSLT processors to provide support for entity output when it isn't
truly necessary.
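
To illustrate why the two forms are interchangeable for a conforming user
agent, here is a minimal sketch in Python (not part of the original
message; the helper name to_ascii_with_numeric_refs is made up for this
example). It shows that &mdash;, &#8212; and &#x2014; all parse to the same
character, and that numeric references can carry any character through an
ASCII-only serialization, whatever the file's character encoding.

```python
import html

EM_DASH = "\u2014"  # U+2014 EM DASH

# A named entity reference and a numeric character reference (decimal or
# hexadecimal) all denote the same Unicode character once parsed.
named = html.unescape("&mdash;")
decimal = html.unescape("&#8212;")
hexadecimal = html.unescape("&#x2014;")
assert named == decimal == hexadecimal == EM_DASH


def to_ascii_with_numeric_refs(text: str) -> str:
    """Serialize text as ASCII, using numeric references where needed.

    Hypothetical helper: any character that ASCII cannot encode is
    replaced by a decimal numeric character reference.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_numeric_refs("wait " + EM_DASH + " what?"))
# prints: wait &#8212; what?
```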

That said, there is nothing stopping an XSLT processor from allowing an
external (i.e., not controlled from the stylesheet) configuration option
that tells it whether to use entity references or numeric character
references. He should check the documentation for the processor that he is
using. For example, the Xalan-J README contains this paragraph:

"For HTML output, Xalan-Java 2 outputs character entity references (&copy; 
etc.) for the special characters designated in Appendix A. DTDs of the 
XHTML 1.0: The Extensible HyperText Markup Language. Xalan-Java 1.x, on 
the other hand, outputs literal characters for some of these special 
characters"
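
Where a processor offers no such option, one possible workaround is to
post-process the serialized output. The following is only a sketch in
Python under stated assumptions, not something suggested in this thread
or provided by Xalan: the function name numeric_to_named is hypothetical,
and the name table is Python's html.entities.codepoint2name, which covers
the HTML 4 entity set. References with no HTML 4 name are left untouched.

```python
import re
from html.entities import codepoint2name  # HTML 4 code point -> entity name

# Matches decimal (&#8212;) and hexadecimal (&#x2014;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def numeric_to_named(markup: str) -> str:
    """Rewrite numeric character references as named entity references.

    Hypothetical helper for illustration. A reference is rewritten only
    when HTML 4 defines a name for its code point.
    """

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        return f"&{name};" if name else match.group(0)

    return _NUMERIC_REF.sub(replace, markup)


print(numeric_to_named("<p>Xalan &#8212; &#xA9; 2001</p>"))
# prints: <p>Xalan &mdash; &copy; 2001</p>
```

Note that this is a plain text substitution: it does not parse the HTML,
so it would also rewrite references inside script or style content.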

   - Mike
____________________________________________________________________________
  mike j. brown, fourthought.com  |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  personal: http://hyperreal.org/~mike/


References:
- Re: Passing through character entities intact
  - From: David Carlisle
