This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Character entities
- To: "'xsl-list at mulberrytech dot com'" <xsl-list at mulberrytech dot com>
- Subject: RE: Character entities
- From: Mike Brown <mbrown at corp dot webb dot net>
- Date: Mon, 14 Feb 2000 11:23:03 -0700
- Reply-To: xsl-list at mulberrytech dot com
> Does HTML "know" UTF-8?
Like XML, HTML 4.0 is mostly defined in terms of UCS/Unicode characters,
which of course must be encoded. There is a mechanism for a document to
signal its own character encoding via a META declaration. A charset
parameter in an HTTP Content-Type header, when present, takes precedence
over that declaration.
Since HTML doesn't prescribe UTF-8 as a default and because the META
declaration can appear pretty far down in the document HEAD, the
recommendation states that only ASCII (U+0000 through U+007F) characters
should be used in the document up to that point.
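
To make the precedence concrete, here is a rough Python sketch of how a
consumer might pick the encoding: first the charset parameter of the HTTP
Content-Type header, then a META declaration found near the top of the
document. The function names, the 1024-byte scan limit, and the regular
expression are my own assumptions for illustration, not anything the
recommendation prescribes. The META scan works on raw bytes, which is only
safe because the document is expected to be ASCII up to that point.

```python
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside a META tag in raw bytes.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def charset_from_meta(body, limit=1024):
    """Scan the first `limit` bytes (an assumed cutoff) for a META charset."""
    match = META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def detect_encoding(content_type, body):
    """HTTP header first, then META; None means nothing was declared."""
    return charset_from_header(content_type) or charset_from_meta(body)


if __name__ == "__main__":
    page = b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    print(detect_encoding("text/html; charset=UTF-8", page))
    print(detect_encoding("text/html", page))
```

When neither source declares an encoding the sketch returns None and leaves
the fallback to the caller, since HTML does not prescribe a default.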
This stuff is discussed at
http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#spec-char-encoding
It is worth pointing out that the value of the recommendation is only as
good as the user agents' support for it. The 4.0 browsers seem to do okay
with automatically selecting the proper encoding when interpreting a
document, but you may have noticed that they also let the user manually
choose it even if the document signaled its own encoding.