This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: parsing and translating xml:lang attribute


Matthias O. Will wrote:
> > <Language xml:lang="ge"/>

"de", right? :)

> but Xalan complains while parsing and produces the following error message:
> 
> > Parser error: Attribute "xml:lang" is required and must be specified for
> > element type "Language"

Is Xerces your parser? There doesn't seem to be anything wrong ... are you
sure it's complaining about that specific instance of <Language>? (check
the line number)

> The second issue is that the values for this attribute are conforming to
> the two-digit language abbreviations according to ISO 639, but my target
> DTD uses three-digit language strings according to ISO 639-2 (e. g. 'de'
> would be translated into 'ger'). I do have a list of both, but I wonder how
> to technically best achieve the mapping using XSL.

xml:lang values must be RFC 1766 'language tags' ('tag' being a most
unfortunate choice of word in an XML context... I prefer 'identifier').
RFC 1766 mandates, essentially, that if the identifier is just 2
characters, or if the 3rd character is '-', then the first 2 characters
must be an ISO 639:1988 2-letter language code. The RFC's author recently
clarified that the intent was to refer to ISO 639:1988 and its successors,
so you should be using the most up-to-date list of 2-letter language
codes from ISO 639-1. RFC 1766 does not allow 3-letter codes at all. It
was a little short-sighted in this regard and is being revised to address
this issue (and the fact that ISO 639-2 codes are far more complete!)

...so if you are intending to put 3-letter codes in an xml:lang value in
the target document, then you're wrong to do so :)
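
If it helps, the "2 characters, or 3rd character is '-'" rule can be
sketched as a small XSLT 1.0 stylesheet that lists every xml:lang value
in a document and says whether the rule applies to it. This is only a
shape check under my own assumptions: it does not look the code up in
the ISO 639-1 list, and the message wording is made up for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Walk every xml:lang attribute in the input document.
       The 'xml' prefix is predeclared, so no namespace declaration is needed. -->
  <xsl:template match="/">
    <xsl:for-each select="//@xml:lang">
      <xsl:variable name="id" select="string(.)"/>
      <xsl:value-of select="$id"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <!-- Exactly 2 characters, or a '-' in 3rd position: per RFC 1766
             the first 2 characters must then be an ISO 639 2-letter code. -->
        <xsl:when test="string-length($id) = 2 or substring($id, 3, 1) = '-'">
          <xsl:text>first 2 characters must be an ISO 639-1 code: </xsl:text>
          <xsl:value-of select="substring($id, 1, 2)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>does not start with a 2-letter code</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```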

Anyway, to answer your question:

<?xml version="1.0" encoding="utf-8"?>
<!-- langCodeMap.xml -->
<langCodeMap>
  <langCode iso639-1="de" iso639-2="ger"/>
  <langCode iso639-1="en" iso639-2="eng"/>
  ...
</langCodeMap>

and in the XSLT...

<xsl:variable name="langCodes" select="document('langCodeMap.xml')/langCodeMap/langCode"/>
<xsl:variable name="langIn" select="Language/@xml:lang"/>
<LanguageOut xml:lang="{$langCodes[@iso639-1 = $langIn]/@iso639-2}"/>

There are of course various ways to do it; this is just one.
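
To show how those pieces might sit in a complete stylesheet, here is a
rough sketch. The assumptions are mine: the 'code' attribute on
LanguageOut is a hypothetical name (use whatever your target DTD really
calls for, since a 3-letter code doesn't belong in xml:lang), and the
xsl:message fallback for unmapped codes is an addition, not something
from your original setup.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Load the lookup table once. A relative URI passed to document()
       as a string is resolved against the stylesheet's own location. -->
  <xsl:variable name="langCodes"
                select="document('langCodeMap.xml')/langCodeMap/langCode"/>

  <xsl:template match="Language">
    <xsl:variable name="langIn" select="@xml:lang"/>
    <!-- Pick the 3-letter code whose 2-letter partner matches the input. -->
    <xsl:variable name="langOut"
                  select="$langCodes[@iso639-1 = $langIn]/@iso639-2"/>
    <xsl:choose>
      <xsl:when test="$langOut">
        <!-- 'code' is a hypothetical attribute name for the target DTD. -->
        <LanguageOut code="{$langOut}"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Nothing in the map: report it rather than emit an empty value. -->
        <xsl:message>No ISO 639-2 code listed for '<xsl:value-of select="$langIn"/>'</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Anything not matched by the Language template falls through to the
built-in rules, so in a real stylesheet you would add templates for the
rest of your document.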

I question the use of xml:lang on an element called 'Language' though.
xml:lang identifies a language that the element content is in; it isn't
supposed to be a substitute for the content itself.

For example,

<Language xml:lang="en">German</Language>
<Language xml:lang="de">Deutsch</Language>


   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at         My XML/XSL resources:
webb.net in Denver, Colorado, USA           http://www.skew.org/xml/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list