This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: National Language Collating Sequences and Index Generation


Hi Eliot,

> I have to generate back-of-the-book indexes for many national
> languages, including Arabic, Hebrew, Thai, Simplified Chinese,
> Traditional Chinese, Korean, and Japanese. I've successfully adapted
> the Docbook index generation code to produce the basic index, but
> now I'm faced with the challenge of both doing correct sorting for
> these languages and generating the appropriate index groups.

I'm assuming that you've tried using the lang attribute on xsl:sort to
do this and it doesn't fulfil your requirements (perhaps because it
isn't implemented for the languages you're using in your processor)?

> For the index groups I'm assuming I'll have to have per-language
> code that establishes the relevant group characters and their
> ordering.
>
> I looked in the FAQ but didn't find much about this issue. Can
> anyone provide any pointers to helpful information, either in books
> or in the archives of this forum?
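For the index groups, one way to encode the per-language group characters is as an ordered list of headings, with each term filed under the last heading that collates at or before it. The following is a minimal Java sketch of that idea, not anything taken from the DocBook code. The class and method names and the A/B/C heading list are hypothetical. It relies on `java.text.Collator` at primary strength, so that case and accent differences don't move a term into a different group. Scripts whose conventional groupings don't follow first-character collation may need other data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class IndexGroups {
    // Primary strength ignores case and accent differences, so "a", "A"
    // and "á" all fall under the same heading.
    static Collator groupingCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns the last heading that collates at or before the term; terms
    // that collate before every heading go into the fallback group.
    static String groupFor(String term, List<String> headings,
                           Collator collator, String fallback) {
        String group = fallback;
        for (String heading : headings) {
            if (collator.compare(heading, term) > 0) {
                break; // headings are listed in collating order
            }
            group = heading;
        }
        return group;
    }

    // Files already-sorted terms under their headings, preserving order.
    static Map<String, List<String>> group(List<String> sortedTerms,
                                           List<String> headings,
                                           Collator collator, String fallback) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String term : sortedTerms) {
            String heading = groupFor(term, headings, collator, fallback);
            groups.computeIfAbsent(heading, k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        Collator collator = groupingCollator(Locale.ENGLISH);
        List<String> headings = List.of("A", "B", "C"); // hypothetical headings
        List<String> terms = List.of("a", "Apricot", "banana", "cherry");
        System.out.println(group(terms, headings, collator, "Symbols"));
        // {A=[a, Apricot], B=[banana], C=[cherry]}
    }
}
```

Each language would then need only its own heading list (and locale), while the filing logic stays the same.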

I think that Dimitre's generic templates for sorting could be really
useful in this situation. With them, you could create a set of
specialised templates, each dealing with sorting in a particular
language, by comparing two strings. Unfortunately, I can't locate a
copy of them, but I'm sure Dimitre will send you one if you ask him,
or you might be able to get something out of:

http://www.vbxml.com/snippetcentral/main.asp?view=viewsnippet&lang=&id=v20010310050532
or
http://www.dpawson.co.uk/xsl/sect2/generic.html
or
http://vbxml.com/xsl/articles/fp
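The idea behind the generic templates is to separate the sorting machinery from the comparison. That idea can be sketched outside XSLT too. This is not Dimitre's code. It is a minimal Java illustration in which a generic insertion sort hands every ordering decision to a pluggable comparer that returns -1, 0 or 1, like the compareStrings template. `GenericSort`, `StringComparer` and the code-point comparer are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericSort {
    // The "specialised" part: a comparison returning -1, 0 or 1, the same
    // contract as the compareStrings template.
    interface StringComparer {
        int compare(String a, String b);
    }

    // The "generic" part: an insertion sort that knows nothing about any
    // language and delegates every ordering decision to the comparer.
    static List<String> sort(List<String> items, StringComparer cmp) {
        List<String> result = new ArrayList<>();
        for (String item : items) {
            int i = result.size();
            // Step left past every element that should come after item;
            // using > 0 (not >= 0) keeps equal elements in input order.
            while (i > 0 && cmp.compare(result.get(i - 1), item) > 0) {
                i--;
            }
            result.add(i, item);
        }
        return result;
    }

    public static void main(String[] args) {
        // A hypothetical specialised comparer: plain code-point order.
        StringComparer codePoint = (a, b) -> Integer.signum(a.compareTo(b));
        System.out.println(sort(List.of("pear", "apple", "fig"), codePoint));
        // [apple, fig, pear]
    }
}
```

Supporting another language means writing another comparer, not another sort.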

The only code you would have to write yourself is the template that
compares two strings. Joerg's given you some ideas about storing the
sequence as a set of nodes containing the characters; you could also
store it as a string. Perhaps something like:

<xsl:template name="compareStrings">
  <xsl:param name="string1" />
  <xsl:param name="string2" />
  <xsl:param name="chars" select="'abcdefghijklmnopqrstuvwxyz'" />
  <xsl:choose>
    <xsl:when test="$string1 = '' and $string2 = ''">0</xsl:when>
    <xsl:when test="$string2 = ''">1</xsl:when>
    <xsl:when test="$string1 = ''">-1</xsl:when>
    <xsl:otherwise>
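      <!-- number of characters in $chars before each string's first
           character: 0 if that character comes first in $chars or is
           absent from it altogether -->
      <xsl:variable name="char1"
        select="string-length(
                  substring-before($chars, substring($string1, 1, 1)))" />
      <xsl:variable name="char2"
        select="string-length(
                  substring-before($chars, substring($string2, 1, 1)))" />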
      <xsl:choose>
        <xsl:when test="$char1 > $char2">1</xsl:when>
        <xsl:when test="$char2 > $char1">-1</xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="compareStrings">
            <xsl:with-param name="string1"
                            select="substring($string1, 2)" />
            <xsl:with-param name="string2"
                            select="substring($string2, 2)" />
            <xsl:with-param name="chars" select="$chars" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

though it would obviously need expanding to handle upper- and
lower-case letters, and characters that are missing from the sequence.
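As a rough model of what that expansion might look like, here is a Java sketch of the same comparison. It folds case before looking a character up, and it ranks missing characters after every listed one, ordered by code point. Both policies are assumptions chosen for illustration, as are the class name and the 26-letter sequence. A real index would need a sequence, and a rule for missing characters, for each language.

```java
public class CollatingSequence {
    // Hypothetical collating sequence; each language would supply its own.
    static final String CHARS = "abcdefghijklmnopqrstuvwxyz";

    // Rank of one character. Case is folded before the lookup (assumed
    // policy); characters missing from the sequence rank after every listed
    // character, ordered among themselves by folded code point.
    static int rank(int codePoint, String chars) {
        int folded = Character.toLowerCase(codePoint);
        int index = chars.indexOf(folded);
        return index >= 0 ? index : chars.length() + folded;
    }

    // Same contract as the compareStrings template: 1, -1 or 0.
    static int compareStrings(String s1, String s2, String chars) {
        int i = 0, j = 0;
        while (i < s1.length() && j < s2.length()) {
            int c1 = s1.codePointAt(i);
            int c2 = s2.codePointAt(j);
            int r1 = rank(c1, chars);
            int r2 = rank(c2, chars);
            if (r1 != r2) {
                return r1 > r2 ? 1 : -1;
            }
            i += Character.charCount(c1);
            j += Character.charCount(c2);
        }
        boolean done1 = i >= s1.length();
        boolean done2 = j >= s2.length();
        if (done1 && done2) {
            return 0; // equal, ignoring case
        }
        return done2 ? 1 : -1; // the longer string sorts after its prefix
    }

    public static void main(String[] args) {
        System.out.println(compareStrings("apple", "Banana", CHARS)); // -1
    }
}
```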

I must say, though, that in the general case you're probably better
off farming the body of this template out to an extension function
written in a language that knows how to compare strings written in
different languages. You can still use the generic templates to manage
the sort, but use the extension function to handle the actual
comparison of the strings.
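To illustrate the extension-function approach, here is a minimal Java sketch that delegates the comparison to `java.text.Collator`. It normalises the result to the same 1/-1/0 contract as compareStrings. The class and method names are hypothetical. The way an XSLT processor binds and calls a Java extension function varies by processor, so that part is left out.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollatorCompare {
    // A locale-aware stand-in for the body of compareStrings: returns
    // 1, -1 or 0 according to the collation rules for the given language.
    static int compareStrings(String s1, String s2, String languageTag) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        return Integer.signum(collator.compare(s1, s2));
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>(List.of("banana", "Cherry", "apple"));
        terms.sort((a, b) -> compareStrings(a, b, "en"));
        System.out.println(terms); // [apple, banana, Cherry]
    }
}
```

Plain code-point order would put "Cherry" first, because upper-case letters precede lower-case ones. In a real extension you would probably cache one Collator per language rather than create one for every comparison. The JDK's built-in collation rules may not cover every language well. If they fall short, ICU4J's `com.ibm.icu.text.Collator` is one alternative.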

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

