This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Support For Automatic Thai Word Breaking In XSL-FO


I will eventually have to support the composition of Thai documents. The
primary challenge I see there is automatic word breaking of Thai.
As I understand it, Thai is normally written without spaces between
words, and the language does not have a well-defined notion of word, so
Thai as normally written may not have enough break points for lines to
be flowed properly. In researching the issue I found some software
(written for TeX) that does automatic line breaking, but I didn't find
anything that had been integrated with any XSLT or XSL-FO processor. As
far as I can discover, MS Word is the main non-TeX tool that provides
acceptable Thai word breaking.
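To make the break-point problem concrete, here is a minimal sketch of one common approach, which is not the TeX software mentioned above: find Thai word boundaries with the JDK's `java.text.BreakIterator` (which in standard JDK builds has dictionary-based Thai word data) and insert U+200B ZERO WIDTH SPACE at each one. It assumes the FO processor treats U+200B as a line-break opportunity, as Unicode UAX #14 specifies. The class and method names (`ThaiBreaker`, `insertBreaks`) are hypothetical, and boundary quality depends entirely on the JDK's Thai dictionary.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: marks Thai word boundaries with U+200B ZERO WIDTH SPACE.
public class ThaiBreaker {
    private static final char ZWSP = '\u200B';

    // True for characters in the Unicode Thai block (U+0E00..U+0E7F).
    static boolean isThai(char c) {
        return c >= '\u0E00' && c <= '\u0E7F';
    }

    // Returns text with a ZWSP at every word boundary that falls between two
    // Thai characters. Spaces and Latin text are left alone because the
    // formatter can already break there.
    public static String insertBreaks(String text) {
        // Assumes the JDK's Thai (dictionary-based) break data is installed;
        // without it, few or no boundaries are found inside a Thai run.
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            out.append(text, start, end);
            if (end < text.length() && isThai(text.charAt(end - 1)) && isThai(text.charAt(end))) {
                out.append(ZWSP);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Thai language" written in Thai, with no space between the two words.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        // Show the inserted break opportunities as '|'.
        System.out.println(insertBreaks(thai).replace(ZWSP, '|'));
    }
}
```

Inserting an invisible character rather than a visible space keeps the rendered text unchanged when the line does not need to break there, which is why this trick is often used to feed break opportunities to formatters that have no Thai-specific logic.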

My question: has anybody integrated any Thai word breaking algorithms
into an XSL context?

Looking at the free code that's out there, it seems it wouldn't be too
hard to extend Saxon, for example, to apply the word breaking algorithm
to text nodes when xml:lang="th". Preprocessing the XML document with
the existing code is not enough, because the Thai characters may be
represented as numeric character references or character entities, while
the existing code expects raw characters in some form of Unicode or Thai
code page encoding. Thus, the algorithms would need to be applied
post-parse, after those references have been resolved to characters.
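A minimal sketch of what a post-parse pass could look like, assuming plain JAXP/DOM rather than any Saxon extension API (Saxon's own extension points are not shown here). After parsing, numeric character references such as `&#x0E20;` are already ordinary characters, so the breaker sees real Thai text. The pass honors the inherited value of `xml:lang` and only touches text under `th` or `th-*`. All class and method names (`ThaiTextPass`, `parseAndBreak`, `insertBreaks`) are hypothetical, and the U+200B approach again assumes the FO processor treats ZWSP as a break opportunity.

```java
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical post-parse pass: adds U+200B break opportunities to text nodes
// whose inherited xml:lang is "th" or "th-*". Plain JAXP/DOM, not a Saxon API.
public class ThaiTextPass {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    static boolean isThai(char c) {
        return c >= '\u0E00' && c <= '\u0E7F';
    }

    // ZWSP only at word boundaries between two Thai characters; relies on the
    // JDK's Thai dictionary data for BreakIterator being present.
    static String insertBreaks(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        it.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.append(text, start, end);
            if (end < text.length() && isThai(text.charAt(end - 1)) && isThai(text.charAt(end))) {
                out.append('\u200B');
            }
        }
        return out.toString();
    }

    // xml:lang is inherited: use the value on the nearest ancestor element.
    static String inheritedLang(Node n) {
        for (Node p = n.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) p;
                if (e.hasAttributeNS(XML_NS, "lang")) {
                    return e.getAttributeNS(XML_NS, "lang");
                }
            }
        }
        return "";
    }

    static boolean isThaiLang(String lang) {
        String l = lang.toLowerCase(Locale.ROOT);
        return l.equals("th") || l.startsWith("th-");
    }

    // Recursively rewrite Thai text and CDATA nodes in place.
    static void process(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                if (isThaiLang(inheritedLang(c))) {
                    c.setNodeValue(insertBreaks(c.getNodeValue()));
                }
            } else {
                process(c);
            }
        }
    }

    public static Document parseAndBreak(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // so xml:lang is reported in the XML namespace
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        // Character references are already resolved here; merge adjacent
        // text nodes so a word is never split across two nodes.
        doc.getDocumentElement().normalize();
        process(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p xml:lang=\"th\">&#x0E20;&#x0E32;&#x0E29;&#x0E32;"
                + "&#x0E44;&#x0E17;&#x0E22;</p><p>Thai language</p></doc>";
        Document doc = parseAndBreak(xml);
        String th = doc.getElementsByTagName("p").item(0).getTextContent();
        System.out.println(th.replace('\u200B', '|'));
    }
}
```

In a real pipeline the resulting DOM would be handed to the XSLT processor as its source tree, so the stylesheet and the FO processor never need to know the break points were added.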

Thanks,

Eliot Kimber
ISOGEN International, LLC

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

