This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: representing charsets


Corinna Vinschen:
> Andy Koppe:
>> 3) Represent charsets as enum constants (or #defines) rather than
>> strings throughout, with the singlebyte charsets ordered in such a way
>> that they correspond to their order in the conversion tables, along
>> these lines:
>>
>> enum {
>>   CS_UTF8 = 0,
>>
>>   /* ISO singlebyte codepages */
>>   CS_ISO8859_1 = 1,
>>   CS_ISO8859_2 = 2,
>>   ...
>>   CS_ISO8859_11 = 11,
>>   /* ISO-8859-12 doesn't exist */
>>   CS_ISO8859_13 = 12,
>>   ...
>>   CS_ISO8859_16 = 15,
>>
>>   /* Windows singlebyte codepages */
>>   CS_CP437 = 100,
>>   CS_CP720 = 101,
>>   CS_CP737 = 102,
>>   ...
>>
>>   /* Multibyte codepages */
>>   CS_SJIS = 200,
>>   CS_GBK = 201,
>>   ...
>> }
>
> But what is that good for?  Which advantage do you have?

- No need to pass around both charset name and the charset table index.
- The __cp_index and __iso8859_index functions can be junked.
__cp_mbtowc/wctomb obtain the index with (cs_id - CS_CP437). Similar
for ISO.
- Only one list of valid codepages (since the one in __cp_index can go).
- Get rid of the hack where the likes of KOI8-R or PT154 are
internally represented as "CPxxx" names, some of which don't actually
correspond to Windows codepages.
- All those strcpy() calls in setlocale become simple assignments,
e.g. charset_id = CS_EUCJP instead of strcpy(charset, "EUCJP"). Not
relevant performance-wise, but in terms of space (for embedded
targets).
- Similarly, charset comparisons become simple integer comparisons
instead of strcmps.
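
To illustrate the first two points, here is a minimal sketch rather than the actual newlib code. It uses only enum values from the proposal above; cs_is_cp, cs_cp_index, CS_CP_LAST and cp_table_names are hypothetical names made up for this example, and the string array merely stands in for the real conversion tables.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the proposed enum; the real list would be longer. */
enum charset_id {
  CS_UTF8 = 0,
  CS_ISO8859_1 = 1,
  CS_ISO8859_2 = 2,
  CS_CP437 = 100,
  CS_CP720 = 101,
  CS_CP737 = 102,
  CS_CP_LAST = CS_CP737,   /* hypothetical end marker for the CP range */
  CS_SJIS = 200
};

/* Stand-in for the conversion tables: one entry per codepage,
   in the same order as the enum constants (an assumption). */
static const char *const cp_table_names[] = { "CP437", "CP720", "CP737" };

/* A range check replaces the search through a list of valid codepages. */
static int cs_is_cp(int cs_id)
{
  return cs_id >= CS_CP437 && cs_id <= CS_CP_LAST;
}

/* The table index is a plain subtraction, so no lookup function is needed. */
static size_t cs_cp_index(int cs_id)
{
  assert(cs_is_cp(cs_id));
  return (size_t)(cs_id - CS_CP437);
}
```

With this layout, cs_cp_index(CS_CP737) evaluates to 2, and a charset test such as cs_id == CS_SJIS is a single integer comparison.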


> If you
> only keep the number, where do you get the charset name from?

A new function, e.g. 'void __get_charset_name(int cs_id, char *buf)',
where a buffer of size ENCODING_LEN+1 needs to be passed in.
nl_langinfo(CODESET) would simply call that instead of doing its own
strcmp-heavy parsing of internal names to turn them back into official
names.
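
A rough sketch of what such a function might look like, under several assumptions: the name get_charset_name (without the reserved leading underscores), the ENCODING_LEN value of 31, the cp_numbers table and the exact name spellings are all illustrative rather than taken from the existing sources. The ISO branch relies on the numbering in the enum above, where ISO-8859-13 gets the value 12 because ISO-8859-12 doesn't exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENCODING_LEN 31   /* assumed value, for illustration only */

enum {
  CS_UTF8 = 0,
  CS_ISO8859_1 = 1,
  CS_ISO8859_11 = 11,
  CS_ISO8859_13 = 12,     /* ISO-8859-12 doesn't exist */
  CS_ISO8859_16 = 15,
  CS_CP437 = 100,
  CS_CP720 = 101,
  CS_CP737 = 102,
  CS_SJIS = 200,
  CS_GBK = 201
};

/* Hypothetical table of codepage numbers, in enum order. */
static const int cp_numbers[] = { 437, 720, 737 };
#define CP_COUNT ((int)(sizeof cp_numbers / sizeof cp_numbers[0]))

/* Writes the charset name into buf, which must hold ENCODING_LEN + 1
   bytes.  Unknown ids yield an empty string. */
void get_charset_name(int cs_id, char *buf)
{
  assert(buf != NULL);
  if (cs_id == CS_UTF8)
    strcpy(buf, "UTF-8");
  else if (cs_id >= CS_ISO8859_1 && cs_id <= CS_ISO8859_16)
    /* Ids from 12 upwards are shifted by one to skip the missing part 12. */
    snprintf(buf, ENCODING_LEN + 1, "ISO-8859-%d",
             cs_id < CS_ISO8859_13 ? cs_id : cs_id + 1);
  else if (cs_id >= CS_CP437 && cs_id < CS_CP437 + CP_COUNT)
    snprintf(buf, ENCODING_LEN + 1, "CP%d", cp_numbers[cs_id - CS_CP437]);
  else if (cs_id == CS_SJIS)
    strcpy(buf, "SJIS");
  else if (cs_id == CS_GBK)
    strcpy(buf, "GBK");
  else
    buf[0] = '\0';
}
```

In this sketch the name is produced directly from the id, so no internal name has to be parsed back into an official one.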


> Btw., while I was writing the above, it occurred to me that we
> don't really need the index into the iso or cp array.  What we
> really need is a pointer to the array member, which can be used
> immediately.

Good idea, although it won't make much of a difference, because array
indexing is cheap, basically just a shift and an add.
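
For comparison, a small sketch of the two variants. The table contents are made-up placeholder values, and cp_rows, cp_row, convert_by_id and convert_by_row are hypothetical names, not the existing conversion code.

```c
#include <assert.h>

enum { CS_CP437 = 100, CS_CP720 = 101, CS_CP737 = 102 };

#define CP_COUNT 3
#define ROW_LEN  4

/* Placeholder data, one row per codepage in enum order. */
static const unsigned short cp_rows[CP_COUNT][ROW_LEN] = {
  { 0x1000, 0x1001, 0x1002, 0x1003 },
  { 0x2000, 0x2001, 0x2002, 0x2003 },
  { 0x3000, 0x3001, 0x3002, 0x3003 },
};

/* Resolve the row once, e.g. when the locale is set. */
static const unsigned short *cp_row(int cs_id)
{
  assert(cs_id >= CS_CP437 && cs_id < CS_CP437 + CP_COUNT);
  return cp_rows[cs_id - CS_CP437];
}

/* Variant 1: keep the id and index the table on every conversion. */
static unsigned short convert_by_id(int cs_id, unsigned c)
{
  assert(c < ROW_LEN);
  return cp_rows[cs_id - CS_CP437][c];
}

/* Variant 2: keep a pointer to the row and use it directly. */
static unsigned short convert_by_row(const unsigned short *row, unsigned c)
{
  assert(c < ROW_LEN);
  return row[c];
}
```

Both variants return the same table entry; the pointer form only saves the subtraction and the row offset calculation per call.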

Andy

