This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Why does iconv signal EILSEQ with legal sequences (deviation from standard?)


Hi,

> 3. ISO C Amendment 1 (MSE)
> http://www.unix.org/version2/whatsnew/login_mse.html
> EILSEQ 
> 
> A invalid wide-character encoding, or a sequence of bytes which do not
> form a valid multibyte character, was encountered.
>
> 4. The Open Group Base Specifications Issue 6
> IEEE Std 1003.1, 2004 Edition
> http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_03.html

You cannot take the _general_ description of an errno value literally for
every function that may set it. For example, ENOMEM returned by mincore()
means something totally different than ENOMEM returned by malloc().

> The Single UNIX ® Specification, Version 2
> Copyright © 1997 The Open Group

That version is outdated. Look in SUSv3, also known as POSIX:2001, also known as
"The Open Group Base Specifications Issue 6".

> If iconv() encounters a character in the input buffer that is valid, but
> for which an identical character does not exist in the target codeset,
> iconv() performs an implementation-dependent conversion on this
> character.

The GNU implementations of iconv(), those in glibc and in libiconv, prefer to
fail in this case, just as in the case of invalid input: iconv() returns
(size_t)-1 with errno = EILSEQ. This gives the program that calls iconv() the
opportunity to provide its own error handling or an arbitrary replacement
character sequence. For example, the iconv program from GNU libiconv 1.11
will support these options:

Options controlling conversion problems:
  -c                          discard unconvertible characters
  --unicode-subst=FORMATSTRING
                              substitution for unconvertible Unicode characters
  --byte-subst=FORMATSTRING   substitution for unconvertible bytes
  --widechar-subst=FORMATSTRING
                              substitution for unconvertible wide characters

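It is impossible for user-written programs to support similar options in an
efficient way if the iconv() function behaves as specified in POSIX, because
the caller is then told only how many characters were converted
non-reversibly, not which ones.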
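
To illustrate the caller-side handling that the EILSEQ behavior makes possible,
here is a minimal sketch, not the code of the iconv program. The helper names
convert_with_subst() and utf8_skip_len() are hypothetical, the input is assumed
to be UTF-8, and the replacement string is assumed to be already encoded in the
target codeset. It treats invalid and unconvertible input alike.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes to skip at p, i.e. one lead byte
   plus up to three UTF-8 continuation bytes (10xxxxxx). Needs avail >= 1. */
static size_t utf8_skip_len(const char *p, size_t avail)
{
    size_t n = 1;
    while (n < avail && n < 4 && ((unsigned char)p[n] & 0xC0) == 0x80)
        n++;
    return n;
}

/* Hypothetical helper: convert the UTF-8 string `in` to `tocode`, writing
   `subst` in place of every character that iconv() rejects with EILSEQ.
   Returns the number of bytes written (excluding the terminating NUL),
   or (size_t)-1 on any other failure. */
size_t convert_with_subst(const char *tocode, const char *in,
                          char *out, size_t outsize, const char *subst)
{
    if (outsize == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inptr = (char *)in;       /* iconv() does not modify the input */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;   /* reserve room for the NUL */
    size_t sublen = strlen(subst);
    int failed = 0;

    while (inleft > 0) {
        if (iconv(cd, &inptr, &inleft, &outptr, &outleft) != (size_t)-1)
            break;                  /* all remaining input was converted */
        if (errno != EILSEQ || outleft < sublen) {
            failed = 1;             /* E2BIG, EINVAL, or no room for subst */
            break;
        }
        /* inptr points at the rejected character: substitute and skip it. */
        memcpy(outptr, subst, sublen);
        outptr += sublen;
        outleft -= sublen;
        size_t skip = utf8_skip_len(inptr, inleft);
        inptr += skip;
        inleft -= skip;
    }

    /* Flush any pending shift sequence (a no-op for stateless codesets). */
    if (!failed && iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t)-1)
        failed = 1;

    iconv_close(cd);
    if (failed)
        return (size_t)-1;
    *outptr = '\0';
    return (size_t)(outptr - out);
}
```

With glibc, calling convert_with_subst("ASCII", "a\xC3\xA9" "b", buf, sizeof buf, "?")
should leave "a?b" in buf. An implementation that silently substitutes on its
own never reaches the EILSEQ branch, so the caller's replacement is not used.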

Bruno

