This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: iconv_open behaviour on EILSEQ



On Sat, 4 May 2002, Andreas Schwab wrote:

> Stefan Hoffmeister <bug.glibc-gnu.org@econos.de> writes:

> |> Empirically, I can see that *outbuf and *outbytesleft have been modified
> |> to reflect the successful conversions up to the point where the
> |> character triggering EILSEQ is located.

> |> SUSv2 is completely silent about the state of anything in the presence
> |> of EILSEQ; same problem in the last publicly accessible draft of SUSv3.
>
> POSIX.1-2001 says:
>
>     If a sequence of input bytes does not form a valid character in the
>     specified codeset, conversion shall stop after the previous
>     successfully converted character.  [...] The variable pointed to by
>     outbuf shall be updated to point to the byte following the last byte
>     of converted output data.  The value pointed to by outbytesleft shall
>     be decremented to reflect the number of bytes still available in the
>     output buffer.  [...]

> This is pretty unambiguous, IMHO.  Even in presence of errors the argument
> pointers must be updated.

  Yes, it seems pretty clear.  Now I have another question about
iconv()'s behavior.  What is it supposed to do when it encounters a byte
sequence that is *valid* in the specified source codeset but cannot be
converted to the specified target codeset?  For instance, what happens if I
try to convert a UTF-8 string to one of the legacy encodings with iconv()
and the UTF-8 string contains characters outside the repertoire of the
target encoding/codeset?

  To borrow Stefan's expression :-), I found empirically that
iconv() returns (size_t) -1 and sets errno to EILSEQ in that case.
I also found that *inbuf, *inbytesleft, *outbuf and *outbytesleft are
updated to reflect that the conversion stopped when it came across a byte
sequence that is valid in the source codeset but unconvertible to the
target codeset.  Is this documented in POSIX.1-2001?
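
  For reference, the experiment can be reproduced with something like
the following minimal sketch.  The names try_convert and struct
conv_report are made up for this illustration and are not part of any
library; the sketch also assumes that the installed iconv knows the
codeset names passed to it.  What gets reported for an unconvertible
character may well differ between iconv implementations.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Outcome of a single iconv() call (illustrative type, not a libc one). */
struct conv_report {
    size_t ret;       /* value returned by iconv() */
    int    err;       /* errno if iconv() returned (size_t)-1, else 0 */
    size_t consumed;  /* input bytes consumed, derived from *inbuf */
    size_t produced;  /* output bytes written, derived from *outbuf */
};

/* Convert the NUL-terminated string `in` from codeset `from` to codeset
   `to`, writing at most `outsize` bytes to `out`.
   Returns 0 if iconv() was called, -1 if iconv_open() failed. */
int try_convert(const char *to, const char *from, const char *in,
                char *out, size_t outsize, struct conv_report *rep)
{
    assert(in != NULL && out != NULL && rep != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    /* glibc declares inbuf as char **; the input bytes are only read. */
    char  *inbuf   = (char *)in;
    size_t inleft  = strlen(in);
    char  *outbuf  = out;
    size_t outleft = outsize;

    errno = 0;
    rep->ret      = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    rep->err      = (rep->ret == (size_t)-1) ? errno : 0;
    rep->consumed = (size_t)(inbuf - in);
    rep->produced = (size_t)(outbuf - out);

    iconv_close(cd);
    return 0;
}
```

  Calling try_convert("ISO-8859-1", "UTF-8", ...) on "ab" followed by a
Hangul syllable and then inspecting rep.err, rep.consumed and
rep.produced shows whether the call stopped with EILSEQ right after the
two convertible bytes.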

    Thank you in advance for any illuminating reply,

    Jungshik Shin

