This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Bug in wide character fseek/ftell


I'm trying to track down what I believe is a problem in the wide stream seek/tell support. This is reproducible with the head of the trunk.

Compile the attached test and run it against the trunk libc.so (which *must* be able to find its locale-archive). You'll get results like:

before=4
fseek=0
after=4

Expected output is

before=4
fseek=4
after=8




If you look at the source, you can see that we write a few wide chars, then call ftell and print its result (before=4). We then seek to the end of the file and print the result of a second ftell, which is wrong (fseek=0; it should be fseek=4). Finally we write a few more chars and print the result of a last ftell (after=4; it should be after=8).


Stepping through the explicit fseek (fp, 0, SEEK_END) call, everything looks fine to me at the end of it:

$10 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6004 "",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000', _shortbuf = "",
_lock = 0x6020c0, _offset = 4, __pad1 = 0x602138, __pad2 = 0x6020d0,
__pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}


The read_ptr/read_end values are 4 bytes into the buffer. _offset is 4, all is good AFAICT.

The second ftell internally does the equivalent of fseek (fp, 0, SEEK_CUR), which ought to leave the state of the file pointer unchanged.



If we look at the _IO_seek_cur case in _IO_wfile_seekoff, we see:

   case _IO_seek_cur:
      /* Adjust for read-ahead (bytes is buffer).  To do this we must
         find out which position in the external buffer corresponds to
         the current position in the internal buffer.  */
      cv = fp->_codecvt;
      clen = (*cv->__codecvt_do_encoding) (cv);

      if (clen > 0)
        {
          offset -= (fp->_wide_data->_IO_read_end
                     - fp->_wide_data->_IO_read_ptr) * clen;
          /* Adjust by readahead in external buffer.  */
          offset -= fp->_IO_read_end - fp->_IO_read_ptr;
        }
      else
        {
          int nread;

          delta = fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_base;
          fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
          nread = (*cv->__codecvt_do_length) (cv, &fp->_wide_data->_IO_state,
                                              fp->_IO_read_base,
                                              fp->_IO_read_end, delta);
          fp->_IO_read_ptr = fp->_IO_read_base + nread;
          fp->_wide_data->_IO_read_end = fp->_wide_data->_IO_read_ptr;
          offset -= fp->_IO_read_end - fp->_IO_read_base - nread;
        }


      if (fp->_offset == _IO_pos_BAD)
        goto dumb;
      /* Make offset absolute, assuming current pointer is file_ptr(). */
      offset += fp->_offset;

      dir = _IO_seek_set;
      break;

Basically, this code computes the current position and converts the SEEK_CUR request into an equivalent SEEK_SET with an absolute offset.

In this case we're using UTF-8, which is a variable-length encoding, so __codecvt_do_encoding returns 0 and we end up in the "else" branch.

The state of the file pointer when we enter the else clause is:
$17 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6004 "",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000', _shortbuf = "",
_lock = 0x6020c0, _offset = 4, _codecvt = 0x602138, _wide_data = 0x6020d0,
_freeres_list = 0x0, _freeres_buf = 0x0, _freeres_size = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}
(gdb) p *fp->_wide_data
$18 = {_IO_read_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_read_end = 0x7ffff7ff2000 L"abc\n",
_IO_read_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_write_end = 0x7ffff7ff2000 L"abc\n",
_IO_buf_base = 0x7ffff7ff2000 L"abc\n",
_IO_buf_end = 0x7ffff7ff6000 L"\xa636261", _IO_save_base = 0x0,
_IO_backup_base = 0x0, _IO_save_end = 0x0, _IO_state = {__count = 0,
__value = {__wch = 0, __wchb = "\000\000\000"}}, _IO_last_state = {
[ ... ]


delta will be zero (the wide _IO_read_ptr and _IO_read_base are equal), which in turn causes __codecvt_do_length to return zero, so nread is zero.

The next statement munges fp->_IO_read_ptr, changing it from 0x7ffff7ff6004 to 0x7ffff7ff6000. That seems wrong: we're only asking for our current position, so I wouldn't expect any of these fields to change.

Then we munge fp->_wide_data->_IO_read_end. Again, I'm not sure what this is supposed to accomplish.

So at this line:

649 offset -= fp->_IO_read_end - fp->_IO_read_base - nread;

we have the following state in FP:


$19 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6000 "abc\n",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000', _shortbuf = "",
_lock = 0x6020c0, _offset = 4, _codecvt = 0x602138, _wide_data = 0x6020d0,
_freeres_list = 0x0, _freeres_buf = 0x0, _freeres_size = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}
(gdb) p *fp->_wide_data
$20 = {_IO_read_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_read_end = 0x7ffff7ff2000 L"abc\n",
_IO_read_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_write_end = 0x7ffff7ff2000 L"abc\n",
_IO_buf_base = 0x7ffff7ff2000 L"abc\n",
_IO_buf_end = 0x7ffff7ff6000 L"\xa636261", _IO_save_base = 0x0,
_IO_backup_base = 0x0, _IO_save_end = 0x0, _IO_state = {__count = 0,
[ ... ]


Remember that nread is zero, so this subtracts 4 (_IO_read_end - _IO_read_base) from offset, leaving it at -4.

We then add in fp->_offset (4), which should give us an absolute offset... and the result is 0. BZZZT.


I get that this code is turning a SEEK_CUR with offset 0 into a SEEK_SET with an absolute offset. But the computation of offset in that else branch seems wrong, as does changing the various pointers within the file structure.

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>      /* fputws */


int main(int argc, char** argv)
{
        FILE *fp;

        setlocale(LC_ALL, "en_US.utf8");
        fp = fopen("output.txt", "w+");
        if (fp == NULL) {
                perror("fopen");
                exit(1);
        }
        if (fputws(L"abc\n", fp) == -1) {
                perror("fputws");
                exit(1);
        }
        /* ftell returns long, so print it with %ld */
        printf("before=%ld\n", ftell(fp));
        if (fseek(fp, 0L, SEEK_END) == -1) {
                perror("fseek");
                exit(1);
        }
        printf("fseek=%ld\n", ftell(fp));
        if (fputws(L"xyz\n", fp) == -1) {
                perror("fputws");
                exit(1);
        }
        printf("after=%ld\n", ftell(fp));
        fclose(fp);
        return 0;
}



Note: if you try this with the head of the trunk, make sure it's able to find your locale-archive file. Otherwise the code won't know you're dealing with a charset with a variable-length encoding; glibc then takes a slightly different (and simpler) path, which doesn't exhibit this problem.

Any help would be greatly appreciated.

Thanks,
Jeff



