This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Bug in wide character fseek/ftell
- From: Jeff Law <law at redhat dot com>
- To: libc-alpha <libc-alpha at sourceware dot org>
- Date: Tue, 10 Jul 2012 11:29:49 -0600
- Subject: Bug in wide character fseek/ftell
I'm trying to track down what I believe is a problem in the wide stream
seek/tell support. This is reproducible with the head of the trunk.
Compile the attached test and run it against the trunk libc.so (which
*must* be able to find its locale archive). You'll get results like:
before=4
fseek=0
after=4
The expected output is:
before=4
fseek=4
after=8
If you look at the source, you can see that we write a few wide chars,
then call ftell and print the result (before=4). We then seek to the end
of the file and print the result of a second ftell, which is wrong
(fseek=0; it should be fseek=4). Finally, we write a few more chars and
print the result of a final ftell (after=4; it should be after=8).
Stepping through the explicit fseek (fp, 0, SEEK_END) call, everything
looks fine to me at the end of the call:
$10 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6004 "",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000',
_shortbuf = "",
_lock = 0x6020c0, _offset = 4, __pad1 = 0x602138, __pad2 = 0x6020d0,
__pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}
_IO_read_ptr and _IO_read_end both point 4 bytes into the buffer and
_offset is 4, so all is good AFAICT.
The second ftell internally does the equivalent of fseek (fp, 0, SEEK_CUR),
which ought to result in no change to the state of the file pointer.
If we look at the _IO_seek_cur case in _IO_wfile_seekoff, we see:
    case _IO_seek_cur:
      /* Adjust for read-ahead (bytes is buffer). To do this we must
         find out which position in the external buffer corresponds to
         the current position in the internal buffer. */
      cv = fp->_codecvt;
      clen = (*cv->__codecvt_do_encoding) (cv);
      if (clen > 0)
        {
          offset -= (fp->_wide_data->_IO_read_end
                     - fp->_wide_data->_IO_read_ptr) * clen;
          /* Adjust by readahead in external buffer. */
          offset -= fp->_IO_read_end - fp->_IO_read_ptr;
        }
      else
        {
          int nread;
          delta = fp->_wide_data->_IO_read_ptr -
                  fp->_wide_data->_IO_read_base;
          fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
          nread = (*cv->__codecvt_do_length) (cv,
                                              &fp->_wide_data->_IO_state,
                                              fp->_IO_read_base,
                                              fp->_IO_read_end, delta);
          fp->_IO_read_ptr = fp->_IO_read_base + nread;
          fp->_wide_data->_IO_read_end = fp->_wide_data->_IO_read_ptr;
          offset -= fp->_IO_read_end - fp->_IO_read_base - nread;
        }
      if (fp->_offset == _IO_pos_BAD)
        goto dumb;
      /* Make offset absolute, assuming current pointer is file_ptr(). */
      offset += fp->_offset;
      dir = _IO_seek_set;
      break;
Basically this code is trying to compute the current position and convert
the SEEK_CUR into a SEEK_SET.
In this case we're using UTF-8, which is a variable-length encoding, so
__codecvt_do_encoding returns 0 and we take the "else" branch.
The state of the file pointer when we enter the else clause is:
$17 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6004 "",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000',
_shortbuf = "",
_lock = 0x6020c0, _offset = 4, _codecvt = 0x602138, _wide_data = 0x6020d0,
_freeres_list = 0x0, _freeres_buf = 0x0, _freeres_size = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}
(gdb) p *fp->_wide_data
$18 = {_IO_read_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_read_end = 0x7ffff7ff2000 L"abc\n",
_IO_read_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_write_end = 0x7ffff7ff2000 L"abc\n",
_IO_buf_base = 0x7ffff7ff2000 L"abc\n",
_IO_buf_end = 0x7ffff7ff6000 L"\xa636261", _IO_save_base = 0x0,
_IO_backup_base = 0x0, _IO_save_end = 0x0, _IO_state = {__count = 0,
__value = {__wch = 0, __wchb = "\000\000\000"}}, _IO_last_state = {
[ ... ]
Since the wide _IO_read_ptr and _IO_read_base are equal (both
0x7ffff7ff2000), delta will be zero, which in turn causes
__codecvt_do_length to return zero, so nread is zero.
The next statement munges fp->_IO_read_ptr, changing it from
0x7ffff7ff6004 to 0x7ffff7ff6000. That seems wrong: we're only seeking
to our current location, so I'd expect none of these fields to change.
Then we munge fp->_wide_data->_IO_read_end. Again, I'm not sure what
this is supposed to accomplish.
So at line 649:
      offset -= fp->_IO_read_end - fp->_IO_read_base - nread;
we have the following state in FP:
$19 = {_flags = -72539008, _IO_read_ptr = 0x7ffff7ff6000 "abc\n",
_IO_read_end = 0x7ffff7ff6004 "", _IO_read_base = 0x7ffff7ff6000 "abc\n",
_IO_write_base = 0x7ffff7ff6000 "abc\n",
_IO_write_ptr = 0x7ffff7ff6000 "abc\n",
_IO_write_end = 0x7ffff7ff6000 "abc\n",
_IO_buf_base = 0x7ffff7ff6000 "abc\n", _IO_buf_end = 0x7ffff7ff7000 "\017",
_IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
_markers = 0x0, _chain = 0x7ffff7dd6060, _fileno = 7, _flags2 = 0,
_old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000',
_shortbuf = "",
_lock = 0x6020c0, _offset = 4, _codecvt = 0x602138, _wide_data = 0x6020d0,
_freeres_list = 0x0, _freeres_buf = 0x0, _freeres_size = 0, _mode = 1,
_unused2 = '\000' <repeats 19 times>}
(gdb) p *fp->_wide_data
$20 = {_IO_read_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_read_end = 0x7ffff7ff2000 L"abc\n",
_IO_read_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_base = 0x7ffff7ff2000 L"abc\n",
_IO_write_ptr = 0x7ffff7ff2000 L"abc\n",
_IO_write_end = 0x7ffff7ff2000 L"abc\n",
_IO_buf_base = 0x7ffff7ff2000 L"abc\n",
_IO_buf_end = 0x7ffff7ff6000 L"\xa636261", _IO_save_base = 0x0,
_IO_backup_base = 0x0, _IO_save_end = 0x0, _IO_state = {__count = 0,
[ ... ]
Remember that nread is zero and _IO_read_end - _IO_read_base is 4, so
this subtracts 4 from offset, leaving it at -4.
We then add in fp->_offset (4), which should give us an absolute
offset... which is 0. BZZZT.
I get that this code is turning a SEEK_CUR with offset 0 into a SEEK_SET
with an absolute offset. But the computation of offset in that else
branch seems wrong, as does changing the various pointers within the
file structure.
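#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>   /* fputws */

int main(void)
{
  FILE *fp;

  /* The bug needs a variable-length (UTF-8) locale; see the note below. */
  if (setlocale(LC_ALL, "en_US.utf8") == NULL) {
    fprintf(stderr, "setlocale failed: is the locale archive visible?\n");
    exit(1);
  }
  fp = fopen("output.txt", "w+");
  if (fp == NULL) {
    perror("fopen");
    exit(1);
  }
  /* Four wide characters, which occupy four bytes in UTF-8. */
  if (fputws(L"abc\n", fp) < 0) {
    perror("fputws");
    exit(1);
  }
  printf("before=%ld\n", ftell(fp));   /* ftell returns long */
  if (fseek(fp, 0L, SEEK_END) != 0) {
    perror("fseek");
    exit(1);
  }
  printf("fseek=%ld\n", ftell(fp));    /* expected 4 */
  if (fputws(L"xyz\n", fp) < 0) {
    perror("fputws");
    exit(1);
  }
  printf("after=%ld\n", ftell(fp));    /* expected 8 */
  fclose(fp);
  return 0;
}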
Note: if you try this with the head of the trunk, make sure it's able to
find your locale archive file. Otherwise the code won't know you're
dealing with a charset with a variable-length encoding; glibc then takes
a slightly different (and simpler) path, which doesn't exhibit this
problem.
Any help would be greatly appreciated.
Thanks,
Jeff