This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Carlos suggested that I write up a brief description of ftell and its interaction with the offset cache, so here it is: https://sourceware.org/glibc/wiki/File%20offsets%20in%20a%20stdio%20stream%20and%20ftell Siddhesh On Tue, Mar 11, 2014 at 06:16:16PM +0530, Siddhesh Poyarekar wrote: > Here's take 2: > > The ftell implementation was made conservative to ensure that > incorrectly cached offsets never affect it. However, this causes > problems for append mode when a file stream is rewound. Additionally, > the 'clever' trick of using stat to get position for append mode files > caused more problems than it solved and broke some old behavior. I > have described below the various problems that it caused and then > finally the solution. > > For a and a+ mode files, rewinding the stream should result in ftell > returning 0 as the offset, but the stat() trick caused it to > (incorrectly) always return the end of file. Now I couldn't find > anything in POSIX that specifies the stream position after rewind() > for a file opened in 'a' mode, but for 'a+' mode it should be set to > 0. For 'a' mode too, it probably makes sense to keep it set to 0 in > the interest of retaining old behavior. > > The initial file position for append mode files is implementation > defined, so the implementation could either retain the current file > position or move the position to the end of file. The earlier ftell > implementation would move the offset to end of file for append-only > mode, but retain the old offset for a+ mode. It would also cache the > offset (this detail is important). My patch broke this and would set > the initial position to end of file for both append modes, thus > breaking old behavior. I was ignorant enough to write an incorrect > test case for it too. > > The Change: > > I have now brought back the behavior of seeking to end of file for > append-only streams, but with a slight difference. I don't cache the > offset though, since we would want ftell to query the current file > position through lseek while the stream is not active. Since the > offset is moved to the end of file, we can rely on the file position > reported by lseek and we don't need to resort to the stat() nonsense. > > Finally, the cache is always reliable, except when there are unflished > writes in an append mode stream (i.e. both a and a+). In the latter > case, it is safe to just do an lseek to SEEK_END. The value can be > safely cached too, since the file handle is already active at this > point. Incidentally, this is the only state change we affect in the > file handle (apart from taking locks of course). > > I have also updated the test case to correct my impression of the > initial file position for a+ streams to the initial behavior. I have > verified that this does not break any existing tests in the testsuite > and also passes with the new tests. > > Siddhesh > > [BZ #16680] > * libio/fileops.c (_IO_file_open): Seek to end of file but > don't cache the offset. > (get_file_offset): Remove function. > (do_ftell): Use cached offset when available. > * libio/iofdopen.c (_IO_new_fdopen): Seek to end of file but > don't cache the offset. > * libio/tst-ftell-active-handler.c (do_rewind_test): New test > case. > (do_one_test): Call it. > (do_ftell_test): Fix up expected old offset for a+ mode. > * libio/wfileops.c (do_ftell_wide): Used cached offset when > available. > > --- > libio/fileops.c | 102 +++++++++++++---------------------- > libio/iofdopen.c | 9 ++++ > libio/tst-ftell-active-handler.c | 112 +++++++++++++++++++++++++++++++++++++-- > libio/wfileops.c | 47 ++++++++-------- > 4 files changed, 178 insertions(+), 92 deletions(-) > > diff --git a/libio/fileops.c b/libio/fileops.c > index 2e7bc8d..cf68dbf 100644 > --- a/libio/fileops.c > +++ b/libio/fileops.c > @@ -232,13 +232,18 @@ _IO_file_open (fp, filename, posix_mode, prot, read_write, is32not64) > return NULL; > fp->_fileno = fdesc; > _IO_mask_flags (fp, read_write,_IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING); > - if ((read_write & _IO_IS_APPENDING) && (read_write & _IO_NO_READS)) > - if (_IO_SEEKOFF (fp, (_IO_off64_t)0, _IO_seek_end, _IOS_INPUT|_IOS_OUTPUT) > - == _IO_pos_BAD && errno != ESPIPE) > - { > - close_not_cancel (fdesc); > - return NULL; > - } > + /* For append mode, send the file offset to the end of the file. Don't > + update the offset cache though, since the file handle is not active. */ > + if ((read_write & (_IO_IS_APPENDING | _IO_NO_READS)) > + == (_IO_IS_APPENDING | _IO_NO_READS)) > + { > + _IO_off64_t new_pos = _IO_SYSSEEK (fp, 0, _IO_seek_end); > + if (new_pos == _IO_pos_BAD && errno != ESPIPE) > + { > + close_not_cancel (fdesc); > + return NULL; > + } > + } > _IO_link_in ((struct _IO_FILE_plus *) fp); > return fp; > } > @@ -929,43 +934,13 @@ _IO_file_sync_mmap (_IO_FILE *fp) > return 0; > } > > -/* Get the current file offset using a system call. This is the safest method > - to get the current file offset, since we are sure that we get the current > - state of the file. Before the stream handle is activated (by using fread, > - fwrite, etc.), an application may alter the state of the file descriptor > - underlying it by calling read/write/lseek on it. Using a cached offset at > - this point will result in returning the incorrect value. Same is the case > - when one switches from reading in a+ mode to writing, where the buffer has > - not been flushed - the cached offset would reflect the reading position > - while the actual write position would be at the end of the file. > - > - do_ftell and do_ftell_wide may resort to using the cached offset in some > - special cases instead of calling get_file_offset, but those cases should be > - thoroughly described. */ > -_IO_off64_t > -get_file_offset (_IO_FILE *fp) > -{ > - if ((fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING) > - { > - struct stat64 st; > - bool ret = (_IO_SYSSTAT (fp, &st) == 0 && S_ISREG (st.st_mode)); > - if (ret) > - return st.st_size; > - else > - return EOF; > - } > - else > - return _IO_SYSSEEK (fp, 0, _IO_seek_cur); > -} > - > - > -/* ftell{,o} implementation. Don't modify any state of the file pointer while > - we try to get the current state of the stream. */ > +/* ftell{,o} implementation. The only time we modify the state of the stream > + is when we have unflushed writes. In that case we seek to the end and > + record that offset in the stream object. */ > static _IO_off64_t > do_ftell (_IO_FILE *fp) > { > - _IO_off64_t result = 0; > - bool use_cached_offset = false; > + _IO_off64_t result, offset = 0; > > /* No point looking at unflushed data if we haven't allocated buffers > yet. */ > @@ -974,39 +949,37 @@ do_ftell (_IO_FILE *fp) > bool was_writing = (fp->_IO_write_ptr > fp->_IO_write_base > || _IO_in_put_mode (fp)); > > + bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING; > + > + /* When we have unflushed writes in append mode, seek to the end of the > + file and record that offset. This is the only time we change the file > + stream state and it is safe since the file handle is active. */ > + if (was_writing && append_mode) > + { > + result = _IO_SYSSEEK (fp, 0, _IO_seek_end); > + if (result == _IO_pos_BAD) > + return EOF; > + else > + fp->_offset = result; > + } > + > /* Adjust for unflushed data. */ > if (!was_writing) > - result -= fp->_IO_read_end - fp->_IO_read_ptr; > + offset -= fp->_IO_read_end - fp->_IO_read_ptr; > else > - result += fp->_IO_write_ptr - fp->_IO_read_end; > - > - /* It is safe to use the cached offset when available if there is > - unbuffered data (indicating that the file handle is active) and the > - handle is not for a file open in a+ mode. The latter condition is > - because there could be a scenario where there is a switch from read > - mode to write mode using an fseek to an arbitrary position. In this > - case, there would be unbuffered data due to be appended to the end of > - the file, but the offset may not necessarily be the end of the > - file. It is fine to use the cached offset when the a+ stream is in > - read mode though, since the offset is maintained correctly in that > - case. Note that this is not a comprehensive set of cases when the > - offset is reliable. The offset may be reliable even in some cases > - where there is no unflushed input and the handle is active, but it's > - just that we don't have a way to identify that condition reliably. */ > - use_cached_offset = (result != 0 && fp->_offset != _IO_pos_BAD > - && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS)) > - == (_IO_IS_APPENDING | _IO_NO_READS) > - && was_writing)); > + offset += fp->_IO_write_ptr - fp->_IO_read_end; > } > > - if (use_cached_offset) > - result += fp->_offset; > + if (fp->_offset != _IO_pos_BAD) > + result = fp->_offset; > else > - result += get_file_offset (fp); > + result = _IO_SYSSEEK (fp, 0, _IO_seek_cur); > > if (result == EOF) > return result; > > + result += offset; > + > if (result < 0) > { > __set_errno (EINVAL); > @@ -1016,7 +989,6 @@ do_ftell (_IO_FILE *fp) > return result; > } > > - > _IO_off64_t > _IO_new_file_seekoff (fp, offset, dir, mode) > _IO_FILE *fp; > diff --git a/libio/iofdopen.c b/libio/iofdopen.c > index 3f266f7..364e175 100644 > --- a/libio/iofdopen.c > +++ b/libio/iofdopen.c > @@ -167,6 +167,15 @@ _IO_new_fdopen (fd, mode) > _IO_mask_flags (&new_f->fp.file, read_write, > _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING); > > + /* For append mode, send the file offset to the end of the file. Don't > + update the offset cache though, since the file handle is not active. */ > + if ((read_write & (_IO_IS_APPENDING | _IO_NO_READS)) > + == (_IO_IS_APPENDING | _IO_NO_READS)) > + { > + _IO_off64_t new_pos = _IO_SYSSEEK (&new_f->fp.file, 0, _IO_seek_end); > + if (new_pos == _IO_pos_BAD && errno != ESPIPE) > + return NULL; > + } > return &new_f->fp.file; > } > libc_hidden_ver (_IO_new_fdopen, _IO_fdopen) > diff --git a/libio/tst-ftell-active-handler.c b/libio/tst-ftell-active-handler.c > index 5d5fc26..aef9cc0 100644 > --- a/libio/tst-ftell-active-handler.c > +++ b/libio/tst-ftell-active-handler.c > @@ -88,6 +88,107 @@ static size_t file_len; > typedef int (*fputs_func_t) (const void *data, FILE *fp); > fputs_func_t fputs_func; > > +/* Test that ftell output after a rewind is correct. */ > +static int > +do_rewind_test (const char *filename) > +{ > + int ret = 0; > + struct test > + { > + const char *mode; > + int fd_mode; > + size_t old_off; > + size_t new_off; > + } test_modes[] = { > + {"w", O_WRONLY, 0, data_len}, > + {"w+", O_RDWR, 0, data_len}, > + {"r+", O_RDWR, 0, data_len}, > + /* The new offsets for 'a' and 'a+' modes have to factor in the > + previous writes since they compulsorily append to the end of the > + file. */ > + {"a", O_WRONLY, 0, 3 * data_len}, > + {"a+", O_RDWR, 0, 4 * data_len}, > + }; > + > + /* Empty the file before the test so that our offsets are simple to > + calculate. */ > + FILE *fp = fopen (filename, "w"); > + if (fp == NULL) > + { > + printf ("Failed to open file for emptying\n"); > + return 1; > + } > + fclose (fp); > + > + for (int j = 0; j < 2; j++) > + { > + for (int i = 0; i < sizeof (test_modes) / sizeof (struct test); i++) > + { > + FILE *fp; > + int fd; > + int fileret; > + > + printf ("\trewind: %s (file, \"%s\"): ", j == 0 ? "fdopen" : "fopen", > + test_modes[i].mode); > + > + if (j == 0) > + fileret = get_handles_fdopen (filename, fd, fp, > + test_modes[i].fd_mode, > + test_modes[i].mode); > + else > + fileret = get_handles_fopen (filename, fd, fp, test_modes[i].mode); > + > + if (fileret != 0) > + return fileret; > + > + /* Write some content to the file, rewind and ensure that the ftell > + output after the rewind is 0. POSIX does not specify what the > + behavior is when a file is rewound in 'a' mode, so we retain > + current behavior, which is to keep the 0 offset. */ > + size_t written = fputs_func (data, fp); > + > + if (written == EOF) > + { > + printf ("fputs[1] failed to write data\n"); > + ret |= 1; > + } > + > + rewind (fp); > + long offset = ftell (fp); > + > + if (offset != test_modes[i].old_off) > + { > + printf ("Incorrect old offset. Expected %zu, but got %ld, ", > + test_modes[i].old_off, offset); > + ret |= 1; > + } > + else > + printf ("old offset = %ld, ", offset); > + > + written = fputs_func (data, fp); > + > + if (written == EOF) > + { > + printf ("fputs[1] failed to write data\n"); > + ret |= 1; > + } > + > + /* After this write, the offset in append modes should factor in the > + implicit lseek to the end of file. */ > + offset = ftell (fp); > + if (offset != test_modes[i].new_off) > + { > + printf ("Incorrect new offset. Expected %zu, but got %ld\n", > + test_modes[i].new_off, offset); > + ret |= 1; > + } > + else > + printf ("new offset = %ld\n", offset); > + } > + } > + return ret; > +} > + > /* Test that the value of ftell is not cached when the stream handle is not > active. */ > static int > @@ -107,11 +208,13 @@ do_ftell_test (const char *filename) > {"w", O_WRONLY, 0, data_len}, > {"w+", O_RDWR, 0, data_len}, > {"r+", O_RDWR, 0, data_len}, > - /* For 'a' and 'a+' modes, the initial file position should be the > + /* For the 'a' mode, the initial file position should be the > current end of file. After the write, the offset has data_len > - added to the old value. */ > + added to the old value. For a+ mode however, the initial file > + position is the file position of the underlying file descriptor, > + since it is initially assumed to be in read mode. */ > {"a", O_WRONLY, data_len, 2 * data_len}, > - {"a+", O_RDWR, 2 * data_len, 3 * data_len}, > + {"a+", O_RDWR, 0, 3 * data_len}, > }; > for (int j = 0; j < 2; j++) > { > @@ -157,7 +260,7 @@ do_ftell_test (const char *filename) > if (off != test_modes[i].new_off) > { > printf ("Incorrect new offset. Expected %zu but got %ld\n", > - test_modes[i].old_off, off); > + test_modes[i].new_off, off); > ret |= 1; > } > else > @@ -322,6 +425,7 @@ do_one_test (const char *filename) > ret |= do_ftell_test (filename); > ret |= do_write_test (filename); > ret |= do_append_test (filename); > + ret |= do_rewind_test (filename); > > return ret; > } > diff --git a/libio/wfileops.c b/libio/wfileops.c > index 8b2e108..3199861 100644 > --- a/libio/wfileops.c > +++ b/libio/wfileops.c > @@ -597,12 +597,12 @@ done: > } > > /* ftell{,o} implementation for wide mode. Don't modify any state of the file > - pointer while we try to get the current state of the stream. */ > + pointer while we try to get the current state of the stream except in one > + case, which is when we have unflushed writes in append mode. */ > static _IO_off64_t > do_ftell_wide (_IO_FILE *fp) > { > _IO_off64_t result, offset = 0; > - bool use_cached_offset = false; > > /* No point looking for offsets in the buffer if it hasn't even been > allocated. */ > @@ -615,6 +615,20 @@ do_ftell_wide (_IO_FILE *fp) > > fp->_wide_data->_IO_write_base) > || _IO_in_put_mode (fp)); > > + bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING; > + > + /* When we have unflushed writes in append mode, seek to the end of the > + file and record that offset. This is the only time we change the file > + stream state and it is safe since the file handle is active. */ > + if (was_writing && append_mode) > + { > + result = _IO_SYSSEEK (fp, 0, _IO_seek_end); > + if (result == _IO_pos_BAD) > + return EOF; > + else > + fp->_offset = result; > + } > + > /* XXX For wide stream with backup store it is not very > reasonable to determine the offset. The pushed-back > character might require a state change and we need not be > @@ -703,37 +717,24 @@ do_ftell_wide (_IO_FILE *fp) > position is fp._offset - (_IO_read_end - new_write_ptr). */ > offset -= fp->_IO_read_end - fp->_IO_write_ptr; > } > - > - /* It is safe to use the cached offset when available if there is > - unbuffered data (indicating that the file handle is active) and > - the handle is not for a file open in a+ mode. The latter > - condition is because there could be a scenario where there is a > - switch from read mode to write mode using an fseek to an arbitrary > - position. In this case, there would be unbuffered data due to be > - appended to the end of the file, but the offset may not > - necessarily be the end of the file. It is fine to use the cached > - offset when the a+ stream is in read mode though, since the offset > - is maintained correctly in that case. Note that this is not a > - comprehensive set of cases when the offset is reliable. The > - offset may be reliable even in some cases where there is no > - unflushed input and the handle is active, but it's just that we > - don't have a way to identify that condition reliably. */ > - use_cached_offset = (offset != 0 && fp->_offset != _IO_pos_BAD > - && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS)) > - == (_IO_IS_APPENDING | _IO_NO_READS) > - && was_writing)); > } > > - if (use_cached_offset) > + if (fp->_offset != _IO_pos_BAD) > result = fp->_offset; > else > - result = get_file_offset (fp); > + result = _IO_SYSSEEK (fp, 0, _IO_seek_cur); > > if (result == EOF) > return result; > > result += offset; > > + if (result < 0) > + { > + __set_errno (EINVAL); > + return EOF; > + } > + > return result; > } > > -- > 1.8.3.1 >
Attachment:
pgpEyCdf2XSsn.pgp
Description: PGP signature
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |