This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Fix offset caching for streams and use it for ftell (BZ #16680)


Carlos suggested that I write up a brief description of ftell and its
interaction with the offset cache, so here it is:

https://sourceware.org/glibc/wiki/File%20offsets%20in%20a%20stdio%20stream%20and%20ftell

Siddhesh

On Tue, Mar 11, 2014 at 06:16:16PM +0530, Siddhesh Poyarekar wrote:
> Here's take 2:
> 
> The ftell implementation was made conservative to ensure that
> incorrectly cached offsets never affect it.  However, this causes
> problems for append mode when a file stream is rewound.  Additionally,
> the 'clever' trick of using stat to get position for append mode files
> caused more problems than it solved and broke some old behavior.  I
> have described below the various problems that it caused and then
> finally the solution.
> 
> For a and a+ mode files, rewinding the stream should result in ftell
> returning 0 as the offset, but the stat() trick caused it to
> (incorrectly) always return the end of file.  Now I couldn't find
> anything in POSIX that specifies the stream position after rewind()
> for a file opened in 'a' mode, but for 'a+' mode it should be set to
> 0.  For 'a' mode too, it probably makes sense to keep it set to 0 in
> the interest of retaining old behavior.
> 
> The initial file position for append mode files is implementation
> defined, so the implementation could either retain the current file
> position or move the position to the end of file.  The earlier ftell
> implementation would move the offset to end of file for append-only
> mode, but retain the old offset for a+ mode.  It would also cache the
> offset (this detail is important).  My patch broke this and would set
> the initial position to end of file for both append modes, thus
> breaking old behavior.  I was ignorant enough to write an incorrect
> test case for it too.
> 
> The Change:
> 
> I have now brought back the behavior of seeking to end of file for
> append-only streams, but with a slight difference.  I don't cache the
> offset though, since we would want ftell to query the current file
> position through lseek while the stream is not active.  Since the
> offset is moved to the end of file, we can rely on the file position
> reported by lseek and we don't need to resort to the stat() nonsense.
> 
> Finally, the cache is always reliable, except when there are unflished
> writes in an append mode stream (i.e. both a and a+).  In the latter
> case, it is safe to just do an lseek to SEEK_END.  The value can be
> safely cached too, since the file handle is already active at this
> point.  Incidentally, this is the only state change we affect in the
> file handle (apart from taking locks of course).
> 
> I have also updated the test case to correct my impression of the
> initial file position for a+ streams to the initial behavior.  I have
> verified that this does not break any existing tests in the testsuite
> and also passes with the new tests.
> 
> Siddhesh
> 
> 	[BZ #16680]
> 	* libio/fileops.c (_IO_file_open): Seek to end of file but
> 	don't cache the offset.
> 	(get_file_offset): Remove function.
> 	(do_ftell): Use cached offset when available.
> 	* libio/iofdopen.c (_IO_new_fdopen): Seek to end of file but
> 	don't cache the offset.
> 	* libio/tst-ftell-active-handler.c (do_rewind_test): New test
> 	case.
> 	(do_one_test): Call it.
> 	(do_ftell_test): Fix up expected old offset for a+ mode.
> 	* libio/wfileops.c (do_ftell_wide): Used cached offset when
> 	available.
> 
> ---
>  libio/fileops.c                  | 102 +++++++++++++----------------------
>  libio/iofdopen.c                 |   9 ++++
>  libio/tst-ftell-active-handler.c | 112 +++++++++++++++++++++++++++++++++++++--
>  libio/wfileops.c                 |  47 ++++++++--------
>  4 files changed, 178 insertions(+), 92 deletions(-)
> 
> diff --git a/libio/fileops.c b/libio/fileops.c
> index 2e7bc8d..cf68dbf 100644
> --- a/libio/fileops.c
> +++ b/libio/fileops.c
> @@ -232,13 +232,18 @@ _IO_file_open (fp, filename, posix_mode, prot, read_write, is32not64)
>      return NULL;
>    fp->_fileno = fdesc;
>    _IO_mask_flags (fp, read_write,_IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING);
> -  if ((read_write & _IO_IS_APPENDING) && (read_write & _IO_NO_READS))
> -    if (_IO_SEEKOFF (fp, (_IO_off64_t)0, _IO_seek_end, _IOS_INPUT|_IOS_OUTPUT)
> -	== _IO_pos_BAD && errno != ESPIPE)
> -      {
> -	close_not_cancel (fdesc);
> -	return NULL;
> -      }
> +  /* For append mode, send the file offset to the end of the file.  Don't
> +     update the offset cache though, since the file handle is not active.  */
> +  if ((read_write & (_IO_IS_APPENDING | _IO_NO_READS))
> +      == (_IO_IS_APPENDING | _IO_NO_READS))
> +    {
> +      _IO_off64_t new_pos = _IO_SYSSEEK (fp, 0, _IO_seek_end);
> +      if (new_pos == _IO_pos_BAD && errno != ESPIPE)
> +	{
> +	  close_not_cancel (fdesc);
> +	  return NULL;
> +	}
> +    }
>    _IO_link_in ((struct _IO_FILE_plus *) fp);
>    return fp;
>  }
> @@ -929,43 +934,13 @@ _IO_file_sync_mmap (_IO_FILE *fp)
>    return 0;
>  }
>  
> -/* Get the current file offset using a system call.  This is the safest method
> -   to get the current file offset, since we are sure that we get the current
> -   state of the file.  Before the stream handle is activated (by using fread,
> -   fwrite, etc.), an application may alter the state of the file descriptor
> -   underlying it by calling read/write/lseek on it.  Using a cached offset at
> -   this point will result in returning the incorrect value.  Same is the case
> -   when one switches from reading in a+ mode to writing, where the buffer has
> -   not been flushed - the cached offset would reflect the reading position
> -   while the actual write position would be at the end of the file.
> -
> -   do_ftell and do_ftell_wide may resort to using the cached offset in some
> -   special cases instead of calling get_file_offset, but those cases should be
> -   thoroughly described.  */
> -_IO_off64_t
> -get_file_offset (_IO_FILE *fp)
> -{
> -  if ((fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING)
> -    {
> -      struct stat64 st;
> -      bool ret = (_IO_SYSSTAT (fp, &st) == 0 && S_ISREG (st.st_mode));
> -      if (ret)
> -	return st.st_size;
> -      else
> -	return EOF;
> -    }
> -  else
> -    return _IO_SYSSEEK (fp, 0, _IO_seek_cur);
> -}
> -
> -
> -/* ftell{,o} implementation.  Don't modify any state of the file pointer while
> -   we try to get the current state of the stream.  */
> +/* ftell{,o} implementation.  The only time we modify the state of the stream
> +   is when we have unflushed writes.  In that case we seek to the end and
> +   record that offset in the stream object.  */
>  static _IO_off64_t
>  do_ftell (_IO_FILE *fp)
>  {
> -  _IO_off64_t result = 0;
> -  bool use_cached_offset = false;
> +  _IO_off64_t result, offset = 0;
>  
>    /* No point looking at unflushed data if we haven't allocated buffers
>       yet.  */
> @@ -974,39 +949,37 @@ do_ftell (_IO_FILE *fp)
>        bool was_writing = (fp->_IO_write_ptr > fp->_IO_write_base
>  			  || _IO_in_put_mode (fp));
>  
> +      bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING;
> +
> +      /* When we have unflushed writes in append mode, seek to the end of the
> +	 file and record that offset.  This is the only time we change the file
> +	 stream state and it is safe since the file handle is active.  */
> +      if (was_writing && append_mode)
> +	{
> +	  result = _IO_SYSSEEK (fp, 0, _IO_seek_end);
> +	  if (result == _IO_pos_BAD)
> +	    return EOF;
> +	  else
> +	    fp->_offset = result;
> +	}
> +
>        /* Adjust for unflushed data.  */
>        if (!was_writing)
> -	result -= fp->_IO_read_end - fp->_IO_read_ptr;
> +	offset -= fp->_IO_read_end - fp->_IO_read_ptr;
>        else
> -	result += fp->_IO_write_ptr - fp->_IO_read_end;
> -
> -      /* It is safe to use the cached offset when available if there is
> -	 unbuffered data (indicating that the file handle is active) and the
> -	 handle is not for a file open in a+ mode.  The latter condition is
> -	 because there could be a scenario where there is a switch from read
> -	 mode to write mode using an fseek to an arbitrary position.  In this
> -	 case, there would be unbuffered data due to be appended to the end of
> -	 the file, but the offset may not necessarily be the end of the
> -	 file.  It is fine to use the cached offset when the a+ stream is in
> -	 read mode though, since the offset is maintained correctly in that
> -	 case.  Note that this is not a comprehensive set of cases when the
> -	 offset is reliable.  The offset may be reliable even in some cases
> -	 where there is no unflushed input and the handle is active, but it's
> -	 just that we don't have a way to identify that condition reliably.  */
> -      use_cached_offset = (result != 0 && fp->_offset != _IO_pos_BAD
> -			   && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS))
> -			       == (_IO_IS_APPENDING | _IO_NO_READS)
> -			       && was_writing));
> +	offset += fp->_IO_write_ptr - fp->_IO_read_end;
>      }
>  
> -  if (use_cached_offset)
> -    result += fp->_offset;
> +  if (fp->_offset != _IO_pos_BAD)
> +    result = fp->_offset;
>    else
> -    result += get_file_offset (fp);
> +    result = _IO_SYSSEEK (fp, 0, _IO_seek_cur);
>  
>    if (result == EOF)
>      return result;
>  
> +  result += offset;
> +
>    if (result < 0)
>      {
>        __set_errno (EINVAL);
> @@ -1016,7 +989,6 @@ do_ftell (_IO_FILE *fp)
>    return result;
>  }
>  
> -
>  _IO_off64_t
>  _IO_new_file_seekoff (fp, offset, dir, mode)
>       _IO_FILE *fp;
> diff --git a/libio/iofdopen.c b/libio/iofdopen.c
> index 3f266f7..364e175 100644
> --- a/libio/iofdopen.c
> +++ b/libio/iofdopen.c
> @@ -167,6 +167,15 @@ _IO_new_fdopen (fd, mode)
>    _IO_mask_flags (&new_f->fp.file, read_write,
>  		  _IO_NO_READS+_IO_NO_WRITES+_IO_IS_APPENDING);
>  
> +  /* For append mode, send the file offset to the end of the file.  Don't
> +     update the offset cache though, since the file handle is not active.  */
> +  if ((read_write & (_IO_IS_APPENDING | _IO_NO_READS))
> +      == (_IO_IS_APPENDING | _IO_NO_READS))
> +    {
> +      _IO_off64_t new_pos = _IO_SYSSEEK (&new_f->fp.file, 0, _IO_seek_end);
> +      if (new_pos == _IO_pos_BAD && errno != ESPIPE)
> +	return NULL;
> +    }
>    return &new_f->fp.file;
>  }
>  libc_hidden_ver (_IO_new_fdopen, _IO_fdopen)
> diff --git a/libio/tst-ftell-active-handler.c b/libio/tst-ftell-active-handler.c
> index 5d5fc26..aef9cc0 100644
> --- a/libio/tst-ftell-active-handler.c
> +++ b/libio/tst-ftell-active-handler.c
> @@ -88,6 +88,107 @@ static size_t file_len;
>  typedef int (*fputs_func_t) (const void *data, FILE *fp);
>  fputs_func_t fputs_func;
>  
> +/* Test that ftell output after a rewind is correct.  */
> +static int
> +do_rewind_test (const char *filename)
> +{
> +  int ret = 0;
> +  struct test
> +    {
> +      const char *mode;
> +      int fd_mode;
> +      size_t old_off;
> +      size_t new_off;
> +    } test_modes[] = {
> +	  {"w", O_WRONLY, 0, data_len},
> +	  {"w+", O_RDWR, 0, data_len},
> +	  {"r+", O_RDWR, 0, data_len},
> +	  /* The new offsets for 'a' and 'a+' modes have to factor in the
> +	     previous writes since they compulsorily append to the end of the
> +	     file.  */
> +	  {"a", O_WRONLY, 0, 3 * data_len},
> +	  {"a+", O_RDWR, 0, 4 * data_len},
> +    };
> +
> +  /* Empty the file before the test so that our offsets are simple to
> +     calculate.  */
> +  FILE *fp = fopen (filename, "w");
> +  if (fp == NULL)
> +    {
> +      printf ("Failed to open file for emptying\n");
> +      return 1;
> +    }
> +  fclose (fp);
> +
> +  for (int j = 0; j < 2; j++)
> +    {
> +      for (int i = 0; i < sizeof (test_modes) / sizeof (struct test); i++)
> +	{
> +	  FILE *fp;
> +	  int fd;
> +	  int fileret;
> +
> +	  printf ("\trewind: %s (file, \"%s\"): ", j == 0 ? "fdopen" : "fopen",
> +		  test_modes[i].mode);
> +
> +	  if (j == 0)
> +	    fileret = get_handles_fdopen (filename, fd, fp,
> +					  test_modes[i].fd_mode,
> +					  test_modes[i].mode);
> +	  else
> +	    fileret = get_handles_fopen (filename, fd, fp, test_modes[i].mode);
> +
> +	  if (fileret != 0)
> +	    return fileret;
> +
> +	  /* Write some content to the file, rewind and ensure that the ftell
> +	     output after the rewind is 0.  POSIX does not specify what the
> +	     behavior is when a file is rewound in 'a' mode, so we retain
> +	     current behavior, which is to keep the 0 offset.  */
> +	  size_t written = fputs_func (data, fp);
> +
> +	  if (written == EOF)
> +	    {
> +	      printf ("fputs[1] failed to write data\n");
> +	      ret |= 1;
> +	    }
> +
> +	  rewind (fp);
> +	  long offset = ftell (fp);
> +
> +	  if (offset != test_modes[i].old_off)
> +	    {
> +	      printf ("Incorrect old offset.  Expected %zu, but got %ld, ",
> +		      test_modes[i].old_off, offset);
> +	      ret |= 1;
> +	    }
> +	  else
> +	    printf ("old offset = %ld, ", offset);
> +
> +	  written = fputs_func (data, fp);
> +
> +	  if (written == EOF)
> +	    {
> +	      printf ("fputs[1] failed to write data\n");
> +	      ret |= 1;
> +	    }
> +
> +	  /* After this write, the offset in append modes should factor in the
> +	     implicit lseek to the end of file.  */
> +	  offset = ftell (fp);
> +	  if (offset != test_modes[i].new_off)
> +	    {
> +	      printf ("Incorrect new offset.  Expected %zu, but got %ld\n",
> +		      test_modes[i].new_off, offset);
> +	      ret |= 1;
> +	    }
> +	  else
> +	    printf ("new offset = %ld\n", offset);
> +	}
> +    }
> +  return ret;
> +}
> +
>  /* Test that the value of ftell is not cached when the stream handle is not
>     active.  */
>  static int
> @@ -107,11 +208,13 @@ do_ftell_test (const char *filename)
>  	  {"w", O_WRONLY, 0, data_len},
>  	  {"w+", O_RDWR, 0, data_len},
>  	  {"r+", O_RDWR, 0, data_len},
> -	  /* For 'a' and 'a+' modes, the initial file position should be the
> +	  /* For the 'a' mode, the initial file position should be the
>  	     current end of file. After the write, the offset has data_len
> -	     added to the old value.  */
> +	     added to the old value.  For a+ mode however, the initial file
> +	     position is the file position of the underlying file descriptor,
> +	     since it is initially assumed to be in read mode.  */
>  	  {"a", O_WRONLY, data_len, 2 * data_len},
> -	  {"a+", O_RDWR, 2 * data_len, 3 * data_len},
> +	  {"a+", O_RDWR, 0, 3 * data_len},
>      };
>    for (int j = 0; j < 2; j++)
>      {
> @@ -157,7 +260,7 @@ do_ftell_test (const char *filename)
>  	  if (off != test_modes[i].new_off)
>  	    {
>  	      printf ("Incorrect new offset.  Expected %zu but got %ld\n",
> -		      test_modes[i].old_off, off);
> +		      test_modes[i].new_off, off);
>  	      ret |= 1;
>  	    }
>  	  else
> @@ -322,6 +425,7 @@ do_one_test (const char *filename)
>    ret |= do_ftell_test (filename);
>    ret |= do_write_test (filename);
>    ret |= do_append_test (filename);
> +  ret |= do_rewind_test (filename);
>  
>    return ret;
>  }
> diff --git a/libio/wfileops.c b/libio/wfileops.c
> index 8b2e108..3199861 100644
> --- a/libio/wfileops.c
> +++ b/libio/wfileops.c
> @@ -597,12 +597,12 @@ done:
>  }
>  
>  /* ftell{,o} implementation for wide mode.  Don't modify any state of the file
> -   pointer while we try to get the current state of the stream.  */
> +   pointer while we try to get the current state of the stream except in one
> +   case, which is when we have unflushed writes in append mode.  */
>  static _IO_off64_t
>  do_ftell_wide (_IO_FILE *fp)
>  {
>    _IO_off64_t result, offset = 0;
> -  bool use_cached_offset = false;
>  
>    /* No point looking for offsets in the buffer if it hasn't even been
>       allocated.  */
> @@ -615,6 +615,20 @@ do_ftell_wide (_IO_FILE *fp)
>  			   > fp->_wide_data->_IO_write_base)
>  			  || _IO_in_put_mode (fp));
>  
> +      bool append_mode = (fp->_flags & _IO_IS_APPENDING) == _IO_IS_APPENDING;
> +
> +      /* When we have unflushed writes in append mode, seek to the end of the
> +	 file and record that offset.  This is the only time we change the file
> +	 stream state and it is safe since the file handle is active.  */
> +      if (was_writing && append_mode)
> +	{
> +	  result = _IO_SYSSEEK (fp, 0, _IO_seek_end);
> +	  if (result == _IO_pos_BAD)
> +	    return EOF;
> +	  else
> +	    fp->_offset = result;
> +	}
> +
>        /* XXX For wide stream with backup store it is not very
>  	 reasonable to determine the offset.  The pushed-back
>  	 character might require a state change and we need not be
> @@ -703,37 +717,24 @@ do_ftell_wide (_IO_FILE *fp)
>  	     position is fp._offset - (_IO_read_end - new_write_ptr).  */
>  	  offset -= fp->_IO_read_end - fp->_IO_write_ptr;
>  	}
> -
> -      /* It is safe to use the cached offset when available if there is
> -	 unbuffered data (indicating that the file handle is active) and
> -	 the handle is not for a file open in a+ mode.  The latter
> -	 condition is because there could be a scenario where there is a
> -	 switch from read mode to write mode using an fseek to an arbitrary
> -	 position.  In this case, there would be unbuffered data due to be
> -	 appended to the end of the file, but the offset may not
> -	 necessarily be the end of the file.  It is fine to use the cached
> -	 offset when the a+ stream is in read mode though, since the offset
> -	 is maintained correctly in that case.  Note that this is not a
> -	 comprehensive set of cases when the offset is reliable.  The
> -	 offset may be reliable even in some cases where there is no
> -	 unflushed input and the handle is active, but it's just that we
> -	 don't have a way to identify that condition reliably.  */
> -      use_cached_offset = (offset != 0 && fp->_offset != _IO_pos_BAD
> -			   && ((fp->_flags & (_IO_IS_APPENDING | _IO_NO_READS))
> -			       == (_IO_IS_APPENDING | _IO_NO_READS)
> -			       && was_writing));
>      }
>  
> -  if (use_cached_offset)
> +  if (fp->_offset != _IO_pos_BAD)
>      result = fp->_offset;
>    else
> -    result = get_file_offset (fp);
> +    result = _IO_SYSSEEK (fp, 0, _IO_seek_cur);
>  
>    if (result == EOF)
>      return result;
>  
>    result += offset;
>  
> +  if (result < 0)
> +    {
> +      __set_errno (EINVAL);
> +      return EOF;
> +    }
> +
>    return result;
>  }
>  
> -- 
> 1.8.3.1
> 


Attachment: pgpEyCdf2XSsn.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]