1.3.3-2: fseek fails on multiples of 1024 (binary mode)

Christopher Faylor cgf@redhat.com
Tue Oct 23 09:49:00 GMT 2001


Again, these observations should go to the mailing list which is
responsible for maintaining the code that you've analyzed:
newlib@sources.redhat.com .

I've redirected this discussion there.

Thanks for your in-depth analysis of the problem.

cgf

On Tue, Oct 23, 2001 at 06:21:40PM +0200, Pavel Tsekov wrote:
>Upon further investigation this problem seems to be much bigger than
>I've initially thought and there is no easy patch to it (at least I
>think so) so I'll describe what've found about it.  I started a patch
>though and will would like to know if you like my approach or will
>suggest another one.
>
>So here is what I've found:
>
>1.  The described problem appears only when you have buffering enabled.
>
>2.  It basically has to do with a intended optimisation which involves
>not reading the same buffered region twice, if the fseek call requests
>a offset which is in the buffered region.  This optiomisation is
>performed only if the file is opened in READONLY mode.
>
>3.  The problem is to happen only if you fseek backwards from the
>current file position and the difference between the current file
>position and the new position is not greater than the buffer size used
>to buffer reads.  A requirement for the problem to appear is to have
>two or more fseeks between no other operation between them.
>
>4.  This problem is likely to affect all BSD flavour OSes - the newlib
>stdio code is derived from BSD.  I looked at the fseek code in OpenBSD
>and FreeBSD and I think it has the error.
>
>Now let me point you to the code location which introduces the problem:
>
>--> Target is where do we wont to go
>
> if (whence == SEEK_SET)
>    target = offset;
>  else
>    {
>      if (_fstat_r (ptr, fp->_file, &st))
>        goto dumb;
>      target = st.st_size + offset;
>    }
>
>--> What is actually done here up to the next --> mark is that fseek
>    tries ot determine the position where the last read occured.  fp->_r is
>    number if bytes left in the buffer to be read, n = fp->_p -
>    fp->_bf._base is the bytes from the buffer already read.  n + fp->_r is
>    the size of the buffer.  By decrementing the current file positon with
>    n + fp->_r the code expects to find the position where the last read
>    occured.
>
>  if (!havepos)
>    {
>      if (fp->_flags & __SOFF)
>        curoff = fp->_offset;
>      else
>        {
>          curoff = (*seekfn) (fp->_cookie, 0L, SEEK_CUR);
>          if (curoff == POS_ERR)
>            goto dumb;
>        }
>      curoff -= fp->_r;
>      if (HASUB (fp))
>        curoff -= fp->_ur;
>    }
> 
>  /*
>   * Compute the number of bytes in the input buffer (pretending
>   * that any ungetc() input has been discarded).  Adjust current
>   * offset backwards by this count so that it represents the
>   * file offset for the first byte in the current input buffer.
>   */
> 
>  if (HASUB (fp))
>    {
>      curoff += fp->_r;       /* kill off ungetc */
>      n = fp->_up - fp->_bf._base;
>      curoff -= n;
>      n += fp->_ur;
>    }
>  else
>    {
>      n = fp->_p - fp->_bf._base;
>      curoff -= n;
>      n += fp->_r;
>    }
>
>--> Here we try to determine if the position we request
>    is in the buffered region. If we determnine erronosly
>    that the requested position is in this buffer we got the
>    error which started this thread.
>
>  /*
>   * If the target offset is within the current buffer,
>   * simply adjust the pointers, clear EOF, undo ungetc(),
>   * and return.  (If the buffer was modified, we have to
>   * skip this; see fgetline.c.)
>   */
> 
>  if ((fp->_flags & __SMOD) == 0 &&
>      target >= curoff && target < curoff + n)
>    {
>      register int o = target - curoff;
> 
>      fp->_p = fp->_bf._base + o;
>      fp->_r = n - o;
>      if (HASUB (fp))
>        FREEUB (fp);
>      fp->_flags &= ~__SEOF;
>      return 0;
>    }
>
>Let me show you with some digits whats going on:
>
>We have 2048 bytes file.  We have bufsize 1024.  We fread 1024 bytes -
>this fills exactly the internal buffer thus 'n' will become 1024,
>fp->_r will become 0.  We request then file position 0 from the end of
>file.  ftell reports 2048.  Now we want to go back to 1024 and read 8
>bytes.  fseek tries to optiomize since we are in read only mode ...  it
>does this
>
>curoff = 2048 (current position)
>
>curoff -= r; 2048 - 0 = 2048
>curoff -= n; 2048 - 1024 = 1024 Determines where last read occured
>
>n + r = 1024 + 0; Determines the upper limit of the internal buffer
>
>now the final check 
>
>target >= curoff && target < curoff + n
>1024 >= 1024 && 1024 < 1024 + 1024 is TRUE
>
>we read from the old buffer which is error.
>
>I think to fix the situation so this optimisation really works the best
>thing is to add a new member to FILE which is the last known read
>position and replace curoff in the last check with this member.
>However this requires much more work than just a simple patch - So do
>you think there is another (better) way to do this.  Or perhaps just
>remove this optimisation at all ?  Which is easy and will require
>chabges just to fseek.
>
>Corinna Vinschen wrote:
>>
>>On Tue, Oct 23, 2001 at 01:39:42PM +0200, Pavel Tsekov wrote:
>>>I think I found the problem - is there anyone who can test a patch ?
>>
>>Just send it to the cygwin-patches mailing list, please.

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/



More information about the Cygwin mailing list