This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Deadlock scenario on close() in io/fileio


Nick Garnett wrote:
Rutger Hofman <rutger@cs.vu.nl> writes:

Good morning list,

I ran into a deadlock with io/fileio on invoking close(fd) in
io/fileio/current/src/fd.cxx.

The problem arises because close() grabs fd_lock, then calls
cyg_fd_free() which in turn calls the file system's fo_close(), which
will flush any pending data; this flush invokes write() which will
again try to grab fd_lock. Twice the same lock in one thread ->
deadlock.

The stack trace on deadlock clearly shows this:

close()
   cyg_fd_free()
     lock(fdlock)       <======================================
     fd_close()
       fp_ucount_dec()
         cyg_yaffs_fo_close()
           yaffs_close()
             yaffs_FlushFile()
               yaffs_UpdateObjectHeader()
                 readwritev()
                   cyg_fp_get()
                     lock(fdlock)  <===========================

The code of close() even has a comment in the call to cyg_fd_free()
that points out that the file's fo_close may be called.

Is YAFFS recursively calling back into its own entrypoints? Or is it calling a different filesytem? I hope the latter, it should be using it's own internal interfaces to do anything else.

You are absolutely right. The second I/O is caused by a trace print to stdout from YAFFS.


I think there are a few possible solutions:
1) a full-fledged continuation mechanism where all locks are released
when a layer is left;
2) allow fdlock to be grabbed recursively, as in Java-style
synchronized locking;

These both look horrible. Recursive mutexes are evil and are to be resisted at all costs -- they are generally a quick workaround for something that should be solved with a bit more thought.

You should tell the Java designers... :-)


3) check if this is the only occurrence of this deadlock scenario, and
check if the lock can be released in fp_ucount_dec without impairing
atomicity

As it happens I fixed this, and other similar problems, in the eCosCentric sources some time ago. I'll make a proper contribution later, but for now try this out and see if it solves your problem:

This is solution 3). It works for my test program. Thanks! I'll strip out my recursive mutexes again.


Rutger

Index: ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos/packages/io/fileio/current/ChangeLog,v
retrieving revision 1.68
diff -u -5 -r1.68 ChangeLog
--- ChangeLog 1 May 2008 09:31:09 -0000 1.68
+++ ChangeLog 11 Dec 2008 14:25:41 -0000
@@ -1,5 +1,12 @@
+2008-12-11 Nick Garnett <nickg@ecoscentric.com>
+
+ * src/fd.cxx (fp_ucount_dec, fd_close): Fix locking strategy so
+ that the fdlock is free when filesystem close entry is
+ called. This allows other threads to peform descriptor operations
+ during the close.
+
2008-04-02 Xinghua Yang <yxinghua@sunnorth.com.cn>
Andrew Lunn <andrew.lunn@ascom.ch>
* cdl/fileio.cdl: Use CYGPKG_FILEIO_DIRENT_DTYPE to enable/disable
d_type field of struct dirent.
Index: src/fd.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/io/fileio/current/src/fd.cxx,v
retrieving revision 1.7
diff -u -5 -r1.7 fd.cxx
--- src/fd.cxx 7 Apr 2004 11:18:54 -0000 1.7
+++ src/fd.cxx 11 Dec 2008 14:25:41 -0000
@@ -147,24 +147,45 @@
// These must all be called with the fdlock already locked.
//--------------------------------------------------------------------------
// Decrement the use count on a file object and if it goes to zero,
// close the file and deallocate the file object.
+//
+// A word on locking here: It is necessary for the filesystem
+// fo_close() function to be called with the file lock claimed, but
+// the fdlock released, to permit other threads to perform fd-related
+// operations. The original code here took the file lock and released
+// the fdlock before the call and then locked the fdlock and released
+// the file lock after. The idea was that there was no point at which
+// a lock of some sort was not held. However, if two threads are
+// running through this code simultaneously, this could lead to
+// deadlock, particularly if the filesystem's syncmode specifies fstab
+// or mtab level locking. So the code now unlocks the file lock before
+// reclaiming the fdlock. This leaves a small window where no locks
+// are held, where in theory some other thread could jump in and mess
+// things up. However, this is benign; if the other thread is
+// accessing some other file object there will be no conflict and by
+// definition no other thread can access this file object since we are
+// executing here because no file descriptors point to this file
+// object any longer. Additionally, the file object is only marked
+// free, by zeroing the f_flag field, once the fdlock has been
+// reclaimed.
static int fp_ucount_dec( cyg_file *fp )
{
int error = 0;
if( (--fp->f_ucount) <= 0 )
{ cyg_file_lock( fp, fp->f_syncmode );
+ FILEIO_MUTEX_UNLOCK(fdlock);
error = fp->f_ops->fo_close(fp);
cyg_file_unlock( fp, fp->f_syncmode );
+ FILEIO_MUTEX_LOCK(fdlock); - if( error == 0 )
- fp->f_flag = 0;
+ fp->f_flag = 0;
}
return error;
}
@@ -178,21 +199,20 @@
cyg_file *fp;
CYG_ASSERT(((0 <= fd) && (fd<CYGNUM_FILEIO_NFD)), "fd out of range"); fp = desc[fd];
+ desc[fd] = FD_ALLOCATED;
if( fp != FD_ALLOCATED && fp != NULL)
{
// The descriptor is occupied, decrement its usecount and
// close the file if it goes zero.
error = fp_ucount_dec( fp );
}
- desc[fd] = FD_ALLOCATED;
-
return error;
}
//==========================================================================






--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]