This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Rutger Hofman <rutger@cs.vu.nl> writes:
Good morning list,
I ran into a deadlock with io/fileio on invoking close(fd) in io/fileio/current/src/fd.cxx.
The problem arises because close() grabs fd_lock, then calls cyg_fd_free() which in turn calls the file system's fo_close(), which will flush any pending data; this flush invokes write() which will again try to grab fd_lock. Twice the same lock in one thread -> deadlock.
The stack trace on deadlock clearly shows this:
close() cyg_fd_free() lock(fdlock) <====================================== fd_close() fp_ucount_dec() cyg_yaffs_fo_close() yaffs_close() yaffs_FlushFile() yaffs_UpdateObjectHeader() readwritev() cyg_fp_get() lock(fdlock) <===========================
The code of close() even has a comment in the call to cyg_fd_free() that points out that the file's fo_close may be called.
Is YAFFS recursively calling back into its own entrypoints? Or is it calling a different filesytem? I hope the latter, it should be using it's own internal interfaces to do anything else.
I think there are a few possible solutions: 1) a full-fledged continuation mechanism where all locks are released when a layer is left; 2) allow fdlock to be grabbed recursively, as in Java-style synchronized locking;
These both look horrible. Recursive mutexes are evil and are to be resisted at all costs -- they are generally a quick workaround for something that should be solved with a bit more thought.
3) check if this is the only occurrence of this deadlock scenario, and check if the lock can be released in fp_ucount_dec without impairing atomicity
As it happens I fixed this, and other similar problems, in the eCosCentric sources some time ago. I'll make a proper contribution later, but for now try this out and see if it solves your problem:
Index: ChangeLog
===================================================================
RCS file: /cvs/ecos/ecos/packages/io/fileio/current/ChangeLog,v
retrieving revision 1.68
diff -u -5 -r1.68 ChangeLog
--- ChangeLog 1 May 2008 09:31:09 -0000 1.68
+++ ChangeLog 11 Dec 2008 14:25:41 -0000
@@ -1,5 +1,12 @@
+2008-12-11 Nick Garnett <nickg@ecoscentric.com>
+
+ * src/fd.cxx (fp_ucount_dec, fd_close): Fix locking strategy so
+ that the fdlock is free when filesystem close entry is
+ called. This allows other threads to peform descriptor operations
+ during the close.
+
2008-04-02 Xinghua Yang <yxinghua@sunnorth.com.cn>
Andrew Lunn <andrew.lunn@ascom.ch>
* cdl/fileio.cdl: Use CYGPKG_FILEIO_DIRENT_DTYPE to enable/disable
d_type field of struct dirent.
Index: src/fd.cxx
===================================================================
RCS file: /cvs/ecos/ecos/packages/io/fileio/current/src/fd.cxx,v
retrieving revision 1.7
diff -u -5 -r1.7 fd.cxx
--- src/fd.cxx 7 Apr 2004 11:18:54 -0000 1.7
+++ src/fd.cxx 11 Dec 2008 14:25:41 -0000
@@ -147,24 +147,45 @@
// These must all be called with the fdlock already locked.
//--------------------------------------------------------------------------
// Decrement the use count on a file object and if it goes to zero,
// close the file and deallocate the file object.
+//
+// A word on locking here: It is necessary for the filesystem
+// fo_close() function to be called with the file lock claimed, but
+// the fdlock released, to permit other threads to perform fd-related
+// operations. The original code here took the file lock and released
+// the fdlock before the call and then locked the fdlock and released
+// the file lock after. The idea was that there was no point at which
+// a lock of some sort was not held. However, if two threads are
+// running through this code simultaneously, this could lead to
+// deadlock, particularly if the filesystem's syncmode specifies fstab
+// or mtab level locking. So the code now unlocks the file lock before
+// reclaiming the fdlock. This leaves a small window where no locks
+// are held, where in theory some other thread could jump in and mess
+// things up. However, this is benign; if the other thread is
+// accessing some other file object there will be no conflict and by
+// definition no other thread can access this file object since we are
+// executing here because no file descriptors point to this file
+// object any longer. Additionally, the file object is only marked
+// free, by zeroing the f_flag field, once the fdlock has been
+// reclaimed.
static int fp_ucount_dec( cyg_file *fp )
{
int error = 0;
if( (--fp->f_ucount) <= 0 )
{ cyg_file_lock( fp, fp->f_syncmode );
+ FILEIO_MUTEX_UNLOCK(fdlock);
error = fp->f_ops->fo_close(fp);
cyg_file_unlock( fp, fp->f_syncmode );
+ FILEIO_MUTEX_LOCK(fdlock); - if( error == 0 )
- fp->f_flag = 0;
+ fp->f_flag = 0;
}
return error;
}
@@ -178,21 +199,20 @@
cyg_file *fp;
CYG_ASSERT(((0 <= fd) && (fd<CYGNUM_FILEIO_NFD)), "fd out of range"); fp = desc[fd];
+ desc[fd] = FD_ALLOCATED;
if( fp != FD_ALLOCATED && fp != NULL)
{
// The descriptor is occupied, decrement its usecount and
// close the file if it goes zero.
error = fp_ucount_dec( fp );
}
- desc[fd] = FD_ALLOCATED;
-
return error;
}
//==========================================================================
-- Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |