1.7.5: Occasional failure of CreatePipe or signal handing due to thread-unsafe code in cwdstuff::set

John Carey aeolus.electric-cloud.com@electric-cloud.usa.net
Tue Aug 10 21:54:00 GMT 2010


On Aug 10 13:44 +0200 Corinna Vinschen wrote:
> > On Aug 10 00:19, John Carey wrote:
...
> > Presumably the issues mentioned in the source comment may
> > also occur, but what I have seen myself is a subsequent call to
> > SetCurrentDirectory() closing, not the current directory handle, but
> > the handle that was allocated by the other thread.
> > 
> > Specifically, dll_crt0_1() calls sigproc_init(), then cwdstuff::set(),
> > and finally wait_for_sigthread().  The function sigproc_init() creates
> > the signal handling thread, which creates the signal handling pipe as
> > one of its very first actions, then sets the event for which
> > wait_for_sigthread() waits.  I think this scenario happens:
> > 
> >   1. main thread: cwdstuff::set(): NtClose().
> > 
> >   2. signal handling thread: CreatePipe() opens a handle to
> >   '\Device\NamedPipe\' and stores it in a global variable (because
> >   this is the first call to CreatePipe()).
> > 
> >   3. main thread: cwdstuff::set(): DuplicateHandle().
> > 
> > In this case, the current directory handle value has changed, which is
> > not the intend of the NtClose-Duplicate sequence.
> 
> No, that's not intended.  However, the code just *tries* to preserve the
> handle value, but it does not rely on it.  The NtClose is safe because
> the handle is the actual CWD handle at the time of the call and the PEB
> is locked.  The DuplicateHandle call just uses the phdl storage to store
> the new handle value, but it *never* tests if the new handle value is
> identical to the old one.  So, if all works well, it's the same handle
> value as before.  If not, *shrug*.
> 
> > CreateFile to fail with ERROR_INVALID_HANDLE as mentioned in the
> > source comments, but I have not seen that.  I think that the
> 
> I have.  You can easily test this as well on Vista and W7.  Just drop
> the DuplicateHandle call and simply store h in *phdl.  Then create a
> testcase:
> 
>   chdir ("/");
>   h = CreateFile ("foobar", ...);
>   if (h == INVALID_HANDLE_VALUE) printf ("%lu\n", GetLastError());

Thanks for the test case for the CreateFile() problem; I used it to
create the following test, in which Windows 7 CreateFile() fails with
ERROR_INVALID_HANDLE even when using a stock Cygwin 1.7.5 DLL:

> build@aeolus-w764c17 ~
> $ uname -a
> CYGWIN_NT-6.1-WOW64 aeolus-w764c17 1.7.5(0.225/5/3) 2010-04-12 19:07 i686 Cygwin
> 
> 
> build@aeolus-w764c17 ~
> $ cat test.c
> #include <windows.h>
> #include <pthread.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
> 
> void * run (void *arg)
> {
>     int count = (int)arg >> 1;
>     int sync = (int)arg & 1;
> 
>     while (count--) {
>         if (sync) pthread_mutex_lock(&mutex);
> 
>         HANDLE h = CreateFile (
>             "foobar",
>             GENERIC_WRITE,
>             0,
>             NULL,
>             CREATE_ALWAYS,
>             FILE_ATTRIBUTE_NORMAL,
>             INVALID_HANDLE_VALUE);
>         if (h == INVALID_HANDLE_VALUE) {
>             printf ("%lu\n", GetLastError());
>             exit(1);
>         }
>         CloseHandle(h);
> 
>         if (sync) pthread_mutex_unlock(&mutex);
>     }
> 
>     return 0;
> }
> 
> int main (int argc, char** argv)
> {
>     int count;
>     int sync;
>     pthread_t tid;
>     void *result;
> 
>     if (argc != 2 && argc != 3) {
>         fprintf(stderr, "Usage: %s <count> [<sync>]\n", argv[0]);
>         return 2;
>     }
> 
>     count = atol (argv[1]);
>     sync = argc >= 3 ? !!atoi(argv[2]) : 0;
> 
>     pthread_create (&tid, 0, &run, (void*)((count << 1) | sync));
>     while (count--) {
>         if (sync) pthread_mutex_lock(&mutex);
>         chdir ("/");
>         if (sync) pthread_mutex_unlock(&mutex);
>     }
> 
>     pthread_join (tid, &result);
>     return 0;
> }
> 
> build@aeolus-w764c17 ~
> $ gcc -o ~/test ~/test.c
> 
> build@aeolus-w764c17 ~
> $ ~/test 1000 1
> 123
> 
> build@aeolus-w764c17 ~
> $ ~/test 1000 0
> 123
> 
> build@aeolus-w764c17 ~
> $ cd /
> 
> build@aeolus-w764c17 /
> $ ~/test 1000 1
> 
> build@aeolus-w764c17 /
> $ ~/test 1000 0
> 6
> 
> build@aeolus-w764c17 /
> $

6 = "The handle is invalid.":
I think that this is the error previously discussed.
Note that passing "1" as the second program argument
synchronizes the two threads and prevents the error.

123 = "The filename, directory name, or volume label syntax is incorrect.":
I have not yet investigated why that error occurs, but it does
not happen if I run the test in the Cygwin root directory.

...
> > I think this other scenario also happens:
> > 
> >   1. signal handling thread: CreatePipe() opens a handle to
> >   '\Device\NamedPipe\' (because it is the first call to CreatePipe()).
> > 
> >   2. main thread: cwdstuff::set(): NtClose().
> > 
> >   3. signal handling thread: CreatePipe() opens pipe handles.
> > 
> >   4. main thread: cwdstuff::set(): DuplicateHandle().
> > 
> > In this case it is Cygwin signal handling that is sabotaged by
> > subsequent calls to SetCurrentDirectory(), because they close one of
> > the pipe handles used for Cygwin signal handling.
>
> This is pure speculation.  Please explain to me how DuplicateHandle,
> which duplicates a *local* handle of a locally opened directory, which
> is stored in the PEB's CurrentDirectoryHandle under RtlAcquirePebLock
> conditions, should interfere with CreatePipe or, FWIW, SetCurrentDirectory.
>
> The worst thing which could happen is that the handle in
> PEB->CurrentDirectoryHandle has another value and a subsequent
> CreateFile w/ a relative path fails (but not Cygwin's calls to
> NtCreateFile).  I don't see any chance that one of the pipe handles is
> suddenly stored in phdl == the address of the PEB's cwd handle.  And, if
> DuplicateHandle fails, the local handle is stored in the PEB directly.
> Still, all of that is done under RtlAcquirePebLock condition.
>
> And then again, if CreatePipe() stores a global handle to the pipe
> filesystem (I don't know, I never observed the CreatePipe call more
> closely), it will probably store the handle in the PEB.  If it does,
> it will use RtlAcquirePebLock/RtlReleasePebLock as well.

The PEB lock in cwdstuff::set() can only prevent the race if other
threads always hold the PEB lock while invoking system calls that
allocate handles.  CreateFile() is pretty big, so I might have missed
it, but I did not see CreateFile() holding the PEB lock while actually
creating the new file handle.  Instead, after actually creating the
new file handle, it called _RtlReleaseRelativeName@4 on
a reference-counted object that holds a copy of the current
directory handle (see below for more about such reference-counted
current directory objects).  So I think one can substitute
CreateFile() for CreatePipe() in step 3 above (provided you
delete step 1), in order to trigger a change in the current
directory handle value.

Now, where is Windows stuffing the extra copy of the directory handle?
In those reference-counted heap-allocated directory objects.  Stepping
through the machine code under Windows 7 I see SetCurrentDirectory()
updating a global pointer ("ntdll!RtlpCurDirRef") to one such object,
under the protection of a critical section ("ntdll!FastPebLock").
Despite mentioning "Peb" in the name, neither global's address is
computed using "FS:".  The global pointer points to a heap-allocated
block that starts with a reference count followed by a copy of the
directory handle value, and apparently includes other data as well.
SetCurrentDirectory() swaps in a new such block, and decrements the
counter on the old block, and if that reaches 0, closes the handle.
Anyway, this block appears to be where the old current directory
handle value persists past the changes made by cwdstuff::set().

So, I think what is happening in the test code I gave above is:

1. The main thread starts chdir("/") and gets as far as NtClose()
on the old directory handle value, hereafter called OldValue.

2. Inside CreateFile(), the second thread actually makes the system
call that opens "foobar".  It gets OldValue as the new handle value.

3. The main thread calls DuplicateHandle(), gets another handle
value for the current directory, and stores it in the PEB
but not the reference-counted current directory object
pointed to by "ntdll!RtlpCurDirRef".

4. The second thread calls CreateFile() again, but gets confused
because it tries to use OldValue to interpret the relative path
"foobar".

Now, how we can find the address of "ntdll!RtlpCurDirRef",
so that we can update it with the new handle, I am not sure.
But does the above adequately explain the difficulty?

Thanks for your concern,
John

[Technical note about this email: it is not threaded properly; hopefully that will change in future responses.  Sorry for any inconvenience.]

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple



More information about the Cygwin mailing list