This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Re: Re: Debugging help for fork failure: resource temporarily unavailable
- From: Ryan Johnson <ryanjohn at ece dot cmu dot edu>
- Cc: Jon TURNEY <jon dot turney at dronecode dot org dot uk>, cygwin at cygwin dot com
- Date: Wed, 13 Apr 2011 18:21:27 -0400
- Subject: Re: Re: Re: Debugging help for fork failure: resource temporarily unavailable
- References: <4DA5EF8C.email@example.com>
On 2:59 PM, Ryan Johnson wrote:
> On 2:59 PM, Jon TURNEY wrote:
> > I look forward to reading your patches :-)
> I think it's still rather premature to be cooking up a patch,
> unfortunately -- I'm not convinced I know yet where the real problem
> lies. Without some data to back up my speculation (which seems hard to
> come by), any patch I might write would have a high probability of
> joining other accumulated band-aids such as reserve_upto().
> Open questions (for my ignorant self, at least) include:
> - Does Windows always load a given dll at the same address when its
> base address is already occupied?
> - Does fork() always load DLLs in the same order that the parent
> loaded them? This would probably be helpful to know even in cases
> where no error arises, because it's a necessary precursor to fork
> failures, and the code seems to assume it's true.
> - Is it ever possible for fork() to unload BLODA dlls?
> - Do injected dlls arrive before or after statically-linked dlls? Or
> can it be either one?
> - At fork time, does cygwin mogrify some generic child process to look
> like the parent, or is the child another "normal" run of the parent's
> executable image followed by plastic surgery to make heap, stack, etc.
> match? I had been assuming the former, but should probably ask.
Update: I wrote a very simple program whose main() prints out the
contents of /proc/self/maps, forks, calls foo() and bar(), and finally
(if the parent) calls wait().
The trick is, foo() and bar() reside in cygfoo.dll and cygbar.dll
respectively, which I compiled to have the same base address: 0x66000000.
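For anyone who wants to reproduce this, here is a single-file sketch of the test program described above. It is a reconstruction under assumptions, not my actual code: foo() and bar() are hypothetical local stand-ins, whereas in the real experiment they live in cygfoo.dll and cygbar.dll. To force the collision, each DLL would presumably be linked with GNU ld's `--image-base` option, e.g. `g++ -shared -o cygfoo.dll foo.cpp -Wl,--image-base,0x66000000`. The helper names `read_maps()` and `run_fork_test()` are also hypothetical.

```cpp
// Sketch of the fork test: dump /proc/self/maps, fork, call foo() and bar()
// in both processes, and have the parent wait for the child.
// ASSUMPTION: foo()/bar() are local stand-ins; in the original experiment
// they are exported from cygfoo.dll and cygbar.dll, both based at 0x66000000.
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-ins for the functions exported by the two DLLs.
int foo() { return 1; }
int bar() { return 2; }

// Return the contents of /proc/self/maps (empty if it cannot be read).
std::string read_maps() {
    std::ifstream in("/proc/self/maps");
    std::ostringstream out;
    if (in) out << in.rdbuf();
    return out.str();
}

// Returns the child's exit status as seen by the parent, or -1 on failure.
int run_fork_test() {
    std::cout << read_maps();
    std::cout.flush();  // flush before fork() so buffered output is not duplicated

    const pid_t pid = fork();
    if (pid < 0) {
        std::perror("fork");
        return -1;
    }

    // Both parent and child call into the two "DLLs" after fork().
    const int result = foo() + bar();

    if (pid == 0) {
        // Child: report through the exit status; _exit() avoids running the
        // parent's atexit handlers and static destructors a second time.
        std::cout << "child: foo() + bar() = " << result << '\n';
        std::cout.flush();
        _exit(result == 3 ? 0 : 1);
    }

    // Parent: wait for the child, as the original program does.
    std::cout << "parent after fork (child: " << pid << ")\n";
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        std::perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A `main()` that simply calls `run_fork_test()` follows the same flow; when the fork succeeds it returns 0.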
Running the binary often, but not always, produces those annoying
"exception::handle: Exception: STATUS_ACCESS_VIOLATION" messages
(otherwise the process usually appears to complete normally).
However, once in a while the child fails to spawn, and no error
message announces that fact.
Running it inside gdb (from a plain Cygwin window) gives the following
output (I'm on Win7 x64, with all the latest packages as of yesterday afternoon):
GNU gdb 6.8.0.20080328-cvs (cygwin-special)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-cygwin".
(gdb) file fork
Reading symbols from /home/Ryan/experiments/fork-tests/fork...done.
Starting program: /home/Ryan/experiments/fork-tests/fork
[New thread 8864.0x2120]
Error: dll starting at 0x77190000 not found.
Error: dll starting at 0x75650000 not found.
Error: dll starting at 0x77190000 not found.
Error: dll starting at 0x76d20000 not found.
[New thread 8864.0x2710]
+ + + bar.cpp init
+ + + foo.cpp init
+ + + fork.cpp init
00400000-00410000 rw-s 00401000 2C36:17C8 33776997205430206
775E0000-77760000 r-xs 00000000 2C36:17C8 281474976927378
75650000-75760000 r-xs 756632D3 2C36:17C8 281474976927037
75350000-75396000 r-xs 75357478 2C36:17C8 281474976925120
66000000-66012000 rw-s 660011F0 2C36:17C8 3940649674730545
61000000-61450000 r-xs 6106F960 2C36:17C8 844424930325032
75A60000-75B00000 r-xs 75A749E5 2C36:17C8 281474976927159
75050000-750FC000 rw-s 7505A472 2C36:17C8 281474976749314
76840000-76859000 r-xs 76844975 2C36:17C8 281474976749841
76750000-76840000 r-xs 76760569 2C36:17C8 281474976924963
74CD0000-74D30000 r-xs 74CEA3B3 2C36:17C8 281474976924512
74CC0000-74CCC000 r-xp 74CC10E1 2C36:17C8 281474976748415
67F00000-67F0F000 rw-s 67F08920 2C36:17C8 562949954003711
6C480000-6C545000 rw-s 6C485110 2C36:17C8 562949954003739
002B0000-002C2000 rw-p 002B11F0 2C36:17C8 2533274791177101
753A0000-754A0000 rw-p 753BB6ED 2C36:17C8 281474976926904
74FC0000-75050000 rw-p 74FD6343 2C36:17C8 281474976926610
754D0000-754DA000 rw-p 754D36A0 2C36:17C8 281474976749103
757D0000-7586D000 rw-p 75803FD7 2C36:17C8 281474976927082
754E0000-75540000 r-xp 754F158F 2C36:17C8 281474976924115
74EF0000-74FBC000 rw-p 74EF168B 2C36:17C8 281474976749206
76980000-76985000 rw-p 76981438 2C36:17C8 281474976749672
0 [main] fork 9472 exception::handle: Exception:
559 [main] fork 9472 open_stackdumpfile: Dumping stack trace to
0 [main] fork 9132 exception::handle: Exception:
525 [main] fork 9132 open_stackdumpfile: Dumping stack trace to
0 [main] fork 7812 exception::handle: Exception:
531 [main] fork 7812 open_stackdumpfile: Dumping stack trace to
0 [main] fork 7648 exception::handle: Exception:
521 [main] fork 7648 open_stackdumpfile: Dumping stack trace to
0 [main] fork 1960 exception::handle: Exception:
657 [main] fork 1960 open_stackdumpfile: Dumping stack trace to
0 [main] fork 4480 exception::handle: Exception:
914 [main] fork 4480 open_stackdumpfile: Dumping stack trace to
0 [main] fork 8864 fork: child -1 - died waiting for longjmp
before initialization, retry 0, exit code 0x600, errno 11
Parent after fork (child: -1)
* * * fork.cpp fini
* * * foo.cpp fini
* * * bar.cpp fini
Program exited normally.
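Whether the three "dll starting at ... not found" addresses from gdb appear in the /proc/self/maps listing can be checked mechanically: parse the START-END field of each line and test containment. This is a minimal sketch; `Region`, `parse_maps()` and `is_mapped()` are hypothetical names, and it assumes each relevant line begins with `START-END ` in hex, as in the listing. Against that listing, only 0x75650000 (kernel32.dll) is covered; 0x77190000 and 0x76d20000 fall outside every region.

```cpp
// Sketch: parse the address ranges of /proc/self/maps-style text and check
// whether a given address falls inside any mapped region.
// ASSUMPTION: every useful line starts with "START-END " in hexadecimal.
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Half-open address range [start, end).
struct Region {
    std::uint64_t start;
    std::uint64_t end;
};

std::vector<Region> parse_maps(const std::string& text) {
    std::vector<Region> regions;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const auto dash = line.find('-');
        const auto space = line.find(' ');
        if (dash == std::string::npos || space == std::string::npos || dash > space)
            continue;  // not a "START-END ..." line
        try {
            regions.push_back({std::stoull(line.substr(0, dash), nullptr, 16),
                               std::stoull(line.substr(dash + 1, space - dash - 1), nullptr, 16)});
        } catch (const std::logic_error&) {
            // std::stoull throws invalid_argument/out_of_range; skip such lines.
        }
    }
    return regions;
}

bool is_mapped(const std::vector<Region>& regions, std::uint64_t addr) {
    for (const Region& r : regions)
        if (addr >= r.start && addr < r.end) return true;
    return false;
}
```

A negative answer only means the address is absent from Cygwin's view of the process; it says nothing about what gdb thinks is loaded there.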
The gdb session above raises several interesting questions:
1. Why doesn't /proc/self/maps contain all the DLLs gdb complains about?
0x75650000 is kernel32.dll, but there's no sign of 0x77190000 or
0x76d20000. I tried NirSoft's InjectedDLL, but none of the DLLs it finds
have those base addresses, and WinDbg doesn't report them either.
2. What determines which of the many bad things happens at fork()
time? I've seen "resource temporarily unavailable", "died waiting for
longjmp", and now this "STATUS_ACCESS_VIOLATION" (which invariably
happens an even number of times but is usually not fatal).
3. What code is raising the access violation, and is there a way to make
gdb catch it?
(gdb) catch load
catch of library loads not yet implemented on this platform
(gdb) catch throw
Function "__cxa_throw" not defined.
(gdb) catch exception
Unable to insert catchpoint. Is this an Ada main program?
(gdb) catch signal SIGSEGV
Catch of signal not yet implemented
4. The strace output shows that each pair of access violations corresponds
to one failed fork attempt. I guess Cygwin gives up after three failures
and reports the "died waiting for longjmp" error?
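The pairing guess in question 4 can be tallied straight from the captured output: the run above contains six `exception::handle` lines, each from a different PID (9472, 9132, 7812, 7648, 1960, 4480), which under the two-per-attempt hypothesis means three failed fork attempts. A minimal sketch with the hypothetical helpers `tally_exceptions()` and `failed_attempts()`, assuming the `TIMESTAMP [thread] program PID message` layout seen above:

```cpp
// Sketch: count Cygwin "exception::handle" lines in captured output and
// derive the number of failed fork attempts under the (unconfirmed)
// hypothesis that each failed attempt produces exactly two such lines.
#include <cassert>
#include <set>
#include <sstream>
#include <string>

struct ExceptionTally {
    int events = 0;       // number of exception::handle lines
    std::set<long> pids;  // distinct PIDs that reported them
};

ExceptionTally tally_exceptions(const std::string& log) {
    ExceptionTally tally;
    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find("exception::handle: Exception") == std::string::npos) continue;
        ++tally.events;
        // Assumed layout: "<timestamp> [main] <program> <pid> <message>".
        std::istringstream fields(line);
        std::string timestamp, thread, program;
        long pid = 0;
        if (fields >> timestamp >> thread >> program >> pid) tally.pids.insert(pid);
    }
    return tally;
}

// ASSUMPTION (from the strace observation): two access violations per failed attempt.
int failed_attempts(const ExceptionTally& tally) { return tally.events / 2; }
```

Counting distinct PIDs alongside the events makes it easy to see whether each violation comes from a separate child process, as it does in this run.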
Unfortunately, I haven't been able to reproduce the "resource temporarily
unavailable" flavor of the error yet...
Problem reports: http://cygwin.com/problems.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple