Improvements to fork handling (2/5)

Ryan Johnson ryan.johnson@cs.utoronto.ca
Tue May 24 20:19:00 GMT 2011


On 23/05/2011 3:31 AM, Corinna Vinschen wrote:
> On May 22 14:42, Ryan Johnson wrote:
>> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>>
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>>> from the code which "encourages" them to land in the right places.  The
>>>> dependency tracking is achieved using a simple class which can
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> This seems CPU and memory intensive during an operation which we already
>>> know is very slow.  Is the benefit really worth it?  How much more robust
>>> does it make forking?
>> Topological sorting is O(n), so there's no asymptotic change in
>> performance. Looking up dependencies inside a dll is *very* cheap
>> (2-3 pointer dereferences per dep), and all of this only happens for
>> dynamically-loaded dlls. Given the number of calls to
>> Virtual{Alloc,Query,Free} and LoadLibraryEx which we make, I
>> would be surprised if the topsort even registered.  That said, it is
>> extra work and will slow down fork.
>>
>> I have not been able to test how much it helps, but it should help
>> with the test case Jon Turney reported with python a while back [1].
>> In fact, it was that example which made me aware of the potential
>> need for a topsort in the first place.
>>
>> In theory, this should completely eliminate the case where our
>> loading one DLL pulls in dependencies automatically (= uncontrolled
>> and at Windows' whim). The problem would manifest as a DLL which
>> "loads" in the same wrong place repeatedly when given the choice,
>> and for which we would be unable to VirtualAlloc the offending spot
>> (because the dll in question has non-zero refcount even after we
>> unload it, due to the dll(s) that depend on it).
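For reference, the ordering idea can be sketched as below. This is only a
minimal illustration, not the code from the patch: dll_node, visit and
topsort_dlls are hypothetical names, and the dependency indices are assumed
to have been extracted already from each mapped image's import list. A
depth-first post-order walk emits every dependency before the dll that needs
it, so each dll can be loaded explicitly before anything would pull it in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry in the parent's dll list.
struct dll_node {
    std::string name;
    std::vector<std::size_t> deps;  // indices of the dlls this one imports from
};

// Depth-first post-order visit: emit every dependency before the dll itself.
// state: 0 = unvisited, 1 = in progress, 2 = done.  An "in progress" hit
// means a dependency cycle; the back edge is simply ignored.
static void visit(const std::vector<dll_node>& dlls, std::size_t i,
                  std::vector<char>& state, std::vector<std::size_t>& order)
{
    if (state[i] != 0)
        return;
    state[i] = 1;
    for (std::size_t d : dlls[i].deps)
        visit(dlls, d, state, order);
    state[i] = 2;
    order.push_back(i);
}

// Returns indices into `dlls`, ordered so each dll follows its dependencies.
std::vector<std::size_t> topsort_dlls(const std::vector<dll_node>& dlls)
{
    std::vector<char> state(dlls.size(), 0);
    std::vector<std::size_t> order;
    order.reserve(dlls.size());
    for (std::size_t i = 0; i < dlls.size(); ++i)
        visit(dlls, i, state, order);
    return order;
}
```

Each node and each dependency edge is touched once, so the cost is linear in
the number of dlls plus the number of dependency links.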
> There might be a way around this.  It seems to be possible to tweak
> the module list the PEB points to so that you can unload a library
> even though it has dependencies.  Then you can block the unwanted
> space and call LoadLibrary again.  See (*) for a discussion of how
> you can unload the exe itself to reload another one.  Maybe that's
> something we can look into as well.  ObNote:  Of course, if we
> could influence the address at which a DLL gets loaded right from the
> start, it would be the preferable solution.
I tested that approach (LoadCount == Reserved5[1] and Flags == 
Reserved5[0] in struct _LDR_DATA_TABLE_ENTRY) and while it lets us 
unload statically-linked .dlls it doesn't unload the .exe any more -- 
setting Flags=4 as recommended has no effect on my w7-x64 machine, nor 
does setting Flags=0x80080004 to match other dlls' flags.

In retrospect, I don't think unloading dlls is going to be very helpful 
if it leaves the .exe with thunks pointing to stale addresses (and 
there's still the business of reloading the .exe afterward). I guess if 
none of the thunks had been triggered yet, we might be able to get away 
with it, but that sounds risky. We might try copying .idata across from 
the parent, but that would clobber any thunks to windows dlls which 
changed locations.

Ryan