Improvements to fork handling (2/5)

Ryan Johnson
Tue May 24 16:47:00 GMT 2011

On 24/05/2011 12:14 PM, Corinna Vinschen wrote:
> On May 22 14:42, Ryan Johnson wrote:
>> On 21/05/2011 9:44 PM, Christopher Faylor wrote:
>>> On Wed, May 11, 2011 at 02:31:37PM -0400, Ryan Johnson wrote:
>>>> Hi all,
>>>> This patch has the parent sort its dll list topologically by
>>>> dependencies. Previously, attempts to load a DLL_LOAD dll risked pulling
>>>> in dependencies automatically, and the latter would then not benefit
>>>> from the code which "encourages" them to land in the right places.  The
>>>> dependency tracking is achieved using a simple class which allows us to
>>>> introspect a mapped dll image and pull out the dependencies it lists.
>>>> The code currently rebuilds the dependency list at every fork rather
>>>> than attempt to update it properly as modules are loaded and unloaded.
>>>> Note that the topsort optimization affects only cygwin dlls, so any
>>>> windows dlls which are pulled in dynamically (directly or indirectly)
>>>> will still impose the usual risk of address space clobbers.
>>> This seems CPU and memory intensive during a time which we already
>>> know is very slow.  Is the benefit really worth it?  How much more robust
>>> does it make forking?
>> Topological sorting is O(n), so there's no asymptotic change in
>> performance. Looking up dependencies inside a dll is *very* cheap
> Btw., isn't the resulting dependency list identical to the list
> maintained in the PEB_LDR_DATA member InInitializationOrderModuleList?
> Or, in other words, can't we just use the data which is already
> available?
I read somewhere that dll initialization is not guaranteed to happen in 
any particular order, and from what I've seen so far I believe it.

I think that's one reason (among many) why cygwin has to factor the 
user's initialization routines out of the normal dll init function: they 
might call functions in other dlls which have not been initialized yet. 
From what I can tell, though, mapping of all dlls in a batch completes 
before any initialization routines run.

Even assuming I'm wrong and dependency order === initialization order, 
we'd still have to find a way to isolate those dlls which are both 
cygwin-aware and dynamically loaded, because those are the only ones we 
care about. Doing that would also be expensive because we'd be searching 
the cygwin dll list for each dll in the PEB's list.
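
To make the ordering under discussion concrete, here is a minimal sketch of a 
dependency-first (topological) sort over a list of cygwin-aware dlls. It is 
not the code from the patch: DllInfo and topsort_dlls are hypothetical names, 
the dependency names are assumed to have been pulled out of each mapped image 
already, and the hash map is only a convenience of the sketch. Dependencies 
that are not in the list (plain windows dlls, say) are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one entry in the cygwin dll list.
struct DllInfo {
    std::string name;
    std::vector<std::string> deps;  // dll names this image lists as imports
};

// Returns indices into `dlls`, ordered so that every dll appears after the
// listed dlls it depends on. Names not present in `dlls` are ignored.
std::vector<std::size_t> topsort_dlls(const std::vector<DllInfo>& dlls) {
    // Name -> position, so each dependency lookup is a hash probe
    // instead of a walk over the whole list.
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < dlls.size(); ++i) index[dlls[i].name] = i;

    enum State { Unvisited, InProgress, Done };
    std::vector<State> state(dlls.size(), Unvisited);
    std::vector<std::size_t> order;
    order.reserve(dlls.size());

    // Iterative depth-first search; a dll is emitted once all of its
    // dependencies have been emitted (post-order).
    struct Frame { std::size_t dll; std::size_t next_dep; };
    std::vector<Frame> stack;

    for (std::size_t root = 0; root < dlls.size(); ++root) {
        if (state[root] != Unvisited) continue;
        state[root] = InProgress;
        stack.push_back({root, 0});
        while (!stack.empty()) {
            Frame& f = stack.back();
            const DllInfo& d = dlls[f.dll];
            if (f.next_dep < d.deps.size()) {
                auto it = index.find(d.deps[f.next_dep++]);
                // Skip unknown names, finished dlls, and back edges (cycles).
                if (it != index.end() && state[it->second] == Unvisited) {
                    state[it->second] = InProgress;
                    stack.push_back({it->second, 0});  // f is invalid after this
                }
            } else {
                state[f.dll] = Done;
                order.push_back(f.dll);
                stack.pop_back();
            }
        }
    }
    assert(order.size() == dlls.size());
    return order;
}
```

Each dll and each listed dependency is visited once, so the work is linear in 
the number of dlls plus the number of dependency edges between them.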

The best way to improve performance of this part of fork() would be to 
figure out how to force a dll to load in the right place on the first 
try. Achieving this admittedly "difficult" task would eliminate multiple 
syscalls per dll, the aggregate cost of which dwarfs that of the topsort, 
unless I'm very mistaken.
