This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: NtCreateProcess redux


On 4/25/2011 12:33 PM, Ryan Johnson wrote:
> I know that folks have looked before into NtCreateProcess as a way of
> doing a real fork() in cygwin, but it's very unclear from the various
> list archives why it's still a bad idea today, other than its being
> undocumented.

It's a bad idea because it doesn't work. You can certainly create a forked child with NtCreateProcess, but without being able to connect it to csrss and the rest of the win32 subsystem, this new process is useless. NtCreateProcess-fork works for Interix because it has its own NT subsystem, but Cygwin has to live within win32, and I don't think creating a new subsystem is feasible for anyone without access to the NT source.


> If there's no interest in revisiting NtCreateProcess, I have some really
> crazy ideas to offer, but they would still leave us copying whole
> address spaces and trying to outsmart Windows along the way.

As (I think) cgf once said, Cygwin has been around for a long time, and most of the crazy ideas that didn't make it into the code were weighed, judged, and found wanting for one reason or another. I have some crazy ideas of my own, mostly involving using shared sections instead of NtCopyVirtualMemory to duplicate memory, but I haven't had time to implement them[1].


As far as the address space issue goes: when NT creates a new process, the loader, in ntdll, gains control before the entry point is ever called, and this loader is what's responsible for the initial VM layout. Because ntdll is a "known dll", you can't replace it with a friendlier implementation. After the loader completes its work, the kernel does some black magic and resets the initial thread's stack so that it begins executing in the ntdll thread startup routine, so you never actually _see_ the loader executing.

The only thing that might have a chance of working is to unload everything except user32, kernel32, and a few other components, then start fresh with a more constrained module loading strategy.

[1] If process A has section S, the contents of which we'd like to duplicate in child-process B as S', and B inherits a handle to S, it's slower to remap S in B and memcpy it to S' than it is to just initialize S' from A's address space with NtCopyVirtualMemory. But that's the single-threaded case. It turns out that if we have the child map S somewhere and have one thread touch S[0], S'[0], S[4096], S'[4096], etc. while another thread does a memcpy from S to S', we handily beat the NtCopyVirtualMemory approach.
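
For anyone who wants to play with the idea in [1], here is a rough, portable sketch of the two-thread scheme. It is not Cygwin code and it makes several assumptions: it uses POSIX threads and ordinary buffers in place of Windows threads and mapped sections, it assumes 4096-byte pages and page-aligned buffers, and the names (copy_with_prefault, touch_pages, TOUCH_STRIDE) are made up for illustration. The progress counter is also an addition of this sketch: it keeps the copying thread behind the touching thread so the two never write the same byte at the same time.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOUCH_STRIDE 4096  /* assumed page size; matches the stride in [1] */

struct touch_job {
    const unsigned char *src;   /* stands in for S  */
    unsigned char *dst;         /* stands in for S' */
    size_t len;
    atomic_size_t touched;      /* bytes [0, touched) have been touched */
};

/* Helper thread: touch one byte per stride in src and dst so that page
 * faults are taken here rather than inside memcpy. */
static void *touch_pages(void *arg)
{
    struct touch_job *job = arg;

    for (size_t off = 0; off < job->len; off += TOUCH_STRIDE) {
        /* volatile accesses so the compiler cannot drop the touches */
        (void)*(const volatile unsigned char *)(job->src + off);
        *(volatile unsigned char *)(job->dst + off) = 0;

        size_t end = off + TOUCH_STRIDE;
        if (end > job->len)
            end = job->len;
        /* Publish progress; the copier only writes below this mark. */
        atomic_store_explicit(&job->touched, end, memory_order_release);
    }
    return NULL;
}

/* Copy len bytes from src to dst while a helper thread faults pages in.
 * Returns 0 on success, or the pthread_create error code. */
int copy_with_prefault(void *dst, const void *src, size_t len)
{
    struct touch_job job = { .src = src, .dst = dst, .len = len };
    pthread_t helper;
    size_t done = 0;
    int rc;

    atomic_init(&job.touched, 0);
    rc = pthread_create(&helper, NULL, touch_pages, &job);
    if (rc != 0)
        return rc;

    while (done < len) {
        size_t ready = atomic_load_explicit(&job.touched,
                                            memory_order_acquire);
        if (ready == done) {
            sched_yield();      /* helper has not advanced yet */
            continue;
        }
        /* Copy everything the helper has touched since the last pass. */
        memcpy((unsigned char *)dst + done,
               (const unsigned char *)src + done, ready - done);
        done = ready;
    }

    pthread_join(helper, NULL);
    return 0;
}
```

Calling copy_with_prefault(dst, src, len) gives the same result as memcpy(dst, src, len). Whether it is actually faster depends on how expensive the page faults are on the platform in question, so measure before drawing conclusions from this sketch.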

