OT: possible project/research project

Randall R Schulz rrschulz@cris.com
Wed Mar 20 08:24:00 GMT 2002


Rob,

More...


At 01:33 2002-03-20, Robert Collins wrote:
>Randall,
>responses inline..
>
> > -----Original Message-----
> > From: Randall R Schulz [mailto:rrschulz@cris.com]
> > Sent: Wednesday, March 20, 2002 7:34 PM
>
> > >Well we still have that basic separation - bash's builtins for
> > >example. If it's not builtin, it needs a sub-process.
> >
> > That's not quite right. Built-ins still need sub-processes if
> > they're going to operate in a pipeline or are enclosed within
> > parentheses.
>
>Ok. So if it's not builtin, or it's a builtin that needs to be
>pipelined/parenthesized, it requires a sub-process. That sounds like
>something where a patch to the relevant shell might provide some easy
>wins.

Eh? What's to be patched? The shell built-ins already do this. They 
wouldn't work if they didn't.


> > >sub-processes after all) - but we have the source so....
> >
> > How will your magical push_context protect from wild pointer
> > references, e.g.?
>
>If that becomes a problem, I'd suggest that DLLs get loaded on page
>boundaries and we protect the non-permitted address space read-only,
>and install an exception handler that unprotects and restores
>context. It may be that handling that is not worth the development
>time - so reliability could be an issue.

The Win32 API allows read/write protections to be altered dynamically 
within a process?
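(It turns out it can, via VirtualProtect(), with a vectored exception
handler to catch the faults. A rough sketch of the machinery Rob is
describing - the names here are hypothetical, and the hard part is left
as a comment:)

    #include <windows.h>

    static LONG WINAPI context_fault_handler(PEXCEPTION_POINTERS info)
    {
        if (info->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
            return EXCEPTION_CONTINUE_SEARCH;

        /* For access violations, the faulting address is in
           ExceptionInformation[1]. */
        void *addr = (void *) info->ExceptionRecord->ExceptionInformation[1];
        DWORD old;

        /* Unprotect the page, restore whichever context owns it,
           then retry the faulting instruction. */
        VirtualProtect(addr, 1, PAGE_READWRITE, &old);
        /* restore_context_for(addr);  -- the hard, unwritten part */
        return EXCEPTION_CONTINUE_EXECUTION;
    }

    static void protect_foreign_pages(void *base, SIZE_T len)
    {
        DWORD old;
        /* Protection is page-granular, hence Rob's requirement that
           DLLs load on page boundaries. */
        VirtualProtect(base, len, PAGE_READONLY, &old);
        AddVectoredExceptionHandler(1, context_fault_handler);
    }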

This alone will require operating on many, probably a majority, of the 
process's page table entries, which is going to cost nearly as much as a 
vfork or copy-on-write fork.


> > >The fork()/exec() model bites. Sorry, but it does. fork()-based
> > >servers for instance run into the thundering herd problem - and
> > >scale very badly. The other use for fork - the fork/exec
> > >combination - is better achieved with spawn(), which is designed
> > >to do just that one job well. It also happens to work very well
> > >on cygwin, and I see no reason to change that. So spawned apps
> > >will remain completely separated and independent.
> >
> > Servers are not shells. Why should they fork at all? That's what
> > threads are for. It's also why CGI (without something like
> > mod_perl) is not a good thing and the Java server model has
> > significant advantages.
>
>Exactly... my point is that the fork/exec model has no innate use.
>vfork/execve does - which is what spawn (look under posix_spawn() for
>the official spawn these days) accomplishes.

Vfork() is a hack that goes back to the first BSD ports of Unix to the 
VAX. The proper way to do it is transparently, with copy-on-write in the 
fork() call.


> > Are you planning on incorporating your scheme into every program
> > that runs sub-processes on a regular basis? How likely is it that
> > what works in one shell will work in another or in a server?
>
>No. I'm not trying to create a new operating environment, I'm trying to
>address a common-case issue. If I can get certain configure scripts to
>run in under 30 minutes on my machine here, I'd be very happy. As for
>portability to different shells, or even to servers, I'd suggest that
>keeping the API very simple and clean - much like the sub-process model
>is simple and clean - would encourage such re-use.

So have you profiled the code to know how much of the time in a build 
goes into forking? If you lowered that cost to zero, how much would you 
save?

You're just not going to get simpler and cleaner than the fork/exec model! 
Likewise for encouraging re-use.
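For the record, the entire model is a handful of lines; timing a few
thousand of these round-trips would also answer the profiling question
above. A sketch:

    #include <unistd.h>
    #include <sys/wait.h>

    /* The canonical fork/exec/wait pattern. */
    int run(char *const argv[])
    {
        int status;
        pid_t pid = fork();

        if (pid == 0) {                /* child: become the new program */
            execvp(argv[0], argv);
            _exit(127);                /* only reached if exec fails */
        }
        waitpid(pid, &status, 0);      /* parent: wait for completion */
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }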


> > I don't know the details of spawn(). How does it accomplish I/O
> > redirection?
>
>int posix_spawn(pid_t *restrict pid, const char *restrict path,
>const posix_spawn_file_actions_t *file_actions,
>const posix_spawnattr_t *restrict attrp,
>char *const argv[restrict], char *const envp[restrict]);
>
>Is the prototype. If file_actions is null, then the new process gets a
>copy of the parent's fd table. If it's not null, then it provides the fd
>table for the new process.
>
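So a "> outfile" redirection rides along in file_actions. A sketch of
the usual usage (error checks omitted):

    #include <spawn.h>
    #include <fcntl.h>
    #include <sys/wait.h>

    extern char **environ;

    int spawn_redirected(const char *path, char *const argv[],
                         const char *outfile)
    {
        posix_spawn_file_actions_t fa;
        pid_t pid;
        int status;

        posix_spawn_file_actions_init(&fa);
        /* In the child, open outfile on fd 1 -- i.e. "> outfile". */
        posix_spawn_file_actions_addopen(&fa, 1, outfile,
                                         O_WRONLY | O_CREAT | O_TRUNC, 0644);
        posix_spawn(&pid, path, &fa, NULL, argv, environ);
        posix_spawn_file_actions_destroy(&fa);
        waitpid(pid, &status, 0);
        return status;
    }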
> > Obviously if you add something, the old stuff isn't (necessarily)
> > lost. I'm just saying that the fork/exec process model is simple,
> > elegant, available, universal and fully functional in all POSIX
> > systems. Your model is a horse of another color, and any given
> > command that would avail itself of the supposed benefits of your
> > scheme must be recast into a library that conforms to the
> > requirements of your embedded task model.
>
>Yes. Which is a significant impediment right from the word go. Which
>should go some way to explaining my ambivalence on this idea. However
>the building blocks to use this model are present and functional on all
>POSIX systems, so there's no reason to assume we couldn't 'make it
>work'.

What are these "building blocks?"

Let me be clear. I think this probably _can_ be done (it's a SMOP, after 
all), but that it shouldn't be done, because it's not worth doing from 
the perspective of a rational, complete and accurate cost / benefit 
analysis, including both the up-front programming costs and the ongoing 
maintenance costs.


> > It doesn't prevent it, but to avail oneself of the putative
> > benefits of your proposed scheme, a significantly different
> > programming model has to be learned and used. All for what? A tiny
> > incremental improvement in program start-up times on a single
> > platform and one or two pre-ordained shells?
>
>Huh? That's an assumption. I'd hope I could achieve librarisation as
>simply as casting main to lib_main, and providing link-time replacements
>for exit() and _exit() and fatal(). Then the real binary doesn't use
>those link-time replacements.
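Taken at face value, the mechanics might look like the sketch below
(lib_main, run_embedded and the GNU-ld --wrap trick are my guesses at
what Rob means, and it papers over exactly the runtime-state problems
that follow):

    /* Tool built with: gcc ... -Dmain=lib_main -Wl,--wrap,exit
       so its calls to exit() land in __wrap_exit() below. */
    #include <setjmp.h>

    static jmp_buf host_context;
    static int tool_status;

    /* Link-time replacement for exit() inside the embedded tool. */
    void __wrap_exit(int status)
    {
        tool_status = status;
        longjmp(host_context, 1);         /* unwind back into the shell */
    }

    int lib_main(int argc, char **argv);  /* the tool's renamed main() */

    /* Host-shell side: run the embedded tool in-process. */
    int run_embedded(int argc, char **argv)
    {
        if (setjmp(host_context) != 0)
            return tool_status;           /* tool called exit() */
        return lib_main(argc, argv);      /* tool returned normally */
    }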

What about the C runtime startup actions? Heap (malloc) initialization? 
Standard I/O table initialization? Probably lots of other startup actions 
about which I know nothing, whose code assumes it's operating in a 
post-exec() context.

Offhand, it seems like this is a hairy beast indeed.


> > How much time do they save? That's for you to claim and
> > substantiate. I'm
> > not trying to justify or validate your project, I'm trying to
> > repudiate it.
>
>I can tell. I'm not trying to defend it, as that assumes that it is
>defensible. I'm discussing it in a neutral (ish) light, I hope. I am
>trying to provide responses to the specific points you make as part of
>that discussion.

Please be intellectually honest. You've got an idea and you're trying to 
defend it. There's nothing wrong with that. If you didn't have an 
Australian email address, I'd think you were deluded by that 
all-too-American desire for "objectivity."


> > But consider this: By the time you complete this task, the upward
> > march of system speeds (CPU and I/O) will probably have done more
> > to improve elapsed-time performance of command invocation than
> > your improvements are going to achieve.
>
>Straw poll: who here has and uses a machine more than 2 years old right
>now? My hand goes up, as does my girlfriend's, and my firewall's. (My PC
>happens to be a dual processor, but still.) Also, consider that as
>system speeds increase, so does the functionality. We may find MS
>polling internet servers on process startup or something equally
>ridiculous that drastically increases process startup time. Certainly
>system policies now play a part, as each process startup has to be
>tested against an arbitrarily long list of rules. And don't get me
>started on virus scanners.

I knew you'd bring that up, but it's not a valid argument. You don't have 
to own the latest hardware to be riding the industry-wide curve of rising 
system power. As a group, users are trading up as better hardware becomes 
available, even if they're trading up one or two years behind the curve 
of the latest and fastest.


> > And five staff-minutes per user per month? You think that's
> > significant? What would you do with those five minutes spread
> > throughout the month? That's right: Nothing, 'cause you'd get it
> > in fraction-of-a-second parcels.
>
>Well that's an assumption. For me, I'd get it running configure scripts,
>which come in far bigger chunks than fractions of a second.

But the gain is still incremental. And both the current cost and that of 
the "new and improved scheme" are still just unknowns.


> > Lastly, you'll have to have an ongoing effort to port changes from
> > the stand-alone original versions of the commands to your embedded
> > counterparts.
>
>No - sounds like you haven't been paying attention. In my very first
>email I pointed out that this was not an acceptable approach, and that
>committing changes upstream would be the only meaningful way of doing
>this.

Are you saying you think you're going to convince the maintainers of these 
special programs that have been endowed with the ability to operate 
parasitically in your special version of the shell to let you put these 
changes into their mainline code bases? Good luck!


> > >I'd guess at ash, as that's the smallest shell we have, but if
> > >it's easier with bash, then I see no reason not to - as this
> > >would be a /bin/sh replacement - if the benefits were to be
> > >realised.
> >
> > How many people use such a bare-bones shell? Unless you modify
> > them all, there will be a sizeable user contingent that does not
> > benefit from your efforts.
>
>Nearly everyone here does - most scripts have #!/bin/sh in the header.

Perhaps. I do, but only until I want to use a BASH feature that ash doesn't 
have.


> > I think you need a good technical justification for the effort
> > you'll expend relative to the benefits you're going to gain and
> > the detriments you're going to incur.
>
>Absolutely. The problem domain needs further refinement, a lit search is
>needed, some rough test cases / mock-ups to provide a rule-of-thumb idea
>about the potential returns, and cygwin needs serious profiling to
>understand if my assumptions about performance are correct. Lotsa work
>to do this right.
>
> > As with all optimizations, you must measure the cost of the
> > current code and that of the replacement. In this case, you could
> > possibly mock up a test jig that did DLL loading and compare that
> > with the cost of fork / exec. But that would not include the
> > unknown costs of your putative push_context / pop_context
> > mechanism.
>
>Absolutely. In fact:
>
>"Rules of Optimization:
>Rule 1: Don't do it.
>Rule 2 (for experts only): Don't do it yet."
>- M.A. Jackson
>
>"More computing sins are committed in the name of efficiency (without
>necessarily achieving it) than for any other single reason - including
>blind stupidity."
>- W.A. Wulf
>
>"We should forget about small efficiencies, say about 97% of the time:
>premature optimization is the root of all evil."
>- Donald Knuth
>
>"The best is the enemy of the good."
>- Voltaire

Yes, yes. I've been around long enough to have heard all of these.

Don't forget this:

"... this is the best of all possible worlds."
  -- Voltaire


>Credit for assembling these goes to
>http://www-2.cs.cmu.edu/~jch/java/optimization.html
>
> > "The proof of the pudding is in the eating." So until you've
> > done it, you
> > won't know for an empirical fact if it's a win and if so how
> > much of a win
> > it is.
>
>Sure.
>
>Rob


Randall Schulz
Mountain View, CA USA

