This is the mail archive of the cygwin mailing list for the Cygwin project.

Re: Performance problems

Christopher Faylor wrote:

On Sat, Jun 04, 2005 at 03:00:13PM -0700, Linda W wrote:

You are technically accurate, but the cygwin layer is a POSIX-compliant
OS emulation layer by some definition, no?

Yes, but that has nothing to do with caching. Cygwin is just a DLL. It
can't monitor all file transactions in the whole system.

True, but cygwin doesn't need to monitor the entire OS -- neither
does Windows. Take a look at the open file descriptors held by
the winlogon process sometime -- it holds open OS-specific
directories and files.

Cygwin would only need to "cache" items (in the sense I would
anticipate) while the DLL is loaded and only those file items
that are being used by the current program.  For example, a simple
find command on /tmp ("find /tmp") produces 17 lines.
In all there were 311 file operations to list these 17 files.
They break down as follows:
1-27 - finding program by bash
28-48 - loading libraries
49-75 - processing C:\, C:\home and C:\home\username
76-243 - working on tmp
244-311 - accessing home directory; search for psapi.dll & close of /tmp

In more detail:

The first 27 calls were bash searching for "find.exe". Ignore those.
Calls 28-48 were the find command loading cygwin libraries; ignore
those too.
Calls 49-75 involved file ops (Open, Query Info, Directory) on the
paths C:\, C:\home\ and C:\home\user. Calls 76-243 seem to be working
on /tmp. The tmp calls (executing between time index 51.995 - 51.005,
<1 clock tick) show the following breakdown:

     1 C:\home\law, QUERY INFORMATION
     1 C:\tmp\d.txt, READ
     2 C:\home\law, CLOSE
     2 C:\home\law, OPEN
     2 C:\tmp\d.txt, CLOSE
     2 C:\tmp\d.txt, OPEN
     3 C:\tmp\d.txt, QUERY INFORMATION
     5 C:\tmp\run-crons.ZE1996\, CLOSE
     5 C:\tmp\run-crons.ZE1996\, OPEN
     6 C:\tmp\run-crons.ZE1996, QUERY INFORMATION
     7 C:\, CLOSE
     7 C:\, DIRECTORY
     7 C:\, OPEN
     8 C:\tmp\run-crons.ZE1996, CLOSE
     8 C:\tmp\run-crons.ZE1996, OPEN
    12 C:\tmp\, CLOSE
    12 C:\tmp\, OPEN
    13 C:\tmp, CLOSE
    13 C:\tmp, OPEN
    15 C:\tmp\run-crons.ZE1996\, DIRECTORY
    28 C:\tmp\, DIRECTORY
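
A breakdown like the one above can be tallied mechanically from a trace. The following is only a minimal sketch: the message does not say how the table was produced, so the "path, OPERATION" line format, the function name tally_operations and the sample lines are all assumptions made for illustration.

```python
from collections import Counter


def tally_operations(lines):
    """Count identical "path, OPERATION" lines, least frequent first."""
    # Skip blank lines and ignore surrounding whitespace.
    counts = Counter(line.strip() for line in lines if line.strip())
    # Order by count, then by text, so the busiest entries come last.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical trace lines in an assumed "path, OPERATION" format.
sample = [
    r"C:\tmp, OPEN",
    r"C:\tmp\, DIRECTORY",
    r"C:\tmp, CLOSE",
    r"C:\tmp\, DIRECTORY",
]

for entry, count in tally_operations(sample):
    print(f"{count:6d} {entry}")
```

Repeated OPEN/CLOSE pairs on the same path stand out immediately in this kind of summary, which is what makes them candidates for caching.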

So if I wanted to cache -- say, limiting the cache lifetime to
~0.1-1 seconds -- it would appear, on the surface, to possibly reduce
the 169 calls to maybe 22? Lock open for read C:\, then C:\tmp, and
C:\tmp\run-crons.ZE1996 while processing those dirs. With open file
handles for read, you can't remove or rename them during that 0.1-1
seconds. That could eliminate 147 (87%) of the calls, in the best
case, almost a 10x speedup.
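
The arithmetic behind that estimate can be checked directly. This sketch takes the two figures quoted in the message (169 observed calls, roughly 22 with caching) at face value; the variable names are made up for illustration. The ratio of call counts comes out a little under 8x, and it only translates into an equal speedup if every call costs about the same.

```python
calls_observed = 169    # figure quoted in the message
calls_with_cache = 22   # the message's rough best-case estimate

calls_saved = calls_observed - calls_with_cache
fraction_saved = calls_saved / calls_observed
call_ratio = calls_observed / calls_with_cache

print(calls_saved)               # 147
print(f"{fraction_saved:.0%}")   # 87%
print(f"{call_ratio:.1f}x")      # 7.7x
```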

I wouldn't cache data without keeping the associated handles to the
corresponding file objects open. As long as they are kept open,
Windows would disallow things like deleting the file and replacing
it with a directory. That should control most race conditions
with some degree of relative safety.
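
To make the idea concrete, here is a minimal sketch of a short-lived cache that keeps the handle open for as long as the cached data is considered valid. It is not Cygwin code and not a description of how the DLL works: the class name ShortLivedStatCache, the ttl_seconds parameter and the injectable clock are all hypothetical. It is also written against the generic os module, so the protection against delete/rename described above depends on the Windows behavior discussed in this thread; on POSIX systems an open descriptor does not prevent a file from being unlinked.

```python
import os
import time


class ShortLivedStatCache:
    """Cache os.fstat() results briefly, keeping each file open meanwhile."""

    def __init__(self, ttl_seconds=0.1, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so expiry can be tested
        self.entries = {}        # path -> (fd, stat_result, expiry_time)
        self.misses = 0          # times we actually had to ask the OS

    def stat(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now < entry[2]:
            return entry[1]      # still fresh: no OS call needed
        self._drop(path)         # expired or absent: release any old handle
        fd = os.open(path, os.O_RDONLY)
        self.misses += 1
        info = os.fstat(fd)
        # The descriptor stays open until the entry expires or is dropped.
        self.entries[path] = (fd, info, now + self.ttl)
        return info

    def _drop(self, path):
        entry = self.entries.pop(path, None)
        if entry is not None:
            os.close(entry[0])

    def close(self):
        """Release every handle still held by the cache."""
        for path in list(self.entries):
            self._drop(path)
```

Two stat() calls on the same path within ttl_seconds cost one trip to the OS instead of two. Expired entries are only released when the same path is looked up again or close() is called, so a real implementation would also need a sweep for stale handles.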

You can't do that without taking the fact that the handle is open into
account when cygwin itself removes a file, opens a file, renames a file.

You can't? It would seem the cygwin library itself could maintain
its own list of open descriptors and close them when needed. Doesn't
cygwin use a shared-memory region for interprocess communication?
Couldn't this same region be used for the file-handle/info cache so
multiple cygwin processes would behave with each other?

And it could be pretty surprising to find that when process a does an
opendir/readdir, process b is now unable to delete a file.

I'm not 100% certain, but I believe having a file (or dir) open
for read doesn't mean someone can't change the contents.  They
just can't delete the dir or file that is still opened for reading.
This is already a problem even w/o caching.  Cygwin can't delete
various directories because they are kept open by the login shell.
Weird and strange dirs like the MSN Gaming Zone that winlogon kept
open even though it was empty (deletable by forcing the winlogon
handle to close).  So I don't see that as a major loss of functionality
as the problem already exists.

She thinks that the benefits would outweigh the tiny possibility of bad
cache data resulting from something like performing an "ls" on a file
and having, e.g., some other process sneak in, remove the file and
introduce a directory, but still having "ls" report file data.

Isn't this already a problem on networked shares? I.e., doesn't
Windows cache file info from network shares for a few seconds (maybe
more if one has local-file caching turned on)?

I don't know but, regardless, this would increase the possibility for
surprise to include local disks too. I'm not convinced that this is a
good thing. This would make the behavior that Gary R. Van Sickle
recently reported as the result of using google search (I think it was
google search), where files were kept open even though it seems like
they should be closed, common with cygwin.

I couldn't find a reference to GRVS's report on files being kept
open by a search engine.  However, with the caching proposed for
cygwin, those "file open" opportunities are measured in fractions
of a second.  Caching for 10 milliseconds might have saved nearly
90% of the calls to Windows.  Windows is hardly a real-time OS where
tolerances need millisecond precision -- the clock defaults to about
a 20Hz clock speed unless you've tweaked it.

However, you spend time writing how no one _ever_ investigates
performance problems or suggests solutions.  That appears to be a
cynical view.  Then, when offered a clear example to the contrary, you
discard the effort as being "unoriginal" and already something that has
been (and is being) considered independently of their suggestion.

That \could\ be perceived, by some, as "mean-spirited" or "spiteful".
I don't feel that this _encourages_ people to take the time to actually
"figure out" problems nor "figure out" improvements. If they don't
know you, some people might take it personally. :-) (Not that you
would be expected to care, publicly :-) ).

You seem to be affronted by something that I said before you even
responded. I did not respond to your email with a "you didn't even look
at the code" response. I did not say "you are unoriginal". I merely
represented our current thinking about the subject that you raised.

I happen to know that Corinna isn't around so I wanted to make sure that
she got the credit for having been thinking about this and even going so
far as to start coding something, I believe. We have been talking about
caching for a long, long time. I believe that there is even an "#if 0"
or two in the cygwin code still which contains my aborted attempt to
cache some path_conv lookups.

I wasn't _really_ affronted; that's why I posed it as a question
about why you felt the need to talk about caching that was
suggested almost 2 years ago, and "pre-apologized" in case I
misunderstood your meaning.  Credit Smedit...other side of the
blame, and scapegoating.  Politics and 1-upmanship. Can't we just
have "consensus" and agree that something is a good idea?  Nahhh....
not in today's IP (Intellectual Property) climate.  Sigh.

Anyway, I hope you don't become increasingly against the idea
now that I've supported Corinna in her ideas...:-)

I.e. by supporting the idea, I hope I'm not shooting myself
in the foot....:-?

