This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Performance problems


On Sun, Jun 05, 2005 at 08:00:44PM -0700, Linda W wrote:
>Christopher Faylor wrote:
>>On Sat, Jun 04, 2005 at 03:00:13PM -0700, Linda W wrote:
>>>You are technically accurate, but the cygwin layer is a
>>>POSIX-compliant OS emulation layer by some definition, no?
>>
>>Yes, but that has nothing to do with caching.  Cygwin is just a DLL.  It
>>can't monitor all file transactions in the whole system.
>
>True, but cygwin doesn't need to monitor the entire OS -- neither
>does Windows. Take a look at the open file descriptors held by
>the winlogon process sometime -- it holds open OS-specific
>directories and files.

I am talking about cache coherency.  The OS can maintain it because it
knows what files are being updated.  Cygwin can't.  If cygwin opens a
file and another unrelated process modifies it, cygwin's cached
information would be wrong.  This is a simple statement of fact.
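
The coherency problem described above can be reproduced in a few lines.
This is only an illustrative sketch, not Cygwin code: StatCache is a
hypothetical per-process cache of stat results, and the second write
stands in for an unrelated process modifying the file behind the
cache's back.

```python
import os
import tempfile


class StatCache:
    """Hypothetical per-process cache of os.stat() results, keyed by path."""

    def __init__(self):
        self._entries = {}

    def stat(self, path):
        # Only the first lookup reaches the OS; later ones reuse the answer.
        if path not in self._entries:
            self._entries[path] = os.stat(path)
        return self._entries[path]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "d.txt")
        with open(path, "w") as f:
            f.write("one")

        cache = StatCache()
        first_size = cache.stat(path).st_size

        # Stand-in for an unrelated process: nothing notifies the cache.
        with open(path, "a") as f:
            f.write(" two three")

        cached_size = cache.stat(path).st_size  # still the old answer
        real_size = os.stat(path).st_size       # what the OS reports now
        return first_size, cached_size, real_size


if __name__ == "__main__":
    print(demo())
```

The cached size stays at its first value while the size reported by the
OS has moved on, and the cache has no way of learning that.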

>Cygwin would only need to "cache" items (in the sense I would
>anticipate) while the DLL is loaded and only those file items
>that are being used by the current program.  For example a simple
>find command on /tmp "find /tmp" produces 17 lines:
>/tmp
>/tmp/d.txt
>/tmp/run-crons.ZE1996
>/tmp/run-crons.ZE1996/run-crons.1924
>/tmp/run-crons.ZE1996/run-crons.daily.1924
>/tmp/588-reg.reg
>/tmp/1892-reg.reg
>/tmp/VolumeC.txt
>/tmp/xyz.txt
>/tmp/wd.txt
>/tmp/d1.txt
>/tmp/xyz.txt.orig
>/tmp/AUTORUN.INF
>/tmp/WD_Data.ICO
>/tmp/WD_Install.exe
>/tmp/img1
>/tmp/1
>============
>In all there were 311 file operations to list these 17 files.
>They break down as follows:
>1-27 - finding program by bash
>28-48 - loading libraries
>49-75 - processing "C:\, C:\home and C:\home\username"
>76-243 - working on tmp
>244-311 - accessing home directory; search for psapi.dll & close of /tmp
>
>The ones working on tmp were broken down as follows:
>
>The first 27 were processing by bash to find "find.exe".  Ignore.
>Commands 28-48 were loading cygwin libraries by the find
>command; ignore that.
>Commands 49-75 involved file ops (Open, Query Info, Directory) on the
>paths C:\, C:\home\ and C:\home\user.  Calls 76-243 seem to be working
>on /tmp.  The tmp calls (executing between time index 51.995 -
>51.005 (<1 clock tick)) show the following breakdown:
>
>     1 C:\home\law, QUERY INFORMATION
>     1 C:\tmp\d.txt, READ
>     2 C:\home\law, CLOSE
>     2 C:\home\law, OPEN
>     2 C:\tmp\d.txt, CLOSE
>     2 C:\tmp\d.txt, OPEN
>     3 C:\tmp\d.txt, QUERY INFORMATION
>     5 C:\tmp\run-crons.ZE1996\, CLOSE
>     5 C:\tmp\run-crons.ZE1996\, OPEN
>     6 C:\tmp\run-crons.ZE1996, QUERY INFORMATION
>     7 C:\, CLOSE
>     7 C:\, DIRECTORY
>     7 C:\, OPEN
>     8 C:\tmp\run-crons.ZE1996, CLOSE
>     8 C:\tmp\run-crons.ZE1996, OPEN
>    10 C:\tmp, QUERY INFORMATION
>    12 C:\tmp\, CLOSE
>    12 C:\tmp\, OPEN
>    13 C:\tmp, CLOSE
>    13 C:\tmp, OPEN
>    15 C:\tmp\run-crons.ZE1996\, DIRECTORY
>    28 C:\tmp\, DIRECTORY
>
>So if I was wanting to cache -- say limit caching to ~.1-1 seconds,

And as soon as you start timing out your cache, you either have a
separate thread running which manages this (which implies careful
attention to locking issues and context switching) or you schedule a
timer signal (which has similar problems).

>it would appear, on the surface, to possibly reduce the 169 calls to
>maybe 22?

You really can't predict without looking at the code.  There is no way
of knowing what the above information represents as far as what cygwin
and find are doing.

>>You can't do that without taking the fact that the handle is open into
>>account when cygwin itself removes a file, opens a file, renames a file.
>> 
>You can't? It would seem the cygwin library itself could maintain
>its own list of open descriptors and close them when needed.  Doesn't
>cygwin use a shared-memory region for interprocess communication? 
>Couldn't this same region be used for the File-handle/info cache so
>multiple cygwin processes would behave with each other?

I was filling in the details here just to show that the solution of
keeping files open has consequences.  Keeping the file open increases
the complexity of every function which manipulates a file rather than
the one or two functions which might be interested in the cached status
information.
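
A rough sketch of that cost, under the assumption of a hypothetical
HandleCache that keeps files open so status queries are cheap (none of
these names come from Cygwin): only fstat() benefits from the cache, yet
remove() and rename() both have to know about it and drop the handle
first.

```python
import os
import tempfile


class HandleCache:
    """Hypothetical cache that keeps files open for cheap status queries."""

    def __init__(self):
        self._open = {}  # path -> open file object

    def fstat(self, path):
        # The one function that actually benefits from the cache.
        f = self._open.get(path)
        if f is None:
            f = self._open[path] = open(path, "rb")
        return os.fstat(f.fileno())

    def _drop(self, path):
        f = self._open.pop(path, None)
        if f is not None:
            f.close()

    # Every call that changes the namespace must now consult the cache.
    def remove(self, path):
        self._drop(path)
        os.remove(path)

    def rename(self, src, dst):
        self._drop(src)
        self._drop(dst)
        os.rename(src, dst)

    def open_count(self):
        return len(self._open)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "xyz.txt")
        new = os.path.join(tmp, "xyz.txt.orig")
        with open(old, "w") as f:
            f.write("hello")

        cache = HandleCache()
        size_before = cache.fstat(old).st_size
        cache.rename(old, new)   # must close the cached handle first
        size_after = cache.fstat(new).st_size
        cache.remove(new)        # and again here
        return size_before, size_after, os.path.exists(new), cache.open_count()


if __name__ == "__main__":
    print(demo())
```

This only covers a single process. Handles held by other processes would
need the same bookkeeping shared between them.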

>>And it could be pretty surprising to find that when process a does an
>>opendir/readdir, process b is now unable to delete a file.
>> 
>I'm not 100% certain, but I believe having a file (or dir) open
>for read doesn't mean someone can't change the contents.  They
>just can't delete the dir or file that is still opened for reading.

Which is why I said "unable to delete a file".

>This is already a problem even w/o caching.  Cygwin can't delete
>various directories because they are kept open by the login shell.

Being unable to consistently delete a file because something has it open
is explainable.  What isn't explainable is "Why does my configure script
work sometimes but not others?"  When you talk about keeping cached
information around, you stand the chance of something like this not
working:

  find . -name foo | xargs rm

because find may still have foo open when rm tries to remove it.

That may not be a huge deal for a FS/OS which honors DELETE_ON_CLOSE and
will be able to delete "foo" regardless, but this introduces another
potential place for code complexity.
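
The failure mode can be approximated inside one process. In this sketch
the open handle plays the role of find's cached handle and os.remove()
plays the role of rm; the helper name remove_while_open is made up for
the illustration. The outcome depends on the platform: on POSIX systems
the unlink normally succeeds, while on Windows a file opened this way is
normally refused for deletion.

```python
import os
import tempfile


def remove_while_open(path):
    """Try to remove `path` while this process still holds it open.

    Returns True if the removal succeeded, False if the OS refused it.
    """
    with open(path, "rb"):
        try:
            os.remove(path)
        except PermissionError:
            return False
        return True


def demo():
    tmp = tempfile.mkdtemp()
    path = os.path.join(tmp, "foo")
    with open(path, "w") as f:
        f.write("x")

    removed = remove_while_open(path)
    still_there = os.path.exists(path)

    # Clean up: the handle is closed by now, so removal is allowed.
    if still_there:
        os.remove(path)
    os.rmdir(tmp)
    return removed, still_there


if __name__ == "__main__":
    print(demo())
```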

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

