[PATCH] Cygwin: Interim malloc speedup
Mon Jan 18 12:50:39 GMT 2021
On Jan 17 22:47, Mark Geisert wrote:
> Hi Corinna,
> Happy New Year back at you! I'm very glad to see you posting again!
Yeah, I took a longer timeout over the holiday season.
> Corinna Vinschen via Cygwin-patches wrote:
> > Hi Mark,
> > Happy New Year!
> > On Dec 21 20:53, Mark Geisert wrote:
> > > Replaces function-level lock with data-level lock provided by existing
> > > dlmalloc. Sets up to enable dlmalloc's MSPACES, but does not yet enable
> > > them due to visible but uninvestigated issues.
> > >
> > > Single-thread applications may or may not see a performance gain,
> > > depending on how heavily they use the malloc functions. Multi-thread
> > > apps will likely see a performance gain.
> > > diff --git a/winsup/cygwin/cygmalloc.h b/winsup/cygwin/cygmalloc.h
> > > index 84bad824c..67a9f3b3f 100644
> > > --- a/winsup/cygwin/cygmalloc.h
> > > +++ b/winsup/cygwin/cygmalloc.h
> > > +/* These defines tune the dlmalloc implementation in malloc.cc */
> > > # define MALLOC_FAILURE_ACTION __set_ENOMEM ()
> > > # define USE_DL_PREFIX 1
> > > +# define USE_LOCKS 1
> > Just enabling USE_LOCKS looks wrong to me. Before enabling USE_LOCKS,
> > you should check how the actual locking is performed. For non WIN32,
> > that will be pthread_mutex_lock/unlock, which may not be feasible,
> > because it may break expectations during fork.
> I did investigate this before setting it, and I've been running with
> '#define USE_LOCKS 1' for many weeks and haven't seen any memory issues of
> any kind. Malloc multi-thread stress testing, fork() stress testing, Cygwin
> DLL builds, Python and binutils builds, routine X usage; all OK. (Once I
> straightened out sped-up mkimport to actually do what Jon T suggested,
> > What you may want to do is set USE_LOCKS to 2 and define your own
> > MLOCK_T/ACQUIRE_LOCK/... macros (in the `#if USE_LOCKS > 1' branch of
> > the malloc source, see lines 1798ff), using a type which is non-critical
> > during forking, as well as during process initialization. Win32 fast
> > R/W Locks come to mind and adding them should be pretty straight-forward.
> > This may also allow MSPACES to work OOTB.
> With '#define USE_LOCKS 1' the tangled mess of #if-logic in malloc.cc
> resolves on Cygwin to using pthread_mutex_locks, so that seems to be OK
> as-is unless what you're suggesting is preferable for speed (or MSPACES when
> I get to that).
Admittedly, I'm not sure if pthread mutexes pose a problem here, I'm
Malloc locking is single-process only and pthread mutexes are adding some
unnecessary overhead (Event object, bookkeeping list, fixup_after_fork
handling). Win32 SRW Locks, especially the exclusive type, are much
faster and also easy to use, unless you need recursive locking.
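
As a rough illustration of the SRW lock route (a sketch, not code from any
actual patch), the user-defined lock hooks that dlmalloc expects when
USE_LOCKS is greater than 1 might look something like the following. The
return conventions (ACQUIRE_LOCK yields 0 on success, TRY_LOCK yields nonzero
on success) are taken from dlmalloc's built-in lock variants.
demo_lock_roundtrip() is a hypothetical helper that exists only to exercise
the macros, and the non-Windows branch is a stand-in so the sketch builds
elsewhere. TryAcquireSRWLockExclusive is assumed to be available, which means
Windows 7 or later.

```c
/* Sketch only: dlmalloc user-defined lock macros (USE_LOCKS > 1 case)
   on top of Win32 SRW locks. Not taken from the Cygwin sources. */
#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>

#define MLOCK_T           SRWLOCK
#define INITIAL_LOCK(lk)  (InitializeSRWLock(lk), 0)
#define DESTROY_LOCK(lk)  (0)  /* SRW locks need no teardown */
#define ACQUIRE_LOCK(lk)  (AcquireSRWLockExclusive(lk), 0)  /* 0 == success */
#define RELEASE_LOCK(lk)  ReleaseSRWLockExclusive(lk)
#define TRY_LOCK(lk)      TryAcquireSRWLockExclusive(lk)    /* nonzero == got it */
static MLOCK_T malloc_global_mutex = SRWLOCK_INIT;

#else
/* Stand-in (assumption): a C11 spin flag so the sketch builds off Windows. */
#include <stdatomic.h>

static int spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set(f))
        ;  /* busy-wait until the flag is cleared */
    return 0;
}

#define MLOCK_T           atomic_flag
#define INITIAL_LOCK(lk)  (atomic_flag_clear(lk), 0)
#define DESTROY_LOCK(lk)  (0)
#define ACQUIRE_LOCK(lk)  spin_acquire(lk)
#define RELEASE_LOCK(lk)  atomic_flag_clear(lk)
#define TRY_LOCK(lk)      (!atomic_flag_test_and_set(lk))
static MLOCK_T malloc_global_mutex = ATOMIC_FLAG_INIT;
#endif

/* Hypothetical helper: returns 0 if the lock behaves as an exclusive,
   non-recursive lock, nonzero otherwise. */
int demo_lock_roundtrip(void)
{
    MLOCK_T lk;
    int while_held, after_release;

    /* The global lock is statically initialized and usable right away. */
    if (ACQUIRE_LOCK(&malloc_global_mutex) != 0)
        return 1;
    RELEASE_LOCK(&malloc_global_mutex);

    (void)INITIAL_LOCK(&lk);
    if (ACQUIRE_LOCK(&lk) != 0)
        return 1;

    /* Not recursive: a second attempt while the lock is held must fail. */
    while_held = TRY_LOCK(&lk) ? 1 : 0;
    RELEASE_LOCK(&lk);

    /* Once released, a try-lock should succeed again. */
    after_release = TRY_LOCK(&lk) ? 1 : 0;
    if (after_release)
        RELEASE_LOCK(&lk);

    (void)DESTROY_LOCK(&lk);
    return (while_held == 0 && after_release == 1) ? 0 : 2;
}
```

The failing try-lock in the middle is the "unless you need recursive locking"
caveat in practice: SRW locks do not track an owner, so a thread that already
holds the lock exclusively cannot take it a second time.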