[PATCH] Cygwin: malloc tune-up

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Aug 24 09:48:43 GMT 2020


Hi Mark,

On Aug 23 21:59, Mark Geisert wrote:
> 1. Replace global malloc lock with finer-grained internal locks.
> 2. Enable MSPACES, i.e. thread-specific memory pools.
> 
> Porting and testing of several dlmalloc-related malloc implementations
> (ptmalloc, ptmalloc2, ptmalloc3, nedalloc) shows that they are no faster
> than the current dlmalloc used by Cygwin, when the latter is tuned.  This
> could be because the others are forks of earlier dlmalloc versions.  For

That also means, there may be a point to update dlmalloc to a newer
version at one point again.  If you have fun to do so...

> the record, I could not get jemalloc or tcmalloc running at all due to
> chicken-egg issues, nor could I get a Win32 Heap-based malloc to work
> across fork(), which was expected.
> 
> I think I can see a way to get Win32 Heap-based malloc to work across
> fork()s, but it would depend on undocumented info and would likely be
> subject to breakage in future Windows versions.  Too bad, because that
> form of malloc package would be 2 to 8 times faster in practice.

Which is kind of strange, given malloc is all about heap management.
Given a sufficiently big heap, how can a well-written malloc be slower
than Win32 HeapAlloc?  Puzzeling...

> ---
>  winsup/cygwin/cygmalloc.h       |  21 +++-
>  winsup/cygwin/fork.cc           |   7 --
>  winsup/cygwin/malloc_wrapper.cc | 163 +++++++++++++++++++-------------
>  3 files changed, 113 insertions(+), 78 deletions(-)

I think this is a great idea.  I have a few nits, though:

> diff --git a/winsup/cygwin/cygmalloc.h b/winsup/cygwin/cygmalloc.h
> index 84bad824c..302ce59c8 100644
> --- a/winsup/cygwin/cygmalloc.h
> +++ b/winsup/cygwin/cygmalloc.h
> @@ -33,15 +33,30 @@ void *mmap64 (void *, size_t, int, int, int, off_t);
>  # define mmap mmap64
>  # define MALLOC_FAILURE_ACTION	__set_ENOMEM ()
>  # define USE_DL_PREFIX 1
> +# define USE_LOCKS 1
> +# define MSPACES 1
> +# define FOOTERS 1
>  
>  #elif defined (__INSIDE_CYGWIN__)
>  
> -# define __malloc_lock() mallock.acquire ()
> -# define __malloc_unlock() mallock.release ()
> -extern muto mallock;
> +# define MSPACES 0

That means MSPACES is 0 in malloc_wrapper.cc.  This code is not building
the MSPACES stuff into malloc_wrapper.cc, but the now non-thread-safe
direct dlfoo calls.  Just add `#error foo' to some of the `#if MSPACES'
code and you'll see you're (probably) not building what you think you're
building.

>  #endif
>  
> +#if MSPACES
> +void __reg2 *create_mspace (size_t, int);
> +void __reg2 mspace_free (void *m, void *p);
> +void __reg2 *mspace_malloc (void *m, size_t size);
> +void __reg3 *mspace_realloc (void *m, void *p, size_t size);
> +void __reg3 *mspace_calloc (void *m, size_t nmemb, size_t size);
> +void __reg3 *mspace_memalign (void *m, size_t alignment, size_t bytes);
> +void __reg2 *mspace_valloc (void *m, size_t bytes);
> +size_t __reg1 mspace_usable_size (const void *p);
> +int __reg2 mspace_malloc_trim (void *m, size_t);
> +int __reg2 mspace_mallopt (int p, int v);
> +void __reg1 mspace_malloc_stats (void *m);
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/winsup/cygwin/fork.cc b/winsup/cygwin/fork.cc
> index 38172ca1e..82f95dafe 100644
> --- a/winsup/cygwin/fork.cc
> +++ b/winsup/cygwin/fork.cc
> @@ -296,8 +296,6 @@ frok::parent (volatile char * volatile stack_here)
>    si.lpReserved2 = (LPBYTE) &ch;
>    si.cbReserved2 = sizeof (ch);
>  
> -  bool locked = __malloc_lock ();
> -

You're guarding the new code below with #if MSPACES and keep the calling
dlmalloc and friends here in fork as well as in the #else branches.  But
these code branches would need the old __malloc_lock/__malloc_unlock stuff,
otherwise they are broken.

Either we can go entirely without the #if MSPACES bracketing, or the
the code must keep using the global lock in the #if !MSPACES case.

>    /* Remove impersonation */
>    cygheap->user.deimpersonate ();
>    fix_impersonation = true;
> @@ -448,9 +446,6 @@ frok::parent (volatile char * volatile stack_here)
>  		   "stack", stack_here, ch.stackbase,
>  		   impure, impure_beg, impure_end,
>  		   NULL);
> -
> -  __malloc_unlock ();
> -  locked = false;
>    if (!rc)
>      {
>        this_errno = get_errno ();
> @@ -533,8 +528,6 @@ cleanup:
>  
>    if (fix_impersonation)
>      cygheap->user.reimpersonate ();
> -  if (locked)
> -    __malloc_unlock ();
>  
>    /* Remember to de-allocate the fd table. */
>    if (hchild)
> diff --git a/winsup/cygwin/malloc_wrapper.cc b/winsup/cygwin/malloc_wrapper.cc
> index 3b245800a..f1c23b4a8 100644
> --- a/winsup/cygwin/malloc_wrapper.cc
> +++ b/winsup/cygwin/malloc_wrapper.cc
> @@ -27,6 +27,33 @@ extern "C" struct mallinfo dlmallinfo ();
>  static bool use_internal = true;
>  static bool internal_malloc_determined;
>  
> +/* If MSPACES (thread-specific memory pools) are enabled, use a
> +   thread-local variable to store a pointer to that thread's mspace.
> + */
> +#if MSPACES
> +static DWORD tls_mspace; // index into thread's TLS array
> +#define MSPACE_SIZE (8 * 1024 * 1024)
> +
> +static void *
> +get_current_mspace ()
> +{
> +  if (unlikely(tls_mspace == 0))
> +    api_fatal ("a malloc-related function was called before malloc_init");
> +
> +  void *m = TlsGetValue (tls_mspace);

Why do we need a Win32 TLS value here?  We're usually having our own tls
area as defined in cygtls.h.  You could just add a

  void *mspace_ptr;

or so to class _cygtls.  The only downside is that the first build will
be very likely broken.  It regenerates tlsoffsets{64}.h, but then newlib
potentially uses wrong offsets.  Therefore, after changing class _cygtls
and regenerating the tls offsets once, a full rebuild should be
performed to be on the safe side.

In terms of using TlsAlloc and friends I have a problem with fork
safety.  The _cygtls area is on the thread's stack, and the stack of the
forking thread is copied over to the child, so the value of mspace_ptr
is intact after fork.  This isn't the case for the TlsAlloc'ed void
pointer, given the TlsAlloc is called anew in every process.  So it
looks like the forked process is losing a pointer.


Thanks,
Corinna


More information about the Cygwin-patches mailing list