This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: Use generic memset for RTLD for ppc32/64
- From: Luis Machado <luisgpm at linux dot vnet dot ibm dot com>
- To: Ulrich Drepper <drepper at gmail dot com>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Thu, 07 Oct 2010 18:55:59 -0300
- Subject: Re: [PATCH] powerpc: Use generic memset for RTLD for ppc32/64
- References: <1285607653.12736.111.camel@gargoyle> <AANLkTimyDKTu9T64FJJEWcLY_=UxDpviwc2zFMLaFNJH@mail.gmail.com> <1285643520.12736.122.camel@gargoyle> <AANLkTikWwDGEpg2tGbbtkE6kW7OGYwrydBxSXunGmp2P@mail.gmail.com> <1285769005.3709.9.camel@gargoyle>
- Reply-to: luisgpm at linux dot vnet dot ibm dot com
Hi Ulrich,
On Wed, 2010-09-29 at 11:03 -0300, Luis Machado wrote:
> On Mon, 2010-09-27 at 23:56 -0400, Ulrich Drepper wrote:
> > On Mon, Sep 27, 2010 at 23:12, Luis Machado <luisgpm@linux.vnet.ibm.com> wrote:
> > > We use a cache-based instruction (dcbz) to optimize memset when it's
> > > called with a 0 value (bzero). In short, we clear 128 bytes in a row and
> > > move to the next iteration.
> >
> > Then why not simply replace the 0x80 in the asm code with a macro?
> > All these temporary workarounds and especially for architectures
> > nobody but you care about. You have control over the sources used for
> > your own builds. So just provide final patches.
>
> It's not so simple. The code does pre-alignment to 128-bytes prior to
> using the instruction, so those chunks of code need to be executed
> conditionally.
>
> Is establishing a power4 rtld-memset acceptable at all or would you
> rather have the power4 memset code modified?
I see you checked in the more generic patch I sent earlier. But the
discussion was starting to move toward a narrower solution, making
the change more ppc-server-specific, without interfering too much with
ppc archs other than IBM's. Since our base architecture is power4,
we only need to make power4's RTLD code generic.
So, power4's rtld-memset.S would point to the generic powerpc ASM
implementation (sysdeps/powerpc/powerpc[32|64]/memset.S). This code
already takes different cache-line sizes into consideration, and should
be slightly faster than the code GCC generates for the C version of
memset.
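As a rough illustration of what "taking different cache-line sizes into
consideration" means for the zeroing path, here is a C sketch. It is not
the actual powerpc assembly: zero_by_cache_line and its line_size
parameter are hypothetical names, line_size is assumed to be a power of
two, and a plain memset call stands in for each dcbz.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero N bytes at DST, treating LINE_SIZE (assumed
   to be a power of two) as the cache-line size.  The memset call in the
   middle loop stands in for one dcbz, which clears a whole aligned
   cache line.  */
static void *
zero_by_cache_line (void *dst, size_t n, size_t line_size)
{
  unsigned char *p = dst;

  /* Head: clear bytes until P sits on a cache-line boundary.  */
  size_t misalign = (uintptr_t) p & (line_size - 1);
  size_t head = misalign ? line_size - misalign : 0;
  if (head > n)
    head = n;
  memset (p, 0, head);
  p += head;
  n -= head;

  /* Body: one full, aligned line per iteration (dcbz in the asm).  */
  while (n >= line_size)
    {
      memset (p, 0, line_size);
      p += line_size;
      n -= line_size;
    }

  /* Tail: whatever is left is shorter than one line.  */
  memset (p, 0, n);
  return dst;
}
```

With line_size hardcoded to 128 this matches the pre-alignment plus
128-bytes-per-iteration scheme described in the quoted text; passing it
in instead is the part that lets the same code run on parts with a
different line size.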
Does that sound like a better solution to you?
Thanks,
Luis