This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: PowerPC: memset optimization for POWER8/PPC64
- From: Richard Henderson <rth at twiddle dot net>
- To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Mon, 21 Jul 2014 09:16:42 -1000
- Subject: Re: PowerPC: memset optimization for POWER8/PPC64
- References: <53C920CD dot 8030506 at linux dot vnet dot ibm dot com> <53C94952 dot 4010805 at twiddle dot net> <53CD12D3 dot 3060804 at linux dot vnet dot ibm dot com>
On 07/21/2014 03:17 AM, Adhemerval Zanella wrote:
> In fact, in this case it will need to write 1-15 bytes based on the 'clrldi' result. And
> for POWER8, although unaligned stores are handled with performance equivalent to
> aligned ones, in some cases POWER8 will either:
>
> * force the unaligned store to be broken into multiple internal operations (misaligned
> flushes when crossing a 128-byte cache-line boundary and when storing across a 4KB
> small-page boundary);
This, I assume, isn't actually a big deal. Internal operations floating around
the execute queue are certainly better than the pipeline flush caused by a
mis-predicted branch.
> * trigger an alignment interrupt in caching-inhibited storage. This is why I have
> pushed the patch 87868c2418fb74357757e3b739ce5b76b17a8929 on memcpy: if you use
> memcpy on DMA-mapped memory (from a GPU for instance), doing *any* unaligned
> store will result in an alignment interrupt. And I got reports that the X server
> is doing it (that's why the patch).
However, this is certainly a good reason. Thanks for the pointer.
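
For readers following the thread, the "write 1-15 bytes, then store aligned"
idea discussed above can be sketched in portable C. This is only an
illustration under stated assumptions, not the code from the patch under
discussion: the names head_bytes_to_align16 and sketch_memset_aligned are
hypothetical, the head is written one byte at a time for simplicity, and the
8-byte memcpy calls are assumed (not guaranteed) to be lowered by the compiler
to single naturally aligned stores. A real implementation for
caching-inhibited storage would be written in assembly so that the store
widths and alignment are under direct control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes (0..15) needed to advance p to the
   next 16-byte boundary.  The low four address bits play the role that the
   'clrldi' result plays in the quoted discussion. */
static size_t head_bytes_to_align16(const void *p)
{
    return (size_t)(-(uintptr_t)p & 15u);
}

/* Hypothetical sketch of a memset that only issues multi-byte stores to
   16-byte-aligned addresses. */
static void *sketch_memset_aligned(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;
    /* Replicate the fill byte into all eight bytes of a doubleword. */
    uint64_t pattern = (uint64_t)b * UINT64_C(0x0101010101010101);

    /* Head: byte stores until p is 16-byte aligned (or the buffer ends). */
    size_t head = head_bytes_to_align16(p);
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *p++ = b;

    /* Body: 16 bytes per iteration, always starting on a 16-byte boundary. */
    while (n >= 16) {
        assert(((uintptr_t)p & 15u) == 0);
        memcpy(p, &pattern, 8);      /* assumed to become one aligned store */
        memcpy(p + 8, &pattern, 8);  /* assumed to become one aligned store */
        p += 16;
        n -= 16;
    }

    /* Tail: remaining 0..15 bytes, again with byte stores. */
    while (n--)
        *p++ = b;

    return dst;
}
```

The trade-off raised in the reply still applies: the head and tail loops add
branches that a single unaligned store would avoid, so this shape is mainly
justified where unaligned stores are not merely slower but can trap.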
r~