[PATCH] c++: implement C++17 hardware interference size

Matthias Kretz m.kretz@gsi.de
Fri Jul 16 15:12:04 GMT 2021


On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > Currently the patch does not adjust the values based on -march, as in JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values.  Are
> > there -march values for which a smaller range than 64-256 makes sense?

As a user who cares about ABI but also cares about maximizing performance of
builds for a specific HPC setup, I'd expect the hardware interference size
values to be allowed to break ABIs. The point of these values is to give me
better performance portability (but not necessarily binary portability) than
my usual "pick 64 as a good average".
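
A minimal sketch of the kind of user code this is about, assuming a C++17
library that may or may not define the constant. The names destructive_size,
PaddedCounter, Counters and counter_distance are hypothetical and only serve
as illustration; the fallback of 64 is the "good average" guess mentioned
above.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Use the library constant when it is available; otherwise fall back to
// the hand-picked guess of 64 (an assumption, not a measured value).
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
#else
constexpr std::size_t destructive_size = 64;
#endif

// Hypothetical counter type: the alignment keeps two counters that are
// written by different threads at least destructive_size bytes apart.
struct alignas(destructive_size) PaddedCounter {
    std::atomic<long> value{0};
};

struct Counters {
    PaddedCounter producer;
    PaddedCounter consumer;
};

static_assert(alignof(PaddedCounter) == destructive_size,
              "counter is aligned to the interference size");
static_assert(sizeof(Counters) >= 2 * destructive_size,
              "each counter occupies its own aligned block");

// Distance in bytes between the two counters inside one Counters object.
inline std::size_t counter_distance(const Counters& c) {
    const unsigned char* a =
        reinterpret_cast<const unsigned char*>(&c.producer);
    const unsigned char* b =
        reinterpret_cast<const unsigned char*>(&c.consumer);
    return static_cast<std::size_t>(b - a);
}
```

Because the constant feeds into alignas, sizeof(Counters) changes whenever
the value changes. That is the ABI effect under discussion.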

Wrt. -march / -mtune setting the hardware interference size: IMO -mtune=X
should be interpreted as "my binary is supposed to be optimized for X, I
accept inefficiencies on everything that's not X".

On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> On intel x86 systems with a private L2 cache the spatial prefetcher
> can cause destructive interference along 128 byte aligned boundaries.
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60

I don't understand how this feature would lead to false sharing. But maybe I
misunderstand the spatial prefetcher. The first access to one of the two cache
lines of a pair would bring both cache lines to LLC (and possibly L2). If a
core with a different L2 reads the other cache line, that line would be
duplicated; if it writes to it, the line would be exclusive to the other
core's L2. After that, the two lines of the pair do not affect each other
anymore. Maybe there's a minor inefficiency on the initial transfer from
memory, but isn't that all?

That said, Intel documents the spatial prefetcher exclusively for Sandy
Bridge. So if you still believe 128 is necessary, set the destructive hardware
interference size to 64 for all of x86 except -mtune=sandybridge.
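
Expressed as code, the suggestion amounts to something like the sketch below.
This is not GCC's implementation: destructive_size_for_tune is a hypothetical
helper, and the string comparison merely stands in for however the compiler
identifies the tuning target.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical mapping from an x86 -mtune value to the destructive
// interference size: 128 only for Sandy Bridge, 64 for everything else.
constexpr std::size_t destructive_size_for_tune(std::string_view tune) {
    return tune == "sandybridge" ? 128 : 64;
}

static_assert(destructive_size_for_tune("sandybridge") == 128,
              "Sandy Bridge is the only special case");
static_assert(destructive_size_for_tune("generic") == 64,
              "all other x86 tunings use 64");
```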

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

