[PATCH] c++: implement C++17 hardware interference size

Fri Jul 16 21:23:44 GMT 2021

On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > > I don't understand how this feature would lead to false sharing. But
> maybe
> > > I
> > > misunderstand the spatial prefetcher. The first access to one of the
> two
> > > cache
> > > lines pairs would bring both cache lines to LLC (and possibly L2). If a
> > > core
> > > with a different L2 reads the other cache line the cache line would be
> > > duplicated; if it writes to it, it would be exclusive to the other
> core's
> > > L2.
> > > The cache line pairs do not affect each other anymore. Maybe there's a
> > > minor
> > > inefficiency on initial transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also get line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive,
> > it also gets x1. Irrelevant of the state x1 comes into core B's private
> L2
> > cache
> > it invalidates the exclusive state on cache-line x1 in core A's private
> L2
> > cache. If this was done in a loop (say a simple `lock add` loop) it would
> > cause
> > pingponging on cache-lines x1/x2 between core A and B's private L2
> caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched
> data
> from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct and I'm incorrect that a request from LLC to L2
will invoke the spatial prefetcher. So not issues with 64 bytes. Sorry for
the added confusion!

>
> 2. The section is about data prefetching. It is unclear whether the
> spatial
> prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers
> fetched
> data"), which indicates to me that Intel isn't doing this for newer
> generations anymore.

> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache
> line A and the adjacent cache line B thus is also loaded to LLC. Core 2
> request a read of line B and thus loads line A into LLC. Now both cores
> have
> both cache lines in LLC. Core 1 writes to line A, which invalidates line A
> in
> LLC of Core 2 but does not affect line B. Core 2 writes to line B,
> invalidating line A for Core 1. => no false sharing. Where did I get my
> mental
> cache protocol wrong?

> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
>
>
>
>