This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
- From: Torvald Riegel <triegel at redhat dot com>
- To: Andi Kleen <andi at firstfloor dot org>
- Cc: Alexander Monakov <amonakov at ispras dot ru>, Roland McGrath <roland at hack dot frob dot com>, Andi Kleen <ak at linux dot intel dot com>, libc-alpha at sourceware dot org
- Date: Thu, 10 Apr 2014 22:35:01 +0200
- Subject: Re: Optimization of conditional stores (was: Re: [PATCH] Add adaptive elision to rwlocks)
- Authentication-results: sourceware.org; auth=none
- References: <1396652083-18920-1-git-send-email-andi at firstfloor dot org> <20140404234516 dot 3DFAD74446 at topped-with-meat dot com> <20140405003759 dot GQ32556 at tassilo dot jf dot intel dot com> <20140405044201 dot 9B44D74445 at topped-with-meat dot com> <alpine dot LNX dot 2 dot 00 dot 1404071824530 dot 2531 at monopod dot intra dot ispras dot ru> <20140407161055 dot GV22728 at two dot firstfloor dot org> <alpine dot LNX dot 2 dot 00 dot 1404072027420 dot 2624 at monopod dot intra dot ispras dot ru> <20140407175227 dot GW22728 at two dot firstfloor dot org>
On Mon, 2014-04-07 at 19:52 +0200, Andi Kleen wrote:
> On Mon, Apr 07, 2014 at 08:54:57PM +0400, Alexander Monakov wrote:
> > > > I would also suggest making the intent (perform the store only when necessary)
> > > > explicit, and make sure to disallow the compiler optimization, for example:
> > > >
> > > > if (*ptr != value)
> > > > *(volatile typeof(*ptr)*)ptr = value;
> > >
> > > That's really ugly.
> >
> > I simply expanded kernel's ACCESS_ONCE macro by hand for the sake of the
> > example.
>
> Ok I guess we can add a ACCESS_ONCE()
>
> The existing elision code also uses this construct.
I'm not a kernel developer, but I think there ACCESS_ONCE is rather used
to get reads in spin-loops right (in terms of forward progress). If we
transition to C11 atomics eventually, that specific workaround won't be
needed because just having an atomic load in a spin loop will be enough
to tell the compiler to not optimize the load out.
However, this case here is different because it's not about forward
progress but about whether idempotent stores are fine or not. I guess
it's really a case of inefficient code generation if the compiler is
optimizing out checks for idempotent stores. Alexander's example is
slightly different because there, the compiler sees that the location
just has been written to. It shouldn't assume this in the lock elision
code.
Thus, I think it's not worth adding (something else than) ACCESS_ONCE
here; rather, someone should look at the compiler. Maybe the case would
be clearer for the compiler if we'd actually used atomic accesses and
atomic-typed variables (because for synchronizing code, it's more likely
to be inefficient if you just store, idempotent or not...).