This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 02/10] Add the low level infrastructure for pthreads lock elision with TSX


On Fri, Jan 11, 2013 at 10:23:14PM +0100, Torvald Riegel wrote:
> This can be good or bad depending on how you see it.  It is good for
> people that want to tune.  But it also indicates that HLE _needs_ to be

This implementation does not use HLE (in the sense of the TSX ISA
interface), so this has nothing to do with HLE.

> tuned to be worthwhile, which worries me.

Modern computers are complex and need to be tuned in general.

> 
> I know that HLE is the first LE HW that we can experiment with (at
> least, I'm not aware of any other mainstream CPU having something
> similar).  And the fact that we can't yet talk about performance is
> another obstacle.  But we should try really hard to avoid requiring
> users to tune our locks, including whether to use HLE or not.

The goal is to have reasonable defaults. But until we understand
what good defaults are, we need a tuning mechanism so that people
can find them. I don't want to explain to everyone who wants to play
with this how to rebuild glibc, so a runtime mechanism is needed.

Also, I actually disagree that exposing tunables is a bad thing.

The current glibc strategy of hardcoding dubious magic numbers from 10+
years ago (as is done for the adaptive lock) is a poor one, and contrary
to any good continuous performance improvement strategy.

Of course most users will not tune, but the few that do can give you
priceless feedback, and they need to have appropriate mechanisms.

> Also, if we want to expose tuning parameters, my gut feeling is that at
> least long-term, we'd need something different / better for how to set
> those parameters.

I considered a configuration file, but it has additional overhead.
Right now, based on my own experience, the environment variables work
fine.
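
A rough sketch of what such a runtime switch could look like. The
variable name DEMO_MUTEX_ELISION, its values, and all demo_ helpers are
made up for illustration; this mail does not show the names the patch
actually uses. The parser is kept separate from the getenv() call so it
can be exercised without touching the environment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tri-state setting.  The real variable names and values
   used by the patch are not shown in this mail.  */
enum demo_elision {
    DEMO_ELISION_DEFAULT,   /* let the library decide */
    DEMO_ELISION_ON,        /* user asked for elision */
    DEMO_ELISION_OFF        /* user asked for no elision */
};

/* Parse the value of the (hypothetical) tuning variable.
   NULL or an unrecognized string keeps the built-in default.  */
enum demo_elision demo_parse_elision(const char *value)
{
    if (value == NULL)
        return DEMO_ELISION_DEFAULT;
    if (strcmp(value, "elision") == 0)
        return DEMO_ELISION_ON;
    if (strcmp(value, "none") == 0)
        return DEMO_ELISION_OFF;
    return DEMO_ELISION_DEFAULT;
}

/* Read the setting from the environment, e.g. once at startup.
   Changing it needs a new process, not a rebuilt library.  */
enum demo_elision demo_elision_from_env(void)
{
    return demo_parse_elision(getenv("DEMO_MUTEX_ELISION"));
}
```

With something like this, an experiment is just a matter of starting
the program with a different environment, e.g.
DEMO_MUTEX_ELISION=none ./program.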

> If we'd expose all tunables in pthreads code to the
> same extent and in the same way (e.g., max spin count for adaptive
> mutexes), it would be a mess.

It would be a great improvement over the current "generally poor"
state.

> So, I don't want to blame your patch for this, but IMHO this is
> something the project should investigate and provide guidance on.
> 
> > Lock elision can be enabled/disabled using environment variables.
> > It can be also enabled or disabled using new lock types for
> > mutex and rwlocks.
> 
> I think we already have too many mutex types.  It's good to provide

Elision is not really adding any new types; it's just adding two
additional flags (elision and no elision).

The initializers do not support flags well, that is why there are
multiple combinations. But that's how pthreads was defined.

> choice, but if we'd really optimize our locks, we'd end up with _a lot_
> more types.  Should we really have one lock type for each of cross-NUMA,
> one with more fairness and one without, one that spins and one that
> doesn't (yes, that's a current case), and so on?  And do we then also
> build combinations of all of them and with or without HLE?  And what if,
> for example, Intel comes up with another HLE implementation that is more
> powerful?  Will we then have HLE2 initializers?  Or HLE_vendorX?

The source interface was designed to be portable, so anyone doing lock
elision on any platform can use it. There is nothing vendor-specific in
it (the only vendor-specific parameters are exposed in the tunable
interface).

If there were multiple elision types, I would expect those to be handled
through the separate tuning interface (but I don't see that happening).

Right now you need two more initializers for each lock type that
supports elision. There are only two lock types right now that do,
so you have four more. Not an unreasonable number. The only
other lock type I would expect to elide in the future is recursive,
so there may be two more.
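
The counting can be illustrated with a small sketch of a lock kind that
carries two orthogonal flag bits. All DEMO_ names and bit values here
are invented for the example; they are not the constants from the
patch.

```c
/* Hypothetical base kinds (low bits) and elision flags (high bits).  */
enum {
    DEMO_KIND_TIMED      = 0,
    DEMO_KIND_ADAPTIVE   = 1,
    DEMO_KIND_MASK       = 0xff,

    /* Two orthogonal flags: force elision on, or force it off.
       Neither bit set means "use the default policy".  */
    DEMO_FLAG_ELISION    = 0x100,
    DEMO_FLAG_NO_ELISION = 0x200
};

struct demo_mutex {
    int lock;   /* lock word, 0 = free */
    int kind;   /* base kind ORed with flags */
};

/* A static initializer cannot take flags as a run-time argument, so
   every kind/flag combination gets its own macro:
   2 kinds x 2 flags = 4 additional initializers.  */
#define DEMO_TIMED_ELISION_INIT \
    { 0, DEMO_KIND_TIMED | DEMO_FLAG_ELISION }
#define DEMO_TIMED_NO_ELISION_INIT \
    { 0, DEMO_KIND_TIMED | DEMO_FLAG_NO_ELISION }
#define DEMO_ADAPTIVE_ELISION_INIT \
    { 0, DEMO_KIND_ADAPTIVE | DEMO_FLAG_ELISION }
#define DEMO_ADAPTIVE_NO_ELISION_INIT \
    { 0, DEMO_KIND_ADAPTIVE | DEMO_FLAG_NO_ELISION }

/* The base kind is recovered by masking the flag bits away.  */
int demo_base_kind(const struct demo_mutex *m)
{
    return m->kind & DEMO_KIND_MASK;
}

int demo_elision_forced_on(const struct demo_mutex *m)
{
    return (m->kind & DEMO_FLAG_ELISION) != 0;
}

int demo_elision_forced_off(const struct demo_mutex *m)
{
    return (m->kind & DEMO_FLAG_NO_ELISION) != 0;
}
```

In this layout a recursive kind that supported elision would add two
more initializer macros and nothing else.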

> I think we should sit down and decide which locks we really need, and
> what they should offer.  And then see how HLE fits into it.

timed, adaptive, maybe recursive at some point.

> 
> For example, can we subsume HLE under adaptive mutexes, or at least

Lock elision is orthogonal to how you do the fallback lock.
It's just another attribute, like PI.  I don't think it makes
sense to mix the two.

> Note that C++11 also requires the lock to be owned before an unlock()
> call.  Same for C11.

Yes, any program hitting this is non-conformant.

> 
> >   There are ways around this with some tradeoffs (more code in hot paths)
> 
> Can you summarize the workarounds and the associated costs?

For unlock it's adding _xtest() in the unlock path.
For the others it's usually adding _xabort(), which leads nested
locking to not elide.

I plan to make those changes if any programs are affected by this.
However, I first want to find out whether there are any.
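
A sketch of the unlock workaround, under assumptions not stated in this
mail: the unlock path is taken to recognize an elided lock by seeing a
free lock word, and the RTM intrinsics (_xbegin, _xend, _xtest from
<immintrin.h>) are replaced by software stand-ins named demo_* so the
example runs on machines without TSX. A real XEND outside a transaction
raises a general protection fault; the model only counts it.

```c
#include <assert.h>

/* Software stand-ins for the RTM intrinsics (illustrative only).  */
int demo_txn_depth;     /* > 0 means "inside a transaction" */
int demo_fault_count;   /* counts XENDs issued outside a transaction */

int demo_xtest(void)
{
    return demo_txn_depth > 0;
}

void demo_xbegin(void)
{
    demo_txn_depth++;
}

void demo_xend(void)
{
    if (demo_txn_depth == 0)
        demo_fault_count++;   /* real hardware would fault here */
    else
        demo_txn_depth--;
}

struct demo_mutex {
    int word;   /* 0 = free, 1 = really locked */
};

/* Elided lock: start a transaction and leave the lock word untouched.
   The sketch only covers the uncontended case.  */
void demo_lock_elided(struct demo_mutex *m)
{
    assert(m->word == 0);
    demo_xbegin();
}

/* Unlock without the check: a free lock word is taken to mean
   "we elided", which is wrong if the caller never locked.  */
void demo_unlock_unchecked(struct demo_mutex *m)
{
    if (m->word == 0)
        demo_xend();
    else
        m->word = 0;
}

/* Workaround: one extra xtest in the unlock path.
   Returns 0 on success, -1 if the mutex was not held.  */
int demo_unlock_checked(struct demo_mutex *m)
{
    if (m->word == 0) {
        if (!demo_xtest())
            return -1;        /* unlock of a mutex nobody holds */
        demo_xend();
    } else {
        m->word = 0;
    }
    return 0;
}
```

The cost is the extra test on every elided unlock, which is the "more
code in hot paths" tradeoff mentioned in the quoted text.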


> >   This will also happen on systems without RTM with the patchkit.
> >   I'm still undecided on what approach to take here; have to wait for testing reports.
> 
> What kind of tests did you run so far?

A lot of large real programs, various benchmark suites, etc.

> 
> > - pthread_mutex_destroy of a locked mutex will not return EBUSY but 0.
> 
> That should be fine, as it's a may-fail requirement.  You could add code
> to write to the lock value itself too to ensure that there will be no
> other thread holding the mutex (and thus get a correct return value).
> The current pthread_mutex_destroy code already writes to the mutex kind,
> so this should have negligible overhead.

Yes, it would be possible to abort, if it's a problem.

> I see that you are using RTM instead of the HLE facilities (ie, XACQUIRE
> and XRELEASE) that would give you the expected trylock behavior, which
> is interesting.
> I assume you do this to be more flexible regarding when to fall back to
> nonspeculative execution?
> 
> Have you tried to use HLE within an RTM transaction?  That would give

HLE inside RTM aborts in current TSX implementations.

> > - Same applies to the rwlocks. Some of the return values change
> >   (for example there is no EDEADLK for an elided lock, unless it aborts.
> >    However when elided it will also never deadlock of course)
> 
> The EDEADLK case is just a may-fail, so this is fine; as you say, there
> is no deadlock in this case.  Are there any other changes in semantics
> besides wrong return values of trywrlock and tryrdlock?

Elided locks always behave like reader locks, so a write lock can
behave like a read lock towards other acquisitions of the same write
lock (e.g. a write trylock inside a write trylock will now succeed).
This is the same as with the mutexes.
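
The nested trylock difference can be shown with a toy single-threaded
model. It assumes an elided acquisition only reads the lock word and
never writes it; the struct and the demo_ functions are illustrative
and do not come from the patch.

```c
#include <errno.h>

struct demo_lock {
    int word;        /* 0 = free, 1 = really locked */
    int txn_depth;   /* models (possibly nested) transactions */
};

/* Plain trylock: acquiring writes the lock word, so a nested attempt
   on the same lock sees it taken and fails.  */
int demo_trylock_plain(struct demo_lock *l)
{
    if (l->word != 0)
        return EBUSY;
    l->word = 1;
    return 0;
}

/* Elided trylock: only reads the lock word, like a reader, and then
   runs speculatively.  The word stays 0, so a nested attempt on the
   same lock also finds it free and succeeds.  */
int demo_trylock_elided(struct demo_lock *l)
{
    if (l->word != 0)
        return EBUSY;
    l->txn_depth++;
    return 0;
}
```

Calling demo_trylock_plain() twice on a fresh lock gives 0 and then
EBUSY, while calling demo_trylock_elided() twice gives 0 both times.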

> I'd like us to discuss the semantics and high-level design in more
> detail first.  Thus, I have no comments yet on the specifics of your
> patch, but will review once we have consensus on the design.

On the contrary, I would prefer that people discuss concrete code
instead of various strawmen. In my experience, high-level theoretical
discussions of transactional memory often go off into the weeds; it's
much better to have concrete examples related to code.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

