This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Lock elision: Give PTHREAD_MUTEX_NORMAL-like mutexes a new internal type.


On Fri, 2013-06-28 at 08:57 +0200, Dominik Vogt wrote:
> On Thu, Jun 27, 2013 at 11:18:15AM +0200, Torvald Riegel wrote:
> > On Thu, 2013-06-27 at 07:38 +0200, Dominik Vogt wrote:
> > > On Tue, Jun 25, 2013 at 02:39:58PM +0200, Torvald Riegel wrote:
> > > Then let me rephrase my question:  How can you know what you're
> > > interested in with zero performance testing up to now?
> > 
> > What does zero performance testing refer to?  Zero wide-spread testing
> > by exposing it to real users?
> 
> Zero means zero.  Have you seen a single real number except the ones
> I have posted?
> 
> Actually, I find the discussion about ABI changes and new APIs
> quite strange.  If people were not so intent on getting patches
> into 2.18, we could experiment with the interfaces to our hearts'
> desire and decide which changes and additions are worth it _after_
> some testing is done.  There seems to be so much talk about
> fruitless points, for example, the vast majority of software is
> _not_ going to benefit from elision (because mutexes are used only
> sparingly, in ways that kill elision, or not at all), so trying to
> enable elision for all applications looks irrelevant in my eyes.

Enabling elision won't hurt anything or anyone if there are no ABI/API
changes.  If it turns out to be a bad idea as you assume it will, we can
just get rid of it again without having created any maintenance
overhead.

We enable wide-spread testing by making it available, and this
wide-spread testing is what really counts.
So I don't see any risk or loss in allowing for elision.  Anyone who
wants to test elision can enable the configure switch.  Everyone else is
practically not affected.

I'm not sure which other risks you see, or which costs.

> > > As it is now, you're just assuming or hoping for certain
> > > properties of transactional memory without any evidence that they
> > > exist in reality.  I _have_ data on transactional memory that
> > > suggests that your hopes will not come true.
> > 
> > Then post this data.  I assume that you have data on how Haswell's
> > transactional memory (TM) performs, because that's what Andi's patches
> > are about.
> 
> I have already posted data - for z/architecture, of course.

But this data seemed to be flawed.  Did it turn out to be correct, or is
the cause for the anomalies that we were looking for back then already
known?

> I've not seen any numbers on Haswell, only vague promises that
> "everything will be great", and I know enough about HTM to believe
> that this is far too optimistic.
> 
> Unfortunately I cannot post the test programs for the time being,
> but only describe the algorithms.  And I _can_ run test programs
> written by someone else and post (relative) results (i.e.  relative
> performance of glibc without elision patches, with elision patches
> but disabled, and with elision enabled).

If we get a low-risk elision implementation in (ie, no ABI/API changes),
everyone can do their own testing.  To me, that seems to be a good way
to resolve this question.

> > > As far as I know, nobody has ever done real application tests with
> > > transactional memory.
> > 
> > There's published work for STMs on real applications like memcached.
> 
> All these tests with STM are not applicable to HTM because nobody
> has ever bothered to simulate cache effects of a real HTM (as far
> as I know).

The evaluations of AMD's ASF did use a simulator (an improved PTLSim)
that was pretty accurate (e.g., see the Eurosys 2010 paper or the papers
by Diestelhorst et al.).  IIRC, caches were simulated precisely except
that all the cores were equal cores without any higher-level
hierarchy (ie, as in real AMD CPUs), and the cache timing model didn't
have all the bells and whistles.

> But almost the whole implementation of HTM is a sum
> of cache effects.  For example STM implementations never abort
> transactions because of cache line conflicts like HTM
> implementations do.  STM test results can be surpassed by HTM in
> some aspects, but they are always too optimistic regarding abort
> ratio.
> 
> > Sun has done tests on real code back when they worked on the Rock TM.
> > No published papers on Haswell TM performance AFAIK, but that's no
> > surprise given the hardware is new.
> 
> I.e. no publicly available data except marketing stuff.  Not even
> real hardware.

There was real hardware AFAIU.  I know the folks who did those
experiments, and I trust them enough to believe that what they published
wasn't just marketing stuff.

> > > I'll never believe someone has done real world tests unless he
> > > documents the precise test setup so that everybody can repeat the
> > > tests.  This is because I tried to do these real world tests
> > > myself and was unable to find a suitable application that could
> > > substantially benefit from lock elision
> > 
> > Lock elision isn't equal to TM.  TM is the general programming
> > abstraction.  HTM and STM are hardware/software implementations of TM.
> > Lock elision is something that you can implement with an HTM or STM, but
> > STM will be slower of course.
> 
> Sure, but lock elision will certainly not surpass the benefits
> that are possible with HTM if it is implemented using HTM.
> 
> > > I posted test results some days ago.  The 22 to 45 percent
> > > performance loss even with elision disabled does not count?
> > 
> > As far as I remember this thread, it wasn't quite clear at that time
> > whether those results were correct.
> 
> The results in the separate thread are real.

But it seemed to me that some things were odd to the extent that it
didn't make sense to us (e.g., a bug in the implementation, something
else, ...), and that you wanted to find out what was going on.  So while
the results may have been real, it doesn't mean that they were correctly
assessing the potential of HTM or lock elision.  Nor, of course, that
this would apply to *every* HTM.

Torvald

