This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 07/10] Add __pthread_set_abort_hook export


On Sat, 2013-01-19 at 00:46 +0100, Andi Kleen wrote:
> > So, what's the use case for TXN_ASSERT? 
> 
> Two use cases are:
> 
> Performance debugging:
> - You have a complex transaction specific code path (e.g. with _xtest())
> and there is a bug in it that causes unexpected aborts. As you say
> this cannot cause functional bugs, but the unexpected abort can 
> be still useful to fix to get better performance.
> 
> You can see the abort in the profiler, and get LBRs for some context,
> but for complex code flows it's useful to have assert() to check
> internal conditions too.
> 
> Correctness debugging:
> - In rare cases programs may break due to timing dependencies or
> existing races exposed by much faster locks. In this case it's useful
> to have a debug facility that works in speculation too, so that
> inconsistencies in other threads can be detected faster.

So, trying to summarize why you're interested in something like
__pthread_set_abort_hook (please correct anything that's wrong):

(1) This is not necessary for existing and correctly synchronized code
because assertions will also fail in nontransactional executions (and
the failure will be reported as expected).

(2) For explicitly transactional code (ie, code in which some programmer
explicitly used TSX), you want a facility to communicate some
information out of transactions without having to finish execution of
these transactions.

(3) If programs are incorrectly synchronized, you want to give
programmers a debugging facility.  You want this to take the form of
manually added assertions.


Because of (1), I think we should decouple this patch from the rest of
the lock elision patches.


For (2), if the explicitly transactional code is correct and it's just
performance issues we want to debug, we could do without terminating the
transaction (unless we have to write so much data out that we're hitting
HTM capacity limits, etc.).  That is, I'm wondering whether assertions
are the right tool for this.

For (3) and also (2) if it's not just a performance problem, we need to
terminate the current transaction to be able to get information out of
it when we can't continue to execute it.  With TSX, we can either use
the 8 bits that we can communicate via abort, or we could commit the
transaction early, and then abort.

Early commit would work if just RTM is used (ie, while (_xtest())
_xend(); ).  But I guess it would fail if xacquire/xrelease is mixed
in, or does TSX not complain about replacing xrelease with an RTM
commit?

If TSX complains, we get a fault, IIRC, so when this fault happened
within the code with the loop above, we'd still know that some assertion
fired.  If we inline this code, or add other hints regarding what called
it, I guess we could find out which assertion triggered the fault by
looking at the code around where the fault happened?  Thoughts?

The advantage over the abort approach would be that we can just handle
this locally in TXN_ASSERT or even assert(), and that we can communicate
a larger amount of information.
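
To make the early-commit idea concrete, here is a minimal sketch of a locally handled TXN_ASSERT.  Everything named here (struct txn_ops, txn_commit_all, txn_check, the sim_* helpers, and the extra ops argument to TXN_ASSERT) is hypothetical and not part of the patch.  The two hooks stand in for _xtest() and the RTM commit intrinsic _xend(); they are pluggable so that the sketch can be exercised without TSX hardware, and the simulated backend merely tracks a nesting depth.  It covers only the RTM-only case and says nothing about what happens when xacquire/xrelease is mixed in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hooks.  On TSX hardware these would wrap _xtest() and
   _xend() from <immintrin.h>. */
struct txn_ops {
    int  (*in_txn)(void);   /* nonzero while executing transactionally */
    void (*commit)(void);   /* commit one nesting level */
};

/* Commit every nesting level so that later stores and I/O are not
   rolled back.  Returns the number of levels committed. */
static int txn_commit_all(const struct txn_ops *ops)
{
    int levels = 0;

    assert(ops && ops->in_txn && ops->commit);
    while (ops->in_txn()) {
        ops->commit();
        levels++;
    }
    return levels;
}

/* If the condition is false, commit early and report the failure.
   Returns 1 on failure and 0 otherwise; a real assertion would call
   abort() after reporting instead of returning. */
static int txn_check(const struct txn_ops *ops, int cond,
                     const char *expr, const char *file, int line)
{
    if (cond)
        return 0;

    int levels = txn_commit_all(ops);
    fprintf(stderr, "%s:%d: TXN_ASSERT(%s) failed (committed %d level(s))\n",
            file, line, expr, levels);
    return 1;
}

#define TXN_ASSERT(ops, cond) \
    txn_check((ops), !!(cond), #cond, __FILE__, __LINE__)

/* Simulated backend: a plain counter plays the role of the hardware
   nesting depth. */
static int sim_depth;
static int sim_in_txn(void)  { return sim_depth > 0; }
static void sim_commit(void) { sim_depth--; }
static const struct txn_ops sim_ops = { sim_in_txn, sim_commit };
```

With sim_depth set to 3, a failing TXN_ASSERT(&sim_ops, x == y) commits all three simulated levels before printing, whereas a passing one leaves the depth untouched.  Because the report is produced after the commit, it can carry an arbitrary amount of information, not just 8 bits.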


The abort approach works if xacquire and xbegin regions are mixed, but
only if the outermost transaction was not started with xacquire.  But
with TSX we just have <255 values that we can get out (ie, without the
values reserved for hold locks etc.).  And when we abort, we jump to
whatever started the outermost transaction, which could be code in
applications (programmers using transactions explicitly), glibc (e.g.,
lock elision), libstdc++ (if it doesn't use glibc locks), boost
(likewise), libitm (__transaction { }), and so on.  So to make this work
in general, all those components would have to support the special
assertions.

To actually support the assertions, abort codes need to be interpreted
consistently, and all assertions in a process need to be encoded using
<255 values.
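
As an illustration of what "interpreted consistently" would require, the sketch below encodes assertion ids into the 8-bit abort code and decodes them again from an abort status word.  The status layout (bit 0 marks an explicit abort, bits 31:24 carry the code) mirrors _XABORT_EXPLICIT and _XABORT_CODE from <immintrin.h>, but is defined locally so that the sketch builds without TSX support.  The split of the code space (0x00..0x0f left to lock elision and other runtime users, 0x10..0xff for assertions) and all txn_* names are assumptions made up for this example; nothing in the patch defines such a convention.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the RTM abort status layout. */
#define TXN_STATUS_EXPLICIT  (1u << 0)
#define TXN_STATUS_CODE(s)   (((s) >> 24) & 0xffu)

/* Hypothetical process-wide convention for the 8-bit code space. */
enum {
    TXN_ASSERT_BASE  = 0x10,                     /* first assertion code */
    TXN_ASSERT_SLOTS = 0x100 - TXN_ASSERT_BASE   /* number of assertion ids */
};

/* Map an assertion id (0 .. TXN_ASSERT_SLOTS-1) to an abort code. */
static inline uint8_t txn_assert_encode(unsigned id)
{
    assert(id < TXN_ASSERT_SLOTS);
    return (uint8_t)(TXN_ASSERT_BASE + id);
}

/* Build the status word that an explicit abort with this code would
   produce; only used to exercise the decoder without hardware. */
static inline uint32_t txn_status_from_code(uint8_t code)
{
    return ((uint32_t)code << 24) | TXN_STATUS_EXPLICIT;
}

/* Return the assertion id carried by an abort status, or -1 if the
   abort was not explicit or its code lies in the reserved range. */
static inline int txn_assert_decode(uint32_t status)
{
    if (!(status & TXN_STATUS_EXPLICIT))
        return -1;

    unsigned code = TXN_STATUS_CODE(status);
    if (code < TXN_ASSERT_BASE)
        return -1;
    return (int)(code - TXN_ASSERT_BASE);
}
```

Note that _xabort() takes its code as an immediate, so a real macro would have to produce the encoded value at compile time.  Every component that can start an outermost transaction would also need to run the decoder in its abort path for this scheme to work.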

Who is supposed to be the consumer of the abort codes? (I've asked this
previously, but you haven't answered.)  Is this code in the program, or
something else?  This matters because it's the other end of the
assertion, obviously.

Overall, I think the approach you take now is fragile in bigger scenarios
with libraries etc., but could be sufficient in simple or tightly
controlled experiments.  I'd prefer if we could do something as outlined
previously that does not rely on hooks into abort handling and such, but
haven't had time for a proof of concept.

What do you think?  Are there any other alternatives?


Torvald

