This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro

From: Torvald Riegel <triegel at redhat dot com>
To: Alexandre Oliva <aoliva at redhat dot com>
Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
Date: Thu, 21 Nov 2013 17:26:14 +0100
Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
Authentication-results: sourceware.org; auth=none
References: <20131113081059 dot 3464 dot 51385 dot stgit at frit dot home> <20131113081132 dot 3464 dot 30409 dot stgit at frit dot home> <1384859432 dot 32326 dot 364 dot camel at triegel dot csb> <orsiurva0g dot fsf at livre dot home> <1384956325 dot 3152 dot 591 dot camel at triegel dot csb> <orhab6t6m8 dot fsf at livre dot home>

On Thu, 2013-11-21 at 08:02 -0200, Alexandre Oliva wrote:
> On Nov 20, 2013, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > What about linking statically and using LTO?
> 
> That's something I'd considered at some point, before concluding that a
> lot of code in glibc just wasn't suited for this sort of use.

But that would be a limitation of the current glibc implementation, and
not an inherent property of any glibc implementation, right?  IOW, it
would be preferable to give such cases an "incorrectsync" tag or
similar, rather than building our safety properties around a current
limitation (ie, that MT-Safe only holds if LTO isn't used).

Either case, I think we should state that data races can lead to
undefined behavior even before the point were the abstract machine would
"encounter" the race.  That's the case in C/C++ too, so we're not taking
any guarantees away from users.

> A number
> of very small functions (feof comes to mind) are safe in spite of
> avoiding synchronization primitives because we assume they won't be
> inlined.  What makes them safe currently is their single access to a
> single word;

I wouldn't say that they are safe.  First, what non-inlining gives us is
that we get a (hopefully) high likelihood that the code that the
compiler will generate is similar to the code it would generate for an
atomic access with relaxed memory order.  Having this high likelihood
might be fine for now, but I think it would be better to change this to
atomic memory accesses eventually.

Second, accesses with relaxed memory order are often not sufficient to
produce sequential consistency.  Thus, on weak-memory-model
architectures, we can't anymore guarantee equivalence of a
multi-threaded execution of MT-Safe functions to some sequential
execution of the same functions; we just don't get a total order that
programmers can reason more easily about.

I know that whether or not MT-Safe implies sequential consistency isn't
quite made clear by the standard, but given that C and C++ are using it
as default, I think we need to revisit the implementation of those
functions after we've decided on a precise definition of MT-Safe.

> if we were to inline it along with other calls that take
> such shortcuts, this would no longer make for consistent executions,
> because of the very reordering possibilities you bring up.

Did you make a list of these cases?

We might also discover these cases if we should give variables used for
synchronization in glibc atomic types -- but having such a list of cases
would be helpful nonetheless.

> > Such a scenario might not be common today, but I'd prefer to just be
> > clear about what using MT-Unsafe functions in multi-threaded code can
> > result in.  It doesn't hurt us to be clear, and we don't gain anything
> > when we promise more than we can guarantee.
> 
> How about this:
> 
>   ...  Calling them within such contexts invokes undefined behavior.
> 
> ?

Maybe that's best, assuming that readers understand undefined behavior.
Perhaps we could add that it could have similar effects as a data race?

> >> >> +@item @code{envromt}
> >> >> +@cindex envromt
> 
> >> > Why didn't
> >> > you use a different MT-Safe tag (eg, MT-Safe-Env) for those functions,
> >> > but instead merged them with the "true" MT-Safe?
> 
> >> Same reason as glocale, really: once all methods to modify something are
> >> decreed unsafe to call in MT programs, this something becomes constant
> >> in conformant MT programs, and therefore the functions are indeed
> >> perfectly safe to call.
> 
> > That's true, but then you're still merging categories, namely functions
> > "MT-Safe under constraint X", where X could be the environment stuff, or
> > something else like glocale.  Or did I misunderstand your categories?
> 
> I don't know what you understood, but there appears to be some
> disconnect indeed.  I wouldn't have said âcategoriesâ, for example; what
> each keyword stands for is a (mis)feature detected in the implementation
> of certain functions.  Multiple such features may be present in each
> function.
> 
> Most such features make functions unsafe to call in certain contexts
> (i.e., when multiple threads are running, or within an asynchronous
> signal handler, or when async cancellation is enabled), but there are
> some kinds of exceptions:
> 
> a) some functions exhibit a misfeature, but the misfeature does not make
> it unsafe to call, e.g., it returns a newly-allocated piece of memory,
> file descriptor, and there's no way to stop that return value from
> leaking should a thread be canceled between the moment the function
> returns and the result being stored in a location where a cleanup
> handler could get to;
> 
> b) some of the misfeatures can be worked around by taking some
> additional care in user code
> 
> c) some of the misfeatures cause all functions that exhibit a certain
> behavior to be unsafe to call, which in turn ensures the harmful
> behavior won't ever be exercised in conforming programs, which thus
> makes a complementary misfeatures exhibited by other functions to become
> harmless.  We bind the complementary misfeatures together when some work
> around is possible that could make the unsafe functions safe to call, at
> the expense of coordinating calls to the unsafe functions and to those
> who were only regarded as safe because the unsafe behavior had been
> ruled out.  This is the case of glocale and envromt.

That is a useful summary; should it be made part of the documentation
(in suitably rephrased form)?

> 
> > Also, from a readers perspective, mentioning "global" constraints like
> > environment first before mentioning MT-Safe would be safer even if
> > TL;DR.
> 
> Sorry, I don't follow.  Can you expand on what you have in mind?

The documentation defines MT-Safe without mentioning that there are
constraints.  If a reader would just read the MT-Safe definition, and
see it being attached to some function, the reader could (falsely)
assume that it it MT-Safe without any constraints needing to be
considered.  That would be a TL;DR leading to a misinterpretation;
nonetheless, I think it's nicer to just say that MT-Safe can come with
constraints, and mention that they are explained further down in the
text.

I was speaking about "global" constraints because I incorrectly
remembered "envromt" just being attached to those modifying the
environment, not also to those environment-reading functions that are
MT-Safe under the no-concurrent-env-modification constraint.

This might show that I didn't pay enough attention or that I easily
forget things; OTOH, it could also indicate that it would be useful to
have a better explanation of how your annotations work :)

> > Well, what about "onceunsafe"?  Do you dislike this option?
> 
> Yeah.  For some reason it sounds to me like some native-Canadian city
> name ;-) I don't like the additional length much, and I wouldn't want to
> have to face that alliteration in a public speech ;-)

At least you can pronounce it when reading it out loud :)

> >> >> +Functions marked with @code{uunguard} modify non-atomically arguments or
> >> >> +global objects that other functions access without synchronization.  To
> >> >> +ensure MT- and AS-Safe behavior, callers should refrain from calling
> >> >> +so-marked functions concurrently with readers of those objects.  A
> 
> > Just readers or also other "writers"?
> 
> Yeah, writers too, of course ;-)  I moved that sentence down to a new
> paragraph and rewrote it altogether:
> 
>   In order to avoid the safety issues posed by these unguarded accesses,
>   callers must guard calls to @emph{all} readers and writers of the
>   affected objects with something equivalent to a @code{rwlock}, i.e.,
>   writers must get exclusive access, while multiple concurrent readers
>   may be allowed.
> 
> How's that?

That sounds alright to me.  For every function marked this way, is it
documented which the "affected objects" are, and obvious which function
can be writer?

> > If that's the case, why do you have uunguard at all, given that it's
> > just the negation of MT-Safe (or, be equal to MT-Unsafe)?
> 
> It documents a *reason* for a so-marked function to be MT-Unsafe.

But isn't it listed under MT-Safe constraints?  That is, this list:

+When a function is safe to call only under certain constraints, we will
+add keywords to the safety notes whose meanings are defined as follows:
+
+@itemize @bullet

> > Second, assuming that uunguard is not unlike MT-Unsafe, why is it listed
> > as a constraint of MT-Safe functions?
> 
> It's not a constraint; it's a (mis)feature that can be avoided by
> constraining the program in certain ways.  In this case, by explicitly
> guarding all potential readers and writers.

Will the readers get an annotation as well?  The text for uunguard just
mentions modifying functions.  For envromt, you said that readers are
annotated too.

But see below; maybe it's just that you listed all of them under in one
list about MT-Safe constraints...

> > What I'm saying is that the definition of MT-Safe at the very top should
> > link to the listing of constraints.
> 
> Hmm, is this a result of the disconnect/misunderstanding we've detected?
> I don't think linking the definition of MT-Safe to misfeatures that may
> cause functions to be MT-Unsafe makes much sense.

I think it does, because otherwise readers can read just the definition
of MT-Safe and aren't aware that it comes with constraints.  If you
don't follow up on the other annotations (which might be cryptic), then
a user will misunderstand the definition.  The MT-Safe definition looks
complete and self-contained (ie, the two sentences at the top of the
text), but it isn't; and you don't discover this until you've read all
of the document, or start wondering what the additional tag on a
function means.

> And then, while constraints may imply only MT-Unsafe, others may imply
> other kinds of unsafety, and others may imply multiple kinds of unsafety
> (e.g., one function could be AS-Unsafe and AC-Unsafe, while another
> could be just AC-Unsafe, because of the same misfeature)
> 
> > It's true that if you call an
> > MT-Safe function A and a function B that is MT-Safe-under-constraint-X
> > concurrently, and X doesn't hold, then this could be considered as
> > MT-Safe A being mixed with MT-Unsafe B.  But readers could also
> > interpret the first definition as "MT-Safe regardless".
> 
> I think I understand your concern.  The idea of marking both readers and
> writers with the same (or complementary) keywords ought to indicate not
> even the readers are not MT-Safe regardless, right?  E.g., consider the
> documentation of setlocale:
> 
>      | MT-Unsafe uunguard, envromt || AS-Unsafe oncesafe, selfdeadlock,
>      asmalloc, asynconsist || AC-Unsafe oncesafe, incansist, lockleak,
>      memleak, fdleak |
> 
> while iswdigit says:
> 
>      | MT-Safe glocale || AS-Safe || AC-Safe |
> 
> It *should* be possible to infer from the docs that setlocale has
> uunguard because of glocale (the documentation of uunguard says so), but
> maybe making it uunguard:glocale would make it easier to correlate
> readers (with just glocale) and writers (with uunguard:glocale).  Then,
> if you decide you realy need to call setlocale in a MT program, you set
> up and take a global rw lock for writing around setlocale and any
> uunguard:glocale-marked functions, and for reading around iswdigit and
> any other glocale-marked functions.

Yes, that would be useful.  This example also shows how uunguard relates
to glocale.  ISTM that it would be better to put uunguard in a separate
list, and first explain -- close to where you define MT-Safe etc. -- how
the constraints work, and which one is paired with which other.  Then a
reader knows how the whole model works, knows that he/she has to pay
attention to any tags next to MT-Safe, and just needs to remember the
individual constraints / "misfeatures".

> >> >> +Functions marked with @code{tempsig} may temporarily install signal
> >> >> +handlers for internal purposes, which may interfere with other uses of
> >> >> +those signals.  
> 
> >> >> +This makes such functions MT-Unsafe and AS-Unsafe to begin with.
> 
> >> > Why does it make such functions MT-Unsafe in every case?
> 
> >> Err...  Because they may interfere with other uses of those signals, and
> >> there's not much one can do to avoid this interference.  Calling the
> >> function concurrent from two different threads is a no-no.  Heck, since
> >> signals are not thread-aware, it's not even safe to call the function in
> >> a single thread; the signal it expects may be delivered to a different
> >> thread, failing to fulfill its purpose.
> 
> > But that doesn't mean that the signal handler must lead to bad behavior.
> > I was mainly wondering why installing a temporary signal handler would
> > always lead to this being MT-Unsafe.  I suppose that there are ways to
> > not make that MT-Unsafe, at least as far as glibc is concerned.
> 
> Ok, two comments on this.
> 
> 1. the reasoning is not sets-temporary-signals-p => MT-Unsafe.  It goes
> like: does this function set temporary signals in a way that causes
> thread-unsafe behavior?  If so, mark it with tempsig, because that's
> that's the keyword that documents this misfeature.  If some function
> could set temporary signals without causing a safety problem as
> described in tempsig, it just wouldn't get this note.

OK, that makes sense.

> 2. setting a signal temporarily means it ought to be restored before
> returning.  But signal handlers are a process-wide property; there
> aren't per-thread handlers.  So, when you set a signal handler in a
> thread and get the previous handler back, three kinds of problems may
> occur:

These reasons also make sense, but I believe they can be worked around
(at least as far as just glibc is concerned, ignoring the rest of the
application and whether it wants to handle signals (by establishing
process-wide consensus on whether some signals are to be handled or not
using suitable synchronization).

>   a. you may get canceled before you get the return value from the
>   signal call, so you can't restore the original handler => AC-Unsafe
> 
>   b. another thread may modify the signal handler, removing the handler
>   you'll later restore, or overriding your newly-installed handler; if
>   they ought to restore later, how do they synchronize so that they
>   restore yours before you restore theirs?  there's just no
>   standard-defined way to coordinate => MT-Unsafe
> 
>   c. you set a temp handler that restores its original handler; you get
>   another signal and its handler installs the temp handler for its own
>   uses, and then it gets the signal meant for you => you lose the signal
>   if the other temp handler doesn't call you back, and you mess with the
>   other handler if they do and you then restore the original handler =>
>   AS-Unsafe
> 
> >> > Given that we control the allocator, I believe that it can be made
> >> > atomic (without disabling signals or reentrancy).
> 
> >> That's not enough; we'd have to make *storing* the address of the
> >> allocated memory into its destination an atomic part of the allocation
> >> atomic unit, otherwise memory would still leak after we returned it, if
> >> cancellation occurred before the returned value was stored in the
> >> destination.
> 
> > One can build wait-free atomic updates.
> 
> How would that help?
> 
>   foo = malloc(bar);
> 
> if the thread is canceled just after malloc returns but before the
> returned value is stored in foo, how would you clean it up?

You'd need the handshake to extend to the scope that will trigger
deallocation (ie, the caller).  As you mentioned, and as I confirmed,
that would need an extension of the interface.

> >> They take a lock and don't release it, so the taken lock leaks.  I
> >> understand the distinction you make, but I fail to see what good would
> >> come out of this sort of hair splitting.  Like, what else might lockleak
> >> be expected to mean?
> 
> > That the lock object itself leaks (eg, that it's not destroyed).
> 
> That would be a memleak, no?

It could be memory, it could be some other resource used by a lock.
That's why I think the expression "the lock leaks" (or is leaked?) can
be misguiding: It indicates that the lock leaks much like if memory
would leak... :)

> > Which, I guess, might matter even for a robust lock (where an
> > acquisition could "leak" and we might still have some way to reclaim
> > it).
> 
> Oh, you mean something like running a destructor for the lock object,
> not just leaking its memory?  Well, I guess lockleak could have been
> defined so as to mean that.  But it wasn't ;-)

Well, maybe we should just use a more descriptive name then, so that it
matches the definition? :)

> >> It's a partial overlap of async (signal) and inconsist (for
> >> inconsistency).  Doubling the c wouldn't make it any less misleading ;-)
> 
> > Alright, why not async-inconsist, or asyncinconsist then?
> 
> See the 3 lines of safety notes for setlocale?  That's hardly the
> longest set of keywords :-( Since the terms are arbitrary keywords, I'm
> optimizing for size, some mnemonic value, and pronounceability, but
> definitely not for self-explanatory meanings (see below).

I can't follow this reasoning.  Screen space isn't scarce, is it?  This
is complex stuff, so reading the terms is really the smallest part of
the costs.  I wouldn't mind at all being at least a little bit verbose
here.  AFAIU, Joseph would also like to see keywords not optimized for
size.

Also, if it's not a concatenation of English words, how do I pronounce
it properly without making something up?  Creating new words doesn't
make something easier for first-time or infrequent users, even though it
might be somewhat more efficient for frequent users.  But it's the first
group we have to care about to make this easy to consume, I believe.

> > Just longer keywords?  I'd prefer something that has a somewhat
> > descriptive name over a number.
> 
> Intuitive meanings are not necessarily a plus; they may induce users to
> refrain from looking them up and to assume they mean something they
> don't, just because it sounds like they do ;-)

I can't argue with that, there is this risk (as shown by our past
discussion about what "thread safety" means) :)
Nonetheless, using "environment" instead of "envromt" feels just better
for me.  When writing this email, each time I wrote "envromt", I had to
think back to what the spelling was you picked...

Follow-Ups:
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva

References:
- [MTASCsft PATCH WIP5 00/33] MT-, AS- and AC-Safety docs
  - From: Alexandre Oliva
- [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Torvald Riegel
- Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
  - From: Alexandre Oliva

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]