This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
- From: Torvald Riegel <triegel at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: libc-alpha at sourceware dot org, carlos at redhat dot com, mtk dot manpages at gmail dot com
- Date: Wed, 20 Nov 2013 15:05:25 +0100
- Subject: Re: [MTASCsft PATCH WIP5 01/33] Multi Thread, Async Signal and Async Cancel safety documentation: intro
- Authentication-results: sourceware.org; auth=none
- References: <20131113081059 dot 3464 dot 51385 dot stgit at frit dot home> <20131113081132 dot 3464 dot 30409 dot stgit at frit dot home> <1384859432 dot 32326 dot 364 dot camel at triegel dot csb> <orsiurva0g dot fsf at livre dot home>
On Wed, 2013-11-20 at 04:54 -0200, Alexandre Oliva wrote:
> On Nov 19, 2013, Torvald Riegel <triegel@redhat.com> wrote:
>
> > On Wed, 2013-11-13 at 06:11 -0200, Alexandre Oliva wrote:
>
> >> +Intuitive definition of these properties, that attempt to capture
> >> +the meaning of the standard definitions, follow:
>
> > Spelling: I believe that you should either drop the (first) comma(s), or
> > use "which" instead of "that".
>
> Hmm, definitions was supposed to be plural, this may have made it more
> confusing than it was. How about this:
>
> Intuitive definitions of these properties, attempting to capture the
> meaning of the standard definitions, follow:
That sounds good.
> > If a program calls an MT-Unsafe function and thereby creates a data
> > race, then this data race can manifest in undefined behavior even
> > *before* the abstract machine would make the MT-Unsafe call. This can
> > happen because the compiler can assume that programs are data-race-free,
> > and use certain optimizations that wouldn't be allowed on (properly
> > synchronizing) MT-Safe versions of the called code. For example, if
> > there is sequential code after the call but depending on its outcome,
> > then the compiler could speculative and move it before the MT-Unsafe
> > call, which could cause a crash before the call actually happens.
>
> What you write is true, but I don't think it's relevant. All glibc
> functions that matter are opaque to their callers, therefore compilers
> couldn't possibly optimize assuming there aren't synchronization
> primitives in them. Do you have any evidence of the opposite, that
> might justify further obfuscating the definition?
What about linking statically and using LTO?
Such a scenario might not be common today, but I'd prefer to just be
clear about what using MT-Unsafe functions in multi-threaded code can
result in. It doesn't hurt us to be clear, and we don't gain anything
when we promise more than we can guarantee.
> >> +Each of these unprotected uses will use either the previous or the
> >> +subsequent locale information, so they won't cause crashes or access to
> >> +uninitialized, unmapped or recycled memory. However, since some cases
> >> +use cached locale information while others access the effective locale
> >> +object anew, concurrent changes to the global locale object may cause
> >> +these functions to behave in ways that they could not behave should the
> >> +execution of @code{setlocale} and of the so-annotated functions be
> >> +atomic, or even should @code{setlocale} alone be atomic.
>
> > Can you break up this sentence?
>
> There are several sentences in the quoted paragraph. Can you point out
> which one you mean, ideally suggesting alternate wording?
However, sometimes cached local information is used whereas in other
cases, the locale object is accessed anew; thus, concurrent changes to
the global locale object may cause these functions to behave in
unexpected ways (i.e., behavior that is not possible if @code{setlocale}
and the so-annotated functions would have been executed in some
sequential order).
I don't quite know what you mean by "or even should @code{setlocale}
alone be atomic" or if that's necessary, so I didn't add it to the
sentence above.
>
> >> +@item @code{envromt}
> >> +@cindex envromt
> >> +
> >> +Functions marked with @code{envromt} access the environment with
> >> +@code{getenv} or similar, requiring the environment to be effectively
> >> +read-only for MT-Safe operation.
> >> +
> >> +Environment-modifying functions do not protect in any way against
> >> +concurrent modifications or access, so calling @code{envromt}-marked
> >> +functions concurrently with @code{setenv}, @code{putenv},
> >> +@code{unsetenv} or direct modifications of the global environment data
> >> +structures is ill-advised; external concurrency control must be
> >> +introduced by callers of these environment-modifying and
> >> +@code{envromt}-marked functions.
> >> +
> >> +Functions that modify the environment are also marked with
> >> +@code{envromt}, but they are not MT-Safe for the reasons above. Since
> >> +all environment-modifying functions are MT-Unsafe, functions that only
> >> +access the environment are marked as MT-Safe when no other safety issue
> >> +applies.
>
> > I know you spelled the exception out here, but I'm wondering whether
> > everyone will remember this when seeing MT-Safe somewhere. Why didn't
> > you use a different MT-Safe tag (eg, MT-Safe-Env) for those functions,
> > but instead merged them with the "true" MT-Safe?
>
> Same reason as glocale, really: once all methods to modify something are
> decreed unsafe to call in MT programs, this something becomes constant
> in conformant MT programs, and therefore the functions are indeed
> perfectly safe to call.
That's true, but then you're still merging categories, namely functions
"MT-Safe under constraint X", where X could be the environment stuff, or
something else like glocale. Or did I misunderstand your categories?
Also, from a readers perspective, mentioning "global" constraints like
environment first before mentioning MT-Safe would be safer even if
TL;DR.
> It occurs to me that it might be useful to distinguish between read-only
> and read-write cases, for example, if there's a 1stcall mark: if you
> call the function in ST mode so as to eliminate that safety issue, does
> envromt indicate the function remains unsafe to call, or that it just
> reads from the environment so that it's fine to call after a first
> single-threaded call.
>
> >> +
> >> +@item @code{oncesafe}
> >> +@cindex oncesafe
> >> +
> >> +Functions marked with @code{oncesafe} use the internal @code{libc_once}
> >> +machinery or similar to initialize internal data structures.
> >> +
> >> +If a signal handler interrupts such an initializer, and calls any
> >> +function that also performs @code{libc_once} initialization, it will
> >> +deadlock if the thread library is linked in.
>
> > If "oncesafe" marks cases where a deadlock can happen, shouldn't it be
> > called "onceunsafe" or something like this instead?
>
> Read it aloud, collapsing the final phoneme from once with the first
> from safe ;-)
:)
> I tried other spellings such as wunsafe, but this one
> didn't bring libc_once to mind. It's to be understood as unsafe because
> libc_once is unsafe; can you suggest an alternate keyword or spelling?
Well, what about "onceunsafe"? Do you dislike this option?
> >> +@item @code{1stcall}
> >> +@cindex 1stcall
> >> +
> >> +Functions marked with @code{1stcall} perform thread-unsafe
> >> +initialization when they are first called.
>
> > I agree with Carlos on this (ie, it sounds as if it's better to be
> > listed as an exception to MT-Safe
>
> But... That's what it is! It's MT-unsafe (1stcall).
>
> > MT-Safe | oncesafe | 1stcall
>
> oncesafe is about libc_once initialization. libc_once is MT-Safe; so
> oncesafe is a note for AS-Unsafe and AC-Unsafe. 1stcall is for
> non-libc_once initializations that are not even MT-Safe.
Agree that oncesafe is a bad example in combination with MT-Safe.
> > with the definition above, you need to add that there could be
> > other restrictions additionally).
>
> This is the case of all notes; there's no reason to single this one out.
I guess I missed "the only reason" (in your docs) when reading this
first.
> >> +Functions marked with @code{uunguard} modify non-atomically arguments or
> >> +global objects that other functions access without synchronization. To
> >> +ensure MT- and AS-Safe behavior, callers should refrain from calling
> >> +so-marked functions concurrently with readers of those objects. A
Just readers or also other "writers"?
> >> +consequence of regarding modifiers of these objects as unsafe is that
> >> +the covered objects can be regarded as constant (subject to the
> >> +observation of safety constraints), so that all readers can be
> >> +considered safe in this regard.
>
> > AFAIU, uunguard are functions that if called, override the safety
> > properties of MT-Safe or AS-Safe functions?
>
> They're functions that, if called in MT programs, cause undefined
> behavior, like any MT-Unsafe functions.
If that's the case, why do you have uunguard at all, given that it's
just the negation of MT-Safe (or, be equal to MT-Unsafe)?
> uunguard is a catch-all general case for envromt and glocale, discussed
> earlier in this message.
>
> > Functions marked with @code{uunguard} modify arguments or global objects
> > that other, potentially MT-Safe or AS-Safe, functions access without
> > synchronization. Thus, calls to so-marked functions potentially make
> > all concurrent calls to otherwise MT-Safe or AS-Safe functions MT-Unsafe
> > and/or AS-Unsafe.
>
> That's a consequence of calling any MT-Unsafe function. No reason to
> single this one out, is there?
Then I don't understand what this category is really about. Is it
functions that are MT-Unsafe with an additional tag revealing why they
are unsafe? Is it functions that are MT-Safe except when certain other
functions are called, namely those MT-Unsafe ones that do the
modification? Or something else?
Second, assuming that uunguard is not unlike MT-Unsafe, why is it listed
as a constraint of MT-Safe functions?
> > There should also be a link in the MT-Safe and AS-Safe paragraphs to
> > this paragraph here, because both depend on this exception (ie, it's
> > only MT-Safe if no uunguard is called concurrently).
>
> This reasoning is backwards. Those other functions *are* MT-Safe
> *because* MT-Unsafe functions will not be called. If they are, all bets
> are off in very many different ways.
Your docs contain:
+@cindex MT-Safe
+MT-Safe functions are safe to call in the presence of other threads.
MT
+stands for Multi Thread.
[more definitions]
+When a function is safe to call only under certain constraints, we will
+add keywords to the safety notes whose meanings are defined as follows:
[other items]
+@item @code{uunguard}
+@cindex uunguard
[description of uunguard that I don't seem to understand]
What I'm saying is that the definition of MT-Safe at the very top should
link to the listing of constraints. It's true that if you call an
MT-Safe function A and a function B that is MT-Safe-under-constraint-X
concurrently, and X doesn't hold, then this could be considered as
MT-Safe A being mixed with MT-Unsafe B. But readers could also
interpret the first definition as "MT-Safe regardless".
>
> >> +@item @code{xguargs}
> >> +@cindex xguargs
> >> +
> >> +Functions marked with @code{xguargs} may use or modify objects passed
> >> +(indirectly) as arguments, without any guards to guarantee consistency.
> >> +To ensure MT- and AS-Safe behavior, callers must ensure that the objects
> >> +passed in are not accessed or modified concurrently by other threads or
> >> +signal handlers.
> >> +
> >> +This mark is only applied with regard to an object when the
> >> +@code{uunguard} mark is not applied because of the same object, and the
> >> +object is opaque or not intended to be modified by code outside of
> >> +@theglibc{}.
>
> > Wouldn't you need to apply the mark only if the object is not opaque and
> > might be considered to be fine to be modified outside of glibc?
>
> No, the rationale you quoted explains why *this* note covers the opaque
> or presumed-constant objects.
>
> > When specifying this more formally, I believe we need to be careful to
> > specify the "lifetime" of this constraint (eg, is it just for the
> > duration of this call?).
>
> The description in the first paragraph permits this reading, yeah.
> E.g., functions that hold pointers to passed-in (sub)objects and later
> on, asynchronously, proceed to modify them, would be marked with a
> broader, bigger red flag ;-)
>
> >> +@item @code{tempsig}
> >> +@cindex tempsig
>
> >> +Functions marked with @code{tempsig} may temporarily install signal
> >> +handlers for internal purposes, which may interfere with other uses of
> >> +those signals.
>
> >> +This makes such functions MT-Unsafe and AS-Unsafe to begin with.
>
> > Why does it make such functions MT-Unsafe in every case?
>
> Err... Because they may interfere with other uses of those signals, and
> there's not much one can do to avoid this interference. Calling the
> function concurrent from two different threads is a no-no. Heck, since
> signals are not thread-aware, it's not even safe to call the function in
> a single thread; the signal it expects may be delivered to a different
> thread, failing to fulfill its purpose.
But that doesn't mean that the signal handler must lead to bad behavior.
I was mainly wondering why installing a temporary signal handler would
always lead to this being MT-Unsafe. I suppose that there are ways to
not make that MT-Unsafe, at least as far as glibc is concerned.
> > Spelling: cummulative -> cumulative, I believe.
>
> Thanks.
>
> > Given that we control the allocator, I believe that it can be made
> > atomic (without disabling signals or reentrancy).
>
> That's not enough; we'd have to make *storing* the address of the
> allocated memory into its destination an atomic part of the allocation
> atomic unit, otherwise memory would still leak after we returned it, if
> cancellation occurred before the returned value was stored in the
> destination.
One can build wait-free atomic updates.
> The existing APIs just doesn't make memleak-safe
> implementations possible.
They aren't sufficient, but I guess they could be extended to make them
sufficient.
> >> +Functions annotated with @code{lockleak} may leak locks if asynchronous
> >> +thread cancellation interrupts their execution.
>
> > So this is not a leaking lock, but a non-cleaned-up acquisition of a
> > lock? "lockacquisitionleak" is longer, but might be more clear? Or
> > "lockacqleak"?
>
> They take a lock and don't release it, so the taken lock leaks. I
> understand the distinction you make, but I fail to see what good would
> come out of this sort of hair splitting. Like, what else might lockleak
> be expected to mean?
That the lock object itself leaks (eg, that it's not destroyed). Which,
I guess, might matter even for a robust lock (where an acquisition could
"leak" and we might still have some way to reclaim it).
> >> +@item @code{asynconsist}
> >> +@cindex asynconsist
> >> +
> >> +Functions marked with @code{asynconsist} take a recursive lock to ensure
> >> +MT-Safety while accessing or modifying data structures guarded by the
> >> +lock.
>
> > Should this be "asyncconcist" (ie, don't drop one of the "c"'s)?
> > I was reading "async-on-sist" and wondering what it means...
>
> It's a partial overlap of async (signal) and inconsist (for
> inconsistency). Doubling the c wouldn't make it any less misleading ;-)
Alright, why not async-inconsist, or asyncinconsist then?
> >> +Functions marked with @code{incansist} modify data structures in a
> >> +non-atomic way.
>
> > What does "incansist" stand for?
>
> potential
> inconsistency
> ^
> if |
> *can*celled
>
>
> The difficulties in guessing the precise meanings of the keywords, the
> risk of their misleading as to the issue they refer to, and even my own
> difficulty in keeping the precise meanings in mind and avoiding their
> drift over time might indicate that made-up keywords weren't such a
> great idea, and we might be better off with some numerical rather than
> symbolic code. Personally, I prefer symbols over numbers, but then, I'm
> biased ;-) Your thoughts?
Just longer keywords? I'd prefer something that has a somewhat
descriptive name over a number.