This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.
Re: Scheduler lock in Doug Lea's malloc implementation - be real careful
- From: bob dot koninckx at o-3s dot com
- To: "Bart Veer" <bartv at ecoscentric dot com>, bob dot koninckx at o-3s dot com
- Cc: ecos-discuss at sources dot redhat dot com
- Date: Mon, 11 Sep 2006 13:56:13 +0000
- Subject: Re: [ECOS] Scheduler lock in Doug Lea's malloc implementation - be real careful
- Sensitivity: Normal
>
> Bob> Obviously, you need protection there. What I do not
> Bob> understand is why it is implemented using scheduler
> Bob> lock/unlock instead of using mutexes or semaphores. Also the
> Bob> proposed use of memory pools is not really clear to me (and
> Bob> it certainly raises portability issues when moving from one
> Bob> OS to another). As a general rule, I _always_ consider
> Bob> dynamic memory allocation a nondeterministic process, so I
> Bob> _never_ use it in time critical code. It nevertheless seems
> Bob> odd to me that you shouldn't be allowed to use dynamic memory
> Bob> allocation in low priority threads with no real-time
> Bob> requirements at all, just because of timing requirements in
> Bob> higher priority threads. I see no reason why scanning the
> Bob> heap could not be interrupted (put on hold for a while) to
> Bob> give a real-time thread time to finish its job. This is not
> Bob> possible if you protect things with a big kernel lock, but
> Bob> would be if searching for memory would be protected by a
> Bob> mutex.
>
> Bob> Any comments? Am I making a fundamental reasoning error
> Bob> here? If not, I am going to try to change the implementation
> Bob> and see if it solves our problem and propose a patch.
>
>You seem to have missed my point. Let me try again.
I don't think so. I think we are thinking very much along the same lines. Let me also try again :-)
>The current code behaves as follows:
>
> lock scheduler
> alloc() or free() or realloc()
> unlock scheduler
Which locks the scheduler for a _nondeterministic_ amount of time, depending on the size of the heap, its fragmentation, etc.
>
>Let T_alloc be the worst case time spent in the alloc(), free() or
>realloc() routines, noting that the realloc() implementations in the
>heap will fail if the chunk cannot be resized and the alternative
>alloc()/memcpy()/free() implementation happens elsewhere. T_alloc will
>depend on factors like heap fragmentation so ideally would be measured
>in the context of a real application, but some of the testcases in the
>memalloc package probably come close enough.
>
>What you are proposing is:
>
> lock_mutex
> lock scheduler
> mutex lock code
> unlock scheduler
> alloc() or free() or realloc()
> unlock_mutex
> lock scheduler
> mutex unlock code
> unlock scheduler
>
Indeed
>Let T_lock be the time spent in the mutex lock code and T_unlock the
>time spent in the mutex unlock code. These should probably be measured
>for the case where the mutex is contested, priority inversion
>protection triggers, and there are threads blocked on the mutex, not
>for the simple uncontested case. The other important number is T_max -
>the maximum time the system will spend with the scheduler locked
>because of other code. This is rather tricky to measure and will
>depend in part on the application and its I/O behaviour.
>If T_alloc is comparable to max(T_lock, T_unlock) then there is
>nothing to be gained by using a mutex rather than the scheduler lock.
>You are adding cpu cycle overhead to the memory allocation routines
>for no improvement in real-time performance.
Agree
>If T_alloc < T_max, then switching from a scheduler lock to a mutex
>lock/unlock will not change the maximum dispatch latency and will not
>affect the worst case real-time characteristics. It may still affect
>the average dispatch latency.
Agree
>If T_alloc == T_max, i.e. the worst case dispatch latency is caused by
>the memory allocation code, then we have a problem that needs
>addressing.
Agree
>I believe the author of the original code assumed that T_alloc was
>comparable to max(T_lock, T_unlock), or at least that (T_alloc < T_max).
>Hence using a scheduler lock saved some cpu cycles for every memory
>allocation operation without affecting real-time behaviour. I suspect
>that assumption is false but do not have any numbers to prove it.
From what we see happening in our application, I am almost sure that it is false.
>There are several memory allocators, but the ones we are interested in
>are the fixed block, variable block and dlmalloc ones. If for all
>three we have T_alloc >> max(T_lock, T_unlock) then we should probably
>switch to using a mutex by default on the grounds of improving average
>dispatch latency. If for all three we have T_alloc == T_max then we
>must switch to using a mutex by default to reduce the maximum dispatch
>latency. Unfortunately this requires measuring T_max.
Is there anything we can do to provide you with the numbers you need?
>If on the other hand we find that for say the fixed memory allocator
>T_alloc is much the same as max(T_lock, T_unlock) but dlmalloc behaves
>very differently then we would need to introduce new templates, with
>the old ones still using the scheduler lock and the new ones using a
>mutex.
>
>
>I do not have time right now to experiment and get some real numbers.
>However I would very much like to see such numbers before or
>accompanying any patch that changes the behaviour. Also it would be
>important to run not just the memory allocation package's own tests
>but also the uITRON ones.
I don't have that much time myself. I'd like to hear your ideas on a suitable test for measuring things, so that we can make a sound decision.
Thanks for your input
Bob
>
>Bart
>
--
Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos
and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss