This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][RFC] Allow explicit shrinking of arena heaps using an environment variable
- From: Florian Weimer <fweimer at redhat dot com>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: libc-alpha at sourceware dot org
- Date: Wed, 01 Aug 2012 18:08:56 +0200
- Subject: Re: [PATCH][RFC] Allow explicit shrinking of arena heaps using an environment variable
- References: <20120725183634.E0C5F2C0B1@topped-with-meat.com> <CAHGf_=qCMy6K1MD4miN65GkMSumYTtH23xoFrfmCuh=WjybAVA@mail.gmail.com> <20120730230006.355f9b67@spoyarek> <5016D012.5000101@gmail.com> <20120730191758.GX544@brightrain.aerifal.cx> <CAHGf_=rRm4B4YCZKgos5hWN_QTYV76fAxcegO7D3x=4_A88Rvw@mail.gmail.com> <20120731132705.GZ544@brightrain.aerifal.cx> <5017DECB.9030405@redhat.com> <20120801041303.GA544@brightrain.aerifal.cx> <5018D9B4.9020707@redhat.com> <20120801122706.GE544@brightrain.aerifal.cx>
On 08/01/2012 02:27 PM, Rich Felker wrote:
>> I find it surprising that PROT_NONE does not count against the
>> commit limit (at least for initial allocations in 2.6.32-era
>> kernels, I have not checked if applying it retroactively using
>> mprotect, or on newer kernels). As you explain, it is sound to do
>> this, but the mmap(2) manual page suggests that MAP_NORESERVE
>> has this effect as well, except that in reality, such a mapping does
>> count against the limit.
> Why? PROT_NONE is not special here. All that matters is that
> PROT_WRITE is not included.
But you can turn PROT_NONE into PROT_WRITE using mprotect. Now it
happens that the accounting check is delayed until the mprotect call,
but it doesn't have to be implemented this way.
> The same is true of read-only clean
> anonymous maps (all zero) or read-only maps of files. The best example
> is the program's .text/.rodata/etc. PT_LOAD segment that's read-only.
> Except in the case of textrels (where it was temporarily made writable
> and part or all of it was dirtied), this map does not contribute to
> commit charge; if it did, the concept of shared program text would be
> nearly meaningless.
It would still be an important performance optimization because you can
share non-dirty pages between processes and use RAM more efficiently.
You just lose the ability to conserve swap space.
> MAP_NORESERVE is a historical relic that violates the principle of
> no-overcommit. It cannot be allowed to work, because it does not only
> affect the calling process. If memory is overcommitted, any other
> process could later fail when the kernel is unable to satisfy the
> memory committed to that process; this would be a serious
> vulnerability.
The same trick as with mprotect could be applied here, the accounting
check could be deferred until an attempt is made to dirty the page.
It might be a challenge to write that SIGSEGV handler, but Hotspot is
supposed to have one that attempts to recover from the out-of-memory
situation. Switching to PROT_NONE allocation with subsequent mprotect
would be vastly preferable (because it improves behavior in mode 2), but
it is difficult to convince anyone to rely on the PROT_NONE behavior.
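For illustration, the PROT_NONE allocation with subsequent mprotect mentioned above could look roughly like the following C sketch. The helper names reserve_region and commit_prefix are made up for this example and are not part of glibc or of the patch under discussion. The comments restate the accounting behavior as described in this thread; they are not a guarantee for every kernel version.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve LEN bytes of address space only.  The
   mapping is private, anonymous and PROT_NONE, so (per this thread) it
   is not expected to be charged against the commit limit yet.
   Returns NULL on failure.  */
static void *
reserve_region (size_t len)
{
  void *p = mmap (NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: make the first COMMIT_LEN bytes of a reservation
   readable and writable.  This is where the accounting check currently
   happens, so under strict accounting it may fail with ENOMEM.
   Returns 0 on success or the errno value on failure.  */
static int
commit_prefix (void *base, size_t reserve_len, size_t commit_len)
{
  assert (commit_len <= reserve_len);
  if (mprotect (base, commit_len, PROT_READ | PROT_WRITE) != 0)
    return errno;
  return 0;
}
```

The point of the split is that a failure shows up as an error return from mprotect, which a caller can handle, rather than as a fault when a page is first dirtied.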
MAP_NORESERVE is supposedly still honored if the old imprecise
pseudo-no-overcommit mode (vm.overcommit_memory=0) is used, but I have
not tested this.
Yes, it basically disables the mapping size sanity check performed in
mode 0.
Perhaps we should add a test case for the intended mprotect behavior?
Just make a 2gb PROT_NONE map and fork a few thousand times... :-)
Right, I think this is actually testable without bringing down the box.
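A rough sketch of such a test, in C, is given below. It is only an outline under stated assumptions: the function names are invented for this example, the children are kept alive at the same time by blocking on a pipe so that their mappings coexist, and the result is only meaningful when vm.overcommit_memory is 2. The size and the number of children are parameters so the sketch can be tried at a small scale first.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return vm.overcommit_memory (0, 1 or 2),
   or -1 if the value cannot be read.  */
static int
read_overcommit_mode (void)
{
  FILE *f = fopen ("/proc/sys/vm/overcommit_memory", "r");
  int mode = -1;
  if (f == NULL)
    return -1;
  if (fscanf (f, "%d", &mode) != 1)
    mode = -1;
  fclose (f);
  return mode;
}

/* Hypothetical test body: map LEN bytes PROT_NONE, then fork NCHILD
   children that all stay alive until the parent releases them.
   Returns the number of successful forks, or -1 on setup failure.  */
static int
fork_with_prot_none_map (size_t len, int nchild)
{
  int fds[2];
  int ok = 0;
  void *p = mmap (NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  if (pipe (fds) != 0)
    {
      munmap (p, len);
      return -1;
    }
  for (int i = 0; i < nchild; i++)
    {
      pid_t pid = fork ();
      if (pid == 0)
        {
          char c;
          close (fds[1]);
          /* Blocks until the parent closes the write end.  */
          if (read (fds[0], &c, 1) < 0)
            _exit (1);
          _exit (0);
        }
      if (pid < 0)
        break;                  /* fork failed, e.g. with ENOMEM */
      ok++;
    }
  close (fds[1]);               /* release all children */
  close (fds[0]);
  for (int i = 0; i < ok; i++)
    wait (NULL);
  munmap (p, len);
  return ok;
}
```

For the scenario described above one would call fork_with_prot_none_map with a 2 GiB length and a few thousand children, after checking that read_overcommit_mode returns 2. If the PROT_NONE mapping were charged per process, the forks would be expected to start failing well before the requested count is reached.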
--
Florian Weimer / Red Hat Product Security Team