Performance issue with systemd-coredump and container process linking 2000 shared libraries.
Mark Wielaard
mark@klomp.org
Tue Jun 20 13:15:13 GMT 2023
Hi Romain,
On Mon, 2023-06-19 at 19:56 +0000, Romain GEISSLER via Elfutils-devel
wrote:
>
> Thanks ! And sorry that Laurent had pinged you directly on Slack, I
> wanted to reach you via this mailing list instead of through the Red
> Hat customer network ;)
Slack isn't a very effective way to reach me. Most elfutils hackers do
hang out on the Libera.Chat irc channel #elfutils.
> I don’t know if you read the Red Hat case too. There you can find
> things a bit more clarified, and split into what I think are potentially
> 3 distinct "problems" with 3 distinct possible fixes. Since there is nothing
> private, I can write about this here as well, on this public mailing list.
I haven't checked whether I have access to the customer case, since you
provided such a great reproducer.
> So in the end I see 3 points (in addition to not understanding why
> finding the elf header returns NULL while it should not and which I
> guess you are currently looking at):
> - the idea that systemd developers should invert their logic: first
> try to parse elf/program headers from the (maybe partial) core dump
> PT_LOAD program headers
Yes, that could in theory also be done through a custom
callbacks->find_elf.
> - This special "if" condition that I have added in the original systemd
> code:
>
> + /* This PT_LOAD section doesn't contain the start address, so it can't be the module we are looking for. */
> + if (start < program_header->p_vaddr || start >= program_header->p_vaddr + program_header->p_memsz)
> + continue;
>
> to be added near this line: https://github.com/systemd/systemd/blob/72e7bfe02d7814fff15602726c7218b389324159/src/shared/elf-util.c#L540
>
> on which I would like to ask you if indeed it seems like a "right" fix with
> your knowledge of how core dump and elf files are shaped.
Yes, that does make sense.
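
To make the quoted check concrete, here is a minimal, self-contained sketch of the same range test. It is not the systemd code: `struct load_segment`, `segment_contains` and `find_segment` are hypothetical names used only for illustration, and the struct carries just the two PT_LOAD fields the check reads. The comparison is written as a subtraction so that p_vaddr + p_memsz cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PT_LOAD fields the check needs. */
struct load_segment {
    uint64_t p_vaddr;  /* virtual address where the segment starts */
    uint64_t p_memsz;  /* size of the segment in memory */
};

/* True if start lies in [p_vaddr, p_vaddr + p_memsz). */
static bool segment_contains(const struct load_segment *seg, uint64_t start)
{
    assert(seg != NULL);
    return start >= seg->p_vaddr && start - seg->p_vaddr < seg->p_memsz;
}

/* Index of the first segment that contains start, or -1 if none does. */
static ptrdiff_t find_segment(const struct load_segment *segs, size_t n,
                              uint64_t start)
{
    for (size_t i = 0; i < n; i++) {
        /* Same idea as the quoted patch: skip segments that cannot match. */
        if (!segment_contains(&segs[i], start))
            continue;
        return (ptrdiff_t)i;
    }
    return -1;
}
```

With this kind of early `continue`, the expensive per-segment work only happens for the one segment that can actually hold the module's start address.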
> - The idea that maybe this commit https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=8db849976f07046d27b4217e9ebd08d5623acc4f
> which assumed that normally the order of magnitude of program headers
> is 10 for a "normal" elf file, so a linked list would be enough might be
> wrong in the special case of core dump which may have many more
> program headers. And if indeed it makes sense to call elf_getdata_rawchunk
> for each and every program header of a core, in that case should this
> linked list be changed into some set/hashmap indexed by start
> address/size ?
Interesting. Yeah, a linked list is not the ideal data structure here.
But I don't think it is causing the really long delay. That clearly
comes from the (negative) inode/dentry file search cache. But we could
look at this after we solve the other issues, if we want to speed
things up a bit more.
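
As a rough illustration of the set/map idea, here is a sketch of a chunk cache keyed by (offset, size) using the POSIX tsearch/tfind functions from <search.h>. This is only an assumption about what such a replacement could look like, not the libelf implementation: `struct chunk`, `chunk_cmp` and `chunk_get` are hypothetical names. POSIX does not promise a balanced tree, although glibc implements tsearch as one.

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry: one raw chunk, keyed by file offset and size. */
struct chunk {
    uint64_t offset;
    uint64_t size;
    void *data;  /* placeholder for the chunk payload */
};

/* Order chunks by offset first, then by size. */
static int chunk_cmp(const void *a, const void *b)
{
    const struct chunk *x = a, *y = b;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    if (x->size != y->size)
        return x->size < y->size ? -1 : 1;
    return 0;
}

/* Return the cached chunk for (offset, size), creating it on a miss.
   *created is set to 1 when a new entry was inserted, 0 on a hit. */
static struct chunk *chunk_get(void **root, uint64_t offset, uint64_t size,
                               int *created)
{
    assert(root != NULL && created != NULL);
    struct chunk key = { offset, size, NULL };

    void *found = tfind(&key, root, chunk_cmp);
    if (found != NULL) {
        *created = 0;
        return *(struct chunk **)found;
    }

    struct chunk *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    *c = key;

    void *node = tsearch(c, root, chunk_cmp);
    if (node == NULL) {  /* tsearch could not allocate a tree node */
        free(c);
        return NULL;
    }
    *created = 1;
    return *(struct chunk **)node;
}
```

A repeated request for the same (offset, size) pair then becomes a tree lookup instead of a walk over every previously created chunk, which matters more for a core file with thousands of program headers than for an ordinary ELF file with about ten.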
Cheers,
Mark