Performance issue with systemd-coredump and container process linking 2000 shared libraries.

Mark Wielaard mark@klomp.org
Tue Jun 20 13:15:13 GMT 2023


Hi Romain,

On Mon, 2023-06-19 at 19:56 +0000, Romain GEISSLER via Elfutils-devel
wrote:
> 
> Thanks ! And sorry that Laurent had pinged you directly on Slack, I
> wanted to reach you via this mailing list instead of through the Red
> Hat customer network ;)

Slack isn't a very effective way to reach me. Most elfutils hackers do
hang out on the Libera.Chat irc channel #elfutils.

> I don’t know if you read the Red Hat case too. There you can find
> things clarified a bit more, and split into what I think are potentially
> 3 distinct "problems" with 3 distinct possible fixes. Since there is nothing
> private, I can write about this here as well, on this public mailing list.

I haven't checked whether I have access to the customer case, since you
provided such a great reproducer.

> So in the end I see 3 points (in addition to not understanding why
> finding the elf header returns NULL while it should not and which I
> guess you are currently looking at):
>  - the idea that systemd developers should invert their logic: first
> try to parse elf/program headers from the (maybe partial) core dump
> PT_LOAD program headers

Yes, that could in theory also be done through a custom
callbacks->find_elf.

>  - This special "if" condition that I have added in the original systemd
> code:
> 
> +                /* This PT_LOAD section doesn't contain the start address, so it can't be the module we are looking for. */
> +                if (start < program_header->p_vaddr || start >= program_header->p_vaddr + program_header->p_memsz)
> +                        continue;
> 
> to be added near this line: https://github.com/systemd/systemd/blob/72e7bfe02d7814fff15602726c7218b389324159/src/shared/elf-util.c#L540
> 
> on which I would like to ask you if indeed it seems like a "right" fix with
> your knowledge of how core dump and elf files are shaped.

Yes, that does make sense.
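
As a minimal sketch of that range check (not the actual systemd code),
the test could be pulled into a small helper. The struct and function
names below are hypothetical; only p_vaddr and p_memsz come from the
quoted patch. Unlike the quoted condition, this version compares
start - p_vaddr against p_memsz, so it does not depend on
p_vaddr + p_memsz not wrapping around:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PT_LOAD program header fields
   the check needs; real code would use the libelf/gelf header type. */
struct load_segment {
    uint64_t p_vaddr;  /* virtual address where the segment starts */
    uint64_t p_memsz;  /* size of the segment in memory */
};

/* Return true if start lies in [p_vaddr, p_vaddr + p_memsz).
   A segment for which this is false cannot hold the module start. */
static bool segment_contains(const struct load_segment *seg, uint64_t start)
{
    return start >= seg->p_vaddr && start - seg->p_vaddr < seg->p_memsz;
}
```

In the loop over program headers, a caller would then "continue"
whenever segment_contains() returns false for a PT_LOAD entry.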

>  - The idea that maybe this commit https://sourceware.org/git/?p=elfutils.git;a=commitdiff;h=8db849976f07046d27b4217e9ebd08d5623acc4f
> which assumed that normally the order of magnitude of program headers
> is 10 for a "normal" elf file, so a linked list would be enough might be
> wrong in the special case of core dump which may have much more
> program headers. And if indeed it makes sense to elf_getdata_rawchunk
> for each and every program header of a core, in that case should this
> linked list be changed into some set/hashmap indexed by start
> address/size ?

Interesting. Yeah, a linked list is not the ideal data structure here.
But I don't think it is causing the really long delay. That clearly
comes from the (negative) inode/dentry file search cache. We could
look at this after we solve the other issues, if we still want to speed
things up a bit more.
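
To illustrate the kind of structure Romain suggests, here is a rough
sketch of a set keyed by (offset, size) using POSIX tsearch. This is
only an assumption about how such a cache could look, not what libelf
does; struct chunk_key and chunk_lookup_or_add are hypothetical names:

```c
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache key: file offset and size of a raw chunk. */
struct chunk_key {
    uint64_t offset;
    uint64_t size;
};

/* Order keys by offset first, then by size. */
static int chunk_compare(const void *a, const void *b)
{
    const struct chunk_key *ka = a, *kb = b;
    if (ka->offset != kb->offset)
        return ka->offset < kb->offset ? -1 : 1;
    if (ka->size != kb->size)
        return ka->size < kb->size ? -1 : 1;
    return 0;
}

/* Return the cached entry for (offset, size), adding it on first use.
   Returns NULL if memory allocation fails. */
static struct chunk_key *chunk_lookup_or_add(void **root,
                                             uint64_t offset, uint64_t size)
{
    struct chunk_key *key = malloc(sizeof *key);
    if (key == NULL)
        return NULL;
    key->offset = offset;
    key->size = size;

    /* tsearch returns a pointer to the stored key pointer. */
    void *node = tsearch(key, root, chunk_compare);
    if (node == NULL) {
        free(key);
        return NULL;
    }
    struct chunk_key *found = *(struct chunk_key **)node;
    if (found != key)
        free(key);  /* an equal key was already in the tree */
    return found;
}
```

With a balanced tree, each lookup costs a logarithmic number of
comparisons instead of a walk over every earlier chunk, which would
matter mostly for cores with thousands of program headers.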

Cheers,

Mark

