This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: An idea on library loading and symbol lookup
- From: Andreas Jaeger <aj at suse dot de>
- To: Robert Schweikert <rjschwei at abaqus dot com>
- Cc: libc-alpha at sources dot redhat dot com, work <rjschwei at hks dot com>
- Date: Sun, 16 Mar 2003 13:20:29 +0100
- Subject: Re: An idea on library loading and symbol lookup
- References: <1047815095.24185.64.camel@triumph.rjsdomain>
Robert Schweikert <rjschwei at abaqus dot com> writes:
> Hi all,
>
> I have been trying to get smart about shared library loading on Linux
> and wanted to float an idea to get an understanding of the related
> issues and if the idea is feasible at all. From reading Ulrich's paper
> (http://people.redhat.com/drepper/dsohowto.pdf) it appears that the good
> people that came up with the C++ ABI standard threw performance concerns
> out the window when coming up with the name mangling convention.
>
> namespace - class name - function name - arguments
>
> The namespace - class name part will quite often be the same, and in
> many cases even the function name will be the same between 2 symbols.
>
> Taking for example a symbol from a template function which looks as
> follows:
>
> _ZNK7cow_COWI16bas_ShortcutImplI23kamC_BaselineCorrectionE11cow_VirtualIS2_EE8ConstGetEv
>
> and then another function from the same template:
>
> _ZNK7cow_COWI16bas_ShortcutImplI23kamC_BaselineCorrectionE11cow_VirtualIS2_EE6IsNullEv
>
> Now if you put these two symbols next to each other and consider that
> the loader has to walk the string and do a character comparison for each
> char, at least that is my understanding of what's happening, you can see
> and count that there are 77 characters that are equal. Thus, the loader
> compares 77 characters before discarding the "wrong" symbol. Now you do
> that a few times and I think there should not be an argument on why it
> takes quite a while to load shared objects. If I would have a namespace
> then the number of characters that are equal in these 2 symbols would be
> even bigger since the namespace name comes first in the mangled name.
Symbol lookup is done via a hash table. Only entries added to the
same hash bucket will be compared character by character. The
expectation is that such strcmps happen quite seldom.
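
As a rough illustration of the point above, here is a minimal sketch in C.
The hash function follows the one given in the System V ABI for ELF symbol
hash tables (written here with a 32-bit type). Everything else (toy_table,
toy_init, toy_insert, toy_lookup, NBUCKETS, MAXSYMS) is hypothetical and
only meant to show the principle; it is not glibc's actual lookup code or
data layout. The strcmps counter records how many full string comparisons
the lookups had to make.

```c
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64    /* hypothetical bucket count, for illustration only */
#define MAXSYMS  256   /* hypothetical capacity of the toy table */

/* Symbol hash function as described in the System V ABI (ELF). */
static uint32_t elf_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;

    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Toy chained hash table; NOT the real dynamic linker structures. */
struct toy_table {
    const char *names[MAXSYMS];
    int next[MAXSYMS];       /* next entry in the same bucket, -1 at end */
    int bucket[NBUCKETS];    /* first entry of each bucket, -1 if empty */
    int count;
    unsigned long strcmps;   /* strcmp calls made by toy_lookup so far */
};

static void toy_init(struct toy_table *t)
{
    t->count = 0;
    t->strcmps = 0;
    for (int i = 0; i < NBUCKETS; i++)
        t->bucket[i] = -1;
}

/* Add a name to the front of its bucket chain; returns its index or -1. */
static int toy_insert(struct toy_table *t, const char *name)
{
    if (t->count >= MAXSYMS)
        return -1;
    uint32_t b = elf_hash(name) % NBUCKETS;
    int idx = t->count++;
    t->names[idx] = name;
    t->next[idx] = t->bucket[b];
    t->bucket[b] = idx;
    return idx;
}

/* Only names that landed in the same bucket are compared with strcmp. */
static int toy_lookup(struct toy_table *t, const char *name)
{
    uint32_t b = elf_hash(name) % NBUCKETS;
    for (int i = t->bucket[b]; i != -1; i = t->next[i]) {
        t->strcmps++;
        if (strcmp(t->names[i], name) == 0)
            return i;
    }
    return -1;
}
```

With a reasonable number of buckets, most lookups compare against only a
few candidates, so a long shared prefix is paid for only when two such
names happen to share a bucket.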
> Now the idea:
>
> Let's say I do a reverse lookup (I'm sure I'm not the first one to come
> up with this idea but I couldn't find any discussions on it), i.e. I
> start at the back of the string then I only compare 2 characters before
> I discard the symbol. I just improved my lookup performance by a
> gazillion.
You need to know the end of the string. I don't think we have that
information directly available and therefore you would first need to
scan to the end - and if you scan, you can do the compare at nearly
the same time.
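
A small sketch in C of what the reverse comparison would involve, under
the assumption that symbol names are plain NUL-terminated strings with no
stored length. The function names (common_prefix_len, common_suffix_len,
reverse_streq) are hypothetical helpers for illustration, not anything in
glibc. Note that the two reverse variants must call strlen first, and
strlen itself walks every character of the string.

```c
#include <stddef.h>
#include <string.h>

/* Number of leading characters two strings have in common. */
static size_t common_prefix_len(const char *a, const char *b)
{
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Number of trailing characters two strings have in common.
   Both lengths must be found first, which scans both strings. */
static size_t common_suffix_len(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), n = 0;
    while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
        n++;
    return n;
}

/* Equality test that compares from the back; returns 1 if equal. */
static int reverse_streq(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);   /* full scan of both */
    if (la != lb)
        return 0;
    while (la > 0) {
        la--;
        if (a[la] != b[la])
            return 0;
    }
    return 1;
}
```

For the two symbols quoted earlier, common_prefix_len gives 77 and
common_suffix_len gives 2 (the trailing "Ev"), so the backwards compare
does stop early, but only after strlen has already touched all
characters of both names.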
> I know there are other issues to deal with such as symbol versioning
> etc. and these are the issues I'd like to understand. Overall I think
> there's got to be a better way than comparing all these characters from
> the beginning.
Hashing ;-)
Andreas
--
Andreas Jaeger
SuSE Labs aj at suse dot de
private aj at arthur dot inka dot de
http://www.suse.de/~aj