This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Performance of global access versus thread local
- From: Roland McGrath <roland at hack dot frob dot com>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 25 Sep 2013 14:38:02 -0700 (PDT)
- Subject: Re: Performance of global access versus thread local
- References: <CANu=DmgX9dFaMyWr6g6Mh5C4OMn3C_T8gpb-FdNTJcPLQc8hAw at mail dot gmail dot com>
If you're going to use a DSO like that, you should use LD_BIND_NOW=1 so
that symbol binding happens at startup and the lazy-binding overhead stays
out of your measured loops. There is no real need to use a DSO though. I'm
guessing you did so just to make sure the tested accesses were the PIC
flavors. You can just compile the main program with -fPIC for that.
For the global case, it would be a hidden global within libc itself.
So you need the __attribute__ ((visibility ("hidden"))) variant to
be representative of what the accesses inside libc would do.
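As a rough sketch of the two kinds of access being compared, something like
the following could be compiled with -fPIC and inspected. The names
(hidden_counter, tls_counter, read_hidden_global, read_tls) are made up for
illustration, and the initial-exec TLS model on the thread-local is an
assumption about what a libc-internal TLS variable would use, not something
stated in this thread.

```c
/* Sketch only: all names here are hypothetical.
   Suggested inspection (standard GCC flags): gcc -O2 -fPIC -S file.c  */

/* Hidden visibility: the symbol cannot be preempted by another module,
   so PIC code can address it directly instead of going through the GOT.  */
__attribute__ ((visibility ("hidden"))) int hidden_counter;

/* Thread-local variable.  The initial-exec model is an assumption made
   here to approximate a libc-internal TLS access.  */
static __thread int tls_counter
  __attribute__ ((tls_model ("initial-exec")));

/* noinline keeps each access in its own function so the generated
   instructions are easy to find in the disassembly.  */
__attribute__ ((noinline)) int
read_hidden_global (void)
{
  return hidden_counter;
}

__attribute__ ((noinline)) int
read_tls (void)
{
  return tls_counter;
}
```

Comparing the bodies of read_hidden_global and read_tls in the generated
assembly shows what each access actually costs on a given target.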
It's probably better to write the two accesses by hand in assembly--or at
least show us the disassembly of what you compiled--to be sure they are
really representative of what the special-case assembly access in libc
would do.