This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Performance of global access versus thread local
- From: Will Newton <will dot newton at linaro dot org>
- To: libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 25 Sep 2013 21:53:26 +0100
- Subject: Performance of global access versus thread local
Hi,
I've been trying to build a benchmark that measures the cost of
accessing a thread-pointer-relative (TLS) variable versus a global
variable, in order to examine the trade-offs in implementing stack
guard and pointer guard.
Using the attached code[1] I get the following numbers (the values are
seconds of CPU time per 1000 calls, despite the "ticks" label):
Core i5 (x86_64):
TLS ticks per 1000 loops: 0.0000101905 Global ticks per 1000 loops: 0.0000100481
(which is understandable, as the two code sequences are practically identical)
Cortex-A15 (arm):
TLS ticks per 1000 loops: 0.0000052731 Global ticks per 1000 loops: 0.0000064143
Does this test look valid? I would be interested to see if anyone else
gets different numbers on different platforms.
[1] gcc -shared -fPIC -O2 tlsvglobal.c -o libtlsvglobal.so
    gcc -O2 main.c -o main -L. -ltlsvglobal -lrt
    LD_LIBRARY_PATH=. ./main
--
Will Newton
Toolchain Working Group, Linaro
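/* main.c */
#include <stdio.h>
#include <time.h>

#define LOOPS 10000000

/* Defined in tlsvglobal.c (libtlsvglobal.so); declared here so the
   calls below are not implicit declarations.  */
int tls_access(void);
int global_access(void);

int main(void)
{
  struct timespec start, end;
  unsigned int i, loops = LOOPS;
  double tls_elapsed, global_elapsed;

  /* Time LOOPS calls that read the thread-local variable.  */
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  for (i = 0; i < LOOPS; i++)
    {
      tls_access();
    }
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  tls_elapsed = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) * 1e-9;

  /* Time LOOPS calls that read the global variable.  */
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  for (i = 0; i < LOOPS; i++)
    {
      global_access();
    }
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
  global_elapsed = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) * 1e-9;

  /* The elapsed times are in seconds of process CPU time, so the
     printed values are seconds per 1000 calls.  */
  printf("TLS ticks per 1000 loops: %.10f Global ticks per 1000 loops: %.10f\n",
         (tls_elapsed / loops) * 1000, (global_elapsed / loops) * 1000);
  return 0;
}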
/* tlsvglobal.c */
static __thread int tlsvar __attribute__((tls_model ("initial-exec")));
//int globalvar __attribute__ ((visibility ("hidden")));
int globalvar;

/* Read the thread-local variable.  */
int tls_access(void)
{
  return tlsvar;
}

void set_tls(int v)
{
  tlsvar = v;
}

/* Read the global variable.  */
int global_access(void)
{
  return globalvar;
}