thread_local performance using g++ for cygwin

Arthur Norman acn1@cam.ac.uk
Mon May 6 07:09:00 GMT 2019


The attached code times two loops, each of which just calls a function that 
increments an integer variable. In one loop it is a simple variable, in the 
other it has the thread_local qualifier. I put in ugly annotations to 
prevent g++ from inlining the functions even though I compile with -O3, but 
in real cases separate compilation forces each thread_local access to be 
independent anyway.

The timing difference between the two cases is EXTREME on cygwin (both 32 
and 64-bit). By contrast, g++ on Linux and the Microsoft compiler on Windows 
both manage to keep the base of thread-local regions in a segment register, 
so the thread_local overhead there is minimal. The cygwin thread_local 
overhead is large enough to be very visible in my code as a whole. I can see 
that changing to use a segment register might be a painful API change even 
if it were feasible, but has there been any consideration of it?

Note that x86_64-w64-mingw32-g++ and clang also do not use the segment 
register, so they suffer the same significant speed penalty. Maybe it would 
be hard to match what Microsoft manages?

Sample output:
     simple 1.265
     thread_local 33.219


            Arthur
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tltime.cpp
URL: <http://cygwin.com/pipermail/cygwin/attachments/20190506/7b4dc048/attachment.ksh>
