Clang is using the wrong memory model

Agner Fog agner@agner.org
Sun Aug 18 11:37:00 GMT 2019


Thanks a lot for your help in clarifying this.

When I complained here about the wasteful 64-bit addresses you said that 
it was an LLVM issue. When I complained to LLVM they said it was a 
Cygwin issue, and that you were using the wrong memory model.

All this confusion is due to a terrible lack of documentation of everything.
I had to do a lot of reverse engineering to figure out what is 
happening. What I have found out so far is listed below. Much of this is 
undocumented. Obviously, I would like to know if any of this is wrong or 
if specific documentation is available other than the SysV ABI and 
Windows ABI:

* Cygwin is using its own loader which is different from the Windows 
loader.
* The Cygwin loader emulates the behavior of Linux shared objects. This 
includes the ability to directly access a variable inside a DLL.
* Access to a variable in a different DLL requires a 64-bit address. 
This is obtained by using the medium memory model with a gcc or Clang 
compiler.
* The small memory model works differently on different targets. A 
-mcmodel=small with a Linux target puts everything below 2GB addresses. 
32-bit absolute addresses are allowed. -mcmodel=small with a Windows or 
Mac target allows addresses above 2GB, but limits the distance between 
code and data in the same executable to 2GB. 32-bit absolute addresses 
are not allowed. 32-bit relative addresses are used instead.
* The memory models work differently in gcc and Clang. Gcc with a medium 
or large memory model is using 64-bit address tables to access a 
variable in a different C/CPP file. Clang with a medium or large memory 
model is using 64-bit addresses not only for external variables, but 
also for local static data. This includes floating point constants, 
string constants, array initializers, jump tables, global variables, and 
more.
* Cygwin uses a medium memory model by default. The medium memory model 
is necessary only for a program that makes direct access to a variable 
in a different DLL. The medium memory model is wasteful, and more so 
with Clang than with gcc.
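
To make the list above concrete, here is a small test file that can be used to compare the memory models. It is only a sketch: the names (table, shared_counter, scale, name_of, lookup) are made up for illustration, and the exact instructions generated will depend on the compiler version and target. Compiling it with -S under -mcmodel=small and -mcmodel=medium, with both gcc and Clang, should show where 64-bit address loads appear.

```c
/* Illustrative only; all names are hypothetical.
 * Suggested comparison (output varies by compiler and target):
 *   gcc   -O2 -S -mcmodel=small  -o small.s  model.c
 *   gcc   -O2 -S -mcmodel=medium -o medium.s model.c
 *   clang -O2 -S -mcmodel=medium -o clang_medium.s model.c
 */
#include <stddef.h>

/* Local static data: an array initializer. */
static const double table[4] = {0.5, 1.5, 2.5, 3.5};

/* A global variable visible to other C files. */
int shared_counter = 0;

/* Uses a floating point constant that the compiler must load from memory. */
double scale(double x)
{
    return x * 1.25;
}

/* Uses string constants; a dense switch may become a jump or lookup table. */
const char *name_of(int k)
{
    switch (k) {
    case 0:  return "zero";
    case 1:  return "one";
    case 2:  return "two";
    case 3:  return "three";
    case 4:  return "four";
    default: return "?";
    }
}

/* Touches both the static array and the global variable. */
double lookup(size_t i)
{
    shared_counter++;
    return table[i & 3];   /* mask keeps the index inside the array */
}
```

None of these objects lives in a different DLL, so by the reasoning above none of them should need a 64-bit address.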

Now I am speculating about what we can do to avoid the wasteful 64-bit 
address-load instructions to improve the performance of Cygwin programs.

We can improve performance by using the small memory model when 
possible. The medium memory model is needed only for programs that link 
to a variable in a different DLL. The DLL that contains the link target 
does not need the medium memory model.

Direct access to a variable in a different DLL is considered bad 
programming practice by modern standards. This should occur only in old 
Linux code.

A link to a variable in a different DLL may be replaced by function 
calls (this is done with errno). In some cases, static linking can be an 
efficient alternative.
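
A minimal sketch of the function-call replacement, assuming a hypothetical library with a status variable (lib_status_value, lib_status_location, lib_status and lib_do_work are invented names, not part of Cygwin). The variable stays private to the DLL and other modules reach it only through an exported function, similar in spirit to how errno is commonly defined as a macro over a function that returns a pointer.

```c
/* Hypothetical library code; every name here is illustrative. */

/* The variable itself is never exported from the DLL. */
static int lib_status_value = 0;

/* Exported accessor: callers import a function, not a data symbol. */
int *lib_status_location(void)
{
    return &lib_status_value;
}

/* Callers keep the familiar variable-like syntax through a macro. */
#define lib_status (*lib_status_location())

/* Example library routine that records a status code. */
void lib_do_work(int fail)
{
    lib_status = fail ? 42 : 0;
}
```

With this pattern the calling module needs only an ordinary function import, so no data relocation across DLL boundaries is involved.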

It would be helpful if the Cygwin loader could print the name of the 
offending variable when relocation fails with the small memory model. 
This could help programmers remove any obstacles to using the more 
efficient small memory model.
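
For reference, the failure in the quoted test below presumably comes down to whether the distance to the target fits in a signed 32-bit field. The following is a sketch of that range test only, not the loader's actual code; fits_rel32 is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 64-bit two's complement value 'raw' can be stored
 * in a signed 32-bit field. That is the case exactly when the upper
 * 33 bits are all zero (non-negative) or all one (negative). */
bool fits_rel32(uint64_t raw)
{
    uint64_t upper = raw >> 31;          /* the upper 33 bits */
    return upper == 0 || upper == UINT64_C(0x1FFFFFFFF);
}
```

Applied to the offset 0xfffffffd80348989 from the error message, this test fails, which is consistent with the reported "doesn't fit into 32 bits".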


Agner


On 17/08/2019 10.16, Corinna Vinschen wrote:
> On Aug 17 07:31, Agner Fog wrote:
>>> So errno was a bad example but you can try accessing e.g. __ctype_ptr__,
>>> __progname, optarg, h_errno, or use FE_DFL_ENV from another DLL, just
>>> for kicks.
>> __ctype_ptr__ is a function
>>
>> h_errno works like errno with an imported function
>>
>> FE_DFL_ENV is a macro
>>
>> __progname and optarg are local variables to each exe or dll
> That would contradict what, e.g., __progname is for.  Here's a test:
>
> $ cat > dll.c <<EOF
> #include <stdio.h>
>
> extern char *__progname;
>
> void
> printprog ()
> {
>    printf ("progname: %s\n", __progname);
> }
> EOF
> $ cat > main.c <<EOF
> extern void printprog();
>
> int
> main ()
> {
>    printprog ();
> }
> EOF
> $ uname -a
> CYGWIN_NT-10.0 vmbert10 3.1.0(0.340/5/3) 2019-08-16 14:36 x86_64 Cygwin
>
> Let's try the medium model first:
>
>    $ gcc -g -shared -mcmodel=medium -o dll.dll dll.c
>    $ gcc -g -mcmodel=medium -o main main.c dll.dll
>    $ ./main
>    progname: main
>
> Now let's try the small model:
>
>    $ gcc -g -shared -mcmodel=small -o dll.dll dll.c
>    $ gcc -g -mcmodel=small -o main main.c dll.dll
>    $ ./main
>    Cygwin runtime failure: /home/corinna/main.exe: Invalid relocation.  Offset
>    0xfffffffd80348989 at address 0x40000103b doesn't fit into 32 bits
>
> Now let's try without explicit mcmodel on the CLI:
>
>    $ gcc -g -shared -o dll.dll dll.c
>    $ gcc -g -o main main.c dll.dll
>    $ ./main
>    progname: main
>
>> gcc is using the small memory model by default in Cygwin64, and it works.
> No, it's not, see above.
>
>> clang is using the small memory by default when cross-compiling for a Cygwin64 target from Linux, and it works.
> ...in *your* example code.
>
>
> Corinna
>
