This is the mail archive of the
binutils@sources.redhat.com
mailing list for the binutils project.
Incompatibility between GNU-ld and SUN's ld.so.1
- From: "Christian Ehrhardt" <ehrhardt at mathematik dot uni-ulm dot de>
- To: binutils at sources dot redhat dot com
- Date: Tue, 24 Sep 2002 18:26:51 +0200
- Subject: Incompatibility between GNU-ld and SUN's ld.so.1
Hi,
first: I'd appreciate being CC'ed on replies, but I'll try to follow
the thread in the archives.
[ I still think this is a problem in Sun's ld.so.1, and I have an open
call with Sun. However, as this problem is triggered by libstdc++
and libgcc_s, and the Sun behaviour dates back to Solaris 7 (or even
earlier), it would be helpful if GNU ld could work around this problem.
]
Here's the relevant part of my report sent to Sun (I guess you'd
prefer to use Makefile instead of Makefile.sun. However, note that
using -nostdlib will cause a different crash due to a missing exit()):
----------------- cut here --------------------------------------------
SUMMARY DESCRIPTION: ld.so.1 fails to relocate certain shared libraries
DETAILED DESCRIPTION:
The dynamic runtime linker fails to relocate valid shared libraries
generated by recent versions of GNU-ld. /usr/local/bin/ld is from
the GNU binutils-2.13 package:
turing$ /usr/local/bin/ld -v
GNU ld version 2.13
How to reproduce:
Script started on Fri Sep 20 19:46:43 2002
turing$ cat t2.c
struct object {
    int i;
    int j;
    int k;
    int l;
};
int func ()
{
    static struct object x;
    struct object * p;
    p = &x;
    p->i = 3;
    return 0;
}
turing$ cat t3.c
extern int func();
int main ()
{
    func();
    return 0;
}
turing$ cat Makefile.sun
.PHONY: clean
all: a.out
t2.o: t2.c
	CC -c -KPIC t2.c
libt2.so: t2.o
	/usr/local/bin/ld -G t2.o -olibt2.so
t3.o: t3.c
	CC -c t3.c
a.out: libt2.so t3.o
	CC -lt2 t3.o -L. -R.
clean:
	rm -f *.so *.o a.out
turing$ cat Makefile
.PHONY: clean
all: a.out
t2.o: t2.c
	gcc -c -fPIC t2.c
libt2.so: t2.o
	/usr/local/bin/ld -nostdlib -shared -olibt2.so t2.o
a.out: libt2.so t3.c
	gcc -nostdlib t3.c libt2.so -L. -R.
clean:
	rm -f *.so *.o a.out core
turing$ make -f Makefile.sun clean
rm -f *.so *.o a.out
turing$ make -f Makefile.sun
CC -c -KPIC t2.c
/usr/local/bin/ld -G t2.o -olibt2.so
CC -c t3.c
CC -lt2 t3.o -L. -R.
turing$ a.out
Segmentation Fault (core dumped)
turing$ exit
script done on Fri Sep 20 19:47:32 2002
Note that I compiled everything with /opt/SUNWspro/bin/CC to
rule out bugs in gcc. This problem can be reproduced using
the second Makefile and gcc with an even smaller resulting
executable.
Analyzing the core shows the following:
turing$ pmap core | grep libt2.so
FF370000 8K read/exec libt2.so
FF380000 8K read/write/exec libt2.so
Script started on Fri Sep 20 19:53:10 2002
turing$ gdb a.out core
GNU gdb 5.0
[ ... ]
#0 0xff370318 in __1cEfunc6F_i_ ()
from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
(gdb) disass
Dump of assembler code for function __1cEfunc6F_i_:
0xff3702e0 <__1cEfunc6F_i_>: save %sp, -112, %sp
0xff3702e4 <__1cEfunc6F_i_+4>: call 0xff3702ec <__1cEfunc6F_i_+12>
0xff3702e8 <__1cEfunc6F_i_+8>: sethi %hi(0), %o1
0xff3702ec <__1cEfunc6F_i_+12>: mov %o1, %o1 ! 0x0
0xff3702f0 <__1cEfunc6F_i_+16>: add %o7, %o1, %o1
0xff3702f4 <__1cEfunc6F_i_+20>: st %o1, [ %fp + -12 ]
0xff3702f8 <__1cEfunc6F_i_+24>: sethi %hi(0x10000), %o0
0xff3702fc <__1cEfunc6F_i_+28>: or %o0, 0xc4, %o0 ! 0x100c4
0xff370300 <__1cEfunc6F_i_+32>: add %o1, %o0, %l7
0xff370304 <__1cEfunc6F_i_+36>: sethi %hi(0), %g1
0xff370308 <__1cEfunc6F_i_+40>: or %g1, 4, %g1 ! 0x4
0xff37030c <__1cEfunc6F_i_+44>: ld [ %l7 + %g1 ], %o0
0xff370310 <__1cEfunc6F_i_+48>: st %o0, [ %fp + -8 ]
0xff370314 <__1cEfunc6F_i_+52>: mov 3, %o1
0xff370318 <__1cEfunc6F_i_+56>: st %o1, [ %o0 ]
0xff37031c <__1cEfunc6F_i_+60>: clr [ %fp + -4 ]
0xff370320 <__1cEfunc6F_i_+64>: mov %g0, %i0
0xff370324 <__1cEfunc6F_i_+68>: ret
0xff370328 <__1cEfunc6F_i_+72>: restore
0xff37032c <__1cEfunc6F_i_+76>: mov %g0, %i0
0xff370330 <__1cEfunc6F_i_+80>: ret
0xff370334 <__1cEfunc6F_i_+84>: restore
---Type <return> to continue, or q <return> to quit---
End of assembler dump.
(gdb) bt
#0 0xff370318 in __1cEfunc6F_i_ ()
from /home/thales/ehrhardt/ld.so.1-bug/./libt2.so
#1 0x10884 in main ()
(gdb) info reg o0
o0 0xff370000 -13172736
(gdb) info reg o1
o1 0x3 3
(gdb) info reg l7
l7 0xff3803a8 -13106264
(gdb) info reg g1
g1 0x4 4
(gdb) turing$ exit
script done on Fri Sep 20 19:54:46 2002
Looking back at function func from t2.c shows:
int func ()
{
    static struct object x;
    struct object * p;
    p = &x;
    p->i = 3;      <====== crash is here.
    return 0;
}
The value of the pointer p is obviously in register o0, i.e. it is
0xff370000. This is precisely the BASE address where the shared library
libt2.so has been mapped to. Register l7 contains the base address of
the .got section (the global offset table of this library). The
questionable address is loaded from offset 4 in the global offset table.
Looking at the contents of the global offset table in the shared
library shows the following:
turing$ elfdump -G libt2.so
Global Offset Table: 2 entries
ndx addr value reloc addend symbol
[00000] 000103a8 00010338 R_SPARC_NONE 00000000
[00001] 000103ac 000103b0 R_SPARC_RELATIVE 00000000
turing$
Note that we have indeed
%l7(0xff3803a8) = Offset of .got(0x000103a8) + library base address(0xFF370000)
The Solaris Linker and Libraries Guide (freshly downloaded from
docs.sun.com) has this explanation for R_SPARC_RELATIVE:
|Some relocation types have semantics beyond simple calculation:
|[ ... ]
|R_SPARC_RELATIVE
| Created by the link-editor for dynamic objects. Its offset member
| gives the location within a shared object that contains a value
| representing a relative address. The runtime linker computes the
| corresponding virtual address by adding the virtual address at which
| the shared object is loaded to the relative address. Relocation
| entries for this type must specify 0 for the symbol table index.
This means that the value at offset 0x4 in the global offset
table should be
library base address + Value in .got
0xFF370000 + 0x000103B0 = 0xFF3803B0
after relocation. However, looking at the value of register o0, we
see that the .got entry evidently contains 0xFF370000 (the bare
library base address) instead.
----------------- cut here --------------------------------------------
The basic problem is the interpretation of the meaning of
R_SPARC_RELATIVE. Recall the explanation from above:
[ The same document also states that the calculation performed by
R_SPARC_RELATIVE is B+A (see Terminology below). IMHO this is
overruled by the first sentence quoted below.
]
|Some relocation types have semantics beyond simple calculation:
|[ ... ]
|R_SPARC_RELATIVE
| Created by the link-editor for dynamic objects. Its offset member
| gives the location within a shared object that contains a value
| representing a relative address. The runtime linker computes the
| corresponding virtual address by adding the virtual address at which
| the shared object is loaded to the relative address. Relocation
| entries for this type must specify 0 for the symbol table index.
This explanation is obviously derived from the SHT_REL case where
the ``relative address'' explained above and the implicit addend
are the same.
Terminology:
* B is the base address where the library is loaded
* A is the EXPLICIT addend
* V is the value stored in the shared library where an implicit addend
would reside (IMHO this is what ``relative address'' above describes).
Sun's runtime linker used to always calculate V + B + A for
R_SPARC_RELATIVE relocations. However, starting with Solaris 7 and the
advent of DT_RELACOUNT, it calculates only B+A (ignoring V completely)
iff DT_RELACOUNT is actually supplied and explicit addends are used.
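The two behaviours can be compared with a small C sketch. The struct
and function names are hypothetical and model only the arithmetic
described above, not the actual ld.so.1 code; 32-bit addresses are
assumed, as in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one R_SPARC_RELATIVE entry. */
struct relative_reloc {
    uint32_t sym_index; /* must be 0 for R_SPARC_RELATIVE */
    uint32_t addend;    /* A: the explicit addend */
};

/* Older behaviour as described above: V + B + A. */
uint32_t relocate_v_b_a(uint32_t base, uint32_t stored,
                        const struct relative_reloc *r)
{
    assert(r->sym_index == 0);
    return stored + base + r->addend;
}

/* Behaviour with DT_RELACOUNT as described above: B + A, V is ignored. */
uint32_t relocate_b_a(uint32_t base, uint32_t stored,
                      const struct relative_reloc *r)
{
    assert(r->sym_index == 0);
    (void)stored; /* the value in the object is never read */
    return base + r->addend;
}
```

With the values from the report (B = 0xFF370000, V = 0x000103B0, A = 0)
the first function yields 0xFF3803B0, the address of x, while the
second yields 0xFF370000, which is exactly the bad pointer seen in %o0.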
ld could work around this by always storing the relative address in
the addend and setting V to 0 if explicit addends are used. This is
what SUN's linker has done for quite some time.
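As a rough illustration of why this workaround is safe under both
interpretations, here is a C sketch. All names are hypothetical and
nothing here is taken from the GNU ld sources; it only contrasts the
encoding seen in the elfdump output (V = relative address, A = 0) with
the proposed one (V = 0, A = relative address).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair: word stored in the object (V) and explicit addend (A). */
struct got_relative {
    uint32_t stored; /* V */
    uint32_t addend; /* A */
};

/* Encoding seen in the report: relative address in place, addend 0. */
struct got_relative emit_value_in_place(uint32_t rel)
{
    struct got_relative e = { rel, 0 };
    return e;
}

/* Proposed encoding: stored word 0, relative address in the addend. */
struct got_relative emit_value_in_addend(uint32_t rel)
{
    struct got_relative e = { 0, rel };
    return e;
}

uint32_t resolve_v_b_a(uint32_t base, struct got_relative e)
{
    return e.stored + base + e.addend;
}

uint32_t resolve_b_a(uint32_t base, struct got_relative e)
{
    return base + e.addend;
}

/* Self-check: with the proposed encoding both formulas give base + rel. */
void check_proposed_encoding(uint32_t base, uint32_t rel)
{
    struct got_relative e = emit_value_in_addend(rel);
    assert(resolve_v_b_a(base, e) == resolve_b_a(base, e));
    assert(resolve_b_a(base, e) == base + rel);
}
```

Calling check_proposed_encoding(0xFF370000, 0x000103B0) passes, whereas
the in-place encoding makes the two formulas disagree for any nonzero
relative address.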
Note: This incompatibility is the cause of recent gcc bug reports
that see crashes in __register_frame_info_bases when starting any
C++ program.
Regards Christian
--
THAT'S ALL FOLKS!