This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

[PATCH,PE] Allow .DEF file parser to handle 'foreign' language symbols.

From: Dave Korn <dave dot korn dot cygwin at googlemail dot com>
To: "binutils at sourceware dot org" <binutils at sourceware dot org>
Date: Tue, 26 May 2009 17:33:54 +0100
Subject: [PATCH,PE] Allow .DEF file parser to handle 'foreign' language symbols.

    Hi all,

  Unless anyone objects, I intend to apply the attached patch within the next
24 hours or so, subject to some further testing on a number of targets
(including Cygwin, MinGW and CeGCC).  I think I'm competent with Bison but
wouldn't mind if someone even more experienced cast an eye over that part of
the patch; I think I did right by using left-recursion and there aren't any
new shift-reduce conflicts, but there could be deeper subtleties I'm not aware of.

  The purpose of the patch is to allow the "aligncomm" .drectve-section
command to parse the kind of non-C-language-family symbols emitted by the
gfortran compiler (and maybe other languages too), so that those languages can
use the PE aligned common extension.  The requirement for this support emerged
during testing of the GCC changes to enable this feature in the compiler;
being a backend change, it's there for all languages so we should aim to
support them all.

  The complication is caused by the presence of '.' in non-C symbols.  In .DEF
file syntax, the period is used primarily as a separator when specifying a
"fully-qualified" ex/import symbol in "MODULE-NAME.EXTERNAL-NAME" format.
Other 'unusual' characters, "$:-_?/@", are allowed in identifiers, but a
period delimits the ID token.
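
  To illustrate where an ID token ends, here is a minimal sketch in C.  It
is not code from deffilep.y: the helper name sketch_id_length is hypothetical,
and the exact character sets are assumptions taken from the description above
(an ID may not begin with a digit, may contain "$:-_?/@", and is delimited by
a period).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of deffilep.y: return the length of the
   ID token starting at S, or 0 if S cannot start an ID.  */
size_t
sketch_id_length (const char *s)
{
  size_t n = 0;

  assert (s != NULL);

  /* Assumption: an ID starts with a letter or one of "$:-_?@", so a
     leading digit (or an empty string) is rejected here.  */
  if (*s == '\0' || !(isalpha ((unsigned char) *s) || strchr ("$:-_?@", *s)))
    return 0;

  /* Assumption: later characters may be alphanumeric or one of
     "$:-_?/@".  A period is in neither set, so it ends the token.  */
  while (s[n] != '\0'
         && (isalnum ((unsigned char) s[n]) || strchr ("$:-_?/@", s[n])))
    n++;

  return n;
}
```

  Under these assumptions, scanning "_test_equiv.eq.1_" stops at the first
period after 11 characters, and the trailing "1_" cannot start an ID at all.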

  We could almost but not quite use the dot_name production:

dot_name: ID
	| dot_name '.' ID
	;

... except that it's possible for the character immediately after the period
to be a digit, which isn't allowed as the first character of an ID token, and
indeed forces the lexer to produce a NUMBER token - and that's a second
problem, because we then only get a numeric value for the digit(s), and so
wouldn't be able to discriminate e.g. "_symbol.1_" and "_symbol.001_".
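
  The loss of information can be shown with a small sketch in C.  The helper
name sketch_digits_collide is hypothetical; the only thing borrowed from the
lexer is the strtoul (buffer, 0, 0) conversion, where base 0 means a leading
"0" selects octal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return nonzero if two digit strings that differ
   as text would still produce the same value once converted with
   strtoul (s, 0, 0), which is how the lexer built a NUMBER.  */
int
sketch_digits_collide (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  return strcmp (a, b) != 0 && strtoul (a, 0, 0) == strtoul (b, 0, 0);
}
```

  With this helper, "1" and "001" collide (both convert to 1), so a symbol
name rebuilt from the numeric value alone could not tell "_symbol.1_" from
"_symbol.001_".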

  So there are two changes in the attached patch.  First, the lexer now
returns a string of digits in verbatim char* form, as a DIGITS token, and
there is an elementary production from DIGITS to NUMBER (which is now a type,
not a token), effectively just hoisting the strtoul call out of the lexer and
into the grammar, but thereby exposing the raw DIGITS token string to rules
that want it.  Secondly, I added a production "anylang_id", to compose the
various tokens into which a non-C symbol will be broken down.
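
  As a rough illustration of what the anylang_id action does, here is a
standalone sketch in C.  The function name sketch_compose_id is hypothetical,
and it uses plain malloc and snprintf rather than the xmalloc and sprintf
found in the grammar action; an absent opt_digits or opt_id is passed as an
empty string, as in the productions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the anylang_id action: append
   '.' DIGITS ID to the identifier built so far.  DIGITS and ID may each
   be "" when the optional piece is absent.  The caller frees the result;
   NULL is returned if allocation fails.  */
char *
sketch_compose_id (const char *prefix, const char *digits, const char *id)
{
  size_t len;
  char *out;

  assert (prefix != NULL && digits != NULL && id != NULL);

  /* prefix + '.' + digits + id + terminating NUL.  */
  len = strlen (prefix) + 1 + strlen (digits) + strlen (id) + 1;
  out = malloc (len);
  if (out == NULL)
    return NULL;

  snprintf (out, len, "%s.%s%s", prefix, digits, id);
  return out;
}
```

  Because the rule is left-recursive, "_test_equiv.eq.001_" would be built in
two steps: first "_test_equiv" with ("", "eq"), then that result with
("001", "_"), and the leading zeros survive because the digits stay a string.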

  This doesn't yet allow the use of foreign symbols in IMPORT or EXPORT
directives; that's a whole nother can of worms for another day.  But it
provides the infrastructure we'll need if/as and when we do decide to add
that support.

ld/ChangeLog

	* deffilep.y (%union):  Add new string-type semantic value 'digits'.
	(%token):  Remove NUMBER as token, add DIGITS.
	(%type):  Add NUMBER as type.  Add new id types anylang_id, opt_id.
	(ALIGNCOMM):  Parse an anylang_id instead of a plain ID.
	(anylang_id):  New production.
	(opt_digits):  Likewise.
	(opt_id):  Likewise.
	(NUMBER):  Likewise.
	(def_lex):  Return strings of digits in raw string form as DIGITS
	token, instead of converting to numeric integer type.

ld/testsuite/ChangeLog

	* ld-pe/non-c-lang-syms.s:  New dump test source file.
	* ld-pe/non-c-lang-syms.d:  New dump test pattern file.
	* ld-pe/pe.exp:  Run new "foreign symbol" test.

  Please shout if this isn't ok by all concerned!

    cheers,
      DaveK

Index: ld/deffilep.y
===================================================================
RCS file: /cvs/src/src/ld/deffilep.y,v
retrieving revision 1.26
diff -p -u -r1.26 deffilep.y
--- ld/deffilep.y	19 May 2009 16:08:07 -0000	1.26
+++ ld/deffilep.y	26 May 2009 15:00:17 -0000
@@ -103,6 +103,7 @@ static const char *lex_parse_string_end 
 %union {
   char *id;
   int number;
+  char *digits;
 };
 
 %token NAME LIBRARY DESCRIPTION STACKSIZE_K HEAPSIZE CODE DATAU DATAL
@@ -110,10 +111,12 @@ static const char *lex_parse_string_end 
 %token PRIVATEU PRIVATEL ALIGNCOMM
 %token READ WRITE EXECUTE SHARED NONAMEU NONAMEL DIRECTIVE
 %token <id> ID
-%token <number> NUMBER
+%token <digits> DIGITS
+%type  <number> NUMBER
+%type  <digits> opt_digits
 %type  <number> opt_base opt_ordinal
 %type  <number> attr attr_list opt_number exp_opt_list exp_opt
-%type  <id> opt_name opt_equal_name dot_name 
+%type  <id> opt_name opt_equal_name dot_name anylang_id opt_id
 
 %%
 
@@ -135,7 +138,7 @@ command: 
 	|	VERSIONK NUMBER { def_version ($2, 0);}
 	|	VERSIONK NUMBER '.' NUMBER { def_version ($2, $4);}
 	|	DIRECTIVE ID { def_directive ($2);}
-	|	ALIGNCOMM ID ',' NUMBER { def_aligncomm ($2, $4);}
+	|	ALIGNCOMM anylang_id ',' NUMBER { def_aligncomm ($2, $4);}
 	;
 
 
@@ -245,7 +248,25 @@ dot_name: ID		{ $$ = $1; }
 	    $$ = name;
 	  }
 	;
-	
+
+anylang_id: ID		{ $$ = $1; }
+	| anylang_id '.' opt_digits opt_id
+	  {
+	    char *id = xmalloc (strlen ($1) + 1 + strlen ($3) + strlen ($4) + 1);
+	    sprintf (id, "%s.%s%s", $1, $3, $4);
+	    $$ = id;
+	  }
+	;
+
+opt_digits: DIGITS	{ $$ = $1; }
+	|		{ $$ = ""; }
+	;
+
+opt_id: ID		{ $$ = $1; }
+	|		{ $$ = ""; }
+	;
+
+NUMBER: DIGITS		{ $$ = strtoul ($1, 0, 0); }
 
 %%
 
@@ -1010,11 +1031,11 @@ def_lex (void)
 	}
       if (c != EOF)
 	def_ungetc (c);
-      yylval.number = strtoul (buffer, 0, 0);
+      yylval.digits = xstrdup (buffer);
 #if TRACE
-      printf ("lex: `%s' returns NUMBER %d\n", buffer, yylval.number);
+      printf ("lex: `%s' returns DIGITS\n", buffer);
 #endif
-      return NUMBER;
+      return DIGITS;
     }
 
   if (ISALPHA (c) || strchr ("$:-_?@", c))
Index: ld/testsuite/ld-pe/non-c-lang-syms.d
===================================================================
RCS file: ld/testsuite/ld-pe/non-c-lang-syms.d
diff -N ld/testsuite/ld-pe/non-c-lang-syms.d
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ ld/testsuite/ld-pe/non-c-lang-syms.d	26 May 2009 15:00:17 -0000
@@ -0,0 +1,10 @@
+#...
+[0-9A-Fa-f]{6,14}[08]0 B _?test\$equiv\.eq\.
+[0-9A-Fa-f]{6,14}[02468aAcCeE]0 B _?test\$equiv\.eq\.100
+[0-9A-Fa-f]{6,14}[0-9A-Fa-f]0 B _?test\$equiv\.eq\.1_
+[0-9A-Fa-f]{6,14}[048cC]0 B _?test\$equiv\.eq\._
+[0-9A-Fa-f]{6,14}[08]0 B _?test_equiv\.eq\.
+[0-9A-Fa-f]{6,14}[02468aAcCeE]0 B _?test_equiv\.eq\.100
+[0-9A-Fa-f]{6,14}[0-9A-Fa-f]0 B _?test_equiv\.eq\.1_
+[0-9A-Fa-f]{6,14}[048cC]0 B _?test_equiv\.eq\._
+#...
Index: ld/testsuite/ld-pe/non-c-lang-syms.s
===================================================================
RCS file: ld/testsuite/ld-pe/non-c-lang-syms.s
diff -N ld/testsuite/ld-pe/non-c-lang-syms.s
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ ld/testsuite/ld-pe/non-c-lang-syms.s	26 May 2009 15:00:17 -0000
@@ -0,0 +1,15 @@
+
+main:
+_main:
+	nop
+
+	.comm   _test_equiv.eq.1_, 16, 4
+	.comm   _test_equiv.eq.100, 16, 5
+	.comm   _test_equiv.eq._, 16, 6
+	.comm   _test_equiv.eq., 16, 7
+
+	.comm   _test$equiv.eq.1_, 16, 4
+	.comm   _test$equiv.eq.100, 16, 5
+	.comm   _test$equiv.eq._, 16, 6
+	.comm   _test$equiv.eq., 16, 7
+
Index: ld/testsuite/ld-pe/pe.exp
===================================================================
RCS file: /cvs/src/src/ld/testsuite/ld-pe/pe.exp,v
retrieving revision 1.13
diff -p -u -r1.13 pe.exp
--- ld/testsuite/ld-pe/pe.exp	19 May 2009 16:08:08 -0000	1.13
+++ ld/testsuite/ld-pe/pe.exp	26 May 2009 15:00:17 -0000
@@ -69,3 +69,10 @@ run_dump_test "longsecn-4"
 run_dump_test "longsecn-5"
 
 run_dump_test "orphan"
+
+set foreign_sym_test {
+  {"non-C aligned common" "" "" {non-c-lang-syms.s}
+   {{nm -C non-c-lang-syms.d}} "non-c-lang-syms.x"}
+}
+
+run_ld_link_tests $foreign_sym_test

Follow-Ups:
- Re: [PATCH,PE] Allow .DEF file parser to handle 'foreign' language symbols.
  - From: Dave Korn
