Proposal on how to study g++ compile-time regression and progression since 2.95.2 and initial analysis

Loren James Rittle <rittle@latour.rsch.comm.mot.com>
Tue Feb 27 19:33:00 GMT 2001


http://gcc.gnu.org/gcc-3.0/criteria.html says that we will study the
compile-time performance for C++ in 2.95.2 and 3.0 release candidates
on the Primary Evaluation Platforms but does not list any particular
representative sources for study.

I propose two possible representative sources and give current results
for a Secondary Evaluation Platform.  I think it is quite important to
break this study into two parts: header processing speed (since some
C++ projects are clearly dominated by many small source files, each
implementing a small part of a larger program or library) and code
generation speed (since other C++ projects are clearly dominated by
template crunching and other heavy lifting).

/usr/bin/g++ is ``gcc version 2.95.2 19991024 (release)''.
/usr/local/beta-gcc/bin/g++ is ``gcc version 3.0 20010227
(prerelease)'' [made soon *after* checking was *disabled* on the 3.0
branch].  The platform was i386-unknown-freebsd4.2.  Both compilers
were built with equivalent BOOTSTRAP flags (-O2 -g).

1. Proposed C++ test one studies the time/memory required to process
an extended, but typical and standard, set of header files.  Its input
is generated as follows (note: we preprocess with the 3.0 release
candidate, but must do a bit of fix-up since 2.95.2 didn't support
__builtin_va_list, and legal static_cast<> usage appears to have been
relaxed somewhat between releases):

; /usr/local/beta-gcc/bin/g++ -E \
    $srcdir/libstdc++-v3/testsuite/17_intro/headers.cc | \
  sed 's,__builtin_va_list,void*,' | \
  sed 's,static_cast<const,static_cast<,' >cc_test1.i

Current results are mixed:

; limit datasize 14m
; time /usr/bin/g++ -O2 -c cc_test1.i
     5r     3.8u     0.1s       /usr/bin/g++ -O2 -c cc_test1.i
; limit datasize 4m
; time /usr/local/beta-gcc/bin/g++ -O2 -c cc_test1.i
    12r     6.6u     0.1s       /usr/local/beta-gcc/bin/g++ -O2 -c cc_test1.i

[AFAIK, the only semi-portable way to check memory usage under POSIX
 is to set a small limit in the shell (the syntax above may differ from
 that of your favorite shell); attempt compilation; and try again with
 a larger limit.  The above numbers were found by hand, but I see how
 to easily automate this under a portable /bin/sh script without any
 problem; a rough sketch follows.]
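Something along these lines should do (untested, and only a sketch: it
assumes the shell's ulimit supports -d for the data segment size in
kilobytes, which is common but not guaranteed by POSIX, and the 1m
step, the 64m ceiling and the name memprobe.sh are all arbitrary
placeholders):

#!/bin/sh
# memprobe.sh -- find the smallest datasize limit (in 1 MB steps) under
# which a compile succeeds, then re-run it under that limit with time.
# Usage: memprobe.sh /usr/local/beta-gcc/bin/g++ -O2 -c cc_test1.i
limit_mb=1
max_mb=64
while [ $limit_mb -le $max_mb ]; do
  # Apply the limit in a subshell so it does not stick to this script.
  if ( ulimit -d `expr $limit_mb \* 1024`; "$@" ) 2>/dev/null; then
    echo "compile succeeded with datasize limited to ${limit_mb}m:"
    ( ulimit -d `expr $limit_mb \* 1024`; time "$@" )
    exit 0
  fi
  limit_mb=`expr $limit_mb + 1`
done
echo "compile did not succeed within ${max_mb}m" >&2
exit 1

Linear probing in 1m steps is crude, but it mirrors the by-hand
procedure above; a binary search over the limit would be the obvious
refinement.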

The compiler's run time has regressed beyond what the release criteria
document allows; but memory usage for merely processing standard
header declarations is looking awesome.  Great work, guys!
(However, I could send some bizarre, recursive, inline template code
that totally blows up 3.0 versus 2.95.2, but I don't think that usage
is anywhere near typical, so I won't unless someone asks for it ;-).

1a. In an alternate version of this test, we could force both
compilers to use the actual C++ headers shipped (or to be shipped)
with the release and do all required preprocessing on those headers as
well.  At first glance, this is quite unfair to 3.0 since, e.g. on my
platform, <iostream> alone pulls in about 10x as many lines under 3.0
as under 2.95.2 (as measured with 'g++ -E [...] | wc').  However, this
is the baseline performance users will actually see in day-to-day use.
That is, unfair or not, I think most g++ users, especially those most
prone to complain about a slowdown in compile time, will test the
compiler versions against one another with actual application source
code which uses standard headers, not with preprocessed code.  In
addition, when $srcdir/gcc/testsuite/g++.old-deja/g++.other/headers1.C
(which is not as exhaustive as the above standard header processing
test, but contains only those headers available under both 2.95.2 and
3.0) is preprocessed by each version of the compiler, 3.0 produces
only about 37% more non-blank lines than 2.95.2 (23618 versus 17225).
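
For concreteness, those line counts can be reproduced with something
along these lines (the exact invocation behind the <iostream> numbers
was elided above, so treat this as an illustration only; the file
iostream_only.C is just a hypothetical one-line wrapper, and 'grep -c .'
counts non-empty rather than strictly non-blank lines):

; echo '#include <iostream>' > iostream_only.C
; /usr/bin/g++ -E iostream_only.C | wc -l
; /usr/local/beta-gcc/bin/g++ -E iostream_only.C | wc -l
; /usr/bin/g++ -E \
    $srcdir/gcc/testsuite/g++.old-deja/g++.other/headers1.C | grep -c .
; /usr/local/beta-gcc/bin/g++ -E \
    $srcdir/gcc/testsuite/g++.old-deja/g++.other/headers1.C | grep -c .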

; cp $srcdir/gcc/testsuite/g++.old-deja/g++.other/headers1.C cc_test1.C

Current results:

; limit datasize 12m
; time /usr/bin/g++ -O2 -c cc_test1.C
     3r     3.0u     0.1s       /usr/bin/g++ -O2 -c cc_test1.C
; limit datasize 5m
; time /usr/local/beta-gcc/bin/g++ -O2 -c cc_test1.C
     7r     6.6u     0.2s       /usr/local/beta-gcc/bin/g++ -O2 -c cc_test1.C

2. Proposed C++ test two studies the time/memory required to produce
object code from C++ code, instead of focusing on header processing
speed as C++ test one does.  Any large body of representative C++ code
that uses few or no standard headers could work for this test.
stepanov_v1p2.C would appear to be too small.  POOMA would appear to
fit the bill, especially since it is already being used as a
real-world regression test for gcc3.  (I was able to build POOMA
against gcc 2.95.2 without any problems, but it doesn't appear to
offer a configuration for gcc3 yet.  Since I knew someone was working
on this already, I only spent a little time monkeying around with the
configuration, without success.)

The POOMA build procedure already reports the time required to build
various benchmarks (broken down per compiled file).  Luckily, that
feature is implemented as a hook point.

Instead of using /usr/bin/time at that hook point, a wrapper script
could be placed around the standard g++ invocation to attempt
compilation under different memory constraints (using the same
technique hinted at above).  The script could then report both the
time and the memory usage of the compilation that succeeded, instead
of just the time.  Comparisons between compiler versions, as shown
above for proposed C++ test one, would then be possible; a sketch of
such a wrapper follows.
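
Such a wrapper could be as simple as the following sketch, which just
delegates to the hypothetical memprobe.sh from proposed C++ test one
(I have not actually wired this into POOMA's hook, and the variable
names and paths are placeholders):

#!/bin/sh
# gxx-wrap.sh -- hypothetical stand-in for g++ at POOMA's per-file
# timing hook.  Each compile is run through memprobe.sh so that both
# the time and the smallest workable datasize limit get reported; if
# the probe gives up, fall back to an unconstrained compile so the
# build itself still completes.
REAL_GXX=${REAL_GXX:-/usr/local/beta-gcc/bin/g++}
PROBE=${PROBE:-$HOME/bin/memprobe.sh}

echo "== $REAL_GXX $*" >&2
"$PROBE" "$REAL_GXX" "$@" >&2 || exec "$REAL_GXX" "$@"

The build would then be pointed at the wrapper in place of the real
g++ at that hook point.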

Comments on this analysis approach, on the proposal to split the
compile-time performance study into two parts, or on the particular
tests?  I should have a script to automate testing and reporting of
"proposed C++ test one" using the technique outlined above quite soon.

Regards,
Loren


