Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules
Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)
The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:
o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
and C have been distributed by Reinhold Weicker via floppy disk.
However, the version that was used most often for benchmarking has
been the version made by Rick Richardson by another translation from
the Ada version into the C programming language; this has been the
version distributed via the UNIX network Usenet [2].
There is an obvious need for a common C version of Dhrystone, since C
is at present the most popular system programming language for the
class of systems (microcomputers, minicomputers, workstations) where
Dhrystone is used most. There should be, as far as possible, only
one C version of Dhrystone such that results can be compared without
restrictions. In the past, the C versions distributed by Rick
Richardson (Version 1.1) and by Reinhold Weicker had small (though
not significant) differences.
Together with the new C version, the Ada and Pascal versions have
been updated as well.
o As far as it is possible without changes to the Dhrystone statistics,
optimizing compilers should be prevented from removing significant
statements. It has turned out in the past that optimizing compilers
suppressed code generation for too many statements (by "dead code
removal" or "dead variable elimination"). This has led to the
danger that benchmarking results obtained by a naive application of
Dhrystone - without inspection of the code that was generated - could
become meaningless.
The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this denotes the program
published in [1]. However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.
The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0 distributed via the UNIX Network Usenet in March 1988 only in a few
corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.
In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time in all likelihood has been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - follows mostly the ideas of Rick Richardson [2].
However, any changes in the initialization part and in the printing of
the result have no impact on performance measurement since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.
The original publication of Dhrystone did not contain any statements
for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them as "dead variables" and
suppresses code generation for a part of the statements. Therefore in
version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct
execution of the benchmark.
At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only. In these cases, only the
role of some operands has been changed, and it was made sure that the
numbers defining the "Dhrystone distribution" (distribution of
statements, operand types and locality) still hold as much as possible.
Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.
Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check code listings
to verify that code is generated for all statements of Dhrystone.
Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is now part
of the benchmark, this does have an impact - though a very minor one -
on the distribution statistics which have been updated for this
version.
In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.
In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:
o In procedure "main", three statements have been added in the
non-executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
they are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
The string assignment prevents movement of the preceding assignment
to Str_2_Loc (5'th statement of "main") out of the measurement loop.
(This probably will not happen for the C version, but it did happen
with another language and compiler.) The assignment to Int_2_Loc
prevents value propagation for Int_2_Loc, and the assignment to
Int_Glob makes the value of Int_Glob possibly dependent on the
value of Run_Index.
o In the three arithmetic computations at the end of the measurement
loop in "main", the role of some variables has been exchanged, to
prevent the division from just cancelling out the multiplication as
it did in [1]. A very smart compiler might have recognized this and
suppressed code generation for the division.
o For Proc_2, no code has been changed, but the values of the actual
parameter have changed due to changes in "main".
o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
to
    Bool_Glob = Bool_Loc | Bool_Glob;
It now assigns a value to a global variable instead of a local
variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
used afterwards.
o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
was added in the non-executed "else" part of the "if" statement, to
prevent the suppression of code generation for the assignment to
Ch_1_Loc.
o In Func_2, the second character comparison statement has been changed
to
    if (Ch_Loc == 'R')
('R' instead of 'X') because a comparison with 'X' is implied in the
preceding "if" statement.
Also in Func_2, the statement
    Int_Glob = Int_Loc;
has been added in the non-executed part of the last "if" statement,
in order to prevent Int_Loc from becoming a dead variable.
o In Func_3, a non-executed "else" part has been added to the "if"
statement. While the program would not be incorrect without this
"else" part, it is considered bad programming practice if a function
can be left without a return value.
To compensate for this change, the (non-executed) "else" part in the
"if" statement of Proc_3 was removed.
The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.
The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.
There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated by
these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.
It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller. In Ada and Pascal, assignment and
comparison of strings are operators defined in the language, and the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time. The compilers can therefore
generate efficient inline code. In C, string assignment and comparisons
are not part of the language, so the string operations must be
expressed in terms of the C library functions "strcpy" and "strcmp".
(ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function
calls, these functions are defined for null-terminated strings where
the length of the strings is not known at compilation time; the
function has to check every byte for the termination condition (the
null byte).
Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.
I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.
When Dhrystone is used, the following "ground rules" apply:
o Separate compilation (Ada and C versions)
As mentioned in [1], Dhrystone was written to reflect actual
programming practice in systems programming. The division into
several compilation units (5 in the Ada version, 2 in the C version)
is intended, as is the distribution of inter-module and intra-module
subprogram calls. Although on many systems there will be no
difference in execution time to a Dhrystone version where all
compilation units are merged into one file, the rule is that separate
compilation should be used. The intention is that real programming
practice, where programs consist of several independently compiled
units, should be reflected. This also implies that the compiler,
while compiling one unit, has no information about the use of
variables, register allocation etc. occurring in other compilation
units. Although in real life compilation units will probably be
larger, the intention is that these effects of separate compilation
are modeled in Dhrystone.
A few language systems have post-linkage optimization available
(e.g., final register allocation is performed after linkage). This
is a borderline case: Post-linkage optimization involves additional
program preparation time (although not as much as compilation in one
unit) which may prevent its general use in practical programming. I
think that since it defeats the intentions given above, it should not
be used for Dhrystone.
Unfortunately, ISO/ANSI Pascal does not contain language features for
separate compilation. Although most commercial Pascal compilers
provide separate compilation in some way, we cannot use it for
Dhrystone since such a version would not be portable. Therefore, no
attempt has been made to provide a Pascal version with several
compilation units.
o No procedure merging
Although Dhrystone contains some very short procedures where
execution would benefit from procedure merging (inlining, macro
expansion of procedures), procedure merging is not to be used. The
reason is that the percentage of procedure and function calls is part
of the "Dhrystone distribution" of statements contained in [1]. This
restriction does not hold for the string functions of the C version
since ANSI C allows an implementation to use inline code for these
functions.
o Other optimizations are allowed, but they should be indicated
It is often hard to draw an exact line between "normal code
generation" and "optimization" in compilers: Some compilers perform
operations by default that are invoked in other compilers only when
optimization is explicitly requested. Also, we cannot avoid that in
benchmarking people try to achieve results that look as good as
possible. Therefore, optimizations performed by compilers - other
than those listed above - are not forbidden when Dhrystone execution
times are measured. Dhrystone is not intended to be non-optimizable
but is intended to be similarly optimizable as normal programs. For
example, there are several places in Dhrystone where performance
benefits from optimizations like common subexpression elimination,
value propagation etc., but normal programs usually also benefit from
these optimizations. Therefore, no effort was made to artificially
prevent such optimizations. However, measurement reports should
indicate which compiler optimization levels have been used, and
reporting results with different levels of compiler optimization for
the same hardware is encouraged.
o Default results are those without "register" declarations (C version)
When Dhrystone results are quoted without additional qualification,
they should be understood as results obtained without use of the
"register" attribute. Good compilers should be able to make good use
of registers even without explicit register declarations ([3], p.
193).
Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects. However, Dhrystone numbers obtained under
these conditions should be explicitly marked as such; "normal"
Dhrystone results should be understood as results obtained following
the ground rules listed above.
In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much performance difference is due to compiler optimization
and how much is due to hardware speed.
The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.
[1] Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
    Benchmark.
    Communications of the ACM 27, 10 (Oct. 1984), 1013-1030
[2] Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
    Informal Distribution via "Usenet", Last Version Known to me: Sept.
    21, 1987
[3] Brian W. Kernighan and Dennis M. Ritchie: The C Programming
    Language.
    Prentice-Hall, Englewood Cliffs (NJ) 1978