Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

Reinhold P. Weicker
Siemens AG, E STE 35
Postfach 3240
D-8520 Erlangen
Germany (West)


The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version that was used most often for benchmarking has
  been the version made by Rick Richardson by another translation from
  the Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone such that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.

  Together with the new C version, the Ada and Pascal versions have
  been updated as well.

o As far as it is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger that benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this denotes the program
published in [1]. However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0 distributed via the UNIX network Usenet in March 1988 only in a few
corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.
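
The second reason given above, "dead variable elimination", can be
pictured with a small C sketch. It is only an illustration and not part
of Dhrystone; the function names result_unused and result_used are
hypothetical, and whether code is actually removed depends on the
compiler and on the optimization level.

```c
/* Hypothetical illustration, not Dhrystone code. */

/* The value of sum is never read after the loop, so sum is a "dead
   variable": an optimizing compiler may generate no code for the loop. */
void result_unused(int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; ++i)
        sum += 2 * i;
    (void)sum;  /* silences compiler warnings; does not make sum live */
}

/* Here the value is returned to the caller, so the computation (or
   something equivalent to it) has to be kept. */
int result_used(int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; ++i)
        sum += 2 * i;
    return sum;
}
```

A timing loop built around result_unused may measure little more than
the loop itself, which is the situation version 2 tries to make less
likely.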


In general, version 2 follows - in the parts that are significant for
performance measurement, i.e. within the measurement loop - the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time in all likelihood has been negligible.)
The initialization and UNIX instrumentation part - which had been
omitted in [1] - follows mostly the ideas of Rick Richardson [2].
However, any changes in the initialization part and in the printing of
the result have no impact on performance measurement since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.

The original publication of Dhrystone did not contain any statements
for time measurement since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them as "dead variables" and
suppresses code generation for a part of the statements. Therefore in
version 2 all variables of "main" are printed at the end of the
program. This also permits some plausibility control for correct
execution of the benchmark.

At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only.
In these cases, only the role of some operands has been changed, and
it was made sure that the numbers defining the "Dhrystone distribution"
(distribution of statements, operand types and locality) still hold as
much as possible. Except for sophisticated optimizing compilers,
execution times for version 2.1 should be the same as for previous
versions.

Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check code listings
to see whether code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is now part
of the benchmark, this does have an impact - though a very minor one -
on the distribution statistics which have been updated for this
version.


In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.
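
The measurement approach described so far (a loop around the benchmark
body, no subtraction of the loop overhead, and results that are used
after the loop) might be sketched in C as follows. This is a minimal
sketch under stated assumptions, not the Dhrystone source: the names
timed_loop and work are hypothetical, work is only a stand-in for the
benchmark body, and clock() from <time.h> is just one possible,
system-dependent way to measure time.

```c
#include <time.h>

/* Hypothetical stand-in for the benchmark body; NOT Dhrystone code. */
static int work(int run_index, int value)
{
    return (value + run_index) % 7;
}

/* Executes the body `runs` times, stores the elapsed processor time in
   *seconds, and returns the last computed value so that the caller can
   use it (for example, print it). The loop overhead is deliberately
   not subtracted from the measured time. */
int timed_loop(long runs, double *seconds)
{
    int int_loc = 0;
    long run_index;
    clock_t begin;
    clock_t end;

    begin = clock();
    for (run_index = 1; run_index <= runs; ++run_index)
        int_loc = work((int)run_index, int_loc);
    end = clock();

    *seconds = (double)(end - begin) / CLOCKS_PER_SEC;
    return int_loc;
}
```

A caller would print the returned value together with the elapsed time
after the loop has finished; using the value in this way is what keeps
it from being a "dead variable", and it also allows a plausibility
check of the computation.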

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
      if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  they are
      strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
      Int_2_Loc = Run_Index;
      Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (the fifth statement of "main") out of the measurement
  loop. (This probably will not happen for the C version, but it did
  happen with another language and compiler.) The assignment to
  Int_2_Loc prevents value propagation for Int_2_Loc, and the
  assignment to Int_Glob makes the value of Int_Glob possibly dependent
  on the value of Run_Index.

o In the three arithmetic computations at the end of the measurement
  loop in "main", the role of some variables has been exchanged, to
  prevent the division from just cancelling out the multiplication as
  it was in [1]. A very smart compiler might have recognized this and
  suppressed code generation for the division.

o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
      Bool_Loc = Bool_Loc | Bool_Glob;
  to
      Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of a local
  variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
  used afterwards.

o In Func_1, the statement
      Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
      if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
      Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.

The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.


The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.

There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller.
In Ada and Pascal, assignment and comparison of strings are operators
defined in the language, and the upper bounds of the strings occurring
in Dhrystone are part of the type information known at compilation
time. The compilers can therefore generate efficient inline code. In
C, string assignment and comparison are not part of the language, so
the string operations must be expressed in terms of the C library
functions "strcpy" and "strcmp". (ANSI C allows an implementation to
use inline code for these functions.) In addition to the overhead
caused by additional function calls, these functions are defined for
null-terminated strings where the length of the strings is not known at
compilation time; the function has to check every byte for the
termination condition (the null byte).

Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.


When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls.
  Although on many systems there will be no difference in execution
  time to a Dhrystone version where all compilation units are merged
  into one file, the rule is that separate compilation should be used.
  The intention is that real programming practice, where programs
  consist of several independently compiled units, should be reflected.
  This also implies that the compiler, while compiling one unit, has no
  information about the use of variables, register allocation etc.
  occurring in other compilation units. Although in real life
  compilation units will probably be larger, the intention is that
  these effects of separate compilation are modeled in Dhrystone.

  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: Post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit) which may prevent its general use in practical programming. I
  think that since it defeats the intentions given above, it should not
  be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1].
  This restriction does not hold for the string functions of the C
  version since ANSI C allows an implementation to use inline code for
  these functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: Some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, we cannot prevent people
  from trying to achieve results that look as good as possible in
  benchmarking. Therefore, optimizations performed by compilers - other
  than those listed above - are not forbidden when Dhrystone execution
  times are measured. Dhrystone is not intended to be non-optimizable
  but is intended to be about as optimizable as normal programs. For
  example, there are several places in Dhrystone where performance
  benefits from optimizations like common subexpression elimination,
  value propagation etc., but normal programs usually also benefit from
  these optimizations. Therefore, no effort was made to artificially
  prevent such optimizations. However, measurement reports should
  indicate which compiler optimization levels have been used, and
  reporting results with different levels of compiler optimization for
  the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3], p.
  193).

Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects.
However, Dhrystone numbers obtained under these conditions should be
explicitly marked as such; "normal" Dhrystone results should be
understood as results obtained following the ground rules listed above.

In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much performance difference is due to compiler
optimization and how much is due to hardware speed.


The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.


[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
   Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Version Known to me: Sept.
   21, 1987

[3]
   Brian W. Kernighan and Dennis M. Ritchie: The C Programming
   Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978