C Compare vs. PPC ASM Compare Function

Genesi

MySQL® Partner

Oct 2005

Table of Contents

1 Objective

2 Executive Summary

3 Setup

4 Results

4.1 PowerPC Tests

4.2 Comparing Different Systems

List of Figures

Figure 1. PowerPC G5 at 1.80 GHz

Figure 2. PowerPC at 1.4 GHz

Figure 3. PowerPC G3 at 600 MHz

Figure 4. Memory-Bound Routines

Figure 5. Cache-Bound Routines

1 Objective

Benchmark tests were run to determine the relative effectiveness of the following:

  • C Function.
  • PPC ASM Function.
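The report does not list either routine. As a rough illustration only, the sketch below shows what a baseline byte-wise C compare could look like, next to a portable scalar variant that skips equal 4-byte words before resolving the ordering. Both function names (compare_bytes, compare_words) and the fixed-length-key interface are assumptions made for this example; they are not the functions that were benchmarked, and the word-wise variant is not the PPC ASM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline: compare two keys of equal length byte by byte.
   Returns <0, 0, or >0, in the style of memcmp. */
int compare_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return 0;
}

/* Hypothetical scalar optimization: skip over equal 4-byte words, then
   let the byte-wise routine decide the ordering. Testing words only for
   equality keeps the result independent of byte order. */
int compare_words(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;
    assert(a != NULL && b != NULL);
    while (i + 4 <= len) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, 4);   /* memcpy avoids unaligned loads */
        memcpy(&wb, b + i, 4);
        if (wa != wb)
            break;               /* first differing word found */
        i += 4;
    }
    /* Resolve the differing word, or the tail of fewer than 4 bytes. */
    return compare_bytes(a + i, b + i, len - i);
}
```

Both routines return the same sign for the same inputs; the word-wise variant simply performs fewer loop iterations when the keys share a long common prefix.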

No single benchmark can assess performance from all angles, so further benchmark results, generated with different test tools, will be delivered in the near future.

2 Executive Summary

For all tests, GCC was used.

The tests indicate that the scalar PPC function improved performance on all tested PPC platforms.

The G3 results do not improve much on longer strings. This is caused by the small second-level (L2) cache.

The G5 results show very good memory bandwidth.

3 Setup

The test environment was as follows:

  • The tests sorted arrays of 10,000 elements and arrays of 1,000 elements.
  • The strings being compared were 1 to 255 characters in length.
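The report does not describe the test harness. The following is a minimal sketch of how such a test could be set up, assuming fixed-length keys stored contiguously, the standard qsort routine, and memcmp standing in for the compare function under test. KEY_LEN, sort_and_count, counted_compare, and is_sorted are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 32   /* assumed fixed key length for this sketch */

static unsigned long compare_calls;   /* number of compare invocations */

/* qsort callback: counts each call, then compares two fixed-length keys. */
static int counted_compare(const void *a, const void *b)
{
    compare_calls++;
    return memcmp(a, b, KEY_LEN);
}

/* Fill `count` keys with pseudo-random lowercase letters, sort them,
   and return how many times the compare function was called. */
unsigned long sort_and_count(unsigned char *keys, size_t count, unsigned seed)
{
    assert(keys != NULL);
    srand(seed);
    for (size_t i = 0; i < count * KEY_LEN; i++)
        keys[i] = (unsigned char)('a' + rand() % 26);
    compare_calls = 0;
    qsort(keys, count, KEY_LEN, counted_compare);
    return compare_calls;
}

/* Return 1 if the keys are in non-decreasing order, 0 otherwise. */
int is_sorted(const unsigned char *keys, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (memcmp(keys + (i - 1) * KEY_LEN, keys + i * KEY_LEN, KEY_LEN) > 0)
            return 0;
    }
    return 1;
}
```

Calling sort_and_count on a buffer of 10,000 * KEY_LEN bytes and again on one of 1,000 * KEY_LEN bytes would mirror the two array sizes listed above; the returned call count shows how heavily the sort leans on the compare routine.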

4 Results

4.1 PowerPC Tests

Compare time scales roughly linearly with the length of the string. For this reason, tests were performed on three typical length values, as follows:

  • 8 characters.
  • 32 characters.
  • 120 characters.
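One way to turn such measurements into a throughput figure is sketched below. It assumes the standard clock() timer and uses memcmp as a stand-in for the function under test; the names compares_per_second and time_compares are hypothetical, and the keys differ only in their last byte so that every compare scans the full length.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Throughput in compares per second; returns 0 when the elapsed time
   is too small for the timer to resolve. */
double compares_per_second(unsigned long compares, double seconds)
{
    return seconds > 0.0 ? (double)compares / seconds : 0.0;
}

/* Time `iterations` compares of two keys of `len` bytes (1..255). */
double time_compares(size_t len, unsigned long iterations)
{
    static unsigned char a[256], b[256];
    volatile int sink = 0;   /* keeps the compare result observable */

    assert(len >= 1 && len <= 255);
    memset(a, 'x', sizeof a);
    memset(b, 'x', sizeof b);
    b[len - 1] = 'y';        /* keys differ only in the last byte */

    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        /* Vary one byte beyond the compared range so the call is
           less likely to be hoisted out of the loop. */
        a[255] = (unsigned char)(i & 0xff);
        sink = memcmp(a, b, len);
    }
    clock_t end = clock();
    (void)sink;

    return compares_per_second(iterations,
                               (double)(end - start) / CLOCKS_PER_SEC);
}
```

Running time_compares for lengths 8, 32, and 120 with the same iteration count gives three throughput values that can be compared directly. An optimizing compiler may still simplify the loop, so the numbers from this sketch are only indicative.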

Figure 1 shows the throughput of the normal vs. the optimized function on a PowerPC G5 at 1.80 GHz.

Figure 1. PowerPC G5 at 1.80 GHz

Figure 2 shows the throughput of the normal vs. the optimized function on a PowerPC at 1.4 GHz.

Figure 2. PowerPC at 1.4 GHz

Figure 3 shows the throughput of the normal vs. the optimized function on a PowerPC G3 at 600 MHz.

Figure 3. PowerPC G3 at 600 MHz

Quick analysis of the results:

  • The ASM function improves performance on all PPC systems.
  • The G5 results show very good memory bandwidth (Figure 1). This is clearly visible in bar 3 (orange), which represents longer strings in a large array.
  • The G3 results do not improve much on longer strings (Figure 3). This is caused by the small second-level (L2) cache.

4.2 Comparing Different Systems

The purpose of the test was to show the benefits of even simple optimization of time-critical routines. Figure 4 shows the compare performance needed to sort an array of 10,000 elements (predominantly memory-bound routines).

Figure 4. Memory-Bound Routines

Figure 5 shows the compare performance needed to sort an array of 1,000 elements (predominantly cache-bound routines). The smaller number of elements makes more effective use of the cache, so most systems improve in this test.

Figure 5. Cache-Bound Routines
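The difference between the two array sizes can be made concrete by estimating the working set. The sketch below assumes the strings are stored contiguously at a fixed width, which the report does not state; working_set_bytes is a hypothetical helper, and the cache sizes of the tested systems are not given in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched when `elements` keys of `key_len` bytes each are
   stored back to back (assumed layout, for illustration only). */
size_t working_set_bytes(size_t elements, size_t key_len)
{
    /* Guard against overflow in the multiplication. */
    assert(key_len == 0 || elements <= (size_t)-1 / key_len);
    return elements * key_len;
}
```

For 120-character keys, working_set_bytes(10000, 120) is 1,200,000 bytes and working_set_bytes(1000, 120) is 120,000 bytes. Under this assumed layout, the smaller array is ten times more likely to stay resident in a given cache, which is consistent with the cache-bound behavior described for Figure 5.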

Quick analysis of the results:

  • Performance on Itanium and SPARC is poor.
  • The PPC CPUs benefit substantially from the ASM code.
  • The G5 can make good use of its memory bandwidth. This is clearly visible in Figure 4.
  • When working on arrays that fit better in the cache, the G4 and G5 achieve very similar results.

Copyright © 2005, MySQL AB
