C Compare vs. PPC ASM Compare Function
Genesi
MySQL® Partner
Oct 2005
Table of Contents
1 Objective
2 Executive Summary
3 Setup
4 Results
4.1 PowerPC Tests
4.2 Comparing Different Systems
List of Figures
Figure 1. PowerPC G5 at 1.80 GHz
Figure 2. PowerPC at 1.4 GHz
Figure 3. PowerPC G3 at 600 MHz
Figure 4. Memory-Bound Routines
Figure 5. Cache-Bound Routines
1 Objective
Benchmark tests were run to determine the relative effectiveness of the following:
- C Function.
- PPC ASM Function.
No single benchmark can assess performance from all angles, so more benchmark results, generated by different test tools, will be delivered in the near future.
2 Executive Summary
For all tests, GCC was used.
The tests indicated that the Scalar PPC function was able to improve performance on all tested PPC platforms.
The G3 results do not improve much on longer strings. This is caused by the small second-level cache.
The G5 results indicate very good memory bandwidth.
3 Setup
The test environment was as follows:
- The tests were performed on sorting arrays of 10,000 elements and arrays of 1,000 elements.
- The tests were performed on strings of one to 255 characters in length.
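A setup of this kind can be sketched in C as follows. This is only a minimal illustration, not the harness used for the tests: the names sort_run and cmp_record, the fixed 255-byte record layout, the random lowercase fill, and the use of qsort and clock are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_LEN 255            /* longest string length named in the setup */

static size_t g_len;           /* number of characters compared in this run */
static unsigned long g_calls;  /* compare calls made by qsort in this run */

/* qsort callback: each element is a fixed-size record of MAX_LEN bytes,
   of which only the first g_len bytes are compared. */
static int cmp_record(const void *a, const void *b)
{
    g_calls++;
    return memcmp(a, b, g_len);
}

/* Hypothetical helper: fill n records with pseudo-random lowercase letters,
   sort them, store the elapsed CPU time in *seconds, and return the number
   of compare calls. Returns 0 on bad arguments or allocation failure. */
unsigned long sort_run(size_t n, size_t len, unsigned seed, double *seconds)
{
    unsigned char *recs;
    clock_t t0;

    if (n == 0 || len == 0 || len > MAX_LEN || seconds == NULL)
        return 0;
    recs = malloc(n * MAX_LEN);
    if (recs == NULL)
        return 0;

    srand(seed);
    for (size_t i = 0; i < n * MAX_LEN; i++)
        recs[i] = (unsigned char)('a' + rand() % 26);

    g_len = len;
    g_calls = 0;
    t0 = clock();
    qsort(recs, n, MAX_LEN, cmp_record);
    *seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;

    free(recs);
    return g_calls;
}
```

Calling sort_run(10000, 120, 1, &secs) and sort_run(1000, 120, 1, &secs) would correspond to the two array sizes listed above. Dividing the returned call count by the elapsed time gives a compares-per-second figure.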
4 Results
4.1 PowerPC Tests
The compare performance scales roughly linearly with the length of the string. For this reason, tests were performed on three typical length values, as follows:
- 8 characters.
- 32 characters.
- 120 characters.
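The roughly linear scaling follows from the shape of a plain compare loop. The sketch below is a hypothetical baseline written for illustration (the name compare_bytes and its signature are assumptions, and it is not the C function that was benchmarked). When two strings are equal or differ only near the end, the loop visits every character, so the cost grows with the length.

```c
#include <stddef.h>

/* Hypothetical baseline: compare two byte strings of the same length `len`.
   Returns a negative value, zero, or a positive value, like memcmp. */
int compare_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];  /* first difference decides */
    }
    return 0;                              /* all len bytes were equal */
}
```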
Figure 1 illustrates the throughput of normal vs. optimized function on PowerPC G5 at 1.80 GHz.
Figure 1. PowerPC G5 at 1.80 GHz
Figure 2 shows the throughput of normal vs. optimized function on PowerPC at 1.4 GHz.
Figure 2. PowerPC at 1.4 GHz
Figure 3 illustrates the throughput of normal vs. optimized function on PowerPC G3 at 600 MHz.
Figure 3. PowerPC G3 at 600 MHz
Quick analysis of the results:
- The ASM function improves performance on all PPC systems.
- The G5 results show very good memory bandwidth (Figure 1). This is clearly visible in bar 3 (orange), which covers longer strings in a huge array.
- The G3 results do not improve much on longer strings (Figure 3). This is caused by the small second-level cache.
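The assembly routine itself is not reproduced in this report. As an illustration of the kind of scalar optimization that can speed up a compare, the sketch below skips over equal 4-byte chunks and only falls back to a byte loop at the first difference. It is written in portable C, the name compare_words is hypothetical, and the technique is an assumption about what such a routine might do, not a description of the tested PPC code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time compare of two byte strings of length `len`.
   Returns a negative value, zero, or a positive value, like memcmp. */
int compare_words(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;

    /* Skip equal 4-byte chunks. memcpy keeps the loads alignment-safe. */
    while (len - i >= sizeof(uint32_t)) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            break;
        i += sizeof(uint32_t);
    }

    /* Resolve the differing chunk, or the short tail, byte by byte so the
       result does not depend on the byte order of the machine. */
    for (; i < len; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return 0;
}
```

The fast path issues one comparison per four characters instead of one per character, which is where a gain on longer strings would come from.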
4.2 Comparing Different Systems
The purpose of the test was to show the benefits of even simple optimization of time-critical routines. Figure 4 shows the compare performance needed to sort an array of 10,000 elements (predominately memory-bound routines).
Figure 4. Memory-Bound Routines
Figure 5 shows the compare performance needed to sort an array of 1,000 elements (predominately cache-bound routines). The smaller number of elements makes more effective use of the cache. Thus, most systems improve in this test.
Figure 5. Cache-Bound Routines
Quick analysis of the results:
- Performance on Itanium and SPARC is poor.
- The PPC CPUs benefit a lot from the ASM code.
- The G5 can make good use of its memory bandwidth. This is clearly visible in Figure 4.
- When working on arrays with a better fit in the cache, G4 and G5 achieve very similar results.
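The difference between the memory-bound and cache-bound cases can be estimated with simple arithmetic. The helpers below are hypothetical and assume that the strings are stored back to back with no per-element overhead. The cache sizes of the tested systems are not stated in this report, so any value passed as cache_bytes is the reader's own assumption.

```c
#include <stddef.h>

/* Bytes of string data touched when sorting n strings of len characters,
   assuming contiguous storage with no per-element overhead. */
size_t working_set_bytes(size_t n, size_t len)
{
    return n * len;
}

/* Returns 1 if that working set fits in a cache of cache_bytes, else 0. */
int fits_in_cache(size_t n, size_t len, size_t cache_bytes)
{
    return working_set_bytes(n, len) <= cache_bytes;
}
```

For 120-character strings, 10,000 elements amount to 1,200,000 bytes of string data, while 1,000 elements amount to 120,000 bytes. The smaller array is therefore far more likely to stay resident in a second-level cache.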
Copyright © 2005, MySQL AB