Real-time Scene Complexity

An empirical study of scene complexity versus performance in real-time computer graphics

Olaf Jansen and Joost van Dongen

July 11th, 2006

Abstract

By performing a large number of tests on different computers, we analyze how the complexity of a game's graphics influences the frame rate. The conclusion is that artists and programmers should keep the following guidelines in mind while creating a game:

-keep the polygon-count low;

-keep the vertex-count low;

-keep the number of separate objects/render-calls low;

-mip-mapping dramatically increases the performance of high-resolution textures;

-as long as mip-mapping is used and all textures fit in memory, the texture resolution hardly influences the performance;

-bilinear filtering is almost for free, but trilinear filtering has a significant cost;

-multi-passing is quite expensive and should be used as little as possible;

-multi-texturing is much less expensive than multi-passing;

-alpha-blending is very expensive if there is a lot of object-overlap;

-environment mapping using a cube-map or spherical map is hardly more expensive than a standard 2D UV-mapped texture;

-vertex and pixel shaders are not expensive in themselves; only the complexity of a specific shader might make it expensive.

This report is part of our Computer Science Master in Game & Media Technology at Utrecht University. We would like to thank dr. R.W. van Oostrum for being our supervisor, and prof. dr. M.H. Overmars and Jeroen van Mastrigt for making a project together with game design students from the Utrecht School of the Arts possible. We would also like to thank Martien Jansen, Jasper Koning, Koen Pater, Merel Rietbergen and Igor Zinken for running our tests on their computers, and the rest of the project team of The Blob, with whom we worked together for four months. These are: Fabian Akker, Gijs Hermans, Jasper Koning, Fahrang Namdar, Ralph Rademakers, Huub van Summeren and David Vink.
Contents

  1. Introduction
  2. Methodology
  3. Test results
  • Polygon Count
  • Vertex to Face Ratio
  • Object Count
  • Object Copies
  • Texture Count
  • Texture Resolution
  • Texture Filtering
  • Texture Compression
  • Textures per Object
  • Number of Passes
  • Texture Stretch (Resolution)
  • Texture Stretch (Mapping)
  • Camera Distance
  • Alpha Blending
  • Environment Map
  • Texture Atlas
  • Render Textures
  • Fixed Function versus Shader
  4. Reflection
  5. Conclusion

References

Appendix A: Test computers

Appendix B: The settings of the test-application

1. Introduction

Performance is an important issue in real-time computer graphics: when frames cannot be rendered fast enough, the application is no longer real-time. Performance is mostly determined by two factors: the hardware (e.g. the graphics processor) in use, and the complexity of the scene that is to be rendered. These two factors are of course interrelated: a more complex scene requires more powerful hardware to achieve real-time performance.

This gives rise to several interesting questions: What factors determine the ‘complexity’ of a scene? Which of these have a greater influence on performance than others, and how do these factors scale up? Before we attempt to answer these questions, let us look at the motivation for finding their answers.

In the past four months, we have implemented a three-dimensional computer game together with a team of (graphical) artists. During the development of this game, we have utilized an existing graphics engine on top of which we have built the game itself. Right from the get-go, it was clear to us that the artists that created the scene and assets used within it required some knowledge of the target complexity of each asset in the game: the last thing you want when developing a real-time application is discovering that the completed scene and assets are much too complex to be rendered in real-time. In order to provide the team with some form of asset complexity guidelines, it is crucial to be able to (roughly) estimate the performance cost incurred by specific factors (like triangle count, amount of textures and texture resolution).

To this end we have conducted an experiment on performance analysis based on our experiences in the game development team. It is important to note that this analysis is not intended to be used as a definitive guide on performance versus asset or scene complexity, but more to be used to provide some indication of how ‘expensive’ (in terms of performance) certain factors are. It is our intention to give programmers some experimental data they can use to formulate some form of asset complexity guidelines.

2. Methodology
Testing the influence on the frame rate of the parameters we are interested in is not an easy task. There are many pitfalls that can render the tests useless. It might happen that other parameters clutter the resulting data, or that the hardware has changed the parameters to increase its performance. Even if the designs for the tests are good, it may take an infeasible amount of work to actually implement them. In this section we will analyze the problems that may arise during testing and explain our methodology. Finally, we will also discuss the weak points that we were not able to solve and that should be kept in mind while analyzing the actual test results.
It is hard to test only the influence that one specific parameter has on the performance of the video hardware, because the parameter must always exist in a scene. An example of this is testing what effect the number of triangles has. It is possible to turn off all the lights, remove all the textures and put no other objects in the scene, but some other parameters cannot be turned off: the triangle may take up any number of pixels on the screen, it may or may not share its vertices with neighboring triangles, and all the triangles may or may not be part of one big object. All these parameters may strongly influence the results of the test. If, for instance, the chosen screen resolution is very low, then drawing pixels takes little time, while setting the mesh up for drawing is not influenced by the resolution. So changing the screen resolution only changes the time that one part of the drawing of polygons takes and will therefore give different results.

To solve this problem we created many tests, testing parameters in different environments. An example of this is that we tested the influence of texture filtering at different texture resolutions and with mip-mapping turned on and off. We also tried to choose the variables so that everything remained the same except for one variable. This did not work out in all cases, because sometimes changing a variable happened to change something else as well, rendering that test useless. In many other cases, however, we found interesting test results.

The method we used to test the influence of a parameter on performance is to create a standard scene and then have the GPU draw it over and over again with different settings. First we let it draw 500 frames with one setting for the variable, then with another, and so forth. By logging the time it took to render those 500 frames, we can see how the performance changes as the parameter changes. One thing that could clutter our results is the loading of the scene. For this reason we log frames 100 to 600 and ignore the first 100 frames. This way the scene has already been loaded and the caches have been filled before our measurement starts.
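
To make this measurement procedure concrete, below is a minimal sketch of such a timing loop in C++. The renderFrame stub and the timing code are illustrative only; they stand in for the actual draw call of the test application described later in this chapter, which also logs its results to a file.

    #include <chrono>
    #include <iostream>

    // Stand-in for the real draw call; in the test application this is a call
    // into the rendering engine, here it is only a stub so the sketch is self-contained.
    static void renderFrame() { /* draw the scene once */ }

    int main()
    {
        const int warmUpFrames = 100; // ignored: lets the scene load and the caches fill
        const int timedFrames  = 500; // frames whose total render time is logged

        // Warm-up frames are rendered but not measured.
        for (int i = 0; i < warmUpFrames; ++i)
            renderFrame();

        // Measure the time needed to render 500 frames with the current settings.
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < timedFrames; ++i)
            renderFrame();
        const auto stop = std::chrono::steady_clock::now();

        const std::chrono::duration<double> seconds = stop - start;
        std::cout << "time for " << timedFrames << " frames: " << seconds.count() << " s\n";
        return 0;
    }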

For easier visualization and analysis, we have created several graphs from the test results. Because we are not trying to benchmark specific hardware but want to get an idea of the behavior of the parameters in general, we tested on different hardware setups and averaged the results. We tested on seven different computers, of which three had an ATI GPU, three an NVIDIA GPU and one an onboard Intel GPU. The exact specifications of these computers can be found in the appendix. With chipsets from different vendors it is tempting to test which is faster. However, this was not our goal and would also be an incorrect thing to do. We deliberately used GPUs from different years, so if the hardware from one company turns out to be faster, this is probably because we used newer or more expensive hardware from that vendor, not because its GPUs are actually better. What is interesting, though, is to see whether these cards behave differently as parameters change. For this reason, we made every graph three times: one graph of the average of the NVIDIA cards, one of the average of the ATI cards and one of the total average. In most cases only the last graph is interesting, but we did get some interesting results from the separate graphs.

Testing how long it takes to render 500 frames is not without risk. While performing a test, the virus scanner may decide it needs to do something in the background, lowering the frame rate, or any other process may take over part of the CPU. To find out what precision we can claim for the tests, we made an extra test that does not look for the influence of any variable. It consists of one scene that was run eight times with exactly the same settings. We did this for three different scenes to gain more certainty. This showed that on the ATI and NVIDIA hardware, the difference in time between runs of the same test was usually 0% to 5%. In some very rare cases, the highest and lowest outcome showed a difference of 10%. The conclusion is that our tests are fairly precise, but not perfectly so, and that small differences in values must therefore be ignored when drawing conclusions.

Run        PC 1      PC 2     PC 3      PC 4      PC 5      PC 6      PC 7
1          6.08402   4.716    2.63101   4.83299   2.46901   7.26898   25.5691
2          6.08302   4.715    2.63401   4.83199   2.47901   7.26698   14.215
3          6.07802   4.715    2.63401   4.83199   2.47301   7.27198   14.831
4          6.07702   4.715    2.63301   4.83599   2.47001   7.27598   21.625
5          6.08202   4.714    2.63201   4.83099   2.47101   7.27298   26.3821
6          6.08902   4.715    2.63101   4.83099   2.47101   7.27198   26.4541
7          6.08902   4.715    2.63301   4.83199   2.47301   7.27198   26.5471
8          6.07502   4.715    2.63201   4.83199   2.47301   7.27298   26.1071
Max - Min  0.014     0.002    0.003     0.005     0.01      0.009     12.3321

Each computer ran the same test eight times. The table shows that PC 7 produced such variable results that they could not be used.

The only card to give a surprising result here was the Intel onboard laptop GPU. This card showed that the time it takes to perform a test might actually double from the lowest to the highest result. As this was not incidental and all of its results showed great variance in outcome, we did not draw any conclusions from the results of this GPU and left it out of the analysis in the next chapter.

To test many different variables in different settings, we needed to create lots of test scenes. It is not feasible to do this by hand, because that would require at least five minutes per test, and that is a very optimistic estimate. We ended up creating 493 different tests, so doing this by hand was simply not an option.

The alternative we have chosen is an application that generates a scene according to the parameters the user chooses at start-up. The resulting scene is a collection of tubes with arbitrary polygon counts and materials. Below is an example of what this might look like. Our application uses the open source 3D engine OGRE [1].

An example of a scene generated by our test-application. This specific curve consists of four objects, all with the same texture.
The total number of parameters that can be set in the test-application is 24. An explanation of each of these settings can be found in the appendix. Meshes are generated by extruding a circle along a Bézier curve. This method was chosen because it resembles an actual shape and is therefore closer to the kind of meshes found in a game than a random polygon soup without any logical topology.
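
As a sketch of this mesh-generation approach, the code below samples a cubic Bézier curve and places a ring of vertices around each sample point. The control points, radius and resolution are arbitrary example values, and the real test application of course also builds an index buffer and hands the geometry to OGRE instead of printing a vertex count.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Vec3 { double x, y, z; };

    static Vec3 add(Vec3 a, Vec3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
    static Vec3 scale(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
    static Vec3 cross(Vec3 a, Vec3 b)
    {
        return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }
    static Vec3 normalize(Vec3 a)
    {
        const double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
        return scale(a, 1.0 / len);
    }

    // Point on a cubic Bezier curve for parameter t in [0, 1].
    static Vec3 bezier(const Vec3 p[4], double t)
    {
        const double u = 1.0 - t;
        Vec3 r = scale(p[0], u * u * u);
        r = add(r, scale(p[1], 3 * u * u * t));
        r = add(r, scale(p[2], 3 * u * t * t));
        r = add(r, scale(p[3], t * t * t));
        return r;
    }

    // First derivative of the curve, used as the extrusion direction.
    static Vec3 bezierTangent(const Vec3 p[4], double t)
    {
        const double u = 1.0 - t;
        Vec3 r = scale(add(p[1], scale(p[0], -1.0)), 3 * u * u);
        r = add(r, scale(add(p[2], scale(p[1], -1.0)), 6 * u * t));
        r = add(r, scale(add(p[3], scale(p[2], -1.0)), 3 * t * t));
        return r;
    }

    int main()
    {
        // Example control points; the tangent must never be parallel to the
        // reference up vector (0, 1, 0) used below.
        const Vec3 ctrl[4] = { { 0, 0, 0 }, { 2, 3, 0 }, { 4, -1, 2 }, { 6, 0, 5 } };
        const int rings = 16, ringVerts = 12; // tube resolution: along / around the curve
        const double radius = 0.5;
        const double pi = 3.14159265358979;

        std::vector<Vec3> vertices;
        for (int i = 0; i <= rings; ++i)
        {
            const double t = double(i) / rings;
            const Vec3 centre  = bezier(ctrl, t);
            const Vec3 forward = normalize(bezierTangent(ctrl, t));
            // Two axes perpendicular to the curve span the circular cross-section.
            const Vec3 side = normalize(cross(forward, Vec3{ 0, 1, 0 }));
            const Vec3 up   = cross(side, forward);
            for (int j = 0; j < ringVerts; ++j)
            {
                const double a = 2.0 * pi * j / ringVerts;
                vertices.push_back(add(centre,
                    add(scale(side, radius * std::cos(a)), scale(up, radius * std::sin(a)))));
            }
        }
        // Triangles would connect ring i to ring i + 1; omitted here for brevity.
        std::printf("generated %zu tube vertices\n", vertices.size());
        return 0;
    }
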
To calculate the averages of the tests on different computers, we also created a small merge application that receives the output of several test runs and outputs the averaged values in a format that is suitable for Microsoft Excel, which we used to draw the graphs. This merge tool was not the only technique we used to be able to run as many tests as we did. Another one was a batch file that automatically writes the specification of the test computer to a file and then runs all 493 different tests. To get the full results from any computer, we did not have to do anything more than download our application, start the batch file and afterwards copy the resulting text files with the results.
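
As an illustration of the merging step, the sketch below averages the results of several computers and writes tab-separated output that pastes directly into a spreadsheet. The one-value-per-line file format and the assumption that every computer ran the same number of tests are simplifications made for this sketch, not the actual file layout of our tools.

    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    // Read one timing value per line from a result file (assumed format).
    static std::vector<double> readResults(const std::string& path)
    {
        std::vector<double> values;
        std::ifstream in(path);
        double value;
        while (in >> value)
            values.push_back(value);
        return values;
    }

    int main(int argc, char** argv)
    {
        // Each command-line argument is the result file of one test computer.
        std::vector<std::vector<double>> perComputer;
        for (int i = 1; i < argc; ++i)
            perComputer.push_back(readResults(argv[i]));
        if (perComputer.empty())
            return 1;

        // Average the i-th test over all computers; every file is assumed to
        // contain results for the same tests in the same order.
        const std::size_t testCount = perComputer.front().size();
        for (std::size_t t = 0; t < testCount; ++t)
        {
            double sum = 0.0;
            for (const std::vector<double>& results : perComputer)
                sum += results[t];
            std::cout << t << '\t' << sum / perComputer.size() << '\n';
        }
        return 0;
    }
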
3. Test results

We have performed a number of different tests. This chapter presents the results from each test, along with an analysis and discussion of those results. Where possible, performance guidelines are derived.

Polygon Count

When considering asset complexity, polygon count is usually one of the first things that come to mind. The polygon count is, as the name implies, the total amount of polygons that are rendered in a single frame. When suffering from performance problems, reducing the polygon count is a ‘classic’ method of increasing performance.

In the polygon count test, the same scene is rendered several times with an increasing polygon count. The increase in polygon count is achieved by simply subdividing the polygons of the objects in the scene. Because polygons are subdivided into smaller parts, the number of pixels that are drawn remains roughly the same. This has two advantages: the first is that we are not implicitly testing the fill rate here and really get results for the number of polygons. The second is that in games the choice is generally to make the same object with more or fewer polygons, not to make it bigger, so this makes our test more relevant for the real-life situation.

As can be seen in the preceding chart, the performance under increasing polygon count remains almost the same up to around 25,600 polygons. Further increasing the polygon count appears to have a roughly linear performance cost (note the logarithmic horizontal and vertical scales). That the time remains almost constant at low polygon counts is likely caused by the render setup cost dominating the cost of actually processing vertices and rendering polygons when there are few polygons.

Since the relation appears linear, it is important to minimize the number of polygons to be rendered. One method that can be used for this is creating a high-resolution normal map from a very detailed mesh and applying this normal map to a less detailed (i.e. low polygon count) version of the mesh at runtime. Other well-known methods are level-of-detail techniques that decrease the polygon count of objects that are far away, and even replacing distant objects or groups of objects by sprites.
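
As a small illustration of the level-of-detail idea, the sketch below picks a pre-built mesh version based on camera distance. The mesh names, triangle counts and distance thresholds are invented for this sketch; in practice an engine would typically handle the switching itself.

    #include <cstdio>
    #include <vector>

    // One pre-built version of a mesh; the coarser versions have fewer triangles.
    struct LodLevel
    {
        const char* meshName;     // illustrative names, e.g. "tree_high"
        int         triangleCount;
        float       minDistance;  // use this level from this camera distance onwards
    };

    // Pick the coarsest level whose minimum distance the camera has passed.
    // The levels must be ordered from near/detailed to far/coarse.
    static const LodLevel& selectLod(const std::vector<LodLevel>& levels, float cameraDistance)
    {
        const LodLevel* chosen = &levels.front();
        for (const LodLevel& level : levels)
            if (cameraDistance >= level.minDistance)
                chosen = &level;
        return *chosen;
    }

    int main()
    {
        const std::vector<LodLevel> tree = {
            { "tree_high",   8000,  0.0f },
            { "tree_medium", 2000, 25.0f },
            { "tree_low",     400, 80.0f },
        };
        const float distances[] = { 10.0f, 40.0f, 120.0f };
        for (float distance : distances)
        {
            const LodLevel& level = selectLod(tree, distance);
            std::printf("distance %.0f -> %s (%d triangles)\n",
                        distance, level.meshName, level.triangleCount);
        }
        return 0;
    }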

Vertex to Face Ratio

Graphical artists are often told to ‘watch the polycount’ as a reminder to keep the number of triangles in check. On the other hand, artists are almost never warned to keep the vertex count low. Contrary to what might be expected, vertex count and triangle count are not by definition directly related: each triangle may have three unique vertices of its own, or each vertex may be shared by many different triangles. The maximum triangle to vertex ratio is 1:3, but in game models it will often be closer to 1:1 and can in theory be decreased to any ratio.

Deliberately changing the vertex count might seem like an oddity, because it is not practically possible to create the same model with just any triangle to vertex ratio. However, the ratio can be changed dramatically by choosing between smooth shading on all vertices and flat shading. GPUs cannot handle multiple normals on a single vertex, so for flat shading each vertex is split into several vertices, one for each different normal. So in fact, the number of vertices might be tripled by the use of flat shading instead of smooth shading, making it interesting to see whether this influences performance.

To the left flat shading, where each vertex has three normals, bringing the actual vertex count in the GPU to 24 for this box. To the right smooth shading, where each vertex has only one normal and the actual vertex count is 8.
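
To make the box example concrete, here is a small sketch that counts the vertices the GPU would actually receive for the two shading styles; the id-based vertex layout is a simplification made purely for this illustration.

    #include <cstdio>
    #include <set>
    #include <vector>

    // Simplified GPU vertex: a corner of the cube plus the normal it carries.
    struct Vertex { int cornerId; int normalId; };

    int main()
    {
        // A cube has 8 corners (0..7) and 6 faces, each touching 4 corners.
        const int faceCorners[6][4] = {
            { 0, 1, 2, 3 }, { 4, 5, 6, 7 }, { 0, 1, 5, 4 },
            { 1, 2, 6, 5 }, { 2, 3, 7, 6 }, { 3, 0, 4, 7 },
        };

        // Flat shading: every face needs its own normal, so each corner is
        // emitted once per adjacent face, giving 6 * 4 = 24 GPU vertices.
        std::vector<Vertex> flat;
        for (int face = 0; face < 6; ++face)
            for (int c = 0; c < 4; ++c)
                flat.push_back({ faceCorners[face][c], face });

        // Smooth shading: one averaged normal per corner, so the index buffer
        // can share corners and only the 8 unique corners remain as vertices.
        std::set<int> smooth;
        for (int face = 0; face < 6; ++face)
            for (int c = 0; c < 4; ++c)
                smooth.insert(faceCorners[face][c]);

        std::printf("flat shading:   %zu GPU vertices\n", flat.size());   // 24
        std::printf("smooth shading: %zu GPU vertices\n", smooth.size()); // 8
        return 0;
    }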

This test allows us to see how the number of vertices influences performance at a fixed triangle count. The test has been repeated three times, each with a different triangle count (20k, 240k and 800k triangles, respectively). The test has also been performed with and without lighting, to see what kind of influence lighting has on the results, which is interesting because Gouraud shading is calculated on the vertices.

These tests clearly show that the vertex count is very important: it can influence performance greatly. Remember that the number of polygons is the same in each case, so most artists would not notice the difference.

This result is as expected: lighting is calculated per vertex, so the test with lighting is more expensive than the test without lighting.
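
For completeness, here is a minimal sketch of the per-vertex (Gouraud-style) diffuse term, to make explicit why the lighting cost scales with the vertex count; the normals and light direction are illustrative values only.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Vec3 { float x, y, z; };

    static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Lambertian diffuse term, evaluated once per vertex as fixed-function
    // Gouraud lighting does; the rasterizer later interpolates the results
    // across each triangle. More vertices therefore mean more evaluations.
    static float diffuseAtVertex(Vec3 unitNormal, Vec3 unitToLight)
    {
        return std::max(0.0f, dot(unitNormal, unitToLight));
    }

    int main()
    {
        const Vec3 toLight = { 0.0f, 1.0f, 0.0f }; // directional light, already normalized
        const std::vector<Vec3> normals = { { 0, 1, 0 }, { 1, 0, 0 }, { 0.707f, 0.707f, 0 } };

        for (const Vec3& n : normals)
            std::printf("per-vertex diffuse = %.3f\n", diffuseAtVertex(n, toLight));
        return 0;
    }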