Evaluating MPEG-4 Video Decoding Complexity

João Valentim, Paulo Nunes, Fernando Pereira

Instituto Superior Técnico (IST) – Instituto de Telecomunicações

Av. Rovisco Pais, 1049-001 Lisboa, Portugal

Phone: +351 21 841 84 60; Fax: +351 21 841 84 72

e-mail: {joao.valentim, paulo.nunes, fernando.pereira}@lx.it.pt

Abstract

When using the MPEG-4 standard, the several video objects composing a scene may vary in size over time and may be encoded at different temporal rates using different macroblock (MB) coding types. To limit the decoding complexity of the corresponding bitstreams, it is necessary to bound the variability of the number and type of MB/s, as well as the bitrate and the picture memory required to store the decoded data.

This paper evaluates the decoding complexity of the several MB types used in MPEG-4 video coding, by using statistics of the MB decoding times obtained with an optimized MPEG-4 video decoder. Based on these statistics, this paper proposes a set of relative complexity weights for the relevant MB complexity classes, which can be used to improve on the MPEG-4 Video Complexity Verifier (VCV) model [1].

1. Introduction

MPEG-4 is the first object-based audiovisual coding standard. To control the minimum decoding resources required at the decoder, MPEG-4 Visual defines the so-called Video Complexity Verifier (VCV) [1]. Knowing the relative decoding complexity of the various MB coding types used in MPEG-4 video coding [1] is fundamental for the specification and implementation of a more effective VCV model, since the current MPEG-4 VCV model only distinguishes between boundary and non-boundary MBs. This means that opaque and transparent MBs are given exactly the same complexity weight. Moreover, the MBs are not distinguished according to the coding tools (shape and texture) used. A more effective complexity model, better matched to the real complexity of the encoded data, should take into account the actual decoding complexity of the different MB types that can be used, which requires credible complexity evaluations.

This paper evaluates the decoding complexity of the different MB types based on their decoding times obtained with an optimized version of the MPEG-4 reference software [2].

2. Decoding Complexity Modeling

The decoding complexity of the encoded data can, as a first approach, be related to the data rate that the decoder has to process, i.e. to the number of MBs per second that the decoder has to decode. However, the computational power required to decode each MB may vary widely due to the many different MB types (e.g. shape information: opaque, transparent and boundary MBs) and coding modes (e.g. texture coding modes: Intra, Inter, Inter4V, etc.). The complexity measure to choose depends on how closely the real decoding complexity must be approximated; however, the closer the model intends to get to the real decoding complexity, the more difficult it is to keep it generic, since the decoding complexity also depends on implementation issues.

A careful analysis of the problem shows that there are several ways to measure the decoding complexity of the encoded data, associated with the rate of any of the following parameters [3]:

  • Number of MBs.
  • Number of MBs per shape type (opaque, boundary, or transparent).
  • Number of MBs per texture and shape coding type (Inter+NoUpdate, Inter4V+InterCAE, etc.).
  • Number of arithmetic instructions and memory Read/Write operations.

The decoding complexity model proposed in this paper is based on the number of MBs per coding type (texture and shape), which was found to be the one best representing the major factors determining the actual decoding complexity of the encoded data, while maintaining a certain level of independence regarding the decoder implementation. This means that the MB complexity types to evaluate will be characterized by a combination of shape and texture coding tools.

3. MPEG-4 Macroblock Classification

In the MPEG-4 Visual standard [1], a video object is defined by its texture and shape data. Although video objects can have arbitrary shapes, texture and shape coding relies on a MB structure (16×16 pixels), where texture coding as well as motion estimation and compensation tools are similar to those used in the previously available video coding standards.

Texture data can be coded using six different coding modes [1]:

  • Intra – The MB is coded independently from past or future MBs.
  • Inter – The MB is differentially coded, using motion compensation with one motion vector.
  • Intra+Q – Intra MB with a modified quantization step.
  • Inter+Q – Inter MB with a modified quantization step.
  • Inter4V – Inter MB using motion compensation with four motion vectors (one for each 8×8 luminance block).
  • Skipped – MB with no texture update information to be sent.

Shape data can be coded using seven different coding modes [1]:

  • NoUpdate & MVDS == 0 – The shape information for the current MB is equal to the shape of the corresponding MB in the past prediction Video Object Plane (VOP).
  • NoUpdate & MVDS != 0 – The shape information for the current MB is obtained from the past prediction VOP after motion compensation.
  • Opaque – All shape pixels in the MB belong to the object support.
  • Transparent – None of the shape pixels in the MB belongs to the object support.
  • IntraCAE – The shape is coded using Context-based Arithmetic Encoding (CAE) in Intra mode.
  • InterCAE & MVDS == 0 – The shape is coded using CAE in Inter mode, without motion compensation.
  • InterCAE & MVDS != 0 – The shape is coded using CAE in Inter mode, with motion compensation.

In order to reduce the number of MB coding complexity types, the MB types with similar complexities were grouped in the same class as shown in Table 1. This is the case of Intra and Intra+Q as well as Inter and Inter+Q texture MB coding types, where the quantization step change does not cause a significant complexity difference; the same is true for the shape MB coding types with and without MVDS, since both types need a prediction, although from different past spatial positions.

Table 1 – Relevant texture and shape MB complexity types

Texture information / Shape information
Intra (Intra & Intra+Q) / NoUpdate (MVDS == 0 & MVDS != 0)
Inter (Inter & Inter+Q) / Opaque
Inter4V / Transparent
Skipped / IntraCAE
– / InterCAE (MVDS == 0 & MVDS != 0)
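
The grouping of Table 1 can be expressed as a simple lookup. The following Python sketch is only illustrative: the dictionary and function names are hypothetical and do not come from the MPEG-4 reference software, while the mode labels are those listed above. It assumes that a missing shape mode denotes an MB of a rectangular object and that transparent MBs carry no texture.

```python
# Texture coding modes mapped to their complexity type (Table 1).
TEXTURE_GROUP = {
    "Intra": "Intra", "Intra+Q": "Intra",
    "Inter": "Inter", "Inter+Q": "Inter",
    "Inter4V": "Inter4V",
    "Skipped": "Skipped",
}

# Shape coding modes mapped to their complexity type (Table 1).
SHAPE_GROUP = {
    "NoUpdate & MVDS == 0": "NoUpdate",
    "NoUpdate & MVDS != 0": "NoUpdate",
    "Opaque": "Opaque",
    "Transparent": "Transparent",
    "IntraCAE": "IntraCAE",
    "InterCAE & MVDS == 0": "InterCAE",
    "InterCAE & MVDS != 0": "InterCAE",
}


def complexity_type(texture_mode, shape_mode=None):
    """Return the MB complexity type label for a texture/shape mode pair.

    shape_mode=None is assumed to mean an MB of a rectangular object.
    """
    if shape_mode is None:
        return f"{TEXTURE_GROUP[texture_mode]} (only rect. VO)"
    shape = SHAPE_GROUP[shape_mode]
    if shape == "Transparent":
        # Transparent MBs have no texture, so the texture mode is ignored.
        return "Transparent"
    return f"{TEXTURE_GROUP[texture_mode]}+{shape}"
```

For instance, an Inter+Q MB whose shape is coded as InterCAE with MVDS != 0 falls into the Inter+InterCAE complexity type.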

4. Macroblock Complexity Evaluation

To evaluate the decoding complexity of the various MB complexity types (combination of shape and texture coding tools), it is necessary to establish a complexity criterion, i.e. a complexity measure. In this paper, the proposed measure is the decoding time of each MB type obtained with an optimized version of the MoMuSys decoder [2] for several representative MPEG-4 test sequences and different profile@level combinations.

Figures 1 to 8 show the MB decoding times for the various MB complexity types obtained with the test sequences Akiyo, Children, Coastguard, News, Stefan and Weather. The decoding times for the MB types that do not have to send any encoded texture data (no DCT coefficients) are presented in histograms, while the decoding times for the MB types with DCT coefficients are presented as a function of the number of DCT coefficients in the MB. In these charts, each dot represents the decoding time of one MB; each chart contains thousands of dots.

As can be seen from these figures, transparent and skipped MBs take less time to decode than the other types of MBs because they do not have any texture or shape information to be decoded. There are, however, two types of transparent MBs: the MBs that are far away from the object border, and the MBs that are next to the object border, to which the repetitive padding process has to be applied [1]. This padding process is responsible for the increase in the decoding time, leading to two distinct cases of transparent MBs as shown in Figure 1. Skipped MBs in rectangular objects have approximately the same decoding time as the fastest transparent MBs since their decoding is similar: neither has texture or shape information to decode (Figure 2). MBs with Skipped texture but from arbitrarily shaped objects take more time to decode due to the shape header decoding. To decode the Skipped MBs with NoUpdate shape (Skipped+NoUpdate), it is necessary to access the past VOP in addition to decoding the shape header. For this MB type, there are two distinct decoding cases: MBs that are padded and MBs without padding (Figure 3). The results show that the decoding time increases when the shape is coded with CAE. In this case, InterCAE MBs typically take more time to decode than IntraCAE MBs (Figure 5 and Figure 4, respectively).

Figure 1 – Transparent MBs decoding time

Figure 2 – Skipped MBs decoding time – rectangular objects

Figure 3 – Skipped+NoUpdate MBs decoding time

Figure 4 – Skipped+IntraCAE MBs decoding time

Figure 5 – Skipped+InterCAE MBs decoding time

The decoding time of the MBs whose texture is coded with DCT depends on the number of encoded DCT coefficients, increasing linearly with the number of coefficients (Figures 6 to 8). For the same type of shape coding and the same number of DCT coefficients, the maximum decoding time increases with the texture coding type in the following order: Intra, Inter, Inter4V. However, the differences are very small and it is difficult to establish a clear relation over the whole range of DCT coefficients. For Intra (texture) MBs, there are two distinct cases depending on whether AC prediction is used for the DCT coefficients; the MBs that use AC prediction take longer to decode.

If the same type of texture coding is considered, the decoding time increases with the shape coding type in the following order: Opaque, NoUpdate, IntraCAE and InterCAE.
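
The linear dependence of the decoding time on the number of DCT coefficients noted above could be quantified with a least-squares line fit per MB complexity type. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the sample points are invented for the example; they are not the measurements plotted in Figures 6 to 8.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: (number of DCT coefficients, decoding time in ms).
# These values are invented for illustration, not measured data.
samples = [(0, 0.40), (50, 0.55), (100, 0.70), (200, 1.00)]

slope, intercept = fit_line([n for n, _ in samples], [t for _, t in samples])
# slope: estimated extra decoding time per DCT coefficient (ms)
# intercept: estimated decoding time of an MB with no coefficients (ms)
```

With real measurements, the slope would estimate the cost per DCT coefficient and the intercept the fixed cost of the MB type, which is one way to compare types whose point clouds overlap.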

Figure 6 – Opaque MBs decoding time

Figure 7 – NoUpdate MBs decoding time

Figure 8 – IntraCAE and InterCAE MBs decoding time

5. Relative Macroblock Complexity Weights

The MPEG-4 VCV model defines, for every profile@level combination, a set of rules and limits to ensure that, when they are respected at the encoder, the required decoding computational capacity is always available at the decoder (which also respects the limits set) [1]. The computational capacity is measured in MB per second, and the model defines two buffers, one storing the number of boundary MBs (B-VCV) and another storing all the MBs without distinction (VCV). For each of these buffers, the buffer size and the decoding rate are specified for each profile@level combination. The buffer size and the decoding rate are defined in terms of MBs and MB/s, without any differentiation in terms of MB types.

Since the current MPEG-4 VCV model does not distinguish the various MB coding types, the decoder must be able to decode any set of MBs that does not overflow the VCV buffers for the given profile@level, independently of the MB coding type; this implies that the decoder must be prepared to deal with the worst-case scenario, i.e. the case where all MBs are of the most complex coding type.

In the previous section, it was shown that the decoding complexity, measured in terms of the decoding time, varies significantly according to the MB coding type and not only according to the boundary and non-boundary distinction assumed by the MPEG-4 VCV [1]. This section proposes, based on those measurements, MB complexity weights which should model more effectively the decoding complexity of an MPEG-4 coded video object. Taking into account that the MPEG-4 VCV is implicitly designed for the most complex MB type, the complexity weights must be defined relative to the most complex MB type in the context of each profile, i.e. the maximum complexity weight is set to 1 for this MB type and all the other weights are relative to this one and thus less than 1. This solution allows the implementation of a “trading system”, where it is possible, for example, to trade one of the most complex MBs for two MBs with half the relative complexity, while still keeping the bitstream decodable by a compliant decoder, i.e. without requiring higher decoding resources.

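The relative complexity weight for each MB complexity type is obtained as the ratio between the maximum decoding time for the considered type (a conservative choice, since most MBs of that type will be less complex) and the highest maximum decoding time among all the MB types relevant for the profile in question, i.e. the Inter4V+InterCAE type for profiles with arbitrarily shaped objects and the Inter4V type for profiles with rectangular objects only. Writing $t_{\max}(T)$ for the maximum decoding time of MB complexity type $T$ and $T_{\mathrm{ref}}$ for the most complex type of the profile (notation introduced here for convenience), the weight is:

$$w(T) = \frac{t_{\max}(T)}{t_{\max}(T_{\mathrm{ref}})}$$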

Table 2 shows the maximum decoding times and the decoding time ratio for the various MB complexity types.

There are some MB complexity types whose decoding times, and particularly the maximum decoding times, are very similar. To simplify future alternative VCV models, these MB types were grouped in just one complexity class as shown in Table 3. The relative complexity weight attributed to each class is the weight of the most complex MB type included in that class (again conservative).

Table 2 – Maximum decoding times and decoding time ratios for the various MB complexity types

MB Complexity Type / Maximum time (ms) / Time ratio
Transparent / 0.21 / 0.12
Skipped+Opaque / 0.22 / 0.12
Intra+Opaque / 1.06 / 0.58
Inter+Opaque / 1.04 / 0.57
Inter4V+Opaque / 1.28 / 0.70
Skipped+NoUpdate / 0.38 / 0.21
Intra+NoUpdate / 0.97 / 0.53
Inter+NoUpdate / 1.17 / 0.64
Inter4V+NoUpdate / 1.38 / 0.76
Skipped+IntraCAE / 0.58 / 0.32
Intra+IntraCAE / 1.46 / 0.80
Inter+IntraCAE / 1.60 / 0.88
Inter4V+IntraCAE / 1.77 / 0.97
Skipped+InterCAE / 0.73 / 0.40
Inter+InterCAE / 1.73 / 0.95
Inter4V+InterCAE / 1.82 / 1.00
Skipped (only rect. VO) / 0.16 / 0.13
Intra (only rect. VO) / 1.08 / 0.89
Inter (only rect. VO) / 1.06 / 0.88
Inter4V (only rect. VO) / 1.21 / 1.00
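
The time ratios of Table 2 follow directly from the maximum decoding times. As a check of the arithmetic, the Python sketch below recomputes them; the maximum times are copied from Table 2, while the variable and function names are hypothetical. Each set of MB types is normalized by its own largest maximum time (1.82 ms for arbitrarily shaped objects, 1.21 ms for rectangular objects).

```python
# Maximum decoding times in ms, copied from Table 2 (arbitrarily shaped objects).
MAX_TIME_MS = {
    "Transparent": 0.21, "Skipped+Opaque": 0.22, "Intra+Opaque": 1.06,
    "Inter+Opaque": 1.04, "Inter4V+Opaque": 1.28, "Skipped+NoUpdate": 0.38,
    "Intra+NoUpdate": 0.97, "Inter+NoUpdate": 1.17, "Inter4V+NoUpdate": 1.38,
    "Skipped+IntraCAE": 0.58, "Intra+IntraCAE": 1.46, "Inter+IntraCAE": 1.60,
    "Inter4V+IntraCAE": 1.77, "Skipped+InterCAE": 0.73,
    "Inter+InterCAE": 1.73, "Inter4V+InterCAE": 1.82,
}

# Maximum decoding times in ms, copied from Table 2 (rectangular objects only).
RECT_MAX_TIME_MS = {
    "Skipped": 0.16, "Intra": 1.08, "Inter": 1.06, "Inter4V": 1.21,
}


def time_ratios(max_times):
    """Divide each maximum time by the largest one and round to two decimals."""
    reference = max(max_times.values())
    return {mb_type: round(t / reference, 2) for mb_type, t in max_times.items()}


shaped_ratios = time_ratios(MAX_TIME_MS)
rect_ratios = time_ratios(RECT_MAX_TIME_MS)
```

Rounded to two decimals, the computed values agree with the Time ratio column of Table 2.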

Table 3 – MB decoding complexity classes and corresponding relative complexity weights

MB Class / MB type / Relative weight
C1 / Inter4V+InterCAE, Inter+InterCAE, Inter4V+IntraCAE / 1.00
C2 / Inter+IntraCAE, Intra+IntraCAE / 0.88
C3 / Inter4V+NoUpdate, Inter+NoUpdate, Intra+NoUpdate / 0.77
C4 / Inter4V+Opaque, Inter+Opaque, Intra+Opaque / 0.70
C5 / Skipped+InterCAE / 0.40
C6 / Skipped+IntraCAE / 0.32
C7 / Skipped+NoUpdate / 0.21
C8 / Skipped+Opaque / 0.12
C9 / Transparent / 0.12
C10 / Inter4V (only rect. VO) / 1.00
C11 / Inter (only rect. VO), Intra (only rect. VO) / 0.89
C12 / Skipped (only rect. VO) / 0.13

The weights presented above have been defined in a rather conservative way, by using the most complex case within each MB complexity class, in order to stay adequate even if there is some variation due to different implementation platforms.
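
The “trading system” enabled by these weights amounts to counting each MB by its class weight instead of counting every MB as one. The Python sketch below illustrates only this weighting step, using the weights of Table 3; it is not the VCV buffer model of the standard, and the function names, the example MB counts and the budget value are hypothetical.

```python
# Relative weights per MB complexity class, copied from Table 3.
CLASS_WEIGHT = {
    "C1": 1.00, "C2": 0.88, "C3": 0.77, "C4": 0.70,
    "C5": 0.40, "C6": 0.32, "C7": 0.21, "C8": 0.12, "C9": 0.12,
    # Classes for profiles with rectangular objects only.
    "C10": 1.00, "C11": 0.89, "C12": 0.13,
}


def weighted_mb_count(mb_counts):
    """Complexity of a set of MBs, expressed in equivalent worst-case MBs."""
    return sum(CLASS_WEIGHT[cls] * count for cls, count in mb_counts.items())


def fits_budget(mb_counts, budget):
    """True if the weighted MB count does not exceed a worst-case MB budget."""
    return weighted_mb_count(mb_counts) <= budget


# Hypothetical VOP: 40 opaque coded MBs, 30 Skipped+NoUpdate MBs, 30 transparent MBs.
vop = {"C4": 40, "C7": 30, "C9": 30}
unweighted = sum(vop.values())   # every MB counted as one
cost = weighted_mb_count(vop)    # MBs counted by relative complexity
```

In this example the 100 MBs count as roughly 37.9 worst-case MBs, so a decoder dimensioned for 50 worst-case MBs could accept this VOP, whereas an unweighted count of 100 would reject it.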

6. Final Remarks

This paper studies the relative decoding complexity of the various MB types used in MPEG-4 video coding [1] and proposes a set of MB complexity classes and associated decoding complexity weights, which can better reflect the actual decoding complexity of the MB types in question.

The obtained decoding complexity weights are very important since they may allow a much better use of the available decoding resources by preventing the overestimation of the decoding complexity of certain MB types, thus making it possible to encode scenes (for the same decoding resources) which would otherwise be considered too demanding. The efficient use of decoding resources is very important, notably for application environments where resources are scarce and expensive, such as mobile applications.

7. References

[1] ISO/IEC 14496-2: 1999, “Information Technology – Coding of Audio-visual Objects – Part 2: Visual”.

[2] ISO/IEC 14496-5: 1999, “Information Technology – Coding of Audio-visual Objects – Part 5: Reference software”.

[3] P. Nunes, F. Pereira, “MPEG-4 Compliant Video Encoding: Analysis and Rate Control Strategies”, Proceedings of the ASILOMAR 2000 Conference, Pacific Grove – CA, USA, October 2000.