ECMS 2009

9th International Workshop on Electronics, Control, Modelling, Measurement and Signals 2009

8-9-10 July, 2009, University of Mondragon, Spain

------

Perceived QoS Analysis of MPEG4-AAC Audio Delivery in Railway Communications

J. Casasempere 1, P. Sanchez 2, I. Unanue 1, J. Del Ser 1

1 TECNALIA-ROBOTIKER, 48170 Zamudio, Spain

2 IKUSI – Angel Iglesias S.A., 2009 San Sebastian, Spain

{jcasasempere,iunanue,jdelser}@robotiker.es,

Abstract: This work considers the transmission of audio streams over railway broadband networks using the AAC coding standard family. We propose a selective audio compression mechanism that adapts to the wireless network conditions for heterogeneous media delivery, which allows the received audio content to be optimized and the user's quality of experience to be enhanced in such communication networks. An audio evaluation platform combining NS-2 and real audio coding has been designed in order to evaluate the performance of different codecs (namely, AAC-LC, AAC-HEv1, AAC-HEv2, MP3 and OGG) over 802.16e railway networks. Extensive simulations have been carried out to compare the quality of service (QoS) of our scheme with that of other audio coding techniques in terms of jitter, delay, frame loss rate and quality (using the PEAQ objective QoS evaluation algorithm). The obtained results show how the AAC codec family outperforms other existing standards (e.g. MP3 or OGG) when used in railway communications subject to handover events.

INTRODUCTION

Nowadays, the interest in multimedia content broadcasting over wireless broadband networks is rising sharply within both industry and academia. In such scenarios, high coding efficiency and low latency make it possible to optimize bandwidth usage and storage occupancy, as already foreseen by a wide range of applications based on low-rate digital audio transmission [1]. Focusing on this last example, the search for low-bitrate digital audio coding schemes has given rise to a number of proposals that provide high-efficiency compression through advanced signal processing techniques. The intense efforts of the ISO standardization committee towards this goal have given birth to the relatively recent AAC lossy compression schemes [2].

Moreover, unpredictable wireless environments such as those found in dynamic 802.16e (Mobile WiMAX) networks require the design and implementation of adaptive techniques that match the parameters defining the content delivery to the conditions of the radio channel. As a matter of fact, railway transport networks are among those scenarios where 802.16e-based devices outperform other wireless technologies (e.g., 802.11 and satellite systems) in terms of bandwidth or coverage area. The need for further investigation on this topic has been acknowledged by the Spanish Government, which has provided funding support to several areas of this research through the approval of the TelMAX Project [3] under the CENIT Program (Centro para el Desarrollo Tecnologico e Industrial).

Our manuscript sheds some light on digital audio delivery over 802.16e railway networks by analyzing in depth the perceived Quality of Service (QoS) of the AAC codec family in terms of objective audio quality measurements (e.g. the ODG indicator per audio sample based on the PEAQ algorithm [4,5]). A further contribution focuses on the design and simulation of an AAC-based audio coding framework that adapts the characteristics of the compressed audio content as a function of the network status, packet loss rate and/or delay, hence attaining an enhanced audio quality for several representative audio samples. Our paper concludes by studying the degradation of the audio quality when transmitting AAC audio content in handover scenarios, where the effect of the network reconfiguration on the defined audio quality metrics is analyzed.

The remainder of this paper is organized as follows: Section II presents the audio simulation and evaluation platform used for our study, whereas Section III discusses the simulation results. Finally, concluding remarks are drawn in Section IV.

PROPOSED SYSTEM

System Architecture

The system architecture is depicted in Figure 1, where ground to ground (G2G), ground to vehicle (G2V) and vehicle to vehicle (V2V) communications are under consideration. Network users are placed in train cabins, platforms and headquarters, all equipped with multimedia devices capable of both capturing and receiving audio streams.

Fig. 1: System architecture.

The network model utilized in our analysis consists of up to 30 802.16e mobile nodes (MN) in a square area of 3000x3000 meters. The MNs are assigned to a Base Station (BS) connected to the railway headquarters, and the audio transmission takes place between them. The MNs are able to establish audio communications with command centres via the associated BS. On the wired side of the network, BS and command centres are interconnected through a 100 Mbps link with 1 ms propagation delay. The mobile nodes follow a random movement pattern with variable speed, hence mimicking a random train trajectory.

Since our work considers the parallel transmission of audio and video contents, the platform assumes that one mobile node transmits a compressed audio stream, while other interfering nodes send UDP CBR (Constant Bit Rate) traffic at a rate of 1 Mbps. The MTU (Maximum Transmission Unit) was set to 1472 bytes. The audio stream starts at a pre-configured time after the simulation begins, whereas the other parallel UDP CBR transmissions start at random instants during the audio transmission.

Audio transmission and network simulations have been implemented using the Evalvid tool [6] integrated within the NS-2 simulation platform. The simulator was modified in order to support the IEEE 802.16e standard. The communication system is implemented based on the IEEE 802.16 standard (802.16-2004) and the mobility extension 802.16e-2005 [7] to provide wireless broadband communication between railway mobile and fixed nodes. The utilized 802.16e modulation scheme is 16-QAM OFDM with overall rate 3/4 in the 3.5 GHz band with a channel bandwidth of 9.6 MHz. Time Division Duplexing (TDD) with Point to Multipoint (PMP) operation is used. Fragmentation and reassembly of frames are enabled. IEEE 802.16e extensions are included in order to support scanning and handovers.

Audio Quality Evaluation

In the last decade, different approaches have been proposed to evaluate and quantify the quality of a given compressed audio sample. Empirical listening experiments are still recognized as the most reliable method of quality assessment, but unfortunately they entail obvious disadvantages, mainly related to the demanding test conditions (i.e. expert listeners, optimal conditions and strict experimental procedures). In this context, the two main methods for subjective quality evaluation are defined in recommendations ITU-R BS.1116 [8] and ITU-R BS.1534 [9].

An alternative method to measure the quality of an audio communication is the use of an objective quality evaluation metric such as the ODG (Objective Difference Grade) indicator. The Objective Difference Grade is based on the PEAQ algorithm [10], which compares a processed signal with the corresponding original signal. Both signals are transformed into a time-frequency representation by a psychoacoustic model, and a task-specific model of auditory cognition reduces these data to a number of Model Output Variables (MOVs). The objective of such MOVs is to predict the subjective quality rating that would be assigned to the processed signal in an ITU-R BS.1116 based listening test. Finally, these values are mapped to a five-grade impairment scale where 0.0 stands for imperceptible impairments, -1.0 for perceptible but not annoying impairments, -2.0 for slightly annoying impairments, -3.0 for annoying impairments and -4.0 for very annoying impairments.
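As a reading aid for the ODG values reported later, the five-grade scale above can be sketched as a small lookup. Rounding to the nearest scale anchor and clipping out-of-range values are assumptions made here for illustration only; the function name `describe_odg` is hypothetical and is not part of the PEAQ algorithm.

```python
# Anchor points of the five-grade impairment scale described in the text.
ODG_SCALE = [
    (0.0, "imperceptible"),
    (-1.0, "perceptible but not annoying"),
    (-2.0, "slightly annoying"),
    (-3.0, "annoying"),
    (-4.0, "very annoying"),
]


def describe_odg(odg: float) -> str:
    """Return the label of the scale anchor closest to the given ODG value."""
    # Assumption: values outside [-4, 0] are clipped to the nominal range.
    clamped = max(-4.0, min(0.0, odg))
    _, label = min(ODG_SCALE, key=lambda anchor: abs(anchor[0] - clamped))
    return label


if __name__ == "__main__":
    for value in (-0.92, -1.68, -2.6):
        print(f"ODG {value:+.2f}: {describe_odg(value)}")
```

Under this nearest-anchor reading, an ODG of -0.92 would be described as "perceptible but not annoying" and an ODG of -2.6 as "annoying".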

In order to analyze the observed communication quality in a railway environment, objective audio measurements have been performed, which yield ODG values for different received audio transmissions between the command centre and the MNs. The obtained results make it possible to assess the performance of current audio compression algorithms and to record the set of compression parameters that renders the optimum performance for a given type of audio sample.

Audio Content

Three different audio samples have been selected to represent typical audio and voice communications in railway transport systems. The first sample file represents classical music (without voice, i.e. mimicking piped music), whereas the second sample corresponds to the sound of an ambulance siren. Finally, the third processed sample is a fragment of human speech (including silence periods). For all the utilized samples, the duration (3 seconds), sample rate (48 ksamples per second), resolution (16 bits per sample) and number of channels (2) give rise to a stored file size of 562.5 kBytes per audio sample.
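The stated file size can be checked with a short calculation. The sketch below assumes uncompressed PCM storage without any container header and reads "kBytes" as 1024 bytes, which is the interpretation that reproduces the 562.5 figure; the function name is illustrative.

```python
def pcm_size_bytes(duration_s: int, sample_rate_hz: int,
                   bits_per_sample: int, channels: int) -> int:
    """Size of raw PCM audio in bytes (no file header is counted)."""
    bytes_per_sample = bits_per_sample // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    size = pcm_size_bytes(duration_s=3, sample_rate_hz=48_000,
                          bits_per_sample=16, channels=2)
    print(f"{size} bytes = {size / 1024} kBytes")
    # Raw bitrate of the uncompressed stream, for comparison with codec bitrates.
    print(f"raw bitrate: {48_000 * 16 * 2 / 1000} kbps")
```

The uncompressed stream therefore runs at 1536 kbps, far above the compressed bitrates discussed in the results.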

In order to compare the performance of current compression mechanisms, the audio sequences have been compressed under different coding standards. On the one hand, MP3 and OGG have been selected to represent the most common methods used heretofore. On the other hand, our study also incorporates the family of AAC compressors (including different evolutions of the standard such as AAC-LC, AAC-HEv1 and AAC-HEv2, standing for "AAC Low Complexity", "aacPlus v1" and "aacPlus v2", respectively), since this codec family has spurred an intensive research activity during the last year [11]. The much higher compression rates offered by this family of codecs derive from the introduction of advanced signal processing techniques such as Spectral Band Replication (SBR) and Parametric Stereo (PS). From a practical standpoint, the Nero codec [12] has been employed for performing AAC compression. In order to obtain variable compression rates and, consequently, different compressed bitrates of the audio content, the Nero codec requires the specification of a quality parameter (hereafter denoted as Q). The value of Q can be arbitrarily drawn from the [0,1] range, where values closer to 0 yield lower bitrates and values near 1 yield higher bitrates.

As explained before, the MP3 and OGG codecs have also been evaluated in our platform. Both coding schemes have been implemented by means of the ffmpeg tool for Linux [13], an open-source tool comprising a wide-ranging library of audio and video codecs.

SIMULATION RESULTS AND ANALYSIS

Audio Compression at Variable Bitrates

In order to evaluate the audio quality with each one of the aforementioned codecs, transmissions of audio streams have been simulated over the railway network model presented in Section II, where railway users exchange audio data while moving through the network.

Figure 2 plots the objective quality level in terms of the ODG metric for the AAC codec family and for the OGG and MP3 codecs. This figure allows the performance of each codec to be analyzed as a function of the compressed bitrate. In this context, it should be pointed out that, at the time of performing the compression, neither the OGG nor the MP3 codec was able to reach bitrates below 32 kbps, which is a significant limitation if the scope of the application focuses on extremely low audio bitrates.

Fig. 2: ODG level variation per bitrate with different audio codecs (music sample).

Also observe that two members of the AAC codec family (namely, those labelled HEv1 and HEv2) offer a better ODG quality metric at low bitrates, as opposed to higher bitrates (>80 kbps), where the AAC-LC and OGG codecs outperform them. In summary, what we conclude from the above figure is that the most suitable AAC codec should be selected as a function of the target compressed bitrate. Based on this rationale, we propose to adaptively choose one of these AAC codecs depending on the network statistics (state of the wireless link). This is further elaborated in the next subsection.

Proposed Adaptive Audio Coding System

Based on the ODG behavior of each AAC codec as a function of the compressed bitrate, different ranges have been identified where the performance of a given codec surpasses that of the rest. For low bitrates below 32 kbps, the AAC-HEv2 codec is the best encoding algorithm. In the range from 32 to 80 kbps, the AAC-HEv1 codec performs best. Finally, for bitrates higher than 80 kbps, the selected encoding system is the basic version of AAC, i.e. AAC-LC.
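The bitrate ranges above can be sketched as a simple selection rule. The function name `select_aac_codec` is hypothetical, and treating both 32 kbps and 80 kbps as part of the AAC-HEv1 range follows the wording "from 32 to 80 kbps"; how the target bitrate is derived from the network state is not specified here.

```python
def select_aac_codec(target_bitrate_kbps: float) -> str:
    """Pick the AAC variant for a target bitrate, following the stated ranges."""
    if target_bitrate_kbps <= 0:
        raise ValueError("target bitrate must be positive")
    if target_bitrate_kbps < 32:
        return "AAC-HEv2"   # below 32 kbps
    if target_bitrate_kbps <= 80:
        return "AAC-HEv1"   # from 32 to 80 kbps
    return "AAC-LC"         # higher than 80 kbps


if __name__ == "__main__":
    for rate in (16, 32, 64, 80, 128):
        print(f"{rate:>4} kbps -> {select_aac_codec(rate)}")
```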

The proposed selective audio compression makes it possible to optimize the audio transmission according to the network state through the proper selection of the audio codec. If the network is not saturated, the mechanism providing the best audio quality (in this case AAC-LC) is selected, since the good transmission conditions allow the additional audio data to be forwarded. In the case of wireless link failures or bandwidth reduction, a dramatic decrease of the audio data rate is desirable, and hence low-bitrate coding schemes are more suitable under such constraints. This is why the selected codec is the one presenting the best possible quality at the lowest bitrate.

Fig. 3: ODG level per audio sample with the proposed scheme.

With the aim of illustrating the quality level associated with the proposed codec selection mechanism, Figure 3 represents the quality level obtained for the three kinds of audio files in each of the bitrate ranges. Comparing the ODG values of the audio samples, the best results are clearly obtained for the music sample and the worst for human voice. The particular behavior of the human voice could be explained by the fact that most of its frequency components lie below 4 kHz. Human voice is typically handled by dedicated voice processors called vocoders, which nowadays attain acceptable audio quality at very low bitrates; the proposed system could therefore work together with them for that kind of communication.

Network Evaluation

Regarding the network performance metrics of the considered audio codecs, the delay and jitter of the packets associated with the transmitted audio stream have been measured in our platform. Each of the AAC codecs is encapsulated differently in MP4, which results in different packet sizes. AAC-HEv2 produces the smallest amount of encoded data to be transmitted. However, its average data packet size is larger than that of AAC-HEv1. This leads to a shorter transmission time but a higher transmission delay and jitter, as shown in Table 1. As can be observed, the registered values for both the delay and the jitter remain low and approximately constant until the number of interfering nodes exceeds 20.

TABLE 1. Delay and jitter values for a variable number of nodes (music audio sample).

Codec / Delay (5 nodes) / Delay (10 nodes) / Delay (20 nodes) / Delay (30 nodes) / Jitter (5 nodes) / Jitter (10 nodes) / Jitter (20 nodes) / Jitter (30 nodes)
AAC-HEv1 / 4.5 / 4.8 / 5.5 / 14.3 / 0.9 / 1.4 / 1.7 / 6.3
AAC-HEv2 / 4.9 / 5.2 / 6 / 14.7 / 0.8 / 1.4 / 1.4 / 8.8
AAC-LC / 4.6 / 5.1 / 6 / 15.6 / 0.4 / 0.8 / 1.2 / 10.3
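The observation that delay stays roughly constant up to 20 interfering nodes can be quantified directly from the delay columns of Table 1. The sketch below only re-uses the tabulated values (in the table's own units); the helper name `growth` is illustrative.

```python
NODES = (5, 10, 20, 30)

# Delay values per codec, copied from Table 1 (units as in the table).
DELAY = {
    "AAC-HEv1": (4.5, 4.8, 5.5, 14.3),
    "AAC-HEv2": (4.9, 5.2, 6.0, 14.7),
    "AAC-LC": (4.6, 5.1, 6.0, 15.6),
}


def growth(values, start_nodes, end_nodes):
    """Ratio between the values measured at two node counts."""
    return values[NODES.index(end_nodes)] / values[NODES.index(start_nodes)]


if __name__ == "__main__":
    for codec, delays in DELAY.items():
        print(f"{codec}: x{growth(delays, 5, 20):.2f} from 5 to 20 nodes, "
              f"x{growth(delays, 20, 30):.2f} from 20 to 30 nodes")
```

For all three codecs the delay grows by roughly 20 to 30 % between 5 and 20 nodes, and by a factor of about 2.5 between 20 and 30 nodes.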

Handover Audio Communication Evaluation

In railway scenarios, another important point that must be taken into account is the impact of lost frames on audio quality during handover processes. In order to evaluate these effects, the considered audio sequences (human voice, music and siren) are transmitted while the MN moves between BSs. The results of the extensive simulations for a medium quality compression (Q = 0.55) are presented in Table 2 under variable handover thresholds Th1 = 0.33, Th2 = 0.45 and Th3 = 0.56 seconds. Note that the average ODG value decreases as the handover period increases for all the tested audio samples.

TABLE 2. Average ODG values for each audio sequence in the presence of handover.

Audio sample / Th1 / Th2 / Th3
Music / -0.92 / -1.42 / -2.09
Siren / -1.68 / -1.74 / -1.78
Human Voice / -1.53 / -2.33 / -2.6

The loss of frames in the case of human speech appears to be considerably more annoying for listeners than in the case of the music audio sample. In contrast, the ODG value of the siren sample is approximately constant regardless of the simulated handover threshold. As detailed in [5], the ODG metric is based on both a psychoacoustic and an auditory cognition model, which can affect the processed audio signal differently depending on its spectrum and statistical characteristics.
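The different sensitivity of each sample to the handover threshold can be read off Table 2 by computing the ODG drop between Th1 and Th3. This is a minimal sketch that only re-uses the tabulated averages; the helper name `odg_drop` is illustrative.

```python
# Average ODG values per sample for thresholds Th1, Th2, Th3 (from Table 2).
ODG = {
    "Music": (-0.92, -1.42, -2.09),
    "Siren": (-1.68, -1.74, -1.78),
    "Human Voice": (-1.53, -2.33, -2.6),
}


def odg_drop(sample: str) -> float:
    """ODG lost when going from the shortest (Th1) to the longest (Th3) threshold."""
    first, *_, last = ODG[sample]
    return round(first - last, 2)


if __name__ == "__main__":
    for sample in ODG:
        print(f"{sample}: ODG drops by {odg_drop(sample):.2f} from Th1 to Th3")
```

The siren sample loses about 0.1 ODG points over the tested range, whereas music and human voice each lose more than one full grade.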

Fig. 4: Measured evolution of the ODG metric as a function of the handover duration (music sample).

Figure 4 represents the impact of different handover times on the music sample compressed with the AAC-LC codec. As in previous studies focused on MPEG-H.264 video communications [15,16], notice that the quality of the audio stream recovers after a certain delay from the end of the handover process. The rationale behind this delay lies in the post-masking procedures involved in the ODG metric, which imply a post-filtering of the received signal before the metric is evaluated sample by sample. This contrasts with what occurs in MPEG-H.264 video communications, where the MPEG frame encoding algorithm (i.e. I, P and B frames) implies a temporal dependence among consecutive information units within the media bitstream [15,16].

CONCLUDING REMARKS

This paper has analyzed the behaviour of five different audio codecs (AAC-HEv1, AAC-HEv2, AAC-LC, MP3 and OGG) in terms of the resulting audio quality by encoding three different audio samples at different bitrates. To that end, an audio evaluation platform has been implemented in order to model audio broadcasting through railway scenarios. The simulation results have shown that the lossy AAC compression algorithms outperform other conventional codecs (MP3 and OGG) in terms of objective user quality of service. Based on these simulation results, this contribution goes a step further by proposing a novel selection procedure for the audio compression codec, in which the utilized audio codec varies in accordance with the status and conditions of the wireless network. In our system, the selected audio codec renders the best audio quality in a target bitrate range.

Another interesting result of our work concerns the impact of the handover duration on the average ODG value. Generally speaking, the average ODG value decreases as the handover duration increases. For the same handover threshold, each of the tested audio samples presents a different average ODG value. Our manuscript concludes that such ODG behavior is due to the underlying metric computation algorithm, which tries to predict the subjective quality rating that a conventional ITU-R BS.1116 based empirical listening test would assign to each corrupted audio sample.