Comparative Evaluation of Adaptive Algorithms and Supervised NMF Algorithm for Single Channel Speech Enhancement

Princy Sharma, Sachin Singh and R. S. Anand

Department of Electrical Engineering

Indian Institute of Technology Roorkee, Roorkee-247667, Uttarakhand, India

Abstract--- Noise removal is necessary in many signal processing, biomedical and communication applications, including telephonic communication and speech recognition. Speech enhancement is an important topic in speech processing, since corruption of speech by additive background noise causes severe difficulties in various communication environments. Background noise is a major source of degradation of speech quality. In this paper, different adaptive algorithms are applied to remove the background noise present in a speech signal. The performances of adaptive algorithms such as LMS, NLMS and RLS are analyzed and then compared with the supervised speech enhancement method using non-negative matrix factorization (NMF) and NMF with a sparseness constraint. These algorithms are evaluated under various input signal-to-noise ratio conditions.

Keywords─ Adaptive filters, LMS, NLMS, RLS, NMF, speech enhancement

I.  Introduction

The most common problem in speech processing is the effect of interference or background noise on the speech signal. This noise degrades the quality of speech and reduces its intelligibility. Sources of such noise include traffic, crowds, television sound, competing background speech, noise from automobiles, thermally generated noise, etc. It is important to remove the noise to obtain a better quality speech signal. Removal of noise from speech has various applications, including hearing aids, speech/speaker recognition, and speech communication over the telephone and the internet [1].

Adaptive filtering is an important area in digital signal processing. In many applications the signal characteristics may change quite fast, which requires adaptive algorithms that converge rapidly [2]. If the background noise varies slowly compared to the speech signal, it is easy to estimate the noise. Adaptive algorithms such as LMS, NLMS and RLS can be used to estimate the unwanted interfering signal, which is then subtracted from the corrupted signal. The resultant signal has an improved signal-to-noise ratio and better intelligibility [3].

A supervised approach using non-negative matrix factorization can also be used for speech enhancement [4]. In supervised approaches, a model is considered for the speech and noise signals, and the model parameters are estimated using training samples of each signal. An interaction model is then defined by combining the speech and noise models, and the noise reduction task is carried out [4].

This paper is organized as follows. Section II briefly describes the characteristics of the speech signal and frequently encountered noise types. Section III discusses the concept of adaptive noise cancellation and adaptive algorithms. Section IV briefly describes the non-negative matrix factorization technique and its application in speech enhancement. Section V presents the simulation results.

II.  Speech Signal

Speech is the basic means of communication for humans. It is an acoustic waveform that conveys information from a speaker to a listener. The waveform is the acoustic signal, or sound, radiated as pressure variations from the lips while articulating linguistically meaningful information. Speech is random and non-stationary in nature, and its frequency spectrum is not constant in time [5]. It can be described as a quasi-stationary signal, which means that the statistical parameters of speech remain reasonably constant within a very short time interval, of the order of 10 ms to 20 ms. Human beings have an audible frequency range of 20 Hz to 20 kHz, but human speech has significant frequency components only up to about 4 kHz. A typical speech utterance consists of a string of basic sound units called phonemes. Different speakers uttering the same string of phonemes convey the same information yet sound different as a result of differences in vocal tract length and shape. Speech can be classified as voiced or unvoiced: voiced speech is periodic with high amplitude, while unvoiced speech is random with lower amplitude.

Noise

Noise is an unwanted signal that interferes with the communication or measurement of another signal. Noise is itself a signal that conveys information about its source [5]. Noise can cause transmission errors and may even disrupt a communication process; its removal is therefore an important part of telecommunication and signal processing systems. In general, the noise affecting speech can be modeled as (a) white noise, (b) colored noise, (c) impulse noise, or (d) transient noise pulses.

White noise is an uncorrelated noise process with equal power at all frequencies. Because of its broadband spectrum, white noise has strong masking capabilities. Colored noise refers to any broadband noise with a non-white spectrum. Most audio-frequency noise, such as noise from moving cars, computer fans and electric drills, and people talking in the background, has a non-white, predominantly low-frequency spectrum. Impulse noise consists of short-duration on-off pulses caused by a variety of sources, such as switching, surface degradation of audio recordings, clicks from keyboards, etc. Transient noise pulses often consist of a relatively short, sharp initial pulse followed by a decaying low-frequency oscillation.

III.  Adaptive Noise Cancellation

Adaptive noise cancellation is a type of interference cancellation where noise in the received signal is estimated and subtracted from the corrupted signal. The process uses adaptive filter for estimation of noise. The resultant signal, thus, has improved signal-to-noise ratio [6].

Fig.1: Adaptive noise cancellation configuration

In this process, the primary input is the received signal from which noise has to be removed; it serves as the desired signal for the adaptive filter. The reference input is a noise signal that is correlated in some sense with the noise in the primary input. The output of the filter is subtracted from the primary input, and the adaptive filter coefficients adapt so that the error signal becomes a noiseless version of the signal, as shown in the figure. The noise signal in this configuration must be uncorrelated with the speech signal. Ideally the error signal would converge exactly to the clean signal, but this is not possible in practice: the difference between the signal and the error is always greater than zero. Hence the best one can do is to minimize this difference using an error minimization technique [7].

Adaptive Filters

An adaptive filter adapts itself to changes in its input signal according to a given algorithm. The algorithm changes the coefficients of the filter according to a given criterion, typically an error signal, to improve its performance. Thus, an adaptive filter can be described as a digital filter combined with an adaptive algorithm [7].

Adaptive algorithms

These are the algorithms that adjust the coefficients of the digital filter according to a certain criterion so as to provide an output as close as possible to the desired response. A few of them are discussed briefly below [7], [8].

A.  The Least Mean Square ( LMS) algorithm

In the LMS method, the filter weights are calculated such that the cost function

J(n) = E{e²(n)}

is minimized. The implementation steps are as follows [9]:

1.  The desired response is defined and each filter coefficient is set to zero.

w(n) = 0, n = 1, 2, 3, …, N (1)

For each sampling instant following steps are carried out;

2.  Calculate the output of the adaptive filter in response to the input signal.

y(n) = ∑_{k=1}^{N} w_k(n)·x(n−k+1) (2)

3.  Estimation error is generated by comparing this output with the desired response

e(n) = d(n) − y(n) (3)

4.  The filter weights are updated in accordance with the estimation error.

w(n+1) = w(n) + μ·e(n)·x(n) (4)

where µ is the step size of the adaptive filter, x(n) is the filter input vector.
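The four steps above can be sketched as a short routine (a minimal Python/NumPy illustration rather than the MATLAB implementation used in the paper; the filter length N and step size μ below are placeholder values, and the error e(n) = d(n) − y(n) is the enhanced output in the noise-cancellation configuration):

```python
import numpy as np

def lms(x, d, N=8, mu=0.01):
    """LMS adaptive filter in the noise-cancellation configuration.

    x  : reference input (noise correlated with the noise in d)
    d  : primary/desired signal
    N  : filter length, mu : step size
    Returns the error signal e (the enhanced output) and final weights w.
    """
    w = np.zeros(N)                         # step 1: coefficients set to zero
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]       # x(n), x(n-1), ..., x(n-N+1)
        y = np.dot(w, xn)                   # step 2: filter output
        e[n] = d[n] - y                     # step 3: estimation error
        w = w + mu * e[n] * xn              # step 4: weight update
    return e, w
```

With a white-noise reference and a sinusoid as the clean signal, the error output converges toward the clean signal after a few hundred samples.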

B.  Normalized LMS algorithm

It is a modified form of standard LMS algorithm. The coefficients of the adaptive filter are updated using the following equation [9]:

w(n+1) = w(n) + [α / (‖x(n)‖² + β)]·e(n)·x(n) (5)

The NLMS algorithm has a time-varying step size α/(‖x(n)‖² + β), where 0 < α < 2 and β is a small positive number. This step size improves the convergence speed of the adaptive filter: the algorithm converges faster than the standard LMS algorithm because the step size is optimized at each iteration. If the input is very small over some length of time, then ‖x(n)‖² is also very small, which may cause numerical instability in the division. This is overcome by adding the small positive number β to ‖x(n)‖². The computational complexity of NLMS is higher than that of LMS.
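The NLMS update of Eq. (5) differs from LMS only in its normalized step size; a minimal Python/NumPy sketch (with placeholder values for N, α and β) is:

```python
import numpy as np

def nlms(x, d, N=8, alpha=0.2, beta=1e-6):
    """NLMS adaptive filter: step size alpha / (||x(n)||^2 + beta)."""
    w = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]       # most recent N input samples
        y = np.dot(w, xn)
        e[n] = d[n] - y
        mu_n = alpha / (np.dot(xn, xn) + beta)   # time-varying step size
        w = w + mu_n * e[n] * xn
    return e, w
```

Because the step size is renormalized at every sample, convergence is largely insensitive to the input signal power, at the cost of one extra inner product per iteration.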

C.  Recursive Least Squares algorithm

The RLS algorithm calculates the weights of the adaptive filter such that the cost function

J(n) = ∑_{i=1}^{n} λ^{n−i}·e²(i) (6)

is minimized. The algorithm is implemented as follows [3][9]:

1.  Output of the adaptive filter, y(n), is calculated.

2.  Estimation error, e(n) = d(n) − y(n), is calculated.

3.  Filter coefficients are updated as

w(n+1) = w(n) + e(n)·k(n) (7)

where w(n) is the filter coefficient vector and k(n) is the gain vector defined as

k(n) = P(n)·x(n) / [λ + xᵀ(n)·P(n)·x(n)] (8)

where λ is called the forgetting factor and P(n) is the inverse correlation matrix of the input signal. The matrix P(n) is initialized as P(0) = δ⁻¹·I, where δ is a small positive constant, and is updated as

P(n+1) = λ⁻¹·[P(n) − k(n)·xᵀ(n)·P(n)] (9)

Compared to the other algorithms, the RLS algorithm exhibits faster convergence. For a stationary input signal, the forgetting factor can be set equal to 1; the system is then said to have infinite memory.
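Equations (7)–(9) can be sketched in Python/NumPy as follows (a minimal illustration; N, λ and δ are placeholder values, and the error is taken as e(n) = d(n) − y(n)):

```python
import numpy as np

def rls(x, d, N=8, lam=0.99, delta=0.01):
    """RLS adaptive filter; P is initialized as (1/delta)*I."""
    w = np.zeros(N)
    P = np.eye(N) / delta                   # P(0) = delta^{-1} I
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        xn = x[n - N + 1:n + 1][::-1]       # most recent N input samples
        e[n] = d[n] - np.dot(w, xn)
        k = P @ xn / (lam + xn @ P @ xn)    # gain vector, Eq. (8)
        w = w + e[n] * k                    # weight update, Eq. (7)
        P = (P - np.outer(k, xn @ P)) / lam # inverse correlation, Eq. (9)
    return e, w
```

The matrix update replaces the scalar step size of LMS/NLMS, which is why RLS converges faster but costs O(N²) operations per sample instead of O(N).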

IV.  Non-negative matrix factorization

Non-negative matrix factorization is a method for finding a parts-based linear representation of data [4]. It decomposes a non-negative matrix V ∈ R+^(F×T) into two matrices W and H such that the elements of both matrices are non-negative.

V≈W.H (10)

where W ∈ R+^(F×K), H ∈ R+^(K×T), and normally K ≤ min(F, T).

The columns of W are the basis vectors, which represent the patterns in the data, and the rows of H are the basis activation weights. This approximation is not unique. Various update rules have been proposed [7] to estimate W and H iteratively. The matrices are estimated by minimizing a cost function such as the Euclidean distance or the Kullback–Leibler (KL) divergence

DKL(V‖WH) = ∑_{ij} [ V_{ij}·log( V_{ij} / (WH)_{ij} ) − V_{ij} + (WH)_{ij} ] (11)

To find the optimum value for KL divergence between V and WH, an iterative scheme with multiplicative rule can be used as proposed in [10].

W ← W ⊗ [ (V / (W·H))·Hᵀ ] / (1·Hᵀ) (12)

H ← H ⊗ [ Wᵀ·(V / (W·H)) ] / (Wᵀ·1) (13)

where 1 is a matrix of the same size as V with all elements equal to one, and the multiplication ⊗ and the divisions are element-wise operations.

NMF usually results in sparse decomposed matrices. If this sparseness is explicitly controlled, NMF gives a parts-based representation that is qualitatively better than that given by standard NMF [11]. In this case the W and H matrices are updated using the following rules:

W ← W ⊗ [ (V / (W·H))·Hᵀ ] / (1·Hᵀ + α) (14)

H ← H ⊗ [ Wᵀ·(V / (W·H)) ] / (Wᵀ·1 + β) (15)

where α and β are sparseness factors.
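The multiplicative updates (12)–(15) can be sketched in Python/NumPy as follows (a minimal illustration; the rank K, the iteration count, the random initialization, and the small constant eps guarding against division by zero are placeholder choices, and setting α = β = 0 recovers the standard rules (12)–(13)):

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, alpha=0.0, beta=0.0, eps=1e-9):
    """Multiplicative KL-divergence NMF updates with optional sparseness."""
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, K)) + eps            # random non-negative init
    H = rng.random((K, T)) + eps
    ones = np.ones_like(V)                  # the all-ones matrix "1"
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + alpha)   # rule (14)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + beta)    # rule (15)
    return W, H
```

Because the updates are multiplicative and the factors start non-negative, W and H remain non-negative at every iteration without any explicit projection.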

Speech enhancement using NMF

In this method it is assumed that the noisy signal is an additive mixture of two sufficiently different sources, i.e. speech and noise. NMF is applied to the magnitude spectra, as it is assumed that the short-time magnitude spectrum of the noisy signal can be represented as a linear combination of distinct components: those representing only speech spectra (Wspeech) and those representing only noise spectra (Wnoise). These components are known as spectral basis vectors (SBVs). Speech enhancement comprises two stages: training and denoising.

In the training stage, the SBVs of speech and noise are determined by performing NMF separately on clean speech and on noise. First the magnitude spectra of the clean speech (|Vspeech|) and of the noise (|Vnoise|) are computed. Then the KL divergence between each magnitude spectrum and its corresponding factorization ((WspeechHspeech) and (WnoiseHnoise)) is minimized using the learning rules described above. Since this is an iterative procedure, it is necessary to initialize the matrices properly. The SBVs contained in Wspeech and Wnoise are used as the speech and noise models in the next stage.

In the denoising stage, Wspeech and Wnoise are kept fixed and are concatenated to form a single set of SBVs called Wall. Given the magnitude spectrum of the noisy speech signal (|Vmix|), NMF is performed on it to obtain |Vmix| ≈ Wall·Hall by minimizing the KL divergence between |Vmix| and Wall·Hall, updating only the activation matrix Hall with the learning rules. In order to control sparseness, the parameters α and β should be chosen properly. The magnitude spectrum of the denoised speech is estimated as |Vspeech| ≈ WspeechHspeech, where Hspeech consists of the rows of Hall corresponding to the activation coefficients of Wspeech. Finally, the spectrogram is recovered using the phase of the original noisy signal, and the denoised speech signal is transformed back into the time domain.
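The denoising stage can be sketched as follows (a minimal Python/NumPy illustration, assuming Wspeech and Wnoise have already been learned in the training stage and that V_mix is the magnitude spectrogram |Vmix| of the noisy signal; the iteration count, the random initialization of Hall, and the constant eps are placeholder choices):

```python
import numpy as np

def denoise(V_mix, W_speech, W_noise, n_iter=200, beta=0.0, eps=1e-9):
    """Denoising stage: W_all is kept fixed, only H_all is updated."""
    W_all = np.concatenate([W_speech, W_noise], axis=1)  # concatenated SBVs
    K_s = W_speech.shape[1]                              # number of speech SBVs
    rng = np.random.default_rng(0)
    H_all = rng.random((W_all.shape[1], V_mix.shape[1])) + eps
    ones = np.ones_like(V_mix)
    for _ in range(n_iter):                              # H-only KL updates
        WH = W_all @ H_all + eps
        H_all *= (W_all.T @ (V_mix / WH)) / (W_all.T @ ones + beta)
    H_speech = H_all[:K_s, :]           # rows belonging to the speech SBVs
    return W_speech @ H_speech          # estimated clean magnitude spectrum
```

In a full system the returned magnitude estimate would be combined with the noisy phase and inverted by an inverse STFT to obtain the time-domain denoised speech.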

V.  Results and Conclusions

The performance of the LMS, NLMS, RLS and NMF algorithms is compared for the enhancement of speech signals, on the basis of parameters such as output SNR, execution time, mean square error, and convergence of the adaptive filter weights. Speech signals from the AURORA database are used to generate noisy speech of the desired SNR by adding various types of noise. Noisy speech signals with SNRs ranging from −10 dB to 8 dB are taken as input to the system. The original signal, the noisy signal presented to the adaptive filter, and the filtered output are shown in Fig. 2.

Fig.2(a). Clean Speech Signal

Fig.2(b). Noisy Speech Signal

Fig.2(c) Denoised Speech using LMS algorithm

Fig.2(d). Denoised speech using NLMS algorithm

Fig. 2(e). Denoised Speech using RLS algorithm

Fig.2(f). Denoised speech using NMF algorithm.

Table 1 shows the improvement in SNR (in dB) obtained using the different algorithms. The variation of the SNR of the filtered speech with the SNR of the noisy input is shown in Fig. 3. All simulations were carried out in MATLAB.
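The output SNR used in Table 1 is computed from the clean and processed signals; a minimal helper (written in Python rather than the MATLAB used for the reported results, and assuming the clean reference signal is available, as it is in this simulation setting) is:

```python
import numpy as np

def snr_db(clean, processed):
    """Output SNR in dB, treating (processed - clean) as residual noise."""
    noise = processed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

The SNR improvement of an algorithm is then snr_db(clean, enhanced) − snr_db(clean, noisy).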