
ECE 446 MATLAB Project 2

Question 1

I chose to work with the file cw162_8k.mat, which contains the sentence “A speedy man can beat this track mark.”

Figure 1-a

Figure 1-a

Figure 2-a

Figure 2-b

Question 2

To determine the pitch manually, we first plotted the autocorrelation sequence and looked for the largest positive peak after Rx[0] in each graph. Next we read off the lag at which that peak occurred. Finally, we divided the sampling frequency (Fs = 8000 Hz) by that lag to obtain the pitch. Both the manual calculation and our pitch detector placed the peak at a lag of 43 samples, so the pitch is 8000 Hz / 43 ≈ 186.05 Hz. I believe this is a reasonable value: the normal pitch range for men is 80–160 Hz, so 186 Hz is not too far outside it. One explanation for the value falling above the normal range is that it is the pitch of the vowel (e), which is a high-pitched sound on its own.
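The lag-to-pitch step can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the frame is a synthetic impulse train with a 43-sample period (not the cw162_8k.mat data), and the autocorrelation is computed with a plain loop rather than with `xcorr`.

```matlab
% Sketch: estimate pitch from the lag of the largest autocorrelation peak.
% The frame below is synthetic (an assumption), not the recorded speech.
Fs = 8000;                          % sampling frequency used in the report (Hz)
period = 43;                        % pitch period in samples, as found above
x = zeros(240, 1);                  % one 30 ms frame at 8 kHz
x(1:period:end) = 1;                % impulse train standing in for a voiced frame
x = x - mean(x);                    % remove the DC value

% Normalized autocorrelation for lags 0..maxlag, computed directly.
maxlag = 100;
r = zeros(maxlag+1, 1);
for k = 0:maxlag
    r(k+1) = sum(x(1:end-k) .* x(1+k:end));
end
r = r / r(1);                       % r(1) holds lag 0

% Largest local peak after lag 0.
best = 0;
bestlag = 0;
for k = 1:maxlag-1
    if r(k+1) > r(k) && r(k+1) > r(k+2) && r(k+1) > best
        best = r(k+1);
        bestlag = k;
    end
end
pitch = Fs / bestlag;               % lag in samples converted to Hz
```

For this synthetic frame the peak lands at lag 43, which reproduces the 8000/43 ≈ 186.05 Hz figure.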

Question 3

Figure 3-a Figure 3-b

Figure 3-c Figure 3-d

The composite frequency response was basically what I expected (see Figure 3-d above). I expected that adding all of the bands together would give a more or less flat magnitude response, with all of the frequency bands affected equally. This follows from what a filter bank does, which is merely to separate the signal into frequency bands. It also makes sense that there is a slight attenuation across all of the bands, because every band is still being filtered. As for the phase, I did not really know what to expect, but the composite graph makes sense given that the composite response is the sum of the responses of the 18 bands.
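One way to check the composite response is to sum the 18 band filters and take the FFT of the result. The sketch below is a rough stand-in, not the project code: it assumes a Hamming-windowed sinc prototype built by hand (the appendix uses `fir1` with a Kaiser window), while the band spacing and cosine modulation follow the appendix.

```matlab
% Sketch: composite response of a cosine-modulated filter bank.
Fs = 8000; N = 18; L = 65; B = 200;     % values used in this project
n = (0:L-1)';
m = n - (L-1)/2;                        % index centred on the filter midpoint

% Prototype lowpass with cutoff B/2 Hz (assumed Hamming-windowed sinc).
fc = (B/2) / Fs;                        % cutoff in cycles per sample
h = 2*fc*ones(L, 1);                    % value of the sinc at m = 0
nz = (m ~= 0);
h(nz) = sin(2*pi*fc*m(nz)) ./ (pi*m(nz));
h = h .* (0.54 - 0.46*cos(2*pi*n/(L-1)));

% Shift the prototype to each band centre, as in the appendix.
t = n / Fs;
bank = zeros(L, N);
for i = 1:N
    cf = (i - 0.5) * B;                 % centre frequency of band i in Hz
    bank(:, i) = h .* cos(2*pi*cf*t);
end

% Composite response: sum of all band impulse responses.
composite = sum(bank, 2);
nfft = 1024;
H = fft(composite, nfft);
f = (0:nfft/2)' * Fs / nfft;            % frequency axis in Hz
magdB = 20*log10(abs(H(1:nfft/2+1)) + eps);
```

Plotting `magdB` against `f` over roughly 0 to 3600 Hz shows how flat the sum is for this particular prototype; a different window or filter length will change the ripple.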

Question 4

Figure 4-a

Figure 4-a

Figure 4-b

I believe my automatic pitch detector is working well. The average pitch is about 120 Hz, which is the middle of the average pitch range for men (80–160 Hz). Comparing the pitch detector output to the actual signal (shown above in Figure 4-b) reveals a pattern: the breaks between words in the actual signal appear as pitch values of zero in Figure 4-b. As mentioned earlier, the EE sound that gave a pitch of 186 Hz represented only a portion of the sound file, and the average pitch over the whole file was a reasonable value.
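The report does not say how the average pitch was computed. One plausible approach, sketched here, is to median-filter the pitch track and average only the voiced frames. The pitch values are hypothetical illustration values, not measurements from the speech file, and the 3-point median is written out by hand with zero padding, which I understand to be the default behavior of `medfilt1`.

```matlab
% Sketch: smooth a pitch track and average over voiced frames only.
% Hypothetical frame-by-frame pitch values in Hz (0 = unvoiced).
p = [0 120 118 240 122 119 0 0]';

% 3-point median filter with zero padding at both ends.
padded = [0; p; 0];
smoothed = zeros(size(p));
for i = 1:numel(p)
    smoothed(i) = median(padded(i:i+2));
end

% Average pitch, ignoring unvoiced frames (pitch = 0).
voiced = smoothed > 0;
avg_pitch = mean(smoothed(voiced));
```

Here the isolated 240 Hz value (a typical pitch-doubling error) is removed by the median filter, and the voiced-frame average comes out at 120.2 Hz.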

Question 6

Figure 6-a

Figure 6-b

Figure 6-c

The LP and DFT magnitude spectra have essentially the same overall shape, and they follow each other across the different frequency bands. If the magnitudes of the two were within the same range, the LP graph would more or less form an envelope over the peaks of the DFT graph. The LP spectrum gives a better representation of the formants, while the DFT spectrum gives a better representation of the fundamental frequency. I believe the 20th-order model gives the best representation of the formant frequencies, because it shows much more variation and detail than the lower-order models and makes it clearer how the LP model tracks the DFT spectrum. This tracking is more visible with the 20th-order model because it has more poles, which allow the LP spectrum to follow the changes across frequency and help distinguish the different formant frequencies.
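The comparison can be reproduced on a synthetic frame with the autocorrelation method of linear prediction. This is a sketch under assumptions: the test frame (a pulse train through two made-up resonances at 500 Hz and 1500 Hz) is hypothetical, and the normal equations are solved directly with `toeplitz` rather than with the course's `lpcoef` function.

```matlab
% Sketch: LP spectrum versus DFT spectrum for one frame.
Fs = 8000; N = 240; p = 12; nfft = 512;
n = (0:N-1)';

% Hypothetical voiced frame: pulse train through two resonances.
src = zeros(N, 1);
src(1:43:end) = 1;
poles = [0.97*exp(1j*2*pi*500/Fs), 0.95*exp(1j*2*pi*1500/Fs)];
a_true = real(poly([poles, conj(poles)]));
x = filter(1, a_true, src);

% Hamming-window the frame.
xw = x .* (0.54 - 0.46*cos(2*pi*n/(N-1)));

% Autocorrelation for lags 0..p.
r = zeros(p+1, 1);
for k = 0:p
    r(k+1) = sum(xw(1:N-k) .* xw(1+k:N));
end

% Solve the normal equations R*alpha = r for the predictor coefficients.
alpha = toeplitz(r(1:p)) \ r(2:p+1);
A = [1; -alpha];                          % inverse filter A(z)
G = sqrt(r(1) - alpha' * r(2:p+1));       % gain from the prediction error energy

% Magnitude spectra in dB on a common frequency grid.
f = (0:nfft-1)' * Fs / nfft;
lp_dB  = 20*log10(G ./ abs(fft(A, nfft)));
dft_dB = 20*log10(abs(fft(xw, nfft)) + eps);
```

Plotting `lp_dB` and `dft_dB` against `f` on the same axes, and repeating with a different `p`, gives the kind of comparison discussed above: the DFT curve shows the pitch harmonics, while the LP curve traces the smooth envelope.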

Question 8

Both the channel and the LP vocoders add some noise to the original file. The results remind me of audio from an old tape that has scratches and dust on it. The channel vocoder has a scratchy sound in the background for the length of the audio clip. This sound is never too loud, and the speech remains easy to understand. The LP vocoder has a slightly different effect on the sound file: it actually makes the file sound less scratchy overall. This holds until the hard consonant in “beat” (“A speedy man can beat this track mark”). When that word is said it becomes quite distorted. Once the word is finished the audio is clear again and easy to understand. This problem was likely just an error in our LP vocoder, but it could also be a general weakness of this type of encoding.

Question 9

Assuming 16-bit quantization and the original sampling rate of 8000 Hz, the original signal requires 16 × 8000 = 128,000 bits/sec (128 kbits/sec). Assuming that each pitch value can be encoded in 8 bits and each band envelope value requires 12 bits, the bit rate of the channel vocoder is (18 bands × 12 bits + 8 bits for pitch) × (8000/100) = 224 × 80 = 17,920 bits/sec. The bit rate of the LP vocoder is (8 bits for pitch + 12 bits × 12 poles + 8 bits for gain) × (1/0.015) ≈ 10,667 bits/sec. Both vocoders therefore reduce the bit rate significantly: from 128 kbits/sec down to about 10.7–17.9 kbits/sec. The LP vocoder's bit rate is lower than the channel vocoder's because each frame carries fewer bits (160 for the 12 poles, gain, and pitch versus 224 for the 18 bands and pitch) and the frames are sent less often (about 66.7 versus 80 per second).
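The bit-rate expressions in this answer can be evaluated directly. The sketch below uses only the figures stated in the text (16-bit samples, 18 bands, 12 bits per envelope or coefficient value, 8 bits for pitch and gain, decimation by 100, 15 ms LP frames); the variable names are my own.

```matlab
% Sketch: bit rates of the original signal and the two vocoders.
Fs = 8000;                                   % sampling rate (Hz)
orig_rate = 16 * Fs;                         % 16-bit PCM, bits per second

% Channel vocoder: 18 envelope values plus one pitch value per frame.
nbands = 18; env_bits = 12; pitch_bits = 8;
D = 100;                                     % decimation rate
ch_bits_per_frame = nbands*env_bits + pitch_bits;
ch_rate = ch_bits_per_frame * (Fs / D);      % frames per second = Fs/D

% LP vocoder: 12 coefficients, gain and pitch every 15 ms.
npoles = 12; coef_bits = 12; gain_bits = 8;
fr_int = 0.015;                              % frame interval in seconds
lp_bits_per_frame = pitch_bits + npoles*coef_bits + gain_bits;
lp_rate = lp_bits_per_frame / fr_int;

% Compression relative to the original signal.
ch_ratio = orig_rate / ch_rate;
lp_ratio = orig_rate / lp_rate;
```

Evaluating the expressions as written gives 128,000 bits/sec for the original, 17,920 bits/sec for the channel vocoder, and about 10,667 bits/sec for the LP vocoder, i.e. reductions of roughly 7:1 and 12:1.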

Question 10

Figure 10-a

Figure 10-b

Figure 10-c

Figure 10-d

Figure 10-e

Figure 10-f

The three spectrograms look quite similar. The original spectrogram has the darkest lines of the three, which indicates more energy at those frequencies. Both vocoder spectrograms have lines that are more spaced out, and there appear to be fewer of them. In essence, this shows that the energy spread across the broader range of frequencies is being diminished. In the channel vocoder, the formant frequencies appear to be attenuated more than in the LP vocoder. Both, however, seem to attenuate the frequencies other than the formants.

Question 11

The source-filter model of speech is the basis of both the channel vocoder and the LP vocoder. The model has two parts: a time-varying filter and a source signal.

The channel vocoder works in several steps. First, a bandpass filter bank splits the speech signal into multiple frequency bands. Full-wave rectification of each filter output then yields the band envelopes. Next, the envelopes are down-sampled to decrease the bit rate of the different frequency bands. The pitch of each sound segment is then obtained from the autocorrelation function, and voicing values are created. If the segment is found to be voiced, an impulse train at the detected pitch is used as the source; otherwise the segment has no pitch and noise is used. To synthesize sound, the previously calculated band envelope values are interpolated back to the original sampling rate and used to scale the source signal.

The LP vocoder uses the all-pole model. The original sound segment is passed through an all-zero filter to obtain an error signal, which serves as the “source” part of the model. Pitch values for the original signal are then found by running pitch detection on the error signal. The source is produced in the same way as for the channel vocoder: an impulse train at the detected pitch if the segment is determined to be voiced, and noise otherwise.

The main similarities between the channel and LP vocoders are that both are based on the source-filter model of speech and both use the same pitch detection and the same method of creating the source signal. They differ in how the filtering takes place: the channel vocoder uses a bandpass filter bank, whereas the LP vocoder uses an all-pole filter.
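The source generation shared by both vocoders (impulse train for voiced frames, noise for unvoiced frames) can be sketched as follows. The pitch track is hypothetical, the frame length assumes 15 ms frames at 8 kHz, and this loop is only my illustration of the idea; it is not the course-provided `sw_source` function, whose internals are not shown in this report.

```matlab
% Sketch: build a source signal frame by frame from a pitch track.
Fs = 8000;
frlen = 120;                           % 15 ms frames at 8 kHz (assumed)
pitch = [0 186 186 120 0];             % hypothetical pitch track, 0 = unvoiced
source = zeros(frlen*numel(pitch), 1);
delay = 0;                             % samples before the first pulse of a frame
rng(0);                                % repeatable noise for the example

for i = 1:numel(pitch)
    idx = (i-1)*frlen + (1:frlen);
    if pitch(i) > 0
        % Voiced: impulse train with the pitch period in samples.
        period = round(Fs / pitch(i));
        seg = zeros(frlen, 1);
        pos = delay + 1;
        while pos <= frlen
            seg(pos) = 1;
            pos = pos + period;
        end
        delay = pos - frlen - 1;       % carry the pulse phase into the next frame
    else
        % Unvoiced: white noise source.
        seg = randn(frlen, 1);
        delay = 0;
    end
    source(idx) = seg;
end
```

Carrying `delay` from frame to frame keeps the pulse spacing constant across frame boundaries, so consecutive voiced frames at the same pitch do not produce an extra click. The relative level of the noise and pulse sources is not equalized here.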

Question 12

The most challenging part of this lab exercise was the same one I encountered in the first lab exercise: trouble coding in MATLAB. However, I was able to use a good deal of the coding knowledge I gained in the first project to help me through the second. Overall, I feel that the more I use the program, the more proficient I will become.

Question 13

The most important thing I learned from this project was how speech analyzers and synthesizers work. I had never understood how complex the operations are behind seemingly simple things like down-sampling.

Question 14

Like: Working with a real-life example and listening to how the speech signal changed as it went through the different vocoders.

Dislike: Troubleshooting in MATLAB to get the desired results.

MATLAB Project 2

Speech coding

ECE 446

Biomedical Signal Processing

Mark Van Camp

Partner:

Michael Carpenter

Appendix:

pitch_detect.m

function [pitch] = pitch_detect(x)

%PITCH_DETECT Performs pitch detection on a speech waveform

%

% p = pitch_detect(x);

% x is vector containing frame of speech data

% p is scalar containing pitch of frame in Hz or 0 if unvoiced

Fs = 8000;

% First, remove the dc value of the frame by subtracting the mean . . .

x = x-mean(x);

% Then find the minimum and maximum samples and center clip to

% 75% of those values (`cclip') . . .

l = length(x);

% x has zero mean here, so max(x) >= 0 and min(x) <= 0.

% Using the built-ins also avoids shadowing max and min with variables.

cmax = 0.75*max(x);

cmin = 0.75*min(x);

clipped = cclip(x,cmin,cmax);

% Compute the autocorrelation of the frame . . .

[c,lags] = xcorr(clipped,l,'coeff');

% Find the maximum peak following Rx[0] (`peak') . . .

peak = 0;

peakindex = 0;

for i = (l+2):(2*l-1) % index l+1 is lag 0, so l+2 is lag 1

if(c(i) > c(i-1) && c(i) > c(i+1)) %% this is a peak

if(c(i)>peak)

peak = c(i);

peakindex = i;

end

end

end%end for loop

%[peak2,peakindex]=peak(c(l+1:2*l+1));

% Determine if the segment is unvoiced based on the 'voicing strength'

% (the ratio of the autocorrelation function at the peak pitch lag

% to the autocorrelation function lag=0) . . .

% If voicing strength is less than 0.25, call it unvoiced and set pitch = 0,

% otherwise compute the pitch from the index of the peak . . .

if(peak < 0.25) %then unvoiced

pitch = 0;

else%voiced

pitch = Fs/lags(peakindex); % lag 0 sits at index l+1, so use the lags vector

end

myownfilt_bank:

function bank=myownfilt_bank(N,L,Fs,B)

% FILT_BANK Filter bank generator

% bank = FILT_BANK(N,L);

% N number of bands

% L length of each FIR filter

% Fs sampling frequency in Hz (default 8000)

% B width of each band in Hz (default 3600/N, i.e. 200 for N = 18)

% Generates an LxN matrix

% Each of the N columns contains an L-point FIR filter

% The following parameters are specified within the function

% so they'll be the same for both the analysis and synthesis

% portions of the channel vocoder:

% Set defaults

if nargin < 3

Fs =8000; % default sampling frequency in Hz

end

if nargin < 4

B = 3600/N; % default width of each band in Hz

end

start = B/2; % first center freq. in Hz

% preallocate output for speed

bank = zeros(L,N);

% Design a prototype lowpass filter . . .

% Choose the cutoff frequency to obtain a bandwidth of B.

% We suggest that you use a Kaiser window with beta = 3.

lpf = fir1(L-1,B/Fs,'low',kaiser(L,3)); % cutoff B/2 Hz = (B/2)/(Fs/2); window length must be L

% Need to make it a column vector

lpf = lpf(:);

% Now, we have to shift the lowpass filter into a series of bandpass filters

% (See Problem Set 2, #1)

% First, we need to create a discrete-time vector t for argument to cosines.

% The length of t must equal the length of the lowpass filter (L),

% and the spacing between the samples must be equal to 1/Fs.

% Note that t must be a column vector.

t = (0:L-1)'/Fs; % integer range guarantees exactly L samples

% In loop, design filters for the remaining bands:

for i = 1:N

% Compute desired center frequency from i, start and B

cf = start + (i-1)*B;

% Shift lowpass prototype to center frequency . . .

% (See Problem Set 2, #1)

bank(:,i) = lpf .* cos(2*pi*cf*t);

end

chvocod_ana:

function [y,p]=chvocod_ana(x, D, N)

%CHVOCOD_ANA Analyzes speech waveform into pitch and band envelope signals

% [y,p] = CHVOCOD_ANA(x, D, N)

%

% x vector containing speech signal to encode

% D decimation rate

% N number of bands

%

% y output matrix of band envelope signals

% each column contains output of one band

% each row contains output of one frame

% p output vector containing pitch signal with one

% value for each frame

% This code template has two separate stages, corresponding to the

% source-filter model of speech production:

% The first stage involves characterizing the "source" by pitch detection.

% Pitch detection is accomplished by breaking up the original signal

% into frames and then determining if each frame is voiced or unvoiced.

% If the frame is voiced, then we also estimate the fundamental frequency

% of the glottal source.

% The second stage involves characterizing the "filter", that is

% determining the band envelope values. This is accomplished by filtering

% the original signal into frequency bands, determining the envelope of

% each band and decimating.

% Make x a column vector just to be sure.

x = x(:);

Fs = 8000; % sampling frequency

frlen = floor(0.030 * Fs); % use 30ms frame length

nframes = ceil(length(x)/D); % total number of frames

% Note that nframes depends on the decimation rate, D.

% Try to understand why . . .

% preallocate outputs for speed

p = zeros(nframes, 1);

y = zeros(nframes,N);

%------%

%------get "source parameters" (pitch detection)------%

%------%

% First, lowpass filter the signal with 500 Hz cutoff frequency,

% since we're interested in pitch, which is 80-320 Hz for adult voices.

% Be careful: the lowpass filtered signal should be used as the input

% to the pitch detector, but not to the filter bank!

lpf = fir1(64,500/(Fs/2),'low'); % fir1 expects the cutoff normalized to Nyquist (Fs/2)

xlpf = filter(lpf,1,x);

% Here's the loop. Each iteration processes one frame of data.

for i = 1:nframes

% Get the next segment. The indexing has been done for you.

startseg = (i-1)*D+1;

endseg = startseg+frlen-1;

if endseg > length(xlpf)

endseg = length(xlpf);

end

seg = xlpf(startseg:endseg);

% Call your pitch detector . . .

p(i) = pitch_detect(seg);

end

% Remove spurious values from pitch signal with median filter . . .

p = medfilt1(p,3);

%------%

%------get "filter" parameters (determine band envelope values)------%

%------%

% Compute FIR coefficients for filter bank (use 65-point filters) . . .

% The variable bank should be a 65xN matrix with each column containing

% the impulse response of one filter

bank = myownfilt_bank(N,65);

% Apply the filterbank to the input signal, x . . .

% Please be sure that you understand why you shouldn't apply

% the filterbank to the lowpass filtered signal, xlpf

% In loop, process each band:

for i = 1:N

% Apply filter for this band (bank(:,i)) to input x (not xlpf) . . .

X = fftfilt(bank(:,i),x);

% Take magnitude of signal and decimate.

% (The matlab function decimate includes lowpass filtering) . . .

y(:,i)=decimate(abs(X),D);

end

lpvocod_ana:

function [coeff, gain, pitch] = lpvocod_ana(x, p)

%LPVOCOD_ANA Analyzes speech waveform into pitch, gain, coefficients

% [coeff,gain,pitch] = LPVOCOD_ANA(x, p)

%

% x input signal

% p order of LPC model

%

% coeff matrix of LP coefficients

% (column index = frame number;

% row index = coefficient number)

% gain vector of gain values (one per frame)

% pitch vector of pitch values (one per frame), 0=unvoiced

%

%

% Make x a column vector just to be sure.

x = x(:);

% Initialize variables for loop.

Fs = 8000; % sampling frequency

frlen = round(0.03 * Fs); % length of each data frame, 30ms

noverlap = fix(frlen/2); % overlap of successive frames, half of frlen

hop = frlen-noverlap; % amount to advance for next data frame

nx = length(x); %length of input vector

len = fix((nx - (frlen-hop))/hop); %length of output vector = total frames

% preallocate outputs for speed

gain = zeros(1, len);

pitch = zeros(1, len);

coeff = zeros(p+1, len);

coeff(1,:) = ones(1,len);

% Design (but do not yet apply) a lowpass filter with 500 Hz cutoff

% frequency, since we're interested in pitch, which is 80--320 Hz for

% adult voices. Make the filter length substantially shorter than the

% frame length . . .

lpf = fir1(64,500/(Fs/2),'low'); % fir1 expects the cutoff normalized to Nyquist (Fs/2)

% Here's the loop. Each iteration processes one frame of data.

for i = 1:len

% Get the next segment/frame of the unfiltered input signal.

% The indexing has been done for you.

seg = x(((i-1)*hop+1):((i-1)*hop+frlen));

% Compute the LPC coefficients and gain of the windowed segment (`lpcoef')

% and store in coeff matrix and gain vector . . .

[coeff(:,i),gain(i)]=lpcoef(seg,p); % use the requested order p, not a hard-coded 12

% Compute the LP error signal

err = filter(coeff(:,i),gain(i),seg); % named err to avoid shadowing the built-in error()

% Detect voicing and pitch for this frame by applying 500 Hz lowpass filter

% to error signal and then calling your pitch detector . . .

v = filter(lpf,1,err);

pitch(i) = pitch_detect(v);

end %end for loop

% Remove spurious values from pitch signal with median filter . . .

pitch = medfilt1(pitch,3);

lpvocod_syn:

function y = lpvocod_syn(coeff, gain, pitch, fr_int)

%LPVOCOD_SYN Synthesize speech waveform from pitch, gain, and LPC

% coefficients

%

% y = LPVOCOD_SYN(coeff, gain, pitch, fr_int)

%

% coeff matrix of LP coefficients

% (column index = frame number;

% row index = coefficient number)

% gain vector of gain values (one per frame)

% pitch vector of pitch values (one per frame), 0=unvoiced

% fr_int frame interval (sec)

%

% y synthesized speech signal

%

% Error checking

if (nargin < 3), error('There must be 3 or 4 input arguments'); end;

[nrows nframes]=size(coeff);

if (nframes ~= length(pitch)), error('Pitch vector has illegal length'); end;

if (nframes ~= length(gain)), error('Gain vector has illegal length'); end;

% Initialize variables for loop.

Fs = 8000; % sampling frequency

if (nargin < 4), fr_int = .015; end; % assume 15-msec frame interval

frlen= round(fr_int*Fs);

% Preallocate output for efficiency

y=zeros(nframes*frlen,1);

delay=0; % delay to first pitch pulse

filt_state = zeros(size(coeff,1)-1,1);

next_delay=0;

% Loop over frames to generate total source signal

for i=1:nframes

% Pitch value for each frame indicates if voicing source or noise source

% if pitch(i) > 0

% % Compute pitch period (in samples) and generate impulse train (`pulse_train') . . .

% % (Be sure to save new delay for next frame)

%

% else

% % Generate noise source (`randn') . . .

%

% end

[source,next_delay]=sw_source(pitch(i),Fs,frlen,next_delay);