Hi My Name Is Jeff England

Jeff England Project Presentation

EE6820

SLIDE 1: INTRODUCTION

Hi, my name is Jeff England. I am a CVN student, and I have chosen to do my project on source identification and separation in stereo mixes. My project is based on work by Carlos Avendano; a reference to his paper is at the end of the slides.

SLIDE 2: How a Stereo Mix Is Created

To start, I would like to explain how a stereo mix is created. Each source is usually recorded independently, or at least onto its own separate track, so that unique panning coefficients can be applied to it. Here a source means one such track, for example an instrument or a vocal recording. The panning coefficients simply determine the direction of the source: the percentage of the source that comes out of the left channel and the percentage that comes out of the right channel. In the equation shown, the index i denotes the left channel when i = 1 and the right channel when i = 2. All sources, scaled by their panning coefficients, are then summed, and the sum is convolved with a reverberation impulse response.
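The mixing model described above can be sketched as xi(t) = r(t) * sum over j of aij sj(t), where * is convolution. The snippet below is a minimal illustration, not the slide's exact equation: it assumes a linear pan law (a2j = 1 - a1j, matching the "percentage left, percentage right" description) and a single impulse response r(t) shared by both channels; the function name create_stereo_mix is hypothetical.

```python
import numpy as np

def create_stereo_mix(sources, left_fractions, reverb_ir):
    """Sketch of x_i(t) = r(t) * sum_j a_ij s_j(t), with i = 1 (left) and i = 2 (right).

    sources        : array of shape (J, N), one mono track s_j(t) per row
    left_fractions : length-J values a_1j in [0, 1]; the right coefficient is
                     assumed to be a_2j = 1 - a_1j (a simple linear pan law)
    reverb_ir      : 1-D impulse response r(t), assumed shared by both channels
    """
    sources = np.asarray(sources, dtype=float)
    a_left = np.asarray(left_fractions, dtype=float)
    a_right = 1.0 - a_left
    # Scale each source by its panning coefficient and sum the sources per channel.
    dry_left = a_left @ sources
    dry_right = a_right @ sources
    # Convolve each summed channel with the reverberation impulse response.
    left = np.convolve(dry_left, reverb_ir)
    right = np.convolve(dry_right, reverb_ir)
    return np.vstack([left, right])  # shape (2, N + len(reverb_ir) - 1)


rng = np.random.default_rng(0)
tracks = rng.standard_normal((2, 1000))  # two hypothetical source tracks
mix = create_stereo_mix(tracks, [0.8, 0.3], np.array([1.0, 0.0, 0.5]))
print(mix.shape)  # (2, 1002)
```

With a unit impulse response the model reduces to a pure panned sum, which is the case the separation method below relies on.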

SLIDE 3: Similarity Function

The basic idea is to compare the left and right channels in the frequency domain and identify the different sources by their panning coefficients. If we let Xi(m,k) be the STFT of the input signal xi(t), where m is the time-frame index and k is the frequency index, we can define a function that measures the similarity between the left and right channels in each time-frequency bin. The equation at the bottom of the slide is bounded between 0 and 1: sources panned to the center give 1, and sources panned completely to either side give 0. Because the function is symmetric, a source panned to the left and a source panned by the same amount to the right give the same value, so we still need to determine whether the source lies in the left or the right half of the stereo field. A better measure would range from -1 to 1.
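Since the slide's equation is not reproduced in these notes, the sketch below assumes the form psi(m,k) = 2|X1 X2*| / (|X1|^2 + |X2|^2), which has the stated properties (bounded in [0, 1], 1 at center, 0 when hard-panned). The function name similarity and the eps guard are illustrative choices. It also shows the left/right ambiguity the paragraph points out.

```python
import numpy as np

def similarity(X1, X2, eps=1e-12):
    """psi(m, k) = 2 |X1 X2*| / (|X1|^2 + |X2|^2), evaluated bin by bin (assumed form).

    X1, X2 : complex STFTs of the left and right channels (same shape).
    eps    : small constant (an assumption) that avoids division by zero in silent bins.
    """
    cross = np.abs(X1 * np.conj(X2))
    power = np.abs(X1) ** 2 + np.abs(X2) ** 2
    return 2.0 * cross / (power + eps)


S = np.array([1.0 + 1.0j])       # one time-frequency bin of a single source
print(similarity(S, S))          # centered source: approximately 1
print(similarity(S, 0 * S))      # hard-panned source: 0
left = similarity(0.8 * S, 0.2 * S)
right = similarity(0.2 * S, 0.8 * S)
print(np.allclose(left, right))  # True: psi alone cannot tell left from right
```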

SLIDE 4: Panning Index

We can find the source direction in the left-right plane by computing a partial similarity function for each channel, subtracting the two, and passing the difference through a resolving function that returns 1, 0, or -1; this value is the multiplier in the panning index equation shown at the bottom of the slide. A value of 1 means the source is in the left half of the stereo field, 0 means it is directly in the center, and -1 means it is in the right half. Only the sign of the difference is used, not its magnitude. The result is the panning index: a signed measure built from the similarity function that ranges from -1 to 1 instead of 0 to 1.
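A minimal sketch of the panning index, assuming the form Psi = (1 - psi) * direction so that center maps to 0 and hard-panned sources map to +1 (left) or -1 (right), matching the convention in the talk. The partial similarities are taken to be |X1 X2*| / |X1|^2 and |X1 X2*| / |X2|^2; instead of subtracting them directly, the code uses the sign of |X1|^2 - |X2|^2, which agrees with the sign of their difference whenever both channels are non-zero and stays defined when one channel is silent. The function name is hypothetical.

```python
import numpy as np

def panning_index(X1, X2, eps=1e-12):
    """Signed panning index in [-1, 1]: +1 hard left, 0 center, -1 hard right.

    Assumed form: Psi = (1 - psi) * direction, where psi is the 0..1 similarity
    and direction is the output of the resolving function.
    """
    mag1, mag2 = np.abs(X1), np.abs(X2)
    psi = 2.0 * mag1 * mag2 / (mag1 ** 2 + mag2 ** 2 + eps)
    # Resolving function. The partial similarities are |X2|/|X1| and |X1|/|X2|;
    # their difference |X1|/|X2| - |X2|/|X1| has the same sign as |X1|^2 - |X2|^2,
    # which is used here because it stays defined when one channel is zero.
    direction = np.sign(mag1 ** 2 - mag2 ** 2)
    return (1.0 - psi) * direction


S = np.array([1.0 + 1.0j])
for gl, gr in [(1.0, 0.0), (0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
    print(gl, gr, panning_index(gl * S, gr * S))
```

Sweeping the pan from hard left to hard right moves the index from 1 through 0 to -1, and mirrored pans now give opposite signs.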

SLIDE 5: SIR (Signal-to-Interference Ratio)

The panning index works best when the sources in the mix do not overlap in the transform domain. In practice they usually do, so we need a measure of the resulting error to set the boundaries of the panning index window. The error is defined at the bottom of the slide; it comes from the interference between two sources that share the same time-frequency bin.
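The slide's error formula is not reproduced here, so the sketch below does not implement it. Instead it numerically illustrates the idea: one time-frequency bin holds a target source plus an interferer, and we measure how far the bin's panning index drifts from the target's own index as the SIR changes. Everything in it is an assumption for illustration: the pan pairs, the fixed relative phase, the definition of SIR as the ratio of the two source magnitudes before panning, and the panning index form (1 - psi) * sign(|X1|^2 - |X2|^2).

```python
import numpy as np

def panning_index(X1, X2, eps=1e-12):
    # Signed panning index (assumed form): (1 - similarity) * sign(|X1|^2 - |X2|^2).
    mag1, mag2 = np.abs(X1), np.abs(X2)
    psi = 2.0 * mag1 * mag2 / (mag1 ** 2 + mag2 ** 2 + eps)
    return (1.0 - psi) * np.sign(mag1 ** 2 - mag2 ** 2)

def index_error(sir_db, target_pan=(0.8, 0.2), interferer_pan=(0.3, 0.7), phase=np.pi / 2):
    """Deviation of one bin's panning index from the target's own index when an
    interfering source shares the bin at the given SIR (in dB).
    Pan pairs are (left, right) coefficients; all values are hypothetical."""
    target = 1.0 + 0.0j
    interferer = 10.0 ** (-sir_db / 20.0) * np.exp(1j * phase)  # amplitude ratio from SIR
    X1 = target_pan[0] * target + interferer_pan[0] * interferer
    X2 = target_pan[1] * target + interferer_pan[1] * interferer
    clean = panning_index(target_pan[0] * target, target_pan[1] * target)
    return abs(panning_index(X1, X2) - clean)


for sir in [0, 10, 20, 30, 40]:
    print(f"SIR {sir:2d} dB -> panning index error {index_error(sir):.4f}")
```

At low SIR the bin's index is pulled toward the interferer's position; at high SIR it approaches the target's. A bound on this drift is what the window boundary has to accommodate.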

SLIDE 6: Panning Index Window

Next, we select the time-frequency bins whose panning index equals that of the source we want. Keeping only these bins separates that source from the mix. Because most sources overlap somewhat in the frequency domain, a window centered on the target panning index, which also keeps bins with nearby index values, reduces distortion but may increase interference. Carlos Avendano suggests using the Gaussian window shown on the slide, where ε controls the width of the window and Ψc is the rejection point, calculated as the maximum panning index error.
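The exact window expression is on the slide and not in these notes, so the sketch below uses a hypothetical stand-in: a Gaussian centered on the target index psi0, with a variance-like width parameter playing the role of ε, truncated to zero beyond a rejection distance psi_c standing in for Ψc. The names panning_window and extract_source are illustrative.

```python
import numpy as np

def panning_window(Psi, psi0, width, psi_c):
    """Truncated Gaussian weight around a target panning index psi0 (assumed form).

    width : variance-like parameter playing the role of the slide's epsilon
    psi_c : rejection distance; bins whose index is farther than psi_c from psi0
            get weight 0 (a stand-in for the slide's rejection point)
    """
    d = np.abs(Psi - psi0)
    gauss = np.exp(-(d ** 2) / (2.0 * width))
    return np.where(d <= psi_c, gauss, 0.0)

def extract_source(X1, X2, Psi, psi0, width=0.01, psi_c=0.2):
    # Weight every time-frequency bin of both channels by the window.
    W = panning_window(Psi, psi0, width, psi_c)
    return W * X1, W * X2


Psi = np.array([0.5, 0.55, 0.8, -0.3])  # hypothetical per-bin panning indices
w = panning_window(Psi, psi0=0.5, width=0.01, psi_c=0.2)
print(w)  # full weight at 0.5, partial at 0.55, zero beyond the rejection distance
```

A larger width keeps more neighboring bins (less distortion, more interference); a smaller one does the opposite, which is the trade-off described above.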

SLIDE 7: End Result

To get the end result, we subtract the windowed result from the original signal, which gives a new signal without the separated source. Finally, we take the inverse STFT to return to the time domain and obtain an audio signal that can be played back.
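A minimal numpy-only sketch of this last step for one channel. It assumes a periodic Hann window with 50% overlap, whose shifted copies sum to 1, so plain overlap-add of the inverse FFT frames reconstructs the signal (except the first and last half-frame, which this sketch does not pad). The mask stands for the panning index window evaluated on every bin; the frame size and function names are illustrative choices.

```python
import numpy as np

FRAME = 512
HOP = FRAME // 2
# Periodic Hann window: at 50% overlap its shifted copies sum to 1.
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

def stft(x):
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[m * HOP:m * HOP + FRAME] * WINDOW for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (frames, FRAME // 2 + 1)

def istft(X, length):
    # Inverse FFT of each frame, then overlap-add back into the time domain.
    frames = np.fft.irfft(X, n=FRAME, axis=1)
    y = np.zeros(length)
    for m, frame in enumerate(frames):
        y[m * HOP:m * HOP + FRAME] += frame
    return y

def separate(x, mask):
    """Return (separated source, mix with the source removed) for one channel."""
    X = stft(x)
    source = istft(mask * X, len(x))
    residual = istft(X - mask * X, len(x))  # subtract the windowed result
    return source, residual


rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
mask = np.zeros(stft(x).shape)
mask[:, 20:40] = 1.0  # hypothetical mask: pretend bins 20-39 belong to the source
source, residual = separate(x, mask)
```

The same call is applied to the left and right channels with the same mask to obtain a stereo result.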

SLIDE 8: Goals & References

My first goal is to reproduce Carlos Avendano’s work. Next, I would like an algorithm that automatically tells me how many different tracks, that is, how many distinct panning indices, are present in the mix. If I have more time, I could try to identify which instruments were used in the mix, or pursue one of a couple of other ideas that Professor Ellis has suggested to me. Here is the reference to Carlos Avendano’s paper.
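One possible starting point for the source-counting goal, offered only as a hypothetical sketch and not as part of Avendano's method: build a histogram of the panning indices over all time-frequency bins, weighted by bin energy (for example |X1|^2 + |X2|^2), and count the local maxima that rise above a fraction of the tallest bin. The bin count and threshold are untuned assumptions.

```python
import numpy as np

def count_panning_peaks(Psi, energy, bins=41, rel_threshold=0.1):
    """Hypothetical source counter: energy-weighted histogram of panning indices,
    then count local maxima at or above rel_threshold * (largest bin)."""
    hist, _ = np.histogram(Psi.ravel(), bins=bins, range=(-1.0, 1.0), weights=energy.ravel())
    padded = np.concatenate(([0.0], hist, [0.0]))  # zero guards at both ends
    peaks = 0
    for i in range(1, len(padded) - 1):
        is_peak = padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
        if is_peak and padded[i] >= rel_threshold * hist.max():
            peaks += 1
    return peaks


# Hypothetical panning indices clustered around three source positions.
Psi = np.concatenate([np.full(100, -0.5), np.full(80, 0.0), np.full(60, 0.6)])
print(count_panning_peaks(Psi, np.ones_like(Psi)))  # 3
```

On real mixes the clusters are smeared by overlap and reverberation, so smoothing the histogram or tuning the threshold would likely be needed.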