499B Midterm Progress Report

Bionic Beat Box Voice Processor (BBBVP)

1.Introduction

The BBBVP acquires and processes digital audio data, in the form of a ‘beat box’, through a Matlab-based graphical user interface (GUI). The user can sing a beat into a microphone and use the software to transform the voiced beat into a professional, high-quality drum loop, ready for export into a sampler or any audio recording environment. The voiced beat is chopped into individual bursts, and each burst is compared to a training set of data recorded by the specific user. Each burst is classified and the appropriate real drum sound is transplanted into the loop. The user can map any vocal sound to any WAV file sample. All of this and more has already been implemented.

2.Sound Recognition

The current sound recognition algorithm uses 256 overlapping bandpass filters, each with a bandwidth of approximately 156 Hz, to create an array of features. Each burst of beat box data is passed through each of the 256 filters. The mean of each filtered signal is saved into a one-dimensional array of length 256, so 256 individual features are created for each burst.
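The filter-bank feature extraction described above can be sketched as follows. This is a minimal Python illustration, not the Matlab code used in the BBBVP. Several details are assumptions that the report does not specify: the filters are second-order biquads, their centre frequencies are spaced evenly from 0 Hz to half the sample rate, and the "mean" is taken over the absolute value of each filtered signal (the plain mean of a bandpassed signal would be close to zero). The function names are hypothetical.

```python
import math

def bandpass_coeffs(center_hz, bandwidth_hz, fs):
    """Second-order band-pass (biquad) coefficients with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / fs
    q = center_hz / bandwidth_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                    # feed-forward terms
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)    # feedback terms
    return b, a

def bandpass(signal, b, a):
    """Run the biquad over the signal (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def filter_bank_features(burst, fs=44100, n_filters=256, bandwidth_hz=156.0):
    """Return one feature per filter: the mean absolute value of its output."""
    # Assumed layout: centres evenly spaced up to fs/2 (about 86 Hz apart at
    # 44100 Hz), so neighbouring 156 Hz bands overlap.
    spacing = (fs / 2.0) / n_filters
    features = []
    for k in range(n_filters):
        b, a = bandpass_coeffs((k + 0.5) * spacing, bandwidth_hz, fs)
        filtered = bandpass(burst, b, a)
        features.append(sum(abs(v) for v in filtered) / len(filtered))
    return features
```

Under these assumptions, a pure 1000 Hz tone produces its largest feature in the band centred closest to 1000 Hz. The nested loops also show why 256 filter passes per burst are costly.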

The feature array is compared against the training set to determine a match for the voiced beat burst. Upon recognition, a real drum sample is transplanted into the drum loop. We are not happy with the current voice recognition algorithm. It is not robust enough: the user has to voice each beat very clearly for the recognition to find the correct match. Furthermore, Matlab runs rather slowly when it has to filter 256 times per beat box burst.
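The report does not state how the feature array is compared against the training set. One simple possibility, shown here only as an assumed illustration, is a nearest-template match using Euclidean distance. The names classify_burst and training_set are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature arrays."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_burst(features, training_set):
    """Return (label, distance) of the stored template closest to `features`.

    training_set maps each sound label to its stored feature array.
    The distance metric is an assumption, not taken from the report.
    """
    best_label, best_dist = None, float("inf")
    for label, template in training_set.items():
        dist = euclidean_distance(features, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```

Returning the distance alongside the label would let the caller reject a burst whose best match is still far from every template, which is one way to address the robustness problem noted above.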

Currently under development is a more advanced pattern classification algorithm that uses a single-layer neural network with a wide-ranging set of features as its inputs. The following features could be used:

-Burst ramp time

-Power in the burst (autocorrelation at zero lag)

-Decay time

-A few LPC coefficients

-Peak frequency

-Zero crossing of the burst to determine pitch
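A few of the candidate features listed above are simple to compute directly from the samples of a burst. The sketch below is a Python illustration under assumed definitions: ramp time is taken as the time from the start of the burst to its peak absolute amplitude, and pitch is estimated from the zero-crossing count on the basis that a periodic tone crosses zero twice per cycle. The function names are hypothetical.

```python
def burst_power(burst):
    """Autocorrelation at zero lag: the sum of the squared samples."""
    return sum(s * s for s in burst)

def ramp_time(burst, fs):
    """Seconds from the start of the burst to its peak absolute amplitude."""
    peak_index = max(range(len(burst)), key=lambda i: abs(burst[i]))
    return peak_index / fs

def zero_crossing_pitch(burst, fs):
    """Rough pitch estimate in Hz from the number of sign changes."""
    crossings = sum(
        1 for prev, cur in zip(burst, burst[1:])
        if (prev < 0.0) != (cur < 0.0)
    )
    duration = len(burst) / fs
    return crossings / (2.0 * duration)
```

The zero-crossing estimate is only meaningful for bursts with a clear pitch. On noisy, unpitched bursts it behaves more like a measure of spectral brightness.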

Much testing needs to be done to figure out the combination of features leading to optimal class separation. Such an advanced recognition engine is required to match more complicated percussive sounds such as Tabla bols. This will be an ongoing research project.

3.The GUI

Matlab’s GUIDE visual layout tool was used to develop the BBBVP’s GUI. The GUI is split into three sections: Control, Mapping, and Training. There is also a status indicator, which reads ‘Ready’ in the figure below.

Figure 1: BBBVP GUI

3.1.Control

This is the top left section of the GUI in figure 1. It contains the buttons and sliders used to record, process, and play back the voiced input.

3.1.1.Record and Stop

When ‘Record’ is clicked, the software starts acquiring input from whichever sound card is currently selected for microphone recording in the Windows Sounds and Multimedia settings page.

A possible addition would allow the user to select the audio input. The sample rate of data acquisition is currently fixed in the software at 44100 Hz, which is high enough that aliasing of the voiced input is not a concern. However, the program would run faster at a lower sample rate. The sounds produced by a human beat box typically do not exceed 8000 Hz, so by the Nyquist criterion a sample rate as low as 16000 Hz could be used. A further addition will allow the user to pick a sample rate. The software ends data acquisition when the user clicks the ‘Stop’ button. The data is then ready for processing.

3.1.2.Process Beat

The ‘Process Beat’ button triggers a complex set of events that transform the beat box into a real drum loop. First, the software analyses the time-domain signal and finds the start and end points of each beat box sound burst. These points are used later to decide where to put the drum samples. The algorithm used to detect the bursts simply scans through a one-dimensional array representing the digitized microphone input, using a window of 500 samples. A burst is isolated where the mean value of this window rises above one threshold and later falls below another. The window size can be increased to deal with noisier signals: a larger window tends to ignore noisy low-amplitude bursts that could be mistaken for voiced beats. The thresholds for the beginning and end of a burst can also be modified to deal with noisy signals. It would be a nice addition to let the user select a ‘Sensitivity’ level that can be adjusted for different environments. Spectral subtraction denoising is also used to decrease the amount of random noise throughout the signal. This will be discussed in the final report.
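The windowed burst detection can be sketched as follows. This is a Python illustration rather than the BBBVP's Matlab code. It assumes the window mean is taken over the absolute value of the samples, and the two threshold values are placeholders chosen for the example, not values from the software.

```python
def find_bursts(signal, window=500, start_threshold=0.05, end_threshold=0.02):
    """Return a list of (start, end) sample indices, one pair per burst.

    Indices refer to the start of the scanning window, so each boundary
    is only accurate to within one window length.
    """
    # Prefix sums of |x| make every window mean a constant-time lookup.
    prefix = [0.0]
    for s in signal:
        prefix.append(prefix[-1] + abs(s))

    bursts = []
    start = None
    for i in range(len(signal) - window + 1):
        mean = (prefix[i + window] - prefix[i]) / window
        if start is None and mean > start_threshold:
            start = i                    # window mean rose above the threshold
        elif start is not None and mean < end_threshold:
            bursts.append((start, i))    # window mean fell back below
            start = None
    if start is not None:                # burst still open when the input ends
        bursts.append((start, len(signal)))
    return bursts
```

Using a lower threshold for the end of a burst than for its start keeps a decaying sound from being split into several short bursts. A user-facing ‘Sensitivity’ control could scale both thresholds together.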

Once the bursts have been located, each burst is cut out of the array and sent to the voice recognition engine. Defining features are extracted from each burst and compared to the user’s training set. The voice recognition is discussed in detail in the Sound Recognition section. Once the sounds have been identified, the BBBVP looks up the mappings (lower left of figure 1) and ‘transplants’ the mapped sound for each burst into a new WAV file.
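The ‘transplant’ step can be pictured as writing each mapped drum sample into an empty output array at the position where its burst started. The Python sketch below is illustrative only. How the BBBVP handles overlapping samples or samples that run past the end of the loop is not stated in this report, so summing and truncating are assumptions, and the function and parameter names are hypothetical.

```python
def transplant(burst_starts, labels, mapping, length):
    """Build the output loop by placing each mapped sample at its burst start.

    burst_starts: start index of each detected burst, in samples
    labels:       recognized sound label for each burst
    mapping:      label -> list of drum sample values
    length:       length of the output loop, in samples
    """
    output = [0.0] * length
    for start, label in zip(burst_starts, labels):
        for offset, value in enumerate(mapping[label]):
            if start + offset >= length:
                break                        # assumed: truncate at loop end
            output[start + offset] += value  # assumed: overlaps are summed
    return output
```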

3.1.3.Play, Loop and the Dry Wet Mix Slider

The status window lets the user know when the processing is complete. Now the loop is ready to be heard! The user has a dry/wet mix slider for auditioning the processed loop. If the slider is all the way dry, only the original voiced beat box is heard when the ‘Play’ button is pressed. With the slider all the way wet, only the transformed beat is played. Intermediate positions blend the two, which can produce an interesting variation on the drum loop. Playback can also be looped indefinitely with the ‘Loop’ radio button.
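A dry/wet control is commonly implemented as a linear blend of the two signals. The Python sketch below assumes that approach; the report does not say which mixing law the BBBVP uses, and the function name is hypothetical.

```python
def dry_wet_mix(dry, wet, mix):
    """Blend the original (dry) and processed (wet) signals.

    mix = 0.0 plays only the dry signal, mix = 1.0 only the wet signal.
    The shorter signal is padded with silence.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    n = max(len(dry), len(wet))
    dry = list(dry) + [0.0] * (n - len(dry))
    wet = list(wet) + [0.0] * (n - len(wet))
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]
```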

3.1.4.Save Beat

The processed beat can be saved to a WAV file for later use in another application such as Pro Tools, or Recycle by Propellerheads. Recycle is a loop-making program that can be used to split beats up into individual slices for use in a hardware or software sampler. It also converts WAV file loops into the REX format used by Propellerheads’ Reason software.

3.2.Play Click Track

To help the user stay on beat, or to create a beat with a precise tempo, a click track can be generated. Using the topmost slider (see Fig. 1), the user can adjust the number of beats per minute (bpm). When the click track is played, the software outputs a short beep at the selected bpm rate. The click track continues as long as the toggle button is depressed. While recording, it is recommended that the click track be played through headphones to avoid interfering with the microphone input.
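The timing of the click track follows from the bpm setting: one beat lasts 60/bpm seconds, or 60 * fs / bpm samples at sample rate fs. The Python sketch below generates such a track. The beep frequency and beep length are assumed values chosen for illustration, and the function names are hypothetical.

```python
import math

def beat_positions(bpm, duration_s, fs=44100):
    """Sample index at which each beat starts within the given duration."""
    samples_per_beat = 60.0 * fs / bpm
    n_total = int(duration_s * fs)
    positions = []
    beat = 0
    while int(round(beat * samples_per_beat)) < n_total:
        positions.append(int(round(beat * samples_per_beat)))
        beat += 1
    return positions

def click_track(bpm, duration_s, fs=44100, beep_hz=1000.0, beep_s=0.02):
    """Silence with a short sine beep written at the start of every beat."""
    n_total = int(duration_s * fs)
    beep_len = int(beep_s * fs)
    track = [0.0] * n_total
    for start in beat_positions(bpm, duration_s, fs):
        for n in range(min(beep_len, n_total - start)):
            track[start + n] = math.sin(2.0 * math.pi * beep_hz * n / fs)
    return track
```

For example, at 120 bpm each beat lasts 0.5 seconds, so a two-second track contains four beeps.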

The quantization check box, in the record area, can be used to line up a recorded beat or to modify the current beat’s tempo. Unintentional variations in the rhythm are shifted in the output to match the current bpm setting. Quantization must occur during the initial processing of the beat. This feature is still in the works.
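Since quantization is not yet implemented, the following Python sketch shows only one plausible approach: snapping each burst start to the nearest point on a timing grid derived from the bpm setting. The grid resolution (subdivisions per beat) is an assumed parameter, and the function name is hypothetical.

```python
def quantize_onsets(onsets, bpm, fs=44100, subdivisions=4):
    """Snap each onset (in samples) to the nearest grid point.

    subdivisions is the assumed number of grid steps per beat,
    for example 4 for sixteenth notes when the beat is a quarter note.
    """
    step = 60.0 * fs / bpm / subdivisions   # grid spacing in samples
    return [int(round(round(onset / step) * step)) for onset in onsets]
```

The quantized positions would then replace the detected burst starts when the drum samples are placed in the output, which is consistent with quantization happening during the initial processing of the beat.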

3.3.Training

To identify sounds in the voiced beat, a template for each unique sound must be established by the software. The training area (top right of figure 1) is used to configure the types of sounds identifiable in each recorded beat.

Upon selecting a sound number from the drop-down list and clicking the ‘Train’ button, the software waits for four repetitions of the sound at the microphone input. For successful training, the samples must be consistent with each other, as well as distinguishable from other sounds in the template database. The repetitions ensure the user is capable of accurately duplicating a sound, and they create an averaged representation of the sound for the database. Text messages indicate the status of the training procedure and whether the new sound is accepted.
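The averaging and consistency check can be sketched as follows. This Python illustration averages the feature arrays of the repetitions and rejects the training attempt if any repetition lies too far from the average. The distance measure and the max_spread limit are assumptions rather than the BBBVP's actual acceptance rule, and the check against other sounds in the database is not shown.

```python
import math

def train_template(repetitions, max_spread=0.25):
    """Average the feature arrays of several takes of the same sound.

    Returns the averaged template, or None if any take differs from the
    average by more than max_spread relative to the template's magnitude.
    """
    n = len(repetitions)
    length = len(repetitions[0])
    template = [sum(rep[i] for rep in repetitions) / n for i in range(length)]

    norm = math.sqrt(sum(t * t for t in template)) or 1.0
    for rep in repetitions:
        dist = math.sqrt(sum((x - t) ** 2 for x, t in zip(rep, template)))
        if dist / norm > max_spread:
            return None                  # takes are not consistent enough
    return template
```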