Vocal Mirror

Electronic Voice Analysis for Therapeutic Diagnosis and Rehabilitation

BME 273 Final Report

04/23/2002

Group 21: Joe Owens-Ream

Advisor: Dr. Tom Cleveland

Abstract

Many professional singers and speakers develop vocal problems through misuse of their own speaking or singing voice. This unnecessary wear can lead to disorders such as vocal nodules and polyps. The goal of this project is to apply existing technology to the problem: the Fast Fourier Transform (FFT) and basic audio technology are all that is needed, together with a sound card and the LabVIEW program. In LabVIEW, I created a virtual instrument (VI) to help retrain these ailing professional voices. The retraining works on several levels, each involving a different part of the device. From a live signal or a recording, the device extracts data about the frequency and amplitude usage of the speaker's or singer's voice. The system analyzes this data, outputs information on the pattern of voice usage (pitch and volume range), and is designed to give real-time suggestions in the form of aural cues. Each cue is pitched to tell the patient how they have left their desired range: too loud or too soft, too high or too low. The data can then be used to diagnose or characterize speaking and singing problems and tendencies. Following the standard method of creation in the LabVIEW environment, the VI was built by merging a number of working sub-VIs; LabVIEW was chosen because of its large existing knowledge base. Merged, these sub-VIs form a program that performs most of the functions specified at the outset. Because current PDAs cannot run the program, it is not yet portable, so the logging and visualization portions were completed and the aural-output portion was not. The adjustments still needed are minor, and the sub-VI for audio output would be a simple addition.

Introduction

The main motivation for this project is the vocal problems of people in the professional singing and speaking world. Many professional singers and speakers develop vocal problems through misuse of their own speaking or singing voice. This unnecessary wear can lead to vocal nodules, vocal polyps, cysts, other voice disorders of unknown etiology, and eventually loss of voice. By retraining patients to use their voices correctly, a speech pathologist can save their voices. This involves giving patients self-knowledge of how they use pitch and amplitude. (Dr. Cleveland and Dr. Kurby)

There are many common symptoms of this misuse, including vocal fatigue, increased vocal effort, reduced vocal quality and/or overall volume, a hoarse or raspy voice, voice breaks, and a breathy voice. Singers may additionally show a loss of high notes, unstable pianissimo (quiet) phonation, and increased breathiness throughout the singing range.

The prevalence of voice disorders in children up to age 14 is about 6%. In adulthood it falls to as low as 1%, but rises to 6.5% for those aged 45 to 70. These studies indicate that the adult figures are likely underestimates, because many voice disorders remain untreated or even unnoticed for years. (Leske, 1981; Marge et al., 1985)

Disorders of vocal abuse and misuse are the most prevalent and most preventable types of voice disorder. The recurrence rate of vocal nodules ranges from 15% to 35%. Approximately 25% of the total working population in the United States have jobs that critically require voice use, and 3% of the population have occupations in which their voice is necessary for public safety. (Ramig & Verdolini, 1998)

Some background on the voice is useful here. Sound is produced by vibrations of the vocal folds as air passes between them. These vibrations cause wear, so misuse can lead to a variety of vocal problems, including vocal nodules and cysts. Such wear often goes unnoticed by the patient until the damage has been done. Speaking away from one's optimal pitch, usually too low or in the vocal fry range, can cause this wear. (Dr. Cleveland) The damage is most frequently done at the extremes of volume and pitch. (Dr. Cleveland)

The current system of treatment has two problems. First, patients spend only limited time in the voice clinic, and outside it they receive no feedback because no speech pathologist is present. I call this the “no take-home version” problem: patients have no way to get feedback on their voice misuse once they leave the clinic. Second, both in and out of the clinic, there is no visual or aural representation of the problems associated with each patient's own misuse. (Dr. Cleveland)

Methodology

To address the problems stated in the Introduction, I designed the following solution. From a live signal or a recording, the device extracts data about the frequency and amplitude usage of the speaker's or singer's voice. The system analyzes this data, outputs information on the pattern of voice usage, and provides real-time suggestions for use. The data can then be used to diagnose or characterize speaking and singing problems and tendencies. The data-logging function will compute statistics on the range and use of the voice in terms of volume and frequency. The real-time feedback will take the form of an aural stimulus, pitched to tell the patient how they have left their desired range: too loud or too soft, too high or too low.
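As a minimal sketch of the real-time feedback rule, assuming the pitch and volume of each short analysis frame have already been measured, the classification into the four excursion types and the choice of a cue tone might look like the Python below. The function names, the cue frequencies, and the rule that unvoiced frames produce no cue are illustrative assumptions, not part of the LabVIEW VI.

```python
import math

# Hypothetical cue frequencies in Hz. The report says only that each cue is
# pitched to indicate the type of excursion; these values are placeholders.
CUE_TONES_HZ = {
    "too_high": 1320.0,
    "too_low": 220.0,
    "too_loud": 880.0,
    "too_soft": 440.0,
}


def classify_frame(f0_hz, level_db, pitch_range, volume_range):
    """Return the excursion labels for one analysis frame.

    f0_hz is None when no voiced pitch was detected; such frames are treated
    as silence and produce no cue (an assumption, so pauses are not flagged
    as "too soft"). pitch_range and volume_range are (low, high) limits.
    """
    if f0_hz is None:
        return []
    low_hz, high_hz = pitch_range
    low_db, high_db = volume_range
    labels = []
    if f0_hz > high_hz:
        labels.append("too_high")
    elif f0_hz < low_hz:
        labels.append("too_low")
    if level_db > high_db:
        labels.append("too_loud")
    elif level_db < low_db:
        labels.append("too_soft")
    return labels


def cue_samples(label, duration_s=0.2, sample_rate=8000, amplitude=0.3):
    """Synthesize a short sine-wave cue (floats in [-amplitude, amplitude])."""
    freq = CUE_TONES_HZ[label]
    n = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n)]
```

For example, with a pitch range of 100 to 250 Hz and a volume range of -40 to -10 dB, `classify_frame(300.0, -20.0, (100.0, 250.0), (-40.0, -10.0))` returns `["too_high"]`, and `cue_samples("too_high")` would then be sent to the headphones.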

I implemented this solution in LabVIEW, incorporating its built-in signal analysis suite to minimize programming work. The resulting VI addresses every part of the solution except the real-time output of audio signals, which is left for a future portable version. One possible mode of application is to export the VI to multiple computers through LabVIEW's Application Builder; the resulting application would run on any PC or Mac (ideally a laptop) with a sound card.

The VI inputs the signal and performs a Fast Fourier Transform (FFT) on the data. The FFT is the standard windowed transform found in physics textbooks, applied here in real time. First, the raw time-domain data is analyzed for amplitude, and the first set of limit-loggers is applied to this amplitude information. Next, the FFT is applied and the result is passed to a VI that determines and outputs the fundamental frequency of the signal. This fundamental frequency is the pitch of the voice at that moment. The pitch data is then fed into its own limit-logger. The log file saves data on excursions outside the ideal pitch and volume range. This VI is complete and working, largely thanks to LabVIEW's strength in this area.
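The per-frame analysis described above can be sketched in Python as follows. This is not the LabVIEW VI: it assumes mono samples scaled to [-1, 1], a power-of-two frame length, a Hann window, and a hypothetical -60 dBFS silence threshold, and it estimates pitch simply as the largest spectral peak between 60 and 1000 Hz, refined by parabolic interpolation. That simplification can lock onto a harmonic instead of the true fundamental (an octave error), which a dedicated fundamental-frequency routine is designed to avoid.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def analyze_frame(frame, sample_rate, fmin=60.0, fmax=1000.0, silence_db=-60.0):
    """Return (f0_hz or None, level_dbfs) for one frame of samples in [-1, 1]."""
    n = len(frame)
    # Amplitude: RMS of the raw time-domain samples, in dB re full scale.
    rms = math.sqrt(sum(s * s for s in frame) / n)
    level_db = 20 * math.log10(max(rms, 1e-12))
    if level_db < silence_db:
        return None, level_db  # treated as silence: no pitch estimate
    # Pitch: Hann-window the frame, FFT it, find the largest peak in the voice band.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    mags = [abs(c) for c in fft([s * w for s, w in zip(frame, window)])]
    bin_hz = sample_rate / n
    lo = max(1, int(fmin / bin_hz))
    hi = min(n // 2 - 1, int(fmax / bin_hz))
    k = max(range(lo, hi + 1), key=lambda i: mags[i])
    # Parabolic interpolation through the peak bin and its neighbours.
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return (k + offset) * bin_hz, level_db
```

For example, a 220 Hz sine of amplitude 0.5, sampled at 8000 Hz in a 1024-sample frame, gives a pitch estimate near 220 Hz and a level near -9 dBFS. The level and pitch values are exactly what the two sets of limit-loggers would compare against their limits.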

The visualization section of the VI converts the fundamental frequency (Hz) and amplitude (dB) data into pitch (note letter names) and volume (relative dB) scales. Pitch will be shown as a key on a piano keyboard, and volume relative to a meter of voice pictures showing shouting, speaking, and whispering. This gives patients a visual representation of where their voice is, a kind of self-knowledge that is often severely lacking. (Dr. Cleveland)
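A sketch of the two conversions the visualization needs, assuming equal temperament with A4 = 440 Hz and MIDI note numbering (middle C = C4). The volume thresholds that separate the whispering, speaking, and shouting pictures are hypothetical placeholders; in practice they would depend on microphone calibration and would be chosen by the clinician. Function names are also illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(freq_hz, a4_hz=440.0):
    """Return (nearest equal-tempered note name, offset in cents).

    Uses MIDI numbering, where A4 = 69 and middle C (C4) = 60.
    """
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    cents = 100 * (midi_float - midi)
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1), cents


def volume_picture(level_db, whisper_below_db=-35.0, shout_from_db=-10.0):
    """Choose the meter picture for a level in dB (thresholds are hypothetical)."""
    if level_db < whisper_below_db:
        return "whispering"
    if level_db >= shout_from_db:
        return "shouting"
    return "speaking"
```

For example, `hz_to_note(261.63)` names the key C4 with an offset of about 0 cents, so the piano display would highlight middle C; the cents value could drive a finer "sharp/flat" indicator.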

Results

The equipment costs were minimal. A microphone cost $20 to $40. The A/D board I used was already installed in the computer running LabVIEW, and the LabVIEW software in the BME computer lab was free to use. In total, the projected final cost was minimal, essentially that of the microphone.

See the appendix for pictures of the VI programs and images of their application. The logging function recorded sound clips of volume and pitch infractions, but it does not yet perform any statistical analysis of the duration, frequency, or other totals of the voice data. Thus there is no quantitative data to report. Future data will come from experimental use of the system to retrain voices, and the system should also perform a range of statistical tests on the voice data.
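Since the logging function does not yet compute statistics, the Python sketch below suggests one possible starting point. It assumes each analysis frame has already been labelled with its excursion types (an empty list meaning "within range") and reports, for each label, the number of separate excursions, their total duration, and the percentage of time spent out of range. The function name and label strings are hypothetical.

```python
from collections import Counter


def excursion_stats(frame_labels, frame_s):
    """Summarize per-frame excursion labels.

    frame_labels: one list per analysis frame, e.g. ["too_loud"], or [] when the
    frame was within range. frame_s: length of one frame in seconds.
    Returns {label: {"events", "seconds", "percent_of_time"}}, where an event is
    a run of consecutive frames carrying that label.
    """
    frames = Counter()
    events = Counter()
    previous = set()
    for labels in frame_labels:
        current = set(labels)
        frames.update(current)
        # A label present now but absent in the previous frame starts a new event.
        events.update(current - previous)
        previous = current
    total = len(frame_labels)
    return {
        label: {
            "events": events[label],
            "seconds": count * frame_s,
            "percent_of_time": 100.0 * count / total,
        }
        for label, count in frames.items()
    }
```

Counting events separately from total time distinguishes a patient who shouts once for a long stretch from one who briefly overshoots many times, which a clinician may interpret differently.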

Other work included researching the physics of sound waves and the physiology of the voice and of vocal disorders. Through consultation with Dr. Cleveland, I narrowed the design definition to include logging and visual output options. I also searched for overlap with existing devices or software and found none. The current market offers many speech-analysis products for applications such as voice recognition for security and as a typing replacement, but I found no application of current signal analysis technology to speech pathology, an area that has received little attention from engineers.

My other work included evaluating LabVIEW's signal analysis options and built-in VIs. I spent most of my time working in LabVIEW to create a program that could accomplish all of the desired functions.

Conclusions

In this project the design specifications were divided into three areas, each a separate part of the master VI: logging, visualization, and aural output. Of these, aural output depended on having a portable version, which is not possible with the current setup of LabVIEW and the current generation of PDAs. The next generation of PDAs is expected to be able to run an executable built with LabVIEW's Application Builder. Some current very high-end Windows CE devices may already have this ability, but they would be a rather expensive solution. Because this portable subset lies further in the future, its sub-VI was not addressed this semester; future work would concentrate on it.

Of the two other sub-VIs, the logging functions were completely successful. Logging can be performed in real time or on a taped signal, and it takes sub-recordings of each excursion beyond the set limits. The limits can be set at any frequency or amplitude. The only thing lacking is a unit conversion for setting the logging parameters. In a future version of the VI, the pitch limits would be set by placing two markers on a piano-keyboard scale, and the amplitude limits by marking two ticks on a decibel scale. The current system must be set in hertz and volts. The volt setting is a little tricky but can easily be set by relative comparison, and the frequency setting is a simple conversion. This VI works and is virtually complete.
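The frequency conversion and a decibel conversion for the volt setting could be sketched as follows, so that limits chosen as piano keys and dB ticks can be turned into the hertz and volt values the current VI expects. Equal temperament with A4 = 440 Hz is assumed, and the reference voltage for the decibel scale is an assumption: it could be, for example, the voltage measured while the patient speaks at a comfortable level, in the spirit of the relative comparison mentioned in the text. Function names are hypothetical.

```python
import math

# Semitone offsets of the natural notes from A within the same octave number.
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_to_hz(note, a4_hz=440.0):
    """Convert a note name such as 'A4', 'C#3' or 'Bb2' to equal-tempered Hz."""
    letter, rest = note[0].upper(), note[1:]
    accidental = 0
    if rest and rest[0] in "#b":
        accidental = 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    semitones = NOTE_OFFSETS[letter] + accidental + 12 * (octave - 4)
    return a4_hz * 2 ** (semitones / 12)


def volts_to_db(volts, ref_volts):
    """Express a voltage limit in dB relative to a reference voltage."""
    return 20 * math.log10(volts / ref_volts)


def db_to_volts(db, ref_volts):
    """Inverse of volts_to_db: the voltage limit for a level in dB."""
    return ref_volts * 10 ** (db / 20)
```

For instance, a pitch window from A3 to A4 becomes limits of 220 Hz and 440 Hz, and a volume ceiling 6 dB below the reference becomes about 0.50 times the reference voltage.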

The sub-VI that inputs and records signals, not mentioned until now, came with LabVIEW. The last sub-VI is the visualization VI, which works in real time. The only remaining work is converting the frequency data into pitch labels. The amplitude data is already converted to decibels, but the meter with pictures is still needed for the patient to fully understand it.

Apart from these refinements, the VI was a success, largely owing to LabVIEW's capabilities. The program works, and I have tested it on my own voice.

Recommendations

The modifications and improvements mentioned earlier would greatly increase the efficacy of the program and device. In the short term, continued testing of the device and VI is needed, as well as clinical trials. Work could also be done with the Application Builder to create a better, more universal application. Converting from a laptop-sized computer to a PDA would make this a truly useful product, because that is where its real value to the patient lies. All of the system's other functions would mainly aid a clinician in the office and would be somewhat superfluous given the clinician's own demonstrations. The “take-home version” is where my original idea came from, and it is where I see the device making an impact on people’s lives.

There are no ethical issues to be considered in my project. The voice is an often-used tool and is a part of all of our lives. The issue concerning this project is the maintenance and preservation of that natural tool.

The last improvement would be to design analysis sequences that output statistics or live audio information. These statistics would be tailored to the speech pathologist, based on the information they consider useful in diagnosing a patient’s problems.

References:

The main sources were Dr. Thomas Cleveland and, to a lesser extent, Dr. Melissa Kurby; other sources are listed below.

Leske, 1981 and Marge, et al., 1985.

American Speech-Language-Hearing Association Ad Hoc Committee on Service Delivery in the Schools. (1993, March). Definitions of communication disorders and variations. Asha, 35 (Suppl.10), 40-41.

Ramig, L.O., & Verdolini, K. (1998, February). Treatment efficacy: Voice disorders. Journal of Speech, Language, and Hearing Research, 41, S101-S116.

Pannbacker, M. (1999, August). Treatment of vocal nodules: Options and outcomes. American Journal of Speech-Language Pathology, 8, 201-208.

National Institute on Deafness and Other Communication Disorders. (1999, May). Disorders of vocal abuse and misuse (NIH Pub. No. 99-4375). Bethesda, MD: Author.

Appendices:


Ideation Process

Innovation Situation Questionnaire

1. Brief description of the problem

Professional singers and speakers misuse their voices in a harmful but correctable way. Little technology is currently applied to correcting this.

2. Information about the system

2.1 System name

Vocal re-trainer for professional voice users who misuse their voices.

2.2 System structure

The structure will be a user interface that inputs audio signals and outputs audio and visual signals.

2.3 Functioning of the system

The system will input an audio signal and perform an FFT on it to determine the fundamental frequency and the amplitude. This data will be logged and displayed visually, and whenever a limit is exceeded an audio cue will be output. On receiving these cues, both visual and aural, the patient will refine their vocal use patterns.
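The sub-recording step of the limit-logger could be sketched as below, assuming non-overlapping analysis frames that have each already been flagged as in or out of range. Runs of consecutive flagged frames become half-open sample ranges, optionally padded with context frames, and overlapping padded clips are merged; each range can then be used to slice the recorded samples into a clip. The function and parameter names are hypothetical.

```python
def excursion_segments(out_of_range, frame_len, pad_frames=0):
    """Convert per-frame out-of-range flags into sample ranges for clips.

    out_of_range: list of booleans, one per non-overlapping analysis frame.
    frame_len: samples per frame. pad_frames: context frames kept on each side.
    Returns half-open (start, end) sample ranges; slice the recording with
    samples[start:end] to obtain each sub-recording.
    """
    n = len(out_of_range)
    segments = []
    start = None
    # A trailing False sentinel closes an excursion that runs to the end.
    for i, flagged in enumerate(list(out_of_range) + [False]):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            seg = (max(0, start - pad_frames) * frame_len,
                   min(n, i + pad_frames) * frame_len)
            if segments and seg[0] <= segments[-1][1]:
                # Padding made this clip overlap the previous one: merge them.
                segments[-1] = (segments[-1][0], seg[1])
            else:
                segments.append(seg)
            start = None
    return segments
```

Padding by a frame or two keeps the lead-in to each excursion in the clip, which may help the clinician hear what the patient was doing just before crossing the limit.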

2.4 System environment

The environment is twofold. The voice clinician's office is the ideal place to use the system's logging function, so that the usage statistics can be interpreted and used to tune the system. Once tuned to the patient's optimal voice range, the system will be ready to give feedback in the patient's daily life, beginning with home use and, if the problem persists, extending to work or other settings. The device would be unobtrusive because it would be small and handheld.

3. Information about the problem situation

3.1 Problem that should be resolved

People misuse their voices and physically damage the instrument their livelihood depends on.

3.2 Mechanism causing the problem

Through poor technique and bad speaking or singing habits, they physically damage their vocal cords. Other damage occurs through talking or singing too much and too often, lack of sleep, lack of proper nutrition, general poor health habits, and several diseases.

3.3 Undesired consequences of unresolved problem

Damage or complete loss of the voice for either speaking or singing or both.

3.4 History of the problem

The prolonged misuse of the voice is the cause of the problems.

3.5 Other systems in which a similar problem exists

Injury to the vocal cords, which contain muscle, can be likened to any other sports injury: overuse and misuse damage the muscles.

3.6 Other problems to be solved

Retraining and surgery are often the solution, as is complete abstention from voice use.

4. Ideal vision of solution

The goal of the device is to extract, from a live signal or a recording, data about the frequency and amplitude usage of the speaker's or singer's voice. The system will analyze this data, output information on the pattern of voice usage, and provide real-time suggestions for use. The data can then be used to diagnose or characterize speaking and singing problems and tendencies. The data-logging function will compute statistics on the range and use of the voice. The real-time feedback will take the form of an aural stimulus, pitched to tell the patient how they have left their desired range: too loud or too soft, too high or too low. The system will also be completely portable and unobtrusive.

5. Available resources

Readily available resources: the LabVIEW program and its embedded signal analysis suite; a microphone and headphones.

Derived resources: LabVIEW Application Builder, and the vocal clinic subjects.

6. Allowable changes to the system