Development of a minimum protocol for assessment in the paediatric voice clinic. Part 1: evaluating vocal function.
Dr Wendy Cohen and Dr Elspeth McCartney
Speech and Language Therapy
School of Psychological Sciences and Health
University of Strathclyde
76 Southbrae Drive
Glasgow
G13 1PP
Email:
Tel: + 44 141 950 3450
Mr Haytham Kubba and Mr David Wynne
ENT Department
Royal Hospital for Sick Children
Dalnair Street
Glasgow
G3 8SJ
Abstract
The European Laryngological Society (ELS) recommends that functional assessment of voice disorder in adults should evaluate a number of different parameters. The current paper discusses four of the five parameters highlighted in the ELS protocol: perceptual evaluation of voice; videostroboscopic examination; evaluation of aerodynamic performance in voice; and acoustic analysis. Subjective rating of voice in children is explored in a companion paper. These parameters have been extensively evaluated in adults, and a review of the literature pertaining to the paediatric population is presented.
Introduction
In the UK, children are normally referred to hospital ear, nose and throat (ENT) clinics by a general practitioner (GP) seeking assessment. Prevalence of voice disorders in children in the UK is estimated at 6%[1]. Onward referrals for voice therapy are usually made to community speech and language therapy (SLT) services. All such child services are paid for by the UK National Health Service (NHS). Clinicians require a battery of efficient and replicable measures that evidence current status, are useful to community Speech and Language Therapists (SLTs) and that may be repeated over time to measure outcomes. It is also important that any assessment protocol is not burdensome to a child. Within the only paediatric ENT voice clinic in Scotland, appointments generally last 25 minutes to undertake videostroboscopic imaging and a case history, and do not include an SLT or further voice assessment.
The five European Laryngological Society (ELS) recommended parameters are to be taken alongside a full ENT laryngeal examination and a case history. They are perceptual evaluation of voice, videostroboscopic imaging of vocal fold movement, acoustic analysis of specific voicing aspects, aerodynamic support for voicing and a subjective rating of voice impact[2]. Each parameter illuminates different aspects of functioning. Further, research and meta-analyses of voice therapies require a minimum data-set of replicable and standardised measures, to compare outcomes across centres and clients. However, little consideration has been given to the development of an appropriate assessment protocol for children, to which measurements need to be taken, and by whom.
Whereas these five parameters may be assessed by a specialist SLT in the UK, initial videostroboscopy evaluation is usually undertaken by an ENT surgeon. The first four parameters assess vocal function and the fifth provides information relating to subjective evaluation of the impact of the symptoms of voice disorder on activity/participation and quality of life (QOL). This paper is concerned with the first four parameters; that is, the assessment of vocal function in children. A review of the evidence supporting each of the four ELS parameters in relation to functional evaluation of voice disorder, and of how this can be considered for children, is presented. A companion paper[3] explores the application of tools to establish subjective evaluation of the impact of voice disorder on children’s activity/participation and QOL.
A multidimensional approach to the evaluation of voice makes it possible to compare, contrast and correlate each component in order to direct the clinical team towards accurate diagnosis and intervention. The notion that functional voice evaluation requires the use of several assessment layers is not new and, as Kent[4] summarises, “a comprehensive assessment of speech function depends upon a balance of physical and perceptual analyses. Exclusive reliance on either one alone may limit the understanding of speech impairments” (p.6).
However, it is not yet known if separate investigation of all voice function parameters is in fact necessary. Studies that have compared perceptual and acoustic assessment of voice have found some levels of correlation. Some studies involve assessment and analysis of routine clinical measurement activity[5] [6]; others incorporate more complex analyses and/or introduce different levels of skill in the listeners’ judgments of perception[7] [8]. Most such investigations have, however, involved adults with voice disorder, and Sataloff[9] points out that protocols for the multidimensional evaluation of voice have been applied more consistently to adults than to children. Further investigation comparing correlations amongst the ELS assessment parameters in children is required and this is being proposed by the current authors.
Broad considerations when evaluating paediatric voice
Before puberty, there are a number of differences between the paediatric and adult larynx that do not necessarily relate simply to gross anatomical differences such as size and rigidity of the larynx. There are known histological differences between the child and the adult larynx relating to the development of the lamina propria at about 7 years of age[10], while less well understood biochemical differences in the developing larynx may need to be taken into consideration when evaluating the resultant vocal output[11].
Phonation relies on several biomechanical and neurological principles[12] and voice is not the by-product of laryngeal movement alone. Adequate voicing requires suitable respiratory, articulatory and resonatory systems to power the vibration of the vocal folds and modulate the resulting acoustic output into the various sound patterns of the spoken language. Normal child development encompasses changes in these systems, and the timing and progress of this development need to match laryngeal development to create a competent and efficient vocal mechanism.
It is therefore important to be cognisant of these factors when reviewing the literature on functional assessment of paediatric voice disorder against the ELS recommendations.
The ELS Vocal Function Parameters
Perceptual evaluation
One of the most important methods of evaluating voice relates to how the listener perceives the voice. Judgement of the severity of a voice disorder is important so that voice therapy goals for the client with dysphonia can be matched to perceived severity. There are a variety of formats for describing the nature and features of dysphonia that use descriptors such as “breathy”, “hoarse” and “harsh”, assessed by grading scales.
Two scales are currently in common clinical use: the four-point Likert rating scale (0 - 3) proposed by Hirano[13], known as “GRBAS”, and the CAPE-V[14], which uses a 100-point visual analogue scale. Each provides a method for perceptual evaluation of voice quality. Both use the point 0 as a referent to “normal” voice.
The GRBAS has five ratings: grade (the overall degree of hoarseness), roughness (a psycho-acoustic impression of irregularity of vocal fold vibrations), breathiness (psycho-acoustic impression of the extent of air leakage through the glottis), asthenia (weakness or lack of power in the voice, related to intensity) and strain (psycho-acoustic impression of a hyperfunctional state of phonation)[7]. CAPE-V incorporates scales relating to overall severity (a global, integrated impression of voice deviance), roughness (perceived irregularity in the voicing source), breathiness (audible air escape in the voice), strain (perception of excessive vocal effort / hyperfunction), pitch (perceptual correlate of fundamental frequency (F0)) and loudness (perceptual correlate of intensity).
While GRBAS ratings can be made on any speech sample and clinical practice incorporates comparison of spontaneous connected speech with a reading sample, CAPE-V requires production of two sustained vowels (/a/ and /i/), a sample of connected conversational speech, and sentences of varying design. Each of these sentences has an emphasis on one laryngeal aspect: every English vowel; easy voicing onsets (e.g. /h/); hard glottal attacks; nasal sounds; plosive sounds. Simulations and practice, including some child voice disorder examples, can be accessed at http://engage.doit.wisc.edu/sims_games/showcase/speechpathology/index.html.
The ELS recommendation is for a minimum evaluation of grade, breathiness and roughness[1]. Although descriptive terms vary between GRBAS and CAPE-V, both meet ELS minimum recommendations.
Carding and colleagues[15] suggest that a clinically valid perceptual rating scheme requires to be theoretically sound, internationally acceptable and have proven reliability and that the GRBAS scale is the minimum level of perceptual analysis that SLTs should be undertaking. The CAPE-V promises a more refined assessment, due to the longer scale, and continues to be widely researched by the international SLT profession. Studies have endeavoured to measure rater reliability of CAPE-V, incorporating levels of training to support voice quality definitions in adults with voice disorder. Where auditory ‘anchor’ examples of representative disorders are given to inexperienced SLT students[16] and experienced SLTs [17], inter-rater reliability is strongly affected by training regardless of level of experience prior to rating.
There are few investigations of either GRBAS or CAPE-V where the speech samples have come from children with voice disorder, but Kelchner and colleagues[18] obtained audio samples from 50 children with a history of airway conditions who had undergone laryngo-tracheal reconstructive surgery. The speech samples were rated by three experienced SLTs, and a high level of agreement was found for four of the six CAPE-V scales (overall severity, roughness, breathiness and pitch). The authors, however, recognise the challenges in collecting the full range of tasks within the CAPE-V protocol from children with voice disorder. For this reason, clinical assessment of children may utilise a screening tool such as the Quick Screen for Voice (QSV)[19], where aspects of voice quality such as “rough or hoarse”, “breathy” and “vocal strain and effort” are rated as present or absent through a checklist. This is a screening tool with a relatively low specificity (58%)[20]; however, the tasks within the QSV may be more appropriate for children than the full CAPE-V protocol.
The high level of inter-rater agreement demonstrated by Kelchner and colleagues[18] lends support to the ELS recommendation that perceptual evaluation of voice should include a minimum evaluation of grade (or overall severity), breathiness and roughness. However, some consideration remains to be given to the length and type of speech samples that children, particularly young children, are able to provide.
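The specificity figure quoted above for the QSV can be read as the proportion of children without a voice disorder whom the screen correctly passes. A minimal sketch of that calculation follows; the function name and the counts are hypothetical illustrations chosen only to reproduce a value of 58%, and are not data from the cited study.

```python
def specificity(true_negatives, false_positives):
    """Proportion of unaffected cases that a screening tool correctly passes."""
    total_unaffected = true_negatives + false_positives
    if total_unaffected == 0:
        raise ValueError("at least one unaffected case is required")
    return true_negatives / total_unaffected


# Hypothetical counts (NOT study data): of 100 children without a voice
# disorder, 58 pass the screen and 42 are flagged incorrectly.
example_specificity = specificity(true_negatives=58, false_positives=42)
print(f"Specificity: {example_specificity:.0%}")
```

On this reading, a low specificity means that a sizeable share of children with normal voice would be flagged for further assessment.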
Videostroboscopy
Videostroboscopy is well-established in adult voice practice but has yet to gain wide acceptance in children. Although infants can be restrained for laryngoscopy with ease, and teenagers may be persuadable, there is a perception that it is not possible to perform awake examination of the larynx in young children as they will not cooperate. In fact, the limiting factor seems to be the experience and motivation of the endoscopist in dealing with children.
There is a high success rate reported in the literature using rigid endoscopy in children[21], though rigid endoscopy may be more practical in children over 10 years of age[22]. Several paediatric specialists have also reported consistently successful awake transnasal fibreoptic laryngoscopy for voice problems across all age groups including pre-school children [23] [24] [25] [26], particularly when detailed and careful explanation is given prior to undertaking evaluation[27]. Equipment also plays a part in this success: newer videoendoscopes are being manufactured with smaller tip diameters, which allow for excellent visualisation of the glottis in younger children. In the clinical experience of one author, the use of fine (2.3mm) fibreoptic endoscopes combined with a High Definition camera and recording stack allows visualisation in children who do not tolerate videoendoscopy. In the past, such endoscopes would have been inadequate for stroboscopy, but developments in optics and image processing have made effective stroboscopy via the flexible endoscope a reality. Image quality is now excellent, leading to much-improved diagnosis in the outpatient setting and a significant reduction in the need for laryngeal examination under general anaesthetic. The combined assessment by ENT and SLT together is the next logical step, providing the same level of service for children as is now routinely provided for adults.
While the feasibility of outpatient diagnosis has been established, the ideal dataset to record is yet to be established. The range of diagnoses seen is very different to that in adults[28]. Medical diagnoses such as reflux laryngitis, vocal fold paralysis and mucus retention cysts should all be relatively easy to document. Voice disorders in children after laryngeal surgery for airway obstruction or after laryngeal intubation injury are well documented [29] [30] [31] [32]. Advances in reconstructive surgeries to reduce airway obstruction or to minimise the impact on vocal function of intubation injuries are evident, with clinicians becoming increasingly concerned with using techniques that will also enhance outcomes associated with voice quality [33].
Regardless of the underlying diagnosis, it may be helpful to record various features of laryngeal function: glottal configuration, adequacy of closure and extent of opening can all be graded, along with symmetry of amplitude and phase[34], but the reliability and clinical utility of doing so are unknown in children.
Acoustic analysis
The availability and affordability of computers and analysis software have led to their increased use in the routine evaluation of voice disorder, specifically for acoustic measures. There has been discussion of the need for consistency and agreement when recording vocal data, and the algorithms inherent in the various analysis programs can affect the reliability and validity of cross-study comparisons [35] [36].
The ELS minimum acoustic evaluation of voice requires perturbation measurements (cycle to cycle variation in frequency and amplitude, measured using jitter and shimmer) and reference to harmonic-to-noise computations (taking into account the lack of standardised optimal algorithms for evaluating this) from a sustained open vowel such as /a/ at a comfortable pitch and volume. Further evaluation of frequency when producing a sustained /a/ at a louder volume can indicate possible changes in voice quality relating to vocal flexibility. Baken[37] provides a detailed range of various frequency related normative measurements in adults and children to which many authors still refer.
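To make the perturbation measures concrete, the sketch below computes one common "local" form of jitter and shimmer: the mean absolute difference between consecutive cycles, divided by the mean value and expressed as a percentage. This is an illustrative assumption rather than the algorithm of any particular analysis program, and as noted above different programs use different formulations. The function name and the period and amplitude values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    # Absolute difference between each cycle and the one that follows it.
    diffs = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical measurements from consecutive glottal cycles of a sustained /a/.
periods_ms = [4.00, 4.04, 3.98, 4.02, 4.00]       # cycle durations (frequency variation)
peak_amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]  # cycle peak amplitudes

jitter_percent = local_perturbation(periods_ms)        # frequency perturbation
shimmer_percent = local_perturbation(peak_amplitudes)  # amplitude perturbation
print(f"jitter {jitter_percent:.2f}%  shimmer {shimmer_percent:.2f}%")
```

The same function serves both measures because jitter and shimmer differ only in what is measured per cycle (period versus amplitude), not in the arithmetic applied.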
Research has found conflicting evidence relating to the effect of gender, age and height on acoustic measures of vocal function in children with normal voice. It has been thought that anatomical changes associated with height may impact on frequency perturbation, and this has been found in one study of children aged 7-15 years [38] but not by others where the age range is greater (4-18 years) [39] and less (6-12 years) [40]. Aside from age differences, the acoustic measures of vocal function used differ across these studies, and it has already been pointed out that this in itself can make comparison more troublesome.