RUNNING HEAD: Good reliability in atypical speech lateralisation

Measurement reliability of atypical language lateralisation assessed using functional transcranial Doppler ultrasound

Jessica C. Hodgsona, 1 and John M. Hudsona

aSchool of Psychology, University of Lincoln, Lincoln, UK

1Present address: NIHR Hearing Biomedical Research Unit, Nottingham, UK

Corresponding Author

Jessica C Hodgson

NIHR Hearing Biomedical Research Unit

Ropewalk House

Nottingham NG1 5DU

UK

E-mail:

Phone: + 44 (0) 115 823 2636

Abstract

It is well established that some individuals present with atypical, non-left-hemisphere, cerebral lateralisation for language processing. However, previous studies exploring the reliability of functional blood flow responses to detect lateralised activation during speech have focused only on individuals with typical left-sided dominance. Here we report test-retest and between-task reliability measures obtained with functional transcranial Doppler ultrasound in 47 participants, including 9 with atypical language presentation. Results showed good test-retest reliability in atypically lateralised individuals, even after an interval of 120 days. Between-task reliability was weaker, but still within acceptable ranges.

Key words:

Transcranial Doppler

Cerebral blood flow measurement

Speech

Hemispheric Lateralisation

Reliability

Introduction

It is well established that the left cerebral hemisphere is dominant for language processing and production in the majority of people. However, it is also known that some individuals have atypical hemispheric representation for speech processing, such that there is a deviation from the typical pattern of left hemisphere dominance (Knecht et al., 2000a; Deppe et al., 2000). This reduced left-sided bias is observed more frequently in individuals who are left-handed (Knecht et al., 2000b) and in some neurodevelopmental disorders such as Dyslexia (Illingworth and Bishop, 2009), Specific Language Impairment (Bishop et al., 2014) and Developmental Coordination Disorder (Hodgson and Hudson, 2016). However, little is known about why atypical language lateralisation occurs (see Bishop, 2013) and whether such lateralisation profiles are stable across tasks and between measurement sessions. Increased variability in lateralisation indices has been reported in people with atypical language representation (Knecht et al., 2003) as well as in young children (Kohler et al., 2015), and there is a suggestion that laterality profiles of individuals who display a reduced left hemisphere bias may be indicative of distributed cortical activation due to task complexity, rather than of altered language processing (Brownsett et al., 2014).

Here we report on the test-retest and between-task reliability of functional Transcranial Doppler (fTCD) for measuring hemispheric speech lateralisation, with a focus on the measurement reliability in individuals with atypically represented speech. fTCD is an ultrasound technology which uses a 2 MHz pulsed sound wave to insonate through areas of temporal bone in order to detect cerebral blood flow velocity (CBFV). Changes in velocities within the middle cerebral arteries can be examined during various cognitive tasks involving speech and language, motor action, perception and visuospatial processing. Bilateral recording allows simultaneous measurements to be taken from the left and right sides of the head, meaning the methodology plays a useful role in the cognitive neuroscience of hemispheric lateralisation. The advantages of fTCD are that the technology is quick to administer, very affordable (especially compared to other imaging techniques) and portable. fTCD is highly suitable for use with young children, patient groups and others not able to undergo more invasive or intimidating imaging procedures. As a research tool fTCD is becoming increasingly popular, helped by recent advances in the analysis software available (Badcock et al., 2012).

Previous reports indicate good reliability for measures of speech lateralisation using fTCD (Knecht et al., 1998b; Stroobant and Vingerhoets, 2001), but it is less clear whether this is also the case for individuals who display atypical hemispheric lateralisation. Small sample sizes in previous studies on test-retest reliability (10 and 20 subjects respectively) mean it is difficult to draw conclusions about variance levels within atypical dominance, as none of the subjects in these studies had atypical speech representation. In contrast, between-task reliability for speech lateralisation has been more widely assessed using fTCD (Bishop, Watt and Papadatou-Pastou, 2009; Stroobant, Van Boxstael and Vingerhoets, 2011), primarily with a view to ascertaining the reliability of child-friendly paradigms designed to probe speech compared to standard verbal fluency tasks used with adults. However, lateralisation profiles in these studies are often only reported at the group level, again meaning that judgements about individual variability are difficult to make.

Methods

We obtained language lateralisation indices using fTCD imaging during a word generation task (Knecht et al., 2000a) from 47 healthy adult participants (15 males; aged 18-59 yrs, mean age = 23.5 yrs; SD age = 8.4; 18 right-handed and 29 left-handed). Hand preference was determined by responses to a 21-item handedness inventory (Flowers and Hudson, 2013), from which handedness quotients were derived using the following formula: (Right – Left) / (Right + Left) * 100. Scores above 0 denoted right handedness and scores below 0 denoted left handedness. We deliberately targeted left-handed individuals to increase the likelihood of atypical language representation in our sample. The same 47 participants returned to the lab to undergo a second session of fTCD imaging during the same word generation task between 59 and 121 days after session 1 (mean separation was 81 days, SD: 18.2). For 33 of the participants (11 males; mean age = 22.1 yrs; SD age = 5.3; 11 right-handed and 22 left-handed) lateralisation indices from a second speech production paradigm, animation description (Bishop, Watt and Papadatou-Pastou, 2009), were also obtained during session 1, allowing for a within-subjects comparison of task reliability. The reduction in sample size is due to variability in the set-up time between participants, meaning in 15 cases there was not time to run the second speech paradigm. Ethical approval for the work was obtained from the University of Lincoln School of Psychology, and all participants gave informed consent. None had neurological or cerebrovascular disorders, or impairments with language or reading; all had normal or corrected-to-normal vision.
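The handedness quotient formula above can be illustrated with a short sketch. It assumes that "Right" and "Left" are the numbers of inventory items answered with a right-hand and a left-hand preference respectively; the function names are hypothetical, and the handling of a score of exactly 0 is an assumption, since the text only defines scores above and below 0.

```python
def handedness_quotient(right: int, left: int) -> float:
    """Return (Right - Left) / (Right + Left) * 100."""
    total = right + left
    if total == 0:
        raise ValueError("at least one item must show a hand preference")
    return (right - left) / total * 100


def hand_preference(quotient: float) -> str:
    """Scores above 0 denote right handedness, scores below 0 left handedness."""
    if quotient > 0:
        return "right"
    if quotient < 0:
        return "left"
    # Assumption: a score of exactly 0 is not covered by the text.
    return "undetermined"


# Hypothetical respondent: 18 right-hand items and 3 left-hand items out of 21.
example_quotient = handedness_quotient(18, 3)
example_preference = hand_preference(example_quotient)
```

Under these assumptions the quotient ranges from -100 (all items left) to +100 (all items right).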

Speech Paradigms

Word Generation: this task involves participants generating words to a single letter cue. Each trial began with a 5 s period in which participants were prompted to clear their mind. A letter was then presented in the centre of the computer screen for 15 s, during which time participants were required to silently generate as many words as possible that began with the letter displayed. (At the onset of the trial a 500 ms epoch marker was simultaneously sent to the transcranial Doppler). Following the generation phase, to ensure task compliance, participants were requested to report the words aloud within a 5 s period. The trial concluded with a 35 s period of relaxation to allow CBFV to return to baseline before the onset of the next trial. The WG paradigm consisted of 23 trials in total. Letter presentation was randomised and no letter was presented more than once to any given participant. The letters ‘Q’, ‘X’ and ‘Y’ were excluded from the task. Within fTCD ultrasound research word generation has been used extensively (Knecht et al., 2000a, b; Bishop, Watt and Papadatou-Pastou, 2009; Hodgson and Hudson, 2016) and is widely considered to be a reliable paradigm for determining language dominance in this technique (Knecht et al., 1998b).
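As an illustration of the trial structure described above, the following sketch builds a randomised word generation schedule. It is not the authors' presentation script. It assumes that the four phases run back to back (giving a 60 s trial, a figure not stated in the text) and that the 23 trials use each of the 23 letters remaining after excluding Q, X and Y exactly once; all names are hypothetical.

```python
import random
import string

# Phase durations in seconds, taken from the task description.
PHASES = [
    ("clear mind", 5),
    ("silent generation", 15),
    ("report aloud", 5),
    ("rest", 35),
]
EXCLUDED_LETTERS = {"Q", "X", "Y"}


def build_schedule(seed=None):
    """Return a list of (letter, trial_onset_s) pairs, one per trial."""
    letters = [c for c in string.ascii_uppercase if c not in EXCLUDED_LETTERS]
    rng = random.Random(seed)
    rng.shuffle(letters)  # random order; each letter appears once
    # Assumption: phases follow one another with no gaps.
    trial_length = sum(duration for _, duration in PHASES)
    return [(letter, i * trial_length) for i, letter in enumerate(letters)]


schedule = build_schedule(seed=1)
```

Shuffling the full letter list, rather than sampling with replacement, is what guarantees that no letter is shown twice to the same participant.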

Animation Description: this task was developed from the desire to test pre-literate children on speech production tasks (Bishop, Watt and Papadatou-Pastou, 2009), in order to answer questions about the developmental trajectory of hemispheric language lateralisation. The paradigm (described in detail by Bishop, Badcock and Holt, 2010) requires participants to watch a 12 s cartoon in silence, and then to report what they have seen in the clip at the onset of a question mark ‘speak’ prompt. This ‘speak’ phase lasts for 10 s and is followed by a rest phase of 8 s to allow the CBFV signal to return to baseline. The baseline period is taken from the ‘watch’ phase of the paradigm. Each trial lasts 30 s and a total of 20 animation clips are displayed, in a random order generated by a Python-based computer script.

fTCD Analysis

Relative changes in CBFV within the left and right Middle Cerebral Arteries (MCAs) were assessed using bilateral fTCD monitoring from a commercially available system (DWL Doppler-Box™ X: manufacturer, DWL Compumedics Germany GmbH). A 2 MHz transducer probe attached to an adjustable headset was positioned over each temporal acoustic window bilaterally. PsychoPy software (Peirce, 2007) controlled the speech production paradigms and sent marker pulses to the Doppler system to denote the onset of a trial. Data were analysed off-line with a MATLAB (Mathworks Inc., Sherborn, MA, USA) based software package called dopOSCCI (see Badcock et al., 2012 for a detailed description). Data processing and analysis for the animation description paradigm was undertaken as per Hodgson, Hirst and Hudson (2016), and the word generation paradigm was analysed as outlined in Hodgson and Hudson (2016).

Speech laterality indices (LIs) were derived for each participant based on the difference between left and right sided activity within a 2 s window, when compared to a baseline rest period of 10 s. The activation window was centralised to the time point at which the left-right deviation was greatest within the period of interest (POI) (Badcock et al., 2012). In the word generation paradigm the POI ranged from 3 – 13 s following presentation of the stimulus letter. For the animation description task the POI ranged from 12 – 22 s following onset of the trial. Speech laterality was assumed to be clear in all cases in which the LI deviated by > 2 SE from 0. Left-hemisphere or right-hemisphere speech dominance was indicated by positive or negative indices respectively. Cases with an LI < 2 SE from 0 were categorised as having bilateral speech representation. Individuals were categorised as having ‘Typical’ speech representation if they displayed a clear LI score which was positive; individuals with a bilateral LI score, or a clear LI score which was negative, were categorised as having ‘Atypical’ speech representation (Flowers and Hudson, 2013; Hodgson, Hirst and Hudson, 2016). Participants required a minimum of 75% acceptable trials to be included in the analysis; all participants reached this threshold.
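The categorisation rule described above can be sketched as follows. This is a minimal illustration rather than the dopOSCCI implementation: it assumes that the standard error is computed across a participant's accepted per-trial LI values, and it treats an LI of exactly 2 SE from 0 as bilateral, a boundary case the text does not specify. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarise_li(trial_lis: Sequence[float]) -> Tuple[float, float]:
    """Return the mean LI and its standard error across accepted trials."""
    mean_li = statistics.mean(trial_lis)
    # Assumption: SE is the sample SD divided by the square root of n.
    se = statistics.stdev(trial_lis) / math.sqrt(len(trial_lis))
    return mean_li, se


def classify_speech_laterality(li: float, se: float) -> Tuple[str, str]:
    """Return (dominance, category) for a mean LI and its standard error."""
    if se < 0:
        raise ValueError("standard error cannot be negative")
    if abs(li) > 2 * se:
        # Clear lateralisation: positive is left, negative is right.
        dominance = "left" if li > 0 else "right"
    else:
        # Assumption: the boundary case |LI| == 2 SE is treated as bilateral.
        dominance = "bilateral"
    category = "Typical" if dominance == "left" else "Atypical"
    return dominance, category
```

For example, a hypothetical participant with a mean LI of 0.81 and an SE of 0.6 would be classed as bilateral, and therefore ‘Atypical’, because 0.81 lies within 2 SE (1.2) of 0.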

Results

LI scores from the word generation paradigm resulted in 9 individuals classified as atypically lateralised (displaying either right-sided activation or activation less than 2 SE from 0; LI scores ranged from -4.43 to 0.81) and the remaining 38 individuals with typical left hemisphere lateralisation (LI scores ranged from 1.19 to 6.61). LI scores from Time 1 (T1) and Time 2 (T2) on the word generation task revealed a strong positive correlation, r(47) = 0.79, p = 0.0001, indicating that fTCD has good test-retest reliability even after a delay in re-testing of over 120 days (see Figure 1a). During this task 8 individuals with atypical speech laterality at T1 all replicated an atypical lateralisation profile at T2. One individual shifted from a bilateral profile at T1 to a right-sided bias at T2.

To assess the comparability, rather than just the relationship, between the two measurements taken, a Bland-Altman (B-A) analysis (Altman and Bland, 1983) was conducted. This is a method of quantifying agreement between two quantitative measurements by constructing limits of agreement. These statistical limits are calculated using the mean and the standard deviation of the differences between the two measurements (see Giavarina, 2015 for an overview of the method). The mean of the differences between each set of measurements is also known as the measurement bias. The bias between LI scores taken from T1 and T2 was -0.17 (B-A standard deviation = 1.67), and the resulting limits of agreement (LOA), allowing for +/- 1.96 standard deviations of the differences around the bias, were -3.43 (lower LOA) and 3.10 (upper LOA). These figures were calculated as follows: Bias +/- 1.96 * SD. The differences between LI scores from T1 and T2 can be plotted against the mean of the two measurements, which allows for the investigation of any possible relationship between measurement error and the estimated ‘true’ value. Inspection of the resulting B-A plot (see Figure 2a) indicates that only 3 data points (1 atypically lateralised and 2 typically lateralised) fall outside the limits of agreement, indicating that these points are more than 1.96 standard deviations from the calculated bias. The majority cluster within the calculated limits, indicating good overall agreement between measurements taken at time points 1 and 2.
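The limits-of-agreement calculation described above can be sketched in a few lines. This is an illustration only, not the analysis script used in the study: the function names are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and the direction of the difference (first measurement minus second) is also an assumption, since the text does not state it.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(bias: float, sd: float) -> Tuple[float, float]:
    """Return (lower, upper) limits as Bias +/- 1.96 * SD."""
    return bias - 1.96 * sd, bias + 1.96 * sd


def bland_altman(first: Sequence[float],
                 second: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    if len(first) != len(second):
        raise ValueError("measurements must be paired")
    # Assumption: differences are taken as first minus second.
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    lower, upper = limits_of_agreement(bias, sd)
    return bias, lower, upper


# Summary values reported for the test-retest comparison.
lower_loa, upper_loa = limits_of_agreement(-0.17, 1.67)
```

With the rounded summary values reported above (bias = -0.17, SD = 1.67), this gives limits of roughly -3.44 and 3.10, which agrees with the reported -3.43 and 3.10 to within the rounding of the inputs.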

Results from the between-task reliability analysis revealed that the animation description speech paradigm classified 8 individuals as atypically lateralised (LI scores ranged from -4.47 to -1.22); however, 3 of these cases were participants previously categorised with typical left hemisphere dominance during the word generation task. This deviation in a small number of cases is reflected by a weaker correlation between the animation description LIs and the word generation LIs from T1, r(33) = 0.50, p = 0.003 (see Figure 1b), compared with the test-retest correlation, but at 0.50 it still denotes an acceptable level of agreement between tests.

Bland-Altman analysis on the sets of LI scores from each speech paradigm indicates that the animation description task mean LI scores deviated by 0.31 (B-A bias; B-A standard deviation = 2.28) from the mean word generation LI scores overall. The calculated limits of agreement were -4.16 (lower LOA) and 4.78 (upper LOA). These limits are wider than in the previous test-retest reliability analysis, which suggests there is increased variance in LI scores between these two tasks. Visual inspection of the resulting B-A plot (see Figure 2b) indicates that only 2 data points fell outside these calculated limits of agreement, suggesting that, despite the increased variance in LI scores between the two tasks, the agreement between the paradigms on derived speech lateralisation scores is still statistically acceptable.

Figure 1. a) Plot of the test-retest correlation between mean LI scores from the word generation task at test times 1 and 2. b) Plot of the correlation between mean LI scores on the two speech production tasks. Negative values indicate right hemisphere activation and positive values indicate left hemisphere activation.

Figure 2. Bland-Altman plots depicting a) the mean of the laterality indices from test times 1 and 2 (derived from the word generation task) against the difference between the laterality indices from test times 1 and 2; b) the mean of the laterality indices from the word generation task and the animation description task against the difference between the laterality indices from each task. On each plot the solid line represents the bias between the measurements, and the dashed lines represent the upper and lower limits of agreement (which equate to 1.96 standard deviations in either direction). Values falling within these dashed lines indicate acceptable measurement agreement.

Conclusions

This study is one of the first to directly review the reliability of atypical hemispheric speech representation as measured by fTCD. Good agreement was found between two test points when using the word generation paradigm to measure hemispheric lateralisation of speech production, even for individuals with atypical speech lateralisation. This suggests that within-subjects measurements of speech lateralisation using this paradigm are relatively stable across time, providing additional support for the use of this task for deriving lateralisation indices in both typically and atypically lateralised participants (see also Knecht et al., 2000a). One potential caveat is the unknown impact of the length of time between testing sessions on the reliability results; with a mean of 81 days, the retest interval was relatively long and may therefore have introduced excess variability into the results. However, it is worth noting that the previous study by Knecht and colleagues (1998b), which measured test-retest reliability on the same paradigm using fTCD, had a much more variable interval length, ranging from one month to 14 months, and used a considerably smaller sample (n = 10). Despite these differences the reliability of the LI scores at T1 and T2 was good in both cases, suggesting that varying the retest interval length would not substantially alter the agreement of the LI results.

In addition, comparison of language dominance scores across two speech production tasks showed acceptable between-task reliability; however, there were differences in atypical classification in a small number of cases, and overall reliability and agreement were reduced in comparison to the test-retest analysis. This increased variation in participants’ LI scores likely reflects the different requirements of each task in terms of the level of language construction and subsequent speech output required. The animation description task requires participants to make a linguistically coherent and structured response, in comparison to the simpler phonological and lexical response required by the fluency-based word generation task. This may explain why there was greater variance in LI scores in the animation description task, reflecting increased cognitive processing. It is therefore possible to conclude that, as these tasks make different demands on the language network, they should not be interchanged experimentally for the purposes of language lateralisation research without theoretical justification. That view, however, over-simplifies the point of using different speech paradigms for the assessment of hemispheric dominance, part of which is to find robust speech paradigms that tap into the wide range of language processes in order to examine in more detail whether particular aspects of language produce different lateralisation patterns. As such, data on the relative comparability of cortical responses between paradigms are needed. Furthermore, the original motivation behind the development of the animation description task was specifically to elicit robust speech responses in pre-literate young children, in order to gain insight into age-related changes in speech lateralisation (Bishop et al., 2009; see also Hodgson et al., 2016). The authors themselves note that the requirements of that task do vary from those of the more widely used word generation paradigm (Bishop et al., 2009), but felt this was an acceptable variation in order to address developmental questions of speech processing.