Measuring Lexical Difficulty (by Speaker) in LDS General Conference Talks

Deseret Language and Linguistics Symposium 2004

Mark Davies

BYU Dept. of Linguistics and English Language

http://davies-linguistics.byu.edu

1. Database / methodology

· Comparison corpus: thirty years of talks from General Conference (1971-2000)

· Nearly 4,000,000 words (“tokens”); 20,300 separate word forms (“types”)

· Comparison to “Standard English” (British National Corpus: 100 million words)

Table 1. Comparing Standard English (BNC) with “General Conference” English

Word / GenConf / BNC / Ratio (GenConf ÷ BNC) / Word / GenConf / BNC / Ratio (GenConf ÷ BNC)
quorums / 64 / 0.1 / 640.0 / dwell / 52 / 1 / 52.0
counseled / 50 / 0.1 / 500.0 / heritage / 46 / 1 / 46.0
priesthood / 959 / 2 / 479.5 / cometh / 46 / 1 / 46.0
revelator / 29 / 0.1 / 290.0 / pioneers / 109 / 3 / 36.3
premortal / 26 / 0.1 / 260.0 / atoning / 36 / 1 / 36.0
succor / 24 / 0.1 / 240.0 / verily / 33 / 1 / 33.0
revelators / 21 / 0.1 / 210.0 / giveth / 25 / 1 / 25.0
mortal / 153 / 1 / 153.0 / angels / 70 / 3 / 23.3
gratitude / 132 / 1 / 132.0 / supernal / 23 / 1 / 23.0
ordained / 114 / 1 / 114.0 / acquainted / 22 / 1 / 22.0
telestial / 9 / 0.1 / 90.0 / miracle / 65 / 3 / 21.7
repentance / 195 / 3 / 65.0 / lamb / 42 / 2 / 21.0
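Every value in the ratio column of Table 1 equals the GenConf figure divided by the BNC figure (e.g. priesthood: 959 ÷ 2 = 479.5). The minimal sketch below recomputes and sorts that column. One assumption is involved: the 0.1 entries appear to stand in for words that are (nearly) absent from the BNC, so the sketch floors the denominator at 0.1. The function names are illustrative only.

```python
# Sketch of the ratio column in Table 1: GenConf frequency / BNC frequency.
# ASSUMPTION: the 0.1 entries in the BNC column stand in for words that are
# (nearly) absent from the BNC, so the denominator is floored at 0.1.
BNC_FLOOR = 0.1


def frequency_ratio(genconf_freq: float, bnc_freq: float, floor: float = BNC_FLOOR) -> float:
    """Return genconf_freq / bnc_freq, flooring the BNC value to avoid division by zero."""
    return genconf_freq / max(bnc_freq, floor)


def rank_by_ratio(rows):
    """Given (word, genconf, bnc) rows, return (word, genconf, bnc, ratio) by descending ratio."""
    scored = [(word, g, b, round(frequency_ratio(g, b), 1)) for word, g, b in rows]
    return sorted(scored, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Values copied from Table 1.
    sample = [("priesthood", 959, 2), ("quorums", 64, 0.1), ("pioneers", 109, 3)]
    for word, g, b, ratio in rank_by_ratio(sample):
        print(f"{word:<12} {g:>5} {b:>5} {ratio:>7.1f}")
```

Run on these three rows, the sketch orders them quorums (640.0), priesthood (479.5), pioneers (36.3), matching Table 1.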

· Word forms grouped together into “word families” (cf. Nation 2000)

e.g. courage, [encourage, encouraged, encouraging], encouragement, discouraged

· Create a list of the 2500 most frequent word families in General Conference
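A ranked family list like this can be built by mapping each word form to a family headword and counting. The sketch below assumes a small, hypothetical form-to-headword table (FAMILY_OF) modeled on the courage/encourage example; it is not the actual GenConf family data, and unlisted forms are simply treated as their own family.

```python
from collections import Counter

# HYPOTHETICAL form -> family-headword table, modeled on the courage/encourage
# example. A real table would be far larger; this one is for illustration only.
FAMILY_OF = {
    "courage": "COURAGE", "encourage": "COURAGE", "encouraged": "COURAGE",
    "encouraging": "COURAGE", "encouragement": "COURAGE", "discouraged": "COURAGE",
    "is": "BE", "are": "BE", "was": "BE", "were": "BE", "been": "BE", "be": "BE",
}


def family(form: str) -> str:
    """Map a word form to its family headword; unlisted forms head their own family."""
    form = form.lower()
    return FAMILY_OF.get(form, form.upper())


def family_ranks(tokens, top_n: int = 2500) -> dict:
    """Count tokens by family; return {headword: rank} for the top_n families (1 = most frequent)."""
    counts = Counter(family(t) for t in tokens)
    return {head: rank for rank, (head, _) in enumerate(counts.most_common(top_n), start=1)}


if __name__ == "__main__":
    text = "We are encouraged and we were encouraging with courage"
    print(family_ranks(text.split()))
```

Because the lookup uses only the spelling of a form, it cannot tell noun and verb uses apart, which is the same kind of simplification a form-based grouping inevitably makes.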

Table 2. Frequency listing of common “word families” in General Conference

Rank / Word / Rank / Word / Rank / Word / Rank / Word
1 / THE / 11 / A / 200 / PRINCIPLE / 210 / BEST
2 / OF / 12 / THEY / 201 / MIND / 211 / STRENGTH
3 / BE / 13 / HAVE / 202 / MIGHT / 212 / MOTHER
4 / AND / 14 / YOU / 203 / SOUL / 213 / ACT
5 / TO / 15 / FOR / 204 / BUILD / 214 / LAW
6 / THIS / 16 / IT / 205 / MORMON / 215 / LIGHT
7 / WE / 17 / AS / 206 / BISHOP / 216 / OLD
8 / HE / 18 / WITH / 207 / COUNSEL / 217 / GET
9 / IN / 19 / NOT / 208 / READ / 218 / STATE
10 / I / 20 / DO / 209 / RESPONSIBLE / 219 / HAPPY

· Note that this grouping is quite simplistic in terms of part-of-speech disambiguation and lemmatization (e.g. play, means, record, whose noun and verb uses are not kept apart)


2. Calculating the score for each talk

· Select the four most recent talks by each member of the First Presidency (n=3) and the Quorum of the Twelve Apostles (n=12)

· Total of 60 talks (15 speakers × 4 talks); each about 1600-2000 words (about 15 minutes)

· Enter each talk into a web-based form connected to the GenConf frequency database

· Measures of “lexical difficulty”

1) Coverage -- what percentage of the words in a talk belong to the top 1500 “word families”

2) Difficulty -- how far down the 1500-family list those words fall (scored as in Table 3)
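The coverage measure can be sketched as the share of a talk's tokens whose family ranks within the top 1500. The rank table below is a toy excerpt copied from Table 2; the inflected-form mapping (FAMILY_OF) and the simple regex tokenizer are hypothetical stand-ins for whatever the web form actually does.

```python
import re

# Toy excerpt of family ranks, copied from Table 2 (headword -> rank).
FAMILY_RANK = {"THE": 1, "OF": 2, "BE": 3, "AND": 4, "WE": 7,
               "SOUL": 203, "LAW": 214, "LIGHT": 215}
# HYPOTHETICAL mapping of a few inflected forms to their family headword.
FAMILY_OF = {"is": "BE", "are": "BE", "was": "BE", "souls": "SOUL", "laws": "LAW"}


def tokenize(text: str) -> list:
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def family_rank(form: str):
    """Return the family rank of a form, or None if its family is not in the list."""
    return FAMILY_RANK.get(FAMILY_OF.get(form, form.upper()))


def coverage(text: str, cutoff: int = 1500) -> float:
    """Percentage of tokens whose word family ranks within the top `cutoff` families."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    covered = 0
    for token in tokens:
        rank = family_rank(token)
        if rank is not None and rank <= cutoff:
            covered += 1
    return 100.0 * covered / len(tokens)


if __name__ == "__main__":
    # "telestial" is simply absent from this toy table, so it counts as uncovered.
    print(round(coverage("The law of the soul is light and telestial"), 1))
```

Here 8 of the 9 tokens are covered, so the example prints 88.9.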

Table 3. Calculating lexical difficulty

Frequency rank / Score
1-300 / 0
301-600 / 1
601-900 / 2
901-1200 / 3
1201-1500 / 4
Not in top 1500 / 5
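Table 3 amounts to integer division of a word family's rank into bands of 300. The sketch below encodes that mapping. How the per-word scores are combined into a single score per talk is not stated here, so the talk_difficulty function assumes, purely for illustration, a mean over all tokens.

```python
def band_score(rank, band_size: int = 300, cutoff: int = 1500) -> int:
    """Score one word per Table 3: ranks 1-300 -> 0, 301-600 -> 1, ..., 1201-1500 -> 4.
    A family outside the top `cutoff` (rank None or > cutoff) gets one band past
    the last, i.e. 5 with the default parameters."""
    if rank is None or rank > cutoff:
        return cutoff // band_size
    return (rank - 1) // band_size


def talk_difficulty(ranks) -> float:
    """ASSUMPTION: a talk's difficulty is the mean band score over its tokens.
    `ranks` holds one family rank per token, or None if not in the list."""
    scores = [band_score(r) for r in ranks]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical token ranks: scores 0, 0, 1, 4, 5 -> mean 2.0
    print(talk_difficulty([1, 214, 350, 1300, None]))
```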

Figure 1. Relatively simple talk, in terms of lexis


Figure 2. Relatively complex talk, in terms of lexis

Some considerations:

· Quotations from literature

· “Scriptural language”

· Topic (e.g. Elder Ballard's [Oct 2004] talk dealing with technology)

3. Results

Figure 3. Clustering by speaker
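Positioning speakers relative to one another requires collapsing each speaker's four talk scores into a summary. The sketch below assumes each talk has already been reduced to one number (for instance the hypothetical mean band score) and summarizes speakers by mean, population standard deviation, and talk count. It is an illustration of that aggregation step only, not a reconstruction of how the figure was produced, and the demo speakers and scores are placeholders, not the study's data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_summary(talk_scores):
    """Group (speaker, score) pairs and return {speaker: (mean, population SD, n_talks)},
    ordered from lowest to highest mean score."""
    by_speaker = defaultdict(list)
    for speaker, score in talk_scores:
        by_speaker[speaker].append(score)
    summary = {s: (mean(v), pstdev(v), len(v)) for s, v in by_speaker.items()}
    return dict(sorted(summary.items(), key=lambda item: item[1][0]))


if __name__ == "__main__":
    # Placeholder speakers and scores for illustration only; not the study's data.
    demo = [("Speaker A", 1.2), ("Speaker A", 1.4), ("Speaker B", 0.9), ("Speaker B", 1.1)]
    for speaker, (m, sd, n) in speaker_summary(demo).items():
        print(f"{speaker}: mean={m:.2f} sd={sd:.2f} n={n}")
```

Reporting the spread alongside the mean helps show whether a speaker's talks genuinely cluster or whether one talk (e.g. one heavy with quotations) pulls the average.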

Comments:

· Importance (again) of topic -- e.g. Monson (talks with vs. without literary quotations)

· Importance of register -- Haight's talks are in a spoken register, whereas all the others are written

· Clear outliers (Maxwell = complex, Eyring = simplified)

· No clear relationship to education level (e.g. Eyring, the simplified outlier, holds a PhD; others also hold advanced degrees)

· Is there a distinct register, GCS (“General Conference Speak”)?

· Pres. Hinckley as the prototype -- others (subconsciously?) cluster around him