WORKSHOP:

Teaching technical writing with data-driven learning

Ryan K. Boettger & Stefanie Wulff

Practicing Corpus Queries

  1. Load the untagged BNC sample files. Create a concordance of technical. Sort it by the right-hand context. What kinds of nouns does technical modify?
  2. Go to the “Word List” ribbon and create a frequency list of the BNC sample you have currently loaded.
  3. How many tokens/types are in this sample?
  4. What are the most frequent words?
  5. Go to the “Collocates” ribbon and enter technical in the search window. Set the collocate parameter to 0 for the left-hand collocates, and 1 for the right-hand collocates.
  6. What are the top collocates?
  7. How does the list of top collocates change when you sort by frequency vs. a statistical association measure? Which output do you find more meaningful?
  8. Load the critical reviews of the TWP sample as a reference corpus, then go to the “Keyword List” ribbon and create a keyword list.
  9. What are the most distinctive words for the general, British English corpus in contrast to the critical reviews written by American English students?
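
If it helps to see what the word-list and keyword-list steps above are counting, the short Python sketch below reproduces them in rough form outside AntConc. The folder names bnc_sample and twp_reviews are hypothetical placeholders for wherever your copies of the sample files live, and the simple frequency ratio used for "keyness" here is only a stand-in for the proper statistic (such as log-likelihood) that AntConc computes.

    import re
    from collections import Counter
    from pathlib import Path

    def word_frequencies(folder):
        """Count word forms across all .txt files in a folder."""
        counts = Counter()
        for path in Path(folder).glob("*.txt"):
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            counts.update(re.findall(r"[a-z']+", text))
        return counts

    # Hypothetical folder names; point these at your own copies of the samples.
    target = word_frequencies("bnc_sample")       # corpus under study
    reference = word_frequencies("twp_reviews")   # reference corpus

    print("Tokens:", sum(target.values()), "Types:", len(target))
    print("Top ten:", target.most_common(10))

    # Crude keyness: relative frequency in the study corpus divided by relative
    # frequency in the reference corpus (plus one to avoid division by zero).
    t_total, r_total = sum(target.values()), sum(reference.values())
    keyness = {w: (c / t_total) / ((reference[w] + 1) / r_total)
               for w, c in target.items() if c >= 5}
    print("Most distinctive:", sorted(keyness, key=keyness.get, reverse=True)[:10])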

Clear the keyword list settings and unload all corpus files (or simply close AntConc and open it again).

  1. Load the critical reviews of the TWP sample. First, create a word list. What are the most frequent words in the critical reviews?
  2. Create a concordance of *ly and sort the output by the search term, R1, and R2.
  3. What are the top adverbs in the critical reviews?
  4. Unload the critical reviews and load the white papers of the TWP sample instead. Repeat your search. What adverbs do you see now? How does the output differ?
  5. Add the critical reviews back to the corpus files loaded so that you now have the entire TWP sample loaded (104 files). Create a word list.
  6. How many types/tokens does this sample contain?
  7. What is the most frequent lexical noun?
  8. Look for L1 collocates of research using the “Collocates” function.
  9. Sort by the statistical association measure. What are the most strongly associated collocates of research?
  10. Sort by frequency instead. How does the output differ?
  11. Taking advantage of the * (which means “0 or more of anything”), how could one look for instances of passive voice? (A rough pattern sketch follows this list.)
  12. How many hits does your search expression retrieve?
  13. Do you retrieve a lot of false hits (i.e., instances that do not contain any passives)?
  14. What hits does your search expression not retrieve that it should?
  15. How could one look for instances of split infinitives (like to boldly go …)?
  16. How many hits does your query retrieve?
  17. Do you retrieve a lot of false hits (i.e., instances that do not contain any split infinitives)?
  18. What hits does your search expression not retrieve that it should?
  19. Look for instances of “naked” this plus copula be in sentence-initial position (i.e., look for This is).
  20. How many hits do you retrieve?
  21. Go to the “Concordance Plot” ribbon and look at where in the texts the instances occur. Do the instances cluster in a specific part of the texts?
  22. Go back to the concordance display and click on any concordance line. This will get you to the “File View” ribbon and show you the larger context in which the instance occurs. As you examine the instances more closely, what meta-discoursal function, if any, does This is seem to perform in the students’ writing?
  23. Repeat the search with This means instead. Is This means as frequent as This is? Does it occur in the same place in texts? Looking at the contexts more closely, can you make out a function that This means performs?
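
The wildcard searches in items 2, 11, 15, 19, and 23 above can also be tried out as ordinary regular expressions. The snippet below is a minimal Python sketch run over a made-up example string; the patterns are deliberately rough and, just as the questions anticipate, they yield both false hits and misses.

    import re

    sample = ("This is a test. The report was written quickly. "
              "She wanted to boldly go. This means the data were collected early.")

    ly_words = re.findall(r"\b\w+ly\b", sample)   # *ly: any word ending in -ly, adverb or not
    passives = re.findall(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", sample)
    split_inf = re.findall(r"\bto\s+\w+ly\s+\w+\b", sample)   # to + -ly word + another word
    naked_this = re.findall(r"(?:^|(?<=[.!?]\s))This\s+(?:is|means)\b", sample)

    print(ly_words)     # picks up quickly, boldly, early
    print(passives)     # misses passives with an intervening adverb ("was quickly written")
    print(split_inf)
    print(naked_this)   # sentence-initial This is / This means only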

Clear the keyword list settings and unload all corpus files (or simply close AntConc and open it again).

  1. Load the BNC sample that is tagged for parts of speech. (You should have 370 files loaded.)
  2. Create a word list. How many tokens/types does this sample contain? What are the top ten words?
  3. Create a concordance of research. How many hits do you retrieve?
  4. How can you take advantage of the POS-tags and regex syntax to only find hits of research as a verb? (Consult the list of POS-tags in the BNC below.)
  5. How can you use the POS-tags and regex syntax to find all adverbs in the sample? How many hits do you retrieve?
  6. Can you write a search expression that finds all adverbs ending in -ly, and only those hits? How many hits does your search expression retrieve? (Tip: you will have to combine POS-tags and regular expressions; see the Regular Expressions section below.)
  7. Can you combine POS-tags and regex syntax to write a more accurate search expression that finds instances of passive voice (and no false hits)? (One possible approach is sketched after this list.)
  8. Can you combine POS-tags and regex syntax to write a more accurate search expression that finds instances of split infinitives (and no false hits)?
  9. Can you combine POS-tags and regex syntax to write a more accurate search expression that finds instances of the naked This? Can you maybe even write a search expression that finds This followed by any verb, not just a form of be (like This means, This implied, and This encourages)?
  10. Sort the concordance by the right-hand context. What are the most frequent verbs occurring with the naked This?
  11. How could you look for instances of attended This, i.e., cases where This is followed by a noun phrase (as in This example is boring)?
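
One way to make items 4 to 9 concrete (see the note in item 7) is to run tag-aware patterns over a small hand-tagged string, as in the Python sketch below. It assumes that the tagged files attach each tag to its word as word_TAG (e.g. research_NN1); if your copies of the tagged BNC sample mark tags up differently, the patterns will need adapting.

    import re

    # A toy, hand-tagged string using the hypothetical word_TAG convention.
    tagged = ("This_DT0 is_VBZ a_AT0 test_NN1 ._PUN "
              "They_PNP research_VVB the_AT0 problem_NN1 ._PUN "
              "The_AT0 report_NN1 was_VBD quickly_AV0 written_VVN ._PUN "
              "This_DT0 means_VVZ we_PNP need_VVB to_TO0 boldly_AV0 go_VVI ._PUN")

    # research as a verb only: the word plus any lexical-verb tag (VV + one character)
    research_verbs = re.findall(r"\bresearch_VV\w\b", tagged)

    # all adverbs: any word carrying one of the adverb tags AV0, AVP, or AVQ
    adverbs = re.findall(r"\b\w+_AV[0PQ]\b", tagged)

    # passive voice: a form of be (VB.), an optional adverb, then a past participle (VVN)
    passives = re.findall(r"\b\w+_VB\w(?:\s+\w+_AV0)?\s+\w+_VVN\b", tagged)

    # split infinitive: to (TO0) + adverb (AV0) + infinitive (VVI)
    split_inf = re.findall(r"\bto_TO0\s+\w+_AV0\s+\w+_VVI\b", tagged)

    # "naked" This followed directly by any verb tag (a tag beginning with V)
    naked_this = re.findall(r"\bThis_DT0\s+\w+_V\w\w\b", tagged)

    print(research_verbs, adverbs, passives, split_inf, naked_this, sep="\n")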

Regular Expressions

Activate the “Regex” button at the top of the search window when integrating these into your search expressions in AntConc.

*        0 or more
+        1 or more
?        0 or 1
.        any character (except a new line)
(a|b)    a or b
\s       white space
\b       word boundary
\w       word character
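
To get a feel for these operators before combining them with AntConc queries, you can try them on a short string in any regex-capable tool. The lines below are a minimal Python sketch; AntConc's Regex mode uses essentially the same Perl-style syntax, so the patterns should carry over.

    import re

    text = "The cat sat on the mat; the cats sat on the mats."

    print(re.findall(r"mats*", text))           # * : zero or more of the preceding item
    print(re.findall(r"cats?", text))           # ? : zero or one
    print(re.findall(r"\bs\w+", text))          # \b, \w, + : word boundary, then word characters
    print(re.findall(r".at", text))             # . : any single character before "at"
    print(re.findall(r"the\s+\w+", text))       # \s : white space
    print(re.findall(r"(?:cat|mat)s\b", text))  # (a|b) : alternation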

Part-of-speech (POS)-tags in the British National Corpus (BNC)

AJ0: adjective (general or positive), e.g. good, old.

AJC: comparative adjective, e.g. better, older.

AJS: superlative adjective, e.g. best, oldest.

AT0: article, e.g. the, a, an, no. Note the inclusion of no: articles are defined as determiners which typically begin a noun phrase but cannot appear as its head.

AV0: adverb (general, not sub-classified as AVP or AVQ), e.g. often, well, longer, furthest. Note that adverbs, unlike adjectives, are not tagged as positive, comparative, or superlative. This is because of the relative rarity of comparative or superlative forms.

AVP: adverb particle, e.g. up, off, out. This tag is used for all prepositional adverbs, whether or not they are used idiomatically in phrasal verbs such as come out here, or I can’t hold out any longer.

AVQ: wh-adverb, e.g. when, how, why. The same tag is used whether the word is used interrogatively or to introduce a relative clause.

CJC: coordinating conjunction, e.g. and, or, but.

CJS: subordinating conjunction, e.g. although, when.

CJT: the subordinating conjunction that, when introducing a relative clause, as in the day that follows Christmas. Some theories treat that here as a relative pronoun; others as a conjunction. We have adopted the latter analysis.

CRD: cardinal numeral, e.g. one, 3, fifty-five, 6609.

DPS: possessive determiner form, e.g. your, their, his.

DT0: general determiner: a determiner which is not a DTQ, e.g. this both in This is my house and This house is mine. A determiner is defined as a word which typically occurs either as the first word in a noun phrase, or as the head of a noun phrase.

DTQ: wh-determiner, e.g. which, what, whose, whichever. The same tag is used whether the word is used interrogatively or to introduce a relative clause.

EX0: existential there, the word there appearing in the constructions there is ..., there are ....

ITJ: interjection or other isolate, e.g. oh, yes, mhm, wow.

NN0: common noun, neutral for number, e.g. aircraft, data, committee. Singular collective nouns such as committee take this tag on the grounds that they can be followed by either a singular or a plural verb.

NN1: singular common noun, e.g. pencil, goose, time, revelation.

NN2: plural common noun, e.g. pencils, geese, times, revelations.

NP0: proper noun, e.g. London, Michael, Mars, IBM. Note that no distinction is made for number in the case of proper nouns, since plural proper names are a comparative rarity.

ORD: ordinal numeral, e.g. first, sixth, 77th, next, last. No distinction is made between ordinals used in nominal and adverbial roles. next and last are included in this category, as general ordinals.

PNI: indefinite pronoun, e.g. none, everything, one (pronoun), nobody. This tag is applied to words which always function as heads of noun phrases. Words like some and these, which can also occur before a noun head in an article-like function, are tagged as determiners, DT0 or AT0.

PNP: personal pronoun, e.g. I, you, them, ours. Note that possessive pronouns such as ours and theirs are included in this category.

PNQ: wh-pronoun, e.g. who, whoever, whom. The same tag is used whether the word is used interrogatively or to introduce a relative clause.

PNX: reflexive pronoun, e.g. myself, yourself, itself, ourselves.

POS: the possessive or genitive marker ’s or ’. Note that this marker is tagged as a distinct word. For example, Peter’s or someone else’s is tagged <w NP0>Peter<w POS>'s <w CJC>or <w PNI>someone <w AV0>else<w POS>'s.

PRF: the preposition of. This word has a special tag of its own, because of its high frequency and its almost exclusively postnominal function.

PRP: preposition, other than of, e.g. about, at, in, on behalf of, with. Note that prepositional phrases like on behalf of or in spite of are treated as single words.

TO0: the infinitive marker to.

UNC: unclassified items which are not appropriately classified as items of the English lexicon. Examples include foreign (non-English) words; special typographical symbols; formulae; hesitation fillers such as erm in spoken language.

VBB: the present tense forms of the verb be, except for is or ’s: am, are, ’m, ’re, be (subjunctive or imperative), ai (as in ain’t).

VBD: the past tense forms of the verb be, was, were.

VBG: the -ing form of the verb be, being.

VBI: the infinitive form of the verb be, be.

VBN: the past participle form of the verb be, been.

VBZ: the -s form of the verb be, is, ’s.

VDB: the finite base form of the verb do, do.

VDD: the past tense form of the verb do, did.

VDG: the -ing form of the verb do, doing.

VDI: the infinitive form of the verb do, do.

VDN: the past participle form of the verb do, done.

VDZ: the -s form of the verb do, does.

VHB: the finite base form of the verb have, have, ’ve.

VHD: the past tense form of the verb have, had, ’d.

VHG: the -ing form of the verb have, having.

VHI: the infinitive form of the verb have, have.

VHN: the past participle form of the verb have, had.

VHZ: the -s form of the verb have, has, ’s.

VM0: modal auxiliary verb, e.g. can, could, will, ’ll, ’d, wo (as in won’t).

VVB: the finite base form of lexical verbs, e.g. forget, send, live, return. This tag is used for imperatives and the present subjunctive forms, but not for the infinitive (VVI).

VVD: the past tense form of lexical verbs, e.g. forgot, sent, lived, returned.

VVG: the -ing form of lexical verbs, e.g. forgetting, sending, living, returning.

VVI: the infinitive form of lexical verbs, e.g. forget, send, live, return.

VVN: the past participle form of lexical verbs, e.g. forgotten, sent, lived, returned.

VVZ: the -s form of lexical verbs, e.g. forgets, sends, lives, returns.

XX0: the negative particle not or n’t.

ZZ0: alphabetical symbols, e.g. A, a, B, b, c, d.

PUL: left bracket (i.e. ( or [ ).

PUN: any mark of separation ( . ! , : ; - ? ... ).

PUQ: quotation mark ( ‘ ’ ).

PUR: right bracket (i.e. ) or ] ).
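
A practical note: the example under the POS entry above shows the BNC's own <w TAG>word markup, whereas the earlier sketches in this handout assume a plainer word_TAG layout. If your tagged files look like the former, a rough conversion such as the Python sketch below (an illustration only, not part of AntConc) can make the two notations line up.

    import re

    bnc_style = "<w NP0>Peter<w POS>'s <w CJC>or <w PNI>someone <w AV0>else<w POS>'s"

    # Rewrite each <w TAG>word chunk as word_TAG, separated by spaces.
    converted = re.sub(r"<w (\w+)>([^<]+?)\s*(?=<w |$)", r"\2_\1 ", bnc_style).strip()
    print(converted)   # Peter_NP0 's_POS or_CJC someone_PNI else_AV0 's_POS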

References and Resources

British National Corpus:

Technical Writing Project:

AntConc:
