Computer-Assisted Text Analysis

Hints on how to use relevant software.

1. INPUT FILE FORMAT

Most software packages assume ASCII text files (with the suffix .txt), and it is safest to keep at least one version of your files in this format.
To convert a Word file, either:
File -> Save As -> Text Only, or
Select All -> Edit -> Copy (to put the text on the clipboard), then open Notepad (All Programs -> Accessories -> Notepad), paste the clipboard contents and save the file.
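If the text is already available to a script, the same clean-up can be done programmatically. The following is a minimal Python sketch, not part of HAMLET: the function names and the replacement table are assumptions chosen for illustration. It swaps common word-processor characters (curly quotes, dashes, ellipses) for plain equivalents, strips accents, and writes an ASCII .txt file.

```python
import unicodedata

# Illustrative table (an assumption, not exhaustive): typographic
# characters that word processors insert, mapped to ASCII equivalents.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # non-breaking space
}

def to_ascii(text):
    """Return text with typographic characters replaced and accents stripped."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # Split accented letters into base letter + combining mark,
    # then drop anything that is still outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def save_as_ascii(text, path):
    """Write the cleaned text to path as a plain ASCII file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(to_ascii(text))
```

Note that characters with no ASCII equivalent are silently dropped, so it is worth keeping the original file as well.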

2. USING HAMLET for Wordlists, KWIC and Co-occurrence

WORDLISTS: Open HAMLET, click on Tools and, on the drop-down list, click on Wordlist. Then, in the Wordlist window:
Enter the Filename to be analyzed.
Choose, by clicking, either
descending frequency (giving the rank order of words), or
forward alphabetic order.
Click on Create a Wordlist.
The results (Word, Frequency, percentage of the text, plus the type/token ratio) can be Saved.
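To make clear what the wordlist contains, here is a minimal Python sketch that computes the same kind of output: each word, its frequency, its percentage of the text, and the type/token ratio (number of distinct words divided by total number of words). It is not HAMLET's own code. The function name `wordlist` and the rule for what counts as a word (runs of letters and apostrophes, lower-cased) are assumptions, so counts may differ slightly from HAMLET's.

```python
import re
from collections import Counter

def wordlist(text, by_frequency=True):
    """Return (rows, type_token_ratio); rows are (word, frequency, percent)."""
    # Assumed tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if by_frequency:
        # Descending frequency; ties broken alphabetically.
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    else:
        # Forward alphabetic order.
        ordered = sorted(counts.items())
    rows = [(word, freq, 100.0 * freq / total) for word, freq in ordered]
    # Types are distinct words; tokens are all running words.
    ratio = len(counts) / total if total else 0.0
    return rows, ratio
```

A short text will always give a high type/token ratio, so the ratio is only comparable between texts of similar length.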

KWIC: As above: Tools -> KWIC
Enter the Filename to be analyzed.
Enter the search word (or phrase) you want identified and retrieved.
N.B. “Display context-length in lines” does not currently work: it assumes a value of 1 and gives 1 line of context.
Click Create Key-Word-In-Context list.
Results can be Saved.
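A Key-Word-In-Context list simply shows every occurrence of the search word with some text on either side. The Python sketch below illustrates the idea; it is not HAMLET's implementation. The function name `kwic`, the `width` parameter (context measured in characters rather than lines) and the square brackets around the hit are all assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with context on each side."""
    # Collapse line breaks and repeated spaces so phrases match across lines.
    flat = " ".join(text.split())
    # Whole-word, case-insensitive match; re.escape keeps phrases literal.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-justify the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines
```

For example, `for line in kwic(text, "cat"): print(line)` prints the hits aligned on the search word, which makes recurring neighbours easy to spot.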

CO-OCCURRENCE ANALYSIS.
Click on Hamlet on the toolbar
Enter the text-file name you wish to analyze.
To use the co-occurrence sub-program, you must have a Vocabulary List (VL), which specifies category names and their associated instances or “synonyms”. This means you have to do one of the following:
enter an existing Vocab List file in “Vocabulary File Name”, or
edit an existing Vocab List (by clicking on “Vocabulary” on the toolbar and then on “Edit existing Vocabulary List”), or
create a Vocabulary list from scratch, and save it before running the program. The new VL is created as follows:
Click on “Vocabulary” on the toolbar and choose the option to create a Vocabulary List.
On the resulting drop-down list (Set the options List), accept (or change) the options and click on Continue.
<insert window capture> The resulting screen has four columns (left to right)
(i) File/folder list
(ii) Name of the new VL (initially NewFile)
(iii) an itemised (“laddered”) column for the names of the categories
(iv) an itemised (“laddered”) column in which the instances/synonyms for each category are listed
Type your first category (e.g. ANIMAL) in column (iii), then move the pointer over to column (iv) and type in an instance (e.g. cow). If there are several instances/synonyms, point to the next entry in the laddered list and type in the next one (e.g. horse). Continue until all the instances are entered in column (iv), then move back to column (iii), type in your next category (e.g. FISH) and insert its instances in column (iv). Note that the program alphabetizes the list of categories as you enter them.
When your VL is complete, click on NewFile in Column (ii), and you will be invited to save it, and taken to the Folder where it will reside. Here you give it a name, and save it.
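Conceptually, a Vocabulary List is a mapping from each category name to its instances. The Python sketch below shows that structure and how it can be inverted so that every word in a text can be looked up directly. It says nothing about HAMLET's actual file format, which is not described here. The ANIMAL entries come from the example above; the FISH instances (cod, trout) and the function name `build_lookup` are invented for illustration.

```python
# Hypothetical in-memory form of a Vocabulary List: category -> instances.
# "cod" and "trout" are made-up instances, used only for this example.
vocabulary = {
    "ANIMAL": ["cow", "horse"],
    "FISH": ["cod", "trout"],
}

def build_lookup(vocab):
    """Invert the list so that each instance maps to its category name."""
    lookup = {}
    for category, instances in vocab.items():
        for word in instances:
            word = word.lower()
            # An instance listed under two categories would be ambiguous.
            if word in lookup and lookup[word] != category:
                raise ValueError(f"{word!r} is listed under two categories")
            lookup[word] = category
    return lookup
```

Checking for words listed under two categories before running an analysis avoids counts that silently depend on the order of the list.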
Now you are ready to run the Co-occurrence program (called “Hamlet” on the Toolbar).
Enter the text file name
Enter the VL name, and
Click on the “Count Joint Frequencies …” bar at the bottom of the window.
You will then get an error message saying that you have not yet chosen a context unit [I haven’t yet found how to set it, other than…]
Click on OK, and the resulting drop-down list will give you that opportunity. This choice is an important one, as widening the context will usually increase the co-occurrences … often a good thing! I usually begin by using the sentence as the context unit.
It is safest to leave the Jaccard coefficient selected, since it ignores negative matches (context units in which neither category occurs); include negative matches only if you know what you are doing.
Click on Continue, which returns you to the ‘Hamlet Joint Frequencies of Words in a Text’ window, then click on the “Count Joint Frequencies for specified vocabulary” bar.
The results follow.
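To show what is being counted, here is a minimal Python sketch of joint frequencies with the sentence as the context unit, followed by the Jaccard coefficient. It is an illustration only and may differ from HAMLET's procedure in detail. The function names, the crude sentence splitter (break at ., ! or ?) and the rule that a category counts once per sentence however often it occurs there are all assumptions.

```python
import re
from itertools import combinations

def joint_frequencies(text, vocab):
    """Count sentences containing each category and each pair of categories."""
    lookup = {w.lower(): cat for cat, words in vocab.items() for w in words}
    single = {cat: 0 for cat in vocab}
    joint = {}
    # Assumed context unit: the sentence, split crudely at ., ! or ?
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Categories present in this sentence, each counted once.
        present = sorted({lookup[t] for t in tokens if t in lookup})
        for cat in present:
            single[cat] += 1
        for pair in combinations(present, 2):
            joint[pair] = joint.get(pair, 0) + 1
    return single, joint

def jaccard(a, b, single, joint):
    """Units containing both categories divided by units containing either."""
    both = joint.get(tuple(sorted((a, b))), 0)
    either = single[a] + single[b] - both
    # Units containing neither category (negative matches) play no part.
    return both / either if either else 0.0
```

Because the coefficient divides by the number of units containing either category, the many sentences that mention neither do not inflate the similarity. Widening the context unit (say, to the paragraph) can only keep or raise the joint counts, which is why it usually increases the co-occurrences.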
You may now choose various options by clicking, including:
Hierarchical Clustering
Smallest Space (though I cannot get this to run yet: the run file has only a header and the matrix file is empty)