COMPUTER-ASSISTED TEXT ANALYSIS

Hints on how to use relevant software.

1.  INPUT FILE FORMAT

  1. Most software packages assume ASCII text files (with the suffix .txt), and it is safest to keep at least one version of your files in this format.
  2. To convert a WORD file, either …
  3. File -> Save As -> text only or …
  4. Select All -> Edit -> Copy [to put on clipboard], then open Notepad (All Programs -> Accessories -> Notepad), paste the clipboard contents and save the file.
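
If a file saved this way still contains word-processor characters (curly quotes, long dashes, accented letters), it can be reduced to plain ASCII with a short script. The following is only a sketch in Python, not part of HAMLET: the function names, the replacement table and the file names in the usage note are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical replacement table for common word-processor punctuation.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis character
}


def to_ascii(text):
    """Return an ASCII-only approximation of text."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # Split accented letters into base letter + accent, then drop
    # every character that is still outside ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(src, dst, src_encoding="utf-8"):
    """Read src (assumed to be UTF-8 by default) and write an ASCII copy to dst."""
    with open(src, encoding=src_encoding) as f:
        text = f.read()
    with open(dst, "w", encoding="ascii") as f:
        f.write(to_ascii(text))
```

For example, convert_file("interview.txt", "interview_ascii.txt") would write an ASCII copy next to the original (both file names are made up). Characters with no ASCII equivalent are silently dropped, so it is worth keeping the original file as well.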
2.  USING HAMLET for Wordlists, KWIC and Co-occurrence
  1. WORDLISTS: Open HAMLET, click on Tools and, on the drop-down list, click on Wordlist. Then, in the Wordlist window:
  2. Enter Filename to be analyzed
  3. Choose by clicking either
  4. descending frequency (giving rank order of words), or
  5. forward alphabetic order.
  6. Click on Create a Wordlist.
  7. The results [Word, Frequency, percentage of the text, plus the type/token ratio] can be Saved.
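
To make clear what the wordlist columns mean, here is a minimal Python sketch that computes the same kind of figures: frequency, percentage of the text, and the type/token ratio (number of different words divided by total number of words). It is an illustration only and does not reproduce HAMLET's own procedure; in particular, the rule for what counts as a word is an assumption, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of letters, optionally with
    # internal apostrophes (e.g. "don't"); case is ignored.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def wordlist(text, order="frequency"):
    """Return (rows, type_token_ratio); each row is (word, count, percent)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    if order == "frequency":
        # Descending frequency; ties are broken alphabetically.
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        # Forward alphabetic order.
        items = sorted(counts.items())
    rows = [(word, n, 100.0 * n / total) for word, n in items]
    ttr = len(counts) / total if total else 0.0
    return rows, ttr


if __name__ == "__main__":
    rows, ttr = wordlist("The cat saw the dog. The dog ran.")
    for word, n, pct in rows:
        print(f"{word:<10}{n:>5}{pct:>8.1f}")
    print("type/token ratio:", ttr)
```

In the small example, "the" occurs 3 times among 8 words (37.5%), and there are 5 different words, giving a type/token ratio of 0.625.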
  1. KWIC: As above: Tools -> KWIC
  2. Enter Filename to be analyzed
  3. Enter search-word (or phrase) you want to be identified and retrieved
  4. n.b. “Display context-length in lines” does not currently work: it assumes and gives 1 line
  5. Click Create Key-Word-In-Context list
  6. Results can be Saved
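
The idea behind a KWIC list can be sketched in a few lines of Python: every occurrence of the search word (or phrase) is retrieved with a fixed amount of text on either side. This is an illustration under stated assumptions, not HAMLET's method: here the context is measured in characters (the width parameter is hypothetical) rather than in lines, and matching ignores case.

```python
import re


def kwic(text, search, width=30):
    """Return one line per hit: left context, [keyword], right context."""
    # Collapse line breaks and runs of spaces so context reads as one stream.
    flat = " ".join(text.split())
    # Match the search word or phrase as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(search) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    sample = "The cow ate grass.\nA horse saw the cow."
    for line in kwic(sample, "cow", width=15):
        print(line)
```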
  1. CO-OCCURRENCE ANALYSIS.
  2. Click on Hamlet on the toolbar
  3. Enter the text-file name you wish to analyze.
  4. To use the co-occurrence sub-program, you must have a Vocabulary List (VL) which specifies category names and their associated instances or “synonyms”. This means you have to do one of the following:
  5. enter an existing Vocab List file in “Vocabulary File Name”
  6. edit an existing Vocab List (by clicking on “Vocabulary” on the toolbar and clicking on “Edit existing Vocabulary List”)
  7. create a Vocabulary list from scratch, and save it before running the program. The new VL is created as follows:
  8. (by clicking on “Vocabulary” on the toolbar and clicking on the option to create a new Vocabulary List)
  9. on the resulting drop-down list (Set the options List), accept (or change) the options and click on Continue
  10. <insert window capture> The resulting screen has four columns (left to right)
  11. Column (i): File/folder list
  12. Column (ii): Name of the new VL (initially NewFile)
  13. Column (iii): an itemised (“laddered”) column for the name of each category
  14. Column (iv): an itemised (“laddered”) column in which the instances/synonyms for each category are listed
  15. Type in your first category (e.g. ANIMAL) in column (iii), then move the pointer over to column (iv) and type in an instance (e.g. cow). If there are several instances/synonyms, point to the next entry in the laddered list and type in the next (horse). Continue until your instances are all entered in column (iv), then move back to column (iii), type in your next category (e.g. FISH) and insert its instances in column (iv). Note that the program alphabetizes the list of categories as you enter them.
  16. When your VL is complete, click on NewFile in Column (ii), and you will be invited to save it, and taken to the Folder where it will reside. Here you give it a name, and save it.
  17. Now you are ready to run the Co-occurrence program (called “Hamlet” on the Toolbar).
  18. Enter the text file name
  19. Enter the VL name and
  20. Click on “Count Joint Frequencies …” bar at the bottom of the window.
  21. You will then get an error saying you have not yet chosen a context unit [I haven’t yet found how to set it, other than…]
  22. Click on OK, and the resulting drop-down list will give you that opportunity. This choice is an important one, as widening the context will usually increase the co-occurrences … often a good thing! I usually begin by using the sentence as the context unit.
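  23. It is safest to leave the Jaccard coefficient selected as the similarity measure. It ignores negative matches (context units in which neither category occurs), so only choose a measure that includes them if you know what you are doing!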
  24. Click on Continue, which returns you to the ‘Hamlet Joint Frequencies of Words in a Text’ window, and click on “Count Joint Frequencies for specified vocabulary” bar, and
  25. The results follow.
  26. You may now choose various options by clicking, including:
  27. Hierarchical Clustering
  28. Smallest Space (though I cannot get this to run yet! The run file has only a header and the matrix file is empty!)
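
To show what the joint-frequency step is counting, here is a rough Python sketch of the same idea, using the sentence as the context unit and the Jaccard coefficient as the similarity measure. It is not HAMLET's implementation. The vocabulary (which extends the ANIMAL and FISH example above with made-up instances and a made-up FOOD category), the crude sentence splitter and the function names are all assumptions, and only single-word instances are matched.

```python
import re
from itertools import combinations

# Hypothetical vocabulary list: category name -> instances ("synonyms").
VOCAB = {
    "ANIMAL": ["cow", "horse"],
    "FISH": ["cod", "trout"],
    "FOOD": ["grass", "hay"],
}


def split_sentences(text):
    # Crude assumption: a sentence ends at one or more of . ! ?
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def categories_in(unit, vocab):
    """Return the set of categories with at least one instance in the unit."""
    words = set(re.findall(r"[a-z']+", unit.lower()))
    return {cat for cat, instances in vocab.items() if words & set(instances)}


def joint_frequencies(text, vocab):
    """Count, per sentence, how often categories occur alone and together."""
    units = [categories_in(s, vocab) for s in split_sentences(text)]
    freq = {cat: sum(cat in u for u in units) for cat in vocab}
    joint, jaccard = {}, {}
    for a, b in combinations(sorted(vocab), 2):
        both = sum(a in u and b in u for u in units)
        either = freq[a] + freq[b] - both  # units containing a or b (or both)
        joint[(a, b)] = both
        # Jaccard: units with neither category (negative matches) play no part.
        jaccard[(a, b)] = both / either if either else 0.0
    return freq, joint, jaccard


if __name__ == "__main__":
    text = "The cow ate grass. The horse ate hay. A trout swam. The cow saw a cod."
    freq, joint, jaccard = joint_frequencies(text, VOCAB)
    for pair in joint:
        print(pair, joint[pair], round(jaccard[pair], 2))
```

In the example, ANIMAL and FOOD occur together in 2 of the 3 sentences that contain either, giving a Jaccard coefficient of about 0.67. Widening the context unit (say, to a paragraph) would put more categories in the same unit and so tend to raise the joint frequencies.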