The Ethnograph v5.03

Introduction

The developer of The Ethnograph is the American sociologist John Seidel. The impetus for writing the software came when John was working on his qualitative Ph.D. thesis and had a large amount of material to analyze. From his experience with statistics programs on mainframe computers, John had already developed some programming skills, and he began to use them to facilitate the analysis of his qualitative data. The result was The Ethnograph version 1.0, which was intended solely for personal use. People who saw John working with it realized the potential usefulness of such software and encouraged him to develop it further. At first only friends, and friends of friends, used the software, but with version 2.0 it became commercially available, and a new enterprise was born. All of this occurred during the early 1980s.

John Seidel can thus be regarded as one of the pioneers of QDA software, followed a few years later by a number of other developers. It should also be noted that the developers of QDA software, John included, each have their own ideas about qualitative data analysis, and this is reflected in the different designs of the various programs. For John, qualitative data analysis essentially consists of three parts:

  1. Noticing things,
  2. Collecting instances of these things and
  3. Thinking about these things.

These elements can be described as follows:

  1. When reading your data, you will start noticing things. You are likely to mark those things in the margins and add a few notes.
  2. Over time, you will find other instances of the same noteworthy things in your data and will start collecting them under the same name. Once you have reached this phase, you are already in the middle of the coding process, which becomes increasingly refined as the analysis progresses.
  3. The next logical step is to think about the things you have noticed and collected. To do so, you need to find and retrieve these instances in your data. In this process the underlying structure of your data becomes more and more apparent. You will be able to see sequences, patterns, hierarchies, wholes, etc. that were previously hidden in the mass of your data. While thinking about your data, you might want to go back to the original, unfragmented text, re-code some passages, add new code words or get rid of old ones. A computer program has to accommodate all of this by allowing for the flexibility that is inherent in qualitative data analysis.

For a more detailed description of the "John Seidel method", see http://www.qualisresearch.com and follow the link "QDA paper".

This is how John has translated these ideas into the software package The Ethnograph:

  1. Noticing things = Creating a project and reading your data files

If you want to use a software package to analyze your qualitative data, your first task after collecting the data will be to prepare one or more data files. Your data can be prepared in any commonly used word processor, such as MS Word, WordPerfect or AmiPro. To work with The Ethnograph, however, your data must be formatted in a particular way: an Ethnograph data file has a line length of 40 characters and hanging paragraph indents, and all of your data must be saved as ASCII text files. If you have already transcribed your data, or if you find the formatting too complicated, you can also use the Ethnograph Editor to create or reformat your data files. Please also refer to the Tips & Tricks section: Important notes on transcribing your data.
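
The layout rule can be illustrated with a short Python sketch. It is not The Ethnograph's own formatting routine: it simply wraps each paragraph to 40 characters with Python's standard `textwrap` module, assumes a four-space hanging indent (the actual indent width is not stated here), and refuses text that cannot be saved as ASCII.

```python
import textwrap

LINE_WIDTH = 40  # characters per line, as stated in the text
INDENT = 4       # hypothetical width of the hanging indent

def format_paragraph(text: str) -> list[str]:
    """Wrap one paragraph with a hanging indent: the first line is
    flush left, continuation lines are indented."""
    return textwrap.wrap(
        text,
        width=LINE_WIDTH,               # the width already includes the indent
        subsequent_indent=" " * INDENT,
    )

def format_data_file(paragraphs: list[str]) -> str:
    """Join the wrapped paragraphs into one plain-text data file."""
    lines: list[str] = []
    for paragraph in paragraphs:
        lines.extend(format_paragraph(paragraph))
    text = "\n".join(lines) + "\n"
    text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII characters
    return text

print(format_data_file([
    "ANGIE: I moved into the neighbourhood in 1998, and at that time it was still very quiet."
]))
```

Checking the ASCII requirement at the end, rather than silently dropping characters, makes accented names or typographic quotes from a word processor visible before the file is imported.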

After you have prepared your data files, the next step is to create a project and import your data files into the program. If you choose to type your data directly into the Ethnograph Editor, you need to create a project first. When a data file is imported, each of its lines is numbered consecutively. You can now embark on reading your data files, either on a hard copy or on screen.
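
Conceptually, importing attaches a consecutive number to every line. The following minimal Python sketch (the function names are hypothetical, not part of The Ethnograph) numbers the lines of a file, enforces the 9,999-line maximum described in the coding section below, and prints the numbers in a four-digit column.

```python
MAX_LINES = 9_999  # maximum length of a data file, as stated in the text

def number_lines(text: str) -> list[tuple[int, str]]:
    """Pair every line with a consecutive number starting at 1."""
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        raise ValueError(f"data file has {len(lines)} lines; the limit is {MAX_LINES}")
    return list(enumerate(lines, start=1))

def render_numbered(numbered: list[tuple[int, str]]) -> str:
    """Show the numbers right-aligned in a 4-digit column (enough for 9,999)."""
    return "\n".join(f"{number:>4} {line}" for number, line in numbered)

print(render_numbered(number_lines("MOD: Who would like to start?\nANGIE: I will.")))
```

These numbers are what the start and stop lines of coded segments refer to, so they must stay stable once coding has begun.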

  2. Collecting things = Coding data files

When reading through your data, you will soon start noticing multiple instances of certain occurrences that you will want to mark with a code word. With the help of the line numbers, you can identify the start and stop lines of these data segments and assign "code words" to them. This can be done either on a paper copy or directly on screen, using the Code a Data File Procedure. The smallest data segment you can code is one line (40 characters); the longest can span up to 9,999 lines, which is also the maximum length of a data file and corresponds to approximately 200 pages of text. It is unlikely that your data files will be that long: as a rule of thumb, a one-hour interview is about 20-30 pages in length. Each coded segment can be defined by up to 12 code words, and these code words can be nested or overlapped up to 7 levels deep.
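
The limits above can be expressed as a small data structure. This Python sketch is a hypothetical model, not The Ethnograph's internal format: a segment is a start/stop line range with its code words, and "nested or overlapped up to 7 levels deep" is read as "no line is covered by more than seven segments", which is an interpretation rather than a documented rule.

```python
from dataclasses import dataclass, field

MAX_LINES = 9_999           # maximum length of a data file
MAX_CODES_PER_SEGMENT = 12  # code words per coded segment
MAX_DEPTH = 7               # nesting/overlap depth

@dataclass
class Segment:
    start: int  # first coded line (1-based, inclusive)
    stop: int   # last coded line (inclusive); start == stop is a one-line segment
    codes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.start <= self.stop <= MAX_LINES:
            raise ValueError(f"invalid line range {self.start}-{self.stop}")
        # Assumption: a segment needs at least one code word.
        if not 1 <= len(self.codes) <= MAX_CODES_PER_SEGMENT:
            raise ValueError(f"a segment takes 1 to {MAX_CODES_PER_SEGMENT} code words")

def max_depth(segments: list[Segment]) -> int:
    """Largest number of segments covering any single line (sweep over line events)."""
    events = []
    for seg in segments:
        events.append((seg.start, 1))      # segment opens on its start line
        events.append((seg.stop + 1, -1))  # and closes after its stop line
    depth = deepest = 0
    # At equal line numbers, closings (-1) sort before openings (+1).
    for _, change in sorted(events):
        depth += change
        deepest = max(deepest, depth)
    return deepest

def check_depth(segments: list[Segment]) -> None:
    """Reject a coding scheme that stacks more than MAX_DEPTH segments on one line."""
    if max_depth(segments) > MAX_DEPTH:
        raise ValueError(f"codes are nested or overlapped more than {MAX_DEPTH} levels deep")
```

For example, segments on lines 1-10, 5-8 and 8-12 give a depth of 3, because line 8 lies inside all three.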

  3. Thinking about things = Searching data files

Coded data can be searched, retrieved and displayed in a number of ways. A search can be very simple, i.e. based on a single code word. However, it is also possible to build more complex search requests consisting of a string of up to five code words. The code words in such search strings can be linked by "or" and "not" operators. An "and" operator can be simulated by running a combination of "or" and "not" searches.
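
The remark about simulating "and" follows from basic set algebra. In the Python sketch below, each code word is modelled, purely for illustration, as the set of line numbers it covers (The Ethnograph itself retrieves coded segments). An AND search for A and B then equals "A not (A not B)", or, using "or" as well, the complement of "(not A) or (not B)" with respect to all lines of the file (De Morgan's law).

```python
def search_or(a: set[int], b: set[int]) -> set[int]:
    """Lines coded by A, by B, or by both."""
    return a | b

def search_not(a: set[int], b: set[int]) -> set[int]:
    """Lines coded by A but not by B."""
    return a - b

def simulated_and(a: set[int], b: set[int]) -> set[int]:
    """A NOT (A NOT B): removing from A every line that lacks B leaves A AND B."""
    return search_not(a, search_not(a, b))

def simulated_and_de_morgan(a: set[int], b: set[int], all_lines: set[int]) -> set[int]:
    """ALL NOT ((ALL NOT A) OR (ALL NOT B)), i.e. De Morgan's law."""
    return search_not(all_lines, search_or(search_not(all_lines, a), search_not(all_lines, b)))

coded_a = set(range(1, 21))   # code A on lines 1-20
coded_b = set(range(15, 31))  # code B on lines 15-30
print(sorted(simulated_and(coded_a, coded_b)))
```

Both simulations yield lines 15-20, the lines that carry both codes.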

Retrieving relevant text segments makes it easier to see things in your data and to think about them. For example, it helps the researcher recognize relationships that previously went unnoticed because they were obscured by too much noise in the data.

Features of The Ethnograph v5.03

Coding

It is possible to code your data interactively, which means that you can view and read the text of your data files on screen while you are coding. You can choose between two modes of entering code words: the Code Set method and the Quick Code method. There is no fixed rule governing which mode to use; it depends on your personal coding style and preferences. While working with The Ethnograph, you will quickly find out which mode suits you best.

For each coded segment the boundaries are marked in the margin and the code word appears one line above the coded text segment. It is also possible to print the coded version of your data file including the code words and the boundaries in the margin.

Code words can be conveniently selected from either the Code Book or the Tree View, both of which are created and updated automatically as you code. Within the Coding Window, simple text search and code search functions are available.

Memos

Three types of memos are available: Text Memos, Project Memos and File Memos. You can attach up to 26 memos to each line in a data file, up to 1,000 memos to a project and up to 1,000 memos to each data file. Each memo can be up to 32 pages long and is date- and time-stamped. In addition, the author of a memo can be recorded (important for teamwork), memo categories can be defined, and each memo can be marked with three code words. Memo categories, code words, line numbers, date and time of creation/modification, author's name and memo title can all be used as sorting criteria.
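
The memo attributes and sorting options can be pictured as a simple record type. This Python sketch is hypothetical (the field names are illustrative, not The Ethnograph's); it enforces only the three-code-word limit and sorts by any combination of attributes.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter
from typing import Optional

@dataclass
class Memo:
    title: str
    author: str                  # useful for teamwork
    category: str
    created: datetime
    kind: str = "text"           # "text", "project" or "file"
    line: Optional[int] = None   # first line of the segment; text memos only
    codes: tuple[str, ...] = ()  # at most three code words

    def __post_init__(self) -> None:
        if len(self.codes) > 3:
            raise ValueError("a memo can be marked with at most three code words")

def sort_memos(memos: list[Memo], *keys: str) -> list[Memo]:
    """Sort by one or more attribute names, e.g. sort_memos(m, "author", "created").
    Sorting by "line" only works when every memo has a line number."""
    return sorted(memos, key=attrgetter(*keys))
```

Passing several keys gives a nested sort: `sort_memos(memos, "author", "created")` groups memos by author and orders each author's memos chronologically.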

All memos can be accessed from a variety of places in the project. Text memos, for example, are identified by the letter M displayed next to the first line number of the text segment to which the memo refers; double-clicking the M opens the memo window. Text memos can also be chosen as a form of output in the search procedure (see below).

Code Book

When you code your data files, a Code Book is created automatically. Because code words can be at most 10 characters long (a necessary restriction, as it would otherwise be impossible to display all of your code words within the text), you can use the Code Book to write longer definitions for them. A definition can be up to 500 characters, roughly half a page in length. Writing such definitions forces you to be precise about the code words you use, which is essential when you are working in a team. It also helps you recall the various stages of the analysis when you come to write your report.

The Code Book also lets you structure your coding schema by assigning parent and child codes. This determines how the coding schema is displayed in the Tree View.

Tree View

The Tree View displays the coding structure in the form of a sideways tree (similar to the display of files and folders in Windows Explorer). You can view either the tree alone or the Code Book and the tree side by side. The structure of the tree is based on your definition of parent and child codes: parent codes are higher-level codes and child codes are lower-level codes. A code can be both the parent of a subgroup of codes and the child of a superordinate code.

Parent and child codes can be assigned either in the Code Book or simply by dragging and dropping in the Tree View. By collapsing or expanding the various branches of the Tree View, you can focus on a particular area of the coding schema. This can also be helpful when selecting codes from the Tree View in the Code and Search Procedure.

Editor

The Editor supports the user in creating and reformatting data files in The Ethnograph format. As a result, the special data file format required by The Ethnograph is no longer a problem, especially considering the additional functionality this format offers (see Search Procedure and Search Filters and Variables).

The Editor is not a full-featured word processor, but it does offer a spell checker.

The Search Procedure

The Ethnograph Search Tool neatly integrates the various search options and filters that can be used in a search. You can search for segments coded by a single code word or by multiple code words. When you use multiple code words in a search, the segments you are looking for can overlap, be nested, or co-occur within a specified range of lines (proximity search). When looking for overlapping codes, you can generate output that shows either only the overlapping lines (Small Picture View) or all of the lines coded by the code words used in the search (Big Picture View).
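
The relations behind these searches can be stated precisely for two segments given as inclusive line ranges. This Python sketch is an illustration, not The Ethnograph's search engine; in particular, the proximity rule (at most `distance` lines lying between the two segments) is an assumed definition.

```python
from typing import NamedTuple, Optional

class Span(NamedTuple):
    start: int  # first coded line (inclusive)
    stop: int   # last coded line (inclusive)

def overlaps(a: Span, b: Span) -> bool:
    """The two segments share at least one line."""
    return a.start <= b.stop and b.start <= a.stop

def nested(inner: Span, outer: Span) -> bool:
    """`inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.stop <= outer.stop

def within(a: Span, b: Span, distance: int) -> bool:
    """Assumed proximity rule: at most `distance` lines lie between the
    segments (the gap is negative when they overlap)."""
    gap = max(a.start, b.start) - min(a.stop, b.stop) - 1
    return gap <= distance

def small_picture(a: Span, b: Span) -> Optional[Span]:
    """Only the lines coded by both segments."""
    if not overlaps(a, b):
        return None
    return Span(max(a.start, b.start), min(a.stop, b.stop))

def big_picture(a: Span, b: Span) -> Optional[Span]:
    """All lines coded by either of two overlapping segments."""
    if not overlaps(a, b):
        return None
    return Span(min(a.start, b.start), max(a.stop, b.stop))
```

With code A on lines 10-20 and code B on lines 15-30, the Small Picture is lines 15-20 and the Big Picture is lines 10-30.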

It is also possible to define queries that look for text segments coded by code word "A" but NOT by code word "B". Another option is to look for sequences in your data, for example to find out whether certain parts of events always occur in the same order (first A, then B, then C) or whether certain stories are always told in the same way (first A, then C, then B). If you have formulated a number of complex queries, you can save the Multiple Code Word Entry Screen and call it up again for later searches. Saved queries can also be edited.
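
A sequence search can be reduced to an ordering check. In this hypothetical Python sketch, the segments of one data file are given as (start line, code word) pairs; the codes are ordered by start line, and the pattern must appear in that order, though not necessarily back to back. Checking "always" means running the test on every file.

```python
def codes_in_order(segments: list[tuple[int, str]]) -> list[str]:
    """Code words ordered by the start line of their segments
    (ties are broken alphabetically, an arbitrary choice)."""
    return [code for _, code in sorted(segments)]

def occurs_in_sequence(segments: list[tuple[int, str]], pattern: list[str]) -> bool:
    """True if the codes in `pattern` occur in this order, gaps allowed."""
    remaining = iter(codes_in_order(segments))
    # `in` consumes the iterator up to each match, so the order is enforced.
    return all(code in remaining for code in pattern)

def always_in_sequence(files: list[list[tuple[int, str]]], pattern: list[str]) -> bool:
    """True if the pattern holds in every data file."""
    return all(occurs_in_sequence(segments, pattern) for segments in files)

story = [(40, "CONFLICT"), (12, "SETTING"), (75, "RESOLVE")]
print(occurs_in_sequence(story, ["SETTING", "CONFLICT", "RESOLVE"]))
```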

The Parent/Child structure of the coding schema allows you to use the equivalent of a semantic DOWN operator (i.e. CODE A with kids).
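
Expanding a code "with kids" amounts to collecting the code and its descendants in the parent/child tree before running the search. The Python sketch below assumes the tree is stored as a hypothetical mapping from each parent code to its child codes, and it includes grandchildren as well; whether The Ethnograph descends more than one level is not stated here.

```python
def with_kids(code: str, children: dict[str, list[str]]) -> set[str]:
    """Return `code` together with all of its descendants."""
    found = {code}
    stack = [code]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:  # also guards against accidental cycles
                found.add(child)
                stack.append(child)
    return found

def search_with_kids(code: str, children: dict[str, list[str]],
                     code_lines: dict[str, set[int]]) -> set[int]:
    """Lines coded by `code` or by any of its descendants."""
    lines: set[int] = set()
    for c in with_kids(code, children):
        lines |= code_lines.get(c, set())
    return lines

tree = {"FAMILY": ["PARENTS", "SIBLINGS"], "PARENTS": ["MOTHER", "FATHER"]}
print(sorted(with_kids("FAMILY", tree)))
```

Here PARENTS is both a child of FAMILY and the parent of MOTHER and FATHER, matching the description of the Tree View.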

In addition to codes, speaker and/or section identifiers can be used in a search (e.g. give me all speech turns of the focus group participant Angie; give me all text segments marked by the identifier "event 9/11"). The Identifier Search is one of the features made possible by the special formatting requirements.
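
An identifier search works because speech turns are marked in the text itself. The Python sketch below assumes a hypothetical convention in which a turn starts with an upper-case name followed by a colon at the beginning of a line (for example `ANGIE:`), with continuation lines indented; The Ethnograph's actual identifier syntax is not described here. It returns the line range of every turn per speaker.

```python
import re

# Hypothetical convention: a turn begins with an upper-case name and a colon
# at the very start of a line, e.g. "ANGIE: ..."; indented lines continue it.
SPEAKER = re.compile(r"^([A-Z][A-Z0-9_]*):")

def speech_turns(lines: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Map each speaker identifier to the (start, stop) line ranges of its turns.
    Lines are numbered from 1; text before the first identifier is ignored."""
    turns: dict[str, list[tuple[int, int]]] = {}
    current, start = None, 0
    for number, line in enumerate(lines, start=1):
        match = SPEAKER.match(line)
        if match:
            if current is not None:  # close the previous turn
                turns.setdefault(current, []).append((start, number - 1))
            current, start = match.group(1), number
    if current is not None:          # close the last turn at end of file
        turns.setdefault(current, []).append((start, len(lines)))
    return turns
```

"All speech turns of Angie" is then `speech_turns(lines)["ANGIE"]`, a list of line ranges that can be retrieved just like coded segments.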

Search Filters and Variables

Identifiers, Face- and Identifier Sheet Variables and File Codes can be used as filters in a search. As mentioned above, Identifiers are defined in the process of transcribing your data. A common use of identifiers is to mark speaker turns, e.g. the name of the interviewee or the names of participants in a focus group. Those identifiers are referred to as Speaker Identifiers. If your data are, for example, letters or newspaper articles, you can identify them by date, source, etc. These types of identifiers are called Section Identifiers. When employed as part of a search, both Speaker and Section Identifiers can be used as if they were code words.

For each of the identifiers you can enter face or identifier sheet variables. These are likely to be demographic variables such as gender, age, occupation and so on. In practice, this means that you can, for instance, retrieve all comments about a particular topic made by your female interviewees between the ages of 20 and 30 with an income of at least $30,000 per year. All of the search options described above can also be applied to such a query.
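
The example query can be written as a filter over identifier records. In this Python sketch the record layout, the segment tuples and the sample values are hypothetical; the query first selects the matching identifiers and then keeps only the coded segments attributed to them.

```python
from dataclasses import dataclass

@dataclass
class FaceSheet:
    identifier: str  # speaker identifier, e.g. "ANGIE"
    gender: str
    age: int
    income: int      # annual income in dollars

def select_speakers(sheets: list[FaceSheet], gender: str,
                    age_min: int, age_max: int, income_min: int) -> set[str]:
    """Identifiers whose variables satisfy every condition (an AND of all tests)."""
    return {
        s.identifier
        for s in sheets
        if s.gender == gender and age_min <= s.age <= age_max and s.income >= income_min
    }

def filter_segments(segments: list[tuple[str, str, int, int]],
                    speakers: set[str], code: str) -> list[tuple[str, str, int, int]]:
    """Keep (speaker, code, start, stop) segments with the given code and speaker."""
    return [seg for seg in segments if seg[1] == code and seg[0] in speakers]

sheets = [FaceSheet("ANGIE", "F", 24, 32000), FaceSheet("DANA", "F", 29, 28000)]
print(select_speakers(sheets, "F", 20, 30, 30000))
```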

File codes are a new type of filter. They are similar to face sheet variables in that they are attached to an entire data file. You could, for example, organize your data files by attaching a file code MALE to all data files with male interviewees and a file code FEMALE to those with female interviewees. Like face sheet variables, file codes can be combined by AND, OR and NOT operators. The difference is that you can assign up to 16,000 file codes to a data file but only 40 face sheet variables. Face sheet variables, however, allow a more fine-grained search because they can hold continuous numeric variables (e.g. all text segments coded by Code A for all females aged 25-32).
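
Because file codes are attached to whole files, combining them is plain set logic. The Python sketch below uses hypothetical file names and codes: a file is selected if it carries all `require` codes (AND), at least one `any_of` code when that list is given (OR), and none of the `exclude` codes (NOT). Numeric ranges such as "aged 25-32" are exactly what this presence/absence model cannot express, which is where face sheet variables come in.

```python
def select_files(file_codes: dict[str, set[str]], require=(), any_of=(), exclude=()) -> set[str]:
    """Names of the data files whose file codes satisfy all three conditions."""
    selected = set()
    for name, codes in file_codes.items():
        if not set(require) <= codes:              # AND: every required code present
            continue
        if any_of and not (set(any_of) & codes):   # OR: at least one of these present
            continue
        if set(exclude) & codes:                   # NOT: none of these present
            continue
        selected.add(name)
    return selected

files = {
    "int01.txt": {"FEMALE", "URBAN"},
    "int02.txt": {"MALE", "URBAN"},
    "int03.txt": {"FEMALE", "RURAL"},
    "int04.txt": {"MALE", "RURAL", "PILOT"},
}
print(sorted(select_files(files, require=["MALE"], exclude=["PILOT"])))
```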