Using the Treatment of Mild Hypertension Study (TOMHS) Data

The TOMHS data are provided as a text file organized by column: each row is an observation, and each variable is allotted one or more adjacent columns. Organizing data by column lets users read only a subset of the variables into SAS. To make sense of column-organized data, however, you need a data dictionary, a file that tells you which columns are allotted to each variable and what each variable means.
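
As a rough sketch of what reading a subset looks like, the DATA step below pulls only two variables from the file and skips everything else. The column positions and informats are taken from the example DATA step later in this section; the data set name tomhs_subset is a hypothetical name chosen for illustration.

```sas
/* Read only two variables; all other columns in each row are ignored. */
DATA tomhs_subset;                     /* hypothetical data set name */
  INFILE 'C:\SAS_Files\tomhs.data';
  INPUT @1  ptid  $10.                 /* start at column 1, read 10 columns */
        @25 group 1.;                  /* start at column 25, read 1 column  */
RUN;
```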

The TOMHS data dictionary is provided as an Excel spreadsheet, datadictionary.xls. Each row of the spreadsheet describes a single variable, and the seven columns provide different kinds of information about it. The # column gives the order in which the variables appear in the data file. The Variable column lists the variable name. The Type column notes whether the variable is text (Char) or a number (Num). The Length column tells how many columns the variable is allotted. The Pos column gives the first column allotted to the variable. The Informat column lists the SAS informat used to read the variable. The Description column gives a brief explanation of the meaning of the variable.

To read a variable, SAS needs two pieces of information: the column where the variable starts (given by the Pos column) and the number of columns it occupies (given by either the Length column or the Informat column). For example, the RANDDATE variable begins in column 14 and has a length of 10, so it occupies columns 14-23. Alternatively, if SAS is told that RANDDATE begins in column 14 and that its informat is mmddyy10., it can work out that RANDDATE occupies columns 14-23, because the width of 10 in mmddyy10. tells SAS how many columns to read.
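
The two ways of locating a variable can be tried on a small made-up file. The sketch below is not TOMHS data: the records, the variable names (id, visit_txt, visit), and the column layout are all hypothetical. It reads the same date field twice, once by column range (start plus length) and once by starting column plus informat.

```sas
/* Hypothetical records typed inline for illustration only.          */
/* Columns 1-4 hold an ID; columns 6-15 hold a date as mm/dd/yyyy.   */
DATA demo;
  INPUT id        $ 1-4          /* column range: columns 1 through 4        */
        visit_txt $ 6-15         /* column range: the date as plain text     */
        @6 visit  MMDDYY10.;     /* pointer + informat: same 10 columns,     */
                                 /* converted to a SAS date value            */
  FORMAT visit MMDDYY10.;        /* display the date value readably          */
  DATALINES;
A001 03/15/1987
A002 11/02/1987
;
RUN;

PROC PRINT DATA=demo;
RUN;
```

Column ranges only read standard character and numeric values, so the date is kept as text in visit_txt; the informat is what turns the same columns into a date that SAS can sort and do arithmetic on.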

For example, the following DATA step reads five variables from the TOMHS data file:

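DATA tomhs;
  INFILE 'C:\SAS_Files\tomhs.data';   /* location of the raw data file */
  INPUT @1  PTID     $10.             /* columns 1-10, character */
        @12 CLINIC   $1.              /* column 12, character */
        @14 RANDDATE MMDDYY10.        /* columns 14-23, date in mmddyy form */
        @25 GROUP    1.               /* column 25, numeric */
        @27 AGE      2.;              /* columns 27-28; width 2 assumed for a two-digit age, confirm in the data dictionary */
RUN;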
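
After running a DATA step like the one above, it is worth checking that the variables were read as intended. The following is a minimal sketch, assuming the step has already created a data set named tomhs in the WORK library; the choice of five observations is arbitrary.

```sas
/* List variable names, types, and lengths to compare with the data dictionary. */
PROC CONTENTS DATA=tomhs;
RUN;

/* Print the first five observations for a visual check. */
PROC PRINT DATA=tomhs (OBS=5);
  VAR ptid clinic randdate group age;
  FORMAT randdate MMDDYY10.;   /* without a format, randdate prints as a count of days since 01JAN1960 */
RUN;
```

The FORMAT statement affects only how the date is displayed; the value stored in the data set is unchanged.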

Some variables record responses to multiple-choice questions. To interpret these responses, you may need to consult the sample questionnaire (hypforms.pdf), which lists the questions and answer choices.