A CONCISE GUIDE TO THE SAS STATISTICAL PACKAGE
Version 9.3 and 9.4
Professor Thornton
Economics 515
Econometrics
INTRODUCTION
This guide provides an overview of the SAS statistical package and an explanation of a number of useful SAS commands and capabilities. It does not explain all SAS commands and capabilities. SAS is an extremely powerful statistical package. If you want to learn more about what it can do, consult the appropriate SAS user's guide, or purchase one of the many SAS companion books available in bookstores that explain various facets of the SAS system in more detail.
DATA SETS
In this guide, SAS commands are explained in the context of examples. The examples are based on the following data sets. It is assumed that each data set is contained in a file on a memory stick in drive E. If your data files are on drive C, or on a memory stick in a different drive such as drive F, modify the examples below accordingly (e.g., replace the letter E with the letter C or F).
WAGEDATA
The data file DATA7-2 comes with the Ramanathan econometrics textbook. It consists of a cross-section of 49 workers. The variables are WAGE = monthly wage, EDUC = years of education beyond the eighth grade, EXPER = years of experience, AGE = age of worker, GENDER = indicator variable for gender (1 if male, 0 if female), RACE = indicator variable for race (1 if white, 0 if nonwhite), CLERICAL = indicator variable for clerical worker (1 if clerical worker, 0 otherwise), MAINT = indicator variable for maintenance worker (1 if maintenance worker, 0 otherwise), CRAFTS = indicator variable for crafts worker (1 if crafts worker, 0 otherwise).
CPS85
The data file CPS85 consists of 526 randomly selected employed workers from the May 1985 Current Population Survey conducted by the Department of Commerce. This survey of over 50,000 households is conducted monthly, and it serves as the basis for the national employment and unemployment statistics. The variables are: ED = years of education, SOUTH = dummy variable (1 if worker lives in the South, 0 otherwise), NONWH = dummy variable (1 if worker is nonwhite, 0 otherwise), HISP = dummy variable (1 if worker is Hispanic, 0 otherwise), FE = dummy variable (1 if worker is female, 0 otherwise), MARR = dummy variable (1 if worker is married with spouse present in household, 0 otherwise), MARRFE = dummy variable (1 if worker is a married female with spouse present in household, 0 otherwise), EX = years of labor market experience, UNION = dummy variable (1 if worker has a union job, 0 otherwise), WAGE = average hourly earnings in constant 2003 dollars, AGE = age in years, MANUF = dummy variable (1 if worker works in manufacturing industry, 0 otherwise), CONSTR = dummy variable (1 if worker works in construction industry, 0 otherwise), MANAG = dummy variable (1 if worker is managerial or administrative, 0 otherwise), SALES = dummy variable (1 if worker is in sales, 0 otherwise), CLER = dummy variable (1 if worker is a clerical worker, 0 otherwise), SERV = dummy variable (1 if worker is a service worker, 0 otherwise), PROF = dummy variable (1 if worker is professional or technical, 0 otherwise).
MACROCON
The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The variables are YEAR = year, CONS = annual consumption spending in billions of dollars, DISINC = annual disposable income in billions of dollars, PRICE = consumer price index, PRIME = the prime interest rate, UN = unemployment rate.
DEMAND
The data file DEMAND consists of prices and quantities purchased of three goods, and income, for a cross section of 30 individual consumers. These data are simulated, not real world data. The variables are: Q1 = quantity purchased of good 1, Q2 = quantity purchased of good 2, Q3 = quantity purchased of good 3, P1 = price of good 1, P2 = price of good 2, P3 = price of good 3, I = consumer income.
PRODUCER
The data file PRODUCER consists of cross-section data for 92 dairy farm households for the year 1986. These data were obtained from a random sample of Utah dairy farmers in five counties that were the major dairy production centers. The variables are: OUTPUT = pounds of milk produced per year, LABOR = hours worked per year by household members, CAPITAL = units of capital, LAND = units of land, PCAPITAL = price per unit of capital, PLAND = price per unit of land, POUTPUT = price per pound of milk, PLABOR = hourly wage of labor. Note that the price of labor and the price of land do not vary across dairy farms; i.e., all 92 dairy farms can purchase labor and land at the same price.
LABOR
The data file LABOR consists of cross-section data for 100 families taken from the 1976 Panel Study of Income Dynamics, and is based on data for the year 1975. The variables are: LFP = a dummy variable for wife labor force participation (1 if wife worked in 1975, 0 otherwise), WHRS = wife’s hours of work in 1975, KL6 = number of children less than 6 years old in household, K618 = number of children between 6 and 18 in the household, WA = wife’s age, WE = wife’s years of education, WW = wife’s hourly wage for 1975, HHRS = husband’s hours worked in 1975, HA = husband’s age, HE = husband’s years of education, HW = husband’s hourly wage rate for 1975, FAMINC = total family income for 1975, MTR = marginal tax rate for wife, WMED = wife’s mother’s years of education, WFED = wife’s father’s years of education, UN = unemployment rate in county of residence (percentage), CIT = dummy variable for urban area (1 if family lives in large city, 0 otherwise), AX = wife’s years of labor market experience.
BACKGROUND INFORMATION
SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS allows you to read data in a variety of formats, transform the data to conduct statistical analyses, analyze the data, and present the results.
A SAS program has two major components: Data Steps and Procedures. The data step allows you to read SAS data sets or raw data, perform transformations on the data, create new variables, and recode existing variables. The data step is the component of the program that creates SAS datasets. The procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and procedures consist of one or more statements. A statement is usually identified by a keyword that suggests the statement’s function (e.g., DATA, SET, PROC REG, PROC MEANS, RUN). Every statement ends with a semicolon.
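To see the two components side by side, here is a minimal sketch of a complete program: a DATA step that builds a small temporary dataset from values typed directly into the program, followed by a PROC step that summarizes it. The dataset name DEMO, the variables X and Y, and their values are made up for illustration.

```sas
/* DATA step: creates a temporary dataset named DEMO.             */
/* The values below are hypothetical, used only to show structure. */
data demo;
  input x y;        /* read two numeric variables from each data line */
  datalines;
1 2
3 5
4 7
;
run;

/* PROC step: analyzes the dataset that the DATA step created */
proc means data=demo;
  var x y;
run;
```

Each statement, including DATA, INPUT, PROC MEANS, VAR, and RUN, ends with a semicolon; the lines after DATALINES are data, not statements.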
EXECUTING A SAS PROGRAM
A SAS program can be executed in different ways. The two most important ways are batch mode and interactive windows mode. In batch mode, you use a text editor (such as Microsoft WordPad) to write a SAS program and save it as a plain-text input file (.txt). You then tell SAS to execute the program in the input file and place the resulting output in an output file. You then use a text editor to view the output file.
In interactive windows mode, you type SAS statements in a Program Editor window. When SAS statements are executed, the output is displayed in an Output window. A Log window is also displayed that contains the log for any SAS statements that are executed. The Log window is very useful in writing SAS programs. The log is displayed whether the program works or not. It repeats the SAS statements that are executed, documents any SAS datasets that are created, warns you about potential problems with your program, and reports errors such as incorrect syntax.
This guide explains how to create and execute SAS programs in interactive windows mode using the Program Editor.
CREATING A SAS DATASET
The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can be used to read raw data into a SAS dataset. This process is called importing. The raw data used to create a SAS dataset can be in a number of different formats and locations. This guide explains how to import an Excel file, create a temporary SAS dataset, create a permanent SAS dataset that you can save for future use in a SAS library, and access a SAS dataset stored in the library.
Example
The Excel file WAGEDATA has 49 observations on 9 variables. The names of the variables are WAGE, EDUC, EXPER, AGE, GENDER, RACE, CLERICAL, MAINT, CRAFTS. You want to create a temporary SAS dataset named EARNINGS, and a SAS library named ECON415. You then want to save the temporary dataset EARNINGS as a permanent SAS dataset also named EARNINGS.
If you are using SAS 9.3, you can import the Excel file directly. If you are using SAS 9.4, you must first save the Excel file as a CSV (Comma delimited) file. To do this in Excel, click File on the menu bar in the upper left-hand corner, then click Save As. In the Save as type box, scroll down the list of file types, click CSV (Comma delimited), and click Save.
Now launch SAS. On the menu bar in the upper left-hand corner, click File, then click Import Data… Under Select a data source from the list below, Microsoft Excel Workbook should appear; if it does not, find it in the list of choices. (If you saved a CSV file, choose Comma Separated Values (*.csv) instead and point the wizard to that file.) Click Next. In the box next to Workbook, enter the name and location of the file you want to import; in this example, E:\wagedata. SAS then asks What table do you want to import? This is the name of the worksheet in the Excel file you are importing. The Excel file WAGEDATA has only one worksheet, named data, which should already appear in the box; if it does not, select it. Click Next. In the dialog box that appears, enter the name you want to give to the temporary SAS dataset you are creating: earnings. Click Finish.
To verify that you have successfully created a temporary SAS dataset named EARNINGS, click the Explorer button on the toolbar, click the Work icon, and then click the earnings icon.
To create a new SAS library named ECON415, click Tools on the menu bar in the upper left-hand corner, then click New Library. In the box next to Name, enter ECON415. In the box next to Path, enter E:\. Click OK.
To save EARNINGS permanently, click the Explorer button on the toolbar and click Work. Use the mouse to drag the dataset named EARNINGS from the folder named Work to the folder named Econ415. To verify that you have successfully created a permanent SAS dataset named EARNINGS, click the folder Econ415.
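If you prefer typed statements to menus, the same result can probably be obtained with PROC IMPORT, a LIBNAME statement, and a DATA step. The sketch below assumes SAS 9.4 and a CSV copy of the file saved as E:\wagedata.csv whose first row holds the variable names; the file name and path are assumptions, so adjust them to match your own file.

```sas
/* Assumption: the CSV copy of WAGEDATA is e:\wagedata.csv and */
/* its first row contains the variable names.                  */
proc import datafile="e:\wagedata.csv"
            out=work.earnings   /* temporary dataset EARNINGS            */
            dbms=csv
            replace;            /* overwrite EARNINGS if it already exists */
  getnames=yes;                 /* take variable names from the first row  */
run;

/* Define the library ECON415 at the root of drive E */
libname econ415 'e:\';

/* Copy the temporary dataset into the library to make it permanent */
data econ415.earnings;
  set work.earnings;
run;
```

The final DATA step does the same job as dragging EARNINGS from Work to Econ415 in the Explorer window.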
ACCESSING A PERMANENT SAS DATASET
The following example explains how to load a permanent SAS dataset that you have created and create new temporary or permanent SAS datasets from it.
Example
You want to access the dataset named EARNINGS which is stored in the library named ECON415 on a disk on drive E. You want to create a temporary SAS data set named EARN1.
In the Program Editor window type the following statements.
LIBNAME econ415 'e:\';
DATA earn1;
SET econ415.earnings;
RUN;
The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to read the permanent SAS dataset named EARNINGS that is located in the library named ECON415. To verify that you have accessed EARNINGS and created EARN1, click the Libraries icon in the Explorer window. There is now an icon for ECON415. If you click ECON415, you will see an EARNINGS icon. If you click the Work icon, you will see an icon for the temporary dataset EARN1. Note that when you end your session, the temporary dataset EARN1 will be deleted. If you want to store this new dataset permanently in the library named ECON415, replace the DATA statement above with the following DATA statement:
DATA econ415.earn1;
If you want to save all changes made in the current session to the permanent SAS dataset named EARNINGS, replace the DATA statement above with the following DATA statement:
DATA econ415.earnings;
In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS dataset EARNINGS with any changes that you make to the data during the current session.
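The clicks in the Explorer window can also be replaced by statements. This sketch assumes, as above, that the ECON415 library is located at the root of drive E. It writes EARN1 directly to the library and then uses PROC CONTENTS with the _ALL_ keyword to list the datasets stored there.

```sas
libname econ415 'e:\';

/* The two-level name econ415.earn1 makes EARN1 a permanent dataset */
data econ415.earn1;
  set econ415.earnings;
run;

/* List every dataset in ECON415; NODS omits the detailed */
/* variable listing for each individual dataset            */
proc contents data=econ415._all_ nods;
run;
```

The listing appears in the Output window, so you can confirm that both EARNINGS and EARN1 are stored in ECON415 without browsing the Explorer window.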
CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS
Assignment statements and logical expressions can be used for many purposes, such as creating new variables from existing variables, recoding variables, and deleting observations from the current sample. Each of these is explained below.
ASSIGNMENT STATEMENTS
Assignment statements allow you to create new variables from existing variables. Assignment statements use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition), and - (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and division (from left to right), and then addition and subtraction (from left to right). The natural logarithm is computed with the LOG function.
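A quick way to confirm the order of operations is to compute a few expressions whose results you can check by hand. This sketch stores them in a small dataset named PRECEDENCE (a name chosen for illustration) and also writes them to the Log window with a PUT statement.

```sas
data precedence;
  a = 2 + 3 * 4;     /* multiplication before addition: 14         */
  b = (2 + 3) * 4;   /* parentheses evaluated first: 20            */
  c = 2 * 3 ** 2;    /* exponentiation before multiplication: 18   */
  d = 12 / 3 * 2;    /* equal precedence, left to right: 8         */
  e = 10 - 4 - 3;    /* equal precedence, left to right: 3         */
  put a= b= c= d= e=;   /* write the results to the Log window     */
run;
```

When in doubt, add parentheses; they cost nothing and make the intended order explicit.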
Example
You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains all the variables in EARNINGS plus additional variables that you want to create.
LIBNAME econ415 'e:\';
DATA earn1;
SET econ415.earnings;
logwage = log(wage);
yearwage = wage*12;
daywage = wage / 30;
agesq = age**2;
agecub = age**3;
toteduc = educ + 8;
RUN;
SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.
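You can try these assignment statements without the ECON415 library by running them on a few made-up observations typed into the program. In this sketch the dataset names EARNINGS_DEMO and EARN1_DEMO and all data values are hypothetical. The third observation has a wage of 0 to show that LOG returns a missing value, and writes a note to the log, when its argument is not positive.

```sas
/* Hypothetical stand-in for ECON415.EARNINGS (not real WAGEDATA values) */
data earnings_demo;
  input wage educ age;
  datalines;
1200 4 30
1500 8 40
0 2 19
;
run;

data earn1_demo;
  set earnings_demo;
  logwage  = log(wage);   /* missing, with a note in the log, when wage <= 0 */
  yearwage = wage*12;     /* monthly wage times 12                          */
  daywage  = wage / 30;   /* monthly wage spread over 30 days               */
  agesq    = age**2;
  toteduc  = educ + 8;    /* EDUC counts years beyond the eighth grade      */
run;

proc print data=earn1_demo;
run;
```

For the first observation you can check by hand that yearwage = 14400, daywage = 40, agesq = 900, and toteduc = 12.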
LOGICAL EXPRESSIONS
Logical expressions use conditional IF, THEN, ELSE statements, and comparison and logical operators. The comparison operators are:
Comparison                      Symbol    Mnemonic
Equal to                        =         eq
Greater than                    >         gt
Less than                       <         lt
Greater than or equal to        >=        ge
Less than or equal to           <=        le
Not equal to                    ^=        ne
In                                        in
Not in                                    not in
The logical operators are:
Logical                         Symbol    Mnemonic
And                             &         and
Or                              |         or
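Each symbol and its mnemonic are interchangeable. In an assignment statement, a comparison evaluates to 1 if it is true and 0 if it is false. The sketch below computes the same condition both ways and also uses IN and NOT IN; the dataset name OPS_DEMO and its values are made up for illustration.

```sas
data ops_demo;
  input educ age;
  /* Each comparison below evaluates to 1 (true) or 0 (false) */
  flag_sym  = (educ >= 4 & age < 40);      /* symbol form                */
  flag_mnem = (educ ge 4 and age lt 40);   /* same test, mnemonic form   */
  in_list   = (educ in (2, 4, 6));         /* 1 if educ equals any value */
  not_list  = (educ not in (2, 4, 6));     /* 1 if educ equals none      */
  datalines;
4 35
2 50
8 30
;
run;
```

In this data, flag_sym and flag_mnem agree on every observation, and in_list and not_list are always opposites.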
In the following example, a description of each logical expression and its use is given directly below the expression for ease of reference.
Example
You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new variables, recode existing variables, and delete observations from the sample to construct EARN1.
Commands
LIBNAME econ415 'e:\';
DATA earn1;
SET econ415.earnings;
This accesses the permanent SAS dataset named EARNINGS from the library named ECON415, and creates the temporary SAS dataset named EARN1.
IF educ > 4 THEN college = 1;
ELSE college = 0;
This creates a dummy variable named college that can take two values: 1 or 0. The IF THEN statement assigns a value of 1 to the variable college if the variable educ is greater than 4. The ELSE statement assigns a value of 0 to college for all other observations, including any observation with a missing value of educ, because SAS treats a missing value as smaller than any number.
IF age > 50 THEN newage = 2;
ELSE IF age > 25 THEN newage = 1;
ELSE newage = 0;
This creates a categorical variable named newage that can take three values: 2, 1, or 0. The IF THEN statement assigns a value of 2 to the variable newage if the variable age is greater than 50. The ELSE IF THEN statement assigns a value of 1 to newage if age is greater than 25 and less than or equal to 50. The ELSE statement assigns a value of 0 to newage for all remaining observations. Note that an ELSE statement must immediately follow an IF THEN statement, and each IF THEN statement can have at most one ELSE.
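LENGTH sex $ 6;
IF gender = 1 THEN sex = 'male';
ELSE sex = 'female';
This creates a character variable named sex that can take two values: male or female. The LENGTH statement gives sex room for 6 characters. It is needed because SAS sets the length of a character variable from the first value assigned to it in the program; without it, sex would have length 4 (the length of 'male') and female would be stored as fema. The IF THEN statement assigns the value male to the variable sex if the variable gender is equal to 1. The ELSE statement assigns the value female to sex for all other observations.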
IF wage > 1300;
This keeps any observation for which the variable wage is greater than 1300. It deletes all observations for which wage is 1300 or less.
IF exper = 1 THEN delete;
This deletes any observation for which the variable exper is equal to 1.
IF exper = 3 and gender = 1 then delete;
This deletes any observation for which both the variable exper is equal to 3 and the variable gender is equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.
IF educ = 11 or age >= 57 THEN delete;
This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater than or equal to 57. Note that the operator >= must be typed without a space between > and =.
IF wage = . THEN delete;
SAS represents a missing numeric value with a period (.). This deletes any observation for which the variable wage is missing.
IF age = . then age = 65;
This assigns the value 65 to the variable age for any observation in which age is missing.
RUN;
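Because SAS executes the statements in a DATA step in order, the position of the missing-value statements matters. In the program above, newage is computed before missing ages are replaced by 65, so an observation with a missing age receives newage = 0. The sketch below applies the same kinds of rules to a few hypothetical observations (the dataset name RECODE_DEMO and all values are made up), replacing missing ages before age is used and deleting observations with a missing wage.

```sas
data recode_demo;
  length sex $ 6;                  /* room for 'female'                  */
  input gender age wage;           /* a period in the data means missing */
  if wage = . then delete;         /* drop observations with no wage     */
  if age = . then age = 65;        /* replace missing age BEFORE using it */
  if age > 50 then newage = 2;
  else if age > 25 then newage = 1;
  else newage = 0;
  if gender = 1 then sex = 'male';
  else sex = 'female';
  datalines;
1 45 1400
0 .  1500
1 22 .
0 60 1250
;
run;

proc print data=recode_demo;
run;
```

Three observations remain, and the second one has age = 65 and newage = 2. If you move the statement that replaces missing ages below the newage statements, that observation gets newage = 0 instead.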
DELETING VARIABLES FROM A SAS DATASET
Example
You want to create two new permanent SAS datasets from the permanent SAS dataset named EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain the variables WAGE, EDUC.