
CHAPTER 5

SAS DATA ANALYSIS EXAMPLES

© Alan Elliott, 2000

All rights reserved

For classroom use only

One of the best ways to learn how to use SAS is to begin with examples, then adapt them to the specific analyses you need. This chapter contains numerous examples to get you started. All of the examples listed here are also on your workbook diskette.

The examples included in this chapter can be useful in creating your own SAS programs. Select an example that is close to the task you need to perform, and use it as a starting point for your own analysis.

Examples are divided into the following categories:

•  Data Creation and Manipulation - All of the “Data” example files begin with a “D”.

•  Data Analysis - All of the “Analysis” example files begin with an “A”.

•  Graphics (both character and pixel) - All of the “Graph” example files begin with a “G”.

CHAPTER OBJECTIVE: This chapter contains a number of examples for using SAS. Included are most of the examples used in Chapters 1 to 4, plus other examples.

Data Analysis Examples

Data Creation and Manipulation

D.1 Column Input Example (DINPUTC.SAS)

D.2 Reading Data from Multiple Records (DMULTREC.SAS)

D.3 Using IF Statements to Create New Variables and Delete Records (DIF1.SAS)

D.4 Creating Several Data Sets from a Large Data Set by Subsetting (DSUBSET.SAS)

D.5 Deleting Records with Missing Values (DMISS1.SAS)

D.6 Creating a Permanent SAS Data Set (DSS1.SAS)

D.7 Using a Permanent SAS Data Set in a SAS Job (DSS2.SAS)

D.8 Examining the Contents of a Permanent Data Set (DCONT.SAS)

D.9 Sorting Data Using PROC SORT (DSORT1.SAS) & (DSORT2.SAS)

D.10 Merging Data Sets (DMERGE.SAS)

Data Analysis Examples

A.1 Data Listing using PROC PRINT (APRINT.SAS)

A.2 Paired t-Test Using PROC MEANS (APAIREDT.SAS)

A.3 Frequencies and Chi-Square Table Analysis using PROC FREQ (AFREQ1.SAS)

A.4 Chi-Square Analysis using Count Data (AFREQ2.SAS)

A.5 Independent Group T-Test Using PROC TTEST (ATTEST.SAS)

A.6 One-Way Analysis of Variance Using PROC ANOVA (AONEWAY.SAS)

A.7 Two-Way ANOVA Using PROC GLM (ATWOWAY.SAS)

A.8 Three Way ANOVA Using PROC GLM (A3WAY.SAS)

A.9 One-Way Repeated Measures ANOVA

A.10 Repeated Measures ANOVA with Group and Subject Factors

A.11 Simple Linear Regression Using PROC GLM (ASIMPREG.SAS)

A.12 Multiple Regression Using GLM (AMULTREG.SAS)

A.13 Multiple Regression Using PROC REG (AREG1.SAS)

Graphics Examples

G.1 Scatterplot Using PROC PLOT and PROC GPLOT (GPLOT1.SAS)

G.2 Linear Regression Line (GPLOT2.SAS)

G.3 SAS Graphics Examples (GEXAMP1.SAS)


D.1 Column Input Example

*******************************************************

* Reading Data Using Column Input Technique *

* DINPUTC.SAS *

*******************************************************

;

DATA TEMP;

INPUT ID 1 SBP 2-4 DBP 5-7 SEX $ 8 AGE 9-10 WT 11-13;

CARDS;

1120 80M15115

2130 70F25180

3140100M89170

4120 80F30150

5125 80F20110

RUN;

PROC PRINT DATA=TEMP;

TITLE 'Example of Column Input';

RUN;


D.2 Reading Data from Multiple Records

This example shows several ways of reading data when each record spans more than one “card” (line of data).

*******************************************************

* Reading Multiple Lines of Data *

* DMULTREC.SAS *

*******************************************************

;

*******************************************************

* Using the multiple INPUT method *

*******************************************************

;

DATA TEMP;

INPUT ID;

INPUT SBP DBP;

INPUT SEX $ AGE WT;

CARDS;

1

120 80

M 15 115

2

130 70

F 25 170

3

140 100

M 89 170

4

120 80

F 30 150

5

125 80

F 20 110

;

RUN;

PROC PRINT DATA=TEMP;

TITLE 'Reading multiple cards per record';

RUN;


*******************************************************

* Using the “/” INPUT method *

*******************************************************

;

DATA TEMP2;

INPUT ID / SBP DBP / SEX $ AGE WT; CARDS;

1

120 80

M 15 115

2

130 70

F 25 170

3

140 100

M 89 170

4

120 80

F 30 150

5

125 80

F 20 110

RUN;

PROC PRINT DATA=TEMP2;

TITLE 'Reading multiple cards per record';

RUN;


*******************************************************

* Using the “#” card number INPUT method *

*******************************************************

;

DATA TEMP3;

INPUT ID #2 SBP DBP #3 SEX $ AGE WT;

CARDS;

1

120 80

M 15 115

2

130 70

F 25 170

3

140 100

M 89 170

4

120 80

F 30 150

5

125 80

F 20 110

RUN;

PROC PRINT DATA=TEMP3;

TITLE 'Reading multiple cards per record';

RUN;


*******************************************************

* Reading Cards in Different order *

*******************************************************

;

DATA TEMP4;

INPUT #2 SBP DBP #1 ID #3 SEX $ AGE WT;

CARDS;

1

120 80

M 15 115

2

130 70

F 25 170

3

140 100

M 89 170

4

120 80

F 30 150

5

125 80

F 20 110

;

RUN;

PROC PRINT DATA=TEMP4;

TITLE 'Reading multiple cards per record';

RUN;


D.3 Using IF Statements to Create New Variables and Delete Records

********************************************************************

* Using IF Statements to Create New Variables and Delete Records *

* DIF1.SAS *

********************************************************************

;

DATA TEMP;

INPUT ID SBP DBP SEX $ AGE WT;

* The following statement creates a new variable called CATEGORY;

IF DBP > 140 THEN CATEGORY='CAT01';

ELSE CATEGORY='CAT02';

* The following statement deletes all records meeting the criteria;

IF SBP > 200 OR DBP > 150 THEN DELETE; /* Deletes extremes */

CARDS;

1 120 80 M 15 115

2 130 70 F 25 170

3 140 100 M 89 170

4 120 80 F 30 150

5 125 80 F 20 110

11 220 80 M 15 115

12 130 170 F 25 170

13 240 100 M 89 170

14 120 180 F 30 150

15 275 80 F 20 110

RUN;

PROC PRINT DATA=TEMP;

TITLE 'Example of SAS Commands';

RUN;


D.4 Creating Several Data Sets from a Large Data Set by Subsetting

********************************************************************

* Creating Several Data Sets from a large Data set by Subsetting *

* DSUBSET.SAS *

********************************************************************

;

DATA ALL;

INPUT ID SBP DBP SEX $ AGE WT;

CARDS;

1 120 80 M 15 115

2 130 70 F 25 170

3 140 100 M 89 170

4 120 80 F 30 150

5 125 80 F 20 110

;

* Create a data set containing only MALES;

DATA MALES; SET ALL;

IF SEX = 'M';

* Create a data set containing only FEMALES;

DATA FEMALES; SET ALL;

IF SEX = 'F';

* Output the results ;

PROC PRINT DATA=MALES;

TITLE 'Only the Males';

PROC PRINT DATA=FEMALES;

TITLE 'Only the Females';

RUN;

D.5 Deleting Records With Missing Values

****************************************

* Delete Records with Missing Values *

* DMISS1.SAS *

****************************************

;

DATA ALL;

INPUT AGE SEX $ FAT PROTEIN CARBO SODIUM;

* Delete records containing missing values;

IF AGE = .

OR SEX = ' '

OR PROTEIN = .

OR CARBO = .

OR SODIUM = .

THEN DELETE;

CARDS;

25 M 40 40 109 1396

26 M 47 46 125 1731

38 M 42 40 104 1431

42 M 48 46 123 1711

65 M 41 41 112 1630

68 M 34 33 96 1192

20 F 39 29 118 1454

30 F 40 40 115 1532

28 F . . . .

60 F 39 40 123 1585

PROC PRINT;

RUN;


D.6 Creating a Permanent SAS Data Set

****************************************

* Creating a Permanent SAS Data Set *

* DSS2.SAS *

****************************************

;

libname mydata 'a:\'; *Points to a directory on disk;

DATA mydata.dixon; *Give the SAS data set a name;

INFILE 'a:\dixon.dat'; *Tell the name of the file containing data;

INPUT /* Define where each column (variable) is located in the file */

GROUP $ 1-1

AGE_1952 2-7

SOCIO 8-13

SBP_52A 14-19

DBP_52A 20-25

SBP_62B 26-31

DBP_62B 32-37

BLDCHL52 38-43

BLDCHL62 44-49

HEIGHT52 50-55

WEIGHT52 56-61

PULSE62 62-67

CORONARY 68-73

;run;


D.7 Using a Permanent SAS Data Set in a SAS Job

****************************************

* Using a Permanent SAS Data Set *

* DSS2USE.SAS *

****************************************

;

libname mydata 'a:\'; *Points to a directory on disk;

data myname; *Names temporary data set to be used;

set mydata.dixon; *Tells the name of the permanent data set;

* in the library to use (mydata.dixon);

PROC MEANS DATA=MYNAME;

TITLE 'Listing of Dixon and Massey Data Set';

RUN;


D.8 Examining The Contents of a Permanent Data Set

****************************************

* Examining a Permanent SAS Data Set *

* DCONT.SAS *

****************************************

;

LIBNAME MYDATA 'a:\';

PROC DATASETS LIBRARY=MYDATA;

CONTENTS DATA= _ALL_ ;

RUN;


D.9 Sorting Data Using PROC SORT

********************************************************************

* Sorting Example - Sort and perform analysis using BY statement *

* DSORT1.SAS *

********************************************************************

;

DATA ALL;

INPUT AGE SEX $ FAT PROTEIN CARBO SODIUM;

CARDS;

25 M 40 40 109 1396

26 M 47 46 125 1731

38 M 42 40 104 1431

42 M 48 46 123 1711

65 M 41 41 112 1630

68 M 34 33 96 1192

20 F 39 29 118 1454

30 F 40 40 115 1532

60 F 39 40 123 1585

;

RUN;

TITLE 'Sorting Example';

PROC SORT; BY SEX;

PROC MEANS; BY SEX;

VAR FAT PROTEIN SODIUM;

RUN;

**************************************************

* PROC SORT EXAMPLE 2 *

* DSORT2.SAS *

**************************************************;

DATA MYDATA;

INPUT GROUP RECTIME;

CARDS;

1 3.1

2 3.6

2 4.2

1 2.1

1 2.8

2 3.8

1 1.8

;

PROC SORT; BY RECTIME;

PROC PRINT;

TITLE 'Sorting Example';

PROC SORT; BY DESCENDING RECTIME;

PROC PRINT;

RUN;


D.10 Merging Data Sets

***********************************************

* Merging data sets *

* DMERGE.SAS *

***********************************************

;

***********************************************************************

* NOTE: CASE numbers are the same on both datasets *

***********************************************************************

;

DATA PRE;

INPUT CASE PRETREAT;

CARDS;

1 1.02

2 2.10

3 1.88

4 2.20

5 1.44

11 1.55

13 1.61

14 2.61

15 1.56

16 0.99

22 1.53

;

DATA POST;

INPUT CASE POSTREAT; CARDS;

1 1.94

2 1.63

3 2.73

4 2.18

5 1.82

11 1.94

13 2.25

14 1.70

15 1.78

16 1.52

22 1.97

;

* Sort data sets on the ID variables — CASE;

PROC SORT DATA=PRE; BY CASE;

PROC SORT DATA=POST; BY CASE;

* Define new data set using MERGE statement;

DATA PREPOST;

MERGE PRE POST; BY CASE;

DIFF=POSTREAT-PRETREAT; * Calculation in new data set;

PROC PRINT DATA=PREPOST;

RUN;


A.1 Data Listing Using PROC PRINT

********************************************************

* Print listing of data *

* APRINT.SAS *

********************************************************

;

OPTIONS PS=60;

DATA TEMP;

INPUT ID SBP DBP SEX $ AGE WT;

CARDS;

1 120 80 M 15 115

2 130 70 F 25 180

3 140 100 M 89 170

4 120 80 F 30 150

5 125 80 F 20 110

;

RUN;

PROC PRINT DATA=TEMP;

TITLE 'Example of PROC PRINT';

RUN;


A.2 Paired t-Test Using PROC MEANS

************************************************

* PAIRED T-TEST EXAMPLE *

* APAIREDT.SAS *

************************************************

;

DATA WEIGHT;

INPUT WBEFORE WAFTER;

WLOSS=WAFTER-WBEFORE; * Calculate WLOSS within the DATA step *;

CARDS;

200 190

175 154

188 176

198 193

197 198

310 240

245 204

202 178

;

PROC MEANS N MEAN T PRT; VAR WLOSS;

TITLE 'Paired t-test example using PROC MEANS';

RUN;


A.3 Frequencies and Chi-Square Table Analysis using PROC FREQ

*********************************************************

* Frequency Table (PROC FREQ) Example using DIXON data *

* AFREQ1.SAS *

*********************************************************

;

*********************************************************

* Using the SAS data set DIXON — See the data set *

* creation example for creating the DIXON data set. *

*********************************************************

;

libname mydata 'c:\mydir'; * or use ‘a:\’ instead;

data dixon;set mydata.dixon;

;

PROC FREQ;TABLES SOCIO;

Title 'Frequencies on Social Status variable';

run;

*********************************************************

* Two-Way Table (PROC FREQ) Example using DIXON data *

*********************************************************;

PROC FREQ;TABLES CORONARY*GROUP/CHISQ;

Title 'Chi Square Analysis of a Contingency Table';

run;


A.4 Chi-Square Table Analysis Using Count Data

*************************************************************
* Example Chi-Square Analysis Entering Data within Program *

*************************************************************

;

options ps=60;

data;

do a = 1 to 2;

do b = 1 to 2;

input wt @@;

output;

end;

end;

cards;

12 15

18 3

;

proc freq;

weight wt;

tables a*b /chisq;

title 'Chi-Square Analysis for a 2x2 table';

run;


A.5 Independent Group T-Test Using PROC TTEST

********************************************

* SAS CODE FOR AN INDEPENDENT GROUP T-TEST *

* ATTEST.SAS *

********************************************

;

DATA TTEST;

INPUT GROUP OBS; * List input: values on each line are separated by blanks;

CARDS;

1 23.00

1 23.00

1 32.00

1 24.00

1 25.00

2 25.00

2 46.00

2 56.00

2 45.00

2 56.00

2 55.00

;

PROC TTEST;

CLASS GROUP;

VAR OBS;

RUN;


A.6 One-Way Analysis of Variance Using PROC ANOVA

*********************************************************

* EXAMPLE ONE-WAY ANOVA COMPARING HEADACHE MEDICINES *

* AONEWAY.SAS *

*********************************************************

;

DATA ACHE;

INPUT BRAND RELIEF;

CARDS;

1 24.5

1 23.5

1 26.4

1 27.1

1 29.9

2 28.4

2 34.2

2 29.5

2 32.2

2 30.1

3 26.1

3 28.3

3 24.3

3 26.2

3 27.8

;

PROC ANOVA DATA=ACHE;

CLASSES BRAND;

MODEL RELIEF=BRAND;

MEANS BRAND/TUKEY;

TITLE 'Compare RELIEF across MEDICINES - ANOVA Example';

run;

PROC GPLOT;

PLOT RELIEF*BRAND;

run;


A.7 Two-Way ANOVA Using PROC GLM

*********************************************************

* EXAMPLE TWO-WAY ANOVA USING DIXON DATA *

* ATWOWAY.SAS *

*********************************************************

;

*********************************************************

* Using the SAS data set DIXON — See the data set *

* creation example for creating the DIXON data set. *

*********************************************************

;

libname mydata 'c:\mydir'; * or use ‘a:\’ instead;

data dixon;set mydata.dixon;

PROC GLM;

CLASSES SOCIO GROUP;

MODEL PULSE62=SOCIO GROUP SOCIO*GROUP;

TITLE 'Compare PULSE across SOCIO and GROUP';

run;


A.8 Three Way ANOVA using PROC GLM

***********************************************

* Three WAY ANOVA Using GLM *

* A3WAY.SAS *

***********************************************;

DATA ANOVA3;

INPUT OBS A B C;

CARDS;

24 1 1 1

29 1 1 1

18 1 1 2

19 1 1 2

23 1 1 2

15 1 2 1

15 1 2 1

12 1 2 1

15 1 2 2

20 1 2 2

13 1 2 2

20 2 1 1

22 2 1 1

18 2 1 1

15 2 1 2

10 2 1 2

11 2 1 1

16 2 2 1

11 2 2 1

10 2 2 2

14 2 2 2

6 2 2 2

24 3 1 1

30 3 1 1

18 3 1 2

17 3 1 2

23 3 1 2

Continued . . .

17 4 1 2

11 4 1 1

17 4 2 1

12 4 2 1

13 4 2 2

15 4 2 2

7 4 2 2

;RUN;

PROC glm;

classes a b c;

model obs= a b c a*b a*c b*c a*b*c;

run;