SAS Tips & Tricks 1
SAS Tips & Tricks
Anthony Fina
Statistics Outreach Center
The University of Iowa
Spring 2013
Files to be used with this document:
MOCK_PUBLIC.txt – 92 observations from public schools
MOCK_PRIVATE.txt– 8 observations from private schools
SAS Environment
1.Docking the Explorer window if you accidently close it:
a.Select View from the menu and click Explorer.
b.Then select Window from the menu and click Docked.
c.You may have to select Large Icons on the View tab.
2.Saving your settings:
a.Tools Options Preferences…
b.On the General tab check ‘Save settings on exit.’ This isn’t going to work on CITRIX or on UI computers that do not allow you to save settings.
3.Displaying results in Output window (default prior to SAS 9.3):
a.Tools Options Preferences…
b.On the Results tab check ‘Create listing.’ Selecting this will display results the traditional way in the Output window.
4.Automatically clear the Results window:
a.Place at the beginning of your program. This display manager command tells SAS to remove stored output in the Results Navigator Window. HTML output is unaffected but still should be deleted using command below.
DM'ODSRESULTS; CLEAR;';
5.Automatically creating HTML output (now default in SAS 9.3):
a.Tools Options Preferences…
b.On the Results tab check ‘Create HTML’ and ‘Use WORK folder.’ HTML results are presented in the Results Viewer.
6.Automatically clear the HTML output (once):
a.Only the most recent PROC output will be shown in HTML output when ‘View results as they are generated’ is selected in preferences. Window does not clear until another procedure with output is run.
ODSHTMLCLOSE;
ODSHTML;
7.Automatically clear the HTML output (always):
a.Only the most recent PROC output will be shown in the HTML results window when ‘View results as they are generated’ is selected in preferences.
ODSHTMLNEWFILE=PROC;
b.The default is to append the results from the current procedure to previous results. The following code will undo the code in 7.a:
ODSHTMLNEWFILE=NONE;
8.Automatically save the log and output windows:
a.Put at the end of your program, these display manager commands will save the contents of the log and output window to text files. File references were used in this example to define the location and names of the files.
FILENAME MYLOG"H:\Example.log";
FILENAME MYOUT"H:\Example.out";
DM'OUT;FILEMYOUT REP;'; *Saves your Output window to a text file.;
DM'LOG;FILEMYLOG REP;'; *Saves your Log window as a text file;
9.Commenting out large portions of your program:
a.Highlight the desired code you wish you wish to comment out.
b.Hold down CTRL and press the / button at the same time.
c.To undo the commented area, highlight the code and press CTRL, SHIFT, and / .
10.Deleting all the files in the WORK folder:
a.When a temporary SAS dataset is created, it is saved only for the duration of the SAS session. This sometimes leads to problems – the common example occurs when there is an error in your program so a dataset is not created, but an old dataset with the same name exists from a previous run of the program. SAS will use the old dataset and, unless you are carefully reviewing your log, you may not realize this is happening.
b.To delete all old temporary SAS datasets:
PROCDATASETSLIBRARY=WORK KILL; RUN;
Data Steps and Functions
11.Importing a file found on the internet: This is an easy way to read in data found on a website.
FILENAMEmydataurl' LRECL=259 ;
DATA private;
INFILEmydataPAD;
INPUT
@1 system $CHAR20.
@31 bname $CHAR20.
@123grd $CHAR2.
@125testmth $CHAR2.
@131testyr $CHAR2.
@133 name $char31.
@133lname $CHAR11.
@153fname $CHAR20.
@173 sex $CHAR1.
@174bmo 2.
@176byr 4.
@191 level $CHAR2.
@193form $CHAR1.
@195dist $CHAR4.
@199 (item1-item30) (+1$1.);
RUN;
12.RETAIN (for reordering variables):
a.This is an easy way to reorder variables if you likeexamining the data without printing it (i.e. through the Explorer window). Also helpful if you plan to export a dataset.
DATA master;
RETAIN id lnamefname test1 test2;
SET master;
RUN;
13.Creating a unique ID for every observation: This can simplify merges later on. It is also necessary for some advanced analyses.
DATA master;
SET master;
id=_N_;
RUN;
14.Converting variables from numeric to character or character to numeric:
a.Numeric to character:
newvar = PUT(oldvar, 8.);
b.Character to numeric:
The second line where the old variable is multiplied by one leaves a note in the log indicating that character values have been converted to numeric. The first line does not.
newvar = INPUT(oldvar, 8.);
newvar = oldvar*1;
15.RENAME, DROP, LENGTH:
a.Situation: There are times when you need to change the length of an existing variable. This is sometimes necessary if you are combining multiple datasets and a variable has two different lengths.
b.This example renames the ‘id’ variable as ‘tempvar’ as it is read in from the ‘master’ dataset. The length of ‘id’ is specified. Next, the new ‘id’ variable is assigned the value of ‘tempvar’. Then ‘tempvar’ is dropped as the dataset is written to the work folder.
DATAWORK.master(DROP=tempvar);
LENGTH id 8;
SETmaster(RENAME=(id=tempvar));
id=INPUT(tempvar,8.);
RUN;
16.Number of observations in a dataset: This is the simplest way to find out the number of observations there are in a dataset. The number of observations are written to the log. More complicated methods are included below for use with very large datasets.
DATA_NULL_;
PUT NOBS=;
STOP;
SET master NOBS=NOBS;
RUN;
17.Creating blank datasets: Included for future reference. It is often helpful to have blank datasets created so that even if a procedure does not produce a table, it can still be appended to an existing dataset. This is especially true when macros are used. The following code would create two blank datasets.
DATA blank1 blank2; RUN;
18.INDEX and SUBSTR: These two functions are incredibly powerful together. This example shows how to break apart two concatenated variables based on the location of the first space. This works pretty well and can save you time on data clean up.
DATA names; SET master;
last=SUBSTR(name, 1, INDEXC(name,1,' '));
first1=SUBSTR(name, INDEXC(name,' '));
KEEP name last lname2 first first1;
RUN;
19.TRIM and LEFT: If you examine the variable ‘first1’ that was created above you will see that there are extra spaces. The following code will remove the spaces.
first=TRIM(LEFT(SUBSTR(name, INDEXC(name,' '))));
20.COMPRESS: Removes character values you specify. The example below removes spaces and hyphens. Unlike the TRIM and LEFT combo above, COMPRESS lets us select what we want to remove.
lname2=COMPRESS(lname, ' -');
21.Concatenating variables: Suppose you wanted to combine a student’s last name, first name, and a space added in between, all you need to use || between variables.
name = last||" "||first;
22.Conditional inputs:
a.Situation: You are reading in data using anINPUT statement and you do not want to read in all of the observations. You just want to read in observations that satisfy a particular condition. For example, you only want to read in the data for the males in a dataset.
b.Basic idea –Input the variable you want to condition on. The @ at the end of the first input statement tells SAS that more data will be read in from these rows. Condition on the variable of interest using an IF statement. Input remaining variables.In the example below, the variable ‘sex’ is read in for all observations but only whensex=“M” are the rest of the variables read in and written to the dataset.
DATA males; INFILE"H:\FILES\MOCK_PRIVATE.TXT"LRECL=258 ;
INPUT @173 sex $CHAR1. @;
IF sex = "M";
INPUT
@1 system $CHAR20.
@31 bname $CHAR20.
@123grd $CHAR2.
@125testmth $CHAR2.
@131testyr $CHAR2.
@133lname $CHAR11.
@153fname $CHAR20.
@174bmo 2.
@176byr 4.
@191 level $CHAR2.
@193form $CHAR1.
@195dist $CHAR4.
@199 (item1-item30) (+1$1.);
RUN;
23.Reading in multiple files at once:
a.Situation: You need to read in data from a number of files in a specific folder. It would be tedious and time consuming to type in the names of all the files. Instead, this can be accomplished by using the wildcard ‘*’ in the FILENAME statement.
FILENAME NAMES 'H:\FILES\*.TXT';
DATA master;
INFILE names LRECL=259;
INPUT
@1 system $CHAR20.
@31 bname $CHAR20.
@123grd $CHAR2.
@125testmth $CHAR2.
@131testyr $CHAR2.
@133 name $char31.
@133lname $CHAR11.
@153fname $CHAR20.
@173 sex $CHAR1.
@174bmo 2.
@176byr 4.
@191 level $CHAR2.
@193form $CHAR1.
@195dist $CHAR4.
@199 (item1-item30) (+1$1.);
RUN;
24.Searching a text string using WHILE and DO:
a.WHILE and DO make apowerful combination together. In this example, we use these two functions to search records to determine if a student has been labeled as a homeschool student. An indicator variable is also created. The search starts in column one and scans for HOME in ‘bname.’ The search proceeds one column at a time till the end.
DATA master;
SET master;
HOME = 'N';
I = 1;
DOWHILE (HOME = 'N'LENGTH(COMPRESS(bname)) GE I+3);
IF (UPCASE(SUBSTR(COMPRESS(bname), I, 4))= 'HOME') THEN HOME ='Y';
I = I + 1;
END;
DROPI;
RUN;
25.Scoring a test:
a.Arrays are a useful tool for completing the same set of tasks on a series of variables. The code below creates 3 arrays. The first contains the variables corresponding to 10 items on a test. The values for each item correspond to the response option chosen (A, B, C, D, or E). The second array contains the correct responses for each item. The third array contains 10 variables that will be scored 0 if the individual chose the wrong response and 1 if the individual chose the correct response.
DATA master;
SET master;
ARRAY ITEM (10) $ ITEM1-ITEM10;
ARRAY KEY (10) $ KEY1-KEY10 ('C', 'D', 'A', 'B', 'C', 'D', 'C', 'B', 'B', 'A');
ARRAY CORRECT (10) CORR1-CORR10;
DO I = 1TO10;
IF ITEM(I) = KEY(I) THEN CORRECT(I) = 1;
ELSEIFITEM(I) = '' THEN CORRECT(I) = .;
ELSE CORRECT(I) = 0;
END;
DROPI KEY1-KEY10;
READ_CONC = SUM(OF CORR1-CORR10);
RUN;
26.Cumulative frequency or summing across rows:
a.Sometimes you want a column to reflect the cumulative frequency of another variable. Other times you may want to know the sum of a column without using a PROC step. This example using the RETAIN statement to create the new variable ‘tot’. ‘freq’ is added to ‘tot’ in the next line. This serves as a summing function because the RETAIN statement carries over the value of ‘tot’ from the row above, which is then added to the value of ‘freq’ to get the new value of ‘tot’.
DATA new;
SET new;
RETAIN tot;
tot + freq;
RUN;
27.Using a BY statement for selection:
a.When you use a BY statement in your data step, SAS creates variables called ‘FIRST.’ and ‘LAST.’. These variables identify the first and last observation with each value of the variables on the BY statement. This is also great to use if you have multiple observations for people and you only want the first one or you only want the last one. Just sort by ID and DATE to get observations in chronological order and add BY ID in a data step. The temporary variables FIRST.ID and LAST.ID can then be used to select the desired rows.
FILENAMEmydataurl' ;
DATAbyexample ;
INFILEmydata;
INPUT ID $ 1-3 NAME $ 5-9 @13 (ITEM1-ITEM6) (+11.) @26 DATE MMDDYY9.;
RUN;
PROCSORT;
BY ID DATE;
RUN;
DATA WORK.byexample1;
SETWORK.byexample;
BY ID;
IF FIRST.ID;
RUN;
PROCPRINT;
RUN;
ProcSteps
28.Removing labels and formats: Works for SAS 9.3 and previous versions of SAS even though ATTRIB is red. This procedure is really helpful when cleaning up data.
PROC DATASETSLIB=workNOLIST;MODIFYmaster;
ATTRIB _ALL_ LABEL=''; *Remove labels;
FORMAT _ALL_; *Remove formats;
QUIT; RUN;
29.PROC SORT: This procedure is helpful for organizing data. It also has two very useful options, NODUP and NODUPKEY. While both of these options identify duplicate observations, they have an important difference.
a.NODUP sorts according to the variables listed in the BY statement. However, when checking for duplicates, it compares each observation to the observation before it in the data for all of the variables in the file. Thus, two observations will be considered duplicates only if they have identical values on all the variables.
PROCSORTNODUP;
BYlnamefnamebmobyr;
RUN;
b.NODUPKEY sorts according to the variables listed in the BY statement. However when checking for duplicates, it compares each observation to the observation before for only the variables listed in the BY statement. Another way to think about it is the BY statement serves as a KEY for identifying duplicates.
PROCSORTNODUPKEY DUPOUT= duplicates;
BYlnamefnamebmobyr;
RUN;
30.Proc Transpose: This procedure transposes variables so that rows become columns and columns become rows. Sometimes this is necessary for exporting and analyses. The VARstatement names the variables whose values you want to transpose. You could add the BY statement if you have any grouping variables you want to keep as grouping variables (they are not transposed). You could also add the ID statement to name the transposed columns – each value of ID must occur only once.
PROCTRANSPOSEDATA=master OUT=master_t
PREFIX = person;
VAR item1 item2;
RUN;
31.Replacing missing values: The STDIZE procedure standardizes one or more numeric variables in a SAS dataset by subtracting a location measure and dividing by a scale measure. However, this example uses the procedure to replace all missing values in numeric variables with 0’s.
PROCSTDIZEDATA=indatREPONLY
MISSING=0OUT=outdat;
VAR_numeric_;
RUN;
Macros
32.%LET:
a.%LET is very useful. It assigns a value to a global macro variable that can be used anywhere in your SAS program.
b.Example: Assigning dates to logs and output files.Note, you need to run the %LET statement before the filename statement in order for the filename to be properly assigned. This is a good way to back up your programs. The fourth line of code outputs the log that was defined in line 2.
%LET DATE=%sysfunc(Date(), worddate18.);
FILENAME LOG "H:\TIPS_&DATE..LOG";
(SAS Program)
DM'LOG;FILE LOG REP;';
33.Number of observations:
a.This code uses a macro to get the number of observations in a dataset. This value is assigned to the global macro variable ‘nobs’ which is written to the LOG. Having a macro variable recording the number of observations can be helpful also in some situations.
%MACROobs(name);
%LET ds = %SYSFUNC(OPEN(name,i));
*Makes sure the dataset exists;
%LET nobs= %SYSFUNC(ATTRN(&ds,NOBS));
*Assigns the number of observations to 'nobs';
%put Number of observations = &nobs;
%MENDobs;
%obs(master);
34.%INCLUDE: This is great if you have a portion of code that is not changed frequently. For example, the data statement with the input command in example 2b above could be saved externally as ‘INPUT.sas’. Then to call it you only need one statement. The entire statement has been replaced with a single line.
%INCLUDE'H:\INPUT.sas'LRECL=256;
35.Debugging macros:
a.Debugging macros is notoriously hard.
i.When possible, write your program as regular SAS code first and once you know that works convert it to a macro.
ii.You can convert the code to a macro in steps by first using %LET to create macro variables, then once you know that is working, embed the code in a macro and pass the macro variables as parameters.
iii.When you have errors there are several options that can help you find them
1.SYMBOLGEN: This prints how your macro variable is being resolved. Makes debugging easier.
- To turn it on use OPTIONSSYMBOLGEN;
- To turn it off use OPTIONSNOSYMBOLGEN;
2.MPRINT: When this option is on, SAS will print to the log, the standard SAS statements that were generated by macros.
- To turn it on use OPTIONSMPRINT;
- To turn it off use OPTIONSNOMPRINT;
Inputting multiple files from a single folder
Situation: This is another method to read in data from multiple files in a folder without typing in the names of all the files. It is more complicated (and more flexible than the method described above that used a wildcard. First, you want to have SAS look in the folder, find all the file names and write them to a dataset.This example looks up the files in a specified folder and assigns each file name to an observation in a dataset. In an optional second step, a data step illustrates how you can delete files based on their extensions. Last, the remaining files are read into SAS and a variable is created that contains the name of the dataset. The variable containing the name of the dataset is a tracking variable indicating the file from which the data was read in from.
*This gets the file names in a specified folder;
%macroget_filenames(location);
FILENAME _dir_ "%bquote(&location.)";
DATA filenames(KEEP=memname);
handle=DOPEN( '_dir_' );
IF handle > 0 THEN DO;
count=DNUM(handle);
DO i=1 TO count;
memname=DREAD(handle,i);
OUTPUT filenames;
END;
END;
rc=DCLOSE(handle);
RUN;
FILENAME _dir_ CLEAR;
%MEND;
%get_filenames(H:\FILES\);
* Remove csv files;
DATA names; SET filenames;
csv=0;i=1;
DOWHILE (csv=0 AND i+2 le LENGTH(COMPRESS(memname)));
IF (UPCASE(SUBSTR(COMPRESS(memname), i, 3)) = 'CSV') THENcsv=1;
i = i + 1;
END;
RUN;
DATA names; SET names;
id=_N_;
IFcsv=1THENDELETE;
RUN;
DATA names; SET names;
FILE'H:\names.txt'LRECL=150;
PUT @1memname$char150.;
RUN; *Read in files from created list dump file name into a variable;
DATA master;
INFILE'H:\names.txt'LRECL=150; LENGTHmemname $150;
INPUTmemname $;
fil2read='H:\FILES\'||TRIM(LEFT(memname));
INFILE dummy FILEVAR=fil2read END=done DSDTRUNCOVERLRECL=258;
DOWHILE (not done);
INPUT
@1 system $CHAR20.
@31 bname $CHAR20.
@123grd $CHAR2.
@125testmth $CHAR2.
@131testyr $CHAR2.
@133 name $char31.
@133lname $CHAR11.
@153fname $CHAR20.
@173 sex $CHAR1.
@174bmo 2.
@176byr 4.
@191 level $CHAR2.
@193form $CHAR1.
@195dist $CHAR4.
@199 (item1-item30) (+1$1.);
OUTPUT;
END;
RUN;
Matching (MERGE and PROC SQL)
Scenario: The following six people were requested by a researcher. We need to find these people in our dataset created earlier (if possible). We will deliver an Excel file to the researcher of the matched results. Two different methods are examined: MERGE and SQL.
DATA request; INPUTlname$CHAR11.fname $ bmobyrexam1 exam2;
DATALINES;
DYE MADISON 5 2000 86 89