• Using if-then statements

Grammar:

1. if condition then (single) action;

2. if condition then do;

action;

action;

end;

3. if condition then action;

else if condition then action;

else if condition then action;

else action;

Comparison and logical operators:

=(eq), ^=(~=, ne), >(gt), <(lt), >=(ge), <=(le), &(and), |(or)

data homeimprovements;

input Owner $ Description & $30. cost;

if cost = . then costgroup='missing'; *or you can use if(missing(cost)) then costgroup='missing';

else if cost < 2000 then do;

costgroup='low';

cost1=cost-1000;

end;

else if cost < 10000 then do;

costgroup='medium';

cost1=cost-2000;

end;

else do;

costgroup='high';

cost1=cost-5000;

end;

datalines;

Bob Kitchen cabinet face-lift  1253.00

Shirley Bathroom addition  11350.70

Silvia Paint exterior  .

Shhirley Bathroom addition  11350.70

Al backyard gazebo  3098.63

;

proc print data=homeimprovements;

run;

An example of multiple OR statements and the IN operator;

data homeimprovements_new;

set homeimprovements;

if costgroup="low" or costgroup="high" or costgroup="medium" then print="yes";

/*equivalently use if costgroup in ("low", "high", "medium") then print="yes";*/

else print="no";

proc print; run;

Subsetting if statement;

data traffic;

input type $ name $ 9-38 AMtraffic PMtraffic;

Mtraffic=mean(of AMtraffic PMtraffic);

*if type="freeway" then delete;

*if type="surface" then output;

*if type="surface";

*---+----1----+----2----+----3----+----4;

datalines;

freeway 408                            3684 3459

surface Martin Luther King Jr. Blvd.   1590 1234

surface Broadway                       1259 1290

surface Rodeo Dr.                      1890 2067

freeway 608                            4583 3860

freeway 808                            2386 2518

surface Lake Shore Dr.                 1590 1234

surface Pennsylvania Ave.              1259 1290

;

proc print; run;

data traffic_surface;

set traffic;

where type="surface";

proc print data=traffic;

where type="surface";

run;

where statements:

WHERE statements are similar to the subsetting IF, with the following differences. WHERE statements

1. can be used with procedures

2. work with existing tables only, whereas the subsetting IF can also be used when reading raw data with an INPUT statement

3. support more operators, for example,

IS MISSING, where gender is missing

IS NULL, where gender is null

BETWEEN AND, where age between 20 and 40

CONTAINS, where name contains 'Mac'

LIKE, where name like 'R_n%'

data home_new;

set homeimprovements;

where cost between 2000 and 4000;

*where owner like 'S_i%' or owner like 'Si%';

proc print; run;

  • The output statement

/*SAS secret: there is a hidden output statement at the end of each data step*/

data tmp;

input x;

output;

datalines;

11

2

3

10

;

proc print data=tmp;

run;

/*SAS secret: the hidden output statement is suppressed when there are explicit output statements*/

data generate;

Do x = 1 to 6;

y=x ** 2;

output;

END;

proc print;

run;

data tmp;

input x;

y=x ** 2;

if x > 5;

z = x+1;

datalines;

1

2

3

10

;

proc print data=tmp;

run;

  • Writing multiple data sets using the output statement

1) output data-set-name;

data freeway surface;

set traffic;

if type='freeway' then output freeway;

if type='surface' then output surface;

run;

proc print data=freeway; run;

proc print data=surface; run;

2) Making multiple observations from one with the output statement

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\speed.dat';

input id sex vo35 vo4 vo45 vo5 vo55 vo6;

speed=3.5;vo=vo35;output;

speed=4.0;vo=vo4;output;

speed=4.5;vo=vo45;output;

speed=5;vo=vo5;output;

speed=5.5;vo=vo55;output;

speed=6.0;vo=vo6;output;

keep id sex speed vo; /* without this every record corresponding to a subject would continue to also have vo35 to vo6 */

run;

title 'first method';

proc print noobs;

run;

  • Do loops syntax

Syntax:

1) Do var = value1, value2, ..., valuen;

End;

2) Do var = start TO stop By increment;

End;

data b;

infile 'C:\Documents and Settings\anna\Desktop\597\speed.dat';

input id sex @; /* the @ holds the pointer at the current line */

do speed = 3.5 to 6 by 0.5;

input vo @;

output;

end;

title 'second method';

proc print;

run;

/*Example:

Normal Placebo group: 50 45 55 52

Normal Treatment group: 76 60 58 65

Hyperactive Placebo group: 70 72 68 75

Hyperactive Treatment group: 51 57 48 55*/

data trtment;

length group drug $ 20;

Do group = "Normal", "Hyperactive";

Do drug = "placebo", "Treatment";

input activity1-activity4 @;

output;

/*Do subject = 1 TO 4 by 1;

input activity @;

output;

End;

*/

End;

End;

Datalines;

50 45 55 52 76 60 58 65 70 72 68 75 51 57 48 55

;

proc print;

run;

  • Mathematical functions

log, log10, sin, cos, tan, arsin, arcos, artan, int, sqrt, round, mean, sum, max, n, nmiss

data example;

input x1-x10; /* list input: the values below are separated by blanks */

if N(OF x1-x10) > 7 then ave=mean(OF x1-x10);

*numNonMissing=N(OF x1-x3, OF x6-x10);

datalines;

12 14 18 10 9 1 . 5 3 19

. 4 . . . 8 . 9 10 13

;

proc print; run;

Random number generators: ranuni(seed), rannor(seed), rantbl(seed, p1, p2, ..., pk), rand('dist', parm1, ..., parmk). A seed is used to generate reproducible sequences.

Nonpositive seeds are ignored; the random numbers are then generated from the system clock, so the sequence is not reproducible.

data example;

*seed=123;

do i = 1 to 10;

x=rannor(123); *or equivalently;

*call rannor(seed, x);

output;

end;

proc print; run;
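The RAND function listed above does not take a seed argument; its stream is seeded with CALL STREAMINIT instead. A minimal sketch, in which the data set name rand_demo and the variables u and z are illustrative choices rather than part of the original notes:

```sas
data rand_demo;
    call streaminit(123);          /* seed for RAND; a fixed positive seed gives a reproducible sequence */
    do i = 1 to 5;
        u = rand('uniform');       /* uniform on (0,1) */
        z = rand('normal', 0, 1);  /* normal with mean 0 and standard deviation 1 */
        output;
    end;
run;

proc print data=rand_demo;
run;
```

Rerunning the step with the same STREAMINIT seed should reproduce the same u and z values.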

Probability distribution functions: CDF, PDF, QUANTILE, PROBNORM, PROBT

data example;

do x = -3 to 3 by 0.1;

y1=PDF('normal',x);

y2=CDF('normal',x);

xx=QUANTILE('normal', y2);

output;

end;

proc print; run;

proc gplot data=example;

plot y1*x y2*x;

run;

The lag and dif functions: The LAGn function simply looks back in the file n records and allows you to obtain a previous value for a variable and store it in the current observation.

options ps=60 ls=80 nodate;

data mouse;

infile 'C:\Documents and Settings\anna\Desktop\597\newmice.dat';

input year stand logmice mice;

/* mice is the mouse density, logmice is log10(mice +1) */

proc print; run;

data b1;

set mouse;

lagmice=lag(mice); /*lag1(mice) would give the same answer*/

proc print;

run;

Often the only thing you want to do with a previous value of a variable is to compare it with the current value to compute the difference. The DIFn function works the same way as LAGn, but rather than simply assigning a value, it assigns the difference between the current value and a previous value of a variable. The statement a=difn(x) tells SAS that 'a should equal the current value of x minus the value x had n records back in the file'.

The following uses lag8 to calculate the yearly difference within a stand. Using lag8 is correct only when there are exactly 8 observations per year (one per stand), so that the record 8 back in the file belongs to the same stand in the previous year.

data b8;

set mouse;

lagmice=lag8(mice);

change = mice - lagmice;

*change=dif8(mice);

proc print;

run;

Calculate the difference between the last and the first observation within a stand and create a dataset with the last-first differences only

data b72;

set mouse;

lagmice=lag72(mice);

change = mice - lagmice;

if lagmice ^= .;

drop mice logmice;

proc print;

run;

Cautions against the LAG function:

The LAG function returns the value its argument had the last time LAG was executed, which is not necessarily the value from the previous observation.

This means:

1. LAG returns the previous observation's value only when it is executed on every observation

2. (Almost) never use the LAG function conditionally

data lagged;

input x;

if x > 5 then lag_x=lag(x);

datalines;

7

9

1

8

;

proc print;

run;
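One way to follow the second caution is to call LAG unconditionally and apply the condition afterwards. A minimal sketch, reusing the data from the example above (the names lagged_ok and prev_x are illustrative, not from the original notes):

```sas
data lagged_ok;
    input x;
    prev_x = lag(x);                 /* executed on every observation */
    if x > 5 then lag_x = prev_x;    /* the condition is applied after LAG has run */
    drop prev_x;
    datalines;
7
9
1
8
;
run;

proc print data=lagged_ok;
run;
```

Here lag_x for x=8 is 1, the value from the previous observation. In the conditional version above it is 9, the value from the last observation on which LAG actually ran.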

  • Retain statement:

The RETAIN statement preserves a variable's value from the previous iteration of the data step

The RETAIN statement can appear anywhere in the data step

General grammar for the retain statement:

retain variable-list;

retain variable-list initial-value;
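As a small illustration of the second form, the sketch below keeps a running total. The data set name running and the variable total are made up for this example; without RETAIN, total would be reset to missing at the start of every iteration.

```sas
data running;
    input x;
    retain total 0;        /* total starts at 0 and keeps its value across iterations */
    total = total + x;     /* running sum of x */
    datalines;
11
2
3
10
;
run;

proc print data=running;
run;
```

The printed values of total are 11, 13, 16 and 26.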

data b1;

retain mice;

lagmice=mice;

set mouse;

proc print;

run;

Will the following work?

data b1;

retain mice;

set mouse;

lagmice=mice;

proc print;

run;

The following generated the same b8 data as before

proc sort data=mouse;

by stand year;

data b8;

retain mice;

lagmice=mice;

set mouse;

by stand;

if first.stand then lagmice=.;

change=mice-lagmice;

proc print;

run;

The following generated the same last-first difference data as before when we used lag72

data diff;

retain lagmice;

set mouse;

by stand;

if first.stand then lagmice=mice;

if last.stand;

change=mice-lagmice;

proc print;

run;

  • Procedure transpose: very much like matrix transpose -- turning observations into variables or vice versa

Grammar:

Proc transpose data=old-data-set out=new-data-set;

By variable-list; *To which group the transposition should apply

ID variable; *values of which are used to create new variable names

Var variable-list; *The variables that are actually transposed

proc sort data=mouse;

by stand;

run;

proc transpose data=mouse out=long_mouse prefix=year;

by stand;

ID year;

var logmice mice;

run;

data long_mouse;

set long_mouse;

drop _NAME_;

run;

proc transpose data=long_mouse out=mouse2 prefix=mice;

by stand;

var year86 year87;

run;

proc print data=mouse2; run;

  • Time and Date functions

Today, MDY, YRDIF, DAY, MONTH, YEAR, WEEKDAY, HOUR, MINUTE, SECOND

data example;

TodayDate=Today();

DOB="15May2005"D;

time1='10:20't;

hour1=hour(time1);

Day1=DAY(DOB);

Month1=Month(DOB);

Age=YRDIF(DOB, TodayDate, 'ACTUAL');

proc print;

format TodayDate DOB mmddyy9.;

format time1 hhmm6.;

run;

  • Character functions: length, compress, substr, input, put, translate, trim, upcase

The following example uses the substr function to extract and change part of the value of a character variable: substr(char variable, start position, length);

The input function converts a character variable to a numeric variable; Usage: input(char_var, format); The put function converts a numeric variable to a character variable; usage: put(numeric_var, format).

data example;

input ID $10. ;

state=substr(ID, 1, 2);

numchar=substr(ID, 7,3);

num=input(numchar, 3.);

substr(ID, 3,4)=' ';

datalines;

NYAAAA123

NJ1234567

;

proc print; run;

data example;

input sbp dbp @@;

length sbp_chk $ 4 dbp_chk $ 4;

sbp_chk=put(sbp, 3.);

dbp_chk=put(dbp, 3.);

if sbp > 160 then substr(sbp_chk, 4, 1)='*';

if dbp > 90 then substr(dbp_chk, 4, 1)='*';

datalines;

120 80 180 92 200 110

;

proc print; run;

  • Array: Array is a facility that can reduce the amount of coding in a SAS DATA STEP.

Array temporarily groups variables, making it convenient for loop processing. Array exists only for the duration of current data step. It is NOT a variable.

Syntax for ARRAY definition:

ARRAY array-name[subscript] varlist(val1, val2, ..., valn); This is for numeric arrays

ARRAY array-name[subscript] $ varlist(val1, val2, ..., valn); This is for character arrays

Note:

  1. array-name identifies the array. It follows the naming convention for SAS variable names.
  2. subscript can be

1) An integer: specifying the length of the array

2) *: SAS will determine the length according to the varlist

3) lower:upper: the lower and upper bounds of the subscript.

  3. All variables in the varlist must be the same type (numeric or character).
  4. Brackets can be replaced by braces {} and parentheses ()
  5. Without the varlist, SAS treats array-name1, array-name2, ..., array-namen as the varlist

Syntax for referencing an array: array-name[subscript]

The subscript can be

  1. A variable or expression that evaluates to a valid subscript value in the definition of the array
  2. *: array-name[*] can be used in input and put statements and with some SAS functions, for example, input array-name[*], mean(of array-name[*]).
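The definition and reference forms above can be sketched together in one short step. All names here (array_demo, w, grade, s, wsum and so on) are invented for the illustration; the step shows initial values, a lower:upper subscript, a character array with no varlist, and the [*] reference.

```sas
data array_demo;
    /* numeric array with bounds 0..2 and initial values */
    array w[0:2] w0-w2 (0.2, 0.3, 0.5);
    /* character array of length 1; with no varlist SAS creates grade1-grade3 */
    array grade[3] $ 1 ('A', 'B', 'C');
    input s1-s3;
    /* * lets SAS count the elements from the varlist */
    array s[*] s1-s3;
    total = sum(of s[*]);               /* [*] passes every element to the function */
    wsum = 0;
    do i = lbound(w) to hbound(w);      /* loop over the declared bounds 0..2 */
        wsum = wsum + w[i] * s[i + 1];  /* s is indexed 1..3 */
    end;
    first_grade = grade[1];
    drop i;
    datalines;
10 20 30
;
run;

proc print data=array_demo;
run;
```

LBOUND and HBOUND return the declared bounds, which avoids hard-coding 0 and 2 in the loop, much as DIM does for arrays that start at 1.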

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\597C\speed.dat';

input id sex vo35 vo4 vo45 vo5 vo55 vo6;

if vo35 = . then newvo1=.;

else if vo35 > 25 then newvo1=1;

else newvo1=0;

if vo4 = . then newvo2=.;

else if vo4 > 25 then newvo2=1;

else newvo2=0;

if vo45 = . then newvo3=.;

else if vo45 > 25 then newvo3=1;

else newvo3=0;

if vo5 = . then newvo4=.;

else if vo5 > 25 then newvo4=1;

else newvo4=0;

if vo55 = . then newvo5=.;

else if vo55 > 25 then newvo5=1;

else newvo5=0;

if vo6 = . then newvo6=.;

else if vo6 > 25 then newvo6=1;

else newvo6=0;

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\597C\speed.dat';

input id sex vo35 vo4 vo45 vo5 vo55 vo6;

array vo[6] vo35 vo4 vo45 vo5 vo55 vo6;

array newvo[6] newvo1-newvo6;

do i = 1 to 6;

if vo[i] = . then newvo[i]=.;

else if vo[i] > 25 then newvo[i]=1;

else newvo[i]=0;

end;

drop i;

proc print data=a;

run;

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\597C\speed.dat';

input id sex vo35 vo4 vo45 vo5 vo55 vo6;

array vo[6] vo35 vo4 vo45 vo5 vo55 vo6;

/*This varlist has to be present. it can be vo35--vo6*/

array newvo[6] newvo1-newvo6;

/*This varlist can be ignored*/

do i = 1 to 6;

if vo[i] = . then newvo[i]=.;

else if vo[i] > 25 then newvo[i]=1;

else newvo[i]=0;

end;

drop i;

proc print data=a;

run;

vo35--vo6: the double hyphens specify all the variables between vo35 and vo6. The order is determined by the order of appearance of the variables in the DATA step

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\597C\speed.dat';

input id sex vo35 vo4 vo45 vo5 vo55 vo6;

array vo[*] vo35--vo6;

array newvo[*] newvo1-newvo6;

do i = 1 to dim(vo);

if vo[i] = . then newvo[i]=.;

else if vo[i] > 25 then newvo[i]=1;

else newvo[i]=0;

end;

newvo[dim(newvo)]=1000;

drop i;

* drop vo; /* This does not work as vo is not a variable, it cannot be used in drop/keep/rename

statements. use drop vo35--vo6 instead*/

proc print data=a;

run;

Special SAS name lists: _ALL_, _CHARACTER_, _NUMERIC_

data a;

infile 'C:\Documents and Settings\anna\Desktop\MyDesktop\597\597C\speed.dat';

input id $ sex $ vo35 vo4 vo45 vo5 vo55 vo6;

array vo[*] _NUMERIC_;

array char[*] _CHARACTER_;

array newvo[*] newvo1-newvo6;

if char[2]='2';

do i = 1 to dim(vo);

if vo[i] = . then newvo[i]=.;

else if vo[i] > 25 then newvo[i]=1;

else newvo[i]=0;

end;

drop i;

proc print data=a;

run;

In the following example, _N_ is a SAS automatic variable that counts how many times the data step has iterated (including the current iteration). Values of a temporary array are automatically retained. Note that in this example the temporary array called "key" is assigned values only when _N_=1, but it is used for scoring every person.

data score;

array key[10] $ 1 _temporary_;

array ans[10] $;

if _N_ = 1 then

do i = 1 to 10;

input key[i] @;

end;

input id $ @5 (ans[*]) ($1.);

score = 0;

do i = 1 to 10;

scorei = (ans[i]= key[i]);

score=score+scorei;

end;

drop i scorei;

datalines;

A B C D E E D C B A

001 ABCDEABCDE

002 AAAAABBBBB

;

proc print data=score; run;