Dear Professor,

I have some questions regarding Tutorial 4 and Chapters 5 and 6 of the lecture notes:

1) Chapter 5, Slide 15

Descriptive statistics: R

> ex5.1=read.table("F:/ST2137/lecdata/ex5_1ar.txt",header=T)

> ex5.1ar=ex5.1[,1]

What is the function of this statement? What does [,1] represent? Is this statement necessary? What will happen if it is not included?

%% a[,1] denotes the first column of the data frame “a”. Similarly, a[2,] denotes the second row of “a”, and a[1,3] denotes the element in the first row and third column.
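The indexing rules above can be tried on a small example. This is a minimal sketch, assuming a made-up three-row data frame `a` in place of ex5.1 (the column names `ar`, `b` and `label` and all values are hypothetical). It also hints at why extracting the column is useful: functions such as mean() expect a vector, not a whole data frame.

```r
# Hypothetical three-row data frame standing in for ex5.1 (made-up values)
a <- data.frame(ar = c(12.5, -2.7, 30.1),
                b = c(1, 2, 3),
                label = c("x", "y", "z"),
                stringsAsFactors = FALSE)

a[, 1]   # first column, returned as a numeric vector: 12.5 -2.7 30.1
a[2, ]   # second row, returned as a one-row data frame
a[1, 3]  # element in row 1, column 3: "x"

# Extracting the column gives a plain vector that mean() can use
ar <- a[, 1]
mean(ar)

# mean() on a whole data frame is not meaningful: since R 3.0.0 it
# returns NA with a warning (suppressed here to keep the output clean)
suppressWarnings(mean(a))
```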

> summary(ex5.1ar)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -2.70   14.17   22.98   27.00   32.62   91.15

> mean(ex5.1ar)

[1] 27.00079

> median(ex5.1ar)


> min(ex5.1ar)

[1] -2.7

> max(ex5.1ar)

[1] 91.15
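To see how summary() relates to the separate calls above, here is a minimal sketch on a small made-up vector `x` (hypothetical values, not the ex5_1ar data). summary() reports the minimum, the quartiles, the median, the mean and the maximum, and each entry can be checked against the corresponding single function.

```r
# Small hypothetical data set (not the ex5_1ar data)
x <- c(1, 2, 3, 4, 10)

s <- summary(x)
print(s)

# Each entry of the summary matches the individual function
s[["Min."]] == min(x)
s[["Median"]] == median(x)
s[["Mean"]] == mean(x)
s[["Max."]] == max(x)

# The quartiles come from quantile() with its default type 7
quantile(x, c(0.25, 0.75))
```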

2) Chapter 5, Slide 27

Histogram: SAS

/* Histogram and qqplot */

goptions reset=all


%% this means that the text is displayed in the font ‘Arial/bo’ (Arial bold)




%% this means that the graphics unit is “pct” (percent of the graphics output area)


%% the height of the text is 2 (units)


%% height of position is 15 (units)

What do the parts in blue mean?

3) Chapter 5, Slide 31

Histogram: R

> hist(return,include.lowest=TRUE,freq=TRUE,

main=paste("Histogram of return"),

xlab="return", ylab="frequency", axes=TRUE)

What is the function of return in this statement? Is it a variable?

%% Yes, it is a variable (a numeric vector) named “return” that holds the data to be plotted.
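A minimal sketch of the call above, assuming 100 simulated values in place of the course's return data (the seed, mean and standard deviation are hypothetical). Naming a vector `return` is allowed; R still finds the function return() when it is called, although a name such as `ret` avoids confusion. paste() with a single string just returns that string, so the title is passed directly here.

```r
# Hypothetical percentage returns standing in for the course data set
set.seed(1)
return <- rnorm(100, mean = 27, sd = 15)

# 'return' is an ordinary numeric vector; hist() draws the plot and
# also returns the bin information invisibly
h <- hist(return, include.lowest = TRUE, freq = TRUE,
          main = "Histogram of return",
          xlab = "return", ylab = "frequency", axes = TRUE)

h$breaks   # bin boundaries chosen by hist()
h$counts   # number of observations in each bin
```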

> # Normal curve superimposed on the histogram

xpt=seq(-10,100,0.1)

What does 0.1 stand for?

%% 0.1 is the increment (step size, the “by” argument) of the sequence generated by the “seq” function; seq(-10,100,0.1) gives the sequence -10, -10+0.1, -10+0.2, …, up to 100.
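The rest of the slide's curve-drawing code is not shown, so this is a sketch of one common approach, not necessarily the lecture's code. It assumes simulated data `x` (hypothetical) and uses `xpt` as the grid of x-values for the curve. Because freq=TRUE puts counts on the y-axis, the normal density is scaled by the number of observations times the bin width so that the curve matches the bar heights.

```r
set.seed(1)
x <- rnorm(100, mean = 27, sd = 15)   # hypothetical returns

# seq(from, to, by): start at -10 and add 0.1 at each step until 100
xpt <- seq(-10, 100, 0.1)
head(xpt, 3)    # -10.0 -9.9 -9.8
length(xpt)     # 1101 points

h <- hist(x, include.lowest = TRUE, freq = TRUE)

# Counts on the y-axis, so scale the density by n * (bin width)
binwidth <- diff(h$breaks)[1]
ypt <- dnorm(xpt, mean = mean(x), sd = sd(x)) * length(x) * binwidth
lines(xpt, ypt)
```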




4) Chapter 5, Slide 47

Descriptive Statistics by groups: SAS

proc format;

value $risk '1'='Average Risk'

'2'='High Risk';

data ex5_2;

infile "F:\ST2137\lecdata\ex5_2.txt";

input return risk$;

label return='Return Percentage';

format risk $risk.;

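%% The FORMAT statement assigns the format $risk., defined earlier with PROC FORMAT, to the variable risk.

Why is '$' placed just before risk? I thought '$' should be placed after a character variable name. What do the first and second risk stand for?

%% This is the syntax of the FORMAT statement: the variable name comes first (risk), followed by the format name ($risk.). The $ at the start of a format name marks it as a character format (it was defined with value $risk), whereas in the INPUT statement the $ after a variable name marks the variable as character. The dot following “$risk” is also necessary; it tells SAS that $risk. is a format name.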

5) Chapter 5, Slide 67

Plot of bivariate data: R


+main="Use Gender to generate the plotting symbol",


+xlim=c(150,190), ylim=c(40,80))



+main="",ylab="",xlab="",xlim=c(150,190), ylim=c(40,80),


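Is '+' necessary? This sign does not appear in Tutorial 4's answer key. What does 'axes=F' mean?

%% The “+” is the continuation prompt that the R console prints when the previous line is not a complete command; it shows that the command continues on the next line. It is not typed as part of the code, which is why it does not appear in the answer key.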
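The plotting lines of the slide are missing, so this is a sketch under stated assumptions: `height` (cm), `weight` (kg) and `gender` are hypothetical data chosen to fit xlim=c(150,190) and ylim=c(40,80), and using pch=as.integer(gender) to pick the symbol is one possible way to "use Gender to generate the plotting symbol". In a script, a call can simply span several lines without any "+". The second call shows axes=FALSE, which draws the points without the axes (and, by default, without the surrounding box).

```r
# Hypothetical heights (cm), weights (kg) and genders
height <- c(160, 172, 181, 155, 168, 185)
weight <- c(52, 68, 77, 48, 60, 79)
gender <- factor(c("F", "M", "M", "F", "F", "M"))

# A multi-line call in a script; R prints '+' only at the console
plot(height, weight,
     pch = as.integer(gender),   # symbol 1 (circle) for F, 2 (triangle) for M
     main = "Use Gender to generate the plotting symbol",
     xlim = c(150, 190), ylim = c(40, 80))

# axes = FALSE: points only, no x/y axes and no box
plot(height, weight, pch = as.integer(gender),
     main = "", ylab = "", xlab = "",
     xlim = c(150, 190), ylim = c(40, 80), axes = FALSE)
```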

6) Tutorial 4, Q1(d) R procedures for drawing histograms of the processing time for each of the two plants.


wip1a=wip[plant==1,c("plant","time")]

What does c("plant", "time") mean?

%% This selects the columns named “plant” and “time”. Here “c” is the c() (combine) function, which joins the two names into one character vector.



hist(wip1a$time,include.lowest=T,freq=T,main=paste("Histogram of time for first production


hist(wip2a$time,include.lowest=T,freq=T,main=paste("Histogram of time for second production


%% “wip1a$time” extracts the variable “time” from the data frame wip1a.
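A minimal sketch of the first solution, assuming a made-up five-row `wip` data frame with columns `time` and `plant` (values hypothetical). The tutorial writes plant==1 directly, which presumably works because the data frame was attached (for example with attach(wip)); the sketch writes wip$plant instead so it runs on its own.

```r
# Hypothetical stand-in for the tutorial's wip data
wip <- data.frame(time = c(5.2, 3.8, 6.1, 4.4, 7.0),
                  plant = c(1, 2, 1, 2, 1))

c("plant", "time")   # c() combines the two names into a character vector

# Rows where plant is 1, columns named "plant" and "time" (in that order)
wip1a <- wip[wip$plant == 1, c("plant", "time")]
wip1a

wip1a$time           # '$' extracts the time column: 5.2 6.1 7.0
```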

## Another solution

wip1=wip[plant==1,1]

What do the numbers 1 and 2 mean? Why are there three 1s?


%% “plant==1” gives all those rows with the variable “plant” taking value 1.

%% “plant==2” gives all those rows with the variable “plant” taking value 2.

%% wip[plant==2,1] denotes the first-column values of the rows with “plant==2”.
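The second solution can be sketched the same way, assuming a hypothetical `wip` data frame whose first column is `time` (the column order is an assumption). Inside the brackets, the number after == is the plant code used to pick rows, and the number after the comma is the column index; the 1 in the name wip1 is just part of the variable name. As before, wip$plant is written out instead of relying on an attached data frame.

```r
# Hypothetical data: column 1 is time, column 2 is plant
wip <- data.frame(time = c(5.2, 3.8, 6.1, 4.4, 7.0),
                  plant = c(1, 2, 1, 2, 1))

wip$plant == 1                  # TRUE FALSE TRUE FALSE TRUE: rows of plant 1
wip1 <- wip[wip$plant == 1, 1]  # those rows, column 1 (time)
wip2 <- wip[wip$plant == 2, 1]  # rows of plant 2, column 1

wip1   # 5.2 6.1 7.0
wip2   # 3.8 4.4

# One histogram per plant, side by side
par(mfrow = c(1, 2))
hist(wip1, include.lowest = TRUE, freq = TRUE, main = "Plant 1", xlab = "time")
hist(wip2, include.lowest = TRUE, freq = TRUE, main = "Plant 2", xlab = "time")
```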


hist(wip1,include.lowest=T,freq=T,main=paste("Histogram of time for first production


hist(wip2,include.lowest=T,freq=T,main=paste("Histogram of time for second production


What do the parts in blue mean?

7) Tutorial 4, Q2(a) SAS procedure for drawing the scatterplot for the two test scores for all the trainees with a different symbol for different gender.

data testscores;

infile "F:/ST2137/tutorialdata/testscores.txt" firstobs=2;

When do we start reading from the 2nd observation?

%% firstobs=2 is used when the file “testscores.txt” has a header, that is, the names of the variables are in the first row of the file, so reading starts from the second row.

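input A B gender$;

Is the order of the variables important?

%% Yes. The INPUT statement reads the values on each line in the order listed, so the order must match the column order in the file; it also sets the variable order of the resulting SAS data set.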


proc gplot data=testscores;

title "Scatter plot for two tests";

plot A*B=gender;

symbol1 value=circle color=red;

symbol2 value=square color=black;

run;

quit;
