AMS 572 Lecture Notes
Oct. 10, 2011.
Review: Inference on two population means
1. Two normal pops, $\sigma_1^2$ & $\sigma_2^2$ are known. → exact Z
2. Two large samples → approximate Z
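3. Two normal pops, $\sigma_1^2$ & $\sigma_2^2$ are unknown but $\sigma_1^2 = \sigma_2^2$
Pooled variance t (exact)
P.Q. $T = \dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$, where $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
Exact (SAS)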
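4. Two normal pops, $\sigma_1^2$ & $\sigma_2^2$ are unknown, $\sigma_1^2 \neq \sigma_2^2$ → approximate t
P.Q. $T = \dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \overset{\cdot}{\sim} t_{df}$
More accurate d.f. ⇒ Satterthwaite method (SAS)
Quick & dirty ⇒ $df = \min(n_1-1,\ n_2-1)$ (in-class exam)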
5. Other situations ⇒ nonparametric method
Mann-Whitney U-test = Wilcoxon Rank-Sum Test (SAS)
6. Modern nonparametric method: bootstrap resampling method
7. Transformation to normal distribution: Box-Cox transformation
e.g. X & Y are not normal, but ln(X) & ln(Y) are normal.
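The unequal-variance case (item 4) can be sketched in a few lines of Python. This is only an illustration: the function name `welch_t` and the summary statistics are made up, and the shortcut df of min(n1-1, n2-1) is assumed to be the "quick & dirty" rule. The Satterthwaite d.f. always lies between that shortcut and n1+n2-2.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance t statistic for H0: mu1 = mu2, with two choices of d.f.

    Returns (t statistic, Satterthwaite d.f., shortcut d.f.).
    """
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    # Satterthwaite approximation to the degrees of freedom
    df_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    # Conservative shortcut (assumed here to be the "quick & dirty" rule)
    df_quick = min(n1 - 1, n2 - 1)
    return t_stat, df_satt, df_quick

# Hypothetical summary statistics, for illustration only
t_stat, df_satt, df_quick = welch_t(20.0, 16.0, 10, 17.0, 4.0, 15)
print(round(t_stat, 3), round(df_satt, 2), df_quick)
```

The shortcut d.f. is never larger than the Satterthwaite d.f., so it gives a wider CI and a more conservative test.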
Inference on two population variances
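* Both pops are normal, two independent samples
Sample 1 : $X_1,\dots,X_{n_1} \overset{iid}{\sim} N(\mu_1,\sigma_1^2)$ ⇒ $(n_1-1)S_1^2/\sigma_1^2 \sim \chi^2_{n_1-1}$
Sample 2 : $Y_1,\dots,Y_{n_2} \overset{iid}{\sim} N(\mu_2,\sigma_2^2)$ ⇒ $(n_2-1)S_2^2/\sigma_2^2 \sim \chi^2_{n_2-1}$
1. point estimator : $S_1^2/S_2^2$ (parameter of interest : $\sigma_1^2/\sigma_2^2$)
Def. F-distribution Let $W_1 \sim \chi^2_{k_1}$, $W_2 \sim \chi^2_{k_2}$, where $W_1$ and $W_2$ are independent.
Then, $F = \dfrac{W_1/k_1}{W_2/k_2} \sim F_{k_1,k_2}$
2. CI for $\sigma_1^2/\sigma_2^2$: P.Q. $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,n_2-1}$, so the $100(1-\alpha)$% CI is $\left[\dfrac{S_1^2/S_2^2}{F_{n_1-1,n_2-1,\alpha/2}},\ \dfrac{S_1^2/S_2^2}{F_{n_1-1,n_2-1,1-\alpha/2}}\right]$, where $F_{k_1,k_2,p}$ denotes the upper $p$ critical point
3. Test $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs $H_a: \sigma_1^2/\sigma_2^2 \neq 1$
Test Statistic $F_0 = S_1^2/S_2^2 \sim F_{n_1-1,n_2-1}$ under $H_0$
At the significance level $\alpha$, we reject $H_0$ if $F_0$ is too large or too small:
$F_0 \geq F_{n_1-1,n_2-1,\alpha/2}$, or $F_0 \leq F_{n_1-1,n_2-1,1-\alpha/2}$
* conventional boundaries / thresholds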
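A small Python sketch of the variance-ratio CI and test, for checking hand calculations. The two samples are hypothetical and the function name `variance_ratio_test` is made up. The critical values are passed in from an F table (here the standard value $F_{4,4,0.025} = 9.60$), since the standard library has no F quantile function.

```python
from statistics import variance

def variance_ratio_test(x, y, f_upper, f_lower):
    """Two-sided F test of H0: sigma1^2 = sigma2^2 with equal-tail cut-offs.

    f_upper: upper alpha/2 critical point of F(n1-1, n2-1)
    f_lower: lower alpha/2 critical point of F(n1-1, n2-1)
    Returns (F0, CI for sigma1^2/sigma2^2, reject H0?).
    """
    f0 = variance(x) / variance(y)      # ratio of sample variances
    ci = (f0 / f_upper, f0 / f_lower)   # CI for sigma1^2 / sigma2^2
    reject = f0 >= f_upper or f0 <= f_lower
    return f0, ci, reject

# Hypothetical data with n1 = n2 = 5, so both d.f. equal 4
x = [10, 12, 9, 14, 11]
y = [20, 11, 15, 25, 9]
F_UPPER = 9.60          # F(4,4) upper 0.025 point, from a table
F_LOWER = 1 / F_UPPER   # lower 0.025 point; valid because the two d.f. are equal

f0, ci, reject = variance_ratio_test(x, y, F_UPPER, F_LOWER)
print(round(f0, 4), tuple(round(c, 4) for c in ci), reject)
```

Rejecting $H_0$ is equivalent to the CI for $\sigma_1^2/\sigma_2^2$ excluding 1, which can be read off the printed interval.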
SAS program for test on 2 pop means
1. paired samples
sample 1 10 23 16 18 … 33
sample 2 15 28 21 29 … 58
data paired;
input IQ1 IQ2;
diff=IQ1-IQ2;
datalines;
10 15
23 28
…
33 58
;
run;
proc univariate data=paired normal;
var diff;
run;
2. independent Samples
sample 1 10 23 16 18 … 33 (group 1)
sample 2 15 28 21 29 … 58 (group 2)
data indept;
input group IQ;
datalines;
1 10
1 23
1 16
…
1 33
2 15
2 28
…
2 58
;
proc sort data= indept;
by group;
run;
proc univariate data= indept normal;
var IQ;
by group;
run;
/* If both are normal */
proc ttest data= indept;
var IQ;
class group;
run;
/* If at least 1 pop is NOT normal: Wilcoxon Rank-sum test */
proc NPAR1WAY data= indept wilcoxon;
var IQ;
class group;
run;
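To see what the Wilcoxon rank-sum test computes, here is a minimal Python sketch using midranks and a plain normal approximation. It is an illustration only: the data are hypothetical, the function name `rank_sum_test` is made up, and it applies no continuity or tie correction, so its p-value need not match SAS output exactly.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Rank sum W of sample x in the combined sample, with a two-sided
    normal-approximation p-value (no continuity or tie correction)."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # tied values occupy ranks i+1 .. j; give each the average rank
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[v] for v in x)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2                   # E(W) under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # SD(W) under H0, no ties
    z = (w - mean_w) / sd_w
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p_two_sided

# Hypothetical samples: every value in x is below every value in y
x = [1.1, 2.3, 0.7, 1.9]
y = [3.2, 2.8, 4.1, 3.6, 2.5]
w, z, p = rank_sum_test(x, y)
print(w, round(z, 3), round(p, 4))
```

With samples this small the normal approximation is rough; an exact test is preferable in practice.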
Power and Sample Size Determination – Exact or Large Sample Z-test
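1. Based on the maximum error E or the length L of the CI.
Suppose we are using the exact or the large sample approximate z-test, with $n_1 = n_2 = n$.
Suppose the maximum error is E with probability $1-\alpha$.
$100(1-\alpha)$% CI for $\mu_1-\mu_2$: $(\bar{X}-\bar{Y}) \pm z_{\alpha/2}\sqrt{(\sigma_1^2+\sigma_2^2)/n}$, so $E = L/2 = z_{\alpha/2}\sqrt{(\sigma_1^2+\sigma_2^2)/n}$ ⇒ $n = z_{\alpha/2}^2(\sigma_1^2+\sigma_2^2)/E^2$
2. Based on the power of the test: to reach power $1-\beta$ when $\mu_1-\mu_2 = \delta$,
$n = (z_\alpha+z_\beta)^2(\sigma_1^2+\sigma_2^2)/\delta^2$ (1-sided test)
or $n \approx (z_{\alpha/2}+z_\beta)^2(\sigma_1^2+\sigma_2^2)/\delta^2$ (2-sided test)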
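The standard z-test sample-size formulas, assuming equal group sizes n1 = n2 = n, can be evaluated with the Python standard library. The function names and the numerical inputs below are hypothetical; results are rounded up to the next whole subject per group.

```python
import math
from statistics import NormalDist

def z_upper(p):
    """Upper-p critical point of the standard normal, e.g. z_upper(0.025) = 1.96."""
    return NormalDist().inv_cdf(1 - p)

def n_for_max_error(sigma1, sigma2, max_error, alpha=0.05):
    """Per-group n so that the CI half-width for mu1 - mu2 is at most max_error."""
    z = z_upper(alpha / 2)
    return math.ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / max_error ** 2)

def n_for_power(sigma1, sigma2, delta, alpha=0.05, beta=0.10, two_sided=True):
    """Per-group n giving power 1 - beta to detect a difference delta."""
    z_a = z_upper(alpha / 2 if two_sided else alpha)
    z_b = z_upper(beta)
    return math.ceil((z_a + z_b) ** 2 * (sigma1 ** 2 + sigma2 ** 2) / delta ** 2)

# Hypothetical inputs: sigma1 = sigma2 = 2
print(n_for_max_error(2, 2, max_error=1))
print(n_for_power(2, 2, delta=1.5))                    # 2-sided test
print(n_for_power(2, 2, delta=1.5, two_sided=False))   # 1-sided test
```

A 1-sided test needs fewer subjects than a 2-sided test at the same α and power, because $z_\alpha < z_{\alpha/2}$.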
Power and Sample Size Determination – Pooled Variance T-test
1. Sample size calculation in a C.I. scenario (Maximum error)
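$X_1,\dots,X_{n_1} \overset{iid}{\sim} N(\mu_1,\sigma^2)$, $Y_1,\dots,Y_{n_2} \overset{iid}{\sim} N(\mu_2,\sigma^2)$
P.Q: $T = \dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$
100(1-α)% CI for $\mu_1-\mu_2$ is $(\bar{X}-\bar{Y}) \pm t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$
The length of the CI : L= $2\, t_{n_1+n_2-2,\alpha/2}\, S_p\sqrt{1/n_1 + 1/n_2}$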
2. Inference on the test situation
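Data: Two independent samples
$X_1,\dots,X_{n_1} \overset{iid}{\sim} N(\mu_1,\sigma_1^2)$, $Y_1,\dots,Y_{n_2} \overset{iid}{\sim} N(\mu_2,\sigma_2^2)$
Here $\sigma_1^2 = \sigma_2^2 = \sigma^2$ and $n_1 = n_2 = n$.
For a given α (e.g. 0.05 or 0.01) and a power=(1-β) (e.g. 85%), calculate the sample size.
Def.: Effect size= $(\mu_1-\mu_2)/\sigma$ (e.g. Eff=1)
T.S : $T_0 = \dfrac{\bar{X}-\bar{Y}-0}{S_p\sqrt{1/n+1/n}}$ = $\dfrac{\bar{X}-\bar{Y}}{S_p\sqrt{2/n}} \sim t_{2n-2}$ under $H_0$
At α=0.05, reject $H_0: \mu_1 = \mu_2$ in favor of $H_a: \mu_1 > \mu_2$ iff $T_0 \geq t_{2n-2,\alpha}$
Power=(1-β)=P(reject $H_0$ | $H_a$)= $P(T_0 \geq t_{2n-2,\alpha} \mid \mu_1-\mu_2 = \Delta)$
= $P\left(\dfrac{\bar{X}-\bar{Y}}{S_p\sqrt{2/n}} \geq t_{2n-2,\alpha} \,\Big|\, \mu_1-\mu_2 = \Delta\right)$
= $P\left(\dfrac{\bar{X}-\bar{Y}-\Delta}{S_p\sqrt{2/n}} \geq t_{2n-2,\alpha} - \dfrac{\Delta}{S_p\sqrt{2/n}} \,\Big|\, \mu_1-\mu_2 = \Delta\right)$
≈ $P\left(T \geq t_{2n-2,\alpha} - \dfrac{\Delta}{\sigma}\sqrt{n/2}\right)$, $T \sim t_{2n-2}$ (Effect size= $\Delta/\sigma$)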
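One way to check a sample size obtained from the power formula is simulation. The sketch below estimates the power of the 1-sided pooled-variance t-test by Monte Carlo, assuming normal data with common σ = 1, so the mean shift equals the effect size. The function names, n = 17 per group and effect size 1 are hypothetical choices; the critical value $t_{32,0.05} = 1.694$ is taken from a t table.

```python
import math
import random

def pooled_t(x, y):
    """Pooled-variance t statistic for two samples of equal size n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (n - 1)
    sp2 = (vx + vy) / 2  # pooled variance when n1 = n2
    return (mx - my) / math.sqrt(sp2 * 2 / n)

def simulated_power(n, effect_size, t_crit, reps=5000, seed=1):
    """Fraction of simulated data sets in which T0 >= t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect_size, 1.0) for _ in range(n)]  # mu1 = effect size
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]          # mu2 = 0
        if pooled_t(x, y) >= t_crit:
            rejections += 1
    return rejections / reps

T_32_05 = 1.694  # upper 0.05 point of t with 2n - 2 = 32 d.f., from a table
power = simulated_power(n=17, effect_size=1.0, t_crit=T_32_05)
size = simulated_power(n=17, effect_size=0.0, t_crit=T_32_05)
print(power, size)
```

Setting the effect size to 0 estimates the type I error rate, which should come out close to α = 0.05.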
Example A new method of making concrete blocks has been proposed. To test whether or not the new method increases the compressive strength, 5 sample blocks are made by each method.
New Method / 14 / 15 / 13 / 15 / 16
Old Method / 13 / 15 / 13 / 12 / 14
a. Get a 95% CI for the mean difference of the 2 methods.
b. At $\alpha$=0.05, can you conclude the new method is better? Provide the p-value.
Write the SAS program for part (b)
Solution
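a. Assume both populations are normal (population 1 = new method, population 2 = old method).
First, we check whether $\sigma_1^2 = \sigma_2^2$: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \neq \sigma_2^2$
Test Statistic : $F_0 = s_1^2/s_2^2 = 1.3/1.3 = 1$, well inside $(1/F_{4,4,0.025},\ F_{4,4,0.025}) = (0.104,\ 9.60)$
It is reasonable to assume $\sigma_1^2 = \sigma_2^2$
Pooled-variance statistic (PQ) $T = \dfrac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_8$, with $\bar{x} = 14.6$, $\bar{y} = 13.4$, $s_p^2 = 1.3$
95% CI for $\mu_1-\mu_2$ is $(\bar{x}-\bar{y}) \pm t_{8,0.025}\, s_p\sqrt{1/5+1/5} = 1.2 \pm 2.306(0.721) = (-0.46,\ 2.86)$
b. Assume both populations are normal.
First, we check whether $\sigma_1^2 = \sigma_2^2$
By part (a), we found that it is reasonable to assume $\sigma_1^2 = \sigma_2^2$
Test Statistic : $T_0 = \dfrac{\bar{x}-\bar{y}}{s_p\sqrt{1/5+1/5}} = \dfrac{1.2}{0.721} = 1.66$, for $H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 > \mu_2$
At $\alpha$=0.05, we reject $H_0$ if $T_0 \geq t_{8,0.05} = 1.860$.
But $T_0 = 1.66 < 1.860$ (0.05 < p-value < 0.10, since $t_{8,0.10} = 1.397$)
We cannot reject $H_0$ at $\alpha$=0.05.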
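The hand calculation for the concrete-block example can be double-checked with a short Python sketch. The data are the ten strengths used in the SAS program for this example; the critical values $t_{8,0.025} = 2.306$ and $t_{8,0.05} = 1.860$ come from a t table, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

new = [14, 15, 13, 15, 16]
old = [13, 15, 13, 12, 14]
n1, n2 = len(new), len(old)

# Pooled variance and standard error of the difference in sample means
sp2 = ((n1 - 1) * variance(new) + (n2 - 1) * variance(old)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = mean(new) - mean(old)

T_8_025 = 2.306  # upper 0.025 point of t with 8 d.f. (table value)
T_8_05 = 1.860   # upper 0.05 point of t with 8 d.f. (table value)

ci = (diff - T_8_025 * se, diff + T_8_025 * se)  # part (a): 95% CI
t0 = diff / se                                   # part (b): test statistic
reject = t0 >= T_8_05                            # 1-sided test, Ha: new > old

print(round(sp2, 3), tuple(round(c, 3) for c in ci), round(t0, 3), reject)
```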
SAS
data block ;
input method $ strength ;
datalines ;
new 14
new 15
new 13
new 15
new 16
old 13
old 15
old 13
old 12
old 14
;
run ;
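/* check normality of strength within each method */
proc univariate data=block normal plot ;
class method ;
var strength ;
run ;
/* pooled and Satterthwaite t-tests plus the equality-of-variances F test; */
/* the reported t-test p-values are two-sided by default */
proc ttest data=block ;
class method ;
var strength ;
run ;
/* Wilcoxon rank-sum test, in case normality is doubtful */
proc npar1way data=block wilcoxon ;
class method ;
var strength ;
run ;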
Example An experiment was done to determine the effect on dairy cattle of a diet supplement with liquid whey. While no differences were noted in milk production between the group with a standard diet (hay + grain + water) and the experimental group with whey supplement (hay + grain + whey), a considerable difference was noted in the amount of hay ingested. For a 2-tailed test with $\alpha$=0.05, determine the approximate number of cattle that should be included in each group if we want power $1-\beta$ = … for $\mu_1-\mu_2$ = … . A previous study has shown $\sigma$ = …
Solution
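1. $n \approx (z_{\alpha/2}+z_\beta)^2(\sigma_1^2+\sigma_2^2)/\delta^2 = 2\sigma^2(z_{\alpha/2}+z_\beta)^2/\delta^2$ per group, where $\delta$ is the difference $\mu_1-\mu_2$ to be detected and $z_{\alpha/2} = z_{0.025} = 1.96$
2. Assumption: either both populations are normal or both sample sizes are large.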
Example Do fraternities help or hurt your academic progress at college? To investigate this question, 5 students who joined fraternities in 1998 were randomly selected. It was shown that their GPA before and after they joined the fraternities are as follows.
Student / 1 / 2 / 3 / 4 / 5
Before / 3 / 4 / 3 / 3 / 2
After / 2 / 3 / 3 / 2 / 1
Diff. / 1 / 1 / 0 / 1 / 1
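Please test the hypothesis at $\alpha$=0.05.
Solution
Assumption : the difference follows a normal distribution.
Test statistic : $T_0 = \dfrac{\bar{d}}{s_d/\sqrt{n}} = \dfrac{0.8}{0.447/\sqrt{5}} = 4 > t_{4,0.05} = 2.132$, for $H_0: \mu_d = 0$ vs $H_a: \mu_d > 0$ (d = before - after)
We reject $H_0$ at $\alpha$=0.05 and conclude fraternities do hurt…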
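The paired t statistic for the GPA data can be verified with a few lines of Python. This sketch assumes the 1-sided alternative that the mean of (before - after) is positive; the critical value $t_{4,0.05} = 2.132$ is taken from a t table.

```python
import math
from statistics import mean, stdev

before = [3, 4, 3, 3, 2]
after = [2, 3, 3, 2, 1]

# Paired design: analyze the within-student differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation of the differences
t0 = d_bar / (s_d / math.sqrt(n))

T_4_05 = 2.132        # upper 0.05 point of t with n - 1 = 4 d.f. (table value)
reject = t0 >= T_4_05 # assumed 1-sided alternative: mean difference > 0

print(diffs, round(d_bar, 2), round(s_d, 4), round(t0, 3), reject)
```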
SAS
data frat ;
input before after ;
diff = before - after ;
datalines ;
3 2
4 3
3 3
3 2
2 1
;
run ;
proc univariate data=frat normal ;
var diff ;
run ;