Homework #1

Name: _ Date: _

1. An investigator decides to characterize the average lifetime of light bulbs for the company. To carry out this characterization, s/he installs 10 light bulbs into an apparatus that connects each bulb to a timer. When a bulb burns out, its timer stops. Unknown to the investigator, the distribution of lifetimes for the population is exponential with mean 30 days and standard deviation (SD) 30 days. Assuming the investigator carries out the experiment until all of the bulbs burn out, s/he obtains the following data (in days): T={41, 17, 42, 21, 13, 11, 17, 47, 71, 19}.

(A) Calculate the mean and SD for this dataset. Do they seem close to the population values (no need to use a statistical analysis here; intuition will suffice)?

(B) Suppose the investigator is told by management that s/he has three weeks to characterize the lifetimes, so lifetimes can only be observed up to 21 days. The data observed in this case are T={21*, 17, 21*, 21, 13, 11, 17, 21*, 21*, 19}. A star indicates that the bulb was still burning at the end of the study. Calculate the mean and SD for this dataset. How do these values compare to those in part (A)? Would you consider these estimates to be reliable for the population parameters?

(C) After consulting the results of past experiments that estimated the lifetimes of light bulbs, s/he decides to focus on the light bulbs that had observed failure times and to ignore the censored observations. The reduced dataset now being used is T={17, 21, 13, 11, 17, 19}. Calculate the mean and SD for this dataset. The investigator believes that using only the observed failure times will provide a more accurate estimate of the parameters of interest. What are your thoughts about this method?

(D) After reading more carefully about the methods used to estimate the mean and SD in the previous results, s/he discovers an alternative approach:

estimated mean = (t_1 + t_2 + ... + t_n) / d

where all n of the times are summed, regardless of whether they are censored, but only the number of events, d, is used in the denominator. The SD is the same estimate. Try using this method and state your thoughts.
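The estimator described in part (D), the total of all recorded times divided by the number of observed failures, could be computed along the following lines. This is only a sketch: the dataset name bulbs_censored and the indicator variable event (1 = burned out, 0 = still burning at day 21) are illustrative assumptions, not part of the assignment.

```sas
/* Part (B) data: each time t is paired with a hypothetical event indicator */
/* event = 1 if the bulb burned out, 0 if it was censored at 21 days (21*)  */
data bulbs_censored;
    input t event @@;   /* @@ lets several (t, event) pairs share one line */
    datalines;
21 0  17 1  21 0  21 1  13 1  11 1  17 1  21 0  21 0  19 1
;
run;

/* Sum every time, censored or not, and divide by the number of failures */
proc sql;
    select sum(t)              as total_time,
           sum(event)          as n_events,
           sum(t) / sum(event) as mean_alt
    from bulbs_censored;
quit;
```

Carrying the censoring status as a separate 0/1 variable keeps the starred and unstarred 21-day values distinguishable, which a single column of times cannot do.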

2. Use SAS to do the following:

(A) Create a dataset used in part (A) of 1 and use “Proc Means” to obtain the mean and SD. Print the results and attach to HW.

(B) Create a dataset used in part (B) of 1 and use “Proc Means” to obtain the mean and SD. Print the results and attach to HW.

(C) Create a dataset used in part (C) of 1 and use “Proc Means” to obtain the mean and SD. Print the results and attach to HW.

(D) Create a dataset from part (A) of 1 that has the variables t2 = t + 5 and t3 = t 2. Print this dataset.
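As a starting point for the steps in question 2, a minimal sketch might look like the following. The dataset name bulbs_a is an assumption chosen for illustration; only the t2 variable is shown, and t3 can be added with another assignment statement in the same DATA step.

```sas
/* Part (A) lifetimes in days; the dataset name bulbs_a is illustrative */
data bulbs_a;
    input t @@;      /* @@ holds the input line so all ten values are read */
    t2 = t + 5;      /* derived variable requested in part (D) */
    datalines;
41 17 42 21 13 11 17 47 71 19
;
run;

/* Count, mean, and sample SD (n - 1 divisor by default) of the lifetimes */
proc means data=bulbs_a n mean std;
    var t;
run;

/* List every observation, including the derived variable */
proc print data=bulbs_a;
run;
```

The datasets for parts (B) and (C) follow the same pattern with a different DATALINES block.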

3. Many inferential techniques in survival analysis use theoretical asymptotic results. In many cases, the distribution of a statistic approaches a theoretical distribution regardless of the underlying population distribution when the sample size is large. One example is the Central Limit Theorem (CLT). Given a population mean and SD equal to μ and σ respectively, the CLT states (in simplified terms) that the sample mean of n individuals is distributed normally with mean μ and variance σ²/n, i.e. N(μ, σ²/n), when n is sufficiently large. Use simulation to estimate the true distribution of the sample mean and compare it to the theoretical distribution under the CLT. Then state whether you believe the sample mean is approximately normally distributed:

Distribution / Population Mean, μ / Population SD, σ / Sample Size, n
Exp(λ=1) / 1 / 1 / 1, 2, 10, 20, 50
Uniform(0,1) / 0.5 / 0.288675 / 1, 2, 3, 4, 5
Gamma(2,1) / 2 / 1.414213 / 1, 5, 10, 20, 50
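Under the CLT statement in question 3, the theoretical normal curve for the sample mean has standard deviation equal to the population SD divided by the square root of n. A worked instance for two rows of the table, writing μ and σ for the population mean and SD (values rounded to four decimals):

```latex
\[
\bar{X}_n \sim N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right) \ \text{for large } n,
\qquad
\mathrm{SD}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}
\]
\[
\text{Exp}(1),\ n = 10:\quad \frac{1}{\sqrt{10}} \approx 0.3162
\qquad\qquad
\text{Gamma}(2,1),\ n = 50:\quad \frac{1.414213}{\sqrt{50}} \approx 0.2000
\]
```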

Hint: Use the following SAS program and modify it as needed, using n for the sample size, x for the population distribution, and normal(mu= sigma=) for the theoretical distribution of the sample mean when n is sufficiently large. If the normal density is close to the histogram, n is sufficiently large for the CLT to apply.

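data three;
    n = 1;                      /* sample size; change for each case */
    do i = 1 to 5000;           /* 5000 simulated samples */
        sm = 0;
        do j = 1 to n;
            x = ranexp(-1);     /* one draw from the population distribution */
            sm + x;             /* running sum of the n draws */
        end;
        xbar = sm / n;          /* sample mean of this sample */
        output;
    end;
run;

/* Histogram of the simulated sample means with a normal density overlaid */
proc univariate noprint data=three;
    histogram xbar / normal(mu=1 sigma=1);
run;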

Exp(=1) is provided with x=ranexp(-1)

Uniform(0,1) is provided with x=ranuni(-1)

Gamma(2,1) is provided with x=rangam(-1,2)
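As an illustration of modifying the hint program, the sketch below sets up the Exp(1) case with n = 10. The macro variables n and sigma are assumptions introduced here for convenience; they are not part of the original program. The sigma value is computed as the population SD (1 for this distribution) divided by the square root of n, so the overlaid curve matches the CLT variance.

```sas
%let n = 10;                                      /* sample size under study */
%let sigma = %sysevalf(1 / %sysfunc(sqrt(&n)));   /* population SD of 1 divided by sqrt(n) */

data three;
    n = &n;
    do i = 1 to 5000;            /* 5000 simulated samples */
        sm = 0;
        do j = 1 to n;
            x = ranexp(-1);      /* swap in ranuni(-1) or rangam(-1,2) for the other rows */
            sm + x;
        end;
        xbar = sm / n;           /* sample mean of this sample */
        output;
    end;
run;

/* Overlay the CLT normal density: mean 1, SD taken from the sigma macro variable */
proc univariate noprint data=three;
    histogram xbar / normal(mu=1 sigma=&sigma);
run;
```

For the Uniform(0,1) and Gamma(2,1) rows, the mu= value and the numerator in the sigma calculation would change to the population mean and SD listed in the table.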