Statistics Workshop (Mars4910) Spring 2018

NOTE: Do not turn in this assignment. The goal of this exercise is to develop your SPSS skills.

1a) Import the data in the first sheet of Statistics_Dataset1.xls (the “large sample size” sheet). These observations were drawn from a normally distributed random variable with a mean of 0 and an S.D. of 1; use them for the following exercise.

1b) Create a histogram of this dataset of 1000 data points. Explain in your own words what has been plotted. Does this dataset follow a normal distribution? Why / Why not? Paste the figure below:

1c) Create a boxplot of this dataset of 1000 data points. Explain in your own words what has been plotted. Are there any outliers? Why / Why not? Paste the figure below:

1d) Report the following parameters for this distribution:

-Mean =

-Median =

-Mode =

-S.D. =

-S.E. =
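You can cross-check these parameters outside SPSS. The sketch below is a minimal Python example and is not part of the SPSS workflow. It assumes the 1000 values have been exported to a plain list of numbers, and the function name `describe` is hypothetical. It uses the sample S.D. (denominator n - 1) and S.E. = S.D./√n. Values drawn from a continuous distribution are often all unique, so every value ties for the mode. As far as we know, SPSS then shows the smallest value, and the sketch mimics that with `min()`.

```python
import math
import statistics

def describe(data):
    """Return n, mean, median, mode, sample S.D. and S.E. of the mean for a list of numbers."""
    n = len(data)
    sd = statistics.stdev(data)            # sample S.D., denominator n - 1
    return {
        "n": n,
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode() lists every most-frequent value; taking min() mimics the
        # "smallest value is shown" convention assumed for SPSS when modes tie
        "mode": min(statistics.multimode(data)),
        "sd": sd,
        "se": sd / math.sqrt(n),           # standard error of the mean
    }

# Usage on a tiny made-up list (not the workshop data):
print(describe([2, 4, 4, 6]))
```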

1e) Test whether this dataset is normally distributed using the skewness and kurtosis statistics in SPSS.

First, look up how SPSS makes these calculations in the software's Help.

-Paste what the Help says about the SPSS skewness algorithm below:

-Paste what the Help says about the SPSS kurtosis algorithm below:

-Use SPSS to calculate the skewness of the distribution. Paste the result here: ______

-Interpret this result. Is this distribution symmetrical? If not, what specific type of distribution is this, based on its lack of symmetry?

-Use SPSS to calculate the kurtosis of the distribution. Paste the result here: ______

-Interpret this result: are the tails of the distribution larger or smaller than those of a normal distribution? Based on this result, what specific type of distribution is this?
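Treat the Help text as the authority here. As an assumption to compare against, the sketch below implements the bias-adjusted formulas commonly documented for SPSS, which are the same ones Excel's SKEW and KURT use. They give excess kurtosis, so a normal distribution scores 0. Dividing each statistic by its standard error gives a z-ratio. A common rule of thumb treats |z| > 1.96 as a significant departure from normality at α = 0.05. The function names are hypothetical.

```python
import math
import statistics

def skew_kurtosis(data):
    """Bias-adjusted skewness (G1), excess kurtosis (G2) and their standard errors."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample S.D., denominator n - 1
    sum_z3 = sum(((x - mean) / s) ** 3 for x in data)
    sum_z4 = sum(((x - mean) / s) ** 4 for x in data)
    skew = n / ((n - 1) * (n - 2)) * sum_z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))  # 0 for a normal distribution
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt

def z_ratio(statistic, se):
    """Statistic divided by its S.E.; |z| > 1.96 is a common cut-off at alpha = 0.05."""
    return statistic / se

skew, se_skew, kurt, se_kurt = skew_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)  # about 0 and about -1.2 for this symmetric, flat toy list
```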

1f) Test whether this dataset is normally distributed using SPSS. Note: run both the Shapiro-Wilk (S-W) and the Kolmogorov-Smirnov (K-S) tests and report both results. Copy and paste the table of results below:

-According to the S-W test, is this dataset normally distributed (Yes / No)?

-What is the null hypothesis of this test?

-What is the alternate hypothesis of this test?

-What was the result of this test: Significant OR Not Significant? Explain how you decided:

-According to the K-S test, is this dataset normally distributed (Yes / No)?

-What is the null hypothesis of this test?

-What is the alternate hypothesis of this test?

-What was the result of this test: Significant OR Not Significant? Explain how you decided:

-How can you tell?

-What is the p value?
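The sketch below shows what the K-S statistic measures. It computes the D statistic against a normal curve whose mean and S.D. are estimated from the same data. In this situation SPSS's Explore procedure reports a Lilliefors significance correction. The sketch does not compute a p value, because that needs Lilliefors tables or simulation, so read the p value from the SPSS output. The helper names are hypothetical. The decision rule is the same for both tests: the null hypothesis is that the data come from a normal distribution, so p < α means the result is significant.

```python
import statistics
from statistics import NormalDist

def ks_d_normal(data):
    """K-S D statistic against a normal curve with mean and S.D. estimated from the data."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # largest gap between the empirical CDF (just after and just before x) and the normal CDF
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def is_significant(p, alpha=0.05):
    """For S-W and K-S alike: p < alpha means reject H0 (the data are normally distributed)."""
    return p < alpha

print(ks_d_normal([-1.0, 1.0]))  # about 0.26
```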

2a) Import the data in the “small sample size” sheet of Statistics_Dataset1.xls. These observations were also drawn from a normally distributed random variable with a mean of 0 and an S.D. of 1; use them for the following exercise.

Use SPSS to calculate the following parameters. Paste the results in the table below. Note: For the S.E. and the 95% CI, show your calculations.

Sample Size (n) / Mean / Median / Mode / S.D. / S.E. (show your work) / 95% CI of the mean (show your work)
25
50
250
500

Note: For the 95% CI, use the Z score of 1.96. Make sure you show the upper and the lower bounds.
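The hand calculation follows S.E. = S.D./√n, with a 95% CI of mean ± 1.96 × S.E. Below is a minimal Python sketch of that arithmetic. The function name is hypothetical, and the inputs are the summary values SPSS reports for each sample.

```python
import math

def ci95_from_summary(mean, sd, n, z=1.96):
    """Return (S.E., lower bound, upper bound) of the 95% CI of the mean."""
    se = sd / math.sqrt(n)          # S.E. = S.D. / sqrt(n)
    half_width = z * se             # margin of error
    return se, mean - half_width, mean + half_width

# Population values from 2a (mean 0, S.D. 1) at n = 25:
print(ci95_from_summary(0.0, 1.0, 25))  # S.E. 0.2, bounds about -0.392 and 0.392
```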

2b) Interpret your results:

-How well do the means of the four small samples (n = 25, 50, 250, 500) estimate the mean of the large sample (n = 1000)? Report whether the mean estimate from each small sample was or was not significantly different from the mean of the large sample.

Sample Size (n) / Mean / 95% Lower Bound / 95% Upper Bound / Conclusion about significant difference with the mean of the large sample (from question 1)
25
50
250
500
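One way to fill in the last column treats the large-sample mean from question 1 as a fixed reference value. Check whether it falls inside each small sample's 95% CI. If it falls outside, the small-sample mean differs significantly at α = 0.05. This is a simplification that ignores the uncertainty in the large-sample mean. The function name is hypothetical.

```python
def differs_from_reference(reference_mean, lower, upper):
    """True when the reference mean lies outside [lower, upper], i.e. 'significantly different'."""
    return not (lower <= reference_mean <= upper)

# Hypothetical numbers, only to show the call:
print(differs_from_reference(0.05, -0.30, 0.40))  # False: 0.05 is inside the interval
```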

2c) Create a figure with four bar charts (one for each small sample size) showing the mean and the SE of each distribution. How does increasing the sample size affect the precision of the mean estimate?

3a) Import the file Statistics_Dataset2.xls into SPSS and use these observations, drawn from three data distributions, to practice making figures. Explore the different menu options in the “chart builder”.

3b) Use these data to create a figure that compares the distributions (quantiles) of the three groups (1, 2, 3) together in the same figure. Copy and paste the figure below:

3c) Use these data to create a figure that compares the means and the SDs of the three groups (1, 2, 3) together in the same figure. Copy and paste the figure below:

3d) Use these data to create a figure that compares the means and the SEs of the three groups (1, 2, 3) together in the same figure. Copy and paste the figure below:

3e) Use these data to create a figure that compares the means and the 95% Confidence Intervals of the three groups (1, 2, 3) together in the same figure. Copy and paste the figure below:

3f) Explain when it is appropriate to use each of the three types of error bars around the mean: SD, SE or 95% CI.

-SD:

-SE:

-95% CI:

4a) Import the file Statistics_Dataset3.xls into SPSS and use these observations, drawn from three random variables (all derived from theoretical distributions with a mean of 10 and a variance of 10), for the following exercise. Make sure the variables are “numeric” and the measure is “scale”.

Create a frequency table for these three datasets of 100 data points each and use this information to fill in the table below. Note: because these are random samples from theoretical distributions, the parameter estimates (x-bar, S.D.) will differ from the true parameters (µ, σ).

DATASET / MEAN / STDEV / MEDIAN / 5TH PERCENTILE / 95TH PERCENTILE
Distribution_1
Distribution_2
Distribution_3
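You can check the percentile columns outside SPSS with Python's `statistics.quantiles` and `method="exclusive"`, which uses the (n + 1)p definition. As far as we know, this matches SPSS's default (HAVERAGE) percentile method, but treat that correspondence as an assumption. The function name is hypothetical.

```python
import statistics

def table_row(data):
    """Mean, S.D., median and 5th/95th percentiles for one distribution."""
    # n=20 yields 19 cut points: the 5th, 10th, ..., 95th percentiles
    cuts = statistics.quantiles(data, n=20, method="exclusive")
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "median": statistics.median(data),
        "p5": cuts[0],
        "p95": cuts[-1],
    }

print(table_row(list(range(1, 101))))  # p5 = 5.05 and p95 = 95.95 for the values 1..100
```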

4b) Use SPSS to determine whether these three distributions are normally distributed, and fill out the information in the table below:

DATASET / S-W test (statistic, df) / S-W result (p value) / K-S test (statistic, df) / K-S result (p value) / Outcome (Normal OR not Normal?)
Distribution_1
Distribution_2
Distribution_3

NOTE: If a dataset is normally distributed, there is no need to do a data transformation.

  • 4c) Distribution_1:

Skewness: ____

Interpret this skewness:

Kurtosis:____

Interpret this kurtosis:

Paste the histogram (with a superimposed normal curve) below:

Briefly discuss which data transformation you would implement to make this dataset normally distributed. Explain what seems to be the problem with the distribution in terms of its skewness and kurtosis. NOTE: Also consider the range of the data when selecting a transformation. ______

  • 4d) Distribution_2:

Skewness: ____

Interpret this skewness:

Kurtosis:____

Interpret this kurtosis:

Paste the histogram (with a superimposed normal curve) below:

Briefly discuss which data transformation you would implement to make this dataset normally distributed. Explain what seems to be the problem with the distribution in terms of its skewness and kurtosis. NOTE: Also consider the range of the data when selecting a transformation. ______

  • 4e) Distribution_3:

Skewness: ____

Interpret this skewness:

Kurtosis:____

Interpret this kurtosis:

Paste the histogram (with a superimposed normal curve) below:

Briefly discuss which data transformation you would implement to make this dataset normally distributed. Explain what seems to be the problem with the distribution in terms of its skewness and kurtosis. NOTE: Also consider the range of the data when selecting a transformation. ______
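The sketch below follows one common rule of thumb, though other guidelines exist. Moderate positive skew is often treated with a square root, stronger positive skew with a log, and severe positive skew with a reciprocal. Negative skew is usually reflected first. Logs and reciprocals need positive inputs, which is why the range matters. Here a constant is added so the minimum becomes 1; that choice is an assumption, since the square root only needs non-negative values. The function names and transformation labels are hypothetical.

```python
import math

def shift_constant(data, floor=1.0):
    """Constant that moves the minimum up to `floor` (0 if already at or above it)."""
    return max(0.0, floor - min(data))

def transform(data, kind):
    """Apply one of a few common normalising transformations (rule-of-thumb choices)."""
    c = shift_constant(data)
    if kind == "sqrt":            # moderate positive skew
        return [math.sqrt(x + c) for x in data]
    if kind == "log10":           # substantial positive skew
        return [math.log10(x + c) for x in data]
    if kind == "reciprocal":      # severe positive skew; note it reverses the order of values
        return [1.0 / (x + c) for x in data]
    if kind == "reflect_log10":   # negative skew: reflect so the long tail points right, then log
        k = max(data) + 1.0       # reflected values k - x are all >= 1
        return [math.log10(k - x) for x in data]
    raise ValueError(f"unknown transformation: {kind}")

print(transform([0, 9, 99], "log10"))  # [0.0, 1.0, 2.0] after adding 1
```

In SPSS's Compute Variable dialog, the equivalent expressions use functions such as SQRT, LG10 or LN. For example, LG10(Distribution_1 + 1) would apply the log transformation after adding 1.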

5a) Implement a data transformation for the first dataset from question 4 (Statistics_Dataset3.xls) that is not normally distributed.

-Explain which data transformation you selected, and why.

-Show the formula you used in SPSS to implement the data transformation:

-Fill out the table below, showing the distribution parameters before / after the transformation:

Distribution 1 / minimum / mean / median / maximum / S.D.
BEFORE transformation
AFTER transformation
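You can cross-check the before/after rows of the table with a short Python sketch. Here `before_after` is a hypothetical helper. The transformation passed in (log10(x + 1) in the usage line) is only an assumed example, so substitute whatever formula you actually used in SPSS.

```python
import math
import statistics

def before_after(data, func):
    """Minimum, mean, median, maximum and S.D. before and after applying `func`."""
    def row(values):
        return {
            "minimum": min(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "maximum": max(values),
            "sd": statistics.stdev(values),   # sample S.D., denominator n - 1
        }
    transformed = [func(x) for x in data]
    return row(data), row(transformed)

before, after = before_after([0, 9, 99], lambda x: math.log10(x + 1))
print(before)
print(after)
```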

-Briefly, explain what happened to the data distribution when you made the transformation.

-Perform a test of normality using the transformed data. Copy and paste the result of the test below, and answer the following questions:

-Is the transformed dataset normally distributed (Yes / NO / Cannot Tell)?

-Were the S-W and the K-S tests significant or not? Explain for each test how you decided.

5b) Implement a data transformation for the second dataset from question 4 (Statistics_Dataset3.xls) that is not normally distributed.

-Explain which data transformation you selected, and why.

-Show the formula you used in SPSS to implement the data transformation:

-Fill out the table below, showing the distribution parameters before / after the transformation:

Distribution 2 / minimum / mean / median / maximum / S.D.
BEFORE transformation
AFTER transformation

-Briefly, explain what happened to the data distribution when you made the transformation.

-Perform a test of normality using the transformed data. Copy and paste the result of the test below, and answer the following questions:

-Is the transformed dataset normally distributed (Yes / NO / Cannot Tell)?

-Were the S-W and the K-S tests significant or not? Explain for each test how you decided.
