Part 1 Snapshot of the Annenberg Series: Against All Odds

Statistics Workshop – Day 1

Reviewing Some Concepts

Part 1 – Snapshot of the Annenberg Series: “Against All Odds”

This is a great FREE resource! Later on remember to register.

Scroll down to the section “Free Online Annenberg Video Series on Teaching/Learning Math”

Click on the Annenberg Project Video Series – Statistics

Click on the VoD logo and register

Part 2 – Read and Comment

Literary Digest

For the 1936 presidential election, Literary Digest conducted a poll to predict the winner. Over 10 million questionnaires were sent to people who owned automobiles and/or telephones. Over 2.4 million questionnaires were returned, and Literary Digest predicted that Alf Landon would defeat Franklin D. Roosevelt with 57% of the vote. George Gallup also conducted a poll, of 50,000 randomly selected voters, and predicted Roosevelt as the winner. Many people laughed at Gallup because Literary Digest had correctly predicted the outcome of every presidential election since 1916 and based its predictions on such a large sample. Gallup was correct, and Roosevelt won with 62% of the vote. Where did Literary Digest go wrong?

In the end, Literary Digest went bankrupt and Gallup started his own company, which still predicts the elections.

Part 3 – Note: Tutorials are also available on my web page:

Review some concepts by means of Tutorial (1), then comment on the following:

- Distinction between:

  - Population and sample

  - Parameter and statistic

  - Descriptive and inferential statistics

- Why sampling?

- Why random selection?

- Importance of simple random samples

- Biased, unbiased sampling techniques

Selecting Random Samples (Section 1.4)

Simulating Experiments

Describing Data Sets with Tables and Graphs (Sections 2.2, 2.3)

Part 4 - Select 5 students at random from your Statistics class.

4-a) Use the TI-83/84 calculator to generate 5 different random integers from 1 to 28.

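The instruction on the home screen of your calculator should read as follows (each press of ENTER generates one integer; skip any repeats until you have 5 different numbers):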

randInt(1,28)

(Need help? See calculator section, item 1)
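For readers without a TI-83/84 at hand, the same selection can be sketched in Python. This is only an illustration, not part of the calculator instructions: the function name `pick_students` and its `seed` parameter are hypothetical, and the class size of 28 is taken from the exercise.

```python
import random

def pick_students(class_size=28, k=5, seed=None):
    """Return k different roster numbers from 1..class_size, chosen at random."""
    rng = random.Random(seed)  # a seed is only useful for repeatable demos
    # sample() draws without replacement, so no student is picked twice
    return sorted(rng.sample(range(1, class_size + 1), k))

print(pick_students())
```

Running it twice without a seed will usually give two different lists, which is the point of question 4-c.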

4-b) List the five numbers obtained:

4-c) Check with a classmate. Are his/her numbers the same as yours? Explain.

4-d) Check with the class roster shown on the transparency to name the students selected.

4-e) Comment on the importance of random selection

Part 5 - Use random numbers to simulate rolling a “fair” six-sided die 60 times.

5-a) Use the calculator to simulate rolling a fair die once. Indicate the instruction entered in the calculator. Press ENTER a few times, observe the outcomes, and reflect on what you are doing.

5-b) There is a shortcut to simulate rolling the die 60 times. We are going to clear a list (L1), generate 60 integers from 1 to 6, and store the numbers into L1. We’ll access the editor to explore the list and record the outcomes in a table.

The instruction on the home screen of your calculator should read:

ClrList L1:randInt(1,6,60)→ L1

Note: we use “:” (colon) to separate statements

(Need help? See calculator section, item 2)

...... / ......

5-c) Do you have any suggestions to make the counting process easier?

5-d) Think of a way of graphing the information contained in this table. Show the graph above.

Counting is tedious; on the next page you are given instructions to get help from the TI-83 to determine the “counts”. First we need to review some vocabulary.
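As a rough parallel to the calculator shortcut, the 60 rolls and their counts can also be produced in Python. This is a minimal sketch; `roll_die` and `tally` are hypothetical helper names, and the text bar chart is just one possible way to picture the table.

```python
import random
from collections import Counter

def roll_die(n_rolls=60, seed=None):
    """Simulate n_rolls rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

def tally(rolls):
    """Count how many times each face (1 to 6) appears."""
    counts = Counter(rolls)
    return {face: counts[face] for face in range(1, 7)}

rolls = roll_die(60)
for face, count in tally(rolls).items():
    # one '#' per occurrence gives a quick text histogram
    print(face, "#" * count, count)
```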

Constructing Frequency Tables and Histograms (Sections 2.2, 2.3)

Part 6 - Review some concepts by means of Tutorial (2), then comment on the following:

- Why frequency distributions?

- Advantages and disadvantages

- If you have to choose a representative of a class, what number is a good choice?

- Why class boundaries?

Part 7 - Use the calculator to sketch the histogram of the data stored in L1. Trace the histogram to read the frequencies of the classes. Display results below.

(Need help? See calculator section, item 3)

Frequency Distribution (label) / Histogram (label)

Part 8 – Reflect on what we have done and comment on the following:

- Randomness

- Equally Likely Outcomes (Chapter 3)

- Unpredictability of a single outcome

- Long run regularity

- Law of large numbers (Chapter 3)

- Uniform distribution

- Theoretical distribution

- Sampling error

Part 9 – Copy data into another list

To keep the generated data available for future use, we’d like to copy it into another list labeled RNDIE.

(Need help? See calculator section, item 4)

Law of Large Numbers (Chapter 3)

Collecting Class Results

Sketching Histograms using Grouped Data (Sections 2.2, 2.3)

Part 10: Collecting class results.

10-a) In part 7, you constructed the frequency distribution for the simulation of the experiment of rolling a die 60 times. Look up your results and write them on the board. We’ll produce a new frequency distribution with the results of the class, and graph the corresponding histogram by hand. What do you expect the shape of this new distribution to be? Refer to the Law of Large Numbers in your explanation.

Class Results
Frequency Distribution (Numbers / Frequencies) / Histogram (label)
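To get a feel for what pooling the class results does, one could compare one student's 60 rolls with a simulated class total in Python. This is a sketch under assumptions: the 28 students come from the roster size used earlier, the seed is arbitrary, and `relative_frequencies` and `max_gap_from_uniform` are hypothetical helper names.

```python
import random

def relative_frequencies(rolls):
    """Proportion of rolls landing on each face, 1 to 6."""
    n = len(rolls)
    return {face: rolls.count(face) / n for face in range(1, 7)}

def max_gap_from_uniform(rolls):
    """Largest distance between an observed proportion and the ideal 1/6."""
    return max(abs(p - 1 / 6) for p in relative_frequencies(rolls).values())

rng = random.Random(2024)  # arbitrary seed, only for repeatability
one_student = [rng.randint(1, 6) for _ in range(60)]
whole_class = [rng.randint(1, 6) for _ in range(28 * 60)]

print("one student :", round(max_gap_from_uniform(one_student), 3))
print("whole class :", round(max_gap_from_uniform(whole_class), 3))
```

With more rolls, the gap from 1/6 tends to shrink, which is what the Law of Large Numbers leads us to expect; any single run can still deviate.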

Finding the Mean and Standard Deviation (Section 2.4)

The Standard Deviation as a Ruler (Section 2.4)

Range Rule of Thumb (Section 2.4)

Part 11 - Review some concepts by means of Tutorial (3), then comment on the following:

- Mean versus median in a skewed distribution

- Resistant measure of the center

- Sum of the deviations, the mean as a “fulcrum”

- Difference between the formulas for standard deviation for a sample and population

- Usual and unusual values according to the range rule of thumb

- Empirical rule

- Chebyshev’s theorem
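The difference between the two standard deviation formulas can be seen in a few lines of Python. This is an illustrative sketch with a made-up data set; the only difference between the two functions is the divisor, n for a population and n - 1 for a sample.

```python
import math

def population_sd(values):
    """Standard deviation treating the values as the whole population (divide by n)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Standard deviation treating the values as a sample (divide by n - 1)."""
    xbar = sum(values) / len(values)
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5
print(population_sd(data))  # 2.0
print(sample_sd(data))      # slightly larger, because of the n - 1 divisor
```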

Part 12: Mean and standard deviation of grouped data

12-a) Find the mean and standard deviation of the distribution of class results shown on the previous page.

The instruction on the home screen of your calculator should read:

mean(L1, L2) (or stdDev(L1, L2))

(Need help? See calculator section, item 5)

mean(L1, L2) =          standard deviation(L1, L2) =
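The grouped-data calculation can be mirrored in Python, with one list of values and one list of frequencies playing the roles of L1 and L2. This is a sketch, assuming the calculator's stdDev( reports the sample standard deviation (divisor n - 1); the frequencies below are invented placeholders, not actual class results.

```python
import math

def grouped_mean(values, freqs):
    """Mean of data given as values with frequencies."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(values, freqs)) / n

def grouped_sample_sd(values, freqs):
    """Sample standard deviation (n - 1 divisor) of data given with frequencies."""
    n = sum(freqs)
    m = grouped_mean(values, freqs)
    squared_dev = sum(f * (x - m) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(squared_dev / (n - 1))

faces = [1, 2, 3, 4, 5, 6]
counts = [10, 10, 10, 10, 10, 10]  # hypothetical frequencies for 60 rolls

print(grouped_mean(faces, counts))
print(grouped_sample_sd(faces, counts))
```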

12-b) Think about our experiment of rolling a die and recording the outcome. Is it unusual to roll a 6? Are any of the outcomes unusual? Use your intuition to answer.

12-c) Use the values of the mean and standard deviation to label the scale given below. Use the range rule of thumb to comment on usual and unusual values. Do the results agree with your answer to part 12-b?

___|______|______|______|______|______|______|

x̄-3s   x̄-2s   x̄-s    x̄    x̄+s   x̄+2s   x̄+3s
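The range rule of thumb treats values within two standard deviations of the mean as usual. A small Python sketch of that check follows; the mean and standard deviation used are approximate values for a fair die and stand in for whatever the class obtains in 12-a, and the function names are hypothetical.

```python
def usual_range(mean, sd):
    """Interval of 'usual' values under the range rule of thumb."""
    return (mean - 2 * sd, mean + 2 * sd)

def is_unusual(x, mean, sd):
    """True when x falls outside mean +/- 2 standard deviations."""
    low, high = usual_range(mean, sd)
    return x < low or x > high

mean, sd = 3.5, 1.71  # approximate values for a fair die (assumed, not class data)
print(usual_range(mean, sd))
for face in range(1, 7):
    print(face, "unusual" if is_unusual(face, mean, sd) else "usual")
```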

Probability Distributions and Histograms (Chapter 4)

Mean and Standard Deviation of Probability Distributions (Chapter 4)

Part 13 - Review some concepts by means of Tutorial (4), then comment on the following:

- Similarities/differences between relative frequency distributions/histograms and probability distributions/histograms

- Is the random variable in our experiment of rolling a die and recording the outcome, discrete or continuous?

- Correspondence between areas and probabilities

- Requirements for a probability distribution compared to what you learned about relative frequencies

- Formula for finding the mean of a probability distribution compared to the one used for finding the mean of a frequency distribution

- Range rule of thumb for determining unusual results compared to the probability rule

- How to use the calculator to find the mean and standard deviation of probability distributions

Relative Frequency Distributions and Histograms (Sections 2.2, 2.3)

Probability Distributions and Histograms (Section 4.2)

Mean and Standard Deviation of Probability Distributions (Section 4.2)

Part 14: Use the class results from part 10.

14-a) Construct a relative frequency distribution and a relative frequency histogram.

14-b) Construct a probability distribution and probability histogram.

14-c) Construct the theoretical probability distribution and histogram for the experiment of rolling a die and recording the outcome.

14-d) Find the mean and standard deviation for the distributions of parts (b) and (c).

(Need help? See calculator section, items 5 and 6)

(a) Relative Frequency Distribution from Class Results: Numbers Obtained / Relative Frequencies × 100 (%)
Relative Frequency Histogram (label)

(b) Probability Distribution from Class Results: Random Variable, x / Probability, P(x)
Probability Histogram (label)
Mean =
St. Deviation =

(c) Theoretical Probability Distribution: Random Variable, x / Probability, P(x)
Probability Histogram (label)
Mean =
St. Deviation =
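For the theoretical distribution in part (c), the mean is the sum of x·P(x) and the variance is the sum of (x − μ)²·P(x). A short Python sketch of that calculation follows, assuming each face of a fair die has probability 1/6; the function names are hypothetical.

```python
from fractions import Fraction
import math

def distribution_mean(dist):
    """Mean of a discrete probability distribution {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

def distribution_sd(dist):
    """Standard deviation of a discrete probability distribution {x: P(x)}."""
    mu = distribution_mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(fair_die.values()) == 1  # probabilities must add up to 1

print(distribution_mean(fair_die))          # 7/2
print(round(distribution_sd(fair_die), 4))  # 1.7078
```

The same two functions work for the class-based distribution in part (b) once the observed relative frequencies are entered as the probabilities.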

Distribution of Sample Means (Section 5.5)

Central Limit Theorem (Section 5.5)

Part 15 - Consider the theoretical uniform distribution of the experiment of rolling a die and recording the outcome. The mean and standard deviation of this population were obtained in part 14-d. The parameters for that population are: μ =          σ =

15-a) Think of the list RNDIE that you have in your calculator as a sample that was selected at random from this population. Find the mean of RNDIE and write your result here: x̄ =

15-b) Since each of us has a RNDIE list, we can say that we have selected 28 samples of size 60 from this theoretical population.

We are going to enter each of the 28 sample means in the overhead calculator. Let’s use list L6. We have created a new distribution, which is the distribution of sample means for samples of size 60.

Before doing that, just think about this new distribution of sample means for samples of size 60:

Comment on the shape, the mean and the standard deviation. How do you think they compare to the shape, mean and standard deviation of the original uniform distribution?

15-c) Let’s sketch a histogram for the distribution of sample means for samples of size 60. Observe its shape, center and variability. Is it what you predicted?

15-d) Let’s find the mean of the distribution of sample means, which is stored in list L6 of the overhead calculator. Mean of the sample means =

How does it compare to μ?

Part 16 - Review some concepts by means of Tutorial (5), then comment on the following:

- Distribution of sample means

- Mean and standard deviation of the distribution

- Central Limit Theorem
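The class activity of collecting 28 sample means can be imitated in Python. This is only a simulation sketch: the function name `sample_means` is hypothetical, and the numbers 28 and 60 are taken from the activity above.

```python
import random
import statistics

def sample_means(n_samples=28, sample_size=60, seed=None):
    """Mean of each of n_samples simulated samples of die rolls."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means = sample_means()
print("mean of the sample means:", round(statistics.mean(means), 3))
print("st. dev. of the sample means:", round(statistics.stdev(means), 3))
```

The sample means should cluster near 3.5 and vary much less than individual rolls do, although any particular run will differ a little.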

Distribution of Sample Means, Small Sample Size (Section 5.5)

Central Limit Theorem (Section 5.5)

Part 17 – If time permits, simulate rolling a die 10 times, store the numbers into L2, and find the mean of L2.

Collect the class results to generate the distribution of sample means for samples of size 10.

Sketch the histogram and observe its shape. Find the mean and standard deviation of this distribution. Compare your results with what is predicted by the Central Limit Theorem.
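For the comparison with the Central Limit Theorem, the predicted standard deviation of the sample means is σ/√n. The Python sketch below evaluates that formula for samples of size 10 and 60, assuming the fair-die value σ = √(35/12); the constant and function names are illustrative.

```python
import math

SIGMA = math.sqrt(35 / 12)  # standard deviation of a single fair-die roll

def standard_error(sigma, n):
    """Predicted standard deviation of the mean of n independent rolls."""
    return sigma / math.sqrt(n)

for n in (10, 60):
    print(n, round(standard_error(SIGMA, n), 4))
```

The smaller sample size gives the more spread-out distribution of sample means, so the size-10 histogram should look wider than the size-60 one.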

Importance of the Class Width (Section 2.3)

Part 18: Histograms

Observe how the selection of the class width “changes” the “story” portrayed by the graph. Decide what class width provides the best picture of the data.

Access my web page:

Click on Statistics Workshop

Click on Applets

Click on Histogram

18-1) Assume the data represents grades of students in a test.

a) What is a convenient number to use as the class width?

b) What class width is convenient to use if we want to know if there are any students who scored above 95%?

18-2) Assume the data represent the number of cars that go through a busy intersection from 4 am until 10 am. To avoid entering new data that fits this situation, and to be able to use the given histogram, we’ll have to make the assumption that 40 = 4 a.m., 50 = 5 a.m., etc.

A class width of 10 will mean 1-hour intervals.

A class width of 5 will mean ......-minute intervals

A class width of 2.5 will mean ......-minute intervals

A class width of 1 will mean ......-minute intervals

Change the class width from 10 to 5, to 2.5, to 1. You can use the slider, but it’s more exact if you just type the number and press enter.
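The conversion from class width to minutes follows from the assumption that 10 units on the axis stand for one hour. A tiny Python sketch of the arithmetic, with a hypothetical function name, can be used to check the blanks above:

```python
def width_to_minutes(class_width, units_per_hour=10):
    """Length in minutes of one class, when units_per_hour axis units equal 60 minutes."""
    return class_width * 60 / units_per_hour

for width in (10, 5, 2.5, 1):
    print(width, "->", width_to_minutes(width), "minutes")
```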

18-2-a) Give the time interval in which the most cars go through the intersection if you use

i) A class width of 10:

ii) A class width of 5:

iii) A class width of 2.5:

iv) A class width of 1:

18-2-b) You cross the intersection sometime after 5:30 a.m. What is the most convenient time interval to go through the intersection?

18-2-c) What if you are in that area sometime between 6 and 7 a.m.?

18-2-d) What is the best choice of class width if we want to pinpoint the rush hour and avoid the time when the most cars go through the intersection?

Mean versus Median (Section 2.4)

Part 19: Mean and Median

19-1) Access my web page:

Click on Statistics Workshop

Click on Applets

Click on Mean and Median 1

Click on Mean and Median

Read Instructions and play with it

Make sure you drag a point along the line

19-1-a) Observe what happens. Explain.

19-1-b) Which of the two measures of the center is said to be resistant? Explain the meaning of this term.

19-2) Exploring Mean and Median

Objective: To stress the concept: The median is a resistant measure of the center while the mean is affected by extreme values.
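The same idea can be checked numerically without the applet. In the Python sketch below the salary figures are invented for illustration; replacing the largest value with an extreme one moves the mean a great deal and leaves the median where it was.

```python
import statistics

salaries = [30, 32, 35, 38, 40]       # hypothetical salaries, in thousands
with_outlier = salaries[:-1] + [400]  # replace the largest value with an extreme one

print(statistics.mean(salaries), statistics.median(salaries))          # 35 35
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 107 35
```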

Access the Applets window on my web page

Click on Mean and Median 2

Click on Mean versus Median

Read instructions and play with it

19-2-a) Salaries of U.S. households are skewed to the ...... If you were reporting results about this population, what measure of the center would you use? Explain.

19-2-b) Dates of coins

Suppose you and your friends emptied your pockets of coins and recorded the year marked on each coin. What do you think the shape of the distribution looks like? Explain. What measure of the center is more appropriate to use, the mean or the median? Explain why.

Part 20 - Review some concepts by means of Tutorial (6).

Box Plots (Section 2.7)

Part 21 – Using Box Plots to Explore Data

21-1) The data show the average amount of money spent per student in public elementary schools for each of the 50 states and the District of Columbia. The categories are regions: S = south, W = west, NE = northeast, MW = mid-west. Source: National Center for Education Statistics

We’ll explore the box plot(s) for the Amount (in dollars) Spent per Student in Public Elementary Schools by States. First we’ll look into the Graph for All Data, then, we’ll explore the Graphs by Category.

Access my web page:

Click on Statistics Workshop

Click on Applets

Click on Box Plots

On the “Select a Data set” drop-down menu,

Select “Amount Spent Per Student”

Scroll down and click on

“Graph by Category”, and then on “Graph all Data”

Observe the changes

Scroll down and in the window at the bottom, you can actually see the individual values

21-1-a) Name the variables in this data set.

21-1-b) How would you describe the data: as qualitative or quantitative?

21-1-c) Overall, how many observations are there?

21-1-d) How many observations per region?

21-1-e) Select Graph All Data

- Write down the 5-number summary

Category / N / min / Q1 / Median / Q3 / Max.

- How would you describe the distribution’s shape?

Symmetric, skewed to the right, skewed to the left?
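A 5-number summary can also be computed by hand or in code. The Python sketch below uses the median-of-halves convention for quartiles (the middle value is left out of both halves when the count is odd); the applet may use a different quartile convention, and the data values are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # for an odd count, skip the middle value when forming the upper half
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

spending = [4, 5, 5, 6, 7, 8, 12]  # hypothetical amounts, in thousands of dollars
print(five_number_summary(spending))  # (4, 5, 6, 8, 12)
```

In this made-up example the long stretch from Q3 to the maximum hints at a distribution skewed to the right.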

21-1-f) Select Graph by Category

- Write down the 5-number summary for each category

Category / N / min / Q1 / Median / Q3 / Max.

- Use the back of this paper to answer the following questions:

1. Which of the 4 categories is “closer” to a symmetric distribution?

2. Which of the 4 categories has an outlier (even though it’s not indicated as one)?

3. Overall, what was the median amount spent per student?

4. How does the median amount spent per student in the NE compare to the other regions?

5. Which region spends the most per student?

6. Which spends the least?

7. How do the middle 50% of the data compare for the different regions?

8. Refer to the box plot of the South to answer the following. Explain your choice.

- True or False? The lengths of the different portions of the box suggest that, in the selected sample, there are more schools with expenditures above $5109 (Q3) than with expenditures below $5109.

Part 22 – Load some data sets into your calculator

(Need help? See calculator section, item 7)

Here are the data sets that will be loaded into the calculator.

Diet Coke Volume = CDTVL / Diet Coke Weight = CDTWT

Regular Coke Volume = CRGVL / Regular Coke Weight = CRGWT

Diet Pepsi Volume = PDTVL / Diet Pepsi Weight = PDTWT

Regular Pepsi Volume = PRGVL / Regular Pepsi Weight = PRGWT

Head circumferences (cm) of Two-Month-Old Baby-Boys = MHED
Head circumferences (cm) of Two-Month-Old Baby-Girls = FHED

Describing Data Sets

Part 23 - Exploring the distribution of volumes of Regular Coke Cans

Before constructing any graphs, think about the following:

23-a) What is the volume posted on a regular Coke can?

23-b) Think about selecting a sample of regular Coke cans, recording their volume, and using the calculator to sketch the histogram. What do you think the histogram will look like? What shape will this distribution have?

23-c) Load the data set into the editor of the calculator.

(Need help? See calculator section, item 8)

23-d) Construct a histogram letting your calculator select the window. Is it what you expected? As usual, expand in your explanations.

(Need help? See calculator section, item 4, Note 9)

23-e) Show the frequency distribution and histogram here. If you think it necessary, modify the window values.

23-f) Open a second STAT PLOT to construct the box plot for the data. You may want to use a large Y-max value in the window of the calculator to fit both graphs. Use both graphs to describe some characteristics of the data set.

(Need help? See calculator section, item 9)

Histograms and Box Plots (Sections 2.3 and 2.7)

Part 24 – Comparing Weights of Diet Coke and Regular Coke by using Box Plots

Before constructing any graphs, think about both box plots.

24-a) Do you think they will have the same length (range)?

Will they have the same minimum and maximum, or will one of the plots be farther to the right than the other? If so, which will be to the right?

24-b) Construct a box plot for each of the distributions of the weights of regular and diet Coke. Display both plots in the same window. Is it what you predicted? Compare the graphs and determine whether there appears to be a significant difference between the two distributions. If so, provide a possible explanation for the difference.