Introduction and Chapter 1 (Moore)

Lecture Notes

(Italics = Handouts)

Introduction

What is statistics? Many definitions

Utts/Heckard: Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.

DeVeaux/Velleman: Statistics is a way of reasoning, along with a collection of tools and methods, designed to help us understand the world.

Agresti/Franklin: Statistics is the art and science of designing studies and analyzing the data that those studies produce. Its ultimate goal is translating data into knowledge and understanding of the world around us. In short, statistics is the art and science of learning from data.

Triola: Statistics is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.

Moore: Statistics is the science of learning from data.

What does statistics do? Lots of things!

Statistics plays a role in making sense of the complex world in which we live today.

To be an effective citizen of the 21st century one must be able to make use of the data that is available to us and to critically analyze the statistics with which we are presented.

Transforms raw data to useful information.

How important is statistics? Some Quotes

1. Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. (H. G. Wells)

2. The major problem of man living in the 20th century is to learn to live with uncertainty. (Bertrand Russell)

3. To understand God’s thoughts we must study statistics, for these are the measure of His purpose. (Florence Nightingale)

The three main areas of “practical statistics”. (pg. xviii)

1) Data Analysis (also called Descriptive Statistics): displays and summaries

2) Data Production

3) Statistical Inference (our goal; it is what makes statistics so important in our modern world: making good decisions in the face of uncertainty)

READ THE BOOK

My approach is more conceptual than computational. There are several reasons that I take this approach:

1) Most of the time, when students memorize formulas for a course, the formulas are quickly forgotten. On the other hand, once concepts are understood they are likely to be remembered for a long time.

2) We now have technology available that can do the computational work for us. What technology can’t do is decide which procedures to use or interpret the results.

3) Very few, if any, of you are ever going to actually have to “do” statistics, but all of you will be faced with statistical data and results that you must use to make good decisions. We live in a data-driven world; if you are going to be an effective citizen of the 21st century, you must be able to make good use of data.

Moore states that “The key to learning is persistence” and that “The gain will be worth the pain.” I wholeheartedly agree.

Part 1: Exploring Data

1st question: What do the data say? (reveal the distribution)

1) Organize with charts, tables and graphs

2) Summarize with descriptive measures

Chapter 1: Organizing data and picturing distributions graphically

Individuals – objects of the study

Variable – a characteristic of an individual

Types of data: Categorical, Quantitative

Data Tables (often in a spreadsheet type array) (see fig. 1.1, pg. 4)

row = case (a single individual); column = variable
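The row/column layout can be sketched in a few lines of Python. This is only an illustration: the individuals, variable names, and values below are made up, and the helper names `column` and `variable_type` are hypothetical, not anything from the text or from Minitab.

```python
# Hypothetical data table: every value here is invented for illustration.
# Each dict is one row (a case, i.e. one individual); each key is a variable.
table = [
    {"name": "Ana", "major": "Biology", "age": 19, "gpa": 3.4},
    {"name": "Ben", "major": "History", "age": 22, "gpa": 2.9},
    {"name": "Cara", "major": "Biology", "age": 20, "gpa": 3.8},
]


def column(rows, variable):
    """Return every recorded value of one variable (one column of the table)."""
    return [row[variable] for row in rows]


def variable_type(values):
    """Rough guess only: all-numeric values -> quantitative, else categorical."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


for var in table[0]:
    print(var, "->", variable_type(column(table, var)))
```

The type guess is deliberately crude. A variable stored as numbers (a zip code, for instance) can still be categorical, so the decision ultimately depends on what the values mean, not on how they are stored.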

When looking at data, the metadata (information about the data) is important and should be considered. One statistician described it in terms of the W’s:

Who – the individual about which we record characteristics

What – what we record about the individuals, variable(s)

categorical

quantitative

Why – the questions we ask of a variable (can affect type); why are we recording values of this variable

Where

When

hoW (e.g. how percentage body fat was measured)

Metadata adds context and meaning to our data.

Do exercises 1.1, 1.2 on page 5

Categorical Variables: pie charts and bar graphs

EDA, exploratory data analysis, is a set of statistical tools and ideas that help us examine data in order to describe their main features.

Distribution of a variable: what values it assumes and how often – one of the main goals of exploring data is to reveal the distribution. This can be done with graphs or by computing summary measures.
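As a minimal sketch of "what values it assumes and how often", the distribution of a categorical variable can be tabulated as counts and percents. The list of majors below is made up, and `distribution` is a hypothetical helper name.

```python
from collections import Counter


def distribution(values):
    """Map each distinct value to (count, percent of all observations)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in counts.items()}


# Hypothetical observations of one categorical variable (invented data).
majors = ["Biology", "History", "Biology", "Math",
          "Biology", "History", "Math", "Biology"]

dist = distribution(majors)
for value, (count, percent) in sorted(dist.items()):
    print(f"{value:<10}{count:>3}{percent:>7.1f}%")
```

Because every individual falls in exactly one category here, the percents add to 100, which is the situation where a pie chart is appropriate.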

Pie Charts

Bar Graphs (values can be in alpha order, first-come order, or in descending order of frequency)

(Republican candidates: categ data Repub candidates 09Jan16.mtw)

Note: bar graphs are more versatile because they can be used for questions like what percentage of users “love” a device (ex. 1.3, pg 8, very dated data) or “movie source” (movie sources.mtw), where the total is more than 100%.
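The point about totals above 100% can be illustrated with a rough text bar graph. The percentages below are invented (they are not the data from the example or the worksheet), and `text_bar_graph` is a hypothetical helper. Bars are listed in descending order of percent, one of the orderings mentioned above.

```python
def text_bar_graph(percents, width=40):
    """Return one text bar per category, in descending order of percent."""
    ordered = sorted(percents.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for label, pct in ordered:
        bar = "#" * round(width * pct / 100)  # bar length proportional to percent
        lines.append(f"{label:<10} {bar} {pct}%")
    return lines


# Hypothetical survey in which each respondent may name several sources,
# so the percents are separate quantities and do not add to 100.
sources = {"Streaming": 70, "Theater": 45, "Disc": 30, "Cable": 55}

lines = text_bar_graph(sources)
print("\n".join(lines))
print("Total of percents:", sum(sources.values()))
```

A pie chart would be misleading for these numbers because the categories are not parts of one whole; each bar stands on its own.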

Do exercises 1.3, 1.4 on pages 8 and 9.

Quantitative Variables: histograms (e.g. example 1.6, scores on vocab test, use Moore data set eg01-6.mtp)

1) choose the classes (bins, class limits, cutpoints)

same size

no overlap, no gaps

not too many nor too few

2) get the counts (frequencies) for each class and make a frequency distribution table.

3) draw the histogram
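The three steps above can be sketched as follows. The data are made up, `frequency_table` is a hypothetical helper, and the sketch assumes that a value equal to a cutpoint belongs to the class on its right; textbooks and software differ on that convention, so check the one your tool uses.

```python
def frequency_table(data, start, stop, width):
    """Count observations in equal-width classes [a, a + width).

    Classes have the same size, do not overlap, and leave no gaps.
    """
    if (stop - start) % width != 0:
        raise ValueError("width must divide the range from start to stop")
    cutpoints = list(range(start, stop + 1, width))
    counts = [0] * (len(cutpoints) - 1)
    for x in data:
        if not (start <= x < stop):
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[int((x - start) // width)] += 1
    return cutpoints, counts


# Hypothetical quantitative data (invented values).
data = [12, 15, 21, 22, 24, 27, 28, 29, 31, 33, 35, 38, 41, 47]

# Step 1: choose the classes. Step 2: get the counts.
cutpoints, counts = frequency_table(data, start=10, stop=50, width=10)

# Step 3: draw the histogram (here as rows of asterisks).
for low, high, count in zip(cutpoints, cutpoints[1:], counts):
    print(f"{low:>3} to <{high:<3} {count:>2} {'*' * count}")
```

Trying a different `width` on the same data is a quick way to see why the number of classes should be neither too large nor too small.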

Describing the distribution:

1) shape

2) center

3) spread

4) outliers or other unusual features

Shapes

symmetric

skewed (left or negatively, right or positively) the “tail tells the tale”

unimodal, bimodal

bell-shaped

rectangular (uniform)

Organizing or summarizing = some loss of information

Stemplots (stem-and-leaf plots) (pg. 16) (stem2 and hist example)

unordered/ordered leaves

rounding and/or truncating

splitting stems

back-to-back (see exercise 1.32)

depth

leaf-units
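A minimal stemplot sketch is given below, assuming nonnegative data and a whole-number leaf unit (1, 10, 100, ...). Values are truncated, not rounded, to the leaf unit, leaves are ordered, and stems with no leaves are kept so gaps remain visible. The data are made up and `stemplot` is a hypothetical helper; it does not split stems or show depths.

```python
from collections import defaultdict


def stemplot(data, leaf_unit=1):
    """Return the lines of an ordered stemplot.

    Assumes nonnegative values and a positive whole-number leaf unit.
    Each value is truncated to the leaf unit; the last remaining digit
    is the leaf and the digits before it form the stem.
    """
    leaves = defaultdict(list)
    for x in data:
        scaled = int(x // leaf_unit)      # truncate to the leaf unit
        stem, leaf = divmod(scaled, 10)   # final digit is the leaf
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>3} | {row}")
    return lines


# Hypothetical data (invented values), leaf unit = 1.
values = [12, 15, 21, 22, 24, 27, 28, 29, 31, 33, 35, 38, 41, 47, 63]
plot = stemplot(values)
print("\n".join(plot))

# With leaf unit = 10, the value 127 is truncated to stem 1, leaf 2.
coarse = stemplot([123, 127, 135, 208], leaf_unit=10)
print("\n".join(coarse))
```

Unlike a histogram, the stemplot keeps the (truncated) data values themselves, so less information is lost in the summary.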

Histograms and Stemplots (kittens)

Do exercise 1.6 (use cutpoints: 14, 16, …, 32, in Minitab notation 16:32/2) and 1.7, 1.9

Time plots (Minimum Wage Time Series.MPJ)

Chapter Exercises: 1.12, 1.13, 1.15, 1.17, 1.20, 1.22, 1.26, 1.32